Terraform on Azure: 10 Best Practices for Enterprise-Scale Deployments
Battle-tested Terraform practices for Azure at scale — state management, modules, policy-as-code, CI/CD, and drift detection.
Most teams get Terraform working on Azure within a day. Keeping it working — reliably, securely, across dozens of subscriptions and multiple teams — is where the real engineering begins. After helping enterprises across DACH adopt Terraform at scale, here are ten practices we consider non-negotiable.
1. Use Remote State with Locking — No Exceptions
Local state files are a ticking time bomb in any team setting. Store state in an Azure Storage Account with blob-level locking enabled.
- Enable soft delete and versioning on the container so you can recover from accidental state corruption.
- Use a dedicated resource group for state storage, locked with
CanNotDeletemanagement locks. - Separate state files per environment and per logical component. A single monolithic state file for your entire landing zone will eventually time out during plans.
Tip: Prefix your state keys with the component name:
networking/terraform.tfstate,identity/terraform.tfstate. This makes storage account browsing intuitive.
2. Structure Modules Around Lifecycle, Not Resource Type
A common mistake is creating one module per Azure resource type (module "vnet", module "nsg", module "route_table"). Instead, group resources that deploy and change together.
- A "spoke network" module that contains the VNet, subnets, NSGs, route tables, and peering makes more operational sense than four separate modules.
- Keep modules small enough to reason about but large enough to capture a meaningful unit of infrastructure.
- Document each module with a
README.md, input/output descriptions, and an exampletfvarsfile.
3. Pin Provider and Module Versions
Leaving provider constraints open (>= without an upper bound) guarantees a surprise breaking change at the worst possible moment.
terraform {
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~> 3.95" # allows 3.95.x patches only
}
}
}- Use pessimistic constraint operators (
~>) to allow patch updates while blocking minor/major bumps. - Run
terraform init -upgradein a dedicated PR weekly to evaluate new provider versions in isolation. - Track the AzureRM changelog — Microsoft's API changes frequently surface as provider-level breaking changes.
4. Implement Policy-as-Code from Day One
Azure Policy and Terraform serve different enforcement points. Use both.
- Terraform-side: Use tools like OPA/Conftest or Checkov to validate plans before apply. Catch issues like public storage accounts or missing tags during CI, not after deployment.
- Azure-side: Deploy Azure Policy assignments via Terraform itself. This ensures policy definitions are versioned, reviewed, and reproducible.
- Start with audit mode, review findings, then switch to deny. Going straight to deny creates friction that erodes trust in the platform.
5. Treat Variable Files as Configuration, Not Code
Your .tf files define what to deploy. Your .tfvars files define where and how much. Keep them separate.
- Store environment-specific
tfvarsin a dedicated directory:environments/dev.tfvars,environments/prod.tfvars. - Never hard-code environment specifics in modules. If a value differs between dev and prod, it should be a variable.
- Use variable validation blocks to catch invalid inputs early:
variable "environment" {
type = string
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Environment must be dev, staging, or prod."
}
}6. Automate Plans in Pull Requests
Every pull request that touches Terraform should trigger a plan and post the output as a PR comment. This is the single highest-value CI integration.
- Use Terraform Cloud, Spacelift, or a custom pipeline (GitHub Actions / Azure DevOps) to run plans.
- Configure the pipeline to fail the PR if the plan contains errors or policy violations.
- Require at least one approval from an infrastructure reviewer before
terraform applyruns.
Warning: Never run
terraform applywith auto-approve in production pipelines without a preceding, human-reviewed plan.
7. Manage Secrets Outside of State
Terraform state files contain sensitive values in plaintext. This is a known limitation, not a bug.
- Use Azure Key Vault for secrets and reference them via
datasources or theazurerm_key_vault_secretresource. - Mark sensitive outputs with
sensitive = trueto suppress them from CLI output (though they still appear in state). - Encrypt your state storage at rest (Azure Storage does this by default) and restrict access with Azure RBAC — not storage account keys.
8. Implement Drift Detection
Configuration drift — when Azure resources diverge from what Terraform expects — is inevitable in enterprise environments. Portal changes, support tickets, automated remediation scripts: they all cause drift.
- Schedule nightly
terraform planruns that report drift without applying changes. - Pipe drift reports to a Slack channel or Azure DevOps dashboard.
- Categorise drift: some is harmless (auto-generated tags), some is critical (changed firewall rules). Build a triage process, not just detection.
9. Use Workspaces Sparingly — Prefer Directory Separation
Terraform workspaces are tempting for environment separation but introduce hidden complexity.
- Workspaces share the same backend configuration, making it easy to accidentally target the wrong environment.
- Directory-based separation (one root module per environment) is more explicit, easier to audit, and simpler to reason about in CI/CD.
- If you do use workspaces, at minimum enforce workspace selection in your pipeline and add guardrails that prevent workspace confusion.
10. Design for Team Boundaries, Not Technical Boundaries
The most impactful architectural decision is how you split your Terraform codebase across teams.
- Align state boundaries to team ownership. If the networking team owns VNets and the application team owns app services, they should have separate state files and separate pipelines.
- Use data sources and remote state outputs for cross-team references rather than hard-coded resource IDs.
- Establish a module registry (even if it is just a Git repo with semantic versioning) so teams consume shared modules without copy-pasting.
Bringing It All Together
These practices are not theoretical — they are the patterns we deploy in client landing zones across Azure. The common thread is reducing blast radius: smaller state files, scoped permissions, automated guardrails, and clear ownership boundaries.
If your team is scaling Terraform beyond a handful of subscriptions and finding it increasingly brittle, the issue is rarely Terraform itself. It is almost always the operational practices around it.
Related Resources
- Infrastructure as Code Strategy — The strategic foundation for your Terraform practice.
- GitOps for Kubernetes with Flux — Extend IaC principles to your Kubernetes deployments.
- DevSecOps Pipeline Design — Embed security scanning into your Terraform CI/CD pipelines.
Need help designing your Azure Terraform architecture? Get in touch with our team — we have done this across industries, from financial services to automotive.