Skip to main content
All posts
DevSecOps5 min read

Terraform on Azure: 10 Best Practices for Enterprise-Scale Deployments

Battle-tested Terraform practices for Azure at scale — state management, modules, policy-as-code, CI/CD, and drift detection.

Most teams get Terraform working on Azure within a day. Keeping it working — reliably, securely, across dozens of subscriptions and multiple teams — is where the real engineering begins. After helping enterprises across DACH adopt Terraform at scale, here are ten practices we consider non-negotiable.

1. Use Remote State with Locking — No Exceptions

Local state files are a ticking time bomb in any team setting. Store state in an Azure Storage Account with blob-level locking enabled.

  • Enable soft delete and versioning on the container so you can recover from accidental state corruption.
  • Use a dedicated resource group for state storage, locked with CanNotDelete management locks.
  • Separate state files per environment and per logical component. A single monolithic state file for your entire landing zone will eventually time out during plans.

Tip: Prefix your state keys with the component name: networking/terraform.tfstate, identity/terraform.tfstate. This makes storage account browsing intuitive.

2. Structure Modules Around Lifecycle, Not Resource Type

A common mistake is creating one module per Azure resource type (module "vnet", module "nsg", module "route_table"). Instead, group resources that deploy and change together.

  • A "spoke network" module that contains the VNet, subnets, NSGs, route tables, and peering makes more operational sense than four separate modules.
  • Keep modules small enough to reason about but large enough to capture a meaningful unit of infrastructure.
  • Document each module with a README.md, input/output descriptions, and an example tfvars file.

3. Pin Provider and Module Versions

Leaving provider constraints open (>= without an upper bound) guarantees a surprise breaking change at the worst possible moment.

HCL
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.95"  # allows 3.95.x patches only
    }
  }
}
  • Use pessimistic constraint operators (~>) to allow patch updates while blocking minor/major bumps.
  • Run terraform init -upgrade in a dedicated PR weekly to evaluate new provider versions in isolation.
  • Track the AzureRM changelog — Microsoft's API changes frequently surface as provider-level breaking changes.

4. Implement Policy-as-Code from Day One

Azure Policy and Terraform serve different enforcement points. Use both.

  • Terraform-side: Use tools like OPA/Conftest or Checkov to validate plans before apply. Catch issues like public storage accounts or missing tags during CI, not after deployment.
  • Azure-side: Deploy Azure Policy assignments via Terraform itself. This ensures policy definitions are versioned, reviewed, and reproducible.
  • Start with audit mode, review findings, then switch to deny. Going straight to deny creates friction that erodes trust in the platform.

5. Treat Variable Files as Configuration, Not Code

Your .tf files define what to deploy. Your .tfvars files define where and how much. Keep them separate.

  • Store environment-specific tfvars in a dedicated directory: environments/dev.tfvars, environments/prod.tfvars.
  • Never hard-code environment specifics in modules. If a value differs between dev and prod, it should be a variable.
  • Use variable validation blocks to catch invalid inputs early:
HCL
variable "environment" {
  type = string
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

6. Automate Plans in Pull Requests

Every pull request that touches Terraform should trigger a plan and post the output as a PR comment. This is the single highest-value CI integration.

  • Use Terraform Cloud, Spacelift, or a custom pipeline (GitHub Actions / Azure DevOps) to run plans.
  • Configure the pipeline to fail the PR if the plan contains errors or policy violations.
  • Require at least one approval from an infrastructure reviewer before terraform apply runs.

Warning: Never run terraform apply with auto-approve in production pipelines without a preceding, human-reviewed plan.

7. Manage Secrets Outside of State

Terraform state files contain sensitive values in plaintext. This is a known limitation, not a bug.

  • Use Azure Key Vault for secrets and reference them via data sources or the azurerm_key_vault_secret resource.
  • Mark sensitive outputs with sensitive = true to suppress them from CLI output (though they still appear in state).
  • Encrypt your state storage at rest (Azure Storage does this by default) and restrict access with Azure RBAC — not storage account keys.

8. Implement Drift Detection

Configuration drift — when Azure resources diverge from what Terraform expects — is inevitable in enterprise environments. Portal changes, support tickets, automated remediation scripts: they all cause drift.

  • Schedule nightly terraform plan runs that report drift without applying changes.
  • Pipe drift reports to a Slack channel or Azure DevOps dashboard.
  • Categorise drift: some is harmless (auto-generated tags), some is critical (changed firewall rules). Build a triage process, not just detection.

9. Use Workspaces Sparingly — Prefer Directory Separation

Terraform workspaces are tempting for environment separation but introduce hidden complexity.

  • Workspaces share the same backend configuration, making it easy to accidentally target the wrong environment.
  • Directory-based separation (one root module per environment) is more explicit, easier to audit, and simpler to reason about in CI/CD.
  • If you do use workspaces, at minimum enforce workspace selection in your pipeline and add guardrails that prevent workspace confusion.

10. Design for Team Boundaries, Not Technical Boundaries

The most impactful architectural decision is how you split your Terraform codebase across teams.

  • Align state boundaries to team ownership. If the networking team owns VNets and the application team owns app services, they should have separate state files and separate pipelines.
  • Use data sources and remote state outputs for cross-team references rather than hard-coded resource IDs.
  • Establish a module registry (even if it is just a Git repo with semantic versioning) so teams consume shared modules without copy-pasting.

Bringing It All Together

These practices are not theoretical — they are the patterns we deploy in client landing zones across Azure. The common thread is reducing blast radius: smaller state files, scoped permissions, automated guardrails, and clear ownership boundaries.

If your team is scaling Terraform beyond a handful of subscriptions and finding it increasingly brittle, the issue is rarely Terraform itself. It is almost always the operational practices around it.

Related Resources

Need help designing your Azure Terraform architecture? Get in touch with our team — we have done this across industries, from financial services to automotive.

terraform azureinfrastructure as codeenterprise terraformazure landing zoneterraform best practices

Need expert guidance?

Our team specializes in cloud architecture, security, AI platforms, and DevSecOps. Let's discuss how we can help your organization.

Related articles