GitOps with Flux on AKS: A Practical Implementation Guide
Step-by-step guide to implementing GitOps with Flux v2 on Azure Kubernetes Service — Helm, Kustomize, SOPS, and multi-cluster.
GitOps sounds simple: store your desired cluster state in Git, and let a controller reconcile reality to match. In practice, getting a production-grade GitOps setup working on AKS requires decisions about repository structure, secret management, multi-cluster promotion, and progressive delivery that the "getting started" tutorials never cover.
This guide walks through a real-world Flux v2 implementation on AKS, based on patterns we deploy for enterprise clients.
Why Flux (and Why Not ArgoCD)
Both Flux and ArgoCD are CNCF-graduated GitOps controllers. We use Flux for AKS deployments for several practical reasons:
- Native Helm and Kustomize support without requiring a UI or additional server components.
- Multi-tenancy via namespaces — Flux's namespace-scoped sources and Kustomizations, combined with service account impersonation, map cleanly to team boundaries.
- Azure integration — Flux is supported as a first-party AKS extension (`microsoft.flux`), which means Microsoft handles upgrades and support.
- Pull-based architecture — no need to expose cluster APIs to your CI system.
ArgoCD excels when you need a rich UI for cluster visualization. If that is a priority, ArgoCD is a fine choice. For headless, API-driven GitOps, Flux is more lightweight.
Repository Structure
The single most impactful decision in a GitOps setup is how you structure your Git repositories. We recommend a two-repository pattern:
App Repository (per service)
Contains application source code and a deploy/ directory with raw manifests or Helm chart values.
```
my-service/
  src/
  Dockerfile
  deploy/
    base/
      deployment.yaml
      service.yaml
      kustomization.yaml
    overlays/
      dev/
        kustomization.yaml
        patches.yaml
      staging/
        kustomization.yaml
      prod/
        kustomization.yaml
```
Fleet Repository (one per platform)
Contains the cluster configuration — which applications are deployed where, with what configuration.
```
fleet/
  clusters/
    dev-westeurope/
      flux-system/
      infrastructure/
      apps/
    prod-westeurope/
      flux-system/
      infrastructure/
      apps/
  infrastructure/
    sources/
    cert-manager/
    ingress-nginx/
    external-secrets/
  apps/
    my-service/
      base/
        kustomization.yaml
        helmrelease.yaml
      overlays/
        dev/
        prod/
```
Why two repos? Separation of concerns. Application developers own the app repo and its deploy manifests. The platform team owns the fleet repo and controls what runs on each cluster. This prevents a single commit from accidentally affecting production.
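For concreteness, the app repo's `deploy/base/kustomization.yaml` can be as small as a resource list. A minimal sketch, with file names matching the illustrative tree above:

```yaml
# deploy/base/kustomization.yaml — minimal sketch
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
```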
Bootstrapping Flux on AKS
With the AKS Flux extension, bootstrapping is straightforward:
```shell
az k8s-configuration flux create \
  --resource-group rg-aks-prod \
  --cluster-name aks-prod-westeurope \
  --name flux-system \
  --namespace flux-system \
  --scope cluster \
  --url https://dev.azure.com/org/project/_git/fleet \
  --branch main \
  --kustomization name=cluster path=./clusters/prod-westeurope
```
For more control, bootstrap directly with the Flux CLI:
```shell
flux bootstrap git \
  --url=ssh://git@dev.azure.com/v3/org/project/fleet \
  --branch=main \
  --path=./clusters/prod-westeurope \
  --private-key-file=./identity
```
Either way, Flux installs its controllers and begins reconciling the cluster state against the Git repository.
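Under the hood, bootstrap commits a `GitRepository` source plus a root `Kustomization` to the repo and applies them. A minimal sketch of the two objects, with names and paths assumed to match the flags above:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m
  url: ssh://git@dev.azure.com/v3/org/project/fleet
  ref:
    branch: main
  secretRef:
    name: flux-system   # SSH key created by bootstrap
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/prod-westeurope
  prune: true           # delete cluster objects removed from Git
  sourceRef:
    kind: GitRepository
    name: flux-system
```

Knowing these two objects exist makes debugging easier: `flux get sources git` and `flux get kustomizations` report on exactly them.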
Kustomize Overlays for Environment Promotion
Kustomize overlays are the backbone of environment-specific configuration. The base layer defines the common deployment spec, and overlays patch per environment.
A typical overlay for production might include:
```yaml
# overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - path: patches.yaml
```

```yaml
# overlays/prod/patches.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: my-service
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 1Gi
```

Key practice: Never promote by merging branches (e.g., dev branch into prod branch). Promote by updating the overlay values (image tag, replica count) in the fleet repo via a PR. This keeps your Git history linear and auditable.
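A promotion PR then touches exactly one value. One common way to make that value explicit is Kustomize's `images` transformer in the overlay; the registry name and tag below are illustrative:

```yaml
# overlays/prod/kustomization.yaml — sketch with image pinning
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - path: patches.yaml
images:
  - name: my-service                          # image name used in base deployment.yaml
    newName: myregistry.azurecr.io/my-service # hypothetical ACR repository
    newTag: "1.4.2"                           # the one line a promotion PR changes
```

The diff for a promotion is then a single-line tag bump, which makes review and audit trivial.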
Helm Releases with Flux
For third-party software and complex applications, Flux's HelmRelease CRD manages Helm chart deployments declaratively:
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: ingress-nginx
  namespace: ingress-system
spec:
  interval: 30m
  chart:
    spec:
      chart: ingress-nginx
      version: "4.9.x"
      sourceRef:
        kind: HelmRepository
        name: ingress-nginx
  values:
    controller:
      replicaCount: 2
      service:
        annotations:
          service.beta.kubernetes.io/azure-load-balancer-internal: "true"
```

- Pin chart versions with pessimistic constraints (`4.9.x`) to allow patches but prevent breaking changes.
- Set the `interval` to control how frequently Flux checks for chart updates.
- Use `valuesFrom` to pull values from ConfigMaps or Secrets for environment-specific overrides.
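The `sourceRef` in a HelmRelease points at a `HelmRepository` object that must exist in the fleet repo (typically under `infrastructure/sources/`). A sketch, using the public ingress-nginx chart repository:

```yaml
# infrastructure/sources/ingress-nginx.yaml — sketch
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: ingress-nginx
  namespace: ingress-system
spec:
  interval: 1h    # how often Flux re-fetches the chart index
  url: https://kubernetes.github.io/ingress-nginx
```

Keeping sources in one directory makes it easy to see, and review, every external dependency a cluster pulls in.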
Secret Management with SOPS
Storing secrets in Git is a requirement for GitOps but a security concern if done naively. Mozilla SOPS with Azure Key Vault solves this elegantly.
Setup
- Create an Azure Key Vault with an RSA key for SOPS encryption.
- Grant the Flux service account (via Workload Identity) `decrypt` permissions on the key.
- Configure a `.sops.yaml` in your fleet repo:

```yaml
creation_rules:
  - path_regex: .*\.enc\.yaml$
    azure_keyvault: https://kv-flux-prod.vault.azure.net/keys/sops-key/abc123
```

- Encrypt secrets before committing:

```shell
sops --encrypt --in-place secrets.enc.yaml
```

Flux automatically decrypts SOPS-encrypted files during reconciliation using the Kustomize controller's built-in SOPS support.
Important: Use separate SOPS keys per environment. A dev cluster should not be able to decrypt production secrets, even if someone accidentally copies the encrypted file.
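Decryption is opt-in per Flux `Kustomization`: the object that applies the encrypted manifests must declare SOPS as its decryption provider. A sketch, assuming Workload Identity gives the kustomize-controller access to the Key Vault key (names and paths are illustrative):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/prod-westeurope/apps
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  decryption:
    provider: sops   # kustomize-controller decrypts *.enc.yaml with its Azure identity
```

Without the `decryption` block, Flux applies the encrypted file verbatim and the workload sees ciphertext, so this is worth verifying early.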
Multi-Cluster Management
For multi-cluster setups (dev, staging, production; or multiple regions), the fleet repo's directory structure does the heavy lifting.
Each cluster gets its own directory under clusters/, which defines:
- Infrastructure components common to all clusters (cert-manager, ingress) — referenced from a shared `infrastructure/` directory.
- Application deployments specific to that cluster's environment — referenced from environment-specific overlays.
```yaml
# clusters/prod-westeurope/apps/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../../apps/my-service/overlays/prod
  - ../../../apps/api-gateway/overlays/prod
```

Promotion flow: A PR that updates the image tag in `apps/my-service/overlays/dev` triggers deployment to dev. Once validated, a second PR updates `overlays/staging`, then `overlays/prod`. Each step is a reviewed, auditable Git commit.
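Ordering matters in a layout like this: apps usually need CRDs and controllers from the infrastructure layer before they can start. Flux's `dependsOn` expresses that ordering between the cluster-level `Kustomization` objects; a sketch:

```yaml
# Apply the infrastructure layer first, then apps — sketch
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/prod-westeurope/infrastructure
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  dependsOn:
    - name: infrastructure   # apps wait until infrastructure is Ready
  interval: 10m
  path: ./clusters/prod-westeurope/apps
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```

This removes a whole class of first-boot races, for example an app's Ingress failing because ingress-nginx's CRDs are not installed yet.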
Progressive Delivery with Flagger
For production deployments, combine Flux with Flagger to enable canary releases or blue-green deployments.
Flagger watches a Deployment, creates a canary, gradually shifts traffic, and automatically promotes or rolls back based on metrics:
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-service
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  progressDeadlineSeconds: 600
  service:
    port: 80
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
```

This configuration gradually shifts traffic to the canary from 0% up to 50% in 10% increments, rolling back automatically if the success rate drops below 99%.
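Canary analysis only works if the canary actually receives traffic; for low-traffic services, Flagger's load-tester webhook can generate synthetic requests during each analysis step. A fragment to add under the `analysis` block above — the hostnames and the `test` namespace are illustrative:

```yaml
  analysis:
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.test/   # assumes the flagger-loadtester deployment is installed
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://my-service-canary.test:80/"
```

Without some traffic source, the success-rate metric has no samples and the analysis stalls until the progress deadline triggers a rollback.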
Operational Practices
Monitoring Reconciliation
- Use `flux get kustomizations` and `flux get helmreleases` to check reconciliation status.
- Export Flux metrics to Prometheus and create Grafana dashboards for reconciliation latency, failures, and drift events.
- Set up alerts for sustained reconciliation failures — these indicate either a broken manifest in Git or a cluster-side issue.
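Flux's notification-controller can also push these failures to chat directly. A sketch of a `Provider`/`Alert` pair for Microsoft Teams — the webhook secret name is illustrative:

```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: ops-teams
  namespace: flux-system
spec:
  type: msteams
  secretRef:
    name: teams-webhook-url   # hypothetical Secret holding the incoming-webhook address
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: reconciliation-failures
  namespace: flux-system
spec:
  providerRef:
    name: ops-teams
  eventSeverity: error        # only forward failures, not routine events
  eventSources:
    - kind: Kustomization
      name: '*'
    - kind: HelmRelease
      name: '*'
```

This complements Prometheus alerting: chat notifications surface individual failures fast, while metric-based alerts catch sustained degradation.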
Handling Drift
Flux continuously reconciles, which means manual kubectl changes will be reverted. This is a feature, not a bug — but communicate it clearly to your teams.
- Use Flux's field manager annotations to exclude specific fields from reconciliation when manual overrides are genuinely needed (e.g., HPA-managed replicas).
- Audit reverted changes in Flux logs to identify teams or processes that are still making manual changes.
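When a resource genuinely must be hand-managed for a while, Flux honors a per-object opt-out annotation; for HPA-managed replicas specifically, the cleaner fix is simply to drop `spec.replicas` from the manifest in Git so there is nothing for Flux to revert:

```yaml
# Opt a single object out of reconciliation — use sparingly and temporarily
metadata:
  annotations:
    kustomize.toolkit.fluxcd.io/reconcile: disabled
```

Treat this as an escape hatch, not a pattern: every opted-out object is drift your fleet repo no longer describes.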
Disaster Recovery
- Your fleet repo is your disaster recovery plan. To rebuild a cluster, bootstrap Flux against the same repo path.
- Test this regularly. Spin up a fresh AKS cluster, bootstrap Flux, and verify that all workloads come up. If they do not, your fleet repo has implicit dependencies you need to make explicit.
Key Takeaways
GitOps with Flux on AKS is not just a deployment mechanism — it is an operational model that enforces auditability, reproducibility, and declarative infrastructure. The investment is in the repository structure and promotion workflows, not the tooling itself.
Planning a GitOps implementation on AKS? Contact our team — we design and implement GitOps platforms for enterprise Kubernetes environments.