Building a Zero-Trust AI Platform on Azure Databricks
How we designed a fully private Azure Databricks platform with hub-spoke networking, forced-tunnel firewall, private endpoints, and managed identities — no public IPs, no stored secrets.
Most Databricks deployments start with the default configuration: public endpoints, permissive networking, and service principal secrets stored in environment variables. It works for a proof of concept. It does not work for enterprises handling sensitive data, operating under regulatory obligations, or facing audit scrutiny.
This post walks through the security architecture of our open-source enterprise Databricks platform, explaining how each layer enforces Zero Trust principles.
The Security Problem with Default Databricks
A default Azure Databricks workspace has several security gaps:
- Public endpoints for the workspace UI and API
- Compute nodes with public IPs that can reach the internet directly
- Shared infrastructure with other Databricks customers in the control plane
- Service principal secrets that need to be stored and rotated somewhere
For regulated industries — financial services, healthcare, government, critical infrastructure — any of these is a compliance blocker. For NIS2-scoped organisations, the combination is untenable.
Layer 1: Hub-Spoke Network Topology
The foundation is a hub-spoke network design where shared security services live in the hub and workloads live in isolated spokes.
Hub VNet contains:
- Azure Firewall (with a dedicated management subnet for forced tunneling)
- Azure Bastion (management access without public RDP/SSH)
- VPN Gateway with Entra ID authentication
- Private DNS Resolver for VPN client name resolution
- 7 Private DNS Zones for Azure services
Spoke VNet contains:
- Databricks public and private subnets (VNet injection)
- Azure Functions integration subnet
- Private endpoint subnet
- NAT Gateway for controlled outbound connectivity
VNet peering connects hub to spoke. User-Defined Routes on the spoke subnets force all egress through the hub firewall. No spoke resource can reach the internet without passing through firewall inspection.
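The forced-egress route can be sketched in Terraform, assuming the repository's modules use the `azurerm` provider (resource names and variables here are illustrative, not the module's actual values):

```hcl
# Force all spoke egress through the hub firewall's private IP.
# Names and variables are illustrative placeholders.
resource "azurerm_route_table" "spoke_egress" {
  name                = "rt-spoke-forced-tunnel"
  location            = var.location
  resource_group_name = var.resource_group_name

  route {
    name                   = "default-via-firewall"
    address_prefix         = "0.0.0.0/0"
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = var.firewall_private_ip # hub Azure Firewall
  }
}

resource "azurerm_subnet_route_table_association" "databricks_private" {
  subnet_id      = azurerm_subnet.databricks_private.id
  route_table_id = azurerm_route_table.spoke_egress.id
}
```

With `0.0.0.0/0` pointed at the firewall, even a misconfigured NSG cannot open a direct path to the internet.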
Layer 2: Firewall-Forced Tunneling
Azure Firewall sits at the network boundary with explicit allow rules:
Application rules whitelist FQDN targets:
- Databricks control plane endpoints
- PyPI (for library installation)
- Ubuntu package repositories (for Databricks Runtime)
- GitHub (for CI/CD)
Network rules allow specific IP ranges for Databricks infrastructure services.
Everything else is denied by default. If a compromised node attempts to exfiltrate data to an unknown endpoint, the firewall blocks it and logs the attempt.
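An application rule collection of this shape expresses the allow-list; this is a hedged sketch against the classic `azurerm` firewall rules, and the FQDN list is indicative rather than the module's exact set:

```hcl
# Explicit FQDN allow-list; anything not matched is denied by default.
resource "azurerm_firewall_application_rule_collection" "egress_allow" {
  name                = "allow-required-fqdns"
  azure_firewall_name = azurerm_firewall.hub.name
  resource_group_name = var.resource_group_name
  priority            = 200
  action              = "Allow"

  rule {
    name             = "databricks-pypi-ubuntu-github"
    source_addresses = [var.spoke_cidr]
    target_fqdns = [
      "*.azuredatabricks.net",  # Databricks control plane
      "pypi.org",               # library installation
      "files.pythonhosted.org", # PyPI package downloads
      "*.ubuntu.com",           # Databricks Runtime package repos
      "github.com",             # CI/CD
    ]
    protocol {
      port = "443"
      type = "Https"
    }
  }
}
```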
Firewall diagnostic logs feed into Log Analytics with scheduled query alerts. A spike in denied connections triggers investigation.
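One such alert might look like the following sketch; the KQL column names, threshold, and evaluation windows are assumptions to illustrate the pattern, not the platform's tuned values:

```hcl
# Alert when firewall denies spike within a 15-minute window.
# Query fields and threshold are illustrative.
resource "azurerm_monitor_scheduled_query_rules_alert_v2" "fw_denies" {
  name                 = "alert-firewall-deny-spike"
  location             = var.location
  resource_group_name  = var.resource_group_name
  scopes               = [azurerm_log_analytics_workspace.main.id]
  severity             = 2
  evaluation_frequency = "PT15M"
  window_duration      = "PT15M"

  criteria {
    query                   = <<-KQL
      AzureDiagnostics
      | where Category == "AzureFirewallApplicationRule"
      | where msg_s has "Deny"
    KQL
    time_aggregation_method = "Count"
    operator                = "GreaterThan"
    threshold               = 50
  }
}
```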
Layer 3: Private Endpoints Everywhere
Every PaaS service is accessed exclusively through private endpoints:
| Service | Private Endpoints | Why |
|---|---|---|
| Databricks | 4 (UI/API, browser auth, DFS, blob) | Eliminates public workspace and DBFS access |
| ADLS Gen2 | 2 (blob, DFS) | Data lake accessible only from VNet |
| Key Vault | 1 | Secrets never traverse public networks |
| Container Registry | 1 | Container images pulled over private network |
Each private endpoint has a corresponding Private DNS Zone linked to both hub and spoke VNets. When a service references *.blob.core.windows.net, DNS resolves to a private IP within the VNet — not a public endpoint.
Public network access is disabled on every service. Even with the correct credentials, you cannot reach these services from the public internet.
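The private endpoint pattern is the same for every service; here is a sketch for Key Vault (the `privatelink.vaultcore.azure.net` zone name is Azure's standard zone for Key Vault, while the resource names are placeholders):

```hcl
# One private endpoint plus its DNS zone, shown for Key Vault.
resource "azurerm_private_dns_zone" "vault" {
  name                = "privatelink.vaultcore.azure.net"
  resource_group_name = var.resource_group_name
}

resource "azurerm_private_endpoint" "kv" {
  name                = "pe-keyvault"
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = azurerm_subnet.private_endpoints.id

  private_service_connection {
    name                           = "kv-connection"
    private_connection_resource_id = azurerm_key_vault.main.id
    subresource_names              = ["vault"]
    is_manual_connection           = false
  }

  # DNS zone group registers the endpoint's private IP in the zone;
  # the zone must be linked to both hub and spoke VNets so every
  # resolver returns the private IP instead of the public one.
  private_dns_zone_group {
    name                 = "kv-dns"
    private_dns_zone_ids = [azurerm_private_dns_zone.vault.id]
  }
}
```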
Layer 4: Managed Identities — No Secrets
The platform uses three user-assigned managed identities:
- Databricks identity — Used by the Access Connector for Unity Catalog to reach ADLS Gen2
- Functions identity — Used by Azure Functions to interact with Service Bus and Key Vault
- CI/CD identity — Used by GitHub Actions via OIDC federation
No identity has a client secret. Authentication happens through Azure's internal token service. The CI/CD identity deserves special attention: GitHub Actions exchanges an OIDC token for an Azure access token at runtime — no secrets are stored in GitHub, no credentials to rotate, no risk of secret leakage in logs.
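The OIDC federation is a single federated credential on the identity; in Terraform it can be sketched like this (the credential name and branch restriction are assumptions):

```hcl
# GitHub Actions exchanges its OIDC token for this identity's Azure
# access token at runtime; no client secret exists anywhere.
resource "azurerm_user_assigned_identity" "cicd" {
  name                = "id-cicd"
  location            = var.location
  resource_group_name = var.resource_group_name
}

resource "azurerm_federated_identity_credential" "github" {
  name                = "github-actions-main"
  resource_group_name = var.resource_group_name
  parent_id           = azurerm_user_assigned_identity.cicd.id
  issuer              = "https://token.actions.githubusercontent.com"
  audience            = ["api://AzureADTokenExchange"]
  # Restrict token exchange to this repo's main branch.
  subject = "repo:MedGhassen/databricks-enterprise-ai-platform:ref:refs/heads/main"
}
```

The `subject` claim is the enforcement point: only workflow runs matching that exact repo and ref can mint a token for this identity.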
RBAC assignments follow least privilege:
- Databricks identity gets `Storage Blob Data Contributor` on the data lake — nothing else
- Functions identity gets `Key Vault Secrets User` and `Service Bus Data Sender/Receiver` — nothing else
- CI/CD identity gets `Contributor` scoped to the resource group — not the subscription
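Two of these assignments, sketched in Terraform to show how tightly the scopes are drawn (resource references are illustrative):

```hcl
# Least-privilege role assignments: each identity gets exactly the
# role it needs, at the narrowest scope that works.
resource "azurerm_role_assignment" "databricks_storage" {
  scope                = azurerm_storage_account.datalake.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_databricks_access_connector.main.identity[0].principal_id
}

resource "azurerm_role_assignment" "cicd_rg" {
  scope                = azurerm_resource_group.main.id # not the subscription
  role_definition_name = "Contributor"
  principal_id         = azurerm_user_assigned_identity.cicd.principal_id
}
```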
Layer 5: Data Encryption
At rest: ADLS Gen2 uses Customer-Managed Keys stored in Key Vault with infrastructure-level double encryption. This means data is encrypted twice — once by the storage service and once at the infrastructure layer.
In transit: All connections use TLS. Private endpoints ensure traffic never leaves the Azure backbone. HSTS headers enforce HTTPS on all web interfaces.
Key management: Key Vault has soft-delete (90 days) and purge protection enabled. Even an administrator cannot permanently delete an encryption key without waiting through the retention period.
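The encryption posture reduces to a handful of provider settings; this sketch shows the relevant flags (names are placeholders, and the CMK itself is wired to the storage account separately via `azurerm_storage_account_customer_managed_key` in the real module):

```hcl
# Key Vault: keys survive deletion attempts for the retention period.
resource "azurerm_key_vault" "main" {
  name                          = "kv-platform"
  location                      = var.location
  resource_group_name           = var.resource_group_name
  tenant_id                     = data.azurerm_client_config.current.tenant_id
  sku_name                      = "premium"
  soft_delete_retention_days    = 90
  purge_protection_enabled      = true  # no hard delete, even by admins
  public_network_access_enabled = false # private endpoint only
}

# ADLS Gen2: HTTPS enforced, encrypted twice (service + infrastructure).
resource "azurerm_storage_account" "datalake" {
  name                              = "stdatalake"
  location                          = var.location
  resource_group_name               = var.resource_group_name
  account_tier                      = "Standard"
  account_replication_type          = "ZRS"
  is_hns_enabled                    = true # hierarchical namespace = ADLS Gen2
  https_traffic_only_enabled        = true
  infrastructure_encryption_enabled = true # second encryption layer
  public_network_access_enabled     = false
}
```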
Layer 6: Azure Policy Enforcement
The landing zone module deploys Azure Policy assignments that act as guardrails:
- Require tags on all resources (environment, project, owner, cost-centre)
- Restrict locations to approved Azure regions
- Enforce HTTPS on storage accounts
- Require diagnostics to be sent to Log Analytics
These policies prevent configuration drift. If someone creates a storage account without HTTPS, the deployment fails. If a resource is missing required tags, it is flagged for remediation.
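A single guardrail of this kind is only a few lines; the following sketch denies any resource created without an `environment` tag (a custom definition is shown here for self-containment, though built-in tag policies exist too):

```hcl
# Deny resources missing an "environment" tag.
resource "azurerm_policy_definition" "require_env_tag" {
  name         = "require-environment-tag"
  policy_type  = "Custom"
  mode         = "Indexed"
  display_name = "Require environment tag"

  policy_rule = jsonencode({
    if = {
      field  = "tags['environment']"
      exists = "false"
    }
    then = { effect = "deny" }
  })
}

resource "azurerm_resource_group_policy_assignment" "require_env_tag" {
  name                 = "require-environment-tag"
  resource_group_id    = azurerm_resource_group.main.id
  policy_definition_id = azurerm_policy_definition.require_env_tag.id
}
```

With `deny` as the effect, non-compliant deployments fail at creation time rather than being flagged after the fact.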
Layer 7: Monitoring and Alerting
Security without visibility is security theatre. The platform deploys:
- Log Analytics workspace aggregating diagnostics from all services
- Scheduled query alerts for anomalous patterns (firewall denies, authentication failures, unexpected storage access)
- Configurable budget alerts (default $1,000, adjustable per environment) — because runaway compute is a security incident too
- Application Insights for Azure Functions telemetry
Firewall deny logs, Key Vault access logs, storage authentication failure logs, and Databricks job outcome logs all feed into a single pane of glass.
Mapping to Compliance Frameworks
This architecture directly supports:
- NIS2 — Network segmentation, incident detection (firewall + monitoring), access control (managed identities + RBAC), supply chain security (private endpoints)
- ISO 27001 — Asset management (tags + policy), access control (RBAC), cryptography (CMK), operations security (monitoring), communications security (private endpoints)
- EU AI Act — Logging and traceability (Log Analytics + MLflow), cybersecurity (the entire stack), robustness (drift detection + auto-mitigation)
The WAF alignment documentation in the repository maps every resource to specific compliance controls.
Getting Started
The full implementation is available at github.com/MedGhassen/databricks-enterprise-ai-platform under the MIT license. Start with the modules/networking and modules/firewall directories to understand the network security foundation.
Related Resources
- Zero Trust Architecture: From Buzzword to Production in 6 Months — The strategic framework behind this implementation.
- NIS2 Compliance: A Technical Roadmap for IT Leaders — How this architecture maps to NIS2 requirements.
- We Open-Sourced Our Enterprise Databricks AI Platform Blueprint — Overview of the full platform beyond security.
Questions about implementing Zero Trust for your Databricks environment? Contact us — this is what we do.