
Building a Zero-Trust AI Platform on Azure Databricks

How we designed a fully private Azure Databricks platform with hub-spoke networking, forced-tunnel firewall, private endpoints, and managed identities — no public IPs, no stored secrets.

Most Databricks deployments start with the default configuration: public endpoints, permissive networking, and service principal secrets stored in environment variables. It works for a proof of concept. It does not work for enterprises handling sensitive data, operating under regulatory obligations, or facing audit scrutiny.

This post walks through the security architecture of our open-source enterprise Databricks platform, explaining how each layer enforces Zero Trust principles.

The Security Problem with Default Databricks

A default Azure Databricks workspace has several security gaps:

  • Public endpoints for the workspace UI and API
  • Compute nodes with public IPs that can reach the internet directly
  • Shared infrastructure with other Databricks customers in the control plane
  • Service principal secrets that need to be stored and rotated somewhere

For regulated industries — financial services, healthcare, government, critical infrastructure — any of these is a compliance blocker. For NIS2-scoped organisations, the combination is untenable.

Layer 1: Hub-Spoke Network Topology

The foundation is a hub-spoke network design where shared security services live in the hub and workloads live in isolated spokes.

Hub VNet contains:

  • Azure Firewall (forced tunneling subnet)
  • Azure Bastion (management access without public RDP/SSH)
  • VPN Gateway with Entra ID authentication
  • Private DNS Resolver for VPN client name resolution
  • 7 Private DNS Zones for Azure services

Spoke VNet contains:

  • Databricks public and private subnets (VNet injection)
  • Azure Functions integration subnet
  • Private endpoint subnet
  • NAT Gateway for controlled outbound connectivity

VNet peering connects hub to spoke. User-Defined Routes on the spoke subnets force all egress through the hub firewall. No spoke resource can reach the internet without passing through firewall inspection.
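
The forced-egress pattern can be sketched in Terraform with the azurerm provider. Resource names, the spoke subnet reference, and the firewall reference are illustrative placeholders, not the repository's actual identifiers:

```hcl
# Sketch: a User-Defined Route that sends all spoke egress to the hub firewall.
# All names here are placeholders; adapt to the repo's module structure.
resource "azurerm_route_table" "spoke_egress" {
  name                = "rt-spoke-egress"
  location            = azurerm_resource_group.spoke.location
  resource_group_name = azurerm_resource_group.spoke.name

  route {
    name                   = "default-via-firewall"
    address_prefix         = "0.0.0.0/0"
    next_hop_type          = "VirtualAppliance"
    # Next hop is the firewall's private IP inside the hub VNet
    next_hop_in_ip_address = azurerm_firewall.hub.ip_configuration[0].private_ip_address
  }
}

# Associate the route table with a spoke subnet (e.g. Databricks private subnet)
resource "azurerm_subnet_route_table_association" "databricks_private" {
  subnet_id      = azurerm_subnet.databricks_private.id
  route_table_id = azurerm_route_table.spoke_egress.id
}
```

Because the default route (0.0.0.0/0) points at the firewall's private IP rather than `Internet`, there is no path out of the spoke that bypasses inspection.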

Layer 2: Firewall-Forced Tunneling

Azure Firewall sits at the network boundary with explicit allow rules:

Application rules allow only approved FQDN targets:

  • Databricks control plane endpoints
  • PyPI (for library installation)
  • Ubuntu package repositories (for Databricks Runtime)
  • GitHub (for CI/CD)

Network rules allow specific IP ranges for Databricks infrastructure services.

Everything else is denied by default. If a compromised node attempts to exfiltrate data to an unknown endpoint, the firewall blocks it and logs the attempt.
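
An application rule collection of this shape might look as follows in Terraform. The source address range and rule names are placeholders; the PyPI FQDNs are the standard ones, but verify the full FQDN list against the Databricks documentation for your region:

```hcl
# Sketch: explicit FQDN allow rules on the hub firewall (classic rules).
# Placeholder names and CIDR; extend with Databricks control plane FQDNs.
resource "azurerm_firewall_application_rule_collection" "allow_fqdns" {
  name                = "allow-required-fqdns"
  azure_firewall_name = azurerm_firewall.hub.name
  resource_group_name = azurerm_resource_group.hub.name
  priority            = 200
  action              = "Allow"

  rule {
    name             = "pypi"
    source_addresses = ["10.1.0.0/16"] # spoke VNet range (placeholder)
    target_fqdns     = ["pypi.org", "files.pythonhosted.org"]

    protocol {
      port = 443
      type = "Https"
    }
  }
}
```

With the firewall's default-deny behaviour, anything not matched by an allow rule is dropped and logged, which is what makes the alerting in Layer 7 meaningful.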

Firewall diagnostic logs feed into Log Analytics with scheduled query alerts. A spike in denied connections triggers investigation.

Layer 3: Private Endpoints Everywhere

Every PaaS service is accessed exclusively through private endpoints:

| Service | Private Endpoints | Why |
| --- | --- | --- |
| Databricks | 4 (UI/API, browser auth, DFS, blob) | Eliminates public workspace and DBFS access |
| ADLS Gen2 | 2 (blob, DFS) | Data lake accessible only from VNet |
| Key Vault | 1 | Secrets never traverse public networks |
| Container Registry | 1 | Container images pulled over private network |

Each private endpoint has a corresponding Private DNS Zone linked to both hub and spoke VNets. When a service references *.blob.core.windows.net, DNS resolves to a private IP within the VNet — not a public endpoint.

Public network access is disabled on every service. Even with the correct credentials, you cannot reach these services from the public internet.
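
One endpoint plus its DNS zone can be sketched like this. Resource names are placeholders; the zone name `privatelink.blob.core.windows.net` is the real zone Azure expects for blob private endpoints:

```hcl
# Sketch: blob private endpoint for the data lake, with private DNS resolution.
resource "azurerm_private_dns_zone" "blob" {
  name                = "privatelink.blob.core.windows.net"
  resource_group_name = azurerm_resource_group.hub.name
}

# Link the zone to the spoke VNet (repeat for the hub VNet)
resource "azurerm_private_dns_zone_virtual_network_link" "blob_spoke" {
  name                  = "link-spoke"
  resource_group_name   = azurerm_resource_group.hub.name
  private_dns_zone_name = azurerm_private_dns_zone.blob.name
  virtual_network_id    = azurerm_virtual_network.spoke.id
}

resource "azurerm_private_endpoint" "datalake_blob" {
  name                = "pe-datalake-blob"
  location            = azurerm_resource_group.spoke.location
  resource_group_name = azurerm_resource_group.spoke.name
  subnet_id           = azurerm_subnet.private_endpoints.id

  private_service_connection {
    name                           = "psc-datalake-blob"
    private_connection_resource_id = azurerm_storage_account.datalake.id
    subresource_names              = ["blob"]
    is_manual_connection           = false
  }

  # Registers the endpoint's private IP in the DNS zone automatically
  private_dns_zone_group {
    name                 = "dns"
    private_dns_zone_ids = [azurerm_private_dns_zone.blob.id]
  }
}
```

The `private_dns_zone_group` block is what makes `*.blob.core.windows.net` resolve to the endpoint's private IP from inside the linked VNets.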

Layer 4: Managed Identities — No Secrets

The platform uses three user-assigned managed identities:

  1. Databricks identity — Used by the Access Connector for Unity Catalog to reach ADLS Gen2
  2. Functions identity — Used by Azure Functions to interact with Service Bus and Key Vault
  3. CI/CD identity — Used by GitHub Actions via OIDC federation

No identity has a client secret. Authentication happens through Azure's internal token service. The CI/CD identity deserves special attention: GitHub Actions exchanges an OIDC token for an Azure access token at runtime — no secrets are stored in GitHub, no credentials to rotate, no risk of secret leakage in logs.

RBAC assignments follow least privilege:

  • Databricks identity gets Storage Blob Data Contributor on the data lake — nothing else
  • Functions identity gets Key Vault Secrets User and Service Bus Data Sender/Receiver — nothing else
  • CI/CD identity gets Contributor scoped to the resource group — not the subscription
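
The CI/CD identity's OIDC federation and resource-group-scoped role can be sketched as follows. The credential `subject` shown here assumes a workflow running on the repository's `main` branch; adjust it to your trust policy:

```hcl
# Sketch: secretless CI/CD identity federated with GitHub Actions via OIDC.
resource "azurerm_user_assigned_identity" "cicd" {
  name                = "id-cicd"
  location            = azurerm_resource_group.platform.location
  resource_group_name = azurerm_resource_group.platform.name
}

# Trust tokens issued by GitHub for this repo/branch; no client secret exists
resource "azurerm_federated_identity_credential" "github" {
  name                = "github-main"
  resource_group_name = azurerm_resource_group.platform.name
  parent_id           = azurerm_user_assigned_identity.cicd.id
  audience            = ["api://AzureADTokenExchange"]
  issuer              = "https://token.actions.githubusercontent.com"
  subject             = "repo:MedGhassen/databricks-enterprise-ai-platform:ref:refs/heads/main"
}

# Least privilege: Contributor on the resource group, not the subscription
resource "azurerm_role_assignment" "cicd_rg" {
  scope                = azurerm_resource_group.platform.id
  role_definition_name = "Contributor"
  principal_id         = azurerm_user_assigned_identity.cicd.principal_id
}
```

At runtime, the GitHub workflow presents its OIDC token to Entra ID, which exchanges it for an Azure access token because the `issuer` and `subject` match this federated credential.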

Layer 5: Data Encryption

At rest: ADLS Gen2 uses Customer-Managed Keys stored in Key Vault with infrastructure-level double encryption. This means data is encrypted twice — once by the storage service and once at the infrastructure layer.

In transit: All connections use TLS. Private endpoints ensure traffic never leaves the Azure backbone. HSTS headers enforce HTTPS on all web interfaces.

Key management: Key Vault has soft-delete (90 days) and purge protection enabled. Even an administrator cannot permanently delete an encryption key without waiting through the retention period.
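
The relevant Key Vault and storage settings can be sketched like this. Names are placeholders, and the CMK wiring (key creation plus the storage account's key reference) is omitted for brevity:

```hcl
# Sketch: Key Vault with soft-delete and purge protection for CMK storage.
resource "azurerm_key_vault" "platform" {
  name                       = "kv-platform" # placeholder
  location                   = azurerm_resource_group.platform.location
  resource_group_name        = azurerm_resource_group.platform.name
  tenant_id                  = data.azurerm_client_config.current.tenant_id
  sku_name                   = "premium"
  soft_delete_retention_days = 90   # deleted keys recoverable for 90 days
  purge_protection_enabled   = true # no permanent deletion during retention
}

# Sketch: ADLS Gen2 account with infrastructure (double) encryption
resource "azurerm_storage_account" "datalake" {
  name                              = "stdatalake" # placeholder
  resource_group_name               = azurerm_resource_group.platform.name
  location                          = azurerm_resource_group.platform.location
  account_tier                      = "Standard"
  account_replication_type          = "ZRS"
  is_hns_enabled                    = true # hierarchical namespace = ADLS Gen2
  infrastructure_encryption_enabled = true # second encryption layer
  public_network_access_enabled     = false
}
```

Note that `infrastructure_encryption_enabled` can only be set at account creation time, so it belongs in the initial deployment, not a later change.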

Layer 6: Azure Policy Enforcement

The landing zone module deploys Azure Policy assignments that act as guardrails:

  • Require tags on all resources (environment, project, owner, cost-centre)
  • Restrict locations to approved Azure regions
  • Enforce HTTPS on storage accounts
  • Require diagnostics to be sent to Log Analytics

These policies prevent configuration drift. If someone creates a storage account without HTTPS, the deployment fails. If a resource is missing required tags, it is flagged for remediation.
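
A single guardrail of this kind can be sketched by assigning a built-in policy definition at resource group scope. Looking the definition up by display name avoids hard-coding its GUID:

```hcl
# Sketch: assign the built-in "secure transfer" policy to the resource group.
data "azurerm_policy_definition" "https_storage" {
  display_name = "Secure transfer to storage accounts should be enabled"
}

resource "azurerm_resource_group_policy_assignment" "https_storage" {
  name                 = "enforce-https-storage"
  resource_group_id    = azurerm_resource_group.platform.id
  policy_definition_id = data.azurerm_policy_definition.https_storage.id
}
```

The same pattern repeats for the tag, location, and diagnostics policies, each as its own assignment.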

Layer 7: Monitoring and Alerting

Security without visibility is security theatre. The platform deploys:

  • Log Analytics workspace aggregating diagnostics from all services
  • Scheduled query alerts for anomalous patterns (firewall denies, authentication failures, unexpected storage access)
  • Configurable budget alerts (default $1,000, adjustable per environment) — because runaway compute is a security incident too
  • Application Insights for Azure Functions telemetry

Firewall deny logs, Key Vault access logs, storage authentication failure logs, and Databricks job outcome logs all feed into a single pane of glass.
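
A firewall-deny alert of the kind described above might be sketched like this. The KQL assumes firewall diagnostics land in the legacy `AzureDiagnostics` table with its `msg_s` column; the threshold and schedule are placeholders to tune per environment:

```hcl
# Sketch: scheduled query alert on a spike in firewall deny events.
resource "azurerm_monitor_scheduled_query_rules_alert_v2" "fw_denies" {
  name                 = "alert-firewall-denies"
  resource_group_name  = azurerm_resource_group.hub.name
  location             = azurerm_resource_group.hub.location
  scopes               = [azurerm_log_analytics_workspace.platform.id]
  severity             = 2
  evaluation_frequency = "PT5M"  # run the query every 5 minutes
  window_duration      = "PT15M" # over a 15-minute lookback window

  criteria {
    query                   = <<-KQL
      AzureDiagnostics
      | where Category in ("AzureFirewallApplicationRule", "AzureFirewallNetworkRule")
      | where msg_s has "Deny"
    KQL
    time_aggregation_method = "Count"
    operator                = "GreaterThan"
    threshold               = 50 # placeholder: >50 denies in 15 min
  }
}
```

Wire the alert to an action group for paging; a sustained spike in denies is exactly the exfiltration signal the firewall layer is designed to surface.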

Mapping to Compliance Frameworks

This architecture directly supports:

  • NIS2 — Network segmentation, incident detection (firewall + monitoring), access control (managed identities + RBAC), supply chain security (private endpoints)
  • ISO 27001 — Asset management (tags + policy), access control (RBAC), cryptography (CMK), operations security (monitoring), communications security (private endpoints)
  • EU AI Act — Logging and traceability (Log Analytics + MLflow), cybersecurity (the entire stack), robustness (drift detection + auto-mitigation)

The WAF alignment documentation in the repository maps every resource to specific compliance controls.

Getting Started

The full implementation is available at github.com/MedGhassen/databricks-enterprise-ai-platform under the MIT license. Start with the modules/networking and modules/firewall directories to understand the network security foundation.


Questions about implementing Zero Trust for your Databricks environment? Contact us — this is what we do.


Frequently Asked Questions

Why does Databricks need VNet injection for Zero Trust?
Without VNet injection, Databricks compute nodes communicate over the public internet to the control plane. VNet injection places compute in your own subnets, allowing you to route all traffic through your firewall, enforce NSGs, and eliminate public IP addresses on worker nodes via Secure Cluster Connectivity.
How many private endpoints does the platform use?
The platform deploys private endpoints for Databricks (4 endpoints: combined UI/API, browser authentication, DFS, and blob for DBFS), ADLS Gen2 (blob and DFS), Key Vault, and Container Registry — each with a corresponding Private DNS Zone for name resolution.
Can I use this architecture with Databricks on AWS?
The Zero Trust principles apply universally, but the implementation is Azure-specific. On AWS, equivalent concepts include VPC with PrivateLink, AWS Network Firewall, and IAM roles instead of managed identities. The architectural patterns transfer; the Terraform code does not.
Does the forced-tunnel firewall add latency to Databricks jobs?
Azure Firewall adds approximately 1-2ms of latency per connection. For Databricks workloads, the overhead is negligible compared to job execution time. The security benefit of inspecting and controlling all egress traffic far outweighs the minimal performance impact.
