
Modern Data Lakehouse on Azure: Architecture Decision Guide

Compare Microsoft Fabric, Databricks, and Synapse for your data lakehouse — with cost, governance, and architecture trade-offs.

The data lakehouse has largely settled the old data warehouse vs. data lake debate. By combining the structure and governance of a warehouse with the flexibility and cost profile of a lake, lakehouses offer a pragmatic middle ground. On Azure, three platforms compete for this role, and choosing between them is one of the highest-impact architecture decisions a data leader will make this year.

This guide compares the options with the honesty that vendor documentation rarely provides.

The Core Architecture: Medallion Pattern

Regardless of platform choice, the medallion architecture has become the standard layering approach:

  • Bronze (raw) — Ingested data in its original format. Append-only, immutable. This is your system of record.
  • Silver (cleansed) — Deduplicated, schema-enforced, joined across sources. This is where most data engineering effort goes.
  • Gold (business) — Aggregated, business-aligned datasets optimized for consumption by analysts and ML models.

All three Azure platforms support this pattern. The differences lie in how they implement it, what they charge, and where they add friction.
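
To make the three layers concrete, here is a minimal, platform-neutral sketch in plain Python. The record fields (`order_id`, `region`, `amount`) and the function names are assumptions made up for illustration; on a real platform each step would typically be a Spark job writing a Delta table rather than an in-memory list.

```python
from collections import defaultdict

# Bronze: raw records exactly as ingested (duplicates and bad rows included).
bronze = [
    {"order_id": "1", "region": "EU", "amount": "100.0"},
    {"order_id": "1", "region": "EU", "amount": "100.0"},         # duplicate delivery
    {"order_id": "2", "region": "US", "amount": "250.5"},
    {"order_id": "3", "region": "EU", "amount": "not-a-number"},  # fails the type check
]


def to_silver(rows):
    """Deduplicate on order_id and enforce types; rejected rows are returned separately."""
    seen, clean, rejected = set(), [], []
    for row in rows:
        if row["order_id"] in seen:
            continue  # drop repeated deliveries of the same order
        try:
            typed = {
                "order_id": int(row["order_id"]),
                "region": row["region"],
                "amount": float(row["amount"]),
            }
        except ValueError:
            rejected.append(row)  # keep bad rows for inspection instead of losing them
            continue
        seen.add(row["order_id"])
        clean.append(typed)
    return clean, rejected


def to_gold(rows):
    """Aggregate revenue per region for consumption by analysts."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver, rejected = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Bronze is never modified here: silver and gold are derived from it, which is what allows a layer to be rebuilt when cleansing rules change.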

Platform Comparison

Microsoft Fabric

Microsoft's unified analytics platform, built on OneLake (a single-copy data layer built on ADLS Gen2).

Strengths:

  • Single pane of glass. Data engineering, data science, real-time analytics, and BI in one platform with shared governance
  • OneLake shortcuts. Reference data across workspaces without copying — genuinely useful for reducing data duplication
  • Deep Power BI integration. Direct Lake mode eliminates the need for data imports into Power BI datasets
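  • Capacity-based pricing. Predictable costs if your workloads are steady; a paused capacity stops accruing compute charges, but pausing is a manual or scripted action rather than an automatic one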

Weaknesses:

  • Maturity. Fabric is younger than Databricks. Some features (e.g., Git integration for pipelines, fine-grained RBAC) are still evolving
  • Vendor lock-in. OneLake is proprietary. While data is stored as Delta/Parquet, the orchestration and compute layers are deeply Microsoft-specific
  • Limited open-source ecosystem. Fabric notebooks support PySpark but the broader ecosystem integration (MLflow, custom libraries) is more constrained than Databricks

Best for: Organizations heavily invested in the Microsoft 365/Power BI ecosystem that want a consolidated platform and can tolerate some feature gaps.

Databricks on Azure

The original lakehouse platform, offered as a first-party Azure service with tight integration into Microsoft Entra ID (formerly Azure AD) and Azure networking.

Strengths:

  • Technical depth. Unity Catalog for governance, Delta Sharing for cross-organization data exchange, MLflow for ML lifecycle
  • Performance. The Photon engine delivers consistently strong query performance, and Databricks has posted an audited TPC-DS record (100 TB, 2021).
  • Open standards. Delta Lake, MLflow, and Apache Spark are open source. Your investment in code and skills is portable.
  • Community and ecosystem. Largest Spark community, extensive partner integrations, and comprehensive documentation

Weaknesses:

  • Cost complexity. DBU-based pricing with different rates per workload type. Costs can spiral quickly without careful cluster management and auto-termination policies.
  • Two control planes. You manage both the Azure portal and the Databricks workspace. RBAC mapping between them requires careful design.
  • Power BI integration. Functional but not seamless. Direct Lake mode depends on Fabric and OneLake, so on Databricks alone you need import or DirectQuery.

Best for: Organizations with strong data engineering teams that need maximum flexibility, best-in-class ML support, and prefer open standards.

Azure Synapse Analytics

Microsoft's pre-Fabric analytics service, still available and supported.

Our honest assessment: Synapse is effectively in maintenance mode. Microsoft's investment has shifted to Fabric. While existing Synapse deployments continue to work, we would not recommend starting new greenfield projects on Synapse in 2026.

When it still makes sense: If you have significant Synapse investments and no immediate need to migrate, continue operating. Plan a migration to Fabric on your own timeline.

Decision Framework

| Criterion | Fabric | Databricks | Synapse |
|---|---|---|---|
| Time to value | Fast (wizard-driven) | Moderate (more setup) | Moderate |
| Engineering flexibility | Medium | High | Medium |
| ML/AI support | Growing | Best-in-class | Limited |
| Governance | OneLake + Purview | Unity Catalog | Purview integration |
| Cost predictability | High (capacity SKU) | Lower (DBU-based) | Medium |
| Power BI integration | Excellent | Good | Good |
| Open standards | Partial | Strong | Partial |
| Greenfield recommendation | Yes | Yes | No |

Delta Lake: The Common Foundation

Regardless of platform, Delta Lake has won the storage format battle on Azure. Both Fabric and Databricks use it natively. Key features that matter for enterprise:

  • ACID transactions — No more corrupted datasets from failed writes
  • Time travel — Query previous versions of your data. Invaluable for debugging and audit
  • Schema enforcement and evolution — Prevent bad data from entering your silver layer while allowing controlled schema changes
  • Z-ordering and data skipping — Physical optimization that can reduce query times by 10-100x on large tables
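
Data skipping is easier to reason about with a toy model. Delta Lake keeps minimum and maximum column values per data file, and a query can skip any file whose range cannot match the filter. The sketch below is a simplified illustration of that idea in plain Python, not the Delta implementation; `FileStats` and `files_to_scan` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    path: str
    min_date: str  # ISO dates compare correctly as plain strings
    max_date: str


def files_to_scan(files, lo, hi):
    """Return files whose [min_date, max_date] range overlaps the query range [lo, hi]."""
    return [f for f in files if f.max_date >= lo and f.min_date <= hi]


# Clustered layout: each file covers a narrow date range.
clustered = [
    FileStats("part-0", "2025-01-01", "2025-01-31"),
    FileStats("part-1", "2025-02-01", "2025-02-28"),
    FileStats("part-2", "2025-03-01", "2025-03-31"),
]
# Unclustered layout: every file spans the whole quarter, so nothing can be skipped.
unclustered = [FileStats(f"part-{i}", "2025-01-01", "2025-03-31") for i in range(3)]

hits_clustered = files_to_scan(clustered, "2025-02-10", "2025-02-20")
hits_unclustered = files_to_scan(unclustered, "2025-02-10", "2025-02-20")
print(len(hits_clustered), "of", len(clustered), "files read when clustered")
print(len(hits_unclustered), "of", len(unclustered), "files read when unclustered")
```

The same query reads one file in the clustered layout and all three in the unclustered one. Z-ordering exists to push tables toward the first layout for the columns you filter on most.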

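Practical tip: Implement a VACUUM policy from day one. VACUUM deletes data files that the table no longer references and that are older than the retention threshold (7 days by default). Without it, those files accumulate and storage costs grow silently. We recommend 30-day retention for bronze and 7 days for silver and gold. Keep in mind that time travel to versions older than the retention window stops working once the files are removed.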
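
One way to keep the retention tip above consistent across tables is to generate the statements from a single mapping. This is a small sketch, assuming Delta's `VACUUM ... RETAIN n HOURS` SQL syntax; the table names are hypothetical, and scheduling is left to whichever orchestrator you already use.

```python
# Retention per medallion layer, in days, taken from the tip above.
RETENTION_DAYS = {"bronze": 30, "silver": 7, "gold": 7}


def vacuum_statements(tables):
    """Build one VACUUM statement per table from (layer, table_name) pairs."""
    statements = []
    for layer, table in tables:
        hours = RETENTION_DAYS[layer] * 24  # VACUUM expresses retention in hours
        statements.append(f"VACUUM {table} RETAIN {hours} HOURS")
    return statements


# Hypothetical table names, for illustration only.
tables = [
    ("bronze", "bronze.orders_raw"),
    ("silver", "silver.orders"),
    ("gold", "gold.revenue_by_region"),
]
statements = vacuum_statements(tables)
for sql in statements:
    # On a Spark runtime with Delta, each string could be passed to spark.sql(...).
    print(sql)
```

Seven days (168 hours) matches Delta's default threshold, so the silver and gold statements need no safety-check override.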

Governance: The Differentiator

Data governance is where platform choice has the deepest long-term impact:

Microsoft Purview + Fabric

  • Automatic metadata scanning across OneLake
  • Classification and sensitivity labeling inherited from Microsoft 365
  • Lineage tracking across Fabric items
  • Gap: Cross-platform lineage (e.g., data flowing from Fabric to non-Microsoft tools) is limited

Databricks Unity Catalog

  • Fine-grained access control at table, column, and row level
  • Attribute-based access control for dynamic data masking
  • Cross-workspace governance with a single metastore
  • Delta Sharing for governed data exchange with external partners
  • Gap: Requires Databricks Premium tier; adds significant cost
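
To show what row-level and column-level control mean in practice, here is a conceptual sketch in plain Python. It is not the Unity Catalog or Purview API; `apply_policy`, the `can_see_pii` flag, and the sample data are all assumptions for illustration. On either platform the equivalent rules are declared in the catalog and enforced by the engine, not in application code.

```python
def apply_policy(rows, user, masked_columns, row_filter):
    """Return only rows the user may see, masking sensitive columns for non-privileged users."""
    visible = []
    for row in rows:
        if not row_filter(row, user):
            continue  # row-level security: the user never sees this row
        out = dict(row)  # copy so the source data is left untouched
        if not user["can_see_pii"]:
            for col in masked_columns:
                if col in out:
                    out[col] = "***"  # column-level masking
        visible.append(out)
    return visible


customers = [
    {"id": 1, "region": "EU", "email": "a@example.com"},
    {"id": 2, "region": "US", "email": "b@example.com"},
]
analyst = {"name": "eu_analyst", "region": "EU", "can_see_pii": False}

result = apply_policy(
    customers,
    analyst,
    masked_columns=["email"],
    row_filter=lambda row, user: row["region"] == user["region"],
)
print(result)
```

The masking decision is driven by an attribute of the user rather than by a per-table grant, which is the essence of attribute-based access control.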

Cost Comparison: A Realistic Scenario

For a mid-sized data platform (10 TB raw data, 50 concurrent users, 8 hours active daily):

| Component | Fabric (F64 capacity) | Databricks + ADLS |
|---|---|---|
| Compute | ~$6,000/month | ~$8,000-12,000/month |
| Storage | Billed separately per GB (OneLake) | ~$200/month (ADLS) |
| BI layer | Included (Power BI) | Power BI Pro licenses separate |
| Governance | Purview (separate) | Unity Catalog (Premium tier) |
| Estimated total | $7,000-8,000/month | $10,000-15,000/month |

Caveat: These are rough estimates. Actual costs depend heavily on workload patterns, cluster sizing, and auto-scaling configuration. We strongly recommend running a 4-week proof of concept with realistic workloads before committing.
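
Before a proof of concept, it helps to have the arithmetic in a form you can rerun with your own quotes. The sketch below models the two pricing shapes for the scenario above (8 active hours per day). Every rate in it is a placeholder chosen for round numbers, not a real list price, and the function names are hypothetical; substitute current prices for your region and SKU.

```python
def fabric_monthly(capacity_rate_per_hour, active_hours_per_day, days=30):
    """Pay-as-you-go capacity cost when the capacity is paused outside active hours."""
    return capacity_rate_per_hour * active_hours_per_day * days


def databricks_monthly(nodes, dbu_per_node_hour, price_per_dbu, vm_price_per_hour,
                       active_hours_per_day, days=30):
    """DBU charge plus the underlying VM charge for a fixed-size cluster."""
    per_node_hour = dbu_per_node_hour * price_per_dbu + vm_price_per_hour
    return nodes * per_node_hour * active_hours_per_day * days


# Placeholder rates chosen for round arithmetic, NOT real list prices.
fabric = fabric_monthly(capacity_rate_per_hour=10.0, active_hours_per_day=8)
databricks = databricks_monthly(
    nodes=8, dbu_per_node_hour=2.0, price_per_dbu=0.5,
    vm_price_per_hour=1.0, active_hours_per_day=8,
)
print(f"Fabric: {fabric:.0f}  Databricks: {databricks:.0f}")
```

The structural difference is visible in the signatures: the Fabric estimate has one rate to look up, while the Databricks estimate multiplies four inputs that each vary by workload type and cluster configuration. That is why its cost is harder to predict.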

Our Recommendations

  1. If you are a Microsoft-first shop with strong Power BI usage: Start with Fabric. The integration advantages are real, and the platform is maturing rapidly.
  2. If you have complex ML/AI workloads or multi-cloud requirements: Choose Databricks. The engineering flexibility and open standards commitment are worth the additional cost.
  3. If you are on Synapse today: Begin planning a migration to Fabric. The migration tooling is improving, and the longer you wait, the larger the gap between platforms.
  4. Regardless of platform: Invest in governance from day one. Retrofitting data classification and access control is far harder than building it in.
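
The first three recommendations can be read as a small rule set. The sketch below encodes them in Python as a thinking aid; the function and parameter names are hypothetical, and the precedence (ML or multi-cloud needs are checked first) is an assumption of this sketch, since the list above does not rank the rules.

```python
def recommend(microsoft_first, heavy_power_bi, complex_ml, multi_cloud, on_synapse):
    """Map the recommendations above to a platform name, or 'undecided'."""
    if complex_ml or multi_cloud:
        return "Databricks"  # recommendation 2
    if on_synapse or (microsoft_first and heavy_power_bi):
        return "Fabric"      # recommendations 1 and 3
    return "undecided"       # no rule fires: run a proof of concept


print(recommend(microsoft_first=True, heavy_power_bi=True,
                complex_ml=False, multi_cloud=False, on_synapse=False))
```

Recommendation 4 is deliberately absent from the function because it applies whatever the result.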

Getting Started

Before choosing a platform, answer these questions:

  • What percentage of your analytics consumers use Power BI vs. notebooks vs. SQL?
  • Do you have existing Spark/Python skills on the team?
  • Is multi-cloud a current requirement or a theoretical future need?
  • What is your data residency and sovereignty posture?

The answers will point you clearly toward one platform. If they do not, talk to us — platform selection is one of the most common engagements we run, and we can typically help a team decide in a one-week assessment.
