Medallion Architecture in Microsoft Fabric, Done Right
How to design a bronze, silver, gold medallion architecture in Microsoft Fabric and OneLake — layering, governance, and 2026 AI-ready patterns that scale.
Most enterprise data platforms do not fail because the technology is wrong. They fail because nobody can say, with confidence, where a number on a dashboard came from, whether it can be trusted, and what would break if it changed. The medallion architecture — bronze, silver, gold — exists to answer exactly those questions, and Microsoft Fabric makes it the default way to build a lakehouse. The pattern is simple to draw on a whiteboard and surprisingly easy to get wrong in practice.
This is the architecture we use at CC Conceptualise when we build Fabric data platforms for European enterprises, and it is the foundation everything else — Data Agents, real-time AI, governed analytics — depends on. Below is how we design the layers, where teams go wrong, and what changes in the 2026 Fabric estate.
TL;DR / Key takeaways
- Bronze is an immutable landing zone, not an analytics layer. Land raw data faithfully, never query it for business reporting, and keep it reprocessable.
- Silver is where data becomes trustworthy. Clean, deduplicate, conform schemas, apply data quality gates, and mask sensitive fields here.
- Gold is business-ready and modelled for consumption. Aggregates, star schemas, and semantic models live here — and this is what your AI and Copilot ground on.
- OneLake and open Delta-Parquet make the layers portable. Shortcuts and open formats let you bring in Azure Databricks and Azure ML assets without copies.
- The medallion pattern is a governance tool. Each layer transition is where you apply masking, lineage, and access control — critical for GDPR and auditability.
What the medallion architecture actually is
The medallion architecture organises a lakehouse into progressively refined layers. Data flows in one direction, from raw to refined, and each layer has a distinct purpose and a distinct audience.
| Layer | Purpose | Typical state | Primary audience |
|---|---|---|---|
| Bronze | Faithful landing of source data | Raw, append-only, source-aligned | Data engineers only |
| Silver | Cleaned, conformed, deduplicated | Validated Delta tables, enterprise schema | Data engineers, advanced analysts |
| Gold | Business-ready, aggregated, modelled | Star schemas, aggregates, semantic models | Analysts, BI, AI agents, business users |
The discipline the pattern enforces is more valuable than the three-box diagram. Because data only ever moves forward and bronze is never mutated, you can always reprocess silver and gold from source. Because each layer has a clear contract, a change in a source system surfaces as a bronze ingestion change rather than a mysterious shift in a board report. And because the audiences differ, you can grant most of the organisation access only to gold.
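The audience split in the table can be made explicit as an access map. What follows is a minimal Python sketch, not Fabric's permission model: the role names, the `LAYER_READERS` map, and both helper functions are hypothetical, and we assume data engineers keep read access to every layer. In a real estate these grants would live in workspace roles and OneLake access policies.

```python
from enum import Enum


class Layer(Enum):
    BRONZE = 1
    SILVER = 2
    GOLD = 3


# Hypothetical role names, loosely following the audience column above.
# Assumption: data engineers can read every layer.
LAYER_READERS = {
    Layer.BRONZE: {"data_engineer"},
    Layer.SILVER: {"data_engineer", "advanced_analyst"},
    Layer.GOLD: {
        "data_engineer",
        "advanced_analyst",
        "analyst",
        "ai_agent",
        "business_user",
    },
}


def can_read(role: str, layer: Layer) -> bool:
    """Return True if the role is allowed to read the given layer."""
    return role in LAYER_READERS[layer]


def is_forward_move(source: Layer, target: Layer) -> bool:
    """Data may only be promoted towards a more refined layer."""
    return target.value > source.value
```

Keeping the map small and explicit is the point: most roles appear only under gold, and any proposed data movement that fails `is_forward_move` is a design smell worth questioning.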
Bronze: land it, do not touch it
Bronze is the raw, faithful copy of source data. The cardinal rule is that bronze is immutable and append-only — you capture what the source sent, including the warts, and you do not clean it here. In Fabric you can land files in their native format (JSON, CSV, Parquet) in a lakehouse Files area, or write append-only Delta tables. Add ingestion metadata — load timestamp, source system, batch identifier — so you can trace and reprocess later.
The most common bronze mistake is treating it as a place analysts query directly. It is not. Bronze exists so that if your silver logic was wrong, you can rebuild from a faithful record without going back to the source system, which may no longer hold the history.
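One way to picture the ingestion metadata is as a thin envelope around an untouched payload. The sketch below is plain Python under stated assumptions: the underscore-prefixed column names and the `to_bronze_records` helper are illustrative, and a real Fabric pipeline would write these rows to a lakehouse Files area or an append-only Delta table rather than return a list.

```python
import uuid
from datetime import datetime, timezone


def to_bronze_records(raw_lines, source_system, batch_id=None, loaded_at=None):
    """Wrap each raw payload with ingestion metadata without altering it."""
    batch_id = batch_id or str(uuid.uuid4())
    loaded_at = loaded_at or datetime.now(timezone.utc).isoformat()
    return [
        {
            "_raw": line,  # payload exactly as the source sent it, warts included
            "_source_system": source_system,
            "_batch_id": batch_id,
            "_loaded_at": loaded_at,
        }
        for line in raw_lines
    ]
```

Note that the helper does not parse or validate the payload. A malformed line is still landed, because deciding what counts as malformed is silver's job.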
Silver: where data becomes trustworthy
Silver is the layer that earns trust. Here you clean, deduplicate, validate, conform data types, resolve keys, and apply enterprise-wide schemas so that "customer" means the same thing across every source. This is where data quality gates belong: rows that fail validation are quarantined, not silently dropped, and the failure is observable.
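A data quality gate with quarantine can be sketched in a few lines. This is an illustrative Python example, not a Fabric API: the `customer_id`, `email`, and `updated_at` fields, the validation rules, and the `to_silver` helper are all assumptions, and `updated_at` is assumed to be an ISO 8601 string so that string comparison orders rows by time.

```python
from typing import Any

REQUIRED_FIELDS = ("customer_id", "email", "updated_at")  # hypothetical schema


def validate(row: dict[str, Any]) -> list[str]:
    """Return a list of validation errors; an empty list means the row passes."""
    errors = []
    for field in REQUIRED_FIELDS:
        if row.get(field) in (None, ""):
            errors.append(f"missing {field}")
    email = row.get("email")
    if email and "@" not in email:
        errors.append("malformed email")
    return errors


def to_silver(rows):
    """Split rows into deduplicated valid rows and quarantined rows."""
    valid, quarantined = {}, []
    for row in rows:
        errors = validate(row)
        if errors:
            # Quarantine with the reasons attached, so the failure is observable.
            quarantined.append({**row, "_errors": errors})
            continue
        key = row["customer_id"]
        # Deduplicate: keep the most recent version of each customer.
        if key not in valid or row["updated_at"] > valid[key]["updated_at"]:
            valid[key] = row
    return list(valid.values()), quarantined
```

Returning the quarantined rows alongside the valid ones, rather than logging and discarding them, is what makes the gate auditable: the counts on both sides always add up to what bronze delivered, minus duplicates.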
Silver is also the correct place to apply privacy controls. For European clients, moving from bronze to silver is the natural point to pseudonymise or mask personal data, so that the broadly accessible downstream layers never carry raw identifiers. Silver should hold clean, atomic, conformed data — not yet shaped for a specific report.
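As an illustration of pseudonymising on the way into silver, the sketch below replaces identifiers with a keyed hash using Python's standard `hmac` and `hashlib` modules. The `pseudonymise` and `mask_row` helpers and the choice of HMAC-SHA256 are our assumptions for the example, not a Fabric feature; the secret key is assumed to come from a secret store and must never be stored next to the data.

```python
import hashlib
import hmac


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a stable keyed hash (HMAC-SHA256)."""
    normalised = value.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()


def mask_row(row: dict, pii_fields: set, secret_key: bytes) -> dict:
    """Return a copy of the row with the listed PII fields pseudonymised."""
    return {
        key: pseudonymise(value, secret_key)
        if key in pii_fields and value is not None
        else value
        for key, value in row.items()
    }
```

Because the hash is deterministic for a given key, the same customer still joins across tables in silver and gold. Keep in mind that pseudonymised data remains personal data under GDPR for as long as the key allows re-identification, so access to the key needs the same care as access to bronze.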
Gold: business-ready and modelled for consumption
Gold is what the business actually uses. Here you build aggregates, star schemas, and the dimensional models that power Power BI and feed AI. Gold tables are shaped for consumption: a finance gold model looks different from a supply-chain one because each serves a specific analytical purpose.
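To make "shaped for consumption" concrete, here is a small Python sketch that rolls atomic silver order rows up into a monthly revenue table. The `order_date`, `country`, and `amount` fields and the `build_gold_revenue_by_month` helper are hypothetical, and a real gold model would usually be a full star schema rather than a single aggregate.

```python
from collections import defaultdict


def build_gold_revenue_by_month(orders):
    """Aggregate atomic order rows into revenue per month and country."""
    totals = defaultdict(float)
    for order in orders:
        month = order["order_date"][:7]  # "YYYY-MM" from an ISO 8601 date
        totals[(month, order["country"])] += order["amount"]
    return [
        {"month": month, "country": country, "revenue": round(total, 2)}
        for (month, country), total in sorted(totals.items())
    ]
```

The grain here, one row per month and country, is a consumption decision. A supply-chain gold table built from the same silver orders might aggregate by warehouse and week instead.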
Crucially, in 2026 gold is also what your AI grounds on. Fabric Data Agents, Copilot, and natural-language querying produce far better answers against a curated, well-modelled gold layer than against raw data. The Power BI Copilot Tooling Format, which reached general availability in May 2026, makes the semantic models built on gold Git-friendly and text-based — so the business logic that sits on top of gold becomes reviewable and version-controlled like code.
Implementing the layers in Fabric and OneLake
Fabric materialises the medallion architecture on OneLake, the single tenant-wide data lake. Tables are stored in open Delta-Parquet, which is why other engines can read Fabric data without copies, and shortcuts let you reference data in place rather than duplicating it.
A pragmatic implementation sequence:
- Decide your boundary model. Use one workspace per layer when you need clean least-privilege access, capacity isolation, and separate deployment pipelines — typical for regulated enterprises. Use a single workspace with three lakehouses for smaller estates or proofs of concept.
- Stand up bronze ingestion. Use Data Factory pipelines, Dataflows Gen2, or shortcuts to land source data append-only, capturing ingestion metadata. Never transform here.
- Build silver transformations. Use Spark notebooks or Dataflows Gen2 to clean, deduplicate, conform, and validate into Delta tables, with explicit data quality gates and quarantine handling.
- Model gold for consumption. Build aggregates and star schemas, then layer Direct Lake semantic models on top so Power BI reads gold without import or refresh lag.
- Wire governance across all three. Apply sensitivity labels, lineage, and access policies in OneLake from the start, restricting most users to gold.
- Add AI on the curated layer. Point Data Agents, Copilot grounding, and Eventhouse-based real-time querying at gold and silver — never at raw bronze.
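The ordering of these steps matters mostly at run time: each layer is only as good as the one before it. The sketch below shows that rule in plain Python, assuming each layer step reports success as a boolean. The `run_in_order` helper and the step names are hypothetical; in Fabric the orchestration itself would more likely be a Data Factory pipeline, for example.

```python
from typing import Callable

Step = tuple[str, Callable[[], bool]]


def run_in_order(steps: list[Step]) -> list[str]:
    """Run layer steps in order and stop at the first failure.

    Stopping early means no downstream layer is rebuilt on top of a
    failed upstream one. Returns the names of the steps that completed.
    """
    completed = []
    for name, step in steps:
        if not step():
            break
        completed.append(name)
    return completed
```

If the silver step fails its quality gate, gold is simply not refreshed, and consumers keep reading the last trusted version instead of a half-built one.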
Where real-time and AI fit
Not all data is batch. For event and telemetry workloads, Fabric's Eventhouse sits alongside the medallion layers rather than replacing them, as does the remote MCP (Model Context Protocol) that lets agents query real-time data with natural language and KQL. A common pattern is to treat the real-time path as its own bronze-equivalent landing for events, then promote conformed, aggregated state into silver and gold so that batch and streaming reconcile into one trusted gold layer.
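Promoting "aggregated state" from an event stream can be pictured as collapsing raw events into fixed time windows before they join the batch layers. The following Python sketch assumes events carry an ISO 8601 `ts`, a `device_id`, and a numeric `value`; those field names and the `hourly_state` helper are illustrative, and in Fabric this aggregation would more likely be expressed in KQL or Spark.

```python
from collections import defaultdict
from datetime import datetime


def hourly_state(events):
    """Collapse raw events into one row per device and hour."""
    buckets = defaultdict(list)
    for event in events:
        ts = datetime.fromisoformat(event["ts"])
        # Truncate the timestamp to the start of its hour (tumbling window).
        hour = ts.replace(minute=0, second=0, microsecond=0).isoformat()
        buckets[(event["device_id"], hour)].append(event["value"])
    return [
        {
            "device_id": device_id,
            "hour": hour,
            "events": len(values),
            "avg_value": sum(values) / len(values),
        }
        for (device_id, hour), values in sorted(buckets.items())
    ]
```

The raw events stay in the real-time landing, just as raw files stay in bronze. Only the windowed rows move on, which keeps the gold layer at a grain that batch and streaming sources can share.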
On the machine-learning side, cross-workspace MLflow logging in 2026 enables end-to-end MLOps and lets you bring assets from Azure Databricks and Azure Machine Learning into Fabric. The medallion layers give those models a stable, governed feature source: train and serve from silver and gold, not from raw bronze that can change shape without warning.
Decision guide: how much structure do you actually need?
The medallion pattern is a means to traceability and trust, not a badge to wear. Match the rigour to the stakes.
| Situation | Recommended structure |
|---|---|
| Single small dataset, one team, low risk | One curated lakehouse may be enough; do not over-engineer |
| Multiple sources, shared analytics | Full bronze/silver/gold in one workspace, three lakehouses |
| Regulated data, multiple teams, audit needs | Workspace-per-layer with enforced least-privilege and pipelines |
| Real-time plus batch | Medallion layers plus an Eventhouse real-time path reconciled into gold |
| AI agents and Copilot on the data | Invest heavily in a clean, well-modelled gold layer |
Common mistakes we see
In our delivery work, the same anti-patterns recur. The most frequent is querying bronze directly: once analysts build reports on raw data, you can no longer reprocess or restructure bronze without breaking those reports, and the line between raw and trusted blurs. Close behind is logic in the wrong layer: when business rules creep into silver, gold has nothing distinct to do and the separation collapses. Teams also routinely skip data quality gates, so bad data flows silently to gold, and defer governance until an audit forces a painful retrofit. Finally, there is the opposite failure: over-engineering for tiny datasets wastes effort on ceremony that adds no traceability value.
Why this matters for European enterprises
For organisations subject to GDPR and broader EU data governance, the medallion architecture is not just good engineering hygiene — it is a compliance asset. The bronze-to-silver transition is where pseudonymisation and masking belong; lineage captured across the layers supports accountability; and restricting most users to gold operationalises data minimisation and purpose limitation. OneLake's tenant-wide governance, sensitivity labels, and Purview integration let you enforce these consistently rather than per-pipeline. When a regulator or auditor asks where a figure came from, a disciplined medallion estate lets you answer in minutes rather than weeks.
Getting it right
A medallion architecture done right is quiet. Numbers reconcile, lineage is obvious, reprocessing is routine, and AI grounds on data the business already trusts. Done wrong, it becomes three folders with the right names and none of the discipline. The difference is entirely in the contracts between layers and the governance applied at each transition.
If you are designing or remediating a Fabric data platform and want a partner who has delivered governed medallion estates for European enterprises, our AI and Data Platform Engineering team is happy to help — architecture review, build, or a second opinion on the layering you already have.
FAQ
What is the medallion architecture in Microsoft Fabric?
The medallion architecture is a data design pattern that organises a lakehouse into three progressive layers: bronze for raw ingested data, silver for cleaned and conformed data, and gold for business-ready, aggregated data. In Microsoft Fabric these layers live on OneLake, the single tenant-wide data lake, and are typically realised as separate lakehouses or schemas. The pattern gives you traceability, reprocessability, and a clear contract between data engineering and analytics teams.
Should bronze, silver, and gold be separate workspaces or separate lakehouses?
There is no single right answer. Two patterns are common and defensible: one workspace per layer, for clear security and lifecycle boundaries, or a single workspace with three lakehouses, for smaller estates. Separate workspaces make least-privilege access, capacity isolation, and deployment pipelines cleaner, which matters once auditors and multiple teams are involved. For a small team or a proof of concept, three lakehouses in one workspace keeps things simple without losing the layering.
Does Fabric force me to use Delta and Parquet for the medallion layers?
Fabric standardises on the Delta Lake format, which is built on Parquet files, for lakehouse tables, and OneLake stores that tabular data in open Delta-Parquet so other engines can read it without copies. You can still land truly raw files such as JSON or CSV in bronze, but you convert to Delta tables as you move into silver and gold. This open format is what enables shortcuts, cross-engine access, and bringing assets from Azure Databricks or Azure Machine Learning into Fabric without duplication.
How does the medallion architecture support AI and agents in Fabric in 2026?
A clean gold layer is what makes AI trustworthy. Fabric Data Agents and Copilot grounding work far better against curated, well-modelled gold tables than against raw bronze. In 2026 capabilities such as the Eventhouse remote MCP let agents query real-time data with natural language and KQL, and cross-workspace MLflow logging supports end-to-end MLOps — but all of this assumes the underlying data has been conformed and governed through the medallion layers first.
What are the most common mistakes when implementing a Fabric medallion architecture?
The recurring mistakes are treating bronze as a queryable analytics layer instead of an immutable landing zone, baking business logic into silver so gold has nothing distinct to do, skipping data quality gates between layers, and ignoring governance until late. Another frequent error is over-engineering the layering for tiny datasets where a single curated lakehouse would suffice. The medallion pattern is a means to traceability and trust, not a goal in itself.
How does medallion architecture relate to EU data governance such as GDPR?
The layering directly supports compliance because each transition is an opportunity to apply controls: pseudonymisation or masking moving from bronze to silver, lineage capture for accountability, and access restriction so most users only ever see gold. OneLake's tenant-wide governance, sensitivity labels, and Purview integration let you enforce these controls consistently. For European enterprises subject to GDPR, the clear separation between raw personal data and business-ready aggregates makes data-minimisation and purpose-limitation obligations far easier to evidence.