End-to-End Fabric MLOps with Cross-Workspace MLflow

Most teams can train a model in a Fabric notebook within an afternoon. Getting that model through a governed lifecycle — versioned, reproducible, promoted across environments, and traceable back to the exact data it learned from — is where the real work begins. That gap is what Fabric MLOps closes, and in 2026 the key enabler is cross-workspace MLflow.

This post is a practitioner's walkthrough of how we build end-to-end machine learning pipelines in Fabric that satisfy enterprise governance without burying data scientists in process.

TL;DR / Key takeaways

Cross-workspace MLflow decouples where you train from where you govern, giving you clean dev → staging → production boundaries inside Microsoft Fabric.
Fabric's model registry, backed by OneLake, keeps models, lineage, and training data in one place, which is decisive for EU audit and reproducibility requirements.
You can bring Azure Databricks and Azure ML assets into Fabric via MLflow logging interop, standardising governance without re-training.
Training data must come from the gold (or curated silver) layer of a medallion architecture so every prediction traces back to validated inputs.
With the Power BI Copilot Tooling Format now GA, the consumption layer (semantic models, reports) moves through CI/CD alongside your model code.

Why cross-workspace MLflow changes the picture

For most of Fabric's history, MLflow tracking was effectively workspace-bound: you logged runs and registered models in the same workspace where the notebook executed. That is fine for a proof of concept and actively harmful for production. It means your experimentation sandbox and your production model registry share the same blast radius, the same access list, and the same capacity.

Cross-workspace MLflow breaks that coupling. A run logged in a data science workspace can be referenced, registered, and promoted from a separate governance workspace. The practical consequence is that Fabric MLflow can now express the same separation of duties that mature teams take for granted on dedicated ML platforms — but without leaving the OneLake estate where your analytics, Power BI, and Data Agents already live.

The other half of the story is interop. Fabric supports MLflow logging that brings assets from Azure Databricks and Azure ML into Fabric. If your data engineering team runs heavy Spark training on Databricks, you no longer have to choose between their tooling and Fabric's governance — you train where it is cheapest and most capable, then register and serve where it is most governable.

Reference architecture: three workspaces, one registry discipline

The architecture we deploy for clients maps environments to workspaces and MLflow registry stages to promotion gates.

Workspace	Purpose	MLflow role	Who has write access
`ml-dev`	Experimentation, feature engineering, training	Logs runs and experiments freely	Data scientists
`ml-staging`	Validation, integration tests, shadow scoring	Holds registered candidate versions	ML engineers + CI
`ml-prod`	Serving, batch scoring, monitoring	Holds the production alias only	Deployment pipeline only

The discipline is simple: humans log; pipelines promote. A data scientist never registers a model directly into production. Instead, a run in ml-dev is logged with its signature and metrics, a pipeline registers the best candidate into ml-staging, automated gates validate it, and only then does a promotion step move the production alias in ml-prod. Cross-workspace MLflow is what turns this handoff into a clean registry reference rather than a set of brittle copy scripts.
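The promotion discipline can be sketched in a few lines of plain Python. This is a toy, in-memory stand-in, not the Fabric or MLflow registry API: the `ModelRegistry` class, the `"deployment-pipeline"` actor name, the model name, and the `"production"` alias are all hypothetical. It shows the two properties the paragraph relies on: only the pipeline identity may move an alias, and rollback is nothing more than re-pointing the alias at the previous version.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PIPELINE_ACTOR = "deployment-pipeline"  # hypothetical identity of the CI/CD pipeline


@dataclass
class ModelRegistry:
    """Toy registry: versions are immutable, aliases are movable pointers."""

    name: str
    versions: dict[int, str] = field(default_factory=dict)  # version -> source run id
    aliases: dict[str, int] = field(default_factory=dict)   # alias -> version
    history: list[tuple[str, int | None, int]] = field(default_factory=list)

    def register(self, run_id: str) -> int:
        """Register a logged run as a new, numbered version."""
        version = len(self.versions) + 1
        self.versions[version] = run_id
        return version

    def set_alias(self, alias: str, version: int, actor: str) -> None:
        """Move an alias; humans log, only the pipeline promotes."""
        if actor != PIPELINE_ACTOR:
            raise PermissionError(f"{actor!r} may not move alias {alias!r}")
        if version not in self.versions:
            raise KeyError(f"unknown version {version}")
        # Record (alias, previous version, new version) so rollback is possible.
        self.history.append((alias, self.aliases.get(alias), version))
        self.aliases[alias] = version

    def rollback(self, alias: str, actor: str) -> int:
        """Point the alias back at the version it held before its last move."""
        moves = [move for move in self.history if move[0] == alias]
        if not moves or moves[-1][1] is None:
            raise LookupError(f"no earlier version recorded for {alias!r}")
        previous = moves[-1][1]
        self.set_alias(alias, previous, actor)
        return previous


registry = ModelRegistry("churn-classifier")
v1 = registry.register("run-a")
v2 = registry.register("run-b")
registry.set_alias("production", v1, actor=PIPELINE_ACTOR)
registry.set_alias("production", v2, actor=PIPELINE_ACTOR)
registry.rollback("production", actor=PIPELINE_ACTOR)  # "production" points at v1 again
```

In MLflow 2.x, `MlflowClient.set_registered_model_alias` plays a comparable role to `set_alias`; how that client is pointed at another Fabric workspace's registry is outside the scope of this sketch.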

Where the training data comes from

A model is only as trustworthy as its inputs. Training data must come from the gold layer — or a curated silver layer — of a medallion architecture in OneLake, never from raw bronze. This gives you three things at once: reproducible feature inputs, validated quality, and a lineage chain that runs from source system through bronze/silver/gold to the MLflow run that consumed it. When an auditor asks "what data trained the model that made this decision," the answer is a query, not an archaeology project.
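Answering the auditor's question becomes mechanical once each asset records what it was derived from. A minimal sketch, assuming every asset stores its upstream asset and a pinned version; the `Asset` type, the layer labels, and all table, run, and version names below are hypothetical illustrations, not a Fabric or OneLake API. The same walk doubles as a completeness check, because it fails on any missing or unversioned link.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    layer: str           # "source", "bronze", "silver", "gold" or "run"
    name: str
    version: str | None  # None means the asset was never pinned to a version


def trace(asset: Asset, upstream: dict[Asset, Asset]) -> list[Asset]:
    """Walk from a training run back to the source system, failing on any gap."""
    chain = [asset]
    while True:
        if asset.version is None:
            raise ValueError(f"{asset.layer}/{asset.name} is unversioned")
        if asset.layer == "source":
            return chain
        parent = upstream.get(asset)
        if parent is None:
            raise ValueError(f"no upstream link recorded for {asset.layer}/{asset.name}")
        if parent in chain:
            raise ValueError(f"lineage cycle at {parent.layer}/{parent.name}")
        chain.append(parent)
        asset = parent


# Hypothetical assets; in practice these would come from table versions and run metadata.
source = Asset("source", "crm.customers", "extract-0331")
bronze = Asset("bronze", "customers_raw", "v41")
silver = Asset("silver", "customers_clean", "v17")
gold = Asset("gold", "churn_features", "v9")
run = Asset("run", "churn-training-run", "7f3a")
lineage = {run: gold, gold: silver, silver: bronze, bronze: source}

for step in trace(run, lineage):
    print(f"{step.layer:>6}  {step.name}@{step.version}")
```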

Building the pipeline step by step

Here is the sequence we follow on a typical engagement.

Establish workspaces and capacity. Create ml-dev, ml-staging, and ml-prod. Assign capacity deliberately — production scoring should not contend with experimentation.
Point training at gold. Read features from the gold layer via OneLake shortcuts. Pin the data version so the run is reproducible.
Log everything with MLflow. Capture parameters, metrics, the model signature, and input examples. The signature is not optional — it is what lets staging validate inputs automatically.
Register the candidate cross-workspace. From the dev run, register the chosen version into the ml-staging registry. This is the cross-workspace step that older Fabric could not do cleanly.
Run automated gates. In staging, validate the signature, compare metrics against the incumbent, run bias and drift checks, and score a holdout. Fail the build on regression.
Promote by alias, not by copy. Move the production alias in ml-prod to the validated version. Rollback is then a one-line alias change.
Wire the consumption layer to CI/CD. Export Power BI semantic models in the Copilot Tooling Format so reports surfacing the predictions are versioned and deployed alongside the model.
Close the loop with monitoring. Log production inputs and outputs back into OneLake, and watch for data drift against the gold-layer training distribution.
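The steps above leave the drift metric open. One common choice is the population stability index (PSI), which compares how a feature's values fall into bins in training data versus production data. A minimal, standard-library sketch for a single numeric feature, assuming bins at training-set deciles and a 0.2 alert threshold (a frequently quoted rule of thumb, not a Fabric default). In a real pipeline the two lists would be read from the gold-layer training snapshot and the production log in OneLake; here they are synthetic.

```python
from __future__ import annotations

import math
from bisect import bisect_right


def quantile_edges(training: list[float], bins: int) -> list[float]:
    """Interior bin edges placed at training-set quantiles."""
    ordered = sorted(training)
    return [ordered[len(ordered) * k // bins] for k in range(1, bins)]


def proportions(values: list[float], edges: list[float]) -> list[float]:
    """Share of values falling into each bin defined by the edges."""
    counts = [0] * (len(edges) + 1)
    for value in values:
        counts[bisect_right(edges, value)] += 1
    return [count / len(values) for count in counts]


def psi(
    training: list[float],
    production: list[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Population stability index of production data against the training distribution."""
    edges = quantile_edges(training, bins)
    expected = proportions(training, edges)
    actual = proportions(production, edges)
    # eps keeps the log finite when a bin is empty on one side.
    return sum((a - e) * math.log((a + eps) / (e + eps)) for e, a in zip(expected, actual))


DRIFT_THRESHOLD = 0.2  # assumed policy value; tune per feature

training = [i / 100 for i in range(1000)]
production = [x + 3 for x in training]  # simulated shift in a live feature
score = psi(training, production)
print(f"PSI={score:.3f}", "drift" if score > DRIFT_THRESHOLD else "stable")
```

Binning on training quantiles means every training bin holds roughly the same share, so a shift shows up as production mass piling into a few bins.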

In a recent post-merger data consolidation, we used exactly this pattern to bring two acquired teams — one on Azure Databricks, one already in Fabric — onto a single governed registry without forcing either to abandon their training stack. The Databricks team kept their Spark notebooks; their models flowed into the Fabric ml-staging registry via MLflow logging, and from there everyone shared one promotion process.

Promotion gates that actually hold

A promotion gate is only useful if it can say no. The gates we consider mandatory for an enterprise Fabric MLOps pipeline:

Gate	What it checks	Blocks promotion when
Signature match	Model input/output schema	Schema differs from the registered contract
Metric regression	Primary metric vs. incumbent	New version underperforms beyond tolerance
Data drift	Training vs. recent production distribution	Drift exceeds threshold
Fairness / bias	Subgroup performance	Disparity exceeds policy limit
Lineage completeness	Source → gold → run chain	Any link is missing or unversioned
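The table translates into a small, pure function that CI can call before any alias moves. A sketch under stated assumptions: the `Candidate` fields, the default thresholds, "higher is better" for the primary metric, and the max-minus-min subgroup gap as the fairness measure are all illustrative choices, not Fabric features and not the only valid definitions. The inputs would be gathered from the candidate's MLflow run, the incumbent's metrics, and separate drift and lineage checks.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    signature: dict[str, str]           # input column -> type, as logged with the run
    metric: float                       # primary metric; higher is assumed to be better
    drift_psi: float                    # drift score from a check such as PSI
    subgroup_metric: dict[str, float]   # primary metric per subgroup of interest
    lineage_versions: list[str | None]  # versions along source -> gold -> run


def evaluate_gates(
    candidate: Candidate,
    contract: dict[str, str],
    incumbent_metric: float,
    tolerance: float = 0.01,       # hypothetical policy values
    drift_limit: float = 0.2,
    disparity_limit: float = 0.05,
) -> list[str]:
    """Return the reasons to block promotion; an empty list means all gates pass."""
    failures = []
    if candidate.signature != contract:
        failures.append("signature: schema differs from the registered contract")
    if candidate.metric < incumbent_metric - tolerance:
        failures.append("metric: regression versus incumbent beyond tolerance")
    if candidate.drift_psi > drift_limit:
        failures.append("drift: exceeds threshold")
    if candidate.subgroup_metric:
        scores = candidate.subgroup_metric.values()
        if max(scores) - min(scores) > disparity_limit:
            failures.append("fairness: subgroup disparity exceeds policy limit")
    if not candidate.lineage_versions or None in candidate.lineage_versions:
        failures.append("lineage: chain is missing or has an unversioned link")
    return failures


contract = {"tenure_months": "long", "monthly_spend": "double"}
candidate = Candidate(
    signature={"tenure_months": "long", "monthly_spend": "double"},
    metric=0.84,
    drift_psi=0.05,
    subgroup_metric={"segment_a": 0.83, "segment_b": 0.81},
    lineage_versions=["extract-0331", "v41", "v17", "v9", "7f3a"],
)
failures = evaluate_gates(candidate, contract, incumbent_metric=0.83)
print("promote" if not failures else "blocked: " + "; ".join(failures))
```

Returning reasons rather than a bare boolean keeps the build log readable; the CI step can exit non-zero whenever the list is non-empty, which is what lets the gate say no.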

These gates are not bureaucracy for its own sake. Under emerging EU expectations — including the documentation and traceability themes running through the EU AI Act — being able to demonstrate why a given model version reached production is fast becoming a baseline requirement, not a nice-to-have. The lineage-completeness gate, in particular, is what turns "we think this is reproducible" into "here is the exact chain."

Real-time scoring and agentic consumption

Batch scoring is the common case, but Fabric's 2026 capabilities push further. Models registered through this pipeline can be consumed by autonomous workflows — Fabric Data Agents and Copilot Cowork can invoke governed models as part of an executed plan. And where predictions need to act on live data, the Eventhouse remote MCP lets agents query real-time streams via natural language and KQL, feeding fresh features to a model that was trained and registered through the very pipeline described above.

The point is that the registry is not a dead end. A model promoted to the ml-prod alias is addressable by the rest of the Fabric platform — analytics, agents, and reports — precisely because it lives in OneLake rather than a siloed ML service.

Common mistakes we see

One workspace for everything. Convenient on day one, ungovernable by day ninety. Separate environments early.
Training from bronze. Skipping the gold layer destroys reproducibility and lineage. Always train from curated data.
Promoting by copying artifacts. Use registry aliases. Copies drift; aliases are atomic and reversible.
Skipping the model signature. Without it, staging cannot validate inputs and you discover schema breaks in production.
Forgetting the consumption layer. A perfectly governed model feeding an unversioned report is only half an MLOps story. Put semantic models in the Copilot Tooling Format under source control.

Conclusion

Cross-workspace MLflow is the piece that finally lets Microsoft Fabric host a credible, audit-ready MLOps lifecycle without bolting on a separate ML platform. Train where it makes sense — including Azure Databricks or Azure ML — register and govern in Fabric, promote through gates that can say no, and keep the whole chain from gold-layer data to deployed prediction traceable in OneLake.

If you are designing a Fabric MLOps platform or consolidating fragmented ML estates onto a governed registry, our AI & Data Platform engineering team has delivered exactly this for European enterprises. We are happy to review your architecture.

FAQ

What is cross-workspace MLflow in Microsoft Fabric?

Cross-workspace MLflow lets you log experiments, runs, and models in one Fabric workspace and reference or register them from another. It decouples the workspace where training happens from the workspace where models are governed and promoted, which is the foundation of a proper MLOps lifecycle with separate dev, staging, and production boundaries.

How does Fabric MLflow compare to Azure ML for the model registry?

Both implement the MLflow tracking and registry APIs, so the SDK calls are largely the same. The difference is gravity: Fabric keeps models, lineage, and the data they were trained on inside OneLake alongside your Power BI and analytics estate, while Azure ML is a standalone ML platform. With cross-workspace MLflow and MLflow logging interop, you can train in Azure ML or Azure Databricks and bring those assets into Fabric for serving and governance.

Can I bring models trained in Azure Databricks or Azure ML into Fabric?

Yes. Fabric's cross-workspace MLflow logging supports end-to-end MLOps where assets from Azure Databricks or Azure ML are registered and managed inside Fabric. The model artifact and its MLflow metadata move with it, so lineage and signatures are preserved. This avoids re-training and lets you standardise governance in one place.

How does the medallion architecture relate to Fabric MLOps?

Training data should come from the gold (or curated silver) layer of a medallion architecture in OneLake, never from raw bronze. This gives you reproducible, validated feature inputs and clean lineage from source to model. The medallion layering and the MLflow run together form an auditable chain from raw data to deployed prediction.

What is the Power BI Copilot Tooling Format and why does it matter for MLOps?

The Copilot Tooling Format reached GA in May 2026 and is a Git-friendly, text-based metadata format for Power BI semantic models. For MLOps it matters because it lets the downstream consumption layer — the reports and semantic models that surface predictions — live in source control next to your model code, so the whole analytics-to-ML estate can move through CI/CD together.

Do I need three separate Fabric workspaces for dev, staging, and production?

Three workspaces is the cleanest mapping to MLflow registry stages and to deployment pipelines, but it is not mandatory. Smaller teams can run two workspaces, one for dev and one combined staging/production workspace, protected by strict registry aliases and access controls. The important principle is that promotion is a governed, gated transition between environments, not an in-place edit.
