Agentic AI in Production: Three Patterns with Azure Functions and Databricks
Three production-ready agentic AI patterns — durable orchestrator, multi-agent Planner-Executor-Critic, and monitoring responder — running on Azure Functions with Databricks integration.
Everyone is building AI agents. Most of those agents run in notebooks, require manual triggers, and fail silently when something goes wrong. That is fine for demos. It is not fine when your ML pipeline needs to retrain models at 3am, evaluate whether the new version is better, and decide autonomously whether to promote or roll back.
This post describes three agentic AI patterns we run in production, all included in our open-source enterprise Databricks platform. They run on Azure Functions with Durable Functions for orchestration, Service Bus for event-driven activation, and Databricks for compute.
Why Azure Functions for AI Agents?
Before diving into the patterns, it is worth explaining why the runtime choice matters:
VNet integration — Our agents interact with Databricks and ADLS Gen2 through private endpoints. Azure Functions Premium (EP1) supports VNet integration, meaning agents operate inside the same secure perimeter as the data and compute they orchestrate. No public endpoints, no firewall exceptions.
Durable Functions — The orchestrator pattern needs reliable, checkpointed execution. If the function host restarts mid-workflow, Durable Functions replays from the last checkpoint. This is critical for long-running ML jobs that can take hours.
Service Bus triggers — The monitoring responder needs event-driven activation. Service Bus provides exactly-once delivery with dead-letter queues for failed messages. No polling loops, no cron-based checking.
Managed identity — Functions authenticate to all Azure services via a user-assigned managed identity. No secrets, no tokens, no rotation schedules.
Pattern 1: Durable Orchestrator
The most common failure mode in ML pipelines is the gap between "the training job finished" and "the model is safely in production." The durable orchestrator closes that gap with a five-step chain:
The Workflow
validate_data → trigger_databricks_job → poll_job_status → evaluate_metrics → decide_action

Step 1 — validate_data: Checks that the input dataset meets minimum quality thresholds (row count, null percentage, schema conformance). If validation fails, the orchestrator halts and sends an alert. No point training on garbage data.
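The core of that validation check can be sketched as a pure function over a dataset summary produced upstream — the threshold values and column names here are illustrative, not the platform's actual defaults:

```python
def validate_data(summary: dict,
                  min_rows: int = 10_000,
                  max_null_pct: float = 5.0,
                  expected_columns: frozenset = frozenset({"ts", "revenue"})) -> tuple:
    """Return (ok, reason). The orchestrator halts and alerts when ok is False."""
    if summary["row_count"] < min_rows:
        return False, f"row count {summary['row_count']} below minimum {min_rows}"
    if summary["null_pct"] > max_null_pct:
        return False, f"null percentage {summary['null_pct']:.1f}% exceeds {max_null_pct}%"
    missing = expected_columns - set(summary["columns"])
    if missing:
        return False, f"schema missing columns: {sorted(missing)}"
    return True, "ok"
```

Keeping the check as a pure function makes it trivial to unit-test outside the Durable Functions host.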
Step 2 — trigger_databricks_job: Submits a training run to a Databricks job cluster via the Jobs API. The job cluster configuration is defined in the Terraform module — instance type, max workers, autotermination timeout, and Spark configuration are all version-controlled.
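A run submission to the Jobs API (POST /api/2.1/jobs/runs/submit) carries a payload along these lines; the notebook path, node type, and autoscale bounds below are examples standing in for the values the Terraform module version-controls:

```python
def build_training_run(notebook_path: str, dataset_date: str) -> dict:
    """Build an illustrative Jobs API runs/submit payload for a one-off training run."""
    return {
        "run_name": f"train-{dataset_date}",
        "new_cluster": {
            # Example values; in the platform these come from Terraform.
            "spark_version": "14.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "autoscale": {"min_workers": 2, "max_workers": 8},
        },
        "notebook_task": {
            "notebook_path": notebook_path,
            "base_parameters": {"dataset_date": dataset_date},
        },
    }
```

Because the payload is just a dict, the activity function can construct it and hand it to any HTTP client authenticated with the managed identity.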
Step 3 — poll_job_status: Polls the Databricks job status with exponential backoff. Durable Functions handles the timer management — the orchestrator sleeps between polls without consuming compute. If the job fails, the orchestrator captures the error and routes to alerting.
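The backoff schedule itself is a small generator; in the orchestrator each yielded interval would drive a durable timer between status calls (the base, cap, and poll-count values are illustrative):

```python
def backoff_schedule(base_s: int = 30, cap_s: int = 600, max_polls: int = 50):
    """Yield sleep intervals in seconds for each poll: 30, 60, 120, ... capped at cap_s."""
    delay = base_s
    for _ in range(max_polls):
        yield delay
        delay = min(delay * 2, cap_s)  # double until the cap, then stay flat
```

Capping the interval keeps a multi-hour training job from stretching polls so far apart that failure detection lags badly.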
Step 4 — evaluate_metrics: Reads the training metrics from MLflow (RMSE, MAE, R2, MAPE for the revenue forecast; anomaly rate for the detector). Compares against the current production model's metrics. This is where the agent makes its decision — is the new model better?
Step 5 — decide_action: If the new model improves on all tracked metrics, promote it by updating the MLflow model alias from challenger to champion. If it regresses, reject the promotion and log the comparison. If metrics are mixed (better on some, worse on others), flag for human review.
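Steps 4 and 5 reduce to a metric-by-metric comparison with three outcomes. A minimal sketch, assuming the metric dictionaries come from MLflow and treating RMSE/MAE/MAPE as lower-is-better and R2 as higher-is-better:

```python
LOWER_IS_BETTER = {"rmse", "mae", "mape"}  # r2 is higher-is-better

def decide_action(champion: dict, challenger: dict) -> str:
    """Return 'promote', 'reject', or 'review' per the rules in Step 5."""
    improved, regressed = [], []
    for metric, champ_val in champion.items():
        chal_val = challenger[metric]
        better = chal_val < champ_val if metric in LOWER_IS_BETTER else chal_val > champ_val
        (improved if better else regressed).append(metric)
    if not regressed:
        return "promote"  # better on every tracked metric -> swap the champion alias
    if not improved:
        return "reject"   # worse across the board -> log the comparison and stop
    return "review"       # mixed results -> flag for human review
```

The `promote` branch is where the activity function would move the MLflow alias from challenger to champion; ties count as regressions here, which is a deliberately conservative choice.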
Why This Beats Cron Jobs
A cron-based pipeline either promotes blindly or requires a human to check metrics manually. The orchestrator evaluates metrics programmatically using the same thresholds your team would apply. It also handles partial failures gracefully — if the Databricks job fails, the orchestrator retries with backoff rather than silently failing until someone notices next Monday.
Pattern 2: Multi-Agent (Planner-Executor-Critic)
Some workflows cannot be decomposed into a fixed sequence. Model improvement, for example, might require iterating on feature engineering, hyperparameter tuning, or data augmentation — and the right approach depends on the specific failure mode.
The Loop
Planner → Executor → Critic → (retry or accept)

Planner agent analyzes the current state (model metrics, data drift indicators, recent alert history) and produces a plan: "Retrain with updated features" or "Adjust hyperparameters" or "Augment training data with recent samples."
Executor agent runs the plan against Databricks — submitting jobs, updating configurations, or modifying data pipelines as specified by the Planner.
Critic agent evaluates the Executor's output against defined acceptance criteria. It returns one of three verdicts:
- Accept — Output meets criteria. Proceed with promotion.
- Reject — Output does not meet criteria after maximum attempts. Alert for human review.
- Retry — Output shows improvement but does not meet criteria. Feed feedback to the Planner for another iteration.
The loop runs up to three iterations (configurable). Each iteration includes the Critic's feedback, so the Planner can adjust its strategy rather than repeating the same approach.
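The control flow of the loop can be sketched with the three agents as injected callables — in the real platform these would be Durable Functions activities, and the verdict strings and callable signatures here are illustrative:

```python
from typing import Callable

def planner_executor_critic(plan: Callable, execute: Callable, critique: Callable,
                            state: dict, max_iterations: int = 3) -> dict:
    """Run the Planner-Executor-Critic loop with bounded iterations and feedback."""
    feedback = None
    for attempt in range(1, max_iterations + 1):
        proposal = plan(state, feedback)          # Planner sees the Critic's last feedback
        result = execute(proposal)                # Executor runs the plan against Databricks
        verdict, feedback = critique(result)      # Critic: 'accept', 'reject', or 'retry'
        if verdict == "accept":
            return {"status": "accept", "attempts": attempt, "result": result}
        if verdict == "reject":
            break                                 # hopeless output: stop early
    return {"status": "needs_human_review", "attempts": attempt}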
When to Use This Pattern
Use the Planner-Executor-Critic loop when:
- The problem space has multiple valid approaches and the best one is not known upfront
- You need iterative refinement based on intermediate results
- Simple sequential orchestration produces suboptimal outcomes
- Human review is expensive and should be reserved for genuine edge cases
Do not use it when a fixed sequence suffices. The orchestrator pattern is simpler, faster, and easier to debug.
Pattern 3: Monitoring Responder
Production ML systems degrade. Data drift, schema changes, upstream pipeline failures, and model staleness are not exceptions — they are the normal operating conditions of any long-lived ML system.
The monitoring responder is an event-driven agent that reacts to alerts automatically.
How It Works
- Alert sources — Log Analytics scheduled queries detect anomalies: spike in model error rate, data drift exceeding PSI threshold, pipeline failure, budget overrun
- Service Bus routing — Alerts are published as messages to Service Bus queues, categorized by type
- Classification — The responder reads the message and classifies severity (critical, warning, informational)
- Auto-mitigation — Based on severity and alert type:
- Critical model degradation → Rollback to the previous production model version via MLflow alias swap
- Data drift detected → Trigger a retraining job via the durable orchestrator
- Pipeline failure → Create an incident record and notify the on-call team
- Budget alert → Log the event and pause non-essential compute
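The mitigation mapping above is essentially a dispatch table. A sketch of the responder's routing logic, with alert-type keys and action names invented for illustration:

```python
# Illustrative routing table; real alert-type and action names will differ.
MITIGATIONS = {
    "model_degradation": "rollback_model_alias",
    "data_drift": "trigger_retraining_orchestration",
    "pipeline_failure": "create_incident",
    "budget_overrun": "pause_noncritical_compute",
}

def route_alert(message: dict) -> dict:
    """Map a classified Service Bus alert message to an auto-mitigation action."""
    severity = message.get("severity", "informational")
    action = MITIGATIONS.get(message["alert_type"], "log_only")
    # Only *critical* degradation triggers an automatic rollback;
    # warnings are routed to the on-call team instead.
    if message["alert_type"] == "model_degradation" and severity != "critical":
        action = "notify_oncall"
    return {"action": action, "severity": severity}
```

Unknown alert types fall through to `log_only`, so a new Log Analytics query can be added without the responder taking an unintended action.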
The Key Insight
The monitoring responder does not replace human judgment — it handles the first five minutes. A model rollback at 3am is better than waiting until someone checks the dashboard at 9am. The responder buys time for human experts to investigate root causes without customers experiencing degraded predictions during the gap.
How the Three Patterns Work Together
In practice, the three patterns form a self-managing ML lifecycle:
- New data arrives in the bronze container → Event Grid triggers the Durable Orchestrator
- The orchestrator trains, evaluates, and promotes (or rejects) the new model
- If the model does not meet criteria after basic evaluation, the Multi-Agent loop iterates on improvement
- In production, the Monitoring Responder watches for drift and degradation, triggering retraining or rollback as needed
The entire cycle can run without human intervention for routine cases. Humans are called in for genuine edge cases — mixed metric results, repeated Critic rejections, or novel alert patterns.
Running the Agents
All three patterns are included in the databricks-enterprise-ai-platform repository. The modules/compute_integration Terraform module provisions the Azure Functions runtime, Service Bus queues, and Event Grid subscriptions. The Python function code lives in the agents/ directory.
Related Resources
- We Open-Sourced Our Enterprise Databricks AI Platform Blueprint — The full platform context for these agent patterns.
- Building Enterprise RAG Pipelines: Architecture, Pitfalls, and Best Practices — Another AI architecture pattern for enterprise LLM deployments.
- Deploying LLMs in the Enterprise: Security, Cost, and Governance — Governance for the LLM layer these agents interact with.
Want to deploy autonomous ML agents in your infrastructure? Contact us — we help teams build self-managing AI platforms.