Agentic AI in Production: Three Patterns with Azure Functions and Databricks
Three production-ready agentic AI patterns — durable orchestrator, multi-agent Planner-Executor-Critic, and monitoring responder — running on Azure Functions with Databricks integration.
Everyone is building AI agents. Most of those agents run in notebooks, require manual triggers, and fail silently when something goes wrong. That is fine for demos. It is not fine when your ML pipeline needs to retrain models at 3am, evaluate whether the new version is better, and decide autonomously whether to promote or roll back.
This post describes three agentic AI patterns we run in production, all included in our open-source enterprise Databricks platform. They run on Azure Functions with Durable Functions for orchestration, Service Bus for event-driven activation, and Databricks for compute.
Why Azure Functions for AI Agents?
Before diving into the patterns, it is worth explaining why the runtime choice matters:
VNet integration — Our agents interact with Databricks and ADLS Gen2 through private endpoints. Azure Functions Premium (EP1) supports VNet integration, meaning agents operate inside the same secure perimeter as the data and compute they orchestrate. No public endpoints, no firewall exceptions.
Durable Functions — The orchestrator pattern needs reliable, checkpointed execution. If the function host restarts mid-workflow, Durable Functions replays from the last checkpoint. This is critical for long-running ML jobs that can take hours.
Service Bus triggers — The monitoring responder needs event-driven activation. Service Bus provides exactly-once delivery with dead-letter queues for failed messages. No polling loops, no cron-based checking.
Managed identity — Functions authenticate to all Azure services via a user-assigned managed identity. No secrets, no tokens, no rotation schedules.
Pattern 1: Durable Orchestrator
The most common failure mode in ML pipelines is the gap between "the training job finished" and "the model is safely in production." The durable orchestrator closes that gap with a five-step chain:
The Workflow
validate_data → trigger_databricks_job → poll_job_status → evaluate_metrics → decide_action

Step 1 — validate_data: Checks that the input dataset meets minimum quality thresholds (row count, null percentage, schema conformance). If validation fails, the orchestrator halts and sends an alert. No point training on garbage data.
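The core of that validation check can be sketched as a pure function over a dataset summary produced upstream — the threshold values and column names here are illustrative, not the platform's actual defaults:

```python
def validate_data(summary: dict,
                  min_rows: int = 10_000,
                  max_null_pct: float = 5.0,
                  expected_columns: frozenset = frozenset({"ts", "revenue"})) -> tuple:
    """Return (ok, reason). The orchestrator halts and alerts when ok is False."""
    if summary["row_count"] < min_rows:
        return False, f"row count {summary['row_count']} below minimum {min_rows}"
    if summary["null_pct"] > max_null_pct:
        return False, f"null percentage {summary['null_pct']:.1f}% exceeds {max_null_pct}%"
    missing = expected_columns - set(summary["columns"])
    if missing:
        return False, f"schema missing columns: {sorted(missing)}"
    return True, "ok"
```

Keeping the check as a pure function makes it trivial to unit-test outside the Durable Functions host.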
Step 2 — trigger_databricks_job: Submits a training run to a Databricks job cluster via the Jobs API. The job cluster configuration is defined in the Terraform module — instance type, max workers, autotermination timeout, and Spark configuration are all version-controlled.
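A run submission to the Jobs API (POST /api/2.1/jobs/runs/submit) carries a payload along these lines; the notebook path, node type, and autoscale bounds below are examples standing in for the values the Terraform module version-controls:

```python
def build_training_run(notebook_path: str, dataset_date: str) -> dict:
    """Build an illustrative Jobs API runs/submit payload for a one-off training run."""
    return {
        "run_name": f"train-{dataset_date}",
        "new_cluster": {
            # Example values; in the platform these come from Terraform.
            "spark_version": "14.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "autoscale": {"min_workers": 2, "max_workers": 8},
        },
        "notebook_task": {
            "notebook_path": notebook_path,
            "base_parameters": {"dataset_date": dataset_date},
        },
    }
```

Because the payload is just a dict, the activity function can construct it and hand it to any HTTP client authenticated with the managed identity.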
Step 3 — poll_job_status: Polls the Databricks job status with exponential backoff. Durable Functions handles the timer management — the orchestrator sleeps between polls without consuming compute. If the job fails, the orchestrator captures the error and routes to alerting.
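The backoff schedule itself is a small generator; in the orchestrator each yielded interval would drive a durable timer between status calls (the base, cap, and poll-count values are illustrative):

```python
def backoff_schedule(base_s: int = 30, cap_s: int = 600, max_polls: int = 50):
    """Yield sleep intervals in seconds for each poll: 30, 60, 120, ... capped at cap_s."""
    delay = base_s
    for _ in range(max_polls):
        yield delay
        delay = min(delay * 2, cap_s)  # double until the cap, then stay flat
```

Capping the interval keeps a multi-hour training job from stretching polls so far apart that failure detection lags badly.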
Step 4 — evaluate_metrics: Reads the training metrics from MLflow (RMSE, MAE, R2, MAPE for the revenue forecast; anomaly rate for the detector). Compares against the current production model's metrics. This is where the agent makes its decision — is the new model better?
Step 5 — decide_action: If the new model improves on all tracked metrics, promote it by updating the MLflow model alias from challenger to champion. If it regresses, reject the promotion and log the comparison. If metrics are mixed (better on some, worse on others), flag for human review.
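Steps 4 and 5 reduce to a metric-by-metric comparison with three outcomes. A minimal sketch, assuming the metric dictionaries come from MLflow and treating RMSE/MAE/MAPE as lower-is-better and R2 as higher-is-better:

```python
LOWER_IS_BETTER = {"rmse", "mae", "mape"}  # r2 is higher-is-better

def decide_action(champion: dict, challenger: dict) -> str:
    """Return 'promote', 'reject', or 'review' per the rules in Step 5."""
    improved, regressed = [], []
    for metric, champ_val in champion.items():
        chal_val = challenger[metric]
        better = chal_val < champ_val if metric in LOWER_IS_BETTER else chal_val > champ_val
        (improved if better else regressed).append(metric)
    if not regressed:
        return "promote"  # better on every tracked metric -> swap the champion alias
    if not improved:
        return "reject"   # worse across the board -> log the comparison and stop
    return "review"       # mixed results -> flag for human review
```

The `promote` branch is where the activity function would move the MLflow alias from challenger to champion; ties count as regressions here, which is a deliberately conservative choice.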
Why This Beats Cron Jobs
A cron-based pipeline either promotes blindly or requires a human to check metrics manually. The orchestrator evaluates metrics programmatically using the same thresholds your team would apply. It also handles partial failures gracefully — if the Databricks job fails, the orchestrator retries with backoff rather than silently failing until someone notices next Monday.
Pattern 2: Multi-Agent (Planner-Executor-Critic)
Some workflows cannot be decomposed into a fixed sequence. Model improvement, for example, might require iterating on feature engineering, hyperparameter tuning, or data augmentation — and the right approach depends on the specific failure mode.
The Loop
Planner → Executor → Critic → (retry or accept)

Planner agent analyzes the current state (model metrics, data drift indicators, recent alert history) and produces a plan: "Retrain with updated features" or "Adjust hyperparameters" or "Augment training data with recent samples."
Executor agent runs the plan against Databricks — submitting jobs, updating configurations, or modifying data pipelines as specified by the Planner.
Critic agent evaluates the Executor's output against defined acceptance criteria. It returns one of three verdicts:
- Accept — Output meets criteria. Proceed with promotion.
- Reject — Output does not meet criteria after maximum attempts. Alert for human review.
- Retry — Output shows improvement but does not meet criteria. Feed feedback to the Planner for another iteration.
The loop runs up to three iterations (configurable). Each iteration includes the Critic's feedback, so the Planner can adjust its strategy rather than repeating the same approach.
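The control flow of the loop can be sketched with the three agents as injected callables — in the real platform these would be Durable Functions activities, and the verdict strings and callable signatures here are illustrative:

```python
from typing import Callable

def planner_executor_critic(plan: Callable, execute: Callable, critique: Callable,
                            state: dict, max_iterations: int = 3) -> dict:
    """Run the Planner-Executor-Critic loop with bounded iterations and feedback."""
    feedback = None
    for attempt in range(1, max_iterations + 1):
        proposal = plan(state, feedback)          # Planner sees the Critic's last feedback
        result = execute(proposal)                # Executor runs the plan against Databricks
        verdict, feedback = critique(result)      # Critic: 'accept', 'reject', or 'retry'
        if verdict == "accept":
            return {"status": "accept", "attempts": attempt, "result": result}
        if verdict == "reject":
            break                                 # hopeless output: stop early
    return {"status": "needs_human_review", "attempts": attempt}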
When to Use This Pattern
Use the Planner-Executor-Critic loop when:
- The problem space has multiple valid approaches and the best one is not known upfront
- You need iterative refinement based on intermediate results
- Simple sequential orchestration produces suboptimal outcomes
- Human review is expensive and should be reserved for genuine edge cases
Do not use it when a fixed sequence suffices. The orchestrator pattern is simpler, faster, and easier to debug.
Pattern 3: Monitoring Responder
Production ML systems degrade. Data drift, schema changes, upstream pipeline failures, and model staleness are not exceptions — they are the normal operating conditions of any long-lived ML system.
The monitoring responder is an event-driven agent that reacts to alerts automatically.
How It Works
- Alert sources — Log Analytics scheduled queries detect anomalies: spike in model error rate, data drift exceeding PSI threshold, pipeline failure, budget overrun
- Service Bus routing — Alerts are published as messages to Service Bus queues, categorized by type
- Classification — The responder reads the message and classifies severity (critical, warning, informational)
- Auto-mitigation — Based on severity and alert type:
- Critical model degradation → Rollback to the previous production model version via MLflow alias swap
- Data drift detected → Trigger a retraining job via the durable orchestrator
- Pipeline failure → Create an incident record and notify the on-call team
- Budget alert → Log the event and pause non-essential compute
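The mitigation mapping above is essentially a dispatch table. A sketch of the responder's routing logic, with alert-type keys and action names invented for illustration:

```python
# Illustrative routing table; real alert-type and action names will differ.
MITIGATIONS = {
    "model_degradation": "rollback_model_alias",
    "data_drift": "trigger_retraining_orchestration",
    "pipeline_failure": "create_incident",
    "budget_overrun": "pause_noncritical_compute",
}

def route_alert(message: dict) -> dict:
    """Map a classified Service Bus alert message to an auto-mitigation action."""
    severity = message.get("severity", "informational")
    action = MITIGATIONS.get(message["alert_type"], "log_only")
    # Only *critical* degradation triggers an automatic rollback;
    # warnings are routed to the on-call team instead.
    if message["alert_type"] == "model_degradation" and severity != "critical":
        action = "notify_oncall"
    return {"action": action, "severity": severity}
```

Unknown alert types fall through to `log_only`, so a new Log Analytics query can be added without the responder taking an unintended action.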
The Key Insight
The monitoring responder does not replace human judgment — it handles the first five minutes. A model rollback at 3am is better than waiting until someone checks the dashboard at 9am. The responder buys time for human experts to investigate root causes without customers experiencing degraded predictions during the gap.
How the Three Patterns Work Together
In practice, the three patterns form a self-managing ML lifecycle:
- New data arrives in the bronze container → Event Grid triggers the Durable Orchestrator
- The orchestrator trains, evaluates, and promotes (or rejects) the new model
- If the model does not meet criteria after basic evaluation, the Multi-Agent loop iterates on improvement
- In production, the Monitoring Responder watches for drift and degradation, triggering retraining or rollback as needed
The entire cycle can run without human intervention for routine cases. Humans are called in for genuine edge cases — mixed metric results, repeated Critic rejections, or novel alert patterns.
Running the Agents
All three patterns are included in the databricks-enterprise-ai-platform repository. The modules/compute_integration Terraform module provisions the Azure Functions runtime, Service Bus queues, and Event Grid subscriptions. The Python function code lives in the agents/ directory.
Related Resources
- We Open-Sourced Our Enterprise Databricks AI Platform Blueprint — The full platform context for these agent patterns.
- Building Enterprise RAG Pipelines: Architecture, Pitfalls, and Best Practices — Another AI architecture pattern for enterprise LLM deployments.
- Deploying LLMs in the Enterprise: Security, Cost, and Governance — Governance for the LLM layer these agents interact with.
Want to deploy autonomous ML agents in your infrastructure? Contact us — we help teams build self-managing AI platforms.