Event-Driven Architecture on Azure: When, Why, and How
A practical guide to event-driven architecture on Azure — Event Grid, Service Bus, Event Hubs, CQRS, and saga patterns explained.
Event-driven architecture (EDA) is not a universal solution — but when the problem fits, it is transformative. The challenge for enterprise teams is knowing when EDA is the right choice, which Azure services to use for which scenario, and how to avoid the pitfalls that turn event-driven systems into debugging nightmares.
This guide is the condensed version of what we cover in our architecture workshops, drawn from production systems processing millions of events daily.
When Event-Driven Architecture Makes Sense
EDA adds complexity. That complexity is justified when you have:
- Temporal decoupling needs: the producer should not wait for the consumer, and they may operate on different schedules.
- Multiple consumers for the same event: one action triggers reactions in several bounded contexts (e.g., an order placed triggers inventory, shipping, notifications, and analytics).
- Spike absorption: traffic is bursty and downstream systems cannot scale as quickly as upstream producers.
- Audit requirements: you need a complete, immutable history of what happened and when.
When EDA is the wrong choice:
- Simple CRUD applications with a single database
- Scenarios requiring synchronous, immediate responses
- Teams without operational maturity for distributed debugging
- Domains where eventual consistency is unacceptable
Rule of thumb: If your system needs to respond "yes, this is done" within the same HTTP request, event-driven is the wrong default. Use synchronous processing and consider events only for side effects.
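The rule of thumb can be sketched in a few lines. This is an illustrative in-memory stand-in, not an Azure API: the request is completed synchronously, and only the side effects ride on an event.

```python
# Sketch of the rule of thumb: answer "yes, this is done" synchronously,
# then publish an event for side effects only. EventBus and all names here
# are illustrative stand-ins for a broker such as Service Bus.
from dataclasses import dataclass, field

@dataclass
class EventBus:
    """Minimal in-memory stand-in for a message broker."""
    published: list = field(default_factory=list)

    def publish(self, event_type: str, payload: dict) -> None:
        self.published.append((event_type, payload))

def place_order(order_id: str, bus: EventBus) -> dict:
    # Synchronous work the caller must see completed within the HTTP request:
    order = {"id": order_id, "status": "confirmed"}
    # Side effects (email, analytics) ride on an event, not the request path:
    bus.publish("OrderPlaced", {"order_id": order_id})
    return order

bus = EventBus()
response = place_order("ord-1", bus)
print(response["status"], len(bus.published))  # confirmed 1
```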
Choosing the Right Azure Service
Azure offers three core messaging services. Choosing the wrong one is a common and expensive mistake.
Azure Event Grid
What it is: Lightweight event routing service optimized for reactive, event-driven scenarios.
Use when:
- You need to react to Azure resource events (blob created, resource group changed)
- You want simple pub/sub with filtering by event type or subject
- Events are notifications ("something happened"), not commands ("do this")
- You need near-real-time delivery with low latency
Limitations: No ordering guarantee, limited retry capabilities, maximum event size of 1 MB. Not suitable for high-throughput data streaming.
Azure Service Bus
What it is: Enterprise message broker with full queuing and pub/sub capabilities.
Use when:
- You need guaranteed ordered delivery (sessions)
- You need at-least-once delivery with duplicate detection (effectively once within a configurable deduplication window)
- Messages are commands that must be processed reliably
- You need dead-letter queues, scheduled delivery, or message deferral
- Transaction support across multiple queues/topics is required
This is the default choice for most enterprise scenarios. Its feature set covers 80% of use cases.
Azure Event Hubs
What it is: High-throughput data streaming platform (comparable to Apache Kafka).
Use when:
- You are ingesting telemetry, logs, or IoT data at massive scale (millions of events per second)
- You need consumer groups that read at their own pace
- You want to replay events from a specific point in time
- Event ordering within a partition is required
Key distinction: Event Hubs retains events for a configurable period (up to seven days on the Standard tier, up to 90 days on Premium and Dedicated tiers). Consumers pull events — the broker does not push.
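The pull model above can be illustrated with a toy partitioned log (this is a sketch, not the Event Hubs SDK): the broker retains events, each consumer group tracks its own offset, and replay is just resetting that offset.

```python
# Illustrative sketch of the Event Hubs consumption model: a retained,
# append-only log per partition, with per-consumer-group offsets so each
# group reads at its own pace and can replay from an earlier point.
class PartitionLog:
    def __init__(self):
        self.events = []   # retained events, append-only
        self.offsets = {}  # consumer group -> next offset to read

    def append(self, event):
        self.events.append(event)

    def pull(self, group, max_count=10):
        start = self.offsets.get(group, 0)
        batch = self.events[start:start + max_count]
        self.offsets[group] = start + len(batch)
        return batch

    def rewind(self, group, offset=0):
        """Replay support: reset the group's offset to an earlier point."""
        self.offsets[group] = offset

log = PartitionLog()
for i in range(5):
    log.append({"seq": i})

print([e["seq"] for e in log.pull("analytics")])     # [0, 1, 2, 3, 4]
print([e["seq"] for e in log.pull("billing", 2)])    # [0, 1]
log.rewind("analytics")
print([e["seq"] for e in log.pull("analytics", 3)])  # [0, 1, 2]
```

Note that pulling for "billing" does not advance the "analytics" offset: the two groups are fully independent, which is what lets slow consumers coexist with fast ones.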
Decision Matrix
| Requirement | Event Grid | Service Bus | Event Hubs |
|---|---|---|---|
| Pub/sub notifications | Best | Good | Overkill |
| Command processing | No | Best | No |
| Ordered delivery | No | Yes (sessions) | Yes (partition) |
| High throughput streaming | No | Moderate | Best |
| Dead-letter handling | Limited | Full | Manual |
| Event replay | No | No | Yes |
Implementing CQRS: Separating Reads from Writes
Command Query Responsibility Segregation (CQRS) pairs naturally with EDA. The core idea: use different models for reading and writing data.
When CQRS adds value:
- Read and write workloads have different scaling requirements
- The read model needs to be denormalized for query performance
- Multiple bounded contexts need different views of the same data
Practical implementation on Azure:
- Write side: Commands are processed, domain events are published to Service Bus.
- Event handlers: Subscribe to events, update read-optimized projections (e.g., in Azure Cosmos DB or Azure SQL).
- Read side: Queries hit the denormalized read store directly — no joins, no complex queries.
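The three-part flow above can be sketched with in-memory stand-ins (the event shape and store names are illustrative, not a specific Azure API): a command handler updates the write model and publishes a domain event, a subscriber maintains a denormalized projection, and the read side is a plain lookup.

```python
# Hedged sketch of the CQRS flow: write side publishes domain events,
# an event handler maintains a denormalized projection, queries hit the
# projection directly. All names are illustrative.
write_store = {}  # normalized write model (stand-in for the write database)
read_store = {}   # denormalized projection (stand-in for Cosmos DB / SQL)
handlers = []     # event subscribers

def publish(event):
    for handler in handlers:
        handler(event)

# Write side: process the command, then publish the domain event.
def handle_place_order(order_id, customer, total):
    write_store[order_id] = {"customer": customer, "total": total}
    publish({"type": "OrderPlaced", "order_id": order_id,
             "customer": customer, "total": total})

# Event handler: keep a per-customer summary projection up to date.
def project_order_placed(event):
    if event["type"] != "OrderPlaced":
        return
    summary = read_store.setdefault(event["customer"],
                                    {"orders": 0, "spend": 0})
    summary["orders"] += 1
    summary["spend"] += event["total"]

handlers.append(project_order_placed)

handle_place_order("ord-1", "alice", 40)
handle_place_order("ord-2", "alice", 60)

# Read side: no joins, the query is a dictionary lookup.
print(read_store["alice"])  # {'orders': 2, 'spend': 100}
```

In production the publish step goes through Service Bus rather than an in-process list, which is exactly where the eventual consistency discussed below comes from: the projection is updated after, not during, the command.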
Critical considerations:
- Eventual consistency is inherent. The read model will lag behind the write model. Design your UI to handle this (e.g., redirect to a confirmation page rather than immediately querying the read model).
- Do not apply CQRS everywhere. Simple domains with low read/write asymmetry gain nothing but complexity.
- Projection rebuilds must be possible. Store events durably so you can rebuild read models from scratch when the schema changes.
Event Sourcing: When the Event Log Becomes the Truth
Event sourcing goes further than CQRS: instead of storing current state, you store every event that changed the state.
Compelling use cases:
- Financial systems where auditability is non-negotiable
- Domains where understanding "how we got here" is as important as "where we are"
- Systems that need to reconstruct state at any point in time
Implementation realities:
- Event store: Use Azure Cosmos DB with a change feed, or a dedicated event store library like Marten (PostgreSQL).
- Snapshots: For aggregates with thousands of events, periodically create snapshots to avoid replaying the entire event stream.
- Schema evolution: Events are immutable. When your event schema needs to change, use upcasting (transforming old events to new format at read time).
- Storage costs: Events accumulate. Plan for archival — move events older than N months to cold storage.
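The mechanics above (replay, snapshots, upcasting) fit in a short sketch. The aggregate, event names, and fields are illustrative assumptions, not a specific event store's API.

```python
# Minimal event-sourcing sketch: state is rebuilt by replaying events,
# a snapshot caps replay cost, and an upcaster migrates an old event
# shape at read time. Event names and fields are illustrative.
def upcast(event):
    """Schema evolution: assume v1 'Deposited' events lacked a currency."""
    if event["type"] == "Deposited" and "currency" not in event:
        return {**event, "currency": "EUR"}
    return event

def apply(state, event):
    event = upcast(event)
    if event["type"] == "Deposited":
        return {"balance": state["balance"] + event["amount"]}
    if event["type"] == "Withdrawn":
        return {"balance": state["balance"] - event["amount"]}
    return state

def load(events, snapshot=None):
    """Rebuild current state: start from a snapshot if one exists,
    then replay only the events recorded after it."""
    state = {"balance": 0} if snapshot is None else dict(snapshot["state"])
    start = 0 if snapshot is None else snapshot["version"]
    for event in events[start:]:
        state = apply(state, event)
    return state

events = [
    {"type": "Deposited", "amount": 100},  # old v1 event, no currency field
    {"type": "Withdrawn", "amount": 30},
    {"type": "Deposited", "amount": 50, "currency": "EUR"},
]
snapshot = {"version": 2, "state": {"balance": 70}}  # taken after two events

print(load(events))            # {'balance': 120}
print(load(events, snapshot))  # {'balance': 120}, with a shorter replay
```

The same `load` function is what makes projection rebuilds possible: because the events are durable and immutable, any read model can be reconstructed from scratch.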
Warning: Event sourcing is powerful but adds significant complexity. We recommend it only when the business value of a complete audit trail justifies the engineering investment.
The Saga Pattern: Managing Distributed Transactions
In a microservices world, you cannot use database transactions across service boundaries. The saga pattern coordinates multi-step business processes through events.
Choreography (preferred for simple sagas):
Each service listens for events and publishes its own events. No central coordinator.
OrderService -> OrderPlaced
InventoryService (listens) -> InventoryReserved
PaymentService (listens) -> PaymentProcessed
ShippingService (listens) -> ShipmentScheduled
Orchestration (preferred for complex sagas):
A central saga orchestrator manages the flow and compensation logic.
SagaOrchestrator:
1. Send ReserveInventory command -> wait for response
2. Send ProcessPayment command -> wait for response
3. Send ScheduleShipment command -> wait for response
On failure at any step -> send compensating commands
Practical guidance:
- Choreography works for 3-4 steps. Beyond that, the implicit flow becomes impossible to reason about.
- Orchestration is better for complex flows with conditional logic, timeouts, and human intervention steps.
- MassTransit provides excellent saga state machine support for .NET, including persistence to Azure SQL or Cosmos DB.
- Always define compensating actions. If payment fails after inventory is reserved, the saga must release the reservation.
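The orchestration flow and the compensation rule above can be sketched as a small coordinator (step and service names are illustrative; a production version would use a saga framework such as MassTransit with persisted state): run the steps in order, and on failure run the compensating action for every completed step in reverse.

```python
# Sketch of saga orchestration with compensation: each step pairs an
# action with a compensating action; on failure, completed steps are
# rolled back in reverse order. All names are illustrative.
def run_saga(steps, context):
    """steps: list of (name, action, compensation); actions return True/False."""
    completed = []
    for name, action, compensate in steps:
        if action(context):
            completed.append((name, compensate))
        else:
            # Roll back completed steps in reverse order.
            for done_name, comp in reversed(completed):
                comp(context)
            return {"status": "compensated", "failed_step": name}
    return {"status": "completed"}

log = []
steps = [
    ("ReserveInventory",
     lambda ctx: log.append("inventory reserved") or True,
     lambda ctx: log.append("inventory released")),
    ("ProcessPayment",
     lambda ctx: False,  # simulate a payment failure
     lambda ctx: log.append("payment refunded")),
    ("ScheduleShipment",
     lambda ctx: log.append("shipment scheduled") or True,
     lambda ctx: log.append("shipment cancelled")),
]

result = run_saga(steps, {})
print(result)  # {'status': 'compensated', 'failed_step': 'ProcessPayment'}
print(log)     # ['inventory reserved', 'inventory released']
```

Note the asymmetry: the payment step that failed is not compensated, only the steps that had already succeeded. Getting this boundary right is most of the work in real sagas.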
Dead-Letter Handling: The Safety Net
Dead-letter queues (DLQ) are where messages go when they cannot be processed. Ignoring them is ignoring production failures.
A robust dead-letter strategy:
- Monitor DLQ depth with Azure Monitor alerts. A growing DLQ is a symptom of a bug, not a feature.
- Build a dead-letter processing pipeline: inspect, diagnose, fix, and replay. Do not manually re-send messages.
- Categorize failures: transient (retry), poison (investigate and discard), and schema mismatch (fix consumer and replay).
- Set appropriate MaxDeliveryCount: too low and you dead-letter on transient failures; too high and you delay detection. We default to 5-10.
- Preserve context: log the exception, the message body, and the trace ID when moving to DLQ.
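The failure categories above map naturally to a small triage function. The reason codes here are illustrative assumptions, not actual Service Bus dead-letter reason strings:

```python
# Sketch of DLQ triage: classify a dead-lettered message as transient,
# schema mismatch, or poison, and map each class to an action. The
# reason codes are illustrative, not real Service Bus reasons.
TRANSIENT = {"timeout", "throttled", "connection_reset"}
SCHEMA = {"deserialization_failed", "missing_field"}

def triage(dead_lettered):
    reason = dead_lettered["reason"]
    if reason in TRANSIENT:
        return "replay"                    # safe to resubmit as-is
    if reason in SCHEMA:
        return "fix_consumer_then_replay"  # deploy a fix, then replay
    return "investigate_and_discard"       # treat as poison by default

print(triage({"reason": "timeout"}))        # replay
print(triage({"reason": "missing_field"}))  # fix_consumer_then_replay
print(triage({"reason": "null_reference"})) # investigate_and_discard
```

Defaulting unknown reasons to "investigate and discard" rather than blind replay is deliberate: replaying a poison message just dead-letters it again.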
Tooling recommendation: Build an internal "DLQ dashboard" that operations teams can use to inspect, replay, or discard dead-lettered messages. This pays for itself within weeks.
Architecture Checklist
Before adopting event-driven architecture on Azure:
- Identified clear event producers and consumers
- Chosen the correct Azure messaging service per scenario
- Defined event schemas with versioning strategy
- Designed compensation logic for distributed transactions
- Established dead-letter monitoring and replay procedures
- Accepted eventual consistency and designed the UI accordingly
- Implemented distributed tracing across all event flows
At CC Conceptualise, we design and implement event-driven architectures on Azure that handle real-world complexity. If you are evaluating EDA for your enterprise, reach out.