Azure OpenAI Audit Logging for the EU AI Act

European enterprises are racing toward a date most boardrooms still treat as abstract. From 2 August 2026, high-risk AI systems under Annex III of the EU AI Act must satisfy a concrete set of obligations — conformity assessment, registration, technical documentation, human oversight, and post-market monitoring. Buried inside that list is a requirement that quietly determines whether the rest is even provable: automatic logging. If you run Azure OpenAI in any system that touches employment, credit, critical infrastructure, or other Annex III domains, your audit log is the spine of your compliance story. Get it wrong and conformity assessment becomes guesswork; get it right and most other obligations become reportable by-products of good engineering.

TL;DR / Key takeaways

Article 12 of the EU AI Act requires automatic event logging over the lifetime of high-risk systems — for Azure OpenAI deployers this is an architecture requirement, not a checkbox.
High-risk Annex III obligations apply from 2 August 2026; your logging design needs to be in place and tested before then, not after.
Azure provides the raw telemetry (diagnostic settings, Log Analytics, content-filter annotations, Purview), but the semantic decision log that ties a model call to a business decision and a human reviewer is something you build.
The legally meaningful log is a traceable decision record, not high-volume token telemetry — separate the two to control cost and retention.
ISO/IEC 42001 gives you the management-system scaffolding to make AI Act logging auditable; map one evidence set onto both.

Why logging is the load-bearing obligation

Most discussion of the AI Act fixates on fines: up to EUR 35M or 7% of worldwide annual turnover (whichever is higher) for prohibited practices, and up to EUR 15M or 3% for non-compliance with high-risk obligations. Those numbers get attention. But the mechanism that actually exposes you to them is traceability. A market surveillance authority does not fine you for a model being wrong; it acts when you cannot demonstrate that you operated the system as documented, with human oversight and appropriate risk controls.

Article 12 is explicit: high-risk AI systems must technically allow the automatic recording of events (logs) throughout their lifetime, to a degree appropriate to the intended purpose. The recitals tie this directly to traceability and post-market monitoring. In other words, the log is the evidence that everything else in your conformity file is true. We have delivered governance layers for Azure OpenAI deployments where the customer had excellent policies on paper and no way to prove, for a specific contested output six weeks earlier, which model version produced it or whether a human reviewed it. That gap is the single most common finding we encounter.

If you have not yet confirmed whether your system is in scope, start with classification — our Annex III high-risk classification guide walks through the decision logic. The logging burden scales directly with that answer.

What the log actually has to contain

There is a persistent misconception that "we send everything to Log Analytics" satisfies the AI Act. It does not. Operational telemetry — latency, token counts, error rates — is useful for running the system but largely irrelevant to a regulator. What the Act cares about is whether you can reconstruct a specific use of the system after the fact.

The minimum semantically meaningful record for a high-risk Azure OpenAI interaction looks like this:

Field	Why it matters	Source
Correlation / decision ID	Ties the model call to a business transaction	Your orchestration layer
Timestamp and period of use	Required for traceability under Art. 12	Application + Azure Monitor
Prompt / input reference	Reconstructs what the system was asked	Application (hash or reference, not always raw PII)
Model and version	Output reproducibility and accountability	Azure OpenAI deployment metadata
Output returned	The actual decision or generation	Application
Content-filter / safety verdict	Evidence of risk controls operating	Azure OpenAI content-filter annotations
Human reviewer / override	Proves human oversight occurred	Application workflow
Retention class	Drives lifecycle and legal hold	Logging policy

Notice how little of this comes for free from the platform. Azure reliably gives you the model deployment metadata, content-filter annotations, and request-level diagnostics. The correlation ID, the link to a business decision, and the human-oversight action are produced by your code. This is the part teams systematically underestimate, and it is why "turn on diagnostic settings" is necessary but nowhere near sufficient.
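As a minimal sketch of how the fields in the table above might be carried in code, the Python dataclass below defines one record per high-risk interaction. The `DecisionRecord` class, every field name and the sample values are illustrative assumptions, not names required by the Act or supplied by Azure.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class DecisionRecord:
    """One high-risk interaction. Field names are illustrative, not mandated."""

    decision_id: str               # correlation ID from the orchestration layer
    started_at: str                # ISO 8601 UTC, start of the period of use
    ended_at: str                  # ISO 8601 UTC, end of the period of use
    input_ref: str                 # hash of or pointer to the prompt, not raw PII
    model_deployment: str          # Azure OpenAI deployment name
    model_version: str             # model version behind that deployment
    output_ref: str                # the output, or a reference to where it is stored
    filter_verdict: str            # e.g. "passed" or "blocked"
    reviewer_id: Optional[str]     # None until a human has reviewed the output
    review_action: Optional[str]   # e.g. "approved" or "overridden"
    retention_class: str           # drives lifecycle and legal hold

    def to_json(self) -> str:
        # Sorted keys give a stable serialisation for hashing or diffing.
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical example values.
record = DecisionRecord(
    decision_id="loan-2026-000123",
    started_at="2026-03-01T10:00:00+00:00",
    ended_at="2026-03-01T10:00:02+00:00",
    input_ref="hmac-sha256:example",
    model_deployment="credit-scoring-prod",
    model_version="example-version",
    output_ref="blob://decisions/loan-2026-000123",
    filter_verdict="passed",
    reviewer_id=None,
    review_action=None,
    retention_class="decision_trail",
)
print(record.to_json())
```

Freezing the dataclass means a review is recorded as a new event rather than by mutating the original record, which fits an append-only store.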

A second subtlety: prompts frequently contain personal data, which means your audit log is itself a GDPR processing activity. You cannot satisfy the AI Act by hoarding raw prompts forever. The mature pattern is to log a reference to, or hash of, the input, plus a controlled, access-restricted copy of the legally required fields, with retention and minimisation applied per category.
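One way to log a reference rather than the raw prompt is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; the function names, the `hmac-sha256:` prefix and the hard-coded demo key are assumptions for illustration (a real key would live in a secret store). A keyed digest is used instead of a bare hash because short, guessable inputs such as names or ID numbers can be recovered from an unkeyed hash by brute force.

```python
import hashlib
import hmac


def input_reference(prompt: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest that stands in for the raw prompt."""
    digest = hmac.new(key, prompt.encode("utf-8"), hashlib.sha256)
    return "hmac-sha256:" + digest.hexdigest()


def matches(prompt: str, key: bytes, reference: str) -> bool:
    """Check a disputed prompt against the logged reference in constant time."""
    return hmac.compare_digest(input_reference(prompt, key), reference)


# Demo key only; hypothetical values throughout.
key = b"demo-key-not-for-production"
ref = input_reference("Assess application 123 for applicant Jane Doe", key)
print(ref)
print(matches("Assess application 123 for applicant Jane Doe", key, ref))
```

A digest like this is generally still treated as pseudonymised personal data while the key exists, so the retention and access controls for the log continue to apply to it.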

A reference architecture on Azure

The architecture we deploy separates three concerns deliberately: platform telemetry, the decision trail, and governance.

1. Platform telemetry (Azure-native)

Enable diagnostic settings on every Azure OpenAI resource and route logs and metrics to a dedicated Log Analytics workspace. This captures request metadata, throttling, and the content-filter annotations that prove your safety controls were active. This layer is cheap to turn on and should be considered table stakes.
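Content-filter annotations are also returned in the body of each API response, so the application can copy a compact verdict into its own record. The sketch below assumes the response has already been parsed into a dict and carries `prompt_filter_results` and per-choice `content_filter_results` entries with a boolean `filtered` flag, as Azure OpenAI chat completions responses do at the time of writing; check the shape against the API version you call. The `filter_verdict` function, its return format and the sample payload are hypothetical.

```python
from typing import Any


def filter_verdict(response: dict[str, Any]) -> dict[str, Any]:
    """Summarise content-filter annotations from one parsed response."""
    flagged: list[str] = []

    def collect(stage: str, results: dict[str, Any]) -> None:
        # Each category maps to a dict; "filtered" marks a blocked category.
        for category, detail in results.items():
            if isinstance(detail, dict) and detail.get("filtered"):
                flagged.append(f"{stage}:{category}")

    for item in response.get("prompt_filter_results", []):
        collect("prompt", item.get("content_filter_results", {}))
    for choice in response.get("choices", []):
        collect("completion", choice.get("content_filter_results", {}))

    return {"blocked": bool(flagged), "flagged": sorted(flagged)}


# Synthetic payload for illustration only.
sample = {
    "prompt_filter_results": [
        {
            "prompt_index": 0,
            "content_filter_results": {
                "hate": {"filtered": False, "severity": "safe"},
                "violence": {"filtered": False, "severity": "safe"},
            },
        }
    ],
    "choices": [
        {
            "content_filter_results": {
                "hate": {"filtered": False, "severity": "safe"},
                "violence": {"filtered": True, "severity": "high"},
            }
        }
    ],
}
print(filter_verdict(sample))
```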

2. The decision trail (you build this)

Your orchestration code — whether a custom service, an AI gateway, or a framework like Semantic Kernel — emits a structured, append-only log event per high-risk interaction containing the fields in the table above. Route it to an immutable store (Log Analytics with immutability, or an append-only blob/table with WORM where retention is strict). The decision trail is lower volume than telemetry and carries the legal weight, so it justifies stronger guarantees.
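Storage-level immutability can be complemented by making the events themselves tamper-evident. The sketch below is one possible approach, not something the Act or Azure prescribes: each entry stores a SHA-256 hash over its own payload and the previous entry's hash, so a later edit or reordering becomes detectable. The in-memory list stands in for the real store, and `append_event`, `verify_chain` and `GENESIS` are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: str) -> str:
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(chain: list[dict], event: dict) -> dict:
    """Append an event whose hash covers its payload and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    payload = json.dumps(event, sort_keys=True)
    entry = {
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": _digest(prev_hash, payload),
    }
    chain.append(entry)
    return entry


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash; an edited or reordered entry fails the check."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log: list[dict] = []
append_event(log, {"decision_id": "d-1", "stage": "model_call"})
append_event(log, {"decision_id": "d-1", "stage": "human_review"})
print(verify_chain(log))
```

Removing entries from the tail is not detected by the chain alone; for that, the latest hash would need to be anchored somewhere outside the log store.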

3. Governance and review (Purview + monitoring)

Use Microsoft Purview for data classification, lineage, and access governance over the log stores, and build Azure Monitor dashboards and alerts for drift, anomalous filter events, and oversight gaps. Crucially, the logs must feed a real post-market monitoring loop — a documented process where someone reviews them. An archive nobody reads is not monitoring.
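A review loop needs something concrete to alert on. As an illustration of an oversight-gap check, the sketch below scans decision records and returns those still lacking a reviewer after a grace period. The record keys (`decision_id`, `ended_at`, `reviewer_id`) and the 24-hour default are assumptions; in practice the same logic would more likely run as a scheduled query against the log store.

```python
from datetime import datetime, timedelta, timezone


def oversight_gaps(records, now, max_wait=timedelta(hours=24)):
    """Return decision IDs with no human review after max_wait has elapsed."""
    gaps = []
    for rec in records:
        finished = datetime.fromisoformat(rec["ended_at"])
        if rec.get("reviewer_id") is None and now - finished > max_wait:
            gaps.append(rec["decision_id"])
    return gaps


# Hypothetical records: one overdue, one reviewed, one still within the window.
records = [
    {"decision_id": "d-1", "ended_at": "2026-03-01T10:00:00+00:00", "reviewer_id": None},
    {"decision_id": "d-2", "ended_at": "2026-03-01T10:00:00+00:00", "reviewer_id": "u-7"},
    {"decision_id": "d-3", "ended_at": "2026-03-02T23:00:00+00:00", "reviewer_id": None},
]
now = datetime(2026, 3, 3, 0, 0, tzinfo=timezone.utc)
print(oversight_gaps(records, now))
```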

[Figure: Architecture at a glance]

Retention: keep what the law needs, not everything forever

The AI Act requires logs to be retained for a period appropriate to the intended purpose, and for at least six months unless applicable Union or national law provides otherwise (Article 19 for providers, Article 26(6) for deployers). For most enterprises the binding constraint is not the AI Act at all but sector law. A bank running an Annex III system is also subject to DORA; a health provider is subject to sector retention rules. Those usually exceed six months.

The engineering implication is to define retention per log category, not globally:

Decision trail — retain for the longest applicable legal period, immutable, access-controlled.
Content-filter / safety events — retain alongside the decision trail; they are part of the risk-control evidence.
Operational telemetry — short retention (often 30–90 days), because it is for engineering, not auditors.
Personal data in inputs — minimise aggressively; reference rather than store raw where feasible.

Conflating these is how organisations end up paying to store terabytes of useless telemetry for years while still failing an audit because the one field that mattered — the human reviewer — was never captured.
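Per-category retention can be written down as data rather than left in a policy document. The sketch below is a minimal illustration: it treats six months as a floor for the decision trail, takes the longest sector period supplied by the caller, and keeps telemetry short. The category names, the `RetentionRule` type, the three-month telemetry default and the 60-month sector figure in the example are all assumptions, not statements of what any particular law requires.

```python
from dataclasses import dataclass

AI_ACT_FLOOR_MONTHS = 6  # the "at least six months" baseline, treated here as a floor


@dataclass(frozen=True)
class RetentionRule:
    months: int
    immutable: bool


def decision_trail_months(sector_months: list[int]) -> int:
    """Longest applicable period, never below the six-month floor."""
    return max([AI_ACT_FLOOR_MONTHS, *sector_months])


def build_policy(sector_months: list[int], telemetry_months: int = 3) -> dict[str, RetentionRule]:
    trail = decision_trail_months(sector_months)
    return {
        # Legal evidence: longest period, write-once.
        "decision_trail": RetentionRule(trail, immutable=True),
        # Safety events are kept alongside the decision trail.
        "safety_events": RetentionRule(trail, immutable=True),
        # Engineering data: short-lived and mutable.
        "telemetry": RetentionRule(telemetry_months, immutable=False),
    }


# Hypothetical sector requirement of 60 months.
policy = build_policy(sector_months=[60])
for category, rule in policy.items():
    print(category, rule.months, rule.immutable)
```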

A pragmatic readiness checklist

For teams with a high-risk Azure OpenAI system live or planned before August 2026, work this sequence:

1. Classify the system against Annex III and document the reasoning.
2. Define the decision record schema before writing any logging code.
3. Enable Azure diagnostic settings and confirm content-filter annotations are flowing.
4. Instrument the orchestration layer with a stable correlation ID linking model call, business transaction, and human oversight.
5. Apply per-category retention and immutability; wire Purview governance.
6. Stand up dashboards and a documented review process so logs feed post-market monitoring.
7. Map the evidence onto ISO/IEC 42001 controls so one dataset serves regulator and certification auditor.

This dovetails with the broader timeline in our August 2026 readiness checklist, and the log itself becomes a primary input to your conformity assessment.
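The correlation-ID step in the checklist can be exercised before an audit does it for you. The sketch below is a hypothetical traceability test: given one decision ID and a flat list of events from any source, it groups the events by stage and reports which stages are missing. The stage names and event keys are assumptions chosen for illustration.

```python
REQUIRED_STAGES = ("model_call", "business_decision", "human_review")


def reconstruct(decision_id, events):
    """Group one decision's events by stage and list the stages not found."""
    found = {}
    for event in events:
        if event["decision_id"] == decision_id:
            found.setdefault(event["stage"], []).append(event)
    missing = [stage for stage in REQUIRED_STAGES if stage not in found]
    return {"decision_id": decision_id, "events": found, "missing": missing}


# Hypothetical events: d-1 never received a human review.
events = [
    {"decision_id": "d-1", "stage": "model_call", "model_version": "example-version"},
    {"decision_id": "d-1", "stage": "business_decision", "outcome": "declined"},
    {"decision_id": "d-2", "stage": "model_call", "model_version": "example-version"},
    {"decision_id": "d-2", "stage": "business_decision", "outcome": "approved"},
    {"decision_id": "d-2", "stage": "human_review", "reviewer_id": "u-7"},
]
print(reconstruct("d-1", events)["missing"])
print(reconstruct("d-2", events)["missing"])
```

Running a check like this against a sample of real decision IDs is a quick way to find out whether the three log sources actually share an identifier.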

ISO/IEC 42001 as the connective tissue

The AI Act tells you what to prove; ISO/IEC 42001 helps you operate the proving. As the recognised AI management-system standard, it provides roles, risk-treatment processes, monitoring requirements, and evidence handling. When we build governance for Azure OpenAI, we map AI Act logging duties directly onto 42001 controls so the same structured decision trail satisfies both the market surveillance authority under the Act and a certification auditor under the standard. The payoff is real: you build the logging once and amortise it across regulatory, certification, and internal-assurance needs rather than rebuilding evidence three times.

The bottom line

Audit logging is not the glamorous part of EU AI Act readiness, but it is the part that determines whether everything else holds up under scrutiny. Azure gives you a strong foundation; the decision trail that makes a high-risk system genuinely traceable is engineering you own. The August 2026 deadline is closer than most programmes assume, so that logging layer needs to be designed and tested now.

If you are standing up or hardening Azure OpenAI for a high-risk use case and want a logging and governance architecture that survives an audit, our AI and data platform engineering team has delivered exactly this for European enterprises. We are happy to review your current design.

FAQ

Does the EU AI Act actually require logging for Azure OpenAI systems?

The Act does not name Azure OpenAI, but it requires automatic recording of events (logs) over the lifetime of high-risk AI systems under Article 12, plus traceability and post-market monitoring. If your application is classified as high-risk under Annex III, the underlying Azure OpenAI calls fall inside that obligation. General-purpose model providers also face transparency and documentation duties, but the operational logging burden sits primarily with whoever builds and operates the high-risk application, which in practice means you, whether you act as its provider or its deployer.

What exactly has to be in an AI Act audit log?

At minimum you need to reconstruct what the system did and why: the period of each use, the input reference or prompt, the model and version invoked, the output returned, the human who reviewed or overrode it, and any safety or content-filter events. The standard is that a competent authority or your own auditors can trace a specific decision after the fact. Personal data in prompts must be handled under GDPR, so logs need retention limits and access controls.

When do these logging obligations actually start to apply?

High-risk obligations under Annex III, including the Article 12 logging and record-keeping duties, apply from 2 August 2026. GPAI obligations have applied since 2 August 2025, and GPAI models placed on the market before that date must comply by 2 August 2027. If you are deploying a high-risk Azure OpenAI application, you should treat August 2026 as a hard deadline for your logging architecture.

Can Azure-native services cover the requirement, or do I need third-party tooling?

Azure gives you most of the raw material: diagnostic settings, Azure Monitor, Log Analytics, content-filter annotations, and Microsoft Purview for data governance. What Azure does not give you out of the box is the application-level semantic log that ties a business decision to a specific model interaction and human reviewer. That layer you build yourself, usually as structured logging emitted by your orchestration code into Log Analytics or an immutable store.

How long do we have to keep AI Act logs?

The Act requires logs to be kept for a period appropriate to the intended purpose, and at least six months unless other law specifies otherwise. In regulated sectors such as finance under DORA or health, longer sector retention usually dominates. We recommend defining retention per log category, separating high-volume telemetry from the legally required decision trail so you are not paying to store everything for years.

How does ISO/IEC 42001 relate to AI Act logging?

ISO/IEC 42001 is the AI management-system standard. It does not replace the AI Act, but implementing it gives you the governance scaffolding — roles, risk controls, monitoring, and evidence — that makes Article 12 logging defensible. We typically map AI Act logging requirements onto ISO/IEC 42001 controls so a single evidence set serves both the regulator and the certification auditor.