Designing Enterprise MCP Servers: Security and Scope

The defining enterprise AI shift of 2026 is the move from pilots to production. With the general availability of Microsoft Agent Framework 1.0 on 3 April 2026 and Azure AI Foundry maturing as the central platform for building and governing agents, the question is no longer "can we build an agent?" but "can we run one safely against our real systems?" The Model Context Protocol (MCP) is where that question gets answered, because the MCP server is the boundary where an autonomous model meets your production data.

This post is about designing that boundary well. We have built and reviewed MCP servers for regulated European clients, and the failures we see are rarely about the protocol itself — they are about treating an MCP server as a developer toy rather than the security-critical infrastructure it actually is.

TL;DR / Key takeaways

An enterprise MCP server is a security boundary, not a convenience layer: authentication, authorization, and audit belong on the server, never in the prompt.
Prefer many small, domain-scoped servers over one monolith — least privilege and audit scope follow the server boundary.
The caller is a non-deterministic model, so validate every argument server-side and scope tokens to the acting human, not the agent.
High-impact tools (write, delete, pay, send) need allow-lists and human-in-the-loop approval, not model discretion.
MCP tool calls are exactly the traceable evidence NIS2 and the EU AI Act expect — design observability and logging in from day one.

What an MCP server actually is

The Model Context Protocol standardizes how an AI agent discovers and invokes capabilities. An MCP server exposes three primitives: tools (actions the model can call), resources (data the model can read), and prompts (reusable templates). Microsoft Agent Framework 1.0 ships MCP integration alongside the A2A protocol, so agents can both call your servers and delegate to one another.

What makes this enterprise-relevant — and risky — is that the caller is a language model. A REST API is invoked by deterministic code that a reviewer can read. An MCP tool is invoked because a model reasoned, in natural language, that it should be. That single difference reshapes every design decision below.

Scope: the most important design decision

Scope is where most MCP designs go wrong. The temptation is to build one powerful server that exposes everything, because it is convenient for the agent. Resist it.

We design MCP servers around bounded domains, each with its own identity, RBAC model, and blast radius. A monolith concentrates risk: a single prompt-injection success or a single over-broad token compromises everything. Domain-scoped servers keep least privilege enforceable and audits tractable.

Design dimension	Monolithic MCP server	Domain-scoped MCP servers
Blast radius	Entire system	One domain
Least privilege	Hard to enforce	Natural per server
Audit scope	Tangled	Per-domain, clean
Versioning / decommission	Risky, coupled	Independent
Identity model	One over-permissioned identity	One identity per server
Governance	Negotiation	Policy

Within a server, scope each tool to the narrowest useful operation. A tool called query_invoices(customer_id, date_range) is auditable and bounded; a tool called run_sql(query) is a liability that hands the model your database.
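As a rough illustration of the difference, here is a minimal sketch of a narrowly scoped invoice tool. The table layout, the column names, the `MAX_RANGE_DAYS` limit, and the use of SQLite are all assumptions made for the example, not details of any particular system. The point is only that the model supplies values, never SQL text.

```python
import sqlite3
from datetime import date

MAX_RANGE_DAYS = 366  # assumed upper bound on the queryable window


def query_invoices(conn: sqlite3.Connection, customer_id: int,
                   start: date, end: date) -> list:
    """Return invoices for one customer within a bounded date range."""
    if type(customer_id) is not int or customer_id <= 0:
        raise ValueError("customer_id must be a positive integer")
    if end < start or (end - start).days > MAX_RANGE_DAYS:
        raise ValueError("date range is empty or too wide")
    # Parameterized query: arguments are bound as values, not spliced into SQL.
    cursor = conn.execute(
        "SELECT id, customer_id, issued_on, amount_cents FROM invoices "
        "WHERE customer_id = ? AND issued_on BETWEEN ? AND ? "
        "ORDER BY issued_on",
        (customer_id, start.isoformat(), end.isoformat()),
    )
    return cursor.fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer_id INTEGER, "
        "issued_on TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO invoices (customer_id, issued_on, amount_cents) "
        "VALUES (?, ?, ?)",
        [(1, "2026-01-10", 12000), (1, "2026-03-02", 4500), (2, "2026-01-15", 9900)],
    )
    return conn
```

A string such as "1 OR 1=1" passed as `customer_id` is rejected before any query runs, which is exactly what a `run_sql(query)` tool cannot guarantee.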

Authentication and authorization

Two rules govern security here, and both are non-negotiable.

Authorization lives on the server. The model can request anything — that is its job. The server decides what executes. Never encode access rules in the system prompt; a prompt is a suggestion, not a control. This mirrors the broader patterns we cover in agent-to-agent A2A protocol design, where trust between components must be explicit rather than assumed.

Tokens are scoped to the human, not the agent. When an agent acts on behalf of a user, the MCP server should receive a token that carries that user's effective permissions, not the token of a broad service principal that can do everything for everyone. On Azure this means the OAuth 2.0 on-behalf-of flow with Microsoft Entra ID, so that a finance agent acting for a junior analyst cannot read what only the CFO may see.
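One way to picture the rule is a server-side helper that refuses any token that does not identify a human. The sketch below assumes the token signature, issuer, and audience have already been validated by a JWT library and that the claims arrive as a plain dict. The claim names `oid` and `scp` follow the convention of delegated access tokens in Entra ID, but treat them, the scope strings, and the helper names as assumptions to check against your own token configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActingUser:
    user_id: str
    scopes: frozenset


def resolve_acting_user(claims: dict) -> ActingUser:
    """Derive the acting human from already-validated token claims."""
    user_id = claims.get("oid")
    scope_claim = claims.get("scp", "")
    if not user_id or not scope_claim:
        # No user identity or no delegated scopes: treat the token as
        # app-only and refuse it rather than fall back to broad access.
        raise PermissionError("delegated user token required")
    return ActingUser(user_id=user_id, scopes=frozenset(scope_claim.split()))


def require_scope(user: ActingUser, scope: str) -> None:
    """Raise unless the acting human holds the scope a tool needs."""
    if scope not in user.scopes:
        raise PermissionError(f"user {user.user_id} lacks scope {scope}")
```

With this shape, a junior analyst's token carrying only a read scope for invoices fails `require_scope` for any tool that needs a broader one, regardless of what the agent asked for.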

A practical checklist for each tool:

Authenticate the caller and resolve the acting human identity.
Authorize the specific tool against that human's effective permissions.
Validate every argument server-side: types, ranges, allow-lists, and injection-safe handling.
Classify the tool's impact (read / write / irreversible / external).
Gate high-impact tools behind explicit allow-lists and, where warranted, human approval.
Log the request, the model's stated intent, the arguments, and the result.
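The six steps above can be sketched as a single dispatch function. This is a minimal illustration under stated assumptions, not a reference implementation: the `Tool` and `Impact` types, the `call_tool` signature, the logger name, and the simplification that every non-read tool needs an `approved` flag are all hypothetical. Authentication is assumed to have happened upstream, so the function receives an already-resolved user ID and permission set.

```python
import json
import logging
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable

audit_log = logging.getLogger("mcp.audit")  # hypothetical logger name


class Impact(Enum):
    READ = "read"
    WRITE = "write"
    IRREVERSIBLE = "irreversible"
    EXTERNAL = "external"


@dataclass(frozen=True)
class Tool:
    name: str
    permission: str                    # permission the acting human must hold
    impact: Impact
    validate: Callable[[dict], dict]   # returns cleaned args or raises ValueError
    handler: Callable[[dict], Any]


def call_tool(tool: Tool, user_id: str, permissions: set, args: dict,
              intent: str, approved: bool = False) -> Any:
    record = {"tool": tool.name, "user": user_id, "intent": intent,
              "impact": tool.impact.value, "args": args, "outcome": "error"}
    try:
        if not user_id:                                      # 1. authenticate
            raise PermissionError("no acting human identity")
        if tool.permission not in permissions:               # 2. authorize
            raise PermissionError("permission missing")
        clean = tool.validate(args)                          # 3. validate
        if tool.impact is not Impact.READ and not approved:  # 4-5. classify, gate
            raise PermissionError("human approval required")
        result = tool.handler(clean)
        record.update(outcome="ok", result=result)
        return result
    except (PermissionError, ValueError) as exc:
        record["outcome"] = f"rejected: {exc}"
        raise
    finally:
        audit_log.info(json.dumps(record, default=str))      # 6. log


def validate_customer_args(args: dict) -> dict:
    """Example validator: exactly one integer customer_id, nothing else."""
    if set(args) != {"customer_id"} or type(args["customer_id"]) is not int:
        raise ValueError("expected exactly one integer customer_id")
    return args
```

The audit record is written in the `finally` block so that rejected calls are logged as faithfully as successful ones. In a real deployment the `approved` flag would come from an approval workflow rather than a function argument.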

Defending against the model itself

Prompt injection is the attack that has no equivalent in traditional APIs. A document the agent reads, an email it summarizes, or a record it queries can contain instructions that hijack the model into calling tools maliciously. You cannot solve this in the model. You contain it at the server.

The controls that hold up in practice:

Impact tiering. Read-only tools can be exposed more liberally. Write, delete, payment, and external-send tools require allow-lists, rate limits, and approval. The model never gets unmediated access to irreversible actions.
Argument validation as a security control. Treat every tool argument as hostile input. Reject anything outside a known schema. This is your strongest single defense against injected payloads.
Human-in-the-loop for the dangerous tier. A confirmation step on a payment or a mass-delete is not friction; it is the difference between a contained incident and a headline.
Output filtering. Strip secrets and PII from tool responses before they re-enter the model context, so a compromised prompt cannot exfiltrate what it should not see.
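To make the output-filtering control concrete, here is a small sketch of a redaction pass applied to a tool response before it is returned to the model. The key names in `SENSITIVE_KEYS` and the two regular expressions are illustrative assumptions. A production filter would need patterns tuned to the data it actually handles, and pattern matching alone will not catch every form of PII.

```python
import re
from typing import Any

SENSITIVE_KEYS = {"password", "api_key", "secret", "token"}  # assumed key names
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IBAN = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")  # rough shape check only


def redact(value: Any) -> Any:
    """Recursively mask secrets and simple PII patterns in a tool response."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if str(k).lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        return IBAN.sub("[IBAN]", EMAIL.sub("[EMAIL]", value))
    return value  # numbers, booleans, None pass through unchanged
```

Running the filter on the server keeps the decision out of the model's hands: a hijacked prompt can ask for the raw record, but it only ever sees the masked one.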

Observability and cost governance

You cannot govern what you cannot see. An MCP server in production needs end-to-end tracing that links the user request, the agent's reasoning, each tool call, its arguments, and its result. We instrument with OpenTelemetry and route into Azure AI Foundry and the client's SIEM — the same discipline we detail in AI agent observability and tracing.

Capture, per tool: latency, error rate, and cost. Agents can loop, retry, and fan out in ways that quietly multiply token spend, so cost-per-tool and cost-per-conversation belong on a dashboard with budget alerts. Equally important, alert on anomalous patterns — a sudden surge in write operations or an agent calling a tool it has never used is an early signal of misbehavior or compromise. For the broader runtime picture, see how this fits into Microsoft Agent Framework 1.0 architecture.
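As a sketch of the two alerts described here, the monitor below tracks spend per conversation against a budget and flags any tool an agent has not used during a baseline period. The class name, the alert strings, the cost units, and the idea of a pre-computed baseline are assumptions for illustration. In practice these signals would more likely be derived from trace data than from an in-process object.

```python
from collections import defaultdict


class UsageMonitor:
    """Tracks spend per conversation and flags tools outside an agent's baseline."""

    def __init__(self, budget_per_conversation: float, baseline: dict):
        self.budget = budget_per_conversation
        self.baseline = baseline          # agent_id -> set of tools seen so far
        self.spend = defaultdict(float)   # conversation_id -> accumulated cost

    def record(self, agent_id: str, conversation_id: str,
               tool: str, cost: float) -> list:
        """Record one tool call and return any alerts it triggers."""
        alerts = []
        if tool not in self.baseline.get(agent_id, set()):
            alerts.append(f"new-tool:{agent_id}:{tool}")
        self.spend[conversation_id] += cost
        if self.spend[conversation_id] > self.budget:
            alerts.append(f"over-budget:{conversation_id}")
        return alerts
```

A retry loop that calls the same tool repeatedly shows up here as an over-budget alert on a single conversation, well before it shows up on an invoice.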

Governance and EU regulatory alignment

For European enterprises, MCP server design is not only an engineering matter — it is a compliance one.

NIS2. If your agents touch services you operate as an essential or important entity, the MCP server falls within your risk-management measures (Risikomanagementmaßnahmen), and accountability sits with senior management (Geschäftsleitung). Tool-call logs are part of your evidence of control.
EU AI Act. Where the agent system is high-risk, the MCP server contributes to the logging, human-oversight, and technical-documentation obligations you must demonstrate in a conformity assessment (Konformitätsbewertung). Auditors expect traceable records (Nachweispflichten) — and MCP tool calls are exactly that, if you designed them in.
Supply chain (Lieferkette). Third-party MCP servers are dependencies. Vet them, pin versions, and review their tool surface as you would any vendor component.

A pragmatic governance baseline:

Maintain a registry of every MCP server, its owner, its tools, and their impact tier.
Require a review gate before any new write-capable tool ships.
Enforce retention on tool-call logs that meets your audit and incident-reporting obligations.
Run periodic access reviews on the scoped identities each server uses.
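The first two items of this baseline lend themselves to a simple data model. The following sketch assumes a registry kept as plain Python objects and a `reviewed_by` field that records who signed off on a tool. The field names, tier labels, and the example entries are hypothetical. The same check could run in CI against a YAML or database-backed registry.

```python
from dataclasses import dataclass, field
from typing import Optional

WRITE_TIERS = {"write", "irreversible", "external"}  # assumed tier labels


@dataclass(frozen=True)
class ToolEntry:
    name: str
    impact: str                        # "read" or one of WRITE_TIERS
    reviewed_by: Optional[str] = None  # reviewer who approved the tool, if any


@dataclass
class ServerEntry:
    name: str
    owner: str
    tools: list = field(default_factory=list)


def unreviewed_write_tools(registry: list) -> list:
    """List write-capable tools that have not passed the review gate."""
    return [
        f"{server.name}.{tool.name}"
        for server in registry
        for tool in server.tools
        if tool.impact in WRITE_TIERS and not tool.reviewed_by
    ]


registry = [
    ServerEntry("finance", "finance-platform-team", [
        ToolEntry("query_invoices", "read"),
        ToolEntry("pay_invoice", "irreversible"),
        ToolEntry("update_invoice", "write", reviewed_by="security-review"),
    ]),
]
```

For the example registry, `unreviewed_write_tools(registry)` returns `["finance.pay_invoice"]`, which is the kind of result a deployment pipeline could treat as a blocking failure.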

Where to start

If you are moving from pilot to production, start with one bounded domain. Build a single domain-scoped server, get authentication, server-side authorization, argument validation, and tracing right, then template that pattern across domains. The discipline you establish on the first server is what makes the tenth one safe.

FAQ

What is an MCP server in an enterprise context? An MCP (Model Context Protocol) server exposes tools, resources, and prompts to AI agents through a standardized interface. In an enterprise context it sits between an agent runtime and your internal systems — databases, APIs, ticketing, ERP — and becomes the controlled boundary where authentication, authorization, scoping, and audit logging are enforced. Treat it as production infrastructure, not a developer convenience.

How is MCP security different from securing a normal REST API? A REST API is called by deterministic code; an MCP server is called by a non-deterministic model that chooses tools based on natural-language reasoning. That means you cannot rely on the caller to behave predictably. You must enforce least-privilege per tool, validate every argument server-side, scope tokens to the acting user rather than the agent, and log the model's intent alongside the action. Prompt injection turns tool exposure into an attack surface that a normal API does not have.

Should each team build one big MCP server or many small ones? Prefer many small, domain-scoped servers over one monolith. A finance MCP server, an HR MCP server, and an observability MCP server each carry their own identity, RBAC, and blast radius. This keeps least-privilege enforceable, makes audits tractable, and lets you decommission or version a capability without touching unrelated tools. A monolithic server concentrates risk and makes governance a negotiation rather than a policy.

How do you stop an agent from calling tools it should not? Authorization happens on the server, never in the prompt. Each tool checks the caller's scoped token and the effective permissions of the human on whose behalf the agent acts. High-impact tools (write, delete, payment, external send) require explicit allow-lists and, where appropriate, human-in-the-loop approval. The model can request anything; the server decides what actually executes.

How does MCP server design map to EU regulation like NIS2 and the EU AI Act? An MCP server that touches important or essential services falls under your NIS2 risk-management measures and the accountability of senior management (Geschäftsleitung). If the agent system is high-risk under the EU AI Act, the server is part of the logging, human-oversight, and technical-documentation evidence you must produce in a conformity assessment. Tool calls are exactly the kind of traceable record auditors expect, so design audit logging in from day one.

What does observability look like for MCP servers in production? You need end-to-end tracing that links the user request, the agent's reasoning, each tool invocation, the arguments passed, and the result — typically through OpenTelemetry into Azure AI Foundry or your existing SIEM. Capture latency, error rates, and cost per tool, and alert on anomalous call patterns such as a sudden spike in write operations. Without this, a misbehaving agent is invisible until it causes damage.

Conceptualise designs secure, governed MCP servers and agent platforms for European enterprises moving from pilot to production. If you are putting agents in front of your real systems, see our AI & Data Platform Engineering services or reach us at mbrahim@conceptualise.de.
