Cybersecurity · 7 min read

Building a Modern SOC with Microsoft Sentinel: Architecture and Playbooks

How to architect a modern SOC with Microsoft Sentinel — data connectors, KQL analytics rules, SOAR automation, cost control, and alert fatigue reduction.

Building a Security Operations Centre (SOC) used to mean seven-figure investments in on-premises hardware, a team of 20, and months of integration work. Microsoft Sentinel changes that equation dramatically — but only if you architect it correctly. A poorly designed Sentinel deployment leads to runaway costs, alert fatigue, and a false sense of security. This guide covers how to get it right.

Architecture Decisions That Matter

Workspace Design

Your Log Analytics workspace topology is the most consequential architectural decision. Get it wrong and you will spend months restructuring.

Recommended approach for most enterprises:

  • Single workspace for all security data — Sentinel works best with correlated data in one place
  • Use Azure Lighthouse or multi-workspace queries if you must support multiple tenants (MSSP scenarios)
  • Separate operational workspaces (for non-security IT operations) from the Sentinel workspace to control cost and access
  • Implement workspace-level RBAC and table-level RBAC for data access segregation

Avoid the common mistake of creating one workspace per subscription or per team. This fragments your security data and makes cross-correlation nearly impossible.
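If you do inherit multiple workspaces, or operate as an MSSP, cross-workspace queries let you correlate centrally without consolidating data. A minimal sketch, with hypothetical workspace names:

```kql
// Correlate failed sign-ins across two tenant workspaces (names are hypothetical)
union
    (workspace("customer-a-sentinel").SigninLogs),
    (workspace("customer-b-sentinel").SigninLogs)
| where TimeGenerated > ago(1d)
| where ResultType != "0"  // failed sign-ins only
| summarize FailedSignins = count() by UserPrincipalName, IPAddress
| where FailedSignins > 20
```

Cross-workspace queries carry a performance cost and a workspace-count limit, which is another reason a single consolidated workspace remains the default recommendation.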

Data Connector Strategy

Not all data is equally valuable. The key is to ingest what matters for detection and investigation while controlling cost.

Tier 1 — Essential (enable immediately):

  • Microsoft Entra ID sign-in and audit logs
  • Microsoft Defender XDR incidents and raw alerts
  • Microsoft Defender for Cloud security alerts
  • Azure Activity logs
  • Office 365 audit logs (Exchange, SharePoint, Teams)

Tier 2 — High value (enable within 30 days):

  • Microsoft Defender for Endpoint raw events (DeviceProcessEvents, DeviceNetworkEvents)
  • Azure Firewall and NSG Flow Logs (for network threat detection)
  • DNS logs (critical for C2 detection)
  • Azure Key Vault audit logs

Tier 3 — Contextual (enable as detection matures):

  • Syslog/CEF from on-premises firewalls and network devices
  • Threat Intelligence connectors (TAXII feeds, Microsoft TI)
  • AWS CloudTrail or GCP audit logs (for multi-cloud environments)
  • Custom application logs via the Log Analytics agent or Data Collection Rules

Cost principle: Ingest data that you will actively use for detection rules or investigation. If you are ingesting data with no analytics rules or hunting queries referencing it, you are paying for storage, not security.

Building Effective Analytics Rules

Analytics rules are the heart of Sentinel. They transform raw log data into actionable alerts. The difference between a useful SOC and an overwhelmed one comes down to rule quality.

Rule Types and When to Use Them

  • Scheduled rules: Run KQL queries on a defined interval. Use for custom detections specific to your environment.
  • Microsoft incident creation rules: Automatically create Sentinel incidents from Defender XDR, Defender for Cloud, or other Microsoft security products. Use as your baseline — these leverage Microsoft's detection engineering.
  • Fusion rules: ML-based multi-stage attack detection. Enable the built-in Fusion rule — it correlates signals across data sources automatically.
  • Near-real-time (NRT) rules: Run every minute for critical detections that cannot wait for scheduled intervals.

Writing KQL That Works

Good detection rules are specific, tested, and tuned. Here are patterns that work in production:

Impossible travel detection (custom, more flexible than built-in):

```kql
SigninLogs
| where ResultType == "0"  // successful sign-ins (ResultType is a string column)
| summarize Locations = make_set(Location),
            IPs = make_set(IPAddress),
            MinTime = min(TimeGenerated),
            MaxTime = max(TimeGenerated)
  by UserPrincipalName, bin(TimeGenerated, 1h)
| where array_length(Locations) > 1  // multiple distinct locations in the same hour
| extend TimeDiffMinutes = datetime_diff('minute', MaxTime, MinTime)
| where TimeDiffMinutes < 60
```

Anomalous process execution on servers:

```kql
DeviceProcessEvents
| where Timestamp > ago(1h)
// 'has_any' matches whole terms only, so "srv" would miss "srv01" — use substring matching
| where DeviceName contains "srv" or DeviceName contains "server" or DeviceName startswith "dc"
| where FileName !in~ ("svchost.exe", "services.exe", "lsass.exe", "csrss.exe")
| summarize ExecutionCount = count() by FileName, DeviceName
| join kind=leftanti (
    DeviceProcessEvents
    | where Timestamp between (ago(30d) .. ago(1d))  // 30-day baseline of known processes
    | summarize by FileName, DeviceName
) on FileName, DeviceName
| where ExecutionCount < 5
```

Sensitive Azure role assignments:

```kql
AzureActivity
| where OperationNameValue =~ "Microsoft.Authorization/roleAssignments/write"
| where ActivityStatusValue =~ "Success"
// The request body carries a roleDefinitionId GUID, not the role's display name,
// so match against the well-known built-in role definition IDs
| extend RequestBody = tostring(parse_json(Properties).requestbody)
| where RequestBody has_any (
    "8e3af657-a8ff-443c-a75c-2fe8c4bcb635",   // Owner
    "b24988ac-6180-42a0-ab88-20f7382dd24c",   // Contributor
    "18d7d88d-d35e-4fb5-a5c3-7773c20a72d9")   // User Access Administrator
| project TimeGenerated, Caller, ResourceGroup, RequestBody
```

Tuning: The Ongoing Discipline

Every analytics rule should have:

  • A documented threshold that was tuned against at least two weeks of baseline data
  • Entity mapping (account, host, IP) so incidents can be enriched and correlated
  • A severity that reflects business impact, not how technically interesting the detection is
  • Suppression configured to prevent duplicate incidents for the same event

Review your analytics rules monthly. If a rule generates more than 50 alerts per week and fewer than 5% result in true positive investigations, it needs tuning or removal.
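That monthly review can be driven by data rather than memory. A sketch against the SecurityIncident table, using thresholds that mirror the guidance above (roughly 50 alerts per week over 30 days, under 5% true positives):

```kql
SecurityIncident
| where TimeGenerated > ago(30d)
| summarize arg_max(TimeGenerated, *) by IncidentNumber   // latest state of each incident
| summarize Incidents = count(),
            TruePositives = countif(Classification == "TruePositive")
  by Title
| extend TruePositiveRate = round(100.0 * TruePositives / Incidents, 1)
| where Incidents > 200 and TruePositiveRate < 5
| sort by Incidents desc
```

Rules surfaced by this query are your tuning backlog: tighten thresholds, add watchlist exclusions, or retire them.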

SOAR Automation: Playbooks That Save Hours

Automation is what separates a sustainable SOC from a burnout factory. Sentinel integrates with Azure Logic Apps for playbook automation.

High-Impact Playbooks to Implement First

1. Automated enrichment on incident creation:

  • Look up IP reputation via VirusTotal or AbuseIPDB
  • Query Microsoft Graph for user details (department, manager, recent sign-ins)
  • Add enrichment as comments to the incident
  • ROI: Saves 5–10 minutes per incident on manual lookups

2. Automated response to confirmed compromised account:

  • Disable the user account in Entra ID
  • Revoke all refresh tokens
  • Block the user's IP via Conditional Access named location
  • Notify the user's manager via email
  • Create a ServiceNow ticket
  • ROI: Reduces response time from 30+ minutes to under 60 seconds

3. Automated triage for low-severity alerts:

  • Check if the alert entity (IP, user, host) has been seen in previous closed-as-benign incidents
  • If yes, auto-close with a comment referencing the prior investigation
  • If no, escalate to analyst queue
  • ROI: Reduces alert volume by 20–40% for mature environments
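The prior-investigation lookup in playbook 3 can be expressed as a KQL query the playbook runs against Sentinel, sketched here with a hypothetical IP entity taken from the new incident:

```kql
let SuspectIP = "203.0.113.50";  // hypothetical entity extracted from the new incident
SecurityIncident
| where TimeGenerated > ago(90d)
| where Status == "Closed" and Classification == "BenignPositive"
| mv-expand AlertId = AlertIds to typeof(string)
| join kind=inner (
    SecurityAlert
    | where Entities has SuspectIP   // Entities is a JSON string; 'has' does a term match
) on $left.AlertId == $right.SystemAlertId
| project IncidentNumber, Title, ClosedTime
```

If the query returns rows, the playbook closes the new incident with a comment linking the prior incident numbers; otherwise it routes to the analyst queue.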

4. Threat intelligence matching:

  • When a new IOC is received from TI feed, retroactively search Sentinel logs
  • If matches are found, create an incident with full context
  • ROI: Catches threats that entered the environment before the IOC was published
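Playbook 4's retroactive search can be sketched against the classic ThreatIntelligenceIndicator table (newer workspaces may expose indicators under a different table name) joined to firewall traffic in CommonSecurityLog:

```kql
ThreatIntelligenceIndicator
| where TimeGenerated > ago(1d)              // indicators received in the last day
| where Active == true and isnotempty(NetworkIP)
| join kind=inner (
    CommonSecurityLog
    | where TimeGenerated > ago(30d)         // retro-hunt window
) on $left.NetworkIP == $right.DestinationIP
| project IndicatorTime = TimeGenerated, NetworkIP, ThreatType,
          Computer, SourceIP, DestinationIP
```

Any match means the IOC was contacted before it was published — exactly the case a scheduled rule on fresh logs would miss.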

Cost Optimisation

Sentinel costs are driven by data ingestion volume. Here is how to control them without sacrificing detection capability:

  • Use Basic Logs for high-volume, low-security-value tables (e.g., NetFlow data, verbose application logs). Basic Logs cost significantly less but support limited KQL and no analytics rules.
  • Configure Data Collection Rules (DCRs) to filter data at ingestion — drop fields you never query, transform verbose logs into concise records
  • Set retention policies per table: 90 days interactive for security tables, 30 days for operational tables, archive to cold storage for compliance requirements
  • Use the Commitment Tier pricing if your daily ingestion is predictable — 100 GB/day commitment tier saves ~30% vs. pay-as-you-go
  • Monitor ingestion via the Usage table: Usage | summarize sum(Quantity) by DataType | sort by sum_Quantity desc — know exactly which data sources drive your costs
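The inline Usage query above can be extended to billable volume over a full month, which is the view that maps directly to your bill:

```kql
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| summarize IngestedGB = round(sum(Quantity) / 1024, 2) by DataType  // Quantity is in MB
| sort by IngestedGB desc
```

Run this monthly and compare the top tables against the analytics rules that actually reference them.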

Rule of thumb: Your Sentinel cost should be proportional to the security value you extract. If you are spending 60% of your budget ingesting firewall traffic logs but only 10% of your detection rules reference them, restructure.

Combating Alert Fatigue

Alert fatigue is the number one SOC killer. When analysts are overwhelmed, they start ignoring alerts — including real threats.

Strategies that work:

  • Incident grouping: Configure analytics rules to group related alerts into a single incident (by entity, by time window)
  • Automation rules: Auto-close known false positives, auto-assign incidents based on MITRE ATT&CK tactic, auto-set severity based on asset criticality
  • Watchlists: Maintain exception lists (known scanner IPs, service accounts, maintenance windows) and reference them in your KQL queries to exclude noise at the detection layer
  • Severity discipline: Reserve "High" severity for alerts that require human investigation within 4 hours. If everything is high, nothing is.
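The watchlist approach looks like this inside a detection query, assuming a hypothetical watchlist named KnownScanners whose SearchKey column holds IP addresses:

```kql
let KnownScanners = _GetWatchlist('KnownScanners') | project IPAddress = tostring(SearchKey);
SigninLogs
| where ResultType != "0"                 // failed sign-ins
| where IPAddress !in (KnownScanners)     // exclude known scanner noise at the detection layer
| summarize Failures = count() by UserPrincipalName, IPAddress
| where Failures > 10
```

Because the exclusion lives in a watchlist rather than hard-coded in the query, analysts can maintain it without editing the analytics rule.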

Measuring SOC Effectiveness

Track these metrics to understand whether your Sentinel deployment is delivering value:

  • Mean time to detect (MTTD): Time from threat occurrence to alert creation
  • Mean time to respond (MTTR): Time from alert creation to containment action
  • True positive rate: Percentage of incidents that result in genuine security action
  • Automation rate: Percentage of incidents handled entirely by playbooks
  • Coverage: Percentage of MITRE ATT&CK techniques covered by your analytics rules
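Several of these metrics can be computed directly from the SecurityIncident table. A sketch for MTTR:

```kql
SecurityIncident
| where TimeGenerated > ago(30d)
| summarize arg_max(TimeGenerated, *) by IncidentNumber   // final state of each incident
| where Status == "Closed"
| extend MinutesToClose = datetime_diff('minute', ClosedTime, CreatedTime)
| summarize MTTRMinutes = round(avg(MinutesToClose), 0), ClosedIncidents = count()
```

Note this measures alert-to-closure, a proxy for alert-to-containment; MTTD additionally requires knowing when the threat actually occurred, which usually comes from incident post-mortems.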

Final Thought

Microsoft Sentinel gives you enterprise-grade SIEM/SOAR without the infrastructure overhead of traditional solutions. But the technology is only as good as the architecture, detection logic, and operational processes you build around it. Start with high-value data sources, write precise analytics rules, automate aggressively, and treat alert tuning as a continuous discipline — not a one-time setup task.

Ready to build or optimise your Sentinel deployment? Contact our team — we design and operate SOC environments for enterprises that need real security outcomes, not just dashboards.

Microsoft Sentinel · SIEM/SOAR · SOC architecture · KQL analytics rules · security automation playbooks
