Incident Response Playbook for Azure Environments: From Detection to Recovery
A comprehensive six-phase incident response playbook for Azure environments with Sentinel detection rules, containment runbooks, and recovery procedures.
A security incident in a cloud environment moves faster than on-premises. An attacker who compromises a service principal can traverse your entire Azure estate in minutes. Your incident response playbook must account for this speed, leverage cloud-native tooling, and provide clear, executable runbooks that your team can follow under pressure.
This playbook follows the NIST six-phase IR model, adapted for Azure-specific tooling and cloud-native attack patterns.
Phase 1: Preparation
Preparation is the only phase you control before an incident occurs. Every hour invested here saves ten during an active incident.
Tooling Baseline
Ensure these services are deployed and configured before you need them:
Detection and alerting:
- Microsoft Sentinel with connectors for Entra ID, Azure Activity, Defender for Cloud, Microsoft 365, and all critical application logs
- Defender for Cloud with all Defender plans enabled (Servers, Containers, SQL, Storage, Key Vault, DNS, Resource Manager)
- Azure Monitor alerts for resource health and availability
Investigation:
- Log Analytics workspace with 90-day hot retention and 365-day archive
- Sentinel Notebooks configured with pre-built investigation templates
- Network Watcher with NSG flow logs, packet capture enabled in key regions
Response automation:
- Sentinel Playbooks (Logic Apps) for common containment actions
- Azure Automation runbooks for infrastructure-level responses
- Pre-approved change tickets for emergency containment actions
Roles and Escalation Matrix
Define these roles before an incident:
| Role | Responsibility | On-call coverage |
|---|---|---|
| IR Lead | Coordinates response, makes containment decisions | 24/7 rotation |
| Security Analyst | Investigates alerts, performs forensic analysis | 24/7 rotation |
| Platform Engineer | Executes infrastructure containment and recovery | Business hours + on-call |
| Communications Lead | Manages stakeholder and regulatory notifications | Business hours + on-call |
| Legal/Compliance | Advises on regulatory obligations, evidence preservation | On-call |
Communication Templates
Prepare templates before you need them:
Internal escalation template:
```
SECURITY INCIDENT — [SEVERITY]
Time detected: [TIMESTAMP]
Affected systems: [SYSTEMS]
Current impact: [DESCRIPTION]
Containment status: [IN PROGRESS / CONTAINED / NOT CONTAINED]
Next update: [TIME]
IR Lead: [NAME / CONTACT]
```
Regulatory notification template (for DORA, NIS2, or GDPR):
```
INITIAL NOTIFICATION — ICT-RELATED INCIDENT
Entity: [LEGAL ENTITY NAME]
Competent Authority: [AUTHORITY]
Date/time of detection: [TIMESTAMP]
Date/time of classification: [TIMESTAMP]
Description: [BRIEF DESCRIPTION]
Affected services: [SERVICES]
Estimated client impact: [NUMBER / DESCRIPTION]
Actions taken: [SUMMARY]
```
Pre-Approved Containment Actions
Get these approved by management in advance so your team can execute without waiting for authorization during a live incident:
- Disable a compromised user account
- Revoke all refresh tokens for a user or service principal
- Isolate a VM by applying a deny-all NSG
- Disable a compromised service principal
- Block an IP address in Azure Firewall
- Rotate secrets in Key Vault
- Disable external access to a storage account
Document the approval in your IR policy with the conditions under which each action can be taken without further authorization.
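The pre-approved list above lends itself to a simple authorisation gate in automation: before a playbook executes a containment step, it checks whether the action is pre-approved at the current incident severity. A minimal Python sketch; the action names and per-action severity thresholds are illustrative assumptions, not part of the playbook itself:

```python
# Hypothetical encoding of the pre-approved containment actions. The value is
# the minimum incident severity at which the action may run without further
# authorisation (thresholds here are illustrative, set them per your IR policy).
PRE_APPROVED = {
    "disable_user": "Medium",
    "revoke_refresh_tokens": "Medium",
    "isolate_vm": "High",
    "disable_service_principal": "High",
    "block_ip": "Medium",
    "rotate_secrets": "High",
    "lock_storage_account": "High",
}

SEVERITY_ORDER = ["Low", "Medium", "High", "Critical"]

def is_preapproved(action: str, incident_severity: str) -> bool:
    """Return True if the action may execute without explicit sign-off."""
    threshold = PRE_APPROVED.get(action)
    if threshold is None:
        return False  # unknown actions always require explicit authorisation
    return SEVERITY_ORDER.index(incident_severity) >= SEVERITY_ORDER.index(threshold)
```

A Sentinel playbook can call a gate like this first and fall back to a human-approval step when it returns False.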
Phase 2: Identification
Identification is about confirming that a security event is a genuine incident, determining its scope, and classifying its severity.
Detection Sources in Azure
Prioritise these alert sources by fidelity:
High fidelity (investigate immediately):
- Defender for Cloud high-severity alerts
- Sentinel fusion alerts (multi-stage attack detection)
- Sentinel analytics rules matching known attack patterns
- Defender for Identity alerts (lateral movement, credential theft)
Medium fidelity (investigate within SLA):
- Defender for Cloud medium-severity alerts
- Custom Sentinel analytics rules
- Entra ID Protection risky sign-in detections (medium risk or higher)
Lower fidelity (triage and correlate):
- Defender for Cloud informational/low alerts
- Azure Activity Log anomalies
- Custom log queries surfacing unusual patterns
Initial Triage Runbook
When an alert fires, the on-call analyst should follow this sequence:
- Read the alert details. Open the Sentinel incident. Review the entities (users, IPs, hosts, resources), the timeline, and the mapped MITRE ATT&CK techniques.
- Check for related incidents. Use Sentinel's investigation graph to identify connected entities. A single compromised user may have triggered alerts across multiple resources.
- Assess blast radius. Query Log Analytics for all activity by the affected entities in the last 24 hours. The three tables name their actor and source-IP columns differently (`Identity`/`Caller`, `CallerIpAddress`/`IPAddress`), so normalise them before filtering:

```kusto
union SigninLogs, AuditLogs, AzureActivity
| where TimeGenerated > ago(24h)
| extend Account = coalesce(Identity, Caller),
         SourceIp = coalesce(CallerIpAddress, IPAddress)
| where Account == "compromised-user@domain.com"
    or SourceIp == "suspicious-ip"
| project TimeGenerated, OperationName, Account, SourceIp, ResourceGroup
| sort by TimeGenerated asc
```

- Classify severity. Use your organisation's severity matrix. A reasonable starting point:
| Severity | Criteria |
|---|---|
| Critical | Active data exfiltration, ransomware execution, compromise of admin accounts |
| High | Confirmed compromise of user accounts, lateral movement detected, critical system affected |
| Medium | Suspicious activity confirmed malicious but contained to non-critical systems |
| Low | Policy violation, minor misconfiguration exploited, no data impact |
- Declare incident. If severity is Medium or above, formally declare an incident and begin Phase 3.
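The severity matrix above can double as a first-pass classifier in triage tooling, so that intake automation applies the same criteria every time. A Python sketch; the indicator names are hypothetical, and real classification still requires analyst judgement:

```python
# Illustrative first-pass severity classifier mirroring the matrix above.
# Indicator names are made up for this sketch; map your own alert taxonomy
# onto them before using anything like this in triage automation.
def classify_severity(indicators: set) -> str:
    critical = {"data_exfiltration", "ransomware_execution", "admin_compromise"}
    high = {"user_compromise", "lateral_movement", "critical_system_affected"}
    medium = {"malicious_activity_contained"}
    if indicators & critical:
        return "Critical"   # active exfiltration, ransomware, admin takeover
    if indicators & high:
        return "High"       # confirmed compromise or lateral movement
    if indicators & medium:
        return "Medium"     # malicious but contained to non-critical systems
    return "Low"            # policy violation or minor misconfiguration
```

Because the checks run top-down, an incident matching both a Critical and a High indicator is classified Critical, which matches how the matrix is meant to be read.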
Phase 3: Containment
Containment must be fast and decisive. In cloud environments, containment is primarily achieved through identity controls and network segmentation.
Short-Term Containment Actions
Execute these immediately to stop the bleeding:
Compromised user account:
```powershell
# The legacy AzureAD module is deprecated; these use Microsoft Graph PowerShell.
# Disable the account
Update-MgUser -UserId "user@domain.com" -AccountEnabled:$false
# Revoke all refresh tokens and sessions
Revoke-MgUserSignInSession -UserId "user@domain.com"
# Reset the password (generates a new credential)
Update-MgUser -UserId "user@domain.com" -PasswordProfile @{
    Password = "TempP@ss$(Get-Random)"
    ForceChangePasswordNextSignIn = $true
}
```
Compromised service principal:
```powershell
# Remove all credentials from the service principal
$sp = Get-AzADServicePrincipal -DisplayName "compromised-sp"
Get-AzADSpCredential -ObjectId $sp.Id | ForEach-Object {
    Remove-AzADSpCredential -ObjectId $sp.Id -KeyId $_.KeyId
}
# Disable the service principal so it can no longer sign in
Update-MgServicePrincipal -ServicePrincipalId $sp.Id -AccountEnabled:$false
```
Compromised VM:
```powershell
# Apply a deny-all NSG to isolate the VM (preserves evidence on the machine)
$rules = @(
    New-AzNetworkSecurityRuleConfig -Name "DenyAllInbound" -Priority 100 `
        -Direction Inbound -Access Deny -Protocol * `
        -SourceAddressPrefix * -SourcePortRange * `
        -DestinationAddressPrefix * -DestinationPortRange *
    New-AzNetworkSecurityRuleConfig -Name "DenyAllOutbound" -Priority 100 `
        -Direction Outbound -Access Deny -Protocol * `
        -SourceAddressPrefix * -SourcePortRange * `
        -DestinationAddressPrefix * -DestinationPortRange *
)
$nsg = New-AzNetworkSecurityGroup -Name "IR-Quarantine-NSG" `
    -ResourceGroupName "IR-RG" -Location "westeurope" `
    -SecurityRules $rules
# Associate the quarantine NSG with the VM's NIC
$nic = Get-AzNetworkInterface -Name "compromised-vm-nic" -ResourceGroupName "IR-RG"
$nic.NetworkSecurityGroup = $nsg
Set-AzNetworkInterface -NetworkInterface $nic
```
Evidence Preservation
Before eradication, preserve evidence:
- Snapshot VM disks before any remediation. Create snapshots of OS and data disks and store them in a separate, locked-down resource group.
- Export Sentinel incident with all related events, entities, and timeline.
- Capture NSG flow logs for the timeframe of interest.
- Export Azure Activity Logs for the affected subscriptions.
- Preserve Key Vault audit logs if secrets may have been accessed.
Use an immutable storage account (with legal hold) for evidence storage. This ensures chain of custody for potential legal proceedings.
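Chain of custody is easier to defend if every evidence file is hashed at collection time, before it is uploaded to the immutable store. A Python sketch using only the standard library; the manifest format is illustrative:

```python
# Sketch: build a SHA-256 manifest of collected evidence files to support
# chain of custody. The manifest layout is an illustrative assumption; store
# the resulting JSON alongside the evidence in the immutable storage account.
import hashlib
import pathlib
from datetime import datetime, timezone

def evidence_manifest(paths: list) -> dict:
    """Hash each evidence file and record size plus collection timestamp."""
    entries = []
    for p in paths:
        data = pathlib.Path(p).read_bytes()
        entries.append({
            "file": str(p),
            "sha256": hashlib.sha256(data).hexdigest(),
            "size_bytes": len(data),
        })
    return {
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "files": entries,
    }
```

Re-hashing the files later and comparing against the manifest demonstrates that the evidence was not altered between collection and presentation.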
Phase 4: Eradication
Eradication removes the attacker's foothold from your environment.
Common Eradication Actions
- Rotate all credentials that the compromised identity had access to. This includes Key Vault secrets, storage account keys, SQL passwords, and service principal secrets.
- Remove persistence mechanisms. Check for: newly created service principals, modified Conditional Access policies, new OAuth app consents, added federation trusts, modified Automation runbooks, new Azure Function deployments with suspicious code.
- Patch the vulnerability that enabled initial access. If it was a phishing attack, review email security rules. If it was an exposed service, close the exposure.
- Review Entra ID audit logs for any configuration changes made by the compromised identity.
Eradication Verification
Run these queries to confirm eradication:
```kusto
// Check for any new service principals created during the incident window
AuditLogs
| where TimeGenerated between (datetime("2026-04-10") .. datetime("2026-04-15"))
| where OperationName == "Add service principal"
| project TimeGenerated, InitiatedBy, TargetResources
```

```kusto
// Check for OAuth consent grants during the incident window
AuditLogs
| where TimeGenerated between (datetime("2026-04-10") .. datetime("2026-04-15"))
| where OperationName has "Consent to application"
| project TimeGenerated, InitiatedBy, TargetResources
```
Phase 5: Recovery
Recovery restores normal operations. Move deliberately — rushing recovery often reintroduces the threat.
Recovery Checklist
- Re-enable accounts with fresh credentials and enhanced monitoring
- Restore services from known-good backups or redeploy from infrastructure-as-code
- Validate security controls — confirm Defender alerts are firing, Sentinel rules are active, Conditional Access policies are enforced
- Implement enhanced monitoring for the next 30 days: lower alert thresholds, add custom analytics rules for the specific TTPs observed
- Communicate resolution to stakeholders using your pre-built template
- Submit regulatory reports if applicable (DORA intermediate and final reports, GDPR breach notification, NIS2 notification)
Post-Recovery Monitoring
Create temporary Sentinel analytics rules that watch for the specific indicators of compromise observed during the incident. Run these for at least 30 days. If they trigger, escalate immediately — the attacker may have maintained access through an undiscovered persistence mechanism.
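The IoC watch can be prototyped locally before the temporary Sentinel rules are deployed, for example to validate the indicator list against replayed logs. A Python sketch with hypothetical field names and example indicators (203.0.113.0/24 is reserved documentation address space):

```python
# Illustrative post-recovery IoC matcher. The indicator values and the log
# record field names (source_ip, identity, user_agent) are assumptions for
# this sketch; in production this logic lives in Sentinel analytics rules.
INCIDENT_IOCS = {
    "ips": {"203.0.113.45"},
    "users": {"compromised-user@domain.com"},
    "user_agents": {"python-requests/2.25"},
}

def matches_ioc(record: dict) -> list:
    """Return the IoC categories this log record matches (empty if clean)."""
    hits = []
    if record.get("source_ip") in INCIDENT_IOCS["ips"]:
        hits.append("ip")
    if record.get("identity") in INCIDENT_IOCS["users"]:
        hits.append("user")
    if record.get("user_agent") in INCIDENT_IOCS["user_agents"]:
        hits.append("user_agent")
    return hits
```

Any non-empty result during the 30-day watch window should page the IR team rather than queue for routine triage.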
Phase 6: Lessons Learned
Conduct a post-incident review within 5 business days of recovery. This is not optional — it is where your security programme improves.
Post-Incident Review Template
Timeline reconstruction: Build a minute-by-minute timeline from detection to recovery. Identify delays and their causes.
Root cause analysis: What was the initial access vector? What made the attack possible? What detection gaps existed?
What worked well: Which detection rules fired? Which containment actions were effective? Where did automation help?
What needs improvement: Where were delays? What manual steps should be automated? What detection gaps existed?
Action items with owners and deadlines:
- Action item 1 — Owner — Deadline
- Action item 2 — Owner — Deadline
- Action item 3 — Owner — Deadline
Metrics:
- Time to detect (TTD): How long from initial compromise to first alert?
- Time to contain (TTC): How long from first alert to containment?
- Time to eradicate (TTE): How long from containment to eradication?
- Time to recover (TTR): How long from eradication to normal operations?
Track these metrics over time. Improvement in these numbers is the best measure of your IR programme maturity.
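All four metrics fall directly out of the reconstructed incident timeline. A Python sketch, assuming the timeline is captured as datetimes keyed by phase milestone (the key names are illustrative):

```python
# Sketch: derive TTD/TTC/TTE/TTR from an incident timeline. The milestone
# key names are assumptions; use whatever your post-incident review records.
from datetime import datetime, timedelta

def ir_metrics(t: dict) -> dict:
    """Compute the four IR durations from milestone timestamps."""
    return {
        "time_to_detect": t["first_alert"] - t["initial_compromise"],
        "time_to_contain": t["contained"] - t["first_alert"],
        "time_to_eradicate": t["eradicated"] - t["contained"],
        "time_to_recover": t["recovered"] - t["eradicated"],
    }
```

Computing the numbers the same way for every incident is what makes the trend over time meaningful.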
Sentinel Automation: Putting It Together
Build Sentinel Playbooks that automate the high-confidence containment actions. A recommended starting set:
| Playbook | Trigger | Action |
|---|---|---|
| Isolate-VM | High-severity Defender for Servers alert | Apply quarantine NSG, snapshot disks, notify IR team |
| Disable-User | Confirmed compromised user identity | Disable account, revoke tokens, notify IR team |
| Block-IP | Threat intelligence match on firewall logs | Add IP to Azure Firewall deny list, create Sentinel incident |
| Enrich-Incident | Any new Sentinel incident | Add geo-IP data, user risk score, device compliance state |
| Notify-IR-Team | Critical or high severity incident created | Send Teams message, email, and create PagerDuty incident |
Each playbook should be tested monthly as part of your IR readiness exercises.
Conclusion
An incident response playbook that lives in a document management system and never gets tested is worse than useless — it gives false confidence. Build your playbook into your tooling. Automate what can be automated. Test what cannot. And review relentlessly.
If you need help building or stress-testing your Azure incident response capability, contact us at mbrahim@conceptualise.de. We bring real-world IR experience to Azure environments and help teams prepare for incidents before they happen.