
Incident Response Playbook for Azure Environments: From Detection to Recovery

A comprehensive six-phase incident response playbook for Azure environments with Sentinel detection rules, containment runbooks, and recovery procedures.


A security incident in a cloud environment moves faster than its on-premises equivalent. An attacker who compromises a service principal can traverse your entire Azure estate in minutes. Your incident response playbook must account for this speed, leverage cloud-native tooling, and provide clear, executable runbooks that your team can follow under pressure.

This playbook follows the six-phase IR lifecycle popularised by SANS (and consistent with NIST SP 800-61), adapted for Azure-specific tooling and cloud-native attack patterns.


Phase 1: Preparation

Preparation is the only phase you control before an incident occurs. Every hour invested here saves ten during an active incident.

Tooling Baseline

Ensure these services are deployed and configured before you need them:

Detection and alerting:

  • Microsoft Sentinel with connectors for Entra ID, Azure Activity, Defender for Cloud, Microsoft 365, and all critical application logs
  • Defender for Cloud with all Defender plans enabled (Servers, Containers, SQL, Storage, Key Vault, DNS, Resource Manager)
  • Azure Monitor alerts for resource health and availability

Investigation:

  • Log Analytics workspace with 90-day hot retention and 365-day archive
  • Sentinel Notebooks configured with pre-built investigation templates
  • Network Watcher with NSG flow logs, packet capture enabled in key regions

Response automation:

  • Sentinel Playbooks (Logic Apps) for common containment actions
  • Azure Automation runbooks for infrastructure-level responses
  • Pre-approved change tickets for emergency containment actions
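
Much of this baseline can be codified. As a hypothetical sketch, the Defender for Cloud plans listed above can be enabled in one pass with the Az.Security module (the plan names are illustrative; check Get-AzSecurityPricing for the authoritative list in your subscription):

PowerShell
# Sketch: enable Defender plans subscription-wide. Requires Az.Security and an
# authenticated session with sufficient rights on the target subscription.
$plans = @("VirtualMachines", "Containers", "SqlServers", "StorageAccounts",
           "KeyVaults", "Dns", "Arm")
foreach ($plan in $plans) {
    Set-AzSecurityPricing -Name $plan -PricingTier "Standard"
}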

Roles and Escalation Matrix

Define these roles before an incident:

Role | Responsibility | On-call coverage
IR Lead | Coordinates response, makes containment decisions | 24/7 rotation
Security Analyst | Investigates alerts, performs forensic analysis | 24/7 rotation
Platform Engineer | Executes infrastructure containment and recovery | Business hours + on-call
Communications Lead | Manages stakeholder and regulatory notifications | Business hours + on-call
Legal/Compliance | Advises on regulatory obligations, evidence preservation | On-call

Communication Templates

Prepare templates before you need them:

Internal escalation template:

Code
SECURITY INCIDENT — [SEVERITY]
Time detected: [TIMESTAMP]
Affected systems: [SYSTEMS]
Current impact: [DESCRIPTION]
Containment status: [IN PROGRESS / CONTAINED / NOT CONTAINED]
Next update: [TIME]
IR Lead: [NAME / CONTACT]

Regulatory notification template (for DORA, NIS2, or GDPR):

Code
INITIAL NOTIFICATION — ICT-RELATED INCIDENT
Entity: [LEGAL ENTITY NAME]
Competent Authority: [AUTHORITY]
Date/time of detection: [TIMESTAMP]
Date/time of classification: [TIMESTAMP]
Description: [BRIEF DESCRIPTION]
Affected services: [SERVICES]
Estimated client impact: [NUMBER / DESCRIPTION]
Actions taken: [SUMMARY]

Pre-Approved Containment Actions

Get these approved by management in advance so your team can execute without waiting for authorization during a live incident:

  • Disable a compromised user account
  • Revoke all refresh tokens for a user or service principal
  • Isolate a VM by applying a deny-all NSG
  • Disable a compromised service principal
  • Block an IP address in Azure Firewall
  • Rotate secrets in Key Vault
  • Disable external access to a storage account

Document the approval in your IR policy with the conditions under which each action can be taken without further authorization.
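
Most of these actions reduce to one or two cmdlets, which is what makes blanket pre-approval practical. A hypothetical example for the storage account action (resource names are placeholders; -PublicNetworkAccess requires a recent Az.Storage version):

PowerShell
# Sketch: cut off all public network access to a storage account
Set-AzStorageAccount -ResourceGroupName "Prod-RG" -Name "prodstorageacct" `
  -PublicNetworkAccess Disabled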

Phase 2: Identification

Identification is about confirming that a security event is a genuine incident, determining its scope, and classifying its severity.

Detection Sources in Azure

Prioritise these alert sources by fidelity:

High fidelity (investigate immediately):

  • Defender for Cloud high-severity alerts
  • Sentinel fusion alerts (multi-stage attack detection)
  • Sentinel analytics rules matching known attack patterns
  • Defender for Identity alerts (lateral movement, credential theft)

Medium fidelity (investigate within SLA):

  • Defender for Cloud medium-severity alerts
  • Custom Sentinel analytics rules
  • Entra ID Protection (formerly Azure AD Identity Protection) risky sign-in detections (medium risk and above)

Lower fidelity (triage and correlate):

  • Defender for Cloud informational/low alerts
  • Azure Activity Log anomalies
  • Custom log queries surfacing unusual patterns

Initial Triage Runbook

When an alert fires, the on-call analyst should follow this sequence:

  1. Read the alert details. Open the Sentinel incident. Review the entities (users, IPs, hosts, resources), the timeline, and the mapped MITRE ATT&CK techniques.

  2. Check for related incidents. Use Sentinel's investigation graph to identify connected entities. A single compromised user may have triggered alerts across multiple resources.

  3. Assess blast radius. Query Log Analytics for all activity by the affected entities in the last 24 hours:

KQL
// Placeholders below: substitute the user and IP entities from the alert.
// Note: SigninLogs records the client IP in IPAddress, not CallerIpAddress.
union SigninLogs, AuditLogs, AzureActivity
| where TimeGenerated > ago(24h)
| where Identity == "compromised-user@domain.com"
    or CallerIpAddress == "suspicious-ip"
    or IPAddress == "suspicious-ip"
| project TimeGenerated, OperationName, Identity, CallerIpAddress, IPAddress, ResourceGroup, Result
| sort by TimeGenerated asc
  4. Classify severity. Use your organisation's severity matrix. A reasonable starting point:

Severity | Criteria
Critical | Active data exfiltration, ransomware execution, compromise of admin accounts
High | Confirmed compromise of user accounts, lateral movement detected, critical system affected
Medium | Suspicious activity confirmed malicious but contained to non-critical systems
Low | Policy violation, minor misconfiguration exploited, no data impact

  5. Declare the incident. If severity is Medium or above, formally declare an incident and begin Phase 3.

Phase 3: Containment

Containment must be fast and decisive. In cloud environments, containment is primarily achieved through identity controls and network segmentation.

Short-Term Containment Actions


Execute these immediately to stop the bleeding:

Compromised user account:

PowerShell
# Disable the account (Microsoft Graph PowerShell; the legacy AzureAD module is retired)
Update-MgUser -UserId "user@domain.com" -AccountEnabled:$false

# Revoke all refresh tokens and session cookies
Revoke-MgUserSignInSession -UserId "user@domain.com"

# Reset the password and force a change at next sign-in
Update-MgUser -UserId "user@domain.com" -PasswordProfile @{
  Password                      = "TempP@ss$(Get-Random)"
  ForceChangePasswordNextSignIn = $true
}

Compromised service principal:

PowerShell
# Remove all credentials from the service principal
$sp = Get-AzADServicePrincipal -DisplayName "compromised-sp"
Get-AzADSpCredential -ObjectId $sp.Id | ForEach-Object {
  Remove-AzADSpCredential -ObjectId $sp.Id -KeyId $_.KeyId
}

# Disable the service principal so no remaining credential can authenticate
Update-MgServicePrincipal -ServicePrincipalId $sp.Id -AccountEnabled:$false

Compromised VM:

PowerShell
# Build deny-all rules (priorities apply per direction, so both can use 100)
$inbound = New-AzNetworkSecurityRuleConfig -Name "DenyAllInbound" -Priority 100 `
  -Direction Inbound -Access Deny -Protocol * `
  -SourceAddressPrefix * -SourcePortRange * `
  -DestinationAddressPrefix * -DestinationPortRange *
$outbound = New-AzNetworkSecurityRuleConfig -Name "DenyAllOutbound" -Priority 100 `
  -Direction Outbound -Access Deny -Protocol * `
  -SourceAddressPrefix * -SourcePortRange * `
  -DestinationAddressPrefix * -DestinationPortRange *

# Apply a deny-all NSG to isolate the VM (preserves evidence on the machine)
$nsg = New-AzNetworkSecurityGroup -Name "IR-Quarantine-NSG" `
  -ResourceGroupName "IR-RG" -Location "westeurope" `
  -SecurityRules $inbound, $outbound

# Associate the quarantine NSG with the VM's NIC
$nic = Get-AzNetworkInterface -Name "compromised-vm-nic" -ResourceGroupName "Prod-RG"
$nic.NetworkSecurityGroup = $nsg
Set-AzNetworkInterface -NetworkInterface $nic

Evidence Preservation

Before eradication, preserve evidence:

  • Snapshot VM disks before any remediation. Create snapshots of OS and data disks and store them in a separate, locked-down resource group.
  • Export Sentinel incident with all related events, entities, and timeline.
  • Capture NSG flow logs for the timeframe of interest.
  • Export Azure Activity Logs for the affected subscriptions.
  • Preserve Key Vault audit logs if secrets may have been accessed.

Use an immutable storage account (with legal hold) for evidence storage. This ensures chain of custody for potential legal proceedings.
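
A hypothetical sketch of the disk snapshot step, copying the OS disk of a compromised VM into a dedicated evidence resource group (all names are placeholders):

PowerShell
# Sketch: snapshot the OS disk before any remediation touches the VM
$vm = Get-AzVM -ResourceGroupName "Prod-RG" -Name "compromised-vm"
$config = New-AzSnapshotConfig -SourceUri $vm.StorageProfile.OsDisk.ManagedDisk.Id `
  -Location $vm.Location -CreateOption Copy
New-AzSnapshot -ResourceGroupName "IR-Evidence-RG" `
  -SnapshotName "ir-osdisk-$(Get-Date -Format yyyyMMddHHmm)" -Snapshot $config
# Repeat for each data disk in $vm.StorageProfile.DataDisks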

Phase 4: Eradication

Eradication removes the attacker's foothold from your environment.

Common Eradication Actions

  • Rotate all credentials that the compromised identity had access to. This includes Key Vault secrets, storage account keys, SQL passwords, and service principal secrets.
  • Remove persistence mechanisms. Check for: newly created service principals, modified Conditional Access policies, new OAuth app consents, added federation trusts, modified Automation runbooks, new Azure Function deployments with suspicious code.
  • Patch the vulnerability that enabled initial access. If it was a phishing attack, review email security rules. If it was an exposed service, close the exposure.
  • Review Entra ID audit logs for any configuration changes made by the compromised identity.
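
A hypothetical sketch of the credential rotation step for a single Key Vault (the vault name is a placeholder, and every application consuming these secrets must be updated afterwards or it will fail to authenticate):

PowerShell
# Sketch: rotate every secret in one vault with random replacement values
$vault = "prod-keyvault"   # placeholder
Get-AzKeyVaultSecret -VaultName $vault | ForEach-Object {
  $newValue = -join ((48..57) + (65..90) + (97..122) |
    Get-Random -Count 32 | ForEach-Object { [char]$_ })
  Set-AzKeyVaultSecret -VaultName $vault -Name $_.Name `
    -SecretValue (ConvertTo-SecureString $newValue -AsPlainText -Force)
}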

Eradication Verification

Run these queries to confirm eradication:

KQL
// Check for any new service principals created during incident window
AuditLogs
| where TimeGenerated between (datetime("2026-04-10") .. datetime("2026-04-15"))
| where OperationName == "Add service principal"
| project TimeGenerated, InitiatedBy, TargetResources

// Check for OAuth consent grants during incident window
AuditLogs
| where TimeGenerated between (datetime("2026-04-10") .. datetime("2026-04-15"))
| where OperationName has "Consent to application"
| project TimeGenerated, InitiatedBy, TargetResources

Phase 5: Recovery

Recovery restores normal operations. Move deliberately — rushing recovery often reintroduces the threat.

Recovery Checklist

  1. Re-enable accounts with fresh credentials and enhanced monitoring
  2. Restore services from known-good backups or redeploy from infrastructure-as-code
  3. Validate security controls — confirm Defender alerts are firing, Sentinel rules are active, Conditional Access policies are enforced
  4. Implement enhanced monitoring for the next 30 days: lower alert thresholds, add custom analytics rules for the specific TTPs observed
  5. Communicate resolution to stakeholders using your pre-built template
  6. Submit regulatory reports if applicable (DORA intermediate and final reports, GDPR breach notification, NIS2 notification)

Post-Recovery Monitoring

Create temporary Sentinel analytics rules that watch for the specific indicators of compromise observed during the incident. Run these for at least 30 days. If they trigger, escalate immediately — the attacker may have maintained access through an undiscovered persistence mechanism.
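
A minimal sketch of such a temporary rule, assuming the Az.SecurityInsights module and a malicious IP identified during the incident (the workspace, resource group, and IOC values are placeholders):

PowerShell
# Sketch: scheduled analytics rule watching for a known-bad IP; remove after ~30 days
New-AzSentinelAlertRule -ResourceGroupName "SecOps-RG" -WorkspaceName "sentinel-ws" `
  -Scheduled -Enabled -DisplayName "Post-incident IOC watch: known bad IP" `
  -Severity "High" `
  -Query 'SigninLogs | where IPAddress == "203.0.113.42"' `
  -QueryFrequency (New-TimeSpan -Hours 1) -QueryPeriod (New-TimeSpan -Hours 1) `
  -TriggerOperator "GreaterThan" -TriggerThreshold 0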

Phase 6: Lessons Learned

Conduct a post-incident review within 5 business days of recovery. This is not optional — it is where your security programme improves.

Post-Incident Review Template

Timeline reconstruction: Build a minute-by-minute timeline from detection to recovery. Identify delays and their causes.

Root cause analysis: What was the initial access vector? What made the attack possible? What detection gaps existed?

What worked well: Which detection rules fired? Which containment actions were effective? Where did automation help?

What needs improvement: Where were delays? What manual steps should be automated? What detection gaps existed?

Action items with owners and deadlines:

  • Action item 1 — Owner — Deadline
  • Action item 2 — Owner — Deadline
  • Action item 3 — Owner — Deadline

Metrics:

  • Time to detect (TTD): How long from initial compromise to first alert?
  • Time to contain (TTC): How long from first alert to containment?
  • Time to eradicate (TTE): How long from containment to eradication?
  • Time to recover (TTR): How long from eradication to normal operations?

Track these metrics over time. Improvement in these numbers is the best measure of your IR programme maturity.

Sentinel Automation: Putting It Together

Build Sentinel Playbooks that automate the high-confidence containment actions. A recommended starting set:

Playbook | Trigger | Action
Isolate-VM | High-severity Defender for Servers alert | Apply quarantine NSG, snapshot disks, notify IR team
Disable-User | Confirmed compromised user identity | Disable account, revoke tokens, notify IR team
Block-IP | Threat intelligence match on firewall logs | Add IP to Azure Firewall deny list, create Sentinel incident
Enrich-Incident | Any new Sentinel incident | Add geo-IP data, user risk score, device compliance state
Notify-IR-Team | Critical or high severity incident created | Send Teams message, email, and create PagerDuty incident

Each playbook should be tested monthly as part of your IR readiness exercises.

Conclusion

An incident response playbook that lives in a document management system and never gets tested is worse than useless — it gives false confidence. Build your playbook into your tooling. Automate what can be automated. Test what cannot. And review relentlessly.

If you need help building or stress-testing your Azure incident response capability, contact us at mbrahim@conceptualise.de. We bring real-world IR experience to Azure environments and help teams prepare for incidents before they happen.

Topics

incident response Azure · Azure Sentinel playbook · security incident management · IR runbook · cloud incident response

Frequently Asked Questions

What are the six phases of incident response?

The six phases are Preparation, Identification, Containment, Eradication, Recovery, and Lessons Learned. Each phase has specific activities, tooling requirements, and handoff criteria before moving to the next phase.
