Awesome-omni-skill Incident Response

Guide structured incident response following severity-based protocols

install

source · Clone the upstream repo

git clone https://github.com/diegosouzapw/awesome-omni-skill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tools/incident-response" ~/.claude/skills/diegosouzapw-awesome-omni-skill-incident-response-fdc399 && rm -rf "$T"

manifest: skills/tools/incident-response/SKILL.md

source content

Incident Response Skill

Guide structured incident response following severity-based protocols.

Trigger Conditions

An incident is detected or reported
An alert fires with severity S1 or S2
The user invokes the skill with "incident response" or "start incident"
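The three trigger conditions above can be read as a single predicate. The following is a minimal sketch, not part of the upstream skill: the function name `should_trigger` and its parameters are hypothetical, and matching the phrases by case-insensitive substring is an assumption about how invocation text is compared.

```python
from typing import Optional

# Phrases taken from the trigger conditions above.
TRIGGER_PHRASES = ("incident response", "start incident")
# Alert severities that start the skill automatically.
AUTO_TRIGGER_SEVERITIES = {"S1", "S2"}


def should_trigger(
    user_message: Optional[str] = None,
    alert_severity: Optional[str] = None,
    incident_reported: bool = False,
) -> bool:
    """Return True if any of the three trigger conditions holds."""
    # Condition 1: an incident was detected or reported.
    if incident_reported:
        return True
    # Condition 2: an alert fired with severity S1 or S2.
    if alert_severity is not None and alert_severity.upper() in AUTO_TRIGGER_SEVERITIES:
        return True
    # Condition 3: the user asked for the skill by phrase (assumed substring match).
    if user_message is not None:
        text = user_message.lower()
        return any(phrase in text for phrase in TRIGGER_PHRASES)
    return False
```

Under these assumptions, an S3 alert on its own does not start the skill, while an explicit "start incident" request does regardless of severity.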

Input Contract

Required: Incident description or alert payload
Required: Severity level (S1-S4)
Optional: Affected services, customer impact estimate
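One way to make the input contract checkable is a small typed record. This is a sketch under stated assumptions: the names `Severity`, `IncidentInput`, and `parse_severity` are illustrative and do not come from the skill, and the customer impact estimate is assumed to be free text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Severity(Enum):
    S1 = 1
    S2 = 2
    S3 = 3
    S4 = 4


def parse_severity(label: str) -> Severity:
    """Convert a label such as "s1" or "S1" into a Severity member."""
    try:
        return Severity[label.strip().upper()]
    except KeyError:
        raise ValueError(f"unknown severity {label!r}; expected S1-S4") from None


@dataclass
class IncidentInput:
    # Required by the input contract.
    description: str
    severity: Severity
    # Optional by the input contract.
    affected_services: List[str] = field(default_factory=list)
    customer_impact_estimate: Optional[str] = None  # assumed free text

    def __post_init__(self) -> None:
        # Reject empty descriptions and anything that is not S1-S4.
        if not self.description.strip():
            raise ValueError("incident description or alert payload is required")
        if not isinstance(self.severity, Severity):
            raise ValueError("severity must be one of S1-S4")
```

Validating at construction time means a missing description or an out-of-range severity fails before any incident step runs.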

Output Contract

Incident timeline with structured updates
Mitigation steps and status
Communication templates for stakeholders
Evidence preservation checklist
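The first output, a timeline with structured updates, could be represented roughly as follows. This is an illustrative sketch only: `TimelineEntry`, `render_timeline`, the status labels, and the one-line text format are assumptions, since the skill does not specify a timeline format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass
class TimelineEntry:
    at: datetime
    status: str   # hypothetical labels, e.g. "investigating", "mitigating", "resolved"
    summary: str


def render_timeline(entries: Iterable[TimelineEntry]) -> str:
    """Render entries in chronological order, one line per update."""
    ordered = sorted(entries, key=lambda e: e.at)
    return "\n".join(f"{e.at:%H:%M} [{e.status}] {e.summary}" for e in ordered)
```

Sorting by timestamp keeps the timeline readable even when updates are recorded out of order during the incident.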

Tool Permissions

Read: Logs, metrics, alerts, runbooks, deployment history
Write: Incident reports, status page updates
Execute: Diagnostic commands, rollback procedures

Execution Steps

Classify severity using defined criteria
Identify incident commander and assign roles
Establish communication channels (war room)
Preserve evidence (logs, metrics, config state, recent deploys)
Focus on mitigation first, root cause second
Provide structured updates at the cadence defined for the severity level
Assess regulatory notification requirements
Document resolution and schedule postmortem
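The update cadence in the steps above is defined per severity, but the skill does not publish the actual intervals. The sketch below shows how such a cadence might be tracked. The table `UPDATE_CADENCE_MIN` is hypothetical: only the 15-minute S1 value echoes the example invocation in this document, and the other values are placeholders to replace with your own policy.

```python
from datetime import datetime, timedelta

# Hypothetical minutes between updates. Only S1 mirrors the document's
# example; S2-S4 are placeholder values.
UPDATE_CADENCE_MIN = {"S1": 15, "S2": 30, "S3": 60, "S4": 240}


def next_update_due(severity: str, last_update: datetime) -> datetime:
    """Return the time by which the next structured update should go out."""
    return last_update + timedelta(minutes=UPDATE_CADENCE_MIN[severity])


def is_update_overdue(severity: str, last_update: datetime, now: datetime) -> bool:
    """True once `now` is strictly past the due time for the next update."""
    return now > next_update_due(severity, last_update)
```

A reminder loop or chat bot could poll `is_update_overdue` and prompt the incident commander when it returns True.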

Success Criteria

Severity correctly classified within 5 minutes
Communication cadence maintained per severity SLA
Service restored within MTTR target
Evidence preserved before mitigation changes state
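Two of the criteria above can be checked directly from timestamps: classification within 5 minutes, and evidence captured before mitigation changes state. The following is a minimal sketch. The function names are hypothetical, and it assumes that detection, classification, evidence capture, and mitigation start are each recorded as a `datetime`.

```python
from datetime import datetime, timedelta
from typing import Optional

# Taken from the success criteria: classify within 5 minutes.
CLASSIFICATION_LIMIT = timedelta(minutes=5)


def classified_in_time(detected_at: datetime, classified_at: datetime) -> bool:
    """True if severity was classified within 5 minutes of detection."""
    return classified_at - detected_at <= CLASSIFICATION_LIMIT


def evidence_preserved_first(
    evidence_at: Optional[datetime], mitigation_at: Optional[datetime]
) -> bool:
    """True if evidence was captured strictly before mitigation began.

    If mitigation has not started yet, the criterion cannot have been violated.
    """
    if mitigation_at is None:
        return True
    return evidence_at is not None and evidence_at < mitigation_at
```

The restore-time and cadence criteria need per-severity targets that the skill does not define, so they are left out of this sketch.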

Escalation Rules

Auto-escalate if time to restore exceeds the MTTR target for the severity
Escalate if regulatory notification may be required
Escalate if customer impact exceeds threshold
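The escalation rules above can be evaluated together and reported as a list of reasons. This is a sketch under assumptions: `RESTORE_TARGET` and `CUSTOMER_IMPACT_THRESHOLD` are hypothetical placeholder values (the skill names the thresholds but does not define them), and customer impact is assumed to be a fraction of failing requests.

```python
from datetime import timedelta
from typing import List

# Hypothetical per-severity restore targets; the skill does not define them.
RESTORE_TARGET = {
    "S1": timedelta(hours=1),
    "S2": timedelta(hours=4),
    "S3": timedelta(hours=24),
    "S4": timedelta(hours=72),
}
# Hypothetical threshold, expressed as a fraction of failing requests.
CUSTOMER_IMPACT_THRESHOLD = 0.25


def escalation_reasons(
    severity: str,
    elapsed: timedelta,
    regulatory_notification_possible: bool,
    customer_impact: float,
) -> List[str]:
    """Return the escalation rules that currently apply, in document order."""
    reasons: List[str] = []
    if elapsed > RESTORE_TARGET[severity]:
        reasons.append("restore time exceeded target")
    if regulatory_notification_possible:
        reasons.append("regulatory notification may be required")
    if customer_impact > CUSTOMER_IMPACT_THRESHOLD:
        reasons.append("customer impact exceeds threshold")
    return reasons
```

Returning every matching reason, instead of a single boolean, lets the escalation message say why it was sent.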

Example Invocations

Input: "Payment processing is returning 500 errors for 30% of requests"

Output: S1 incident declared.
Mitigation: roll back the last deploy (payment-service v2.3.1 → v2.3.0).
Comms: internal Slack update every 15 min, status page updated, customer notification drafted.
Evidence: deploy log, error rate graph, and config diff preserved.