Awesome-claude-corporate-skills incident-response

Triage and manage production incidents. Trigger with "we have an incident", "production is down", "something is broken", "there's an outage", "SEV1", or when the user describes a production issue needing immediate response.

install
source · Clone the upstream repo
git clone https://github.com/w95/awesome-claude-corporate-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/w95/awesome-claude-corporate-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/08-it-engineering/incident-response" ~/.claude/skills/w95-awesome-claude-corporate-skills-incident-response && rm -rf "$T"
manifest: 08-it-engineering/incident-response/SKILL.md
source content

Incident Response

Guide incident response from detection through resolution and postmortem.

Severity Classification

| Level | Criteria | Response Time |
|-------|----------|---------------|
| SEV1 | Service down, all users affected | Immediate, all-hands |
| SEV2 | Major feature degraded, many users affected | Within 15 min |
| SEV3 | Minor feature issue, some users affected | Within 1 hour |
| SEV4 | Cosmetic or low-impact issue | Next business day |
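
The severity table can be encoded as a small lookup so that a level and a response deadline follow mechanically from the observed impact. This is a minimal sketch, not part of the skill itself: the `Impact` enum, `classify`, and `respond_by` are hypothetical names chosen for illustration, and "next business day" is left as `None` because it depends on a calendar the skill does not define.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Impact(Enum):
    """Hypothetical impact categories, one per row of the severity table."""
    SERVICE_DOWN = "service down, all users affected"
    MAJOR_DEGRADED = "major feature degraded, many users affected"
    MINOR_ISSUE = "minor feature issue, some users affected"
    COSMETIC = "cosmetic or low-impact issue"


# Severity level for each impact category, as listed in the table.
SEVERITY_BY_IMPACT = {
    Impact.SERVICE_DOWN: "SEV1",
    Impact.MAJOR_DEGRADED: "SEV2",
    Impact.MINOR_ISSUE: "SEV3",
    Impact.COSMETIC: "SEV4",
}

# Response window per level. SEV1 is "immediate" (zero delay). SEV4 is
# "next business day", which needs a business calendar, so it stays None.
RESPONSE_WINDOW = {
    "SEV1": timedelta(0),
    "SEV2": timedelta(minutes=15),
    "SEV3": timedelta(hours=1),
    "SEV4": None,
}


def classify(impact: Impact) -> str:
    """Return the severity level for an observed impact."""
    return SEVERITY_BY_IMPACT[impact]


def respond_by(level: str, detected_at: datetime) -> Optional[datetime]:
    """Return the latest acceptable response time, or None for SEV4."""
    window = RESPONSE_WINDOW[level]
    return None if window is None else detected_at + window
```

For example, `respond_by(classify(Impact.MAJOR_DEGRADED), detected_at)` gives a deadline 15 minutes after detection.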

Response Framework

  1. Triage: Classify severity, identify scope, assign incident commander
  2. Communicate: Status page, internal updates, customer comms if needed
  3. Mitigate: Stop the bleeding first, root cause later
  4. Resolve: Implement fix, verify, confirm resolution
  5. Postmortem: Blameless review, 5 whys, action items
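
The five steps above can be sketched as an ordered set of phases with a checklist for each. The sketch assumes the phases run strictly in sequence, which is a simplification (in practice, communication usually continues throughout an incident). `Phase`, `CHECKLIST`, and `next_phase` are hypothetical names, not part of the skill.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    """Response phases in the order the framework lists them."""
    TRIAGE = 1
    COMMUNICATE = 2
    MITIGATE = 3
    RESOLVE = 4
    POSTMORTEM = 5


# Checklist items taken from the framework text for each phase.
CHECKLIST = {
    Phase.TRIAGE: ["Classify severity", "Identify scope", "Assign incident commander"],
    Phase.COMMUNICATE: ["Status page", "Internal updates", "Customer comms if needed"],
    Phase.MITIGATE: ["Stop the bleeding first, root cause later"],
    Phase.RESOLVE: ["Implement fix", "Verify", "Confirm resolution"],
    Phase.POSTMORTEM: ["Blameless review", "5 whys", "Action items"],
}


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None after the postmortem."""
    phases = list(Phase)  # Enum iteration preserves definition order
    idx = phases.index(current)
    return phases[idx + 1] if idx + 1 < len(phases) else None
```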

Communication Templates

Provide clear, factual updates at a regular cadence. Each update should state what is happening, who is affected, what we are doing about it, and when the next update will be posted.
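
One way to keep updates consistent is to make the four required fields explicit, so an update cannot be sent with one of them missing. The following is a sketch under that assumption. The `StatusUpdate` class and its field labels are hypothetical, not a template shipped with the skill.

```python
from dataclasses import dataclass


@dataclass
class StatusUpdate:
    """One incident update holding the four items every update should include."""
    whats_happening: str
    whos_affected: str
    what_were_doing: str
    next_update: str

    def render(self) -> str:
        """Format the update as four labeled lines of plain text."""
        return "\n".join([
            f"What's happening: {self.whats_happening}",
            f"Who's affected: {self.whos_affected}",
            f"What we're doing: {self.what_were_doing}",
            f"Next update: {self.next_update}",
        ])
```

Because every field is a required constructor argument, omitting one raises a `TypeError` before anything is published.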

Postmortem Format

Keep the postmortem blameless: focus on systems and processes, not individuals. Include a timeline, root cause analysis (5 whys), what went well, what went poorly, and action items with owners and due dates.
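
The postmortem sections can be modeled as a simple record with a completeness check, which makes it easy to spot action items that lack an owner or a due date. This is an illustrative sketch. `Postmortem`, `ActionItem`, and `problems` are hypothetical names, and the check covers only the timeline, the 5 whys, and the action items.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ActionItem:
    description: str
    owner: Optional[str] = None
    due: Optional[date] = None


@dataclass
class Postmortem:
    timeline: List[str] = field(default_factory=list)
    five_whys: List[str] = field(default_factory=list)
    went_well: List[str] = field(default_factory=list)
    went_poorly: List[str] = field(default_factory=list)
    action_items: List[ActionItem] = field(default_factory=list)

    def problems(self) -> List[str]:
        """List gaps: empty timeline, empty 5 whys, unowned or undated actions."""
        issues = []
        if not self.timeline:
            issues.append("timeline is empty")
        if not self.five_whys:
            issues.append("root cause analysis (5 whys) is empty")
        for item in self.action_items:
            if not item.owner:
                issues.append(f"action item {item.description!r} has no owner")
            if item.due is None:
                issues.append(f"action item {item.description!r} has no due date")
        return issues
```

An empty list from `problems()` means those sections are filled in. It says nothing about the quality of the analysis itself.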