Skillshub adobe-incident-runbook
install
source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/jeremylongshore/claude-code-plugins-plus-skills/adobe-incident-runbook" ~/.claude/skills/comeonoliver-skillshub-adobe-incident-runbook && rm -rf "$T"
manifest:
skills/jeremylongshore/claude-code-plugins-plus-skills/adobe-incident-runbook/SKILL.md
Adobe Incident Runbook
Overview
Rapid incident response procedures for Adobe API-related outages, covering IMS authentication failures, Firefly/Photoshop API downtime, PDF Services quota exhaustion, and I/O Events delivery failures.
Prerequisites
- Access to Adobe Developer Console and Admin Console
- Access to application monitoring (Grafana, Datadog, etc.)
- kubectl access to production cluster (if applicable)
- Communication channels (Slack, PagerDuty)
Severity Matrix
| Level | Definition | Response Time | Example |
|---|---|---|---|
| P1 | Complete Adobe integration failure | < 15 min | IMS auth broken, all APIs down |
| P2 | Single API degraded | < 1 hour | Firefly 429s, Photoshop timeouts |
| P3 | Minor impact | < 4 hours | Webhook delays, slow PDF extraction |
| P4 | No user impact | Next business day | Monitoring gap, metric anomaly |
Quick Triage (Run These First)
    # 1. Is Adobe itself down? (A 200 only means the status page loaded; read it for active incidents.)
    curl -s -o /dev/null -w "Adobe Status: %{http_code}\n" https://status.adobe.com

    # 2. Can we generate an access token?
    curl -s -o /dev/null -w "IMS Auth: %{http_code}\n" -X POST \
      'https://ims-na1.adobelogin.com/ims/token/v3' \
      -d "client_id=${ADOBE_CLIENT_ID}&client_secret=${ADOBE_CLIENT_SECRET}&grant_type=client_credentials&scope=${ADOBE_SCOPES}"

    # 3. Can we reach each API endpoint?
    for endpoint in firefly-api.adobe.io image.adobe.io pdf-services.adobe.io; do
      # On a connection failure curl prints 000 and exits non-zero, so replace that output.
      CODE=$(curl -s -o /dev/null -w "%{http_code}" --connect-timeout 5 "https://$endpoint" 2>/dev/null) || CODE="UNREACHABLE"
      echo "$endpoint: $CODE"
    done

    # 4. Check our app health
    curl -sf https://your-app.com/health | python3 -m json.tool

    # 5. Recent errors in our logs (last 5 min)
    kubectl logs -l app=adobe-service --since=5m 2>/dev/null | grep -iE "error|failed|429|401|500" | tail -20
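The triage commands above read `ADOBE_CLIENT_ID`, `ADOBE_CLIENT_SECRET`, and `ADOBE_SCOPES` from the environment. If any of them is empty in the shell you happen to be using, the IMS request goes out with a blank parameter and fails for a reason that has nothing to do with Adobe. A minimal sketch of a pre-flight check follows; the function name `check_adobe_env` is hypothetical and not part of the runbook.

```shell
# Hypothetical helper: report which of the variables used by the triage
# commands are unset or empty. Prints the missing names and returns 1 if
# any are missing; never prints the values themselves.
check_adobe_env() {
  missing=""
  for name in ADOBE_CLIENT_ID ADOBE_CLIENT_SECRET ADOBE_SCOPES; do
    # Indirect variable lookup that works in POSIX sh as well as bash.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}
```

Running `check_adobe_env` before step 2 helps separate a local shell problem from a real credential failure.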
Decision Tree
    Adobe APIs returning errors?
    ├── YES: Is status.adobe.com reporting an incident?
    │   ├── YES → Adobe-side outage. Enable fallback mode. Monitor status page.
    │   └── NO → Check our credentials and config.
    │       ├── 401 errors → Credentials expired/rotated. See "Auth Recovery" below.
    │       ├── 429 errors → Rate limited. See "Rate Limit Recovery" below.
    │       └── 500/503 errors → Adobe server issue (unreported). Open support ticket.
    └── NO: Is our application healthy?
        ├── YES → Likely resolved or intermittent. Continue monitoring.
        └── NO → Our infrastructure issue. Check pods, memory, network.
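The same branching can be written as a small shell function, which may be handy for on-call scripts or alert annotations. This is only a sketch of the tree above: the function name `triage_decision`, its arguments, and the action labels it prints are all hypothetical.

```shell
# Hypothetical helper mirroring the decision tree. Arguments:
#   $1  HTTP status seen from Adobe APIs (e.g. 401), or "none" if no errors
#   $2  "yes" if status.adobe.com reports an incident, otherwise "no"
#   $3  "yes" if our own health check passes, otherwise "no"
triage_decision() {
  code=$1
  adobe_incident=$2
  app_healthy=$3

  # Branch "NO": Adobe APIs are not returning errors.
  if [ "$code" = "none" ]; then
    if [ "$app_healthy" = "yes" ]; then
      echo "monitor"
    else
      echo "check-infrastructure"
    fi
    return 0
  fi

  # Branch "YES": errors seen, and Adobe confirms an incident.
  if [ "$adobe_incident" = "yes" ]; then
    echo "enable-fallback"
    return 0
  fi

  # Errors seen, no reported incident: classify by status code.
  case "$code" in
    # 403 is grouped with 401 because the Auth Recovery section covers both.
    401|403) echo "auth-recovery" ;;
    429)     echo "rate-limit-recovery" ;;
    500|503) echo "open-support-ticket" ;;
    *)       echo "unclassified" ;;
  esac
}
```

For example, `triage_decision 429 no yes` prints `rate-limit-recovery`.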
Recovery Procedures
Auth Recovery (401/403)
    # 1. Verify credentials are still valid in Developer Console
    #    https://developer.adobe.com/console → Your Project → Credentials

    # 2. Test credential directly
    curl -v -X POST 'https://ims-na1.adobelogin.com/ims/token/v3' \
      -d "client_id=${ADOBE_CLIENT_ID}&client_secret=${ADOBE_CLIENT_SECRET}&grant_type=client_credentials&scope=${ADOBE_SCOPES}" 2>&1 | grep -E "HTTP|error"

    # 3. If credentials were rotated, update in secret manager
    #    (printf avoids the trailing newline a here-string would store in the secret)
    printf '%s' "new_secret" | gcloud secrets versions add adobe-client-secret --data-file=-
    # OR
    aws secretsmanager update-secret --secret-id adobe/production/credentials \
      --secret-string '{"client_id":"...","client_secret":"new_secret"}'

    # 4. Restart application to clear cached token
    kubectl rollout restart deployment/adobe-service

    # 5. Verify recovery
    curl -sf https://your-app.com/health | jq '.services.adobe'
Rate Limit Recovery (429)
    # 1. Check if rate limiting is transient or sustained
    #    Look at 429 error rate over last 30 min

    # 2. Reduce throughput immediately
    # Option A: Scale down workers
    kubectl scale deployment/adobe-batch-worker --replicas=1
    # Option B: Enable rate limit queue mode
    kubectl set env deployment/adobe-service ADOBE_RATE_LIMIT_MODE=queue

    # 3. For sustained rate limiting, contact Adobe for limit increase
    #    Include: client_id, typical request volume, business justification
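Step 1 asks for the 429 error rate but leaves the calculation open. If no dashboard is at hand, one rough way to get a number is to pull the HTTP status codes out of the last 30 minutes of logs and compute the share that are 429. The sketch below assumes the status codes have already been extracted, one per line; how to extract them depends entirely on your log format, and the function name `rate_429_percent` is hypothetical.

```shell
# Hypothetical helper: reads one HTTP status code per line on stdin and
# prints the share of 429 responses as a whole-number percentage
# (truncated, not rounded). Blank lines are ignored.
rate_429_percent() {
  awk '
    NF { total++; if ($1 == "429") limited++ }
    END {
      if (total == 0) { print 0; exit }
      printf "%d\n", (limited * 100) / total
    }
  '
}
```

A value that stays high across several samples points to sustained rate limiting (step 3) rather than a transient burst.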
Fallback Mode
    # Enable fallback mode (app continues working without Adobe)
    kubectl set env deployment/adobe-service ADOBE_FALLBACK_MODE=true

    # Verify fallback is working
    curl -sf https://your-app.com/health | jq '.services.adobe'
    # Should return { "status": "degraded", "mode": "fallback" }
Communication Templates
Internal (Slack)
    P[1-4] INCIDENT: Adobe [API Name] Integration
    Status: INVESTIGATING / IDENTIFIED / MONITORING / RESOLVED
    Impact: [User-facing description]
    Root cause: [Adobe outage / credential issue / rate limit / our bug]
    Current action: [What you're doing right now]
    Next update: [Time]
    Commander: @[name]
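Filling in the template by hand under pressure invites omissions. A minimal sketch of a formatter that takes each field as an argument is shown here; the function name `format_incident_update` and the argument order are hypothetical, and posting the result to Slack is left to whatever tooling your team already uses.

```shell
# Hypothetical helper: fills in the Slack template from positional arguments.
#   $1 severity (P1-P4)    $2 API name      $3 status
#   $4 impact              $5 root cause    $6 current action
#   $7 next update time    $8 commander handle (without the @)
format_incident_update() {
  printf '%s INCIDENT: Adobe %s Integration\n' "$1" "$2"
  printf 'Status: %s\n' "$3"
  printf 'Impact: %s\n' "$4"
  printf 'Root cause: %s\n' "$5"
  printf 'Current action: %s\n' "$6"
  printf 'Next update: %s\n' "$7"
  printf 'Commander: @%s\n' "$8"
}
```

For example, `format_incident_update P2 Firefly IDENTIFIED "Image generation failing" "rate limit" "Scaling down workers" "15:30 UTC" alice` produces a seven-line update starting with `P2 INCIDENT: Adobe Firefly Integration`.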
Postmortem Template
    ## Incident: Adobe [API] [Error Type]
    **Date:** YYYY-MM-DD
    **Duration:** X hours Y minutes
    **Severity:** P[1-4]

    ### Summary
    [1-2 sentence description of what happened]

    ### Timeline
    - HH:MM UTC - Alert fired: adobe_api_errors_total spike
    - HH:MM UTC - On-call acknowledged, began triage
    - HH:MM UTC - Root cause identified: [description]
    - HH:MM UTC - Mitigation applied: [action taken]
    - HH:MM UTC - Full recovery confirmed

    ### Root Cause
    [Technical explanation: was it Adobe-side, credential issue, our bug?]

    ### Impact
    - Users affected: N
    - API calls failed: N
    - Revenue impact: $X (if applicable)

    ### Action Items
    - [ ] [Preventive measure] - Owner - Due date
    - [ ] [Monitoring improvement] - Owner - Due date
    - [ ] [Documentation update] - Owner - Due date
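The **Duration** field can be derived from the first and last timeline entries. The sketch below does the arithmetic for two `HH:MM` times; it assumes both are in UTC and that the incident lasted less than 24 hours, and the function name `incident_duration` is hypothetical.

```shell
# Hypothetical helper: given start and end times as HH:MM, prints
# "X hours Y minutes". Assumes the incident lasted less than 24 hours.
incident_duration() {
  start_h=${1%%:*}; start_m=${1##*:}
  end_h=${2%%:*};   end_m=${2##*:}

  # Strip one leading zero so values like "08" are not parsed as octal.
  start_h=${start_h#0}; start_m=${start_m#0}
  end_h=${end_h#0};     end_m=${end_m#0}

  start=$(( ${start_h:-0} * 60 + ${start_m:-0} ))
  end=$(( ${end_h:-0} * 60 + ${end_m:-0} ))

  # An end time earlier than the start means the incident crossed midnight.
  if [ "$end" -lt "$start" ]; then
    end=$(( end + 1440 ))
  fi

  total=$(( end - start ))
  echo "$(( total / 60 )) hours $(( total % 60 )) minutes"
}
```

For example, `incident_duration 14:05 16:47` prints `2 hours 42 minutes`.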
Output
- Incident severity classified
- Root cause identified via decision tree
- Recovery procedure executed
- Stakeholders notified with template
- Evidence collected for postmortem
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Can't reach status.adobe.com | Network issue | Use mobile data or check @AdobeCare on Twitter |
| kubectl auth expired | Token timeout | Re-authenticate with cloud provider |
| Secret manager access denied | IAM policy | Use break-glass admin account |
| Fallback mode not implemented | Missing code path | Return cached/default data |
Next Steps
For data handling, see adobe-data-handling.