Claude-code-plugins-plus-skills intercom-incident-runbook
install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/intercom-pack/skills/intercom-incident-runbook" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-intercom-incident-runbook && rm -rf "$T"
manifest: plugins/saas-packs/intercom-pack/skills/intercom-incident-runbook/SKILL.md
Intercom Incident Runbook
Overview
Rapid incident response procedures for Intercom integration failures: triage by HTTP status code, mitigation steps per error type, communication templates, and a postmortem template.
Severity Levels
| Level | Definition | Response Time | Example |
|---|---|---|---|
| P1 | All Intercom API calls failing | < 15 min | 401 auth failures, API unreachable |
| P2 | Degraded service | < 1 hour | High latency, rate limited (429) |
| P3 | Partial impact | < 4 hours | Webhook delays, search timeouts |
| P4 | No user impact | Next business day | Monitoring gaps, stale cache |
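The severity table can also be kept as data so that alerting code and humans agree on the same thresholds. The following is a minimal sketch, not part of the original runbook: the `Symptoms` shape and the `classifySeverity` helper are hypothetical names, and the mapping simply walks the table from most to least severe.

```typescript
// Severity levels from the table above, encoded as data.
type Severity = "P1" | "P2" | "P3" | "P4";

// Response-time targets in minutes; null stands for "next business day".
const RESPONSE_TARGET_MINUTES: Record<Severity, number | null> = {
  P1: 15,
  P2: 60,
  P3: 240,
  P4: null,
};

// Hypothetical symptom summary produced by your own monitoring.
interface Symptoms {
  allCallsFailing: boolean; // e.g. 401 on every request, API unreachable
  degraded: boolean;        // e.g. high latency, 429 responses
  partialImpact: boolean;   // e.g. webhook delays, search timeouts
}

// Pick the most severe level whose condition matches.
function classifySeverity(s: Symptoms): Severity {
  if (s.allCallsFailing) return "P1";
  if (s.degraded) return "P2";
  if (s.partialImpact) return "P3";
  return "P4";
}
```

For example, `classifySeverity({ allCallsFailing: false, degraded: true, partialImpact: true })` yields `"P2"`, and `RESPONSE_TARGET_MINUTES.P2` gives the 60-minute target.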
Quick Triage (Copy-Paste)
```bash
#!/bin/bash
echo "=== Intercom Incident Triage ==="

# 1. Is Intercom's API responding?
echo -n "1. API reachable: "
curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer $INTERCOM_ACCESS_TOKEN" \
  https://api.intercom.io/me
echo ""

# 2. Is there a platform-wide incident?
echo -n "2. Intercom status: "
curl -s https://status.intercom.com/api/v2/status.json | jq -r '.status.description'

# 3. Active incidents on Intercom's side?
echo -n "3. Active incidents: "
curl -s https://status.intercom.com/api/v2/incidents/unresolved.json | jq '.incidents | length'

# 4. Rate limit status
echo -n "4. Rate limit remaining: "
curl -s -D - -o /dev/null \
  -H "Authorization: Bearer $INTERCOM_ACCESS_TOKEN" \
  https://api.intercom.io/me 2>/dev/null | grep -i x-ratelimit-remaining | awk '{print $2}'

# 5. Our health check
echo -n "5. Our integration health: "
curl -s https://your-app.com/health | jq '.services.intercom.status' 2>/dev/null || echo "UNKNOWN"
```
Decision Tree
```
API returning errors?
├── YES ──▶ Check status.intercom.com
│   ├── Incident reported ──▶ Intercom's problem
│   │     → Enable graceful degradation
│   │     → Monitor for resolution
│   │     → No action needed on our side
│   └── No incident ──▶ Our integration issue
│       ├── 401 → Token expired/revoked → Rotate token
│       ├── 403 → Scope missing → Add OAuth scope
│       ├── 429 → Rate limited → Enable queue/backoff
│       └── 5xx → Server error → Retry with backoff
└── NO ──▶ Is our service healthy?
    ├── YES → Resolved or intermittent → Monitor
    └── NO → Our infrastructure issue → Check pods, memory, network, DNS
```
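The decision tree can be expressed as a small pure function, which is handy for alert routing or for unit-testing the runbook logic. This is a sketch under assumptions: the `TriageInput` fields and the action names are hypothetical labels for the branches above, and the `"investigate"` result covers status codes the tree does not mention.

```typescript
// Hypothetical action labels, one per leaf of the decision tree.
type TriageAction =
  | "enable-graceful-degradation"
  | "rotate-token"
  | "add-oauth-scope"
  | "enable-queue-backoff"
  | "retry-with-backoff"
  | "monitor"
  | "check-infrastructure"
  | "investigate";

interface TriageInput {
  httpStatus: number | null;       // null when no API errors are observed
  intercomIncidentActive: boolean; // from the Intercom status page
  ourServiceHealthy: boolean;      // from our own health check
}

function decideAction(input: TriageInput): TriageAction {
  const { httpStatus, intercomIncidentActive, ourServiceHealthy } = input;

  // "API returning errors? NO" branch.
  if (httpStatus === null || httpStatus < 400) {
    return ourServiceHealthy ? "monitor" : "check-infrastructure";
  }

  // "Incident reported" branch: the problem is on Intercom's side.
  if (intercomIncidentActive) return "enable-graceful-degradation";

  // "No incident" branch: map the status code to a mitigation.
  if (httpStatus === 401) return "rotate-token";
  if (httpStatus === 403) return "add-oauth-scope";
  if (httpStatus === 429) return "enable-queue-backoff";
  if (httpStatus >= 500) return "retry-with-backoff";

  // Status codes not covered by the tree need a human look.
  return "investigate";
}
```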
Mitigation by Error Type
401 - Authentication Failed
```bash
# Verify token is valid
curl -s -H "Authorization: Bearer $INTERCOM_ACCESS_TOKEN" \
  https://api.intercom.io/me | jq '.type'
# Expected: "admin"
# If error: Token is invalid or revoked

# IMMEDIATE: Regenerate token
# Developer Hub > Your App > Authentication > Generate new token

# Update in secret manager:
aws secretsmanager update-secret \
  --secret-id intercom/production/token \
  --secret-string "new_token_here"

# Restart application to pick up new token
kubectl rollout restart deployment/intercom-service
```
429 - Rate Limited
```bash
# Check rate limit headers
curl -s -D - -o /dev/null \
  -H "Authorization: Bearer $INTERCOM_ACCESS_TOKEN" \
  https://api.intercom.io/me 2>/dev/null | grep -i "x-ratelimit"

# Immediate: Reduce request volume
# - Pause any batch/sync jobs
# - Enable request queuing if available

# Check if multiple apps are consuming workspace quota
# Limit: 25,000 req/min per workspace across all apps
```
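Once request volume is reduced, remaining callers still need to wait before retrying. A minimal sketch of computing that wait is shown here. It assumes the `X-RateLimit-Reset` header carries a Unix timestamp in seconds and that header names arrive lowercased, both of which should be checked against your HTTP client and Intercom's current documentation. The `backoffDelayMs` name, the 1-second base, and the 60-second cap are illustrative choices, not values from the runbook.

```typescript
// Minimal header shape; keys are assumed to be lowercased by the HTTP client.
type HeaderMap = Record<string, string | undefined>;

// Returns how long to wait (in ms) before retrying a rate-limited request.
// `attempt` is zero-based; `nowMs` and `random` are injectable for testing.
function backoffDelayMs(
  headers: HeaderMap,
  attempt: number,
  nowMs: number,
  random: () => number = Math.random
): number {
  // Prefer the server's reset time when it is present and in the future.
  // Assumption: the value is a Unix timestamp in seconds.
  const resetSeconds = Number(headers["x-ratelimit-reset"]);
  if (Number.isFinite(resetSeconds) && resetSeconds * 1000 > nowMs) {
    return resetSeconds * 1000 - nowMs;
  }

  // Otherwise fall back to exponential backoff with full jitter:
  // a random delay in [0, min(60s, 1s * 2^attempt)).
  const ceilingMs = Math.min(60000, 1000 * Math.pow(2, attempt));
  return Math.floor(random() * ceilingMs);
}
```

Injecting `nowMs` and `random` keeps the function deterministic in tests; production callers would pass `Date.now()` and leave `random` at its default.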
5xx - Intercom Server Errors
```bash
# 1. Check Intercom status (explicit '.' filter so older jq versions accept piped input)
curl -s https://status.intercom.com/api/v2/status.json | jq .

# 2. Enable graceful degradation
# Your app should serve cached data or fallback UI

# 3. Track request_id from error responses for Intercom support
# Error response includes: { "request_id": "req_abc123" }
```
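Collecting `request_id` values during the incident saves time when opening a support ticket and filling in the postmortem. The sketch below relies only on the `request_id` field shown above; the helper names are hypothetical, and the input is treated as untyped JSON because error bodies may be missing or malformed during an outage.

```typescript
// Pull request_id out of a parsed error body, tolerating malformed input.
function extractRequestId(body: unknown): string | null {
  if (typeof body !== "object" || body === null) return null;
  const id = (body as { request_id?: unknown }).request_id;
  return typeof id === "string" && id.length > 0 ? id : null;
}

// Deduplicate request_ids across many error bodies, preserving first-seen order.
function collectRequestIds(bodies: unknown[]): string[] {
  const seen = new Set<string>();
  for (const body of bodies) {
    const id = extractRequestId(body);
    if (id !== null) seen.add(id);
  }
  return Array.from(seen);
}
```

Calling `collectRequestIds` on the error bodies logged during the incident window gives a list that can be pasted into the "Intercom request_ids" field of the postmortem.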
Graceful Degradation Pattern
```typescript
import { IntercomClient, IntercomError } from "intercom-client";
import { LRUCache } from "lru-cache";

// Assumption: the token comes from INTERCOM_ACCESS_TOKEN, as in the triage script.
const client = new IntercomClient({ token: process.env.INTERCOM_ACCESS_TOKEN ?? "" });

const cache = new LRUCache<string, any>({ max: 10000, ttl: 3600000 }); // 1hr fallback

async function getContactWithFallback(contactId: string): Promise<any> {
  try {
    // Request shape follows the source; check it against your installed intercom-client version.
    const contact = await client.contacts.find({ contactId });
    cache.set(contactId, contact); // Update cache on success
    return contact;
  } catch (err) {
    // Only rate limiting (429) and server errors (5xx) qualify for the stale fallback.
    if (err instanceof IntercomError && (err.statusCode === 429 || (err.statusCode ?? 0) >= 500)) {
      // Return stale cached data during outages
      const cached = cache.get(contactId);
      if (cached) {
        console.warn(`[Intercom] Serving cached data for ${contactId} due to ${err.statusCode}`);
        return { ...cached, _stale: true };
      }
    }
    throw err;
  }
}
```
Communication Templates
Internal Slack
```
[P1] INCIDENT: Intercom Integration
Status: INVESTIGATING
Impact: [Customer conversations not loading / messages not sending]
Cause: [Intercom API returning 5xx / our token expired / rate limited]
Action: [Enabling fallback / rotating token / pausing sync jobs]
Next update: [Time]
Commander: @[name]
```
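If updates are posted by a bot or script, the template can be rendered from structured fields so every update has the same shape. This is a sketch only: the `IncidentUpdate` interface and `renderSlackUpdate` function are hypothetical names, and actually sending the message to Slack is left out.

```typescript
// Fields mirror the Slack template above, one per line.
interface IncidentUpdate {
  severity: "P1" | "P2" | "P3" | "P4";
  status: string;     // e.g. "INVESTIGATING"
  impact: string;
  cause: string;
  action: string;
  nextUpdate: string; // e.g. "14:30 UTC"
  commander: string;  // Slack handle without the leading "@"
}

// Render the update as plain text, ready to post to a channel.
function renderSlackUpdate(u: IncidentUpdate): string {
  return [
    `[${u.severity}] INCIDENT: Intercom Integration`,
    `Status: ${u.status}`,
    `Impact: ${u.impact}`,
    `Cause: ${u.cause}`,
    `Action: ${u.action}`,
    `Next update: ${u.nextUpdate}`,
    `Commander: @${u.commander}`,
  ].join("\n");
}
```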
Postmortem Template
```markdown
## Incident: Intercom [Type]

**Date:** YYYY-MM-DD HH:MM - HH:MM UTC
**Duration:** X hours Y minutes
**Severity:** P[1-4]
**Intercom request_ids:** [req_abc123, req_def456]

### Summary
[1-2 sentences describing what happened and user impact]

### Timeline
- HH:MM - First alert: [what triggered]
- HH:MM - Triage started: [findings]
- HH:MM - Mitigation: [action taken]
- HH:MM - Resolution: [what fixed it]

### Root Cause
[Technical explanation of why it happened]

### Impact
- Conversations affected: N
- Users unable to reach support: N
- Duration of degraded service: Xm

### Action Items
- [ ] [Preventive measure] - Owner - Due
- [ ] [Monitoring gap to fill] - Owner - Due
- [ ] [Documentation to update] - Owner - Due
```
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Triage script fails | Token not set | Export INTERCOM_ACCESS_TOKEN |
| Status page unreachable | DNS/network | Try mobile network or VPN |
| Can't rotate token | No Developer Hub access | Escalate to workspace admin |
| Cache empty during outage | No pre-warming | Implement cache warming job |
Next Steps
For data handling compliance, see intercom-data-handling.