Claude-code-plugins-plus-skills intercom-incident-runbook

install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/intercom-pack/skills/intercom-incident-runbook" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-intercom-incident-runbook && rm -rf "$T"
manifest: plugins/saas-packs/intercom-pack/skills/intercom-incident-runbook/SKILL.md

Intercom Incident Runbook

Overview

Rapid incident response procedures for Intercom integration failures: triage by HTTP status code, mitigation steps for each error type, communication templates, and a postmortem template.

Severity Levels

| Level | Definition | Response Time | Example |
|-------|------------|---------------|---------|
| P1 | All Intercom API calls failing | < 15 min | 401 auth failures, API unreachable |
| P2 | Degraded service | < 1 hour | High latency, rate limited (429) |
| P3 | Partial impact | < 4 hours | Webhook delays, search timeouts |
| P4 | No user impact | Next business day | Monitoring gaps, stale cache |
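The table above can be expressed as a small classifier so that alerts carry a severity from the start. This is a minimal sketch: the `IncidentSignals` shape and the `classifySeverity` name are hypothetical, and the mapping simply mirrors the table's examples rather than any Intercom-defined scheme.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";

// Hypothetical signal set; populate it from your own monitoring.
interface IncidentSignals {
  allCallsFailing: boolean; // e.g. 401 on every request, or API unreachable
  rateLimited: boolean;     // 429 responses observed
  highLatency: boolean;
  webhookDelays: boolean;
  searchTimeouts: boolean;
}

// Checks the most severe condition first, so overlapping signals
// resolve to the highest applicable level.
function classifySeverity(s: IncidentSignals): Severity {
  if (s.allCallsFailing) return "P1";
  if (s.rateLimited || s.highLatency) return "P2";
  if (s.webhookDelays || s.searchTimeouts) return "P3";
  return "P4";
}

// Response-time targets copied from the severity table.
const responseTarget: Record<Severity, string> = {
  P1: "< 15 min",
  P2: "< 1 hour",
  P3: "< 4 hours",
  P4: "Next business day",
};
```

For example, `responseTarget[classifySeverity(signals)]` gives the response window to put in the first incident message.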

Quick Triage (Copy-Paste)

#!/bin/bash
echo "=== Intercom Incident Triage ==="

# 1. Is Intercom's API responding?
echo -n "1. API reachable: "
curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer $INTERCOM_ACCESS_TOKEN" \
  https://api.intercom.io/me
echo ""

# 2. Is there a platform-wide incident?
echo -n "2. Intercom status: "
curl -s https://status.intercom.com/api/v2/status.json | jq -r '.status.description'

# 3. Active incidents on Intercom's side?
echo -n "3. Active incidents: "
curl -s https://status.intercom.com/api/v2/incidents/unresolved.json | jq '.incidents | length'

# 4. Rate limit status
echo -n "4. Rate limit remaining: "
curl -s -D - -o /dev/null \
  -H "Authorization: Bearer $INTERCOM_ACCESS_TOKEN" \
  https://api.intercom.io/me 2>/dev/null | grep -i x-ratelimit-remaining | awk '{print $2}' | tr -d '\r'
# tr strips the trailing carriage return that HTTP header lines carry

# 5. Our health check
echo -n "5. Our integration health: "
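# jq -e exits non-zero on null or empty input, so UNKNOWN also prints when curl returns nothing
curl -s https://your-app.com/health | jq -er '.services.intercom.status' 2>/dev/null || echo "UNKNOWN"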
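If the same checks run from application code rather than a shell, steps 2 and 3 of the script reduce to interpreting two JSON documents. The sketch below assumes the standard Statuspage v2 shapes that the `jq` filters above already rely on (`.status.description`, `.incidents`); the `indicator` field and the helper names are assumptions to verify against the live responses.

```typescript
// Only the fields this sketch reads; the real payloads contain more.
interface StatusSummary {
  status: { indicator: string; description: string };
}
interface UnresolvedList {
  incidents: { name: string }[];
}

// Hypothetical helper: true when the vendor side looks unhealthy.
// Assumes Statuspage reports indicator "none" when fully operational.
function platformIncidentActive(summary: StatusSummary, unresolved: UnresolvedList): boolean {
  return summary.status.indicator !== "none" || unresolved.incidents.length > 0;
}

// One-line summary suitable for an incident channel.
function describePlatform(summary: StatusSummary, unresolved: UnresolvedList): string {
  const names = unresolved.incidents.map((i) => i.name).join("; ");
  return names.length > 0
    ? `${summary.status.description} (unresolved: ${names})`
    : summary.status.description;
}
```

Fetching the two documents is left out on purpose so the interpretation logic stays testable without network access.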

Decision Tree

API returning errors?
├── YES ──▶ Check status.intercom.com
│           ├── Incident reported ──▶ Intercom's problem
│           │   → Enable graceful degradation
│           │   → Monitor for resolution
│           │   → No code or config fix needed on our side
│           └── No incident ──▶ Our integration issue
│               ├── 401 → Token expired/revoked → Rotate token
│               ├── 403 → Scope missing → Add OAuth scope
│               ├── 429 → Rate limited → Enable queue/backoff
│               └── 5xx → Server error → Retry with backoff
└── NO ──▶ Is our service healthy?
           ├── YES → Resolved or intermittent → Monitor
           └── NO → Our infrastructure issue
               → Check pods, memory, network, DNS
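The tree above can also be encoded as a pure function, which is handy for alert annotations or a chat-bot command. This is a sketch only: `TreeInput` and `nextStep` are hypothetical names, and the final fallback branch covers cases the tree does not mention.

```typescript
interface TreeInput {
  statusCode: number | null; // last HTTP status from Intercom; null if no response at all
  platformIncident: boolean; // an incident is reported on Intercom's status page
  serviceHealthy: boolean;   // result of our own health check
}

function nextStep(i: TreeInput): string {
  const apiErroring = i.statusCode === null || i.statusCode >= 400;

  // Right-hand side of the tree: the API is not returning errors.
  if (!apiErroring) {
    return i.serviceHealthy
      ? "Resolved or intermittent: monitor"
      : "Our infrastructure issue: check pods, memory, network, DNS";
  }

  // Left-hand side: errors during a reported Intercom incident.
  if (i.platformIncident) {
    return "Intercom's problem: enable graceful degradation and monitor for resolution";
  }

  // Errors with no reported incident: treat as our integration issue.
  if (i.statusCode === 401) return "Token expired or revoked: rotate token";
  if (i.statusCode === 403) return "Scope missing: add OAuth scope";
  if (i.statusCode === 429) return "Rate limited: enable queue/backoff";
  if (i.statusCode !== null && i.statusCode >= 500) return "Server error: retry with backoff";

  // Not part of the tree above: other 4xx codes or no response at all.
  return "Not covered by the tree: inspect the response and our network path";
}
```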

Mitigation by Error Type

401 - Authentication Failed

# Verify token is valid
curl -s -H "Authorization: Bearer $INTERCOM_ACCESS_TOKEN" \
  https://api.intercom.io/me | jq '.type'
# Expected: "admin"
# If error: Token is invalid or revoked

# IMMEDIATE: Regenerate token
# Developer Hub > Your App > Authentication > Generate new token
# Update in secret manager:
aws secretsmanager update-secret \
  --secret-id intercom/production/token \
  --secret-string "new_token_here"

# Restart application to pick up new token
kubectl rollout restart deployment/intercom-service

429 - Rate Limited

# Check rate limit headers
curl -s -D - -o /dev/null \
  -H "Authorization: Bearer $INTERCOM_ACCESS_TOKEN" \
  https://api.intercom.io/me 2>/dev/null | grep -i "x-ratelimit"

# Immediate: Reduce request volume
# - Pause any batch/sync jobs
# - Enable request queuing if available

# Check if multiple apps are consuming workspace quota
# Default limits: 10,000 req/min per app, 25,000 req/min per workspace across all apps
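For the "enable queue/backoff" step, a client needs to decide how long to wait after a 429. The sketch below assumes the `X-RateLimit-Reset` header carries a Unix timestamp in seconds and that header names have been lower-cased by the HTTP client; both are assumptions to confirm against real responses. `rateLimitDelayMs` is a hypothetical helper, and the caps are arbitrary starting points.

```typescript
// Returns how long to wait before the next attempt, in milliseconds.
// headers: response headers with lower-cased names (assumption).
// attempt: zero-based retry counter. nowMs: current time, injected for testability.
function rateLimitDelayMs(
  headers: Record<string, string | undefined>,
  attempt: number,
  nowMs: number
): number {
  const resetSeconds = Number(headers["x-ratelimit-reset"]);
  if (Number.isFinite(resetSeconds) && resetSeconds > 0) {
    const untilReset = resetSeconds * 1000 - nowMs;
    // Trust the header when it points to the future, but never sleep more than 60s.
    if (untilReset > 0) return Math.min(untilReset, 60000);
  }
  // Header missing or already in the past: exponential backoff 1s, 2s, 4s, ... capped at 30s.
  return Math.min(1000 * Math.pow(2, attempt), 30000);
}
```

In production code, call it as `rateLimitDelayMs(headers, attempt, Date.now())` and add a little random jitter so that parallel workers do not retry in lockstep.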

5xx - Intercom Server Errors

# 1. Check Intercom status
curl -s https://status.intercom.com/api/v2/status.json | jq .

# 2. Enable graceful degradation
# Your app should serve cached data or fallback UI

# 3. Track request_id from error responses for Intercom support
# Error response includes: { "request_id": "req_abc123" }
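Step 3 is easier to follow under pressure if request IDs are collected automatically. A minimal sketch, assuming only that error bodies carry a top-level `request_id` string as shown above; `extractRequestId` and `collectRequestIds` are hypothetical helpers, not part of any SDK.

```typescript
// Pulls request_id out of a parsed error body, tolerating unexpected shapes.
function extractRequestId(body: unknown): string | undefined {
  if (typeof body !== "object" || body === null) return undefined;
  const id = (body as { request_id?: unknown }).request_id;
  return typeof id === "string" && id.length > 0 ? id : undefined;
}

// De-duplicates IDs across many failed responses, preserving first-seen order.
function collectRequestIds(bodies: unknown[]): string[] {
  const seen = new Set<string>();
  for (const body of bodies) {
    const id = extractRequestId(body);
    if (id !== undefined) seen.add(id);
  }
  return Array.from(seen);
}
```

The resulting list can be pasted into a support ticket or into the postmortem's request ID field.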

Graceful Degradation Pattern

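import { IntercomClient, IntercomError } from "intercom-client";
import { LRUCache } from "lru-cache";

// Client used below; reads the same token as the triage script
const client = new IntercomClient({ token: process.env.INTERCOM_ACCESS_TOKEN ?? "" });

// Fallback cache: up to 10,000 contacts, entries expire after 1 hour
const cache = new LRUCache<string, any>({ max: 10000, ttl: 3600000 });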

async function getContactWithFallback(contactId: string): Promise<any> {
  try {
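    // Request field name assumes intercom-client v6 (snake_case); adjust if your SDK version differs
    const contact = await client.contacts.find({ contact_id: contactId });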
    cache.set(contactId, contact); // Update cache on success
    return contact;
  } catch (err) {
    if (err instanceof IntercomError && (err.statusCode === 429 || (err.statusCode ?? 0) >= 500)) {
      // Return stale cached data during outages
      const cached = cache.get(contactId);
      if (cached) {
        console.warn(`[Intercom] Serving cached data for ${contactId} due to ${err.statusCode}`);
        return { ...cached, _stale: true };
      }
    }
    throw err;
  }
}

Communication Templates

Internal Slack

[P1] INCIDENT: Intercom Integration
Status: INVESTIGATING
Impact: [Customer conversations not loading / messages not sending]
Cause: [Intercom API returning 5xx / our token expired / rate limited]
Action: [Enabling fallback / rotating token / pausing sync jobs]
Next update: [Time]
Commander: @[name]
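To keep updates consistent across responders, the template above can be filled from a structured object. This is a sketch with hypothetical names (`IncidentUpdate`, `renderSlackUpdate`); it only builds the message text and leaves posting to whatever Slack integration is already in place.

```typescript
interface IncidentUpdate {
  severity: "P1" | "P2" | "P3" | "P4";
  status: string;     // e.g. INVESTIGATING
  impact: string;
  cause: string;
  action: string;
  nextUpdate: string; // free-form time, e.g. "14:30 UTC"
  commander: string;  // Slack handle without the leading @
}

// Renders the seven-line internal update in the same order as the template.
function renderSlackUpdate(u: IncidentUpdate): string {
  return [
    `[${u.severity}] INCIDENT: Intercom Integration`,
    `Status: ${u.status}`,
    `Impact: ${u.impact}`,
    `Cause: ${u.cause}`,
    `Action: ${u.action}`,
    `Next update: ${u.nextUpdate}`,
    `Commander: @${u.commander}`,
  ].join("\n");
}
```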

Postmortem Template

## Incident: Intercom [Type]
**Date:** YYYY-MM-DD HH:MM - HH:MM UTC
**Duration:** X hours Y minutes
**Severity:** P[1-4]
**Intercom request_ids:** [req_abc123, req_def456]

### Summary
[1-2 sentences describing what happened and user impact]

### Timeline
- HH:MM - First alert: [what triggered]
- HH:MM - Triage started: [findings]
- HH:MM - Mitigation: [action taken]
- HH:MM - Resolution: [what fixed it]

### Root Cause
[Technical explanation of why it happened]

### Impact
- Conversations affected: N
- Users unable to reach support: N
- Duration of degraded service: Xm

### Action Items
- [ ] [Preventive measure] - Owner - Due
- [ ] [Monitoring gap to fill] - Owner - Due
- [ ] [Documentation to update] - Owner - Due
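The Duration field in the template is easy to get wrong by hand when an incident crosses an hour boundary. A small sketch that derives it from the start and end timestamps; `formatIncidentDuration` is a hypothetical helper and expects ISO 8601 strings with an explicit time zone.

```typescript
// Formats the elapsed time between two ISO 8601 timestamps in the
// "X hours Y minutes" form used by the postmortem template.
function formatIncidentDuration(startIso: string, endIso: string): string {
  const elapsedMs = Date.parse(endIso) - Date.parse(startIso);
  if (!Number.isFinite(elapsedMs) || elapsedMs < 0) {
    throw new Error("Timestamps are invalid or end precedes start");
  }
  const totalMinutes = Math.floor(elapsedMs / 60000);
  const hours = Math.floor(totalMinutes / 60);
  const minutes = totalMinutes % 60;
  return `${hours} hours ${minutes} minutes`;
}
```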

Error Handling

| Issue | Cause | Solution |
|-------|-------|----------|
| Triage script fails | Token not set | Export INTERCOM_ACCESS_TOKEN |
| Status page unreachable | DNS/network | Try mobile network or VPN |
| Can't rotate token | No Developer Hub access | Escalate to workspace admin |
| Cache empty during outage | No pre-warming | Implement cache warming job |

Next Steps

For data handling compliance, see intercom-data-handling.