Claude-code-plugins-plus-skills posthog-incident-runbook

install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/posthog-pack/skills/posthog-incident-runbook" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-posthog-incident-runbook && rm -rf "$T"
manifest: plugins/saas-packs/posthog-pack/skills/posthog-incident-runbook/SKILL.md

PostHog Incident Runbook

Overview

Rapid incident response for PostHog integration failures. PostHog Cloud has its own status page (status.posthog.com), so the first step is always to determine whether the problem is on PostHog's side or in your integration.

Severity Levels

| Level | Definition | Response Time | Examples |
|-------|------------|---------------|----------|
| P1 | Analytics completely down | < 15 min | All capture calls failing, feature flags returning defaults |
| P2 | Degraded analytics | < 1 hour | High latency, partial event loss, slow flag eval |
| P3 | Minor impact | < 4 hours | Webhook delays, specific event type missing |
| P4 | No user impact | Next day | Monitoring gaps, dashboard stale data |

Quick Triage (Run First)

set -euo pipefail
echo "=== PostHog Triage ==="
echo ""

# 1. Is PostHog Cloud up?
echo -n "PostHog US Cloud: "
curl -sf -o /dev/null -w "%{http_code}" https://us.i.posthog.com/healthz || echo "UNREACHABLE"
echo ""

# 2. Can we capture events?
echo -n "Event capture: "
curl -sf -o /dev/null -w "%{http_code}" -X POST 'https://us.i.posthog.com/capture/' \
  -H 'Content-Type: application/json' \
  -d "{\"api_key\":\"${NEXT_PUBLIC_POSTHOG_KEY}\",\"event\":\"triage_test\",\"distinct_id\":\"triage\"}" || echo "FAILED"
echo ""

# 3. Can we evaluate flags?
echo -n "Flag evaluation: "
curl -sf -o /dev/null -w "%{http_code}" -X POST 'https://us.i.posthog.com/decide/?v=3' \
  -H 'Content-Type: application/json' \
  -d "{\"api_key\":\"${NEXT_PUBLIC_POSTHOG_KEY}\",\"distinct_id\":\"triage\"}" || echo "FAILED"
echo ""

# 4. Can we access admin API?
if [ -n "${POSTHOG_PERSONAL_API_KEY:-}" ]; then
  echo -n "Admin API: "
  curl -sf -o /dev/null -w "%{http_code}" "https://app.posthog.com/api/projects/" \
    -H "Authorization: Bearer $POSTHOG_PERSONAL_API_KEY" || echo "FAILED"
  echo ""
fi

# 5. Check our integration health
echo -n "Our health endpoint: "
curl -sf -o /dev/null -w "%{http_code}" "https://your-app.com/api/health" || echo "UNREACHABLE"
echo ""
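
The same probes can also be run from application code, for example inside a health endpoint. The following is a minimal sketch rather than an official client: `Probe`, `TriageResult`, and `runTriage` are hypothetical names, and the probe function is injected so the logic can be exercised without network access. The URLs and payloads are the ones used in the shell script above.

```typescript
// Hypothetical fetch-like signature, injected so the checks can be stubbed in tests.
type Probe = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ status: number }>;

interface TriageResult {
  check: string;
  status: number | "UNREACHABLE";
}

// Runs the health, capture, and flag probes and records the HTTP status of each.
async function runTriage(probe: Probe, projectKey: string): Promise<TriageResult[]> {
  const host = "https://us.i.posthog.com";
  const json = { "Content-Type": "application/json" };
  const checks: Array<{ check: string; run: () => Promise<{ status: number }> }> = [
    { check: "cloud", run: () => probe(`${host}/healthz`) },
    {
      check: "capture",
      run: () =>
        probe(`${host}/capture/`, {
          method: "POST",
          headers: json,
          body: JSON.stringify({ api_key: projectKey, event: "triage_test", distinct_id: "triage" }),
        }),
    },
    {
      check: "flags",
      run: () =>
        probe(`${host}/decide/?v=3`, {
          method: "POST",
          headers: json,
          body: JSON.stringify({ api_key: projectKey, distinct_id: "triage" }),
        }),
    },
  ];

  const results: TriageResult[] = [];
  for (const { check, run } of checks) {
    try {
      results.push({ check, status: (await run()).status });
    } catch {
      // Network-level failure (DNS, TLS, timeout): no HTTP status is available.
      results.push({ check, status: "UNREACHABLE" });
    }
  }
  return results;
}
```

In a runtime with a global `fetch`, that function can be passed in as the probe. A non-2xx status is recorded as-is rather than treated as unreachable, so a 401 or 429 stays visible for the decision tree.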

Decision Tree

Is PostHog Cloud healthy (status.posthog.com)?
├── NO → PostHog outage
│   ├── Enable graceful degradation (feature flags return defaults)
│   ├── Monitor status.posthog.com for resolution
│   └── Events will be lost during outage (capture is fire-and-forget)
│
└── YES → Our integration issue
    ├── Are we getting 401? → API key issue (see Error 401 below)
    ├── Are we getting 429? → Rate limited (see Error 429 below)
    ├── Are events just not appearing? → Check flush/shutdown (see below)
    └── Are flags returning defaults? → Check personalApiKey (see below)
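
The tree above can be encoded as a small pure function, which is handy for alert routing or an on-call bot. This is a sketch under stated assumptions: the `Symptoms` fields and the `Action` labels are hypothetical names, and the final `collect-evidence` fallback is an addition for the case where no branch matches.

```typescript
// Hypothetical encoding of the decision tree; field and action names are illustrative.
interface Symptoms {
  cloudHealthy: boolean; // result of checking status.posthog.com
  httpStatus?: number; // status code seen on failing PostHog calls, if any
  eventsMissing?: boolean; // events were sent but do not appear in PostHog
  flagsReturnDefaults?: boolean; // flags evaluate to their default values
}

type Action =
  | "degrade-and-monitor"
  | "check-api-keys"
  | "reduce-request-rate"
  | "check-flush-shutdown"
  | "check-personal-api-key"
  | "collect-evidence";

// Walks the branches in the same order as the tree: outage first, then error codes, then symptoms.
function nextAction(s: Symptoms): Action {
  if (!s.cloudHealthy) return "degrade-and-monitor";
  if (s.httpStatus === 401 || s.httpStatus === 403) return "check-api-keys";
  if (s.httpStatus === 429) return "reduce-request-rate";
  if (s.eventsMissing) return "check-flush-shutdown";
  if (s.flagsReturnDefaults) return "check-personal-api-key";
  return "collect-evidence"; // Assumption: nothing matched, so gather diagnostics.
}
```

The PostHog-side check comes first on purpose: during an outage the other symptoms are expected and should not trigger key rotation or configuration changes.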

Immediate Actions by Error Type

401/403 — Authentication Failed

set -euo pipefail
# Verify API key type and validity.
# Project keys normally start with phc_; personal keys start with phx_.
# The ${VAR:-} form keeps "set -u" from aborting when a key is not set.
echo "Project key prefix: $(printf '%s' "${NEXT_PUBLIC_POSTHOG_KEY:-}" | head -c 4)"
echo "Personal key prefix: $(printf '%s' "${POSTHOG_PERSONAL_API_KEY:-}" | head -c 4)"

# Test project key (should return HTTP 200)
curl -s -o /dev/null -w "Capture: %{http_code}\n" -X POST 'https://us.i.posthog.com/capture/' \
  -H 'Content-Type: application/json' \
  -d "{\"api_key\":\"${NEXT_PUBLIC_POSTHOG_KEY:-}\",\"event\":\"test\",\"distinct_id\":\"test\"}"

# Test personal key (should return HTTP 200 with the project list)
curl -s -o /dev/null -w "Admin: %{http_code}\n" "https://app.posthog.com/api/projects/" \
  -H "Authorization: Bearer ${POSTHOG_PERSONAL_API_KEY:-}"

# Fix: If a key is invalid, rotate it in the PostHog dashboard and update secrets
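
A frequent cause of 401s is a key of the wrong type in the wrong variable. The prefix check from the script can be done at application startup as well. This sketch assumes the usual prefix convention (`phc_` for project keys, `phx_` for personal keys); `keyKind` and `keyProblems` are hypothetical helper names.

```typescript
type KeyKind = "project" | "personal" | "unknown";

// Classifies a key by prefix so the secret itself never has to be logged.
// Assumption: project keys start with "phc_" and personal keys with "phx_".
function keyKind(key: string | undefined): KeyKind {
  if (!key) return "unknown";
  if (key.startsWith("phc_")) return "project";
  if (key.startsWith("phx_")) return "personal";
  return "unknown";
}

// Returns a list of human-readable problems; an empty list means both keys look plausible.
// The personal key is optional, matching the triage script, and is only checked when set.
function keyProblems(projectKey?: string, personalKey?: string): string[] {
  const problems: string[] = [];
  if (keyKind(projectKey) !== "project") {
    problems.push("project key is missing or does not look like a phc_ key");
  }
  if (personalKey !== undefined && keyKind(personalKey) !== "personal") {
    problems.push("personal key does not look like a phx_ key");
  }
  return problems;
}
```

A prefix match only shows that the key has the right type. Whether it is still valid has to be confirmed with the curl checks above.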

429 — Rate Limited

set -euo pipefail
# PostHog rate limits (private API only):
# - Analytics endpoints: 240/min, 1200/hour
# - HogQL query: 1200/hour
# - Local flag eval polling: 600/min
# - Capture endpoints: NO LIMIT

# Immediate: Cache API responses, reduce polling frequency
# Long-term: See posthog-rate-limits skill
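
One way to implement the "cache API responses" step is a small time-based cache in front of private API calls. This is a generic sketch, not part of any PostHog SDK: `TtlCache` is a hypothetical name, and the clock is injectable only so the expiry logic can be tested.

```typescript
// Hypothetical TTL cache for private API responses; not part of the PostHog SDK.
class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value while it is fresh; otherwise calls load() and stores the result.
  async get(key: string, load: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;
    const value = await load();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

For scale: the 1200/hour analytics limit works out to a sustained 20 requests per minute, so a 60-second TTL caps each distinct query at 60 requests per hour regardless of how often callers ask. Failed loads are not cached, so a 429 from `load()` propagates to the caller and the next call retries.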

Events Not Appearing

set -euo pipefail
# Most common cause: not calling flush/shutdown in serverless

# Check 1: Is capture endpoint reachable?
curl -s -X POST 'https://us.i.posthog.com/capture/' \
  -H 'Content-Type: application/json' \
  -d "{\"api_key\":\"$NEXT_PUBLIC_POSTHOG_KEY\",\"event\":\"debug_test\",\"distinct_id\":\"debug-$(date +%s)\"}" | jq .
# Expected: {"status": 1}

# Check 2: Verify API host is correct (common mistake)
# WRONG: https://app.posthog.com (this is the UI)
# RIGHT: https://us.i.posthog.com (this is the ingest endpoint)
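
For the serverless case named above as the most common cause, the fix is to flush before the function returns. A minimal sketch: `FlushableClient` is a hypothetical interface covering only the `capture` and `shutdown` calls used in this runbook, so a posthog-node client is assumed to satisfy it and a stub can stand in during tests. `withFlush` is an illustrative name.

```typescript
// Minimal subset of the client used here, so a stub can replace it in tests.
interface FlushableClient {
  capture(event: { distinctId: string; event: string; properties?: Record<string, unknown> }): void;
  shutdown(): Promise<void>;
}

// Runs the handler, then always flushes queued events before the
// serverless runtime can freeze or terminate the process.
async function withFlush<T>(
  client: FlushableClient,
  handler: (client: FlushableClient) => Promise<T>,
): Promise<T> {
  try {
    return await handler(client);
  } finally {
    try {
      await client.shutdown();
    } catch {
      // A failed flush must not turn a successful request into an error.
    }
  }
}
```

This assumes one client per invocation, because a client that has been shut down should not be reused. The `finally` block means events queued before an exception in the handler are still flushed.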

Feature Flags Returning Defaults

// Most common causes:
// 1. No personalApiKey → the server SDK falls back to remote evaluation, which may fail
// 2. Flags not loaded yet → check timing
// 3. Wrong project key → flags come from a different project

// Fix 1 (server, posthog-node): add personalApiKey
import { PostHog } from 'posthog-node';

const posthog = new PostHog(process.env.NEXT_PUBLIC_POSTHOG_KEY!, {
  personalApiKey: process.env.POSTHOG_PERSONAL_API_KEY, // Required for local eval
});

// Fix 2 (browser, posthog-js): wait for flags before reading them.
// Here `posthog` is the posthog-js client, not the posthog-node instance above.
posthog.onFeatureFlags(() => {
  // Now flags are loaded
  const value = posthog.isFeatureEnabled('my-flag');
});

Graceful Degradation Pattern

// PostHog should NEVER crash your app.
// `posthog` is a server-side posthog-node client.
function safeCapture(distinctId: string, event: string, props?: Record<string, unknown>) {
  try {
    posthog.capture({ distinctId, event, properties: props });
  } catch {
    // Swallow the error: an analytics failure should never impact users
  }
}

async function safeFlag(key: string, userId: string, fallback: boolean = false): Promise<boolean> {
  try {
    const result = await posthog.isFeatureEnabled(key, userId);
    return result ?? fallback;
  } catch {
    return fallback; // Return safe default
  }
}
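
Catching errors does not help when a flag lookup hangs instead of failing, which can happen during a P2-style latency incident. One possible extension is to bound the lookup with a timeout. This is a sketch, not SDK behavior: `flagWithTimeout` is a hypothetical helper, and the lookup is passed in as a function so it works with any client.

```typescript
// Hypothetical helper: resolves to `fallback` if the lookup rejects,
// returns undefined, or takes longer than `timeoutMs`.
async function flagWithTimeout(
  lookup: () => Promise<boolean | undefined>,
  fallback: boolean,
  timeoutMs: number,
): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(fallback), timeoutMs);
  });
  try {
    // Whichever settles first wins: the real lookup or the timeout.
    const result = await Promise.race([lookup(), timeout]);
    return result ?? fallback;
  } catch {
    return fallback; // Same safe default as safeFlag
  } finally {
    // Clear the timer so it does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

With the client from this runbook, a call might look like `flagWithTimeout(() => posthog.isFeatureEnabled('my-flag', userId), false, 500)`. The 500 ms budget is an arbitrary example value.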

Post-Incident Evidence Collection

set -euo pipefail
INCIDENT_DIR="posthog-incident-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$INCIDENT_DIR"

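# Collect diagnostics. Each probe ends in "|| true" so that a single failure
# (PostHog unreachable, no matching env vars, package not installed) does not
# abort the script under "set -euo pipefail" before the bundle is written.
echo "Incident: $(date -u)" > "$INCIDENT_DIR/timeline.txt"
curl -s https://us.i.posthog.com/healthz > "$INCIDENT_DIR/healthz.json" 2>&1 || true
env | grep -i posthog | sed 's/=.*/=***/' > "$INCIDENT_DIR/env-redacted.txt" || true
npm list posthog-js posthog-node 2>/dev/null > "$INCIDENT_DIR/versions.txt" || true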

tar -czf "$INCIDENT_DIR.tar.gz" "$INCIDENT_DIR"
echo "Evidence collected: $INCIDENT_DIR.tar.gz"
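
When the evidence has to be collected from inside the application rather than a shell, the redaction step can be reproduced in code. A small sketch: `redactPosthogEnv` is a hypothetical helper that mirrors the `grep | sed` pipeline above, with one difference: it matches on variable names only, while `grep -i posthog` also matches lines whose value contains "posthog".

```typescript
// Keeps PostHog-related variable names and masks every value,
// so the output is safe to attach to an incident ticket.
function redactPosthogEnv(env: Record<string, string | undefined>): string[] {
  return Object.keys(env)
    .filter((name) => name.toLowerCase().includes("posthog"))
    .sort()
    .map((name) => `${name}=***`);
}
```

In Node, `redactPosthogEnv(process.env).join("\n")` would produce content equivalent to `env-redacted.txt`, sorted by name.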

Error Handling

| Issue | Cause | Solution |
|-------|-------|----------|
| Complete analytics outage | PostHog Cloud down | Enable graceful degradation, monitor status page |
| Partial event loss | Serverless not flushing | Add await posthog.shutdown() |
| All flags return false | personalApiKey missing or expired | Add/rotate personal API key |
| Admin API 401 | Personal key revoked | Generate new key in PostHog settings |
| High latency | Network path to PostHog | Check reverse proxy, try direct connection |

Output

  • Triage commands identifying issue source
  • Immediate remediation for each error type
  • Graceful degradation wrappers
  • Post-incident evidence bundle

Resources

Next Steps

For data handling, see posthog-data-handling.