Claude-code-plugins clerk-incident-runbook
install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/clerk-pack/skills/clerk-incident-runbook" ~/.claude/skills/jeremylongshore-claude-code-plugins-clerk-incident-runbook && rm -rf "$T"
manifest:
plugins/saas-packs/clerk-pack/skills/clerk-incident-runbook/SKILL.md
Clerk Incident Runbook
Overview
Procedures for responding to Clerk-related incidents in production. Covers triage, emergency auth bypass, recovery scripts, and post-incident review.
Prerequisites
- Access to Clerk Dashboard (dashboard.clerk.com)
- Access to application logs and monitoring
- Emergency contact list for on-call team
- Rollback procedures documented
Instructions
Step 1: Triage — Identify Incident Category
| Category | Symptoms | Severity |
|---|---|---|
| Clerk outage | status.clerk.com shows incident, all auth fails | Critical |
| Key compromise | Unauthorized access detected | Critical |
| Middleware failure | All routes return 500 | High |
| Session issues | Users randomly logged out | Medium |
| Webhook backlog | User sync falling behind | Low |
Quick diagnostic:
#!/bin/bash
# scripts/clerk-triage.sh
set -euo pipefail

echo "=== Clerk Incident Triage ==="
echo "Time: $(date -u +%Y-%m-%dT%H:%M:%SZ)"

# 1. Check Clerk status
echo -e "\n--- Clerk Status ---"
curl -s https://status.clerk.com/api/v2/status.json | python3 -c "
import json,sys
d=json.load(sys.stdin)
print(f\"Status: {d['status']['description']}\")" 2>/dev/null || echo "Cannot reach status API"

# 2. Check API connectivity
echo -e "\n--- API Connectivity ---"
# ${CLERK_SECRET_KEY:-} keeps `set -u` from aborting when the key is unset;
# `|| true` keeps the script running (curl reports 000) when the API is unreachable.
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer ${CLERK_SECRET_KEY:-}" \
  "https://api.clerk.com/v1/users?limit=1" 2>/dev/null || true)
echo "API response: HTTP $HTTP_CODE"

# 3. Check app health
echo -e "\n--- App Health ---"
curl -s http://localhost:3000/api/clerk-health 2>/dev/null | python3 -m json.tool || echo "App not reachable"
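The three signals the triage script prints can be mapped onto the category table with a small pure function. The following is only a sketch: the type names, field names, and `classifyIncident` itself are hypothetical, and the 401/403 branch is an assumption that does not appear in the table above. It assumes the status page reports an `indicator` of `none` when all systems are operational.

```typescript
// Hypothetical types: field names are illustrative, not part of any Clerk API.
type Severity = 'Critical' | 'High' | 'Medium' | 'Low' | 'Unknown'

interface TriageSignals {
  statusIndicator: string // "indicator" field of status.json, e.g. "none" or "major"
  apiHttpCode: number     // HTTP code from the /v1/users probe (0 if unreachable)
  appHealthy: boolean     // whether the app health endpoint answered
}

interface TriageResult {
  category: string
  severity: Severity
}

function classifyIncident(s: TriageSignals): TriageResult {
  // Clerk reports an incident, or its API is unreachable or failing server-side
  if (s.statusIndicator !== 'none' || s.apiHttpCode === 0 || s.apiHttpCode >= 500) {
    return { category: 'Clerk outage', severity: 'Critical' }
  }
  // Assumption (not in the table): Clerk is up but rejects the secret key
  if (s.apiHttpCode === 401 || s.apiHttpCode === 403) {
    return { category: 'Secret key rejected', severity: 'Critical' }
  }
  // Clerk is healthy and the key works, so the fault is on the application side
  if (!s.appHealthy) {
    return { category: 'Middleware failure', severity: 'High' }
  }
  // Session and webhook problems need log inspection; these probes cannot see them
  return { category: 'No platform-level fault detected', severity: 'Unknown' }
}
```

Session issues and webhook backlogs are deliberately left out, since none of the three probes can detect them.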
Step 2: Emergency Auth Bypass (Clerk Outage Only)
// middleware.ts: emergency bypass mode
import { clerkMiddleware, createRouteMatcher } from '@clerk/nextjs/server'
import { NextResponse } from 'next/server'

const EMERGENCY_BYPASS = process.env.CLERK_EMERGENCY_BYPASS === 'true'
const isPublicRoute = createRouteMatcher(['/', '/sign-in(.*)', '/sign-up(.*)'])

export default clerkMiddleware(async (auth, req) => {
  // Emergency bypass: allow all requests when Clerk is down
  if (EMERGENCY_BYPASS) {
    console.warn('[EMERGENCY] Auth bypass active: all requests allowed')
    const response = NextResponse.next()
    response.headers.set('X-Auth-Bypass', 'true')
    return response
  }

  if (!isPublicRoute(req)) {
    await auth.protect()
  }
})
Activate bypass:
# Vercel: set env var and redeploy
vercel env add CLERK_EMERGENCY_BYPASS production  # Set to "true"
vercel deploy --prod

# After Clerk recovers: remove bypass
vercel env rm CLERK_EMERGENCY_BYPASS production
vercel deploy --prod
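Because the bypass disables authentication on every route, it may be worth time-boxing it so that a forgotten flag cannot stay active indefinitely. The sketch below is one possible guard, not part of the original runbook: `isBypassActive` and the `CLERK_EMERGENCY_BYPASS_UNTIL` variable are hypothetical names, and the expiry is assumed to be an ISO 8601 timestamp.

```typescript
// Hypothetical helper; CLERK_EMERGENCY_BYPASS_UNTIL is an assumed variable, not a Clerk setting.
function isBypassActive(
  env: Record<string, string | undefined>,
  now: Date = new Date(),
): boolean {
  if (env['CLERK_EMERGENCY_BYPASS'] !== 'true') return false

  // Fail closed: a missing or unparseable expiry means the bypass stays off
  const until = env['CLERK_EMERGENCY_BYPASS_UNTIL']
  if (!until) return false
  const expiresAt = Date.parse(until)
  if (isNaN(expiresAt)) return false

  return now.getTime() < expiresAt
}
```

In the middleware, the `EMERGENCY_BYPASS` constant could then be replaced by a call that passes both variables explicitly, for example `isBypassActive({ CLERK_EMERGENCY_BYPASS: process.env.CLERK_EMERGENCY_BYPASS, CLERK_EMERGENCY_BYPASS_UNTIL: process.env.CLERK_EMERGENCY_BYPASS_UNTIL })`.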
Step 3: Key Rotation (Compromised Secret Key)
#!/bin/bash
# scripts/rotate-clerk-keys.sh
set -euo pipefail

echo "=== Clerk Key Rotation ==="
echo "1. Go to dashboard.clerk.com > API Keys"
echo "2. Generate new Secret Key"
echo "3. Update all environments:"

# Update production
echo "Updating production..."
# vercel env rm CLERK_SECRET_KEY production
# vercel env add CLERK_SECRET_KEY production  # Paste new key
# vercel deploy --prod

echo "4. Verify all endpoints still work"
echo "5. Monitor for unauthorized access attempts"
echo "6. File incident report"
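Before pasting a new key into an environment, a quick sanity check can catch the most common rotation mistakes. This is a minimal sketch: `checkRotatedKey` is a hypothetical helper, and it assumes Clerk secret keys begin with `sk_live_` for production instances and `sk_test_` for development instances.

```typescript
// Hypothetical pre-deploy check; the sk_live_ / sk_test_ prefixes are an assumption
// about Clerk's key naming and should be confirmed against the Dashboard.
function checkRotatedKey(oldKey: string, newKey: string, production: boolean): string[] {
  const problems: string[] = []
  const expectedPrefix = production ? 'sk_live_' : 'sk_test_'

  if (newKey.trim() !== newKey) {
    problems.push('new key has leading or trailing whitespace')
  }
  if (!newKey.trim().startsWith(expectedPrefix)) {
    problems.push(`new key does not start with ${expectedPrefix}`)
  }
  if (newKey === oldKey) {
    problems.push('new key is identical to the compromised key')
  }
  return problems
}
```

An empty array means none of these checks failed. It does not prove the key is valid, so step 4 of the script (verifying endpoints) still applies.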
Step 4: Session Recovery (Mass Logout Fix)
// app/api/admin/refresh-sessions/route.ts
import { auth, clerkClient } from '@clerk/nextjs/server'

export async function POST() {
  const { has } = await auth()
  if (!has({ role: 'org:admin' })) {
    return Response.json({ error: 'Admin only' }, { status: 403 })
  }

  // Force-revoke all active sessions (users will need to re-authenticate)
  const client = await clerkClient()
  const pageSize = 500
  let revoked = 0

  // Page through every user; a single getUserList call returns only one page
  for (let offset = 0; ; offset += pageSize) {
    const users = await client.users.getUserList({ limit: pageSize, offset })
    for (const user of users.data) {
      const sessions = await client.sessions.getSessionList({ userId: user.id })
      for (const session of sessions.data) {
        if (session.status === 'active') {
          await client.sessions.revokeSession(session.id)
          revoked++
        }
      }
    }
    if (users.data.length < pageSize) break
  }

  return Response.json({ revoked, message: `Revoked ${revoked} sessions` })
}
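Revoking sessions one at a time can be slow for a large user base, while firing every request at once risks hitting API rate limits. One option is to work in small batches with a pause between them. The sketch below assumes nothing about Clerk's actual limits: `chunk`, `revokeInBatches`, the batch size of 10, and the one-second pause are all hypothetical choices. The `revoke` callback is injected so the sketch carries no SDK dependency; in the route above it would wrap `client.sessions.revokeSession`.

```typescript
// Split a list into fixed-size batches (the last batch may be smaller)
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error('size must be at least 1')
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Revoke sessions batch by batch, recording failures instead of aborting
async function revokeInBatches(
  sessionIds: string[],
  revoke: (id: string) => Promise<unknown>,
  batchSize = 10,  // assumed value, tune to your rate limit
  pauseMs = 1000,  // assumed value, tune to your rate limit
): Promise<{ revoked: number; failed: string[] }> {
  let revoked = 0
  const failed: string[] = []
  const batches = chunk(sessionIds, batchSize)

  for (const [index, batch] of batches.entries()) {
    // Run one batch concurrently; convert rejections into { ok: false }
    const results = await Promise.all(
      batch.map((id) =>
        revoke(id).then(
          () => ({ id, ok: true }),
          () => ({ id, ok: false }),
        ),
      ),
    )
    for (const r of results) {
      if (r.ok) revoked++
      else failed.push(r.id)
    }
    // Pause between batches, but not after the last one
    if (index < batches.length - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, pauseMs))
    }
  }
  return { revoked, failed }
}
```

Returning the failed IDs lets the operator retry only those sessions rather than rerunning the whole revocation.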
Step 5: Webhook Replay (Missed Events)
# Check for missed webhooks in Clerk Dashboard:
#   Dashboard > Webhooks > Select endpoint > Message Logs
#   Click "Retry" on failed messages

# Or replay from your audit log:
echo "Check database for missing user records:"
echo "SELECT clerk_id FROM users WHERE created_at > NOW() - INTERVAL '1 hour'"
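Once the `clerk_id` values from the query are in hand, they can be compared against the user IDs Clerk reports for the same window to find which events never arrived. A minimal sketch of that comparison follows. `findMissingUserIds` is a hypothetical helper, and fetching the two lists is left to the caller.

```typescript
// Return Clerk user IDs that have no matching clerk_id row in the application database
function findMissingUserIds(clerkUserIds: string[], dbClerkIds: string[]): string[] {
  const known = new Set(dbClerkIds)
  return clerkUserIds.filter((id) => !known.has(id))
}

// Illustrative IDs only
const missing = findMissingUserIds(
  ['user_a', 'user_b', 'user_c'],
  ['user_a', 'user_c'],
)
console.log(missing) // the IDs whose webhook events should be retried or backfilled
```

Any ID returned here points to a user whose creation webhook was missed and should be retried from the Dashboard or backfilled directly.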
Step 6: Post-Incident Review Template
## Incident Report

**Date:** YYYY-MM-DD HH:MM UTC
**Duration:** X hours Y minutes
**Severity:** Critical / High / Medium / Low
**Category:** Clerk Outage / Key Compromise / Config Error / Middleware Failure

### Timeline
- HH:MM - Incident detected (how: monitoring alert / user report / manual)
- HH:MM - Triage started, category identified
- HH:MM - Mitigation applied (emergency bypass / key rotation / rollback)
- HH:MM - Service restored
- HH:MM - Post-incident review completed

### Root Cause
[Description of what caused the incident]

### Impact
- Users affected: X
- Duration of auth downtime: Y minutes
- Data loss: None / Partial / Details

### Action Items
- [ ] Add monitoring for [specific check]
- [ ] Update runbook with [new procedure]
- [ ] Implement [preventive measure]
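The Duration field can be derived from the first and fourth timeline entries rather than estimated by hand. A small sketch, assuming both timestamps are recorded in UTC; `formatIncidentDuration` is a hypothetical helper:

```typescript
// Format the time between detection and restoration as "X hours Y minutes"
function formatIncidentDuration(detectedAt: Date, restoredAt: Date): string {
  const totalMinutes = Math.round((restoredAt.getTime() - detectedAt.getTime()) / 60000)
  if (totalMinutes < 0) throw new Error('restoredAt is before detectedAt')
  const hours = Math.floor(totalMinutes / 60)
  const minutes = totalMinutes % 60
  return `${hours} hours ${minutes} minutes`
}
```

For example, an incident detected at 10:15 UTC and restored at 12:40 UTC on the same day yields `2 hours 25 minutes`.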
Output
- Triage script that collects the Clerk status, API, and app-health signals needed to assign an incident category and severity
- Emergency auth bypass middleware (activate via env var)
- Key rotation procedure for compromised credentials
- Session revocation endpoint for mass-logout recovery
- Post-incident review template
Error Handling
| Scenario | Response |
|---|---|
| Clerk API completely down | Activate emergency bypass, monitor status.clerk.com |
| Secret key compromised | Rotate keys immediately, revoke all sessions, audit logs |
| Middleware 500 errors | Check middleware.ts syntax, verify Clerk SDK version |
| Webhook delivery failures | Retry from Dashboard, check endpoint accessibility |
| Users randomly logged out | Check session lifetime settings, verify domain config |
Examples
Quick Status Check One-Liner
curl -s https://status.clerk.com/api/v2/status.json | python3 -c "import json,sys; print(json.load(sys.stdin)['status']['description'])"
Next Steps
After resolving the incident, review clerk-observability for improved monitoring.