Claude-skill-registry clerk-incident-runbook
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/clerk-incident-runbook" ~/.claude/skills/majiayu000-claude-skill-registry-clerk-incident-runbook && rm -rf "$T"
manifest:
skills/data/clerk-incident-runbook/SKILL.mdsource content
Clerk Incident Runbook
Overview
Procedures for responding to Clerk-related incidents in production.
Prerequisites
- Access to Clerk dashboard
- Access to application logs
- Emergency contact list
- Rollback procedures documented
Incident Categories
Category 1: Complete Auth Outage
Symptoms: All users unable to sign in, middleware returning errors
Immediate Actions:
# 1. Check Clerk status curl -s https://status.clerk.com/api/v1/status | jq # 2. Check your endpoint curl -I https://yourapp.com/api/health/clerk # 3. Check environment variables vercel env ls | grep CLERK
Mitigation Steps:
// Emergency bypass mode (use with caution) // middleware.ts import { clerkMiddleware } from '@clerk/nextjs/server' import { NextResponse } from 'next/server' const EMERGENCY_BYPASS = process.env.CLERK_EMERGENCY_BYPASS === 'true' export default clerkMiddleware(async (auth, request) => { if (EMERGENCY_BYPASS) { // Log for audit console.warn('[EMERGENCY] Auth bypass active', { path: request.nextUrl.pathname, timestamp: new Date().toISOString() }) return NextResponse.next() } // Normal auth flow await auth.protect() })
Category 2: Webhook Processing Failure
Symptoms: User data out of sync, missing user records
Diagnosis:
# Check webhook endpoint curl -X POST https://yourapp.com/api/webhooks/clerk \ -H "Content-Type: application/json" \ -d '{"type":"ping"}' \ -w "\n%{http_code}" # Check Clerk dashboard for failed webhooks # Dashboard > Webhooks > Failed Deliveries
Recovery:
// scripts/resync-users.ts import { clerkClient } from '@clerk/nextjs/server' import { db } from '../lib/db' async function resyncAllUsers() { const client = await clerkClient() let offset = 0 const limit = 100 while (true) { const { data: users, totalCount } = await client.users.getUserList({ limit, offset }) for (const user of users) { await db.user.upsert({ where: { clerkId: user.id }, update: { email: user.emailAddresses[0]?.emailAddress, firstName: user.firstName, lastName: user.lastName, updatedAt: new Date() }, create: { clerkId: user.id, email: user.emailAddresses[0]?.emailAddress, firstName: user.firstName, lastName: user.lastName } }) } console.log(`Synced ${offset + users.length} of ${totalCount} users`) offset += limit if (offset >= totalCount) break } console.log('Resync complete') } resyncAllUsers()
Category 3: Security Incident
Symptoms: Unauthorized access detected, suspicious sessions
Immediate Actions:
// scripts/emergency-session-revoke.ts import { clerkClient } from '@clerk/nextjs/server' async function revokeUserSessions(userId: string) { const client = await clerkClient() // Get all active sessions const sessions = await client.sessions.getSessionList({ userId, status: 'active' }) // Revoke all sessions for (const session of sessions.data) { await client.sessions.revokeSession(session.id) console.log(`Revoked session: ${session.id}`) } console.log(`Revoked ${sessions.data.length} sessions for user ${userId}`) } // Revoke all sessions for compromised user revokeUserSessions('user_xxx')
// scripts/emergency-lockout.ts import { clerkClient } from '@clerk/nextjs/server' async function lockoutUser(userId: string) { const client = await clerkClient() // Ban user (prevents new sign-ins) await client.users.banUser(userId) // Revoke all sessions const sessions = await client.sessions.getSessionList({ userId, status: 'active' }) for (const session of sessions.data) { await client.sessions.revokeSession(session.id) } console.log(`User ${userId} locked out and all sessions revoked`) }
Category 4: Performance Degradation
Symptoms: Slow sign-in, high latency, timeouts
Diagnosis:
// scripts/diagnose-performance.ts async function diagnosePerformance() { const results = { authCheck: 0, getUserList: 0, currentUser: 0 } // Measure auth check const authStart = performance.now() await auth() results.authCheck = performance.now() - authStart // Measure API call const apiStart = performance.now() const client = await clerkClient() await client.users.getUserList({ limit: 1 }) results.getUserList = performance.now() - apiStart // Measure currentUser const userStart = performance.now() await currentUser() results.currentUser = performance.now() - userStart console.log('Performance Diagnosis:', results) // Check for issues if (results.authCheck > 100) { console.warn('Auth check slow - check middleware configuration') } if (results.getUserList > 500) { console.warn('API slow - check Clerk status or network') } return results }
Runbook Procedures
Procedure 1: Auth Outage Response
1. [ ] Confirm outage (check status.clerk.com) 2. [ ] Check application logs for errors 3. [ ] Verify environment variables 4. [ ] If Clerk outage: a. [ ] Enable emergency bypass (if safe) b. [ ] Notify users via status page c. [ ] Monitor Clerk status 5. [ ] If application issue: a. [ ] Check recent deployments b. [ ] Rollback if necessary c. [ ] Check middleware configuration 6. [ ] Document timeline and actions 7. [ ] Conduct post-mortem
Procedure 2: Security Breach Response
1. [ ] Identify affected accounts 2. [ ] Revoke all sessions for affected users 3. [ ] Lock compromised accounts 4. [ ] Reset API keys if exposed 5. [ ] Enable additional verification 6. [ ] Notify affected users 7. [ ] Review access logs 8. [ ] Document and report
Procedure 3: Data Sync Recovery
1. [ ] Identify sync gap (check webhook logs) 2. [ ] Pause webhook processing 3. [ ] Export current database state 4. [ ] Run resync script 5. [ ] Verify data integrity 6. [ ] Resume webhook processing 7. [ ] Monitor for new issues
Emergency Contacts
# .github/INCIDENT_CONTACTS.yml contacts: on_call: - name: On-Call Engineer phone: "+1-xxx-xxx-xxxx" slack: "@oncall" clerk_support: - url: "https://clerk.com/support" - email: "support@clerk.com" - priority: "For enterprise: contact account manager" escalation: - level: 1 contact: "On-call engineer" time: "0-15 min" - level: 2 contact: "Engineering lead" time: "15-30 min" - level: 3 contact: "CTO" time: "30+ min"
Post-Incident
Template
# Incident Report: [Title] ## Summary - **Date:** YYYY-MM-DD - **Duration:** X hours Y minutes - **Severity:** P1/P2/P3 - **Impact:** [Number of affected users] ## Timeline - HH:MM - Incident detected - HH:MM - Initial response - HH:MM - Mitigation applied - HH:MM - Resolution confirmed ## Root Cause [Description of root cause] ## Resolution [Steps taken to resolve] ## Prevention - [ ] Action item 1 - [ ] Action item 2 ## Lessons Learned [Key takeaways]
Output
- Incident response procedures
- Recovery scripts
- Emergency bypass capability
- Post-incident templates
Resources
Next Steps
Proceed to
clerk-data-handling for user data management.