# Claude-code-plugins · langchain-incident-runbook

## Install

Source: clone the upstream repo

```bash
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
```

Claude Code: install into `~/.claude/skills/`

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/langchain-pack/skills/langchain-incident-runbook" ~/.claude/skills/jeremylongshore-claude-code-plugins-langchain-incident-runbook && rm -rf "$T"
```

Manifest: `plugins/saas-packs/langchain-pack/skills/langchain-incident-runbook/SKILL.md`
# LangChain Incident Runbook

## Overview
Standard operating procedures for LangChain production incidents: provider outages, error rate spikes, latency degradation, memory issues, and cost overruns.
## Severity Classification
| Level | Description | Response Time | Example |
|---|---|---|---|
| SEV1 | Complete outage | 15 min | All LLM calls failing |
| SEV2 | Major degradation | 30 min | >50% error rate, >10s latency |
| SEV3 | Minor degradation | 2 hours | <10% errors, slow responses |
| SEV4 | Low impact | 24 hours | Intermittent issues, warnings |
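Classification can also be automated from observed metrics. A minimal TypeScript sketch (the function name and exact thresholds are illustrative, not part of this pack) mapping error rate and p95 latency to a level:

```typescript
// Illustrative classifier: mirror the severity table using error rate
// (fraction of failed calls) and p95 latency; tune thresholds to your SLOs.
type Severity = "SEV1" | "SEV2" | "SEV3" | "SEV4";

function classifySeverity(errorRate: number, p95LatencySeconds: number): Severity {
  if (errorRate >= 1.0) return "SEV1";                          // all LLM calls failing
  if (errorRate > 0.5 || p95LatencySeconds > 10) return "SEV2"; // major degradation
  if (errorRate > 0.01 || p95LatencySeconds > 5) return "SEV3"; // minor degradation
  return "SEV4";                                                // intermittent / warnings
}

// Example: 60% of calls failing at 12s p95 -> "SEV2"
console.log(classifySeverity(0.6, 12));
```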
## Runbook 1: LLM Provider Outage

### Detect

```bash
# Check provider status pages
curl -s https://status.openai.com/api/v2/status.json | jq '.status'
curl -s https://status.anthropic.com/api/v2/status.json | jq '.status'
```
### Diagnose

```typescript
import { ChatOpenAI } from "@langchain/openai";
import { ChatAnthropic } from "@langchain/anthropic";

async function diagnoseProviders() {
  const results: Record<string, string> = {};

  try {
    const openai = new ChatOpenAI({ model: "gpt-4o-mini", timeout: 10000 });
    await openai.invoke("ping");
    results.openai = "OK";
  } catch (e: any) {
    results.openai = `FAIL: ${e.message.slice(0, 100)}`;
  }

  try {
    const anthropic = new ChatAnthropic({ model: "claude-sonnet-4-20250514" });
    await anthropic.invoke("ping");
    results.anthropic = "OK";
  } catch (e: any) {
    results.anthropic = `FAIL: ${e.message.slice(0, 100)}`;
  }

  console.table(results);
  return results;
}
```
### Mitigate

```typescript
// Enable fallback: switch to the healthy provider
const primary = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxRetries: 1,
  timeout: 5000,
});

const fallback = new ChatAnthropic({
  model: "claude-sonnet-4-20250514",
  maxRetries: 1,
});

const resilientModel = primary.withFallbacks({
  fallbacks: [fallback],
});

// All chains using resilientModel auto-failover
```
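Any chain built on `resilientModel` then fails over automatically. For example (an illustrative sketch; the prompt template is not from this pack):

```typescript
import { ChatPromptTemplate } from "@langchain/core/prompts";

// If the OpenAI call fails, the same input is retried against Anthropic.
const prompt = ChatPromptTemplate.fromTemplate("Summarize the incident: {text}");
const chain = prompt.pipe(resilientModel);

const answer = await chain.invoke({ text: "All gpt-4o-mini calls timing out since 14:02 UTC." });
```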
### Recover

- Monitor the provider status page for resolution
- Verify the primary provider works: `await diagnoseProviders()`
- Remove the fallback config (or keep it for resilience)
- Document the incident timeline for the post-mortem
## Runbook 2: High Error Rate

### Detect

```bash
# Check LangSmith for an error spike
# https://smith.langchain.com/o/YOUR_ORG/projects/YOUR_PROJECT/runs?filter=error:true

# Check application logs
grep -c "Error\|error\|ERROR" /var/log/app/langchain.log | tail -5
```
### Diagnose

```typescript
// Common error patterns
const ERROR_CAUSES: Record<string, string> = {
  "RateLimitError": "API quota exceeded -> reduce concurrency",
  "AuthenticationError": "API key invalid -> check secrets",
  "Timeout": "Provider slow -> increase timeout",
  "OutputParserException": "LLM output format changed -> check prompts",
  "ValidationError": "Schema mismatch -> update Zod schemas",
  "ContextLengthExceeded": "Input too long -> truncate or chunk",
};
```
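During triage it helps to match a caught error against this table. A small illustrative helper (`suggestCause` is not part of this pack):

```typescript
// Illustrative triage helper: map an error message to a known cause.
function suggestCause(err: unknown): string {
  const message = err instanceof Error ? `${err.name}: ${err.message}` : String(err);
  for (const [pattern, cause] of Object.entries(ERROR_CAUSES)) {
    if (message.includes(pattern)) return cause;
  }
  return "Unknown error pattern -> inspect the LangSmith trace";
}

// Example: a 429 from the provider maps to the rate-limit remediation
console.error(suggestCause(new Error("RateLimitError: 429 Too Many Requests")));
// -> "API quota exceeded -> reduce concurrency"
```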
### Mitigate

```typescript
// 1. Reduce load
//    Lower maxConcurrency on batch operations

// 2. Enable caching for repeated queries
const cache = new Map();

async function withCache(chain: any, input: any) {
  const key = JSON.stringify(input);
  if (cache.has(key)) return cache.get(key);
  const result = await chain.invoke(input);
  cache.set(key, result);
  return result;
}

// 3. Enable fallback model
const model = primary.withFallbacks({ fallbacks: [fallback] });
```
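For step 1, `maxConcurrency` is part of the standard `RunnableConfig`, so batch calls can be throttled directly. A minimal sketch (the inputs, limit, and `chain` from your application are illustrative):

```typescript
// Throttle a batch to 2 concurrent LLM calls to ease pressure on a rate-limited provider.
const inputs = ["q1", "q2", "q3" /* ... */].map((q) => ({ input: q }));
const outputs = await chain.batch(inputs, { maxConcurrency: 2 });
```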
## Runbook 3: Latency Spike

### Detect

```
# Prometheus query: p95 LLM latency over the last 5 minutes exceeds 5 seconds
histogram_quantile(0.95, rate(langchain_llm_latency_seconds_bucket[5m])) > 5
```
### Diagnose

```typescript
// Measure per-component latency
const tracer = new MetricsCallback();
await chain.invoke({ input: "test" }, { callbacks: [tracer] });
console.table(tracer.getReport());
// Check: is it the LLM, retriever, or tool that's slow?
```
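`MetricsCallback` is not defined in this skill. A minimal sketch of what such a handler could look like, using only the standard callback hooks (an assumed implementation, not the pack's own):

```typescript
import { BaseCallbackHandler } from "@langchain/core/callbacks/base";

// Illustrative latency-per-component handler: record start/end timestamps
// per run and report durations grouped by component.
class MetricsCallback extends BaseCallbackHandler {
  name = "metrics_callback";
  private starts = new Map<string, { component: string; t: number }>();
  private durations: Array<{ component: string; ms: number }> = [];

  private begin(component: string, runId: string) {
    this.starts.set(runId, { component, t: Date.now() });
  }
  private finish(runId: string) {
    const s = this.starts.get(runId);
    if (s) this.durations.push({ component: s.component, ms: Date.now() - s.t });
  }

  async handleLLMStart(_llm: any, _prompts: string[], runId: string) { this.begin("llm", runId); }
  async handleLLMEnd(_output: any, runId: string) { this.finish(runId); }
  async handleRetrieverStart(_r: any, _query: string, runId: string) { this.begin("retriever", runId); }
  async handleRetrieverEnd(_docs: any, runId: string) { this.finish(runId); }
  async handleToolStart(_tool: any, _input: string, runId: string) { this.begin("tool", runId); }
  async handleToolEnd(_output: any, runId: string) { this.finish(runId); }

  getReport() { return this.durations; }
}
```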
### Mitigate

- Switch to a faster model: `gpt-4o-mini` (200ms TTFT) vs `gpt-4o` (400ms)
- Enable streaming to reduce perceived latency (see the sketch after this list)
- Enable caching for repeated queries
- Reduce context length (shorter prompts)
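Streaming delivers tokens as they are generated, so users see output after the model's time-to-first-token rather than the full completion time. A minimal sketch (the model and prompt are illustrative):

```typescript
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({ model: "gpt-4o-mini" });

// Tokens are forwarded as they arrive instead of after the full completion.
const stream = await model.stream("Summarize the incident in two sentences.");
for await (const chunk of stream) {
  process.stdout.write(chunk.content as string);
}
```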
## Runbook 4: Cost Overrun

### Detect

```bash
# Check the OpenAI usage dashboard
# https://platform.openai.com/usage
```
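Usage can also be spot-checked in-process: chat model responses carry `usage_metadata`. A minimal sketch (the hard-coded prices are illustrative and change over time):

```typescript
import { ChatOpenAI } from "@langchain/openai";

// Spot-check per-call token usage and rough cost.
const model = new ChatOpenAI({ model: "gpt-4o-mini" });
const response = await model.invoke("ping");

const usage = response.usage_metadata; // { input_tokens, output_tokens, total_tokens }
if (usage) {
  // Illustrative gpt-4o-mini pricing per 1M tokens; verify against current rates.
  const estUsd = (usage.input_tokens * 0.15 + usage.output_tokens * 0.6) / 1_000_000;
  console.log(usage, `~$${estUsd.toFixed(6)} per call`);
}
```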
### Mitigate

```typescript
// 1. Emergency model downgrade
//    gpt-4o ($2.50/1M) -> gpt-4o-mini ($0.15/1M) = 17x cheaper

// 2. Enable budget enforcement
const budget = new BudgetEnforcer(50.0); // $50 daily limit
const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  callbacks: [budget],
});

// 3. Enable aggressive caching
//    (see langchain-cost-tuning skill)
```
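`BudgetEnforcer` comes from the langchain-cost-tuning skill and is not defined here. A minimal sketch of the idea (an assumed design, not the pack's actual class): accumulate estimated spend from reported token usage, and have application code check it before issuing new calls:

```typescript
import { BaseCallbackHandler } from "@langchain/core/callbacks/base";

// Illustrative budget guard: track rough spend from OpenAI-style tokenUsage
// reported on each LLM run; the application checks the limit before new calls.
class BudgetEnforcer extends BaseCallbackHandler {
  name = "budget_enforcer";
  private spentUsd = 0;

  constructor(private dailyLimitUsd: number) {
    super();
  }

  async handleLLMEnd(output: any) {
    const usage = output?.llmOutput?.tokenUsage ?? {};
    // Illustrative gpt-4o-mini pricing; adjust per model and current rates.
    this.spentUsd +=
      ((usage.promptTokens ?? 0) * 0.15 + (usage.completionTokens ?? 0) * 0.6) / 1_000_000;
  }

  assertWithinBudget() {
    if (this.spentUsd >= this.dailyLimitUsd) {
      throw new Error(`Daily LLM budget of $${this.dailyLimitUsd} exceeded`);
    }
  }
}
```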
## Runbook 5: Memory/OOM Issues

### Detect

```bash
# Check process memory
ps aux --sort=-%mem | head -5

# Node.js heap stats
node -e "console.log(process.memoryUsage())"
```
### Mitigate

- Clear caches: reset in-memory caches
- Reduce batch sizes: lower `maxConcurrency`
- Use streaming instead of accumulating full responses
- Restart pods: `kubectl rollout restart deployment/langchain-api`
## Incident Response Checklist

### During Incident
- Acknowledge in incident channel
- Classify severity (SEV1-4)
- Check provider status pages
- Run diagnostic script
- Apply mitigation (fallback/cache/throttle)
- Communicate status to stakeholders
- Document timeline
### Post-Incident
- Verify full recovery
- Schedule post-mortem (within 48h)
- Write incident report
- Create follow-up tickets
- Update monitoring/alerting rules
- Update this runbook if needed
## Next Steps

Use `langchain-debug-bundle` for detailed evidence collection during incidents.