Claude-skill-registry error-analysis
This skill should be used when analyzing errors, stack traces, and logs to identify root causes and implement fixes.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/error-analysis" ~/.claude/skills/majiayu000-claude-skill-registry-error-analysis && rm -rf "$T"
skills/data/error-analysis/SKILL.mdError Analysis Skill
Systematically analyze errors and logs to find root causes.
When to Use
- Debugging production errors
- Analyzing stack traces
- Investigating log patterns
- Performing root cause analysis
- Categorizing error types
Reference Documents
- Log Patterns - Common error patterns by type
- Root Cause Analysis - RCA techniques
- Error Categorization - Classifying errors
- Fix Patterns - Common fixes by error type
Analysis Workflow
1. Gather Information
## Error Report ### Error Message
TypeError: Cannot read property 'id' of undefined at UserService.getUser (src/services/user.ts:45:23) at async Router.handle (src/api/routes.ts:67:12)
### Context - **Environment:** Production - **Time:** 2024-01-15 14:30 UTC - **Frequency:** 15 occurrences in last hour - **Affected Users:** ~5% of requests - **Recent Changes:** Deploy at 14:00 UTC
2. Categorize Error
## Error Classification **Type:** Runtime Error **Category:** Null/Undefined Reference **Severity:** High (user-facing) ### Common Causes 1. Missing data validation 2. Race condition 3. API contract change 4. Data migration issue
3. Root Cause Analysis
## Root Cause Investigation ### Timeline - 14:00 - Deploy to production - 14:25 - First error reported - 14:30 - Error rate increased ### Hypothesis 1: Deploy introduced bug - Check: git diff between versions - Result: New code path added ### Hypothesis 2: Data issue - Check: Query for affected users - Result: All have specific condition ### Root Cause Deploy introduced code that assumes `user.profile` exists, but 5% of users don't have profiles (legacy accounts).
4. Implement Fix
## Fix Implementation ### Short-term (Hotfix) ```typescript // Add null check const userId = user?.profile?.id ?? user.id;
Long-term
- Add data validation at API boundary
- Migrate legacy accounts
- Add regression test
## Error Categories ### By Origin | Category | Description | Example | |----------|-------------|---------| | Client | User/browser errors | Invalid input | | Server | Application errors | Null reference | | Infrastructure | System errors | Connection timeout | | External | Third-party errors | API rate limit | ### By Severity | Level | Impact | Response | |-------|--------|----------| | Critical | System down | Immediate | | High | Major feature broken | Hours | | Medium | Degraded experience | This sprint | | Low | Minor inconvenience | Backlog | ## Stack Trace Analysis ### Reading Stack Traces ```markdown ## Stack Trace Components
Error: Connection refused ← Error type and message at Database.connect ← Where error was thrown (src/db/connection.ts:23) ← File and line at async initialize ← Call chain (src/server.ts:45) at async main ← Entry point (src/index.ts:12)
### Key Questions 1. Where was the error thrown? (top of stack) 2. What called that code? (stack trace) 3. What was the input/state? (logs) 4. What changed recently? (git history)
Common Patterns
## Null Reference
TypeError: Cannot read property 'x' of undefined
**Check:** Variable existence, API response shape ## Connection Error
Error: ECONNREFUSED 127.0.0.1:5432
**Check:** Service running, network config, credentials ## Timeout
Error: Operation timed out after 30000ms
**Check:** Service health, query performance, network ## Memory
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed
**Check:** Memory leaks, large data processing
Log Analysis
Correlation
## Correlating Logs ### Using Request ID ```bash grep "req-12345" application.log
Timeline Reconstruction
14:30:00.123 [req-12345] User login started 14:30:00.456 [req-12345] Fetching user profile 14:30:00.789 [req-12345] ERROR: Profile not found 14:30:00.790 [req-12345] TypeError: Cannot read 'id'
Pattern Identification
# Count error types grep "ERROR" app.log | cut -d: -f2 | sort | uniq -c | sort -rn # Find error spikes grep "ERROR" app.log | cut -d' ' -f1 | uniq -c
## Resolution Tracking ### Fix Verification ```markdown ## Fix Verification Checklist - [ ] Error no longer reproducible locally - [ ] Unit test added for fix - [ ] Integration test added - [ ] Deployed to staging - [ ] Verified in staging - [ ] Deployed to production - [ ] Monitoring error rate - [ ] Error rate returned to baseline
Post-mortem Template
# Incident Post-mortem: [Title] ## Summary Brief description of what happened. ## Timeline - HH:MM - Event - HH:MM - Detection - HH:MM - Resolution ## Impact - Users affected: X - Duration: Y hours - Revenue impact: $Z ## Root Cause Detailed explanation. ## Resolution What was done to fix it. ## Lessons Learned 1. What went well 2. What went poorly 3. Where we got lucky ## Action Items - [ ] Prevent: [Action] - [ ] Detect: [Action] - [ ] Respond: [Action]