Awesome-omni-skill systematic-debugging
Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes
git clone https://github.com/diegosouzapw/awesome-omni-skill
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data-ai/systematic-debugging" ~/.claude/skills/diegosouzapw-awesome-omni-skill-systematic-debugging && rm -rf "$T"
skills/data-ai/systematic-debugging/SKILL.md
Systematic Debugging
Overview
Random fixes waste time and mask underlying issues. Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
The Iron Law
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
When to Use
Use for ANY technical issue (test failures, production bugs, performance). ESPECIALLY when:
- Under time pressure or "quick fix" seems obvious.
- You've already tried multiple fixes or a previous fix failed.
- You don't fully understand the issue.
The Four Phases
Complete each phase before proceeding.
Phase 1: Root Cause Investigation
- Read Error Messages: Note line numbers, file paths, and stack traces.
- Reproduce Consistently: Identify exact steps to trigger the bug reliably.
- Check Recent Changes: Review git diffs, new dependencies, or config changes.
- Gather Evidence: In multi-component systems, log data at boundaries to isolate the failing layer.
- Trace Data Flow: For deep errors, trace backward to the source (see Root Cause Tracing below).
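As a minimal sketch of the "Gather Evidence" step, the wrapper below records what enters and leaves each layer so the failing layer can be isolated. The `traceBoundary` helper and the three-layer pipeline (`parseConfig`, `resolveDir`, `openWorkspace`) are hypothetical names invented for illustration, not part of this skill.

```typescript
// Hypothetical helper: wraps one layer and records what crosses its boundary.
type BoundaryRecord = { layer: string; input: unknown; output?: unknown; error?: string };

function traceBoundary<A, B>(
  layer: string,
  fn: (input: A) => B,
  log: BoundaryRecord[],
): (input: A) => B {
  return (input: A): B => {
    try {
      const output = fn(input);
      log.push({ layer, input, output });
      return output;
    } catch (err) {
      log.push({ layer, input, error: err instanceof Error ? err.message : String(err) });
      throw err;
    }
  };
}

// Hypothetical three-layer pipeline: parse -> resolve -> open.
const log: BoundaryRecord[] = [];
const parseConfig = traceBoundary('parse', (raw: string) => JSON.parse(raw) as { dir?: string }, log);
const resolveDir = traceBoundary('resolve', (cfg: { dir?: string }) => cfg.dir ?? '', log);
const openWorkspace = traceBoundary('open', (dir: string) => {
  if (dir === '') throw new Error('empty directory');
  return `opened ${dir}`;
}, log);

try {
  openWorkspace(resolveDir(parseConfig('{}')));
} catch {
  // The error surfaces in 'open', but the records show that 'resolve'
  // already produced '' from a config without a dir.
}

for (const record of log) console.error('BOUNDARY', JSON.stringify(record));
```

Reading the records from first to last shows the earliest layer whose output is already wrong, which is where the investigation should continue.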
Phase 2: Pattern Analysis
- Find Working Examples: Locate similar working code.
- Compare Against References: Read reference implementations completely.
- Identify Differences: List every difference between working and broken states.
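One way to make "list every difference" mechanical is to diff a snapshot of the working state against the broken one. This is only a sketch: `listDifferences` and the two sample snapshots are hypothetical, and real states are rarely this flat.

```typescript
// Hypothetical flat snapshot of the settings that describe one environment.
type Snapshot = Record<string, string | number | boolean>;

// Returns one line per key whose value differs, including keys missing on one side.
function listDifferences(working: Snapshot, broken: Snapshot): string[] {
  const keys = Object.keys({ ...working, ...broken }).sort();
  const diffs: string[] = [];
  for (const key of keys) {
    const a = working[key];
    const b = broken[key];
    if (a !== b) {
      diffs.push(`${key}: working=${JSON.stringify(a)} broken=${JSON.stringify(b)}`);
    }
  }
  return diffs;
}

// Sample data, invented for illustration.
const workingState: Snapshot = { nodeVersion: 20, cwd: '/tmp/project', retries: 3 };
const brokenState: Snapshot = { nodeVersion: 20, cwd: '', timeoutMs: 50 };

const differences = listDifferences(workingState, brokenState);
for (const line of differences) console.log(line);
```

Keys that exist on only one side are reported too, since a missing setting is as much a difference as a changed one.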
Phase 3: Hypothesis and Testing
- Form Single Hypothesis: "I think X is the root cause because Y."
- Test Minimally: Make the SMALLEST change to test the hypothesis.
- Verify: If it fails, form a NEW hypothesis. Don't stack fixes.
Phase 4: Implementation
- Create Failing Test: Simplest reproduction (use test-driven-development).
- Implement Single Fix: Address the root cause identified.
- Verify Fix: Ensure test passes and no regressions.
- If 3+ Fixes Fail: STOP and question the architecture. Discuss with your human partner.
Red Flags - STOP and Return to Phase 1
- "Quick fix for now" or "Just try changing X."
- Adding multiple changes at once.
- Skipping tests or relying on manual verification only.
- Proposing solutions before tracing data flow.
- Each fix reveals a new problem elsewhere.
Common Rationalizations
| Excuse | Reality |
|---|---|
| "Issue is simple" | Simple bugs have root causes too. |
| "Emergency" | Systematic is FASTER than thrashing. |
| "Test after fix" | Untested fixes don't stick. Test first proves it. |
| "Multiple fixes" | Can't isolate what worked. Causes new bugs. |
| "3+ failures" | Architectural problem. Question the pattern. |
Quick Reference
| Phase | Key Activities | Success Criteria |
|---|---|---|
| 1. Root Cause | Read errors, reproduce, gather evidence | Understand WHAT and WHY |
| 2. Pattern | Find working examples, compare | Identify differences |
| 3. Hypothesis | Form theory, test minimally | Confirmed hypothesis |
| 4. Implementation | Create test, fix, verify | Bug resolved, tests pass |
Supporting Techniques
Use the embedded templates below to apply specific debugging strategies:
Root Cause Tracing
Core principle: Trace backward through the call chain until you find the original trigger, then fix at the source.
Overview
Bugs often manifest deep in the call stack (git init in the wrong directory, file created in the wrong location, database opened with the wrong path). Your instinct is to fix where the error appears, but that's treating a symptom.
The Tracing Process
1. Observe the Symptom
```
Error: git init failed in /Users/jesse/project/packages/core
```
2. Find Immediate Cause
What code directly causes this?
```typescript
await execFileAsync('git', ['init'], { cwd: projectDir });
```
3. Ask: What Called This?
```
WorktreeManager.createSessionWorktree(projectDir, sessionId)
  → called by Session.initializeWorkspace()
  → called by Session.initialize()
  → called by test at Project.create()
```
4. Keep Tracing Up
What value was passed?
- `projectDir = ''` (empty string!)
- Empty string as `cwd` resolves to `process.cwd()`
- That's the source code directory!
5. Find Original Trigger
Where did empty string come from?
```typescript
const context = setupCoreTest(); // Returns { tempDir: '' }
Project.create('name', context.tempDir); // Accessed before beforeEach!
```
Adding Stack Traces
When you can't trace manually, add instrumentation:
```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Before the problematic operation
async function gitInit(directory: string) {
  // Capture the call stack so the log shows who triggered this call
  const stack = new Error().stack;
  console.error('DEBUG git init:', {
    directory,
    cwd: process.cwd(),
    nodeEnv: process.env.NODE_ENV,
    stack,
  });

  await execFileAsync('git', ['init'], { cwd: directory });
}
```
Critical: Use `console.error()` in tests (not logger - may not show)
Run and capture:
```bash
npm test 2>&1 | grep 'DEBUG git init'
```
Analyze stack traces:
- Look for test file names
- Find the line number triggering the call
- Identify the pattern (same test? same parameter?)
Finding Which Test Causes Pollution: If something appears during tests but you don't know which test is responsible, use the bisection script `skills/systematic-debugging/find-polluter.sh`. It runs tests one-by-one until it finds the first one that creates the unwanted file or state.
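The script itself is not reproduced here. As a rough sketch of the one-by-one idea it describes (not the script's actual contents), the function below runs test files in order and stops at the first one after which the pollution check turns true. `findPolluter`, `runTest`, and `isPolluted` are hypothetical names; in practice `runTest` might shell out to the test runner and `isPolluted` might check whether an unwanted file exists.

```typescript
// Hypothetical linear search for the first test file that leaves pollution behind.
async function findPolluter(
  testFiles: string[],
  runTest: (file: string) => Promise<void>,
  isPolluted: () => Promise<boolean>,
): Promise<string | null> {
  // A dirty starting state would blame the first test unfairly.
  if (await isPolluted()) {
    throw new Error('Pollution already present before any test ran; clean up first');
  }
  for (const file of testFiles) {
    await runTest(file);
    if (await isPolluted()) return file;
  }
  return null; // No single test produced the pollution.
}

// In-memory stand-ins so the sketch runs without a real test suite.
let strayFileExists = false;
const fakeRunTest = async (file: string): Promise<void> => {
  if (file === 'session.test.ts') strayFileExists = true;
};
const fakeIsPolluted = async (): Promise<boolean> => strayFileExists;

const polluterResult = findPolluter(
  ['project.test.ts', 'session.test.ts', 'worktree.test.ts'],
  fakeRunTest,
  fakeIsPolluted,
);
polluterResult.then(file => console.log('First polluter:', file));
```

Checking for pollution before the first run keeps a leftover artifact from an earlier session from being attributed to an innocent test.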
Defense in Depth
Core principle: Validate at EVERY layer data passes through. Make the bug structurally impossible.
Defense-in-Depth Validation
The Four Layers
Layer 1: Entry Point Validation
Purpose: Reject obviously invalid input at API boundary
```typescript
import { existsSync } from 'node:fs';

function createProject(name: string, workingDirectory: string) {
  if (!workingDirectory || workingDirectory.trim() === '') {
    throw new Error('workingDirectory cannot be empty');
  }
  if (!existsSync(workingDirectory)) {
    throw new Error(`workingDirectory does not exist: ${workingDirectory}`);
  }
  // ... proceed
}
```
Layer 2: Business Logic Validation
Purpose: Ensure data makes sense for this operation
```typescript
function initializeWorkspace(projectDir: string, sessionId: string) {
  if (!projectDir) {
    throw new Error('projectDir required for workspace initialization');
  }
  // ... proceed
}
```
Layer 3: Environment Guards
Purpose: Prevent dangerous operations in specific contexts
```typescript
import { tmpdir } from 'node:os';
import { normalize, resolve } from 'node:path';

async function gitInit(directory: string) {
  // In tests, refuse git init outside temp directories
  if (process.env.NODE_ENV === 'test') {
    const normalized = normalize(resolve(directory));
    const tmpDir = normalize(resolve(tmpdir()));

    if (!normalized.startsWith(tmpDir)) {
      throw new Error(
        `Refusing git init outside temp dir during tests: ${directory}`
      );
    }
  }
  // ... proceed
}
```
Layer 4: Debug Instrumentation
Purpose: Capture context for forensics
```typescript
async function gitInit(directory: string) {
  const stack = new Error().stack;
  logger.debug('About to git init', {
    directory,
    cwd: process.cwd(),
    stack,
  });
  // ... proceed
}
```
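One caveat with the prefix comparison in Layer 3: a plain `startsWith` check also accepts sibling paths such as `/tmpfoo` when the temp directory is `/tmp`. The sketch below shows one possible stricter containment check built on Node's `path.relative`. `isInsideDirectory` is a hypothetical helper, not part of the original template, and treating the parent directory itself as "not inside" is an assumption made for this sketch.

```typescript
import { isAbsolute, relative, sep } from 'node:path';

// Hypothetical helper: true only when `candidate` is strictly below `parent`.
function isInsideDirectory(candidate: string, parent: string): boolean {
  // relative() resolves both paths first, so '..' segments are normalized away.
  const rel = relative(parent, candidate);
  if (rel === '') return false; // Same directory: treated as not inside (assumption).
  if (rel === '..' || rel.startsWith('..' + sep)) return false; // Escapes the parent.
  return !isAbsolute(rel); // On Windows, a different drive yields an absolute path.
}

console.log(isInsideDirectory('/tmp/project', '/tmp'));
console.log(isInsideDirectory('/tmpfoo', '/tmp'));
```

Under these assumptions, the guard in `gitInit` could call `isInsideDirectory(directory, tmpdir())` in place of the `startsWith` comparison.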
Condition-Based Waiting
Core principle: Wait for the actual condition you care about, not a guess about how long it takes.
Core Pattern
```typescript
// ❌ BEFORE: Guessing at timing
await new Promise(r => setTimeout(r, 50));
const result = getResult();
expect(result).toBeDefined();

// ✅ AFTER: Waiting for condition
await waitFor(() => getResult() !== undefined, 'result to be defined');
const result = getResult();
expect(result).toBeDefined();
```
Quick Patterns
| Scenario | Pattern |
|---|---|
| Wait for event | |
| Wait for state | |
| Wait for count | |
| Wait for file | |
| Complex condition | |
Implementation
Generic polling function:
```typescript
async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description: string,
  timeoutMs = 5000
): Promise<T> {
  const startTime = Date.now();

  while (true) {
    // Return as soon as the condition yields a truthy value
    const result = condition();
    if (result) return result;

    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }

    await new Promise(r => setTimeout(r, 10)); // Poll every 10ms
  }
}
```
Complete Implementation: For a production-ready implementation of `waitForEvent`, `waitForEventCount`, and `waitForEventMatch` (useful for complex async systems), see `skills/systematic-debugging/condition-based-waiting-example.ts`.
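To illustrate how the polling function might be applied to several of the scenarios listed under Quick Patterns, here is a sketch that waits on an in-memory event list. It repeats a compact copy of `waitFor` so that it runs on its own. The `AppEvent` type, the `events` array, the `state` variable, and the timers that feed them are hypothetical stand-ins for a real async system, and these calls are illustrations rather than the patterns from the original table.

```typescript
// Compact copy of the polling helper so this sketch is self-contained.
async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description: string,
  timeoutMs = 5000,
): Promise<T> {
  const startTime = Date.now();
  while (true) {
    const result = condition();
    if (result) return result;
    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }
    await new Promise(r => setTimeout(r, 10));
  }
}

// Hypothetical system under test: events arrive over time.
type AppEvent = { type: string; payload?: string };
const events: AppEvent[] = [];
let state = 'starting';

setTimeout(() => { events.push({ type: 'STARTED' }); state = 'ready'; }, 20);
setTimeout(() => { events.push({ type: 'DATA', payload: 'a' }); }, 40);
setTimeout(() => { events.push({ type: 'DATA', payload: 'b' }); }, 60);

async function demo(): Promise<{ count: number; lastPayload: string | undefined }> {
  // Wait for event: resolves with the matching event object.
  await waitFor(() => events.find(e => e.type === 'STARTED'), 'STARTED event');
  // Wait for state.
  await waitFor(() => state === 'ready', 'state to become ready');
  // Wait for count.
  await waitFor(() => events.length >= 3, 'three events');
  // Complex condition: several predicates combined.
  const last = await waitFor(
    () => events.find(e => e.type === 'DATA' && e.payload === 'b'),
    'DATA event with payload b',
  );
  return { count: events.length, lastPayload: last.payload };
}

const demoResult = demo();
demoResult.then(r => console.log('events seen:', r.count, 'last payload:', r.lastPayload));
```

Because each wait returns the truthy value of its condition, a condition written with `find` hands back the matched event directly, so the test does not need a second lookup.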
Related skills:
- test-driven-development: For creating a failing test case (Phase 4, Step 1)
- verification-before-completion: Verify the fix worked before claiming success
Real-World Impact
From debugging sessions:
- Systematic approach: 15-30 minutes to fix
- Random fixes approach: 2-3 hours of thrashing
- First-time fix rate: 95% vs 40%
- New bugs introduced: Near zero vs common