Citadel systematic-debugging
git clone https://github.com/SethGammon/Citadel
T=$(mktemp -d) && git clone --depth=1 https://github.com/SethGammon/Citadel "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/systematic-debugging" ~/.claude/skills/sethgammon-citadel-systematic-debugging && rm -rf "$T"
skills/systematic-debugging/SKILL.md/systematic-debugging — Root Cause Before Fix
Identity
/systematic-debugging enforces one rule: NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.
Most debugging failures come from guessing. This skill forces a structured approach: observe → hypothesize → verify → fix. The fix is always the last step, never the first.
Protocol
Phase 1: OBSERVATION & REPRODUCTION
- Read the error message, stack trace, or bug description thoroughly
- Reproduce the issue:
- If it's a type error: run typecheck and read the full error
- If it's a runtime error: identify the triggering conditions
- If it's a behavioral bug: document expected vs actual behavior
- Isolate the failing component/function:
- What file? What function? What line?
- What are the inputs when it fails?
- Does it fail consistently or intermittently?
Output: A clear problem statement: "{Component} does {X} when it should do {Y}, triggered by {condition}"
Phase 2: HYPOTHESIS & VERIFICATION
-
Formulate up to 3 hypotheses for WHY the bug exists:
- H1: {most likely cause} — because {evidence}
- H2: {second candidate} — because {evidence}
- H3: {third candidate} — because {evidence}
-
For each hypothesis, define a verification step:
- Add a console.log / diagnostic read / breakpoint
- Check a specific value at a specific point
- DO NOT change any logic yet — only observe
-
Run the verification:
- Which hypothesis was confirmed?
- Which were eliminated?
- If none confirmed: formulate new hypotheses with the new information
CRITICAL: Do not skip this phase. Do not "just try" a fix. Verify first.
Phase 3: ROOT CAUSE ANALYSIS
Once a hypothesis is confirmed:
-
Explain WHY the bug happens, not just WHERE:
- Trace the data flow backward from the symptom to the source
- Identify the specific incorrect assumption or logic error
- Document the causal chain: "A calls B with X, B assumes X > 0, but A passes -1 when {condition}"
-
Check for related occurrences:
- Is this pattern used elsewhere? Could the same bug exist in similar code?
- Is there a systemic issue (e.g., missing validation at a boundary)?
Output: Root cause statement: "The bug occurs because {cause}. This happens when {trigger}."
Phase 4: IMPLEMENTATION
- Write a failing test case that reproduces the bug (if test framework exists)
- Apply the minimal fix — change only what's necessary to resolve the root cause
- Verify the fix:
- Test case now passes
- Typecheck passes
- No regressions in related functionality
- If the root cause analysis revealed related occurrences, fix those too
Emergency Stop Rule
If a fix fails TWICE: STOP.
Do not try a third guess. The root cause analysis was wrong. Either:
- Go back to Phase 2 with new hypotheses
- Ask the user for more context about the system's intended behavior
- Check if there's a higher-level architectural issue
Three failed fixes in a row means you're guessing, not debugging.
What This Skill Prevents
- Shotgun debugging — changing random things until the error goes away
- Symptom fixing — patching the output without understanding the cause
- Fix cascades — one bad fix creating three new bugs
- Silent regressions — fixing one path while breaking another
Quality Gates
- A clear problem statement exists before any hypothesis is formed
- At least one hypothesis was verified (not assumed) before the fix was written
- The fix addresses the root cause, not just the symptom
- Typecheck passes after the fix with no new errors
- If related occurrences were found, they were fixed or documented
Fringe Cases
Bug is intermittent: Document the triggering conditions as precisely as possible. Reproduce it at least once before forming hypotheses. If it can't be reproduced, stop at Phase 1 and ask for more context.
Two fix attempts have already failed: Invoke the Emergency Stop Rule. Return to Phase 2 with new hypotheses. Do not try a third guess without re-reading the relevant code.
No test framework exists: Skip the "write a failing test" step in Phase 4. Verify the fix manually and document how to reproduce the original bug for future reference.
Error is in a dependency or generated file: Document the root cause but do not modify the dependency. Propose a workaround in the consuming code instead.
Exit Protocol
---HANDOFF--- - Bug: {problem statement} - Root cause: {one-line cause} - Fix: {what was changed} - Verified: {typecheck + tests passing} - Related: {any similar patterns found and fixed} ---