Agent-alchemy bug-killer
```shell
git clone https://github.com/sequenzia/agent-alchemy
```

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/sequenzia/agent-alchemy "$T" && mkdir -p ~/.claude/skills && cp -r "$T/ported/20260310/all/skills/bug-killer" ~/.claude/skills/sequenzia-agent-alchemy-bug-killer-2b99f4 && rm -rf "$T"
```
ported/20260310/all/skills/bug-killer/SKILL.md

Bug Killer — Hypothesis-Driven Debugging Workflow
Execute a systematic debugging workflow that enforces investigation before fixes. Every bug gets a hypothesis journal, evidence gathering, and root cause confirmation before any code changes.
Phase Overview
- Triage & Reproduction — Understand, reproduce, route to quick or deep track
- Investigation — Gather evidence with language-specific techniques
- Root Cause Analysis — Confirm root cause through hypothesis testing
- Fix & Verify — Fix with proof, regression test, quality check
- Wrap-up & Report — Document trail, capture learnings
Phase 1: Triage & Reproduction
Goal: Understand the bug, reproduce it, and decide the investigation track.
1.1 Parse Context
Extract from `$ARGUMENTS` and conversation context:
- Bug description: What is failing? Error messages, symptoms
- Reproduction steps: How to trigger the bug (test command, user action, etc.)
- Environment: Language, framework, test runner, relevant config
- Prior attempts: Has the user already tried fixes? What did not work?
- Deep flag: If `--deep` is present, skip triage and go directly to deep track (jump to Phase 2 deep track)
1.2 Reproduce the Bug
Attempt to reproduce before investigating:
- If a failing test was mentioned, run it:

  ```shell
  # Run the specific test to confirm the failure
  <test-runner> <test-file>::<test-name>
  ```

- If an error was described, find and trigger it
- If neither, search for related test files and run them
Capture the exact error output — this is your primary evidence.
If the bug cannot be reproduced:
- Prompt the user for more context
- Check if it is environment-specific or intermittent
- Note "not yet reproduced" in the hypothesis journal
1.3 Form Initial Hypothesis
Based on the error message and context, form your first hypothesis:
```
### H1: [Title]
- Hypothesis: [What you think is causing the bug]
- Evidence for: [What supports this — error message, stack trace, etc.]
- Evidence against: [Anything that contradicts it — if none yet, say "None yet"]
- Test plan: [Specific steps to confirm or reject]
- Status: Pending
```
1.4 Route to Track
Quick-fix signals (ALL must be true):
- Clear, specific error message pointing to exact location
- Localized to 1-2 files (not spread across the codebase)
- Obvious fix visible from reading the error location
- No concurrency, timing, or state management involved
Deep-track signals (ANY one triggers deep track):
- Bug spans 3+ files or modules
- Root cause unclear from the error message alone
- Intermittent or environment-dependent failure
- Involves concurrency, timing, shared state, or async behavior
- User already tried fixes that did not work
- Generic error message (e.g., "null reference" without clear origin)
- Stack trace points to library/framework code rather than application code
Present your assessment and prompt the user:
- Summarize the bug and your initial hypothesis
- Recommend quick or deep track with justification
- Options: "Quick track (Recommended)" / "Deep track" or "Deep track (Recommended)" / "Quick track" — depending on your assessment
- Let the user override your recommendation
Track escalation rule: If during quick track execution, 2 hypotheses are rejected, automatically escalate to deep track. Preserve all hypothesis journal entries when escalating.
Phase 2: Investigation
Goal: Gather evidence systematically, guided by language-specific techniques.
2.1 Load Language Reference
Detect the primary language of the bug's context and load the appropriate reference:
| Language | Reference |
|---|---|
| Python | Read the python-debugging reference from `references/` |
| TypeScript / JavaScript | Read the typescript-debugging reference from `references/` |
| Other / Multiple | Use the general debugging guidance below |
Always also apply the general debugging guidance as a supplement when using a language-specific reference.
2.2 Quick Track Investigation
For quick-track bugs, investigate directly:
- Read the error location — the file and function where the error occurs
- Read the immediate callers — 1-2 files up the call chain
- Check recent changes — `git log --oneline -5 -- <file>` for the affected files
- Update hypothesis — does the evidence support H1? Add evidence for/against
Proceed to Phase 3 (quick track).
2.3 Deep Track Investigation
For deep-track bugs, use parallel exploration workers:
- Plan exploration areas — identify 2-3 focus areas based on the bug:
  - Focus 1: The error site and immediate code path
  - Focus 2: Data flow and state management leading to the error
  - Focus 3: Related subsystems, configuration, or external dependencies
- Delegate to 2-3 read-only exploration workers with distinct focus areas. Each worker receives:

  ```
  Bug context: [description of the bug and error]
  Focus area: [specific area for this worker]

  Investigate this focus area in relation to the bug:
  - Find all relevant files
  - Trace the execution/data path
  - Identify where behavior diverges from expected
  - Note any suspicious patterns, recent changes, or known issues
  - Report structured findings
  ```

  Launch workers in parallel for independent focus areas.
- Synthesize exploration results:
- Collect findings from all workers
- Identify convergence (multiple workers pointing to same area)
- Update hypothesis journal with new evidence
- Form additional hypotheses if evidence warrants (aim for 2-3 total)
Proceed to Phase 3 (deep track).
Phase 3: Root Cause Analysis
Goal: Confirm the root cause through systematic hypothesis testing.
3.1 Quick Track Root Cause
For quick-track bugs:
- Verify the hypothesis:
- Read the specific code identified in Phase 2
- Trace the logic step-by-step
- Confirm that the hypothesized cause produces the observed error
- If confirmed (Status -> Confirmed):
- Update H1 with confirming evidence
- Proceed to Phase 4
- If rejected (Status -> Rejected):
- Update H1 with evidence against and reason for rejection
- Form a new hypothesis (H2) based on what you learned
- Investigate H2 following Phase 2 quick track steps
- If H2 is also rejected -> escalate to deep track
- Preserve all journal entries, continue with Phase 2 deep track
3.2 Deep Track Root Cause
For deep-track bugs:
- Prepare hypotheses for testing:
- You should have 2-3 hypotheses from Phase 2
- Each needs a concrete test plan (how to confirm or reject)
- Delegate to 1-3 investigation workers to test hypotheses in parallel. Each worker receives:

  ```
  Bug context: [description of the bug and error]
  Hypothesis to test: [specific hypothesis]
  Test plan:
  1. [Step 1 — e.g., run this specific test with these arguments]
  2. [Step 2 — e.g., check git blame for this function]
  3. [Step 3 — e.g., trace the data from input to error site]

  Report your findings with verdict (confirmed/rejected/inconclusive), evidence, and recommendations.
  ```

  Launch workers in parallel when they test independent hypotheses.
- Evaluate results:
  - Update hypothesis journal with each worker's findings
  - If one hypothesis is confirmed -> proceed to Phase 4
  - If all are rejected/inconclusive -> apply the 5 Whys technique. Take the strongest "inconclusive" finding and ask "why?" iteratively:

    ```
    Observed: [what actually happens]
    Why? -> [first-level cause]
    Why? -> [second-level cause]
    Why? -> [root cause]
    ```

  - Form new hypotheses from the 5 Whys analysis and repeat investigation
- If stuck after 2 rounds of investigation: Prompt the user:
- Share all findings and the hypothesis journal
- Options: "Continue investigating", "Try a different angle", "Provide more context"
Phase 4: Fix & Verify
Goal: Fix the root cause and prove the fix works.
4.1 Design the Fix
Before writing any code:
- Explain the root cause — state clearly what is wrong and why
- Explain the fix — describe what will change and WHY it addresses the root cause
- Identify affected files — list every file that needs modification
- Consider side effects — could this fix break other behavior?
4.2 Implement the Fix
- Read all files that will be modified before making changes
- Apply the fix — minimal, focused changes
- Match existing patterns — follow the codebase's conventions
4.3 Run Tests
- Run the originally failing test — it should now pass:

  ```shell
  <test-runner> <test-file>::<test-name>
  ```

- Run related tests — tests in the same file and nearby test files:

  ```shell
  <test-runner> <test-directory>
  ```

- If tests fail:
- Determine if the failure is related to the fix or pre-existing
- If related, revise the fix (do NOT revert to a different approach without updating the hypothesis journal)
- If pre-existing, note it but do not let it block the fix
4.4 Write Regression Test
Write a test that would have caught this bug:
- The test should fail WITHOUT the fix (verifying it tests the right thing)
- The test should pass WITH the fix
- The test should be minimal — test the specific behavior that was broken
- Place it in the appropriate test file following project conventions
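As an illustration (the `parse_price` function and its cents-vs-dollars bug are hypothetical, not from this codebase), a regression test that pins the exact behavior that was broken might look like:

```python
# Hypothetical regression test for a cents-vs-dollars bug: parse_price
# used to return float dollars where callers expected integer cents.
# Because the test asserts the exact broken behavior, it fails without
# the fix and passes with it.

def parse_price(text: str) -> int:
    """Fixed version: returns the price in integer cents."""
    return round(float(text) * 100)

def test_parse_price_returns_cents():
    # Minimal: only the specific behavior that was broken, nothing more.
    assert parse_price("1.50") == 150
    assert parse_price("0.99") == 99

test_parse_price_returns_cents()
```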
4.5 Deep Track: Quality Check and Related Issues
Deep track only — skip on quick track.
- Load code-quality skill: Refer to the code-quality skill and review the fix against its code quality principles.
- Check for related issues:
- Search for the same pattern elsewhere in the codebase
- If the same bug exists in other locations, report them to the user
- Prompt the user: "Fix all related instances now?" / "Fix only the reported bug" / "Create tasks for related fixes"
Phase 5: Wrap-up & Report
Goal: Document the investigation trail and capture learnings.
5.1 Bug Fix Summary
Present to the user:
```
## Bug Fix Summary

### Bug
[One-line description of the bug]

### Root Cause
[What was actually wrong and why]

### Fix Applied
[What was changed, with file:line references]

### Tests
- [Originally failing test]: Now passing
- [Regression test added]: [test name and location]
- [Related tests]: All passing

### Track
[Quick / Deep] [Escalated from quick: Yes/No]
```
5.2 Hypothesis Journal Recap
Present the complete hypothesis journal showing the investigation trail:
```
### Investigation Trail

#### H1: [Title]
- Status: Confirmed / Rejected
- [Key evidence summary]

#### H2: [Title] (if applicable)
- Status: Confirmed / Rejected
- [Key evidence summary]

[... additional hypotheses ...]
```
5.3 Project Learnings
Refer to the project-learnings skill to evaluate whether this bug reveals project-specific knowledge worth capturing.
Follow its workflow to evaluate the finding. Common debugging discoveries that qualify:
- Surprising API behavior specific to this project
- Undocumented conventions that caused the bug
- Architectural constraints that are not obvious from the code
5.4 Deep Track: Future Recommendations
Deep track only:
If the investigation revealed broader concerns, present recommendations:
- Architecture improvements to prevent similar bugs
- Missing test coverage areas
- Documentation gaps
- Monitoring or alerting suggestions
5.5 Next Steps
Prompt the user:
- "Commit the fix" — proceed to commit workflow
- "Review the changes" — show a diff of all modifications
- "Run full test suite" — run the complete test suite to verify no regressions
- "Done" — wrap up the session
Hypothesis Journal
The hypothesis journal is the core artifact of this workflow. Maintain it throughout all phases.
Format
```
## Hypothesis Journal — [Bug Title]

### H1: [Descriptive Title]
- **Hypothesis:** [What is causing the bug — be specific]
- **Evidence for:** [Supporting observations with file:line references]
- **Evidence against:** [Contradicting observations]
- **Test plan:** [Concrete steps to confirm or reject]
- **Status:** Pending / Confirmed / Rejected
- **Notes:** [Additional context, timestamps, worker findings]

### H2: [Descriptive Title]
[Same format]
```
Rules
- Minimum hypotheses: 1 on quick track, 2-3 on deep track
- Never delete entries — rejected hypotheses are valuable context
- Update incrementally — add evidence as you find it, do not wait
- Be specific — "the data is wrong" is not a hypothesis; "processOrder receives dollars but expects cents" is
Track Reference
| Aspect | Quick Track | Deep Track |
|---|---|---|
| Investigation | Read error location + 1-2 callers | 2-3 exploration workers in parallel |
| Hypotheses | Minimum 1 | Minimum 2-3 |
| Root cause testing | Manual verification | 1-3 investigation workers in parallel |
| Fix validation | Run failing + related tests | Tests + code-quality skill + related issue scan |
| Auto-escalation | After 2 rejected hypotheses | N/A |
| Typical complexity | Off-by-one, typo, wrong argument, missing null check | Race condition, state corruption, multi-file logic error |
Worker Coordination
Exploration Workers (Phase 2, deep track)
These are read-only workers that explore codebase areas. Give each a distinct focus area related to the bug. They report structured findings.
Investigation Workers (Phase 3, deep track)
These are workers with shell access for running tests and git commands, but no write access — they investigate and report evidence, they do not fix code. Give each a specific hypothesis to test.
Error Handling
- If a worker fails, continue with remaining workers' results
- If all workers fail in a phase, fall back to manual investigation
- Never block on a single worker — partial results are better than no results
Error Recovery
If any phase fails:
- Explain what went wrong and what you have learned so far
- Present the hypothesis journal as-is
- Prompt the user:
- "Retry this phase"
- "Skip to fix" (if you have enough evidence)
- "Provide more context"
- "Abort"
General Debugging Reference
Language-agnostic debugging strategies, systematic investigation methods, and common bug categories.
Systematic Debugging Methods
Binary Search for Bugs
Narrow the problem space by half at each step:
- Identify the full code path from input to incorrect output
- Place a diagnostic check at the midpoint
- Determine which half contains the bug
- Repeat in the failing half until the exact location is found
Works for: data transformation pipelines, middleware chains, multi-step processes.
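A minimal sketch of the idea, assuming a pipeline of stages and a monotone invariant (once broken, it stays broken — the property that makes bisection valid). The stages, input, and invariant are hypothetical:

```python
# Bisect a pipeline for the first stage whose output breaks an invariant.
# Stages, input, and invariant are hypothetical placeholders.

def stage_a(x): return x + 1
def stage_b(x): return x * 2
def stage_c(x): return x - 100   # the (hypothetical) buggy stage
def stage_d(x): return x + 3

def output_after(stages, value, n):
    """Value after running the first n stages."""
    for stage in stages[:n]:
        value = stage(value)
    return value

def first_bad_stage(stages, value, invariant):
    """Binary search: invariant holds after `lo` stages, fails after `hi`."""
    lo, hi = 0, len(stages)
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if invariant(output_after(stages, value, mid)):
            lo = mid              # still good: bug is in the later half
        else:
            hi = mid              # already bad: bug is in the earlier half
    return hi - 1                 # index of the first bad stage

stages = [stage_a, stage_b, stage_c, stage_d]
# Hypothetical invariant: intermediate values stay non-negative.
assert first_bad_stage(stages, 5, lambda v: v >= 0) == 2   # stage_c
```

In real code the "stages" are usually checkpoints you probe manually, but the halving logic is the same.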
Git Bisect
Automate binary search through commit history:
```shell
git bisect start
git bisect bad                    # current commit is broken
git bisect good <known-good-sha>  # this commit was working
# Automated: let a test command drive the search
git bisect run <test-command>     # returns 0 = good, non-0 = bad
git bisect reset                  # when done
```
Best for: regressions where you know "it used to work."
Delta Debugging
Minimize the input that triggers the bug:
- Start with the full failing input
- Remove half the input — does it still fail?
- If yes, keep the smaller input and repeat
- If no, restore and try removing the other half
- Continue until you find the minimal failing case
Works for: large inputs, complex configurations, test case reduction.
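A simplified halving minimizer in the spirit of delta debugging (full ddmin also tries removing smaller chunks). The `fails` predicate here is a hypothetical stand-in for "run the program and check whether the bug triggers":

```python
# Shrink a failing input by halving while it still triggers the failure.
# `fails` is a hypothetical stand-in for re-running the buggy program.

def minimize(data: str, fails) -> str:
    assert fails(data), "start from a reproducibly failing input"
    while len(data) > 1:
        half = len(data) // 2
        if fails(data[:half]):        # first half alone still fails: keep it
            data = data[:half]
        elif fails(data[half:]):      # second half alone still fails: keep it
            data = data[half:]
        else:
            break                     # the failure needs parts of both halves
    return data

bad_input = ("x" * 20) + "<script>" + ("y" * 40)
fails = lambda s: "<script>" in s     # hypothetical bug trigger
small = minimize(bad_input, fails)
assert fails(small) and len(small) < len(bad_input)
```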
Rubber Duck Debugging
Explain the problem out loud (or in writing), step by step:
- State what the code is supposed to do
- Walk through the actual execution, line by line
- At each step, explain what the state should be vs. what it is
- The discrepancy often reveals itself during the explanation
5 Whys
Drill past symptoms to root causes:
```
Bug: Users see a 500 error on checkout
Why? -> The payment API call throws a timeout
Why? -> The request takes >30 seconds
Why? -> The order total calculation is O(n^2)
Why? -> It recalculates item prices for each item pair
Why? -> The discount logic compares every item against every other item
Root cause: Quadratic discount calculation algorithm
```
Stop when the answer is something you can directly fix.
Reading Stack Traces
Universal Patterns
| Element | What It Tells You |
|---|---|
| Error type/name | Category of failure (null access, type mismatch, etc.) |
| Error message | Specific details about what went wrong |
| File path + line number | Where the error was thrown |
| Function/method name | What was executing when it failed |
| Frame ordering | The call chain that led to the error |
What Stack Traces Cannot Tell You
- Why the wrong value got there (you need to trace backwards)
- When the state became corrupted (may have happened much earlier)
- Where in async code the real problem is (async gaps in traces)
- Whether a caught-and-rethrown error lost its original context
Investigation Strategy
- Read the error message first — it often contains the key clue
- Find your code in the trace (skip framework/library frames)
- Read the immediate caller — what arguments were passed?
- Check the state at that point — are variables what you expect?
- Trace backwards from the error to where the data originated
Bug Categories
Off-by-One Errors
Symptoms: Missing first/last element, array index out of bounds, fencepost errors.
Check for:
- `<` vs `<=` in loop conditions
- 0-based vs 1-based indexing confusion
- Inclusive vs exclusive range boundaries
- Empty collection edge case (length 0)
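The inclusive/exclusive mixup in miniature (function names are hypothetical): Python's `range()` excludes its stop value, so an intended-inclusive loop silently drops the last element.

```python
# `<` vs `<=` off-by-one: range(stop) excludes `stop`, so the element
# at last_index is silently skipped in the buggy version.

def sum_through_buggy(values, last_index):
    # Bug: range(last_index) yields 0..last_index-1.
    return sum(values[i] for i in range(last_index))

def sum_through_fixed(values, last_index):
    # Fix: stop at last_index + 1 to make the range inclusive.
    return sum(values[i] for i in range(last_index + 1))

vals = [1, 2, 3, 4]
assert sum_through_buggy(vals, 3) == 6    # last element missing
assert sum_through_fixed(vals, 3) == 10   # inclusive of index 3
```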
Null/Undefined/None Errors
Symptoms: Null reference exceptions, "undefined is not a function," AttributeError.
Check for:
- Uninitialized variables
- Missing return values (functions that implicitly return null/undefined)
- Optional fields accessed without guards
- API responses with unexpected null fields
- Database queries returning no results
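A sketch of the "implicit None" trap (names are hypothetical): a lookup that falls off the end returns `None`, and an unguarded caller then crashes far from the real cause.

```python
# A lookup with no explicit fallthrough return implicitly returns None;
# the crash then happens at the caller, not at the root cause.

def find_user(users, user_id):
    for user in users:
        if user["id"] == user_id:
            return user
    # Falls off the end here: implicitly returns None.

users = [{"id": 1, "name": "Ada"}]

# Unguarded: find_user(users, 2)["name"] raises TypeError
# ("'NoneType' object is not subscriptable").

# Guarded access:
user = find_user(users, 2)
name = user["name"] if user is not None else "<unknown>"
assert name == "<unknown>"
assert find_user(users, 1)["name"] == "Ada"
```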
Race Conditions
Symptoms: Intermittent failures, works in debugger but fails normally, order-dependent results.
Check for:
- Shared mutable state accessed concurrently
- Missing locks/synchronization
- Read-then-write without atomicity
- Callback ordering assumptions
- File system operations assuming sequential access
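A sketch of the read-then-write pattern and the lock that makes it a single critical section (function names are hypothetical; the pattern is the point):

```python
# Read-then-write without atomicity loses updates under concurrency;
# a lock turns the read-modify-write into one atomic critical section.
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    global counter
    for _ in range(n):
        current = counter       # read ...
        counter = current + 1   # ... then write: another thread can
                                # interleave between the two steps

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:              # read-modify-write as one atomic step
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(10_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 40_000        # deterministic only because of the lock
```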
Resource Leaks
Symptoms: Slow degradation, eventual crashes, "too many open files," memory growth.
Check for:
- File handles not closed (missing `close()` / `with` / `using`)
- Database connections not returned to pool
- Event listeners added but never removed
- Timers/intervals not cleared
- Temporary files not cleaned up
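A small sketch of why a missing `close()` leaks and how a context manager fixes it: if the body raises before the cleanup line, the handle stays open, whereas `with` closes on every exit path. The file here is a throwaway temp file:

```python
# `with` guarantees the handle is released on every exit path,
# including exceptions; a bare open()/close() pair does not.
import os
import tempfile

# Leaky shape (commented, not run):
#   f = open(path)
#   data = f.read()   # if this raises, f.close() below never runs
#   f.close()

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "w") as f:
        f.write("hello")
    with open(path) as f:
        data = f.read()
    assert data == "hello"
    assert f.closed          # released as soon as the block exited
finally:
    os.unlink(path)          # temp file cleanup (another common leak)
```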
State Corruption
Symptoms: Inconsistent data, works sometimes but not always, cascade of errors after a specific action.
Check for:
- Mutation of shared objects
- Missing deep copies (aliased references)
- Partial updates (crash between related writes)
- Cache invalidation issues
- Global/singleton state modified by multiple code paths
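Aliased references in miniature (the config shape is hypothetical): a shallow copy shares nested state, so two "independent" objects corrupt each other, and `copy.deepcopy` is the usual fix.

```python
# A shallow copy duplicates only the top level; nested objects are
# shared, so mutating one "copy" corrupts the other.
import copy

defaults = {"retries": 3, "headers": {"Accept": "application/json"}}

shallow = dict(defaults)                    # top level only
shallow["headers"]["Accept"] = "text/html"  # mutates the SHARED nested dict
assert defaults["headers"]["Accept"] == "text/html"   # defaults corrupted

defaults["headers"]["Accept"] = "application/json"    # reset for the demo
deep = copy.deepcopy(defaults)              # fully independent copy
deep["headers"]["Accept"] = "text/html"
assert defaults["headers"]["Accept"] == "application/json"  # intact
```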
Diagnostic Logging Strategy
Targeted Logging
Log at decision points and data boundaries:
```
[ENTRY]  function_name called with: key_arg=value
[BRANCH] taking path X because condition=value
[DATA]   received from external: summary_of_data
[EXIT]   function_name returning: summary_of_result
```
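This pattern can be sketched with the standard `logging` module (`apply_discount` and its 10000-cent threshold are hypothetical):

```python
# ENTRY/BRANCH/EXIT diagnostic logging with the stdlib logging module.
import logging

logging.basicConfig(level=logging.DEBUG, format="%(message)s")
log = logging.getLogger("diagnose")

def apply_discount(total_cents: int) -> int:
    log.debug("[ENTRY] apply_discount called with: total_cents=%d", total_cents)
    if total_cents >= 10_000:
        log.debug("[BRANCH] bulk path because total_cents >= 10000")
        result = total_cents * 90 // 100    # hypothetical 10% bulk discount
    else:
        log.debug("[BRANCH] standard path, no discount")
        result = total_cents
    log.debug("[EXIT] apply_discount returning: %d", result)
    return result

assert apply_discount(12_000) == 10_800
assert apply_discount(5_000) == 5_000
```

Because the messages go through the debug level, they disappear in production without code changes, which addresses the "leaving debug logs in code" anti-pattern below.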
Logging Anti-Patterns
| Anti-Pattern | Problem | Better Approach |
|---|---|---|
| Logging everything | Noise hides signal | Log at boundaries and decision points |
| Logging sensitive data | Security risk | Redact or hash sensitive fields |
| Logging inside tight loops | Performance impact, massive output | Log summary after loop, or sample every Nth iteration |
| Logging without context | "Error occurred" is useless | Include function name, key parameters, state |
| Leaving debug logs in code | Clutters production output | Use conditional debug level, remove before commit |
Effective Diagnostic Pattern
- Before the suspected area: Log inputs and state
- At decision points: Log which branch was taken and why
- After the suspected area: Log outputs and state
- Compare: Are the inputs/outputs what you expect at each point?
Investigation Checklist
Before proposing a fix, verify you can answer:
Understanding the Bug
- Can you reproduce the bug reliably?
- What is the expected behavior vs actual behavior?
- When did this start happening? (regression or latent bug?)
- Does it happen always, sometimes, or in specific conditions?
Root Cause Identification
- Have you identified the specific line(s) causing the issue?
- Do you understand WHY those lines produce the wrong result?
- Is this the root cause, or a symptom of a deeper issue?
- Could this same root cause affect other code paths?
Fix Validation
- Does the fix address the root cause, not just the symptom?
- Could the fix introduce new bugs? (side effects, changed behavior)
- Are there existing tests that should have caught this?
- Does the fix handle all edge cases for this code path?
Broader Impact
- Are there similar patterns elsewhere that might have the same bug?
- Does the fix require updates to documentation or configuration?
- Could this be a regression? If so, what change introduced it?
- Is a new test needed to prevent this from recurring?
Integration Notes
Capabilities Needed
This skill requires the following capabilities from the host environment:
- File system access: Read, write, and modify files in the project directory
- Search: Search for files by name patterns and search within file contents
- Shell execution: Run shell commands (tests, git operations, diagnostic commands)
- Parallel delegation: Ability to delegate work to independent sub-workers for deep track investigation
- User interaction: Prompt the user for decisions, track routing, and additional context
Adaptation Guidance
- Phase 2 (Deep Track): The parallel exploration worker pattern can be replaced with sequential exploration if parallel delegation is not supported.
- Phase 3 (Deep Track): The parallel investigation worker pattern can be replaced with sequential hypothesis testing if parallel delegation is not supported.
- Language references: Python and TypeScript debugging references are provided in the `references/` directory. For other languages, use the general debugging guidance inlined above.
- Quick track: Can be executed entirely without parallel delegation capabilities.