Agent-alchemy bug-killer
```shell
git clone https://github.com/sequenzia/agent-alchemy
T=$(mktemp -d) && git clone --depth=1 https://github.com/sequenzia/agent-alchemy "$T" && mkdir -p ~/.claude/skills && cp -r "$T/ported/20260305-191444/skills/bug-killer" ~/.claude/skills/sequenzia-agent-alchemy-bug-killer-be9b34 && rm -rf "$T"
```
ported/20260305-191444/skills/bug-killer/SKILL.md

Bug Killer -- Hypothesis-Driven Debugging Workflow
Execute a systematic debugging workflow that enforces investigation before fixes. Every bug gets a hypothesis journal, evidence gathering, and root cause confirmation before any code changes.
Phase Overview
- Triage & Reproduction -- Understand, reproduce, route to quick or deep track
- Investigation -- Gather evidence with language-specific techniques
- Root Cause Analysis -- Confirm root cause through hypothesis testing
- Fix & Verify -- Fix with proof, regression test, quality check
- Wrap-up & Report -- Document trail, capture learnings
Phase 1: Triage & Reproduction
Goal: Understand the bug, reproduce it, and decide the investigation track.
1.1 Parse Context
Extract from `$ARGUMENTS` and conversation context:
- Bug description: What's failing? Error messages, symptoms
- Reproduction steps: How to trigger the bug (test command, user action, etc.)
- Environment: Language, framework, test runner, relevant config
- Prior attempts: Has the user already tried fixes? What didn't work?
- Deep flag: If `--deep` is present, skip triage and go directly to the deep track (jump to Phase 2 deep track)
1.2 Reproduce the Bug
Attempt to reproduce before investigating:
- If a failing test was mentioned, run it:

  ```
  # Run the specific test to confirm the failure
  <test-runner> <test-file>::<test-name>
  ```

- If an error was described, find and trigger it
- If neither, search for related test files and run them
Capture the exact error output -- this is your primary evidence.
If the bug cannot be reproduced:
- Prompt the user for more context
- Check if it's environment-specific or intermittent
- Note "not yet reproduced" in the hypothesis journal
1.3 Form Initial Hypothesis
Based on the error message and context, form your first hypothesis:
```
### H1: [Title]
- Hypothesis: [What you think is causing the bug]
- Evidence for: [What supports this -- error message, stack trace, etc.]
- Evidence against: [Anything that contradicts it -- if none yet, say "None yet"]
- Test plan: [Specific steps to confirm or reject]
- Status: Pending
```
1.4 Route to Track
Quick-fix signals (ALL must be true):
- Clear, specific error message pointing to exact location
- Localized to 1-2 files (not spread across the codebase)
- Obvious fix visible from reading the error location
- No concurrency, timing, or state management involved
Deep-track signals (ANY one triggers deep track):
- Bug spans 3+ files or modules
- Root cause unclear from the error message alone
- Intermittent or environment-dependent failure
- Involves concurrency, timing, shared state, or async behavior
- User already tried fixes that didn't work
- Generic error message (e.g., "null reference" without clear origin)
- Stack trace points to library/framework code rather than application code
Present your assessment and prompt the user to choose a track:
- Summarize the bug and your initial hypothesis
- Recommend quick or deep track with justification
- Quick track (Recommended) / Deep track -- depending on your assessment
- Let the user override your recommendation
Track escalation rule: If 2 hypotheses are rejected during quick-track execution, automatically escalate to the deep track. Preserve all hypothesis journal entries when escalating.
Phase 2: Investigation
Goal: Gather evidence systematically, guided by language-specific techniques.
2.1 Load Language Reference
Detect the primary language of the bug's context and load the appropriate reference:
| Language | Reference |
|---|---|
| Python | See the General Debugging Reference below, plus references/python-debugging.md |
| TypeScript / JavaScript | See the General Debugging Reference below, plus references/typescript-debugging.md |
| Other / Multiple | See the General Debugging Reference below |
Always also consult the General Debugging Reference as a supplement when using a language-specific reference.
2.2 Quick Track Investigation
For quick-track bugs, investigate directly:
- Read the error location -- the file and function where the error occurs
- Read the immediate callers -- 1-2 files up the call chain
- Check recent changes -- `git log --oneline -5 -- <file>` for the affected files
- Update hypothesis -- does the evidence support H1? Add evidence for/against
Proceed to Phase 3 (quick track).
2.3 Deep Track Investigation
For deep-track bugs, use parallel exploration agents:
- Plan exploration areas -- identify 2-3 focus areas based on the bug:
- Focus 1: The error site and immediate code path
- Focus 2: Data flow and state management leading to the error
- Focus 3: Related subsystems, configuration, or external dependencies
- Delegate to 2-3 independent explorer agents to investigate focus areas:

  Each agent should receive:

  ```
  Bug context: [description of the bug and error]
  Focus area: [specific area for this agent]
  Investigate this focus area in relation to the bug:
  - Find all relevant files
  - Trace the execution/data path
  - Identify where behavior diverges from expected
  - Note any suspicious patterns, recent changes, or known issues
  - Report structured findings
  ```

  Launch agents in parallel for independent focus areas.
- Synthesize exploration results:
- Collect findings from all agents
- Identify convergence (multiple agents pointing to same area)
- Update hypothesis journal with new evidence
- Form additional hypotheses if evidence warrants (aim for 2-3 total)
Proceed to Phase 3 (deep track).
Phase 3: Root Cause Analysis
Goal: Confirm the root cause through systematic hypothesis testing.
3.1 Quick Track Root Cause
For quick-track bugs:
- Verify the hypothesis:
- Read the specific code identified in Phase 2
- Trace the logic step-by-step
- Confirm that the hypothesized cause produces the observed error
- If confirmed (Status -> Confirmed):
- Update H1 with confirming evidence
- Proceed to Phase 4
- If rejected (Status -> Rejected):
- Update H1 with evidence against and reason for rejection
- Form a new hypothesis (H2) based on what you learned
- Investigate H2 following Phase 2 quick track steps
- If H2 is also rejected, escalate to deep track
- Preserve all journal entries, continue with Phase 2 deep track
3.2 Deep Track Root Cause
For deep-track bugs:
- Prepare hypotheses for testing:
- You should have 2-3 hypotheses from Phase 2
- Each needs a concrete test plan (how to confirm or reject)
- Delegate to 1-3 independent investigator agents to test hypotheses:

  Each agent should receive:

  ```
  Bug context: [description of the bug and error]
  Hypothesis to test: [specific hypothesis]
  Test plan:
  1. [Step 1 -- e.g., run this specific test with these arguments]
  2. [Step 2 -- e.g., check git blame for this function]
  3. [Step 3 -- e.g., trace the data from input to error site]
  Report your findings with verdict (confirmed/rejected/inconclusive), evidence, and recommendations.
  ```

  Launch agents in parallel when they test independent hypotheses.
- Evaluate results:
  - Update the hypothesis journal with each agent's findings
  - If one hypothesis is confirmed, proceed to Phase 4
  - If all are rejected/inconclusive, apply the 5 Whys technique: take the strongest "inconclusive" finding and ask "why?" iteratively:

    ```
    Observed: [what actually happens]
    Why? -> [first-level cause]
    Why? -> [second-level cause]
    Why? -> [root cause]
    ```

  - Form new hypotheses from the 5 Whys analysis and repeat the investigation
- If stuck after 2 rounds of investigation:
- Present all findings to the user
- Share the hypothesis journal
- Prompt the user to choose:
- Continue investigating
- Try a different angle
- Provide more context
Phase 4: Fix & Verify
Goal: Fix the root cause and prove the fix works.
4.1 Design the Fix
Before writing any code:
- Explain the root cause -- state clearly what's wrong and why
- Explain the fix -- describe what will change and WHY it addresses the root cause
- Identify affected files -- list every file that needs modification
- Consider side effects -- could this fix break other behavior?
4.2 Implement the Fix
- Read all files that will be modified before making changes
- Apply the fix -- minimal, focused changes
- Match existing patterns -- follow the codebase's conventions
4.3 Run Tests
- Run the originally failing test -- it should now pass:

  ```
  <test-runner> <test-file>::<test-name>
  ```

- Run related tests -- tests in the same file and nearby test files:

  ```
  <test-runner> <test-directory>
  ```

- If tests fail:
- Determine if the failure is related to the fix or pre-existing
- If related, revise the fix (do NOT revert to a different approach without updating the hypothesis journal)
- If pre-existing, note it but don't let it block the fix
4.4 Write Regression Test
Write a test that would have caught this bug:
- The test should fail WITHOUT the fix (verifying it tests the right thing)
- The test should pass WITH the fix
- The test should be minimal -- test the specific behavior that was broken
- Place it in the appropriate test file following project conventions
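As a sketch of such a regression test, assuming a pytest-style project and a hypothetical `parse_price` function whose bug was truncating prices to whole dollars (all names and values here are illustrative, not from the source):

```python
# Hypothetical fixed function and its regression test. The bug being guarded
# against: the old implementation returned whole dollars, dropping the cents.

def parse_price(text: str) -> int:
    """Parse a price string like '$1.50' into integer cents."""
    dollars, _, cents = text.lstrip("$").partition(".")
    return int(dollars) * 100 + int(cents or 0)

def test_parse_price_keeps_cents():
    # Fails without the fix (the old code returned 100), passes with it.
    assert parse_price("$1.50") == 150

def test_parse_price_whole_dollars():
    # Minimal edge case: no decimal part at all.
    assert parse_price("$2") == 200
```

Note the first test encodes the exact broken behavior: run it against the pre-fix code once to confirm it fails, then keep it as the permanent guard.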
4.5 Deep Track: Quality Check and Related Issues
Deep track only -- skip on quick track.
- Refer to the code-quality skill: review the fix against code quality principles.
- Check for related issues:
- Search for the same pattern elsewhere in the codebase
- If the same bug exists in other locations, report them to the user
- Prompt the user to choose:
- Fix all related instances now
- Fix only the reported bug
- Create tasks for related fixes
Phase 5: Wrap-up & Report
Goal: Document the investigation trail and capture learnings.
5.1 Bug Fix Summary
Present to the user:
```
## Bug Fix Summary

### Bug
[One-line description of the bug]

### Root Cause
[What was actually wrong and why]

### Fix Applied
[What was changed, with file:line references]

### Tests
- [Originally failing test]: Now passing
- [Regression test added]: [test name and location]
- [Related tests]: All passing

### Track
[Quick / Deep] -- Escalated from quick: Yes/No
```
5.2 Hypothesis Journal Recap
Present the complete hypothesis journal showing the investigation trail:
```
### Investigation Trail

#### H1: [Title]
- Status: Confirmed / Rejected
- [Key evidence summary]

#### H2: [Title] (if applicable)
- Status: Confirmed / Rejected
- [Key evidence summary]

[... additional hypotheses ...]
```
5.3 Project Learnings
Refer to the project-learnings skill to evaluate whether this bug reveals project-specific knowledge worth capturing.
Follow its workflow to evaluate the finding. Common debugging discoveries that qualify:
- Surprising API behavior specific to this project
- Undocumented conventions that caused the bug
- Architectural constraints that aren't obvious from the code
5.4 Deep Track: Future Recommendations
Deep track only:
If the investigation revealed broader concerns, present recommendations:
- Architecture improvements to prevent similar bugs
- Missing test coverage areas
- Documentation gaps
- Monitoring or alerting suggestions
5.5 Next Steps
Prompt the user to choose:
- Commit the fix -- proceed to commit workflow
- Review the changes -- show a diff of all modifications
- Run full test suite -- run the complete test suite to verify no regressions
- Done -- wrap up the session
Hypothesis Journal
The hypothesis journal is the core artifact of this workflow. Maintain it throughout all phases.
Format
```
## Hypothesis Journal -- [Bug Title]

### H1: [Descriptive Title]
- **Hypothesis:** [What's causing the bug -- be specific]
- **Evidence for:** [Supporting observations with file:line references]
- **Evidence against:** [Contradicting observations]
- **Test plan:** [Concrete steps to confirm or reject]
- **Status:** Pending / Confirmed / Rejected
- **Notes:** [Additional context, timestamps, agent findings]

### H2: [Descriptive Title]
[Same format]
```
Rules
- Minimum hypotheses: 1 on quick track, 2-3 on deep track
- Never delete entries -- rejected hypotheses are valuable context
- Update incrementally -- add evidence as you find it, don't wait
- Be specific -- "the data is wrong" is not a hypothesis; "processOrder receives dollars but expects cents" is
Track Reference
| Aspect | Quick Track | Deep Track |
|---|---|---|
| Investigation | Read error location + 1-2 callers | 2-3 explorer agents in parallel |
| Hypotheses | Minimum 1 | Minimum 2-3 |
| Root cause testing | Manual verification | 1-3 investigator agents in parallel |
| Fix validation | Run failing + related tests | Tests + code-quality skill + related issue scan |
| Auto-escalation | After 2 rejected hypotheses | N/A |
| Typical complexity | Off-by-one, typo, wrong argument, missing null check | Race condition, state corruption, multi-file logic error |
Error Recovery
If any phase fails:
- Explain what went wrong and what you've learned so far
- Present the hypothesis journal as-is
- Prompt the user to choose:
- Retry this phase
- Skip to fix (if you have enough evidence)
- Provide more context
- Abort
General Debugging Reference
Language-agnostic debugging strategies, systematic investigation methods, and common bug categories.
Systematic Debugging Methods
Binary Search for Bugs
Narrow the problem space by half at each step:
- Identify the full code path from input to incorrect output
- Place a diagnostic check at the midpoint
- Determine which half contains the bug
- Repeat in the failing half until the exact location is found
Works for: data transformation pipelines, middleware chains, multi-step processes.
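The method above can be sketched as a small search over the steps of a pipeline; the pipeline, the invariant, and the helper name `first_bad_step` are illustrative assumptions:

```python
# Sketch: binary-search a pipeline of steps to find the first one that breaks
# an invariant the output should satisfy.

def first_bad_step(steps, value, invariant):
    """Return the index of the first step whose output violates the invariant."""
    lo, hi = 0, len(steps)
    while lo < hi:
        mid = (lo + hi) // 2
        v = value
        for step in steps[:mid + 1]:   # run steps 0..mid
            v = step(v)
        if invariant(v):
            lo = mid + 1               # still good: the bug is in the later half
        else:
            hi = mid                   # broken: the bug is at mid or earlier
    return lo

# Example: the third step (index 2) wrongly negates the value.
steps = [lambda x: x + 1, lambda x: x * 2, lambda x: -x, lambda x: x + 5]
assert first_bad_step(steps, 3, lambda v: v > 0) == 2
```

In real debugging the "steps" are usually print/log checkpoints you place manually, but the halving logic is identical.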
Git Bisect
Automate binary search through commit history:
```shell
git bisect start
git bisect bad                    # current commit is broken
git bisect good <known-good-sha>  # this commit was working
# Automated: let a test command drive the search
git bisect run <test-command>     # exit 0 = good, non-zero = bad
git bisect reset                  # when done
```
Best for: regressions where you know "it used to work."
Delta Debugging
Minimize the input that triggers the bug:
- Start with the full failing input
- Remove half the input -- does it still fail?
- If yes, keep the smaller input and repeat
- If no, restore and try removing the other half
- Continue until you find the minimal failing case
Works for: large inputs, complex configurations, test case reduction.
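The loop above can be sketched as a simplified delta-debugging minimizer (a greedy variant of the classic ddmin algorithm; the helper name and example predicate are assumptions):

```python
# Minimal sketch of delta debugging: shrink a failing input by repeatedly
# dropping chunks, keeping any smaller input that still fails.

def minimize(items, fails):
    """Greedily shrink `items` while the predicate `fails` stays true."""
    assert fails(items), "must start from a failing input"
    chunk = len(items) // 2
    while chunk >= 1:
        i, shrunk = 0, False
        while i < len(items):
            candidate = items[:i] + items[i + chunk:]  # try dropping one chunk
            if candidate and fails(candidate):
                items, shrunk = candidate, True        # keep the smaller input
            else:
                i += chunk                             # restore, try next chunk
        if not shrunk:
            chunk //= 2                                # retry at finer granularity
    return items

# Example: the failure only needs the "X" element to be present.
assert minimize(list("aaXbb"), lambda s: "X" in s) == ["X"]
```

The same shape works for shrinking config files or failing test cases: `items` becomes lines or options, and `fails` reruns the reproduction.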
Rubber Duck Debugging
Explain the problem out loud (or in writing), step by step:
- State what the code is supposed to do
- Walk through the actual execution, line by line
- At each step, explain what the state should be vs. what it is
- The discrepancy often reveals itself during the explanation
5 Whys
Drill past symptoms to root causes:
```
Bug: Users see a 500 error on checkout
Why? -> The payment API call throws a timeout
Why? -> The request takes >30 seconds
Why? -> The order total calculation is O(n^2)
Why? -> It recalculates item prices for each item pair
Why? -> The discount logic compares every item against every other item
Root cause: Quadratic discount calculation algorithm
```
Stop when the answer is something you can directly fix.
Reading Stack Traces
Universal Patterns
| Element | What It Tells You |
|---|---|
| Error type/name | Category of failure (null access, type mismatch, etc.) |
| Error message | Specific details about what went wrong |
| File path + line number | Where the error was thrown |
| Function/method name | What was executing when it failed |
| Frame ordering | The call chain that led to the error |
What Stack Traces CAN'T Tell You
- Why the wrong value got there (you need to trace backwards)
- When the state became corrupted (may have happened much earlier)
- Where in async code the real problem is (async gaps in traces)
- Whether a caught-and-rethrown error lost its original context
Investigation Strategy
- Read the error message first -- it often contains the key clue
- Find your code in the trace (skip framework/library frames)
- Read the immediate caller -- what arguments were passed?
- Check the state at that point -- are variables what you expect?
- Trace backwards from the error to where the data originated
Bug Categories
Off-by-One Errors
Symptoms: Missing first/last element, array index out of bounds, fencepost errors.
Check for:
- `<` vs `<=` in loop conditions
- 0-based vs 1-based indexing confusion
- Inclusive vs exclusive range boundaries
- Empty collection edge case (length 0)
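As an illustrative sketch of the bound confusion in the checklist above (the function and values are assumptions):

```python
# Off-by-one illustration: summing an *inclusive* range with an exclusive bound.

def sum_inclusive_buggy(lo, hi):
    total = 0
    for i in range(lo, hi):        # bug: range() excludes hi
        total += i
    return total

def sum_inclusive_fixed(lo, hi):
    total = 0
    for i in range(lo, hi + 1):    # fix: include the upper bound
        total += i
    return total

assert sum_inclusive_buggy(1, 3) == 3   # silently drops the last element
assert sum_inclusive_fixed(1, 3) == 6   # 1 + 2 + 3
```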
Null/Undefined/None Errors
Symptoms: Null reference exceptions, "undefined is not a function," AttributeError.
Check for:
- Uninitialized variables
- Missing return values (functions that implicitly return null/undefined)
- Optional fields accessed without guards
- API responses with unexpected null fields
- Database queries returning no results
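A minimal sketch of guarding an optional field (the response shape and field names are assumptions):

```python
# An API response may omit "profile" or "email"; unguarded chained access
# like user["profile"]["email"] raises KeyError or TypeError on those inputs.

def contact_email(user: dict) -> str:
    profile = user.get("profile") or {}        # tolerate missing or None
    return profile.get("email") or "<no email on file>"

assert contact_email({"profile": {"email": "a@b.c"}}) == "a@b.c"
assert contact_email({"profile": None}) == "<no email on file>"
assert contact_email({}) == "<no email on file>"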
Race Conditions
Symptoms: Intermittent failures, works in debugger but fails normally, order-dependent results.
Check for:
- Shared mutable state accessed concurrently
- Missing locks/synchronization
- Read-then-write without atomicity
- Callback ordering assumptions
- File system operations assuming sequential access
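The "read-then-write without atomicity" item can be sketched with a counter and a lock (the class and thread counts are illustrative):

```python
# Read-then-write race and its fix: without the lock, two threads can both
# read the same value and one increment is lost.
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment_racy(self):
        v = self.value          # read ...
        self.value = v + 1      # ... then write: another thread can interleave here

    def increment_safe(self):
        with self._lock:        # the read-modify-write is now atomic
            self.value += 1

c = Counter()
threads = [threading.Thread(target=lambda: [c.increment_safe() for _ in range(1000)])
           for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
assert c.value == 8000          # the locked version never loses updates
```

Note that the racy version may still pass under light load, which is exactly why race conditions show up as intermittent failures.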
Resource Leaks
Symptoms: Slow degradation, eventual crashes, "too many open files," memory growth.
Check for:
- File handles not closed (missing `close()` / `with` / `using`)
- Database connections not returned to pool
- Event listeners added but never removed
- Timers/intervals not cleared
- Temporary files not cleaned up
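The file-handle item can be sketched as the leak-prone shape versus the `with` block that guarantees cleanup (the file path is a throwaway temp file):

```python
# Leak-prone open() vs a `with` block. If f.write() raises in the first
# version, the handle is never closed; the second closes it on any exit path.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")

def write_leaky(text):
    f = open(path, "w")
    f.write(text)       # an exception here leaks the handle
    f.close()

def write_safe(text):
    with open(path, "w") as f:   # closed on success *and* on exception
        f.write(text)

write_safe("hello")
with open(path) as f:
    assert f.read() == "hello"
```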
State Corruption
Symptoms: Inconsistent data, works sometimes but not always, cascade of errors after a specific action.
Check for:
- Mutation of shared objects
- Missing deep copies (aliased references)
- Partial updates (crash between related writes)
- Cache invalidation issues
- Global/singleton state modified by multiple code paths
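The "missing deep copies" item can be sketched with a shared-defaults dict (the config shape is illustrative):

```python
# Aliasing bug: dict(d) copies only the top level, so nested objects are
# shared; mutating the "copy" corrupts the original. copy.deepcopy fixes it.
import copy

defaults = {"retries": 3, "headers": {"Accept": "application/json"}}

shallow = dict(defaults)                 # nested "headers" dict is aliased
shallow["headers"]["Accept"] = "text/html"
assert defaults["headers"]["Accept"] == "text/html"   # defaults corrupted!

defaults["headers"]["Accept"] = "application/json"    # reset for the fix
deep = copy.deepcopy(defaults)                        # fully independent copy
deep["headers"]["Accept"] = "text/html"
assert defaults["headers"]["Accept"] == "application/json"
```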
Diagnostic Logging Strategy
Targeted Logging
Log at decision points and data boundaries:
```
[ENTRY] function_name called with: key_arg=value
[BRANCH] taking path X because condition=value
[DATA] received from external: summary_of_data
[EXIT] function_name returning: summary_of_result
```
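The pattern can be sketched with the stdlib `logging` module (the function, branch, and field names are illustrative):

```python
# ENTRY / BRANCH / EXIT diagnostic logging at decision points.
import logging

logging.basicConfig(level=logging.DEBUG, format="%(message)s")
log = logging.getLogger("diag")

def apply_discount(total, is_member):
    log.debug("[ENTRY] apply_discount called with: total=%s is_member=%s",
              total, is_member)
    if is_member:
        log.debug("[BRANCH] taking member path because is_member=True")
        result = total * 0.9
    else:
        log.debug("[BRANCH] taking non-member path because is_member=False")
        result = total
    log.debug("[EXIT] apply_discount returning: %s", result)
    return result

assert apply_discount(100, True) == 90.0
```

Using lazy `%s` formatting (rather than f-strings) keeps the log calls cheap when the debug level is disabled.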
Logging Anti-Patterns
| Anti-Pattern | Problem | Better Approach |
|---|---|---|
| Logging everything | Noise hides signal | Log at boundaries and decision points |
| Logging sensitive data | Security risk | Redact or hash sensitive fields |
| Logging inside tight loops | Performance impact, massive output | Log summary after loop, or sample every Nth iteration |
| Logging without context | "Error occurred" is useless | Include function name, key parameters, state |
| Leaving debug logs in code | Clutters production output | Use conditional debug level, remove before commit |
Effective Diagnostic Pattern
- Before the suspected area: Log inputs and state
- At decision points: Log which branch was taken and why
- After the suspected area: Log outputs and state
- Compare: Are the inputs/outputs what you expect at each point?
Investigation Checklist
Before proposing a fix, verify you can answer:
Understanding the Bug
- Can you reproduce the bug reliably?
- What is the expected behavior vs actual behavior?
- When did this start happening? (regression or latent bug?)
- Does it happen always, sometimes, or in specific conditions?
Root Cause Identification
- Have you identified the specific line(s) causing the issue?
- Do you understand WHY those lines produce the wrong result?
- Is this the root cause, or a symptom of a deeper issue?
- Could this same root cause affect other code paths?
Fix Validation
- Does the fix address the root cause, not just the symptom?
- Could the fix introduce new bugs? (side effects, changed behavior)
- Are there existing tests that should have caught this?
- Does the fix handle all edge cases for this code path?
Broader Impact
- Are there similar patterns elsewhere that might have the same bug?
- Does the fix require updates to documentation or configuration?
- Could this be a regression? If so, what change introduced it?
- Is a new test needed to prevent this from recurring?
Integration Notes
This skill requires the following capabilities:
- File read/write/search: Reading source files, searching for patterns, writing fixes
- Shell execution: Running tests, git commands, reproducing bugs
- User interaction: Confirming track selection, presenting findings, choosing next steps
- Sub-agent delegation: Launching parallel explorer agents (2-3) and investigator agents (1-3) for deep track