Agent-alchemy bug-killer

install

source · Clone the upstream repo

git clone https://github.com/sequenzia/agent-alchemy

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/sequenzia/agent-alchemy "$T" && mkdir -p ~/.claude/skills && cp -r "$T/ported/20260310/all/skills-nested/bug-killer" ~/.claude/skills/sequenzia-agent-alchemy-bug-killer-31edd4 && rm -rf "$T"

manifest: ported/20260310/all/skills-nested/bug-killer/SKILL.md

source content

Bug Killer -- Hypothesis-Driven Debugging Workflow

Execute a systematic debugging workflow that enforces investigation before fixes. Every bug gets a hypothesis journal, evidence gathering, and root cause confirmation before any code changes.

Phase Overview

Triage & Reproduction -- Understand, reproduce, route to quick or deep track
Investigation -- Gather evidence with language-specific techniques
Root Cause Analysis -- Confirm root cause through hypothesis testing
Fix & Verify -- Fix with proof, regression test, quality check
Wrap-up & Report -- Document trail, capture learnings

Phase 1: Triage & Reproduction

Goal: Understand the bug, reproduce it, and decide the investigation track.

1.1 Parse Context

Extract from `$ARGUMENTS` and conversation context:

Bug description: What's failing? Error messages, symptoms
Reproduction steps: How to trigger the bug (test command, user action, etc.)
Environment: Language, framework, test runner, relevant config
Prior attempts: Has the user already tried fixes? What didn't work?
Deep flag: If `--deep` is present, skip triage and go directly to deep track (jump to Phase 2 deep track)

1.2 Reproduce the Bug

Attempt to reproduce before investigating:

If a failing test was mentioned, run it:

# Run the specific test to confirm the failure
<test-runner> <test-file>::<test-name>

If an error was described, find and trigger it
If neither, search for related test files and run them

Capture the exact error output -- this is your primary evidence.

If the bug cannot be reproduced:

Ask the user for more context
Check if it's environment-specific or intermittent
Note "not yet reproduced" in the hypothesis journal
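The reproduction step can be sketched in Python as follows. This is a minimal illustration, not part of the skill: `ReproResult`, `reproduce`, and `failure_rate` are hypothetical names, and treating a non-zero exit code as "reproduced" is an assumption that holds for most test runners but not necessarily all.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ReproResult:
    """Outcome of one reproduction attempt (hypothetical helper type)."""
    reproduced: bool
    exit_code: int
    output: str


def reproduce(command: list, timeout_s: float = 300.0) -> ReproResult:
    """Run a test command once and keep its exact output as evidence."""
    proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    # Assumption: a non-zero exit code means the failure was reproduced.
    return ReproResult(
        reproduced=proc.returncode != 0,
        exit_code=proc.returncode,
        output=proc.stdout + proc.stderr,
    )


def failure_rate(command: list, runs: int = 5) -> float:
    """Repeat the command to spot intermittent failures (0.0 = never, 1.0 = always)."""
    failures = sum(reproduce(command).reproduced for _ in range(runs))
    return failures / runs


if __name__ == "__main__":
    # Stand-in for `<test-runner> <test-file>::<test-name>`: a child process that fails.
    failing = [sys.executable, "-c", "raise ValueError('amount must be in cents')"]
    result = reproduce(failing)
    print(result.reproduced, result.exit_code)  # True 1
    print(result.output.strip().splitlines()[-1])
```

A failure rate strictly between 0 and 1 suggests an intermittent bug, which is one of the deep-track signals in section 1.4.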

1.3 Form Initial Hypothesis

Based on the error message and context, form your first hypothesis:

### H1: [Title]
- Hypothesis: [What you think is causing the bug]
- Evidence for: [What supports this — error message, stack trace, etc.]
- Evidence against: [Anything that contradicts it — if none yet, say "None yet"]
- Test plan: [Specific steps to confirm or reject]
- Status: Pending

1.4 Route to Track

Quick-fix signals (ALL must be true):

Clear, specific error message pointing to exact location
Localized to 1-2 files (not spread across the codebase)
Obvious fix visible from reading the error location
No concurrency, timing, or state management involved

Deep-track signals (ANY one triggers deep track):

Bug spans 3+ files or modules
Root cause unclear from the error message alone
Intermittent or environment-dependent failure
Involves concurrency, timing, shared state, or async behavior
User already tried fixes that didn't work
Generic error message (e.g., "null reference" without clear origin)
Stack trace points to library/framework code rather than application code

Present your assessment to the user:

Summarize the bug and your initial hypothesis
Recommend quick or deep track with justification
Options: "Quick track (Recommended)" / "Deep track" or vice versa depending on your assessment
Let the user override your recommendation

Track escalation rule: If 2 hypotheses are rejected during quick-track execution, automatically escalate to the deep track. Preserve all hypothesis journal entries when escalating.
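The routing and escalation rules above could be expressed roughly as the sketch below. `TriageSignals`, `recommend_track`, and `should_escalate` are hypothetical names introduced for illustration. The source does not say what to do when the quick-fix signals are incomplete but no deep-track signal fires, so defaulting to the deep track in that case is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TriageSignals:
    """Hypothetical container for the signals listed above."""
    clear_error_location: bool = False
    files_involved: int = 1
    obvious_fix: bool = False
    concurrency_or_state: bool = False
    root_cause_unclear: bool = False
    intermittent: bool = False
    prior_fixes_failed: bool = False
    generic_error: bool = False
    trace_in_library_code: bool = False
    deep_flag: bool = False


def recommend_track(s: TriageSignals) -> str:
    """Return "quick" or "deep" as a recommendation the user may override."""
    any_deep_signal = (
        s.deep_flag
        or s.files_involved >= 3
        or s.root_cause_unclear
        or s.intermittent
        or s.concurrency_or_state
        or s.prior_fixes_failed
        or s.generic_error
        or s.trace_in_library_code
    )
    all_quick_signals = (
        s.clear_error_location
        and s.files_involved <= 2
        and s.obvious_fix
        and not s.concurrency_or_state
    )
    # Assumption: when the signals are mixed or incomplete, prefer the deep track.
    return "quick" if all_quick_signals and not any_deep_signal else "deep"


def should_escalate(track: str, rejected_hypotheses: int) -> bool:
    """Quick track escalates automatically once 2 hypotheses are rejected."""
    return track == "quick" and rejected_hypotheses >= 2
```

The result is only a recommendation; the workflow still presents it to the user, who can choose the other track.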

Phase 2: Investigation

Goal: Gather evidence systematically, guided by language-specific techniques.

2.1 Load Language Reference

Detect the primary language of the bug's context and load the appropriate reference:

| Language | Reference File |
| --- | --- |
| Python | Load `references/python-debugging.md` from this skill |
| TypeScript / JavaScript | Load `references/typescript-debugging.md` from this skill |
| Other / Multiple | Use the general debugging techniques below |

When using a language-specific reference, always apply the general debugging techniques as a supplement.

General Debugging Techniques

Systematic Methods:

Binary Search for Bugs -- Narrow the problem space by half at each step: identify the full code path, place a diagnostic check at the midpoint, determine which half contains the bug, repeat
Git Bisect -- Automate binary search through commit history when you know "it used to work"
Delta Debugging -- Minimize the input that triggers the bug by progressively removing halves
5 Whys -- Drill past symptoms to root causes by asking "why?" iteratively until you reach something directly fixable
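As an illustration of delta debugging, the sketch below shrinks a failing input by removing progressively smaller chunks. It is a simplified variant rather than the full ddmin algorithm, so the result is small but not guaranteed to be minimal. `minimize` and the `fails` predicate are hypothetical names.

```python
from typing import Callable, Sequence


def minimize(items: Sequence, fails: Callable[[list], bool]) -> list:
    """Shrink `items` while `fails` keeps returning True (simplified delta debugging)."""
    current = list(items)
    if not fails(current):
        raise ValueError("the full input does not trigger the bug")
    chunk = len(current) // 2
    while chunk >= 1:
        start = 0
        while start < len(current):
            # Try the input with one chunk removed.
            candidate = current[:start] + current[start + chunk:]
            if candidate and fails(candidate):
                current = candidate  # chunk was irrelevant: drop it, retry same position
            else:
                start += chunk       # chunk is needed: keep it and move on
        chunk //= 2
    return current


if __name__ == "__main__":
    # Hypothetical bug: the failure needs both 3 and 7 to be present.
    def triggers_bug(xs: list) -> bool:
        return 3 in xs and 7 in xs

    print(minimize(range(10), triggers_bug))  # [3, 7]
```

In practice `fails` would run the reproduction command against the candidate input, so each call can be slow; halving the chunk size keeps the number of calls manageable.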

Reading Stack Traces:

| Element | What It Tells You |
| --- | --- |
| Error type/name | Category of failure (null access, type mismatch, etc.) |
| Error message | Specific details about what went wrong |
| File path + line number | Where the error was thrown |
| Function/method name | What was executing when it failed |
| Frame ordering | The call chain that led to the error |

What stack traces cannot tell you:

Why the wrong value got there (trace backwards)
When the state became corrupted (may have happened earlier)
Where in async code the real problem is (async gaps)

Bug Categories:

Off-by-One -- Check `<` vs `<=`, 0-based vs 1-based, inclusive vs exclusive ranges
Null/Undefined/None -- Uninitialized variables, missing return values, optional fields without guards
Race Conditions -- Shared mutable state, missing locks, read-then-write without atomicity
Resource Leaks -- File handles not closed, connections not returned, event listeners not removed
State Corruption -- Mutation of shared objects, missing deep copies, partial updates
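To make one of these categories concrete, here is a small sketch of state corruption caused by a missing deep copy. The function names and the config shape are hypothetical and exist only to show the pattern.

```python
import copy


def with_tag_shallow(defaults: dict, tag: str) -> dict:
    """Buggy: dict() copies only the top level, so the nested list is shared."""
    cfg = dict(defaults)
    cfg["tags"].append(tag)  # mutates defaults["tags"] as well
    return cfg


def with_tag_deep(defaults: dict, tag: str) -> dict:
    """Fixed: deepcopy gives the new config its own nested list."""
    cfg = copy.deepcopy(defaults)
    cfg["tags"].append(tag)
    return cfg


if __name__ == "__main__":
    shared = {"retries": 3, "tags": ["base"]}
    with_tag_shallow(shared, "debug")
    print(shared["tags"])  # ['base', 'debug']  <- the defaults were corrupted

    safe = {"retries": 3, "tags": ["base"]}
    with_tag_deep(safe, "debug")
    print(safe["tags"])    # ['base']
```

Bugs of this kind often surface far from the mutation site, which is why a stack trace alone rarely explains them.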

Diagnostic Logging Strategy: Log at decision points and data boundaries:

[ENTRY] function_name called with: key_arg=value
[BRANCH] taking path X because condition=value
[DATA] received from external: summary_of_data
[EXIT] function_name returning: summary_of_result
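In Python, the tagged log lines above might look like the following sketch, which uses the standard `logging` module. `apply_discount`, the `SAVE10` coupon, and the logger name are hypothetical. The `[DATA]` tag is omitted because this example reads no external data.

```python
import logging
from typing import Optional

logging.basicConfig(level=logging.DEBUG, format="%(message)s")
log = logging.getLogger("diagnostics")  # hypothetical logger name


def apply_discount(total_cents: int, coupon: Optional[str]) -> int:
    """Hypothetical function instrumented at its entry, branch, and exit points."""
    log.debug("[ENTRY] apply_discount called with: total_cents=%s coupon=%r",
              total_cents, coupon)
    if coupon == "SAVE10":
        log.debug("[BRANCH] taking discount path because coupon=%r", coupon)
        result = total_cents - total_cents // 10
    else:
        log.debug("[BRANCH] taking full-price path because coupon=%r", coupon)
        result = total_cents
    log.debug("[EXIT] apply_discount returning: %s", result)
    return result


if __name__ == "__main__":
    apply_discount(1000, "SAVE10")
    apply_discount(1000, None)
```

Passing values as logging arguments instead of pre-formatting the string means the message is only built when the DEBUG level is enabled, so the instrumentation stays cheap once the level is raised.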

Investigation Checklist: Before proposing a fix, verify you can answer:

Can you reproduce the bug reliably?
What is the expected vs actual behavior?
Have you identified the specific line(s) causing the issue?
Do you understand WHY those lines produce the wrong result?
Is this the root cause, or a symptom of a deeper issue?
Could this same root cause affect other code paths?

2.2 Quick Track Investigation

For quick-track bugs, investigate directly:

Read the error location -- the file and function where the error occurs
Read the immediate callers -- 1-2 files up the call chain
Check recent changes -- `git log --oneline -5 -- <file>` for the affected files
Update hypothesis -- does the evidence support H1? Add evidence for/against

Proceed to Phase 3 (quick track).

2.3 Deep Track Investigation

For deep-track bugs, use parallel exploration agents:

Plan exploration areas -- identify 2-3 focus areas based on the bug:
- Focus 1: The error site and immediate code path
- Focus 2: Data flow and state management leading to the error
- Focus 3: Related subsystems, configuration, or external dependencies
Launch exploration agents:

Spawn 2-3 read-only exploration agents (refer to the code-explorer agent from the core-tools package):

Each agent receives:
- Bug context: description of the bug and error
- Focus area: specific area for that agent
- Instructions to find all relevant files, trace execution/data paths, identify where behavior diverges from expected, note suspicious patterns or recent changes, and report structured findings
Launch agents in parallel for independent focus areas.
Synthesize exploration results:
- Collect findings from all agents
- Identify convergence (multiple agents pointing to same area)
- Update hypothesis journal with new evidence
- Form additional hypotheses if evidence warrants (aim for 2-3 total)

Proceed to Phase 3 (deep track).

Phase 3: Root Cause Analysis

Goal: Confirm the root cause through systematic hypothesis testing.

3.1 Quick Track Root Cause

For quick-track bugs:

Verify the hypothesis:
- Read the specific code identified in Phase 2
- Trace the logic step-by-step
- Confirm that the hypothesized cause produces the observed error
If confirmed (Status -> Confirmed):
- Update H1 with confirming evidence
- Proceed to Phase 4
If rejected (Status -> Rejected):
- Update H1 with evidence against and reason for rejection
- Form a new hypothesis (H2) based on what you learned
- Investigate H2 following Phase 2 quick track steps
- If H2 is also rejected -> escalate to deep track
- Preserve all journal entries, continue with Phase 2 deep track

3.2 Deep Track Root Cause

For deep-track bugs:

Prepare hypotheses for testing:
- You should have 2-3 hypotheses from Phase 2
- Each needs a concrete test plan (how to confirm or reject)
Launch bug-investigator agents:

Spawn 1-3 bug-investigator agents to test hypotheses in parallel:

Each agent receives:
- Bug context: description of the bug and error
- Hypothesis to test: specific hypothesis
- Test plan with concrete steps (e.g., run a specific test, check git blame, trace data from input to error site)
- Instructions to report findings with verdict (confirmed/rejected/inconclusive), evidence, and recommendations
Launch agents in parallel when they test independent hypotheses.

Evaluate results:

Update hypothesis journal with each agent's findings
If one hypothesis is confirmed -> proceed to Phase 4

If all are rejected/inconclusive -> apply 5 Whys technique:

Take the strongest "inconclusive" finding and ask "why?" iteratively:

Observed: [what actually happens]
Why? -> [first-level cause]
Why? -> [second-level cause]
Why? -> [root cause]

Form new hypotheses from 5 Whys analysis and repeat investigation

If stuck after 2 rounds of investigation:
- Present all findings to the user
- Share the hypothesis journal
- Ask for additional context or direction
- Options: "Continue investigating", "Try a different angle", "Provide more context"

Phase 4: Fix & Verify

Goal: Fix the root cause and prove the fix works.

4.1 Design the Fix

Before writing any code:

Explain the root cause -- state clearly what's wrong and why
Explain the fix -- describe what will change and WHY it addresses the root cause
Identify affected files -- list every file that needs modification
Consider side effects -- could this fix break other behavior?

4.2 Implement the Fix

Read all files that will be modified before making changes
Apply the fix -- minimal, focused changes
Match existing patterns -- follow the codebase's conventions

4.3 Run Tests

Run the originally failing test -- it should now pass:

<test-runner> <test-file>::<test-name>

Run related tests -- tests in the same file and nearby test files:

<test-runner> <test-directory>

If tests fail:
- Determine if the failure is related to the fix or pre-existing
- If related, revise the fix (do NOT switch to a different approach without updating the hypothesis journal)
- If pre-existing, note it but don't let it block the fix

4.4 Write Regression Test

Write a test that would have caught this bug:

The test should fail WITHOUT the fix (verifying it tests the right thing)
The test should pass WITH the fix
The test should be minimal -- test the specific behavior that was broken
Place it in the appropriate test file following project conventions

4.5 Deep Track: Quality Check and Related Issues

Deep track only -- skip on quick track.

Review the fix against code quality principles: Refer to the code-quality skill for review criteria.
Check for related issues:
- Search for the same pattern elsewhere in the codebase
- If the same bug exists in other locations, report them to the user
- Prompt the user: "Fix all related instances now?" / "Fix only the reported bug" / "Create tasks for related fixes"

Phase 5: Wrap-up & Report

Goal: Document the investigation trail and capture learnings.

5.1 Bug Fix Summary

Present to the user:

## Bug Fix Summary

### Bug
[One-line description of the bug]

### Root Cause
[What was actually wrong and why]

### Fix Applied
[What was changed, with file:line references]

### Tests
- [Originally failing test]: Now passing
- [Regression test added]: [test name and location]
- [Related tests]: All passing

### Track
[Quick / Deep] [Escalated from quick: Yes/No]

5.2 Hypothesis Journal Recap

Present the complete hypothesis journal showing the investigation trail:

### Investigation Trail

#### H1: [Title]
- Status: Confirmed / Rejected
- [Key evidence summary]

#### H2: [Title] (if applicable)
- Status: Confirmed / Rejected
- [Key evidence summary]

[... additional hypotheses ...]

5.3 Project Learnings

Refer to the project-learnings skill to evaluate whether this bug reveals project-specific knowledge worth capturing.

Follow its workflow to evaluate the finding. Common debugging discoveries that qualify:

Surprising API behavior specific to this project
Undocumented conventions that caused the bug
Architectural constraints that aren't obvious from the code

5.4 Deep Track: Future Recommendations

Deep track only:

If the investigation revealed broader concerns, present recommendations:

Architecture improvements to prevent similar bugs
Missing test coverage areas
Documentation gaps
Monitoring or alerting suggestions

5.5 Next Steps

Prompt the user with options:

"Commit the fix" -- proceed to commit workflow
"Review the changes" -- show a diff of all modifications
"Run full test suite" -- run the complete test suite to verify no regressions
"Done" -- wrap up the session

Hypothesis Journal

The hypothesis journal is the core artifact of this workflow. Maintain it throughout all phases.

Format

## Hypothesis Journal -- [Bug Title]

### H1: [Descriptive Title]
- **Hypothesis:** [What's causing the bug — be specific]
- **Evidence for:** [Supporting observations with file:line references]
- **Evidence against:** [Contradicting observations]
- **Test plan:** [Concrete steps to confirm or reject]
- **Status:** Pending / Confirmed / Rejected
- **Notes:** [Additional context, timestamps, agent findings]

### H2: [Descriptive Title]
[Same format]

Rules

Minimum hypotheses: 1 on quick track; at least 2 (aim for 2-3) on deep track
Never delete entries -- rejected hypotheses are valuable context
Update incrementally -- add evidence as you find it, don't wait
Be specific -- "the data is wrong" is not a hypothesis; "processOrder receives dollars but expects cents" is
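If the journal is kept programmatically rather than as free text, its format and rules could be modelled as in this sketch. `Status`, `Hypothesis`, and `Journal` are hypothetical names. The journal is append-only to reflect the "never delete entries" rule, and filling empty evidence fields with "None yet" follows the wording of the H1 template in section 1.3.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "Pending"
    CONFIRMED = "Confirmed"
    REJECTED = "Rejected"


@dataclass
class Hypothesis:
    title: str
    hypothesis: str
    test_plan: str
    evidence_for: list = field(default_factory=list)
    evidence_against: list = field(default_factory=list)
    status: Status = Status.PENDING
    notes: str = ""


@dataclass
class Journal:
    bug_title: str
    entries: list = field(default_factory=list)

    def add(self, h: Hypothesis) -> str:
        """Append an entry and return its label; there is deliberately no remove()."""
        self.entries.append(h)
        return f"H{len(self.entries)}"

    def rejected_count(self) -> int:
        return sum(1 for h in self.entries if h.status is Status.REJECTED)

    def render(self) -> str:
        """Render the journal in the markdown format shown above."""
        lines = [f"## Hypothesis Journal -- {self.bug_title}", ""]
        for number, h in enumerate(self.entries, start=1):
            lines += [
                f"### H{number}: {h.title}",
                f"- **Hypothesis:** {h.hypothesis}",
                f"- **Evidence for:** {'; '.join(h.evidence_for) or 'None yet'}",
                f"- **Evidence against:** {'; '.join(h.evidence_against) or 'None yet'}",
                f"- **Test plan:** {h.test_plan}",
                f"- **Status:** {h.status.value}",
                f"- **Notes:** {h.notes}",
                "",
            ]
        return "\n".join(lines)
```

`rejected_count()` gives the number needed for the quick-track escalation rule, and because entries are only ever appended, the rendered journal always shows the full investigation trail.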

Track Reference

| Aspect | Quick Track | Deep Track |
| --- | --- | --- |
| Investigation | Read error location + 1-2 callers | 2-3 exploration agents in parallel |
| Hypotheses | Minimum 1 | At least 2 (aim for 2-3) |
| Root cause testing | Manual verification | 1-3 bug-investigator agents in parallel |
| Fix validation | Run failing + related tests | Tests + code-quality review + related issue scan |
| Auto-escalation | After 2 rejected hypotheses | N/A |
| Typical complexity | Off-by-one, typo, wrong argument, missing null check | Race condition, state corruption, multi-file logic error |

Agent Coordination

Exploration Agents (Phase 2, deep track)

These are read-only agents that explore codebase areas. Refer to the code-explorer agent (from the core-tools package) for this role. Give each a distinct focus area related to the bug. They report structured findings.

Bug Investigators (Phase 3, deep track)

Bug-investigator agents have shell access for running tests and git commands, but no file-write access. They investigate and report evidence; they do not fix code. Give each a specific hypothesis to test.

Error Handling

If an agent fails, continue with remaining agents' results
If all agents fail in a phase, fall back to manual investigation
Never block on a single agent -- partial results are better than no results
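These rules amount to collecting whatever results arrive and tolerating individual failures. The sketch below shows one way to do that with Python's `concurrent.futures`. It assumes each agent can be modelled as a plain callable that takes a focus area and returns findings, which is a stand-in for whatever sub-task mechanism the platform actually provides. `run_agents` and `next_step` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def run_agents(focus_areas: list, agent: Callable[[str], str]) -> tuple:
    """Run one agent per focus area in parallel; return (findings, failures)."""
    findings: dict = {}
    failures: dict = {}
    with ThreadPoolExecutor(max_workers=max(1, len(focus_areas))) as pool:
        futures = {pool.submit(agent, area): area for area in focus_areas}
        for future in as_completed(futures):
            area = futures[future]
            try:
                findings[area] = future.result()
            except Exception as exc:
                # One failed agent must not block the others: record it and continue.
                failures[area] = repr(exc)
    return findings, failures


def next_step(findings: dict) -> str:
    """Fall back to manual investigation only when every agent failed."""
    return "synthesize" if findings else "manual investigation"
```

A per-agent timeout could be added through `future.result(timeout=...)`, but the right limit depends on the platform, so it is left out here.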

Error Recovery

If any phase fails:

Explain what went wrong and what you've learned so far
Present the hypothesis journal as-is
Ask the user how to proceed:
- "Retry this phase"
- "Skip to fix" (if you have enough evidence)
- "Provide more context"
- "Abort"

Integration Notes

What this component does: Provides a systematic, hypothesis-driven debugging workflow with triage-based routing (quick vs deep track), parallel agent investigation, regression testing, and project learning capture.

Capabilities needed: Shell execution (test runners, git commands), file reading/writing/editing, pattern search, sub-agent spawning (code-explorer from core-tools, bug-investigator), user interaction.

Adaptation guidance: The quick track is a single-agent workflow; the deep track requires spawning parallel sub-agents for exploration and hypothesis testing. Adapt agent spawning to your platform's sub-task mechanism. Language-specific debugging references are in `references/`.

Sub-agent capabilities: Code-explorer agents (from core-tools) need read-only file access and search. Bug-investigator agents need read-only file access, search, and shell execution (for running tests and git commands) but no file-write access.