Claude-swe-workflows bug-hunt
Proactive bug-hunting workflow. Assesses codebase risk through complexity, coverage, and structural analysis, then spawns focused investigators that write reproducing tests to validate suspected bugs. Thoroughness over speed.
git clone https://github.com/chrisallenlane/claude-swe-workflows
T=$(mktemp -d) && git clone --depth=1 https://github.com/chrisallenlane/claude-swe-workflows "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bug-hunt" ~/.claude/skills/chrisallenlane-claude-swe-workflows-bug-hunt && rm -rf "$T"
skills/bug-hunt/SKILL.md

Bug Hunt — Proactive Bug Discovery
Systematically hunts for bugs before they reach users. An assessor analyzes the codebase to identify high-risk hotspots by cross-referencing code complexity, test coverage gaps, and structural risk factors. Focused hunters then deep-dive into each hotspot, writing reproducing tests to validate or invalidate suspected bugs.
This is deliberately thorough. Each suspected bug gets a reproducing test — no speculative reports. The goal is confirmed findings with evidence, not a noisy list of maybes.
Workflow Overview
┌──────────────────────────────────────────────────────┐
│                  BUG HUNT WORKFLOW                   │
├──────────────────────────────────────────────────────┤
│ 1. Determine scope                                   │
│ 2. Spawn assessor (risk analysis)                    │
│    └─ Output: ranked hotspot list + coverage map     │
│ 3. For each hotspot:                                 │
│    └─ Spawn hunter (investigation + repro tests)     │
│    └─ Prior findings passed to subsequent hunters    │
│ 4. Synthesize findings                               │
│ 5. Present consolidated findings to user             │
│ 6. Optionally route findings to fixers               │
│ 7. Optionally commit reproducing tests               │
└──────────────────────────────────────────────────────┘
Workflow Details
1. Determine Scope
Default: Production code only. Excluded by default:
- Test code (test files, test fixtures, test helpers)
- Dev-only dependencies and tooling
- Generated code, vendored code
Inform the user of these exclusions.
Ask the user:
- "What is the scope of the hunt?" (entire codebase, specific module, specific area of concern)
- "Are there areas you're particularly worried about?" (recent changes, complex features, etc.)
- "Anything to skip beyond the defaults?"
User concerns influence prioritization but don't replace systematic analysis.
2. Risk Assessment
Spawn a swe-bug-assessor agent:

You are the risk assessor for a proactive bug hunt. Your analysis will guide focused investigators who will deep-dive into the hotspots you identify.

Scope: [entire codebase | user-specified scope]
User concerns: [any areas mentioned, or "none specified"]
Exclusions: [test code, vendored code, generated code, plus any user additions]

Perform your full methodology:
1. Map the codebase — language, framework, structure, entry points
2. Coverage analysis — use instrumented coverage if available, fall back to manual inspection
3. Complexity analysis — identify functions with high cognitive complexity
4. Structural risk analysis — error handling gaps, input validation gaps, shared mutable state, resource management issues, concurrency risks, edge case blindness, consistency gaps
5. Git enrichment (optional) — churn hotspots, recent large changes
6. Cross-reference signals and produce a ranked hotspot list

Focus on hotspots where MULTIPLE signals converge — complex AND untested AND structurally risky. Single-signal hotspots are lower priority.

Output your full assessment in your standard format.
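Steps 2 and 3 of that methodology can usually lean on the project's toolchain. A minimal sketch for a Go codebase: go test's cover mode is standard, while gocyclo (github.com/fzipp/gocyclo) is a third-party tool assumed here for the complexity signal. Substitute your stack's equivalents.

go test -coverprofile=coverage.out ./...                    # instrumented line coverage
go tool cover -func=coverage.out | sort -k3 -n | head -20   # least-covered functions first
gocyclo -over 15 .                                          # functions with cyclomatic complexity above 15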
When the assessor reports back, review the hotspot list; it drives the investigation phase.
3. Focused Investigation — Hunters
For each hotspot in the assessor's list (ALL priorities), spawn a dedicated swe-bug-hunter agent:

You are a focused bug hunter investigating a specific hotspot.

## YOUR HOTSPOT
Target: [from assessor's report]
Files: [from assessor's report]
Risk signals: [from assessor's report]
Hypothesis: [from assessor's report]
Investigation approach: [from assessor's report]

## PRIOR FINDINGS (if any)
[Findings from previous hunters — confirmed bugs, patterns observed]

## YOUR MISSION
Deep-dive into this hotspot. Systematically probe for bugs. For each suspected issue, write a reproducing test that encodes the correct expected behavior.
- If the test FAILS: bug confirmed. Keep the test. Document the finding.
- If the test PASSES: hypothesis invalidated. Evaluate whether the test improves coverage:
  - Covers a previously untested path → keep it
  - Redundant with existing tests → delete it

Every confirmed finding must have a reproducing test. No speculative reports.

Note any patterns that might apply to other hotspots.
Run hunters sequentially, not in parallel. Each hunter's findings and pattern observations are passed to the next. This enables cross-hotspot pattern detection — if hunter 2 finds that error handling is broken in module A, hunter 5 (investigating module B which shares error-handling utilities) gets that context.
Pass prior findings to each new hunter. As findings accumulate, each subsequent hunter receives confirmed bugs and observed patterns from previous investigations.
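To make the reproducing-test discipline concrete, here is a minimal sketch in Go, modeled on the example session below. The payment package and the ConvertAmount signature are assumptions for illustration; the point is that the test encodes the correct expected behavior, so a failure confirms the bug.

// Hypothetical reproducing test for the JPY precision hypothesis.
// Assumes ConvertAmount(amount int64, from, to string) (int64, error)
// operating on minor units; the real signature may differ.
package payment

import "testing"

func TestConvertAmount_ZeroCurrencyPrecision(t *testing.T) {
	// JPY has no minor units: an identity conversion of ¥101 must
	// return exactly 101, with no value lost to intermediate scaling.
	got, err := ConvertAmount(101, "JPY", "JPY")
	if err != nil {
		t.Fatalf("ConvertAmount(101, JPY, JPY) returned error: %v", err)
	}
	if got != 101 {
		t.Errorf("ConvertAmount(101, JPY, JPY) = %d; want 101", got)
	}
}

If this test fails against the current code, the finding is confirmed and the test is kept; if it passes, the hunter keeps it only when it covers a previously untested path.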
4. Synthesize Findings
After all hunters have reported, synthesize:
Cross-cutting analysis:
- Do confirmed bugs share a common root cause or pattern?
- Are there systemic issues (e.g., a utility function used across 10 modules is buggy, but only one module was a hotspot)?
- Do coverage improvements from invalidated hypotheses reveal areas worth further investigation?
Pattern escalation:
- If multiple hunters report the same pattern (e.g., "error handling is inconsistent"), note this as a systemic issue even if individual instances are low severity
- Systemic patterns may warrant additional investigation or a follow-up /refactor
5. Present Consolidated Findings
Compile all findings into a single report:
## Bug Hunt Summary

Scope: [what was analyzed]
Assessment: [N hotspots identified across X files]
Hotspots investigated: [N]
Confirmed bugs: N (X critical, Y high, Z medium, W low)
Coverage improvements: N tests added
Systemic patterns: N

## CONFIRMED BUGS

### CRITICAL
- **[file:line — description]**
  - Bug: [concrete description]
  - Root cause: [why it exists]
  - Impact: [what happens in practice]
  - Reproducing test: [test file:test name]
  - Fix guidance: [what needs to change]

### HIGH
[same format]

### MEDIUM
[same format]

### LOW
[same format]

## SYSTEMIC PATTERNS
[Cross-cutting issues observed across multiple hotspots]
- [pattern] — observed in [locations] — suggests [recommendation]

## COVERAGE IMPROVEMENTS
[Tests added that didn't find bugs but improved coverage]
- [test name] in [file] — covers [what]

## SUSPECTED BUT UNCONFIRMED
[Issues suspected but not validated with tests — lower confidence]
- [description] — couldn't test because [reason]

## AREAS NOT INVESTIGATED
[Hotspots deprioritized or areas outside scope that may warrant future attention]
Present to user interactively. Walk through CRITICAL findings first. For each, explain the bug, the impact, and show the reproducing test. Let the user ask questions before moving on.
6. Route to Fixers (Optional)
After presenting findings, ask: "Would you like to route confirmed bugs to agents for fixing?"
If yes:
- For each confirmed bug, determine the appropriate fixer:
  - Detect the project language and spawn the appropriate SME agent
  - Pass the bug description, root cause, reproducing test, and fix guidance
- The reproducing test serves as the acceptance criterion — when it passes, the bug is fixed (see the verification sketch below)
- After each fix, spawn qa-engineer to verify:
  - The reproducing test now passes
  - No other tests broke
- Commit each fix atomically
If no: The report and reproducing tests stand on their own.
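A minimal sketch of that verification, assuming a Go project and the hypothetical ConvertAmount test from the hunter sketch above:

go test ./payment/ -run TestConvertAmount_ZeroCurrencyPrecision   # acceptance criterion: must now pass
go test ./...                                                     # regression check: nothing else may break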
7. Commit Reproducing Tests (Optional)
If the user does not route to fixers (or after fixes are complete), ask: "Would you like to commit the reproducing tests? They document the bugs and improve coverage."
If yes:
- Commit all reproducing tests (both bug-confirming and coverage-improving) in a single commit
- Use a descriptive commit message referencing the bug hunt (see the sketch below)
If no: Leave tests uncommitted for the user to handle.
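If committing, a sketch of what that might look like; the file names and message are illustrative, not prescribed by the skill:

git add payment/converter_test.go payment/checkout_test.go
git commit -m "test: add reproducing tests from payment-module bug hunt"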
Agent Coordination
Sequential execution within investigation phase. The assessor runs first, then hunters run sequentially so findings accumulate for pattern detection.
Fresh instances for every agent. Each agent gets a clean context window dedicated entirely to its task.
State to maintain (as orchestrator) — see the struct sketch after this list:
- Assessor's hotspot list and coverage landscape
- Each hunter's findings (accumulating — passed to subsequent hunters)
- Running totals for the summary
- List of all test files created (for commit step)
- Current hotspot count (for progress tracking)
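A hedged sketch of that state as a Go struct; the type and field names are illustrative, not part of the skill:

// Orchestrator state accumulated across a bug hunt.
type Hotspot struct{ Target, Hypothesis, Priority string }
type Finding struct{ Severity, Description, ReproTest string }

type HuntState struct {
	Hotspots     []Hotspot      // assessor's ranked list + coverage landscape
	Findings     []Finding      // accumulating; passed to each subsequent hunter
	TestFiles    []string       // all test files created, for the commit step
	Totals       map[string]int // running counts by severity, for the summary
	Investigated int            // hotspots completed, for progress tracking
}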
Abort Conditions
Abort investigation of a hotspot:
- Hunter reports hotspot is clean after thorough investigation (expected and fine — skip to next)
Abort entire workflow:
- User interrupts
- Assessor finds no significant hotspots (positive outcome — report clean assessment)
- Critical system error
Do NOT abort for:
- Individual clean hotspots (continue to next)
- Test infrastructure issues on a single hotspot (report as unconfirmed, continue)
- Low-confidence findings (include in report as SUSPECTED BUT UNCONFIRMED)
Integration with Other Skills
Relationship to /audit-security:
- /audit-security is security-focused — blue team + red team methodology
- /bug-hunt targets correctness bugs — logic errors, edge cases, missing error handling
- Both can find overlapping issues, but with different lenses. /audit-security asks "can an attacker exploit this?" while /bug-hunt asks "will this fail for a normal user?"
- Run both for comprehensive pre-release assurance

Relationship to /bug-fix:
- /bug-fix is reactive — fixes a known, reported bug
- /bug-hunt is proactive — finds bugs before they're reported
- Bug hunt findings can feed into /bug-fix for thorough remediation of complex issues

Relationship to /review-test:
- /review-test focuses on test quality — coverage gaps, brittle tests, missing fuzz tests
- /bug-hunt uses coverage data as one input signal but focuses on finding actual bugs, not improving test quality
- The coverage improvements from /bug-hunt are a side effect, not the primary goal

Relationship to /refactor:
- Systemic patterns identified by /bug-hunt (e.g., "inconsistent error handling across 15 modules") may warrant a follow-up /refactor
- /bug-hunt identifies the pattern; /refactor fixes it systematically
Example Session
> /bug-hunt

What is the scope of the hunt?
> Focus on the payment processing module — we've had some edge case reports

Anything you're particularly worried about?
> Currency conversion and rounding — we support 30+ currencies now

Anything to skip beyond the defaults?
> No, defaults are fine

Starting proactive bug hunt...

[Phase 1 — Risk Assessment]
Spawning assessor...

Assessment report:
  Coverage: 67% line coverage (instrumented via go test -cover)
  Hotspots identified: 8 (3 critical, 3 high, 2 medium)

  CRITICAL-1: payment/converter.go:ConvertAmount (lines 45-112)
    Signals: 0% test coverage + deep nesting (6 levels) + floating-point arithmetic
    Hypothesis: Currency conversion may lose precision or handle edge currencies incorrectly

  CRITICAL-2: payment/checkout.go:ProcessCheckout (lines 23-89)
    Signals: Partial coverage (happy path only) + error handling inconsistency + 3 bug-fix commits in last month
    Hypothesis: Error paths may leave order in inconsistent state

  CRITICAL-3: payment/refund.go:CalculateRefund (lines 15-78)
    Signals: No test coverage + complex conditional logic + shared mutable state (order object)
    Hypothesis: Partial refund calculations may be incorrect for multi-item orders

  HIGH-1: payment/currency/rates.go:FetchRates (lines 30-67)
    Signals: No error path tests + external API dependency + no timeout handling
  ...

[Phase 2 — Focused Investigation]

Spawning hunter for CRITICAL-1 (ConvertAmount)...

  Test 1: TestConvertAmount_ZeroCurrencyPrecision — FAIL
    Bug confirmed: JPY (0-decimal currency) conversion multiplies by 100 then divides by 100, losing the original integer value for odd amounts.
    Impact: ¥101 → ¥100 (1 yen lost per odd-amount transaction)

  Test 2: TestConvertAmount_SameCurrency — PASS
    Kept: covers previously untested identity conversion path

  Test 3: TestConvertAmount_NegativeAmount — FAIL
    Bug confirmed: Negative amounts (credits/adjustments) bypass validation and produce positive conversion results due to Abs() call without sign restoration.
    Impact: -$10.00 credit → +€8.50 charge

  Test 4: TestConvertAmount_UnknownCurrency — PASS
    Kept: covers error path for unsupported currency codes

  Findings: 2 confirmed bugs, 2 coverage improvements

Spawning hunter for CRITICAL-2 (ProcessCheckout)...

  Test 1: TestProcessCheckout_PaymentFailureCleanup — FAIL
    Bug confirmed: When payment gateway returns error after inventory was reserved, inventory reservation is not released. Order stuck in "processing" state.
    Impact: Phantom inventory holds that never clear (requires manual DB fix)

  Pattern noted: cleanup-on-error-path is missing in 3 other functions in this package (passed to next hunter)
  ...

[Phase 3 — Synthesis]

Confirmed bugs: 7 (3 critical, 3 high, 1 medium)
Coverage improvements: 9 tests added
Systemic pattern: Error-path cleanup is missing in 5 of 12 functions that reserve resources — this is a codebase-wide pattern, not isolated.

## Bug Hunt Summary
[Full report...]

Would you like to route confirmed bugs to agents for fixing?
> Yes, fix the criticals

[Routing CRITICAL bugs to Go SME...]
[Reproducing tests serve as acceptance criteria — fix is done when they pass]