git clone https://github.com/not-matthias/dotfiles-nix
T=$(mktemp -d) && git clone --depth=1 https://github.com/not-matthias/dotfiles-nix "$T" && mkdir -p ~/.claude/skills && cp -r "$T/modules/home/programs/cli-agents/shared/skills/bug-finder" ~/.claude/skills/not-matthias-dotfiles-nix-bug-finder && rm -rf "$T"
modules/home/programs/cli-agents/shared/skills/bug-finder/SKILL.mdBug Finder
Find bugs. Verify correctness. Ignore style.
Iron law: never report a bug without a concrete trigger, a real failure mode, and a severe consequence.
Hard cap: 15 findings max per review. If you find more, keep only the highest severity. Inflation erodes trust.
When to Use
- User asks to review a PR, diff, or specific code
- User asks to check code for bugs or correctness
- Before merging or shipping a change
What to Look For
Hunt these (ordered by severity):
- Correctness — Logic errors, off-by-ones, wrong comparisons, unreachable code, broken invariants, incorrect state transitions
- Security — Injection, auth bypass, data exposure, unsafe deserialization, missing input validation at system boundaries, secret leakage
- Concurrency — Race conditions, deadlocks, missing synchronization, shared mutable state without protection
- Error handling — Silent failures, swallowed exceptions, missing error propagation, unhandled edge cases that crash at runtime
- Resource management — Leaks (memory, file handles, connections), missing cleanup, unbounded growth
- API misuse — Wrong function for the job, violated contracts, deprecated calls with known bugs, incorrect argument ordering
- Performance — Only when it causes real problems: algorithmic blowup, N+1 queries, hot-path allocations, unbounded retries
Ignore these (they belong to linters and minimal-diff):
- Style, formatting, whitespace
- Naming preferences
- Code structure opinions
- Missing comments or docs
- Import ordering
- "I would have done it differently" without a concrete failure scenario
Workflow
Step 1: Gather Context
Adapt to what the user gives you:
- PR: Read the full diff via
. Read the PR description. Identify what the change is supposed to do.gh pr diff - Staged changes: Run
(orgit diff --staged
for unstaged). Understand the intent from recent commits or user description.git diff - Pointed code: Read the file(s) the user indicated. Ask what the code is supposed to do if unclear.
Before hunting, identify the critical paths — code that handles auth, payments, data writes, user input, or concurrency. These get the most scrutiny.
Scope rule for diff-based reviews: only flag issues in changed or new code. Pre-existing issues in untouched code are out of scope unless the change makes them reachable in a new way.
Step 2: Hunt
Walk through the code change systematically. For each issue found, record:
- Exact location (
)file:line - What's wrong
- A concrete failure scenario: specific input, state, or sequence of events that triggers the bug
The failure scenario is mandatory. If you cannot construct one, the finding is not real — drop it.
Boundary analysis:
- What happens at boundaries? (empty input, max values, nil/null/None, zero-length, unicode)
- What happens under concurrency? (two requests at once, interrupted operations)
- What happens on failure? (network down, disk full, permission denied, timeout)
- What assumptions does this code make? Are they guaranteed by callers?
- Does the error handling actually handle the error, or does it just log and continue?
Feynman technique — for each non-obvious function or block, ask:
- Why does this line exist? If you can't explain it, that's where bugs hide.
- What if this line were removed? Would anything break? Would the bug surface?
- What if this check didn't exist? What bad state would flow through?
- Does this function's assumption hold across all callers? Trace backwards.
Flow-divergence analysis — look for two code paths that should behave identically but don't. Check: do both paths handle errors the same way? Do both paths update the same state? If one path has a guard, should the other?
For language-specific pitfalls, consult the references:
- Rust: references/rust.md
- TypeScript: references/typescript.md
- Python: references/python.md
- Nix: references/nix.md
Step 3: Skeptic Pass
Re-examine every finding. For each one, ask:
- Can I actually trigger this? Trace the call chain. Is the bad input reachable? Is the race condition possible given the actual usage pattern?
- Does the surrounding code already prevent this? Check for guards, validation, or constraints upstream that make the scenario impossible.
- Am I assuming something about the runtime/framework that isn't true? (e.g., "this isn't thread-safe" when the framework guarantees single-threaded execution)
- Is this a real bug or a theoretical concern? If it requires an attacker with root access or a cosmic ray to trigger, drop it.
Common false-positive anti-patterns — drop findings that match these:
- "Path A does X, path B doesn't" — check if the context makes X unnecessary on path B.
- "This isn't thread-safe" — check if the framework/runtime guarantees single-threaded execution.
- "Missing null check" — check if the type system or upstream validation already prevents null.
- "Resource leak" — check if an RAII guard, defer, or context manager already handles cleanup.
- "Unbounded input" — check if the caller/transport layer enforces limits before this code runs.
Drop any finding that fails the skeptic check. False positives waste the user's time and erode trust.
Step 4: Classify
Assign severity to surviving findings:
| Severity | Meaning | Examples |
|---|---|---|
| CRITICAL | Will break in production. Data loss, security breach, crash. | SQL injection, use-after-free, auth bypass |
| HIGH | Likely to break under realistic conditions. | Race condition on shared state, missing null check on user input |
| MEDIUM | Breaks under edge cases or degrades correctness. | Off-by-one in pagination, resource leak on error path |
| LOW | Unlikely to cause user-visible issues but still wrong. | Redundant check, minor perf issue on cold path |
Step 5: Report
Output findings sorted by severity (worst first). One line per finding:
file:line — [CRITICAL] Description. Breaks when: scenario. file:line — [HIGH] Description. Breaks when: scenario.
End with a summary line:
N findings: X critical, Y high, Z medium, W low
If zero findings survive the skeptic pass, say so:
No issues found.
Rules
- Every finding MUST have a concrete failure scenario. No scenario = not a real finding.
- Do not pad the report. Zero findings is a valid outcome.
- Do not suggest refactors, style changes, or "improvements" unless they fix a bug.
- Do not flag code for being "complex" unless the complexity causes an actual defect.
- When unsure if something is a bug, say so explicitly rather than presenting it as certain.
- Failure scenarios should be specific enough to turn into a test case.
Resources
- references/rust.md — Rust-specific pitfalls (unsafe, lifetimes, Send/Sync, unwrap)
- references/typescript.md — TypeScript pitfalls (any casts, null coercion, async errors)
- references/python.md — Python pitfalls (mutable defaults, GIL, exception handling)
- references/nix.md — Nix pitfalls (lazy eval, infinite recursion, override gotchas)
Common Issues
Issue: Too many findings — review feels noisy.
- Re-run the skeptic pass more aggressively. Group related findings under one root cause.
Issue: User disagrees with a finding.
- Accept it. Do not argue. The user has context you don't.
Issue: Not enough context to judge correctness.
- Ask the user what the code is supposed to do. Don't guess intent.