Claude-skill-registry git-bisect-debugging
Use when debugging regressions or identifying which commit introduced a bug - provides systematic workflow for git bisect with automated test scripts, manual verification, or hybrid approaches. Can be invoked from systematic-debugging as a debugging technique, or used standalone when you know the issue is historical.
```shell
# Clone the full registry:
git clone https://github.com/majiayu000/claude-skill-registry

# Or install just this skill:
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/git-bisect-debugging" ~/.claude/skills/majiayu000-claude-skill-registry-git-bisect-debugging && rm -rf "$T"
```
skills/data/git-bisect-debugging/SKILL.md

Git Bisect Debugging
Overview
Systematically identify which commit introduced a bug or regression using git bisect. This skill provides a structured workflow for automated, manual, and hybrid bisect approaches.
Core principle: Binary search through commit history to find the exact commit that introduced the issue. Main agent orchestrates, subagents execute verification at each step.
Announce at start: "I'm using git-bisect-debugging to find which commit introduced this issue."
Quick Reference
| Phase | Key Activities | Output |
|---|---|---|
| 1. Setup & Verification | Identify good/bad commits, verify clean state | Confirmed commit range |
| 2. Strategy Selection | Choose automated/manual/hybrid approach | Test script or verification steps |
| 3. Execution | Run bisect with subagents | First bad commit hash |
| 4. Analysis & Handoff | Show commit details, analyze root cause | Root cause understanding |
MANDATORY Requirements
These are non-negotiable. No exceptions for time pressure, production incidents, or "simple" cases:
- ✅ ANNOUNCE skill usage at start:
  "I'm using git-bisect-debugging to find which commit introduced this issue."
- ✅ CREATE TodoWrite checklist immediately (before Phase 1):
  - Copy the exact checklist from "The Process" section below
  - Update status as you progress through phases
  - Mark phases complete ONLY when finished
- ✅ VERIFY safety checks (Phase 1 - no skipping):
  - Working directory MUST be clean (`git status`)
  - Good commit MUST be verified (actually good)
  - Bad commit MUST be verified (actually bad)
  - If ANY check fails → abort and fix before proceeding
- ✅ USE AskUserQuestion for strategy selection (Phase 2):
  - Present all 3 approaches (automated, manual, hybrid)
  - Don't default to automated without asking
  - User must explicitly choose
- ✅ LAUNCH subagents for verification (Phase 3):
  - Main agent: orchestrates git bisect state
  - Subagents: execute verification at each commit (via Task tool)
  - NEVER run verification in main context
  - Each commit tested in isolated subagent
- ✅ HANDOFF to systematic-debugging (Phase 4):
  - After finding the bad commit, announce the handoff
  - Use the superpowers:systematic-debugging skill
  - Investigate the root cause, not just WHAT changed
Red Flags - STOP and Follow the Skill
If you catch yourself thinking ANY of these, you're about to violate the skill:
- "User is in a hurry, I'll skip safety checks" → NO. Run all safety checks.
- "This is simple, no need for TodoWrite" → NO. Create the checklist.
- "I'll just use automated approach" → NO. Use AskUserQuestion.
- "I'll run the test in my context" → NO. Launch subagent.
- "Found the commit, that's enough" → NO. Handoff to systematic-debugging.
- "Working directory looks clean" → NO. Run `git status` to verify.
- "I'll verify good/bad commits later" → NO. Verify BEFORE starting bisect.
All of these mean: STOP. Follow the 4-phase workflow exactly.
The Process
Copy this checklist to track progress:
Git Bisect Progress:
- [ ] Phase 1: Setup & Verification (good/bad commits identified)
- [ ] Phase 2: Strategy Selection (approach chosen, script ready)
- [ ] Phase 3: Execution (first bad commit found)
- [ ] Phase 4: Analysis & Handoff (root cause investigation complete)
Phase 1: Setup & Verification
Purpose: Ensure git bisect is appropriate and safe to run.
Steps:
1. Verify prerequisites:
   - Check we're in a git repository
   - Verify working directory is clean (`git status`)
   - If uncommitted changes exist, abort and ask user to commit or stash
2. Identify commit range:
   - Ask user for good commit (where it worked)
     - Suggestions: last release tag, last passing CI, commit from when it worked
     - Commands to help: `git log --oneline`, `git tag`, `git log --since="last week"`
   - Ask user for bad commit (where it's broken)
     - Usually `HEAD`, or a specific commit where the issue is confirmed
   - Calculate estimated steps: ~log2(commits between good and bad)
3. Verify the range:
   - Checkout bad commit and verify issue exists
   - Checkout good commit and verify issue doesn't exist
   - If reversed, offer to swap them
   - Return to original branch/commit
4. Safety checks:
   - Warn if range is >1000 commits (ask for confirmation)
   - Verify good commit is ancestor of bad commit
   - Note current branch/commit for cleanup later
Output: Confirmed good commit hash, bad commit hash, estimated steps
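The checks above can be sketched as a small script. This is a minimal sketch, not part of the skill itself: `GOOD` and `BAD` would come from the user in step 2, but here they are taken from a throwaway demo repository so the sketch is self-contained.

```shell
#!/bin/sh
set -e

# Throwaway demo repo so the sketch runs standalone; in real use,
# GOOD and BAD come from the user (step 2 above).
demo=$(mktemp -d) && cd "$demo"
git init -q
g() { git -c user.email=demo@example.com -c user.name=demo "$@"; }
g commit -q --allow-empty -m "baseline"
GOOD=$(git rev-parse HEAD)
for i in 1 2 3; do g commit -q --allow-empty -m "change $i"; done
BAD=$(git rev-parse HEAD)

# Check 1: working tree must be clean
[ -z "$(git status --porcelain)" ] || { echo "abort: dirty working tree"; exit 1; }

# Check 2: good must be an ancestor of bad
git merge-base --is-ancestor "$GOOD" "$BAD" || { echo "abort: good is not an ancestor of bad"; exit 1; }

# Check 3: count the range, warn on huge ranges, estimate steps (~log2)
count=$(git rev-list --count "$GOOD..$BAD")
steps=$(awk -v n="$count" 'BEGIN { s = 0; while (2^s < n) s++; print s }')
if [ "$count" -gt 1000 ]; then echo "warning: range is >1000 commits"; fi
echo "range: $count commits, about $steps steps"
```

`git merge-base --is-ancestor` exits non-zero when the ancestry check fails, and `git rev-list --count good..bad` counts only the commits reachable from bad but not from good, which is exactly the bisect search space.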
Phase 2: Strategy Selection
Purpose: Choose the most efficient bisect approach.
Assessment: Can we write an automated test script that deterministically identifies good vs bad?
MANDATORY: Use AskUserQuestion tool to present these three approaches (do NOT default to automated):
```
AskUserQuestion({
  questions: [{
    question: "Which git bisect approach should we use?",
    header: "Strategy",
    multiSelect: false,
    options: [
      {
        label: "Automated - test script runs automatically",
        description: "Fast, no manual intervention. Best for: test failures, crashes, deterministic behavior. Requires: working test script."
      },
      {
        label: "Manual - you verify each commit",
        description: "Handles subjective issues. Best for: UI/UX changes, complex scenarios. Requires: you can manually check each commit."
      },
      {
        label: "Hybrid - script + manual confirmation",
        description: "Efficient with reliability. Best for: mostly automated but needs judgment. Requires: script for most cases, manual for edge cases."
      }
    ]
  }]
})
```
Three approaches to present:
Approach 1: Automated Bisect
- When to use: Test failure, crash, deterministic behavior
- How it works: Script returns exit 0 (good) or 1 (bad), fully automatic
- Benefits: Fast, no manual intervention, reproducible
- Requirements: Can write a script that runs the test/check
Approach 2: Manual Bisect
- When to use: UI/UX changes, subjective behavior, complex scenarios
- How it works: User verifies at each commit, Claude guides
- Benefits: Handles non-deterministic or subjective issues
- Requirements: User can manually verify each commit
Approach 3: Hybrid Bisect
- When to use: Mostly automatable but needs human judgment
- How it works: Script narrows range, manual verification for final confirmation
- Benefits: Efficiency of automation with reliability of manual check
- Requirements: Can automate most checks, manual for edge cases
If automated or hybrid selected:
Write test script following this template:
```bash
#!/bin/bash
# Exit codes: 0 = good, 1 = bad, 125 = skip (can't test)

# Setup/build (required for each commit)
npm install --silent 2>/dev/null || exit 125

# Run the actual test
npm test -- path/to/specific-test.js
exit $?
```
Script guidelines:
- Make it deterministic (no random data, use fixed seeds)
- Make it fast (runs ~log2(N) times)
- Exit codes: 0 = good, 1 = bad, 125 = skip
- Include build/setup (each commit might need different deps)
- Test ONE specific thing, not entire suite
- Make it read-only (no data modification)
If manual selected:
Write specific verification steps for subagent:
Good example:
1. Run `npm start`
2. Open browser to http://localhost:3000
3. Click the "Login" button
4. Check if it redirects to /dashboard
5. Respond 'good' if redirect happens, 'bad' if it doesn't
Bad example:
See if the login works
Output: Selected approach, test script (if automated/hybrid), or verification steps (if manual)
Phase 3: Execution
Architecture: Main agent orchestrates bisect, subagents verify each commit in isolated context.
Main agent responsibilities:
- Manage git bisect state (`start`, `good`, `bad`, `reset`)
- Track progress and communicate remaining steps
- Launch subagents for verification
- Handle errors and cleanup
Subagent responsibilities:
- Execute verification in a clean context (no bleeding between commits)
- Report result: "good", "bad", or "skip"
- Provide brief reasoning for the result
Execution flow:
1. Main agent: Run `git bisect start <bad> <good>`
2. Loop until bisect completes:
   a. Git checks out a commit to test
   b. Main agent launches a subagent using the Task tool:
      - For automated: "Run this test script and report the result: <script content>. Report 'good' if exit code is 0, 'bad' if exit code is 1, 'skip' if exit code is 125. Include the output of the script in your response."
      - For manual: "We're testing commit <hash> (<message>). Follow these verification steps: <verification steps>. Report 'good' if the issue doesn't exist, 'bad' if it does exist. Explain what you observed."
      - For hybrid: "Run this test script: <script content>. If exit code is 0 or 1, report that result. If exit code is 125 or the script is ambiguous, perform manual verification: <verification steps>. Report 'good', 'bad', or 'skip' with explanation."
   c. Subagent returns a result ("good", "bad", or "skip") with an explanation
   d. Main agent: Run `git bisect good|bad|skip` based on the result
   e. Main agent: Update progress
      - Show the commit that was tested and the result
      - Calculate remaining steps: `git bisect log | grep "# .*step" | tail -1`
      - Example: "Tested commit abc123 (bad). ~4 steps remaining."
   f. Repeat until git bisect identifies the first bad commit
3. Main agent: Run `git bisect reset` to clean up
4. Main agent: Return to the original branch/commit
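For comparison, the loop above is what plain git automates with `git bisect run`: when no subagent orchestration is needed, the whole Phase 3 loop collapses into a single command. A self-contained sketch against a throwaway repo with a known breaking commit (repo contents and names are illustrative only):

```shell
#!/bin/sh
set -e

# Demo history: good baseline, an unrelated commit,
# the commit that introduces the bug, and one later commit.
demo=$(mktemp -d) && cd "$demo"
git init -q
g() { git -c user.email=demo@example.com -c user.name=demo "$@"; }
echo good > state.txt
g add state.txt && g commit -qm "baseline"
GOOD=$(git rev-parse HEAD)
g commit -q --allow-empty -m "unrelated change"
echo bad > state.txt
g commit -qam "introduce bug"
CULPRIT=$(git rev-parse HEAD)
g commit -q --allow-empty -m "later work"
BAD=$(git rev-parse HEAD)

# Test script following the skill's convention: exit 0 = good, 1 = bad
# (125 would mean "skip")
printf '%s\n' '#!/bin/sh' 'grep -q good state.txt' > check.sh
chmod +x check.sh

git bisect start "$BAD" "$GOOD" >/dev/null
git bisect run ./check.sh >/dev/null 2>&1
FOUND=$(git rev-parse refs/bisect/bad)   # first bad commit
git bisect reset >/dev/null 2>&1
echo "first bad commit: $FOUND"
```

`git bisect run` interprets the script's exit codes exactly as the skill's convention does (0 = good, 1-124 = bad, 125 = skip), and the identified commit is available as `refs/bisect/bad` until `git bisect reset` runs.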
Error handling during execution:
- Subagent timeout/error: Allow user to manually mark as "skip"
- Build failures: Use `git bisect skip`
- Too many skips (>5): Suggest manual investigation, show untestable commits
- Bisect interrupted: Ensure `git bisect reset` runs in cleanup
Output: First bad commit hash, bisect log showing the path taken
Phase 4: Analysis & Handoff
Purpose: Present findings and analyze root cause.
Steps:
1. Present the identified commit:

   ```
   Found first bad commit: <hash>
   Author: <author>
   Date: <date>

   <commit message>

   Files changed: <list of files from git show --stat>
   ```

2. Show how to view details:
   - View full diff: `git show <hash>`
   - View a file at that commit: `git show <hash>:<file>`
3. Handoff to root cause analysis:
   - Announce: "Now that we've found the breaking commit at <hash>, I'm using systematic-debugging to analyze why this change caused the issue."
   - Use the superpowers:systematic-debugging skill to investigate
   - Focus analysis on the changes in the bad commit
   - Identify the specific line/change that caused the issue
   - Explain WHY it broke (not just WHAT changed)
Output: Root cause understanding of why the commit broke functionality
Safety & Error Handling
Pre-flight Checks (Phase 1)
- ✅ Working directory is clean
- ✅ In a git repository
- ✅ Good/bad commits exist and are valid
- ✅ Good commit is actually good (issue doesn't exist)
- ✅ Bad commit is actually bad (issue exists)
- ✅ Good is ancestor of bad
- ⚠️ Warn if >1000 commits in range
During Execution (Phase 3)
- Subagent fails: Log error, allow skip or abort
- Build fails: Use `git bisect skip`, continue
- Ambiguous result: Use `git bisect skip`, max 5 skips
- Can't determine good/bad: Ask user for guidance
Cleanup & Recovery
- Always run `git bisect reset` when done (success or failure)
- If interrupted, prompt user to run `git bisect reset`
- Return to original branch/commit
- If bisect is running and the skill exits, warn user to clean up
Failure Modes
- Too many skips: Report untestable commits, suggest narrower range or manual review
- Good/bad reversed: Detect pattern (all results opposite), offer to restart with swapped inputs
- No bad commit found: Verify bad commit is actually bad, check if issue is environmental
Best Practices
Optimizing Commit Range
- Narrow the range first if possible:
  - Issue appeared last week? Start from last week, not 6 months ago
  - Use `git log --since="2 weeks ago"` to find a starting point
  - Use tags/releases as good commits when possible
Writing Good Test Scripts
Do:
- ✅ Test ONE specific thing
- ✅ Make it deterministic (fixed seeds, no random data)
- ✅ Make it fast (runs log2(N) times)
- ✅ Include setup/build in script
- ✅ Use proper exit codes (0=good, 1=bad, 125=skip)
Don't:
- ❌ Run entire test suite (too slow)
- ❌ Depend on external state (databases, APIs)
- ❌ Use random data or timestamps
- ❌ Modify production data
Manual Verification
Be specific:
- ✅ "API returns 200 for GET /health"
- ✅ "Login button redirects to /dashboard"
- ❌ "See if it works"
- ❌ "Check if login is broken"
Give exact steps:
- Run server with `npm start`
- Open browser to http://localhost:3000
- Click element with id="login-btn"
- Verify URL changes to /dashboard
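A manual check written this precisely is often one step away from automation: the translation is just an exit code. A sketch of the "API returns 200 for GET /health" example, using a throwaway Python file server as a stand-in for the real app (port 8765 and the URL are arbitrary placeholders):

```shell
#!/bin/sh
# Stand-in server; in real use you'd start your own app here.
python3 -m http.server 8765 --bind 127.0.0.1 >/dev/null 2>&1 &
srv=$!
sleep 1

# "Verify the endpoint returns 200" expressed as a result variable;
# a bisect test script would turn this into exit 0 (good) / exit 1 (bad).
code=$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:8765/)
kill "$srv" 2>/dev/null

result=bad
[ "$code" = "200" ] && result=good
echo "$result (HTTP $code)"
```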
Common Patterns
| Issue Type | Recommended Approach | Script/Steps Example |
|---|---|---|
| Test failure | Automated | `npm test -- path/to/specific-test.js` |
| Crash/error | Automated | `node app.js 2>&1 \| grep -q "Error" && exit 1 \|\| exit 0` |
| Performance | Automated | Benchmark script with a threshold (see Example 3 below) |
| UI/UX change | Manual | "Click X, verify Y appears" |
| Behavior change | Manual or Hybrid | Script to check, manual to confirm subjective aspects |
Progress Communication
- After each step: "Tested commit abc123 (<result>). ~X steps remaining."
- Show bisect log periodically: `git bisect log`
- Estimate remaining steps: log2(commits in range)
- Example: 100 commits → ~7 steps, 1000 commits → ~10 steps
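The estimates quoted above are just the ceiling of log2(N) — the smallest s with 2^s ≥ N — and can be sanity-checked directly:

```shell
# steps ≈ smallest s with 2^s >= N (i.e. ceil(log2(N)))
for n in 100 1000; do
  awk -v n="$n" 'BEGIN { s = 0; while (2^s < n) s++; print n " commits -> ~" s " steps" }'
done
# 100 commits -> ~7 steps
# 1000 commits -> ~10 steps
```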
Common Rationalizations (Resist These!)
| Rationalization | Reality | What to Do Instead |
|---|---|---|
| "User is in a hurry, skip safety checks" | Broken bisect from dirty state wastes MORE time | Run all Phase 1 checks. Always. |
| "This is simple, no need for TodoWrite" | You'll skip phases without tracking | Create checklist immediately |
| "I'll just use automated approach" | User might prefer manual for vague issues | Use AskUserQuestion tool |
| "I'll run the test in my context" | Context bleeding between commits breaks bisect | Launch subagent for each verification |
| "Working directory looks clean" | Assumptions cause failures | Run `git status` to verify |
| "I'll verify good/bad commits later" | Starting with wrong good/bad wastes all steps | Verify BEFORE |
| "Found the commit, user knows why" | User asked to FIND it, not debug it | Hand off to systematic-debugging |
| "Production incident, no time for process" | Skipping process in incidents causes MORE incidents | Follow workflow. It's faster. |
| "I remember from baseline, no need to test" | Skills evolve, baseline was different session | Test at each commit with subagent |
If you catch yourself rationalizing, STOP. Go back to MANDATORY Requirements section.
Integration with Other Skills
Called BY systematic-debugging
When systematic-debugging determines an issue is historical:
```
systematic-debugging detects:
- Issue doesn't exist in commit from 2 weeks ago
- Issue exists now
→ Suggests: "This appears to be a regression. I'm using git-bisect-debugging to find when it was introduced."
→ Invokes: git-bisect-debugging skill
→ Returns: First bad commit for analysis
→ Resumes: systematic-debugging analyzes the breaking change
```
Calls systematic-debugging
In Phase 4, after finding the bad commit:
```
git-bisect-debugging completes:
→ Announces: "Found commit abc123. Using systematic-debugging to analyze root cause."
→ Invokes: superpowers:systematic-debugging
→ Context: "Focus on changes in commit abc123"
→ Goal: Understand WHY the change broke functionality
```
Limitations (By Design)
This skill focuses on straightforward scenarios. It does NOT handle:
- ❌ Complex merge commit issues (would need `--first-parent`)
- ❌ Flaky/intermittent test failures (would need statistical approaches)
- ❌ Build system failures across many commits (would need advanced skip strategies)
For these scenarios, manual git bisect with user guidance is recommended.
Example Workflows
Example 1: Automated Test Failure
```
User: "The login test started failing sometime in the last 50 commits."

Claude: "I'm using git-bisect-debugging to find which commit introduced this issue."

[Phase 1: Setup]
- git status → clean
- Good commit: v1.2.0 tag (last release)
- Bad commit: HEAD
- Verify: checkout v1.2.0, run test → passes
- Verify: checkout HEAD, run test → fails
- Range: 47 commits, ~6 steps estimated

[Phase 2: Strategy Selection]
AskUserQuestion: Which approach?
User selects: Automated

Script created:
  #!/bin/bash
  npm install --silent 2>/dev/null || exit 125
  npm test -- tests/login.spec.js
  exit $?

[Phase 3: Execution]
Step 1: Launch subagent for commit abc123
  Subagent: "Test failed. Exit code 1. bad"
  Main: git bisect bad → ~3 steps remaining
Step 2: Launch subagent for commit def456
  Subagent: "Test passed. Exit code 0. good"
  Main: git bisect good → ~2 steps remaining
Step 3: Launch subagent for commit ghi789
  Subagent: "Test failed. Exit code 1. bad"
  Main: git bisect bad → ~1 step remaining
Step 4: Launch subagent for commit jkl012
  Subagent: "Test passed. Exit code 0. good"
  Main: git bisect good

Git bisect identifies: ghi789 is first bad commit

[Phase 4: Analysis]
Found first bad commit: ghi789
Author: John Doe
Date: 2025-11-10
feat: update authentication middleware
Files changed:
  src/auth/middleware.js | 12 ++++---
  tests/login.spec.js    |  3 +-

Using systematic-debugging to analyze why this change broke the test...
[Hands off to systematic-debugging skill]
```
Example 2: Manual UI Regression
```
User: "The dashboard layout looks wrong, but I'm not sure when it broke."

Claude: "I'm using git-bisect-debugging to find which commit introduced this issue."

[Phase 1: Setup]
- git status → clean
- Good commit: 2 weeks ago (user remembers it working)
- Bad commit: HEAD
- Range: 89 commits, ~7 steps estimated

[Phase 2: Strategy Selection]
AskUserQuestion: Which approach?
User selects: Manual

Verification steps created:
1. Run `npm run dev`
2. Open browser to http://localhost:3000/dashboard
3. Check if the sidebar and main content are side-by-side
4. Respond 'good' if layout is correct, 'bad' if sidebar overlaps content

[Phase 3: Execution]
Step 1: Checkout commit abc123
  Launch subagent with verification steps
  User verifies: "Layout is correct. good"
  Main: git bisect good → ~3 steps remaining
Step 2: Checkout commit def456
  Launch subagent with verification steps
  User verifies: "Sidebar overlaps content. bad"
  Main: git bisect bad → ~2 steps remaining
... continues until first bad commit found

[Phase 4: Analysis]
Found first bad commit: mno345
Author: Jane Smith
Date: 2025-11-08
refactor: migrate to CSS Grid layout
Files changed:
  src/components/Dashboard.css | 25 +++++++++++----------

Using systematic-debugging to analyze the CSS changes...
[Hands off to systematic-debugging skill]
```
Example 3: Hybrid Performance Regression
```
User: "API response times got slower recently."

Claude: "I'm using git-bisect-debugging to find which commit introduced this issue."

[Phase 1: Setup]
- git status → clean
- Good commit: last release tag v2.1.0
- Bad commit: HEAD
- Range: 120 commits, ~7 steps estimated

[Phase 2: Strategy Selection]
AskUserQuestion: Which approach?
User selects: Hybrid

Script created:
  #!/bin/bash
  npm install --silent 2>/dev/null || exit 125
  # Run benchmark 3 times, take average
  total=0
  for i in 1 2 3; do
    time=$(npm run benchmark:api 2>/dev/null | grep "response_time" | awk '{print $2}')
    [ -z "$time" ] && exit 125  # Can't test
    total=$(echo "$total + $time" | bc)
  done
  avg=$(echo "scale=2; $total / 3" | bc)
  # Threshold: 500ms is acceptable
  if (( $(echo "$avg > 500" | bc -l) )); then
    exit 1  # bad (too slow)
  else
    exit 0  # good (fast enough)
  fi

Manual fallback steps:
"If script is ambiguous, manually test API and verify response time is <500ms"

[Phase 3: Execution]
Steps proceed with script automation...
If script returns 125 (can't test), subagent asks user to manually verify

[Phase 4: Analysis]
Found first bad commit: pqr678
Author: Bob Johnson
Date: 2025-11-11
feat: add caching layer for user preferences
Files changed:
  src/api/middleware/cache.js | 45 ++++++++++++++++++++++++++++++++

Using systematic-debugging to analyze the caching implementation...
[Reveals: Cache lookup is synchronous and blocking, causing slowdown]
```
Troubleshooting
"Good and bad are reversed"
If early results suggest good/bad are swapped:
- Stop bisect
- Verify issue description is correct
- Swap good/bad commits and restart
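A related case: sometimes the semantics really are inverted because you are hunting for the commit that *fixed* something rather than broke it. Rather than mentally swapping good/bad, git supports custom terms. A self-contained sketch in a throwaway repo (note the inversion with `git bisect run`: exit 0 still means the *old* state, i.e. still broken):

```shell
#!/bin/sh
set -e

# Demo history: broken at first, fixed partway through.
demo=$(mktemp -d) && cd "$demo"
git init -q
g() { git -c user.email=demo@example.com -c user.name=demo "$@"; }
echo broken > state.txt
g add state.txt && g commit -qm "known broken"
BROKEN=$(git rev-parse HEAD)
g commit -q --allow-empty -m "still broken"
echo working > state.txt
g commit -qam "the fix"
FIX=$(git rev-parse HEAD)
g commit -q --allow-empty -m "later work"
FIXED=$(git rev-parse HEAD)

# Exit 0 while still broken (old state), non-zero once fixed (new state)
printf '%s\n' '#!/bin/sh' 'grep -q broken state.txt' > check.sh
chmod +x check.sh

git bisect start --term-new=fixed --term-old=broken >/dev/null
git bisect fixed "$FIXED" >/dev/null
git bisect broken "$BROKEN" >/dev/null 2>&1
git bisect run ./check.sh >/dev/null 2>&1
FOUND=$(git rev-parse refs/bisect/fixed)   # first fixed commit
git bisect reset >/dev/null 2>&1
echo "first fixed commit: $FOUND"
```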
"Too many skips, can't narrow down"
If >5 commits skipped:
- Review skipped commits manually
- Check if builds are broken in that range
- Consider narrowing the range or manual investigation
"Bisect is stuck/interrupted"
If bisect state is corrupted or interrupted:
```shell
git bisect reset   # Clean up bisect state
git checkout main  # Return to main branch
# Restart bisect with better range/script
```
"Subagent is taking too long"
- Set reasonable timeout for verification
- If automated: optimize test script
- If manual: simplify verification steps
- Consider marking commit as 'skip'
Summary
When to use: Historical bugs, regressions, "when did this break" questions
Key strengths:
- ✅ Systematic binary search (efficient)
- ✅ Subagent isolation (clean context)
- ✅ Automated + manual + hybrid approaches
- ✅ Integrates with systematic-debugging
Remember:
- Always verify good is good, bad is bad
- Keep test scripts deterministic and fast
- Use subagents for each verification step
- Clean up with `git bisect reset`
- Hand off to systematic-debugging for root cause