Skills-4-SE test-guided-bug-detector
Analyze failing tests to detect functional bugs in code. Takes repository and failing test output as input, analyzes execution behavior, assertions, and stack traces to identify suspicious code regions and root causes. Use when debugging test failures, investigating regression bugs, or understanding why tests fail. Explains the bug mechanism, identifies affected code, and suggests fixes based on test expectations vs actual behavior.
```bash
# Option 1: Clone the whole repository
git clone https://github.com/ArabelaTso/Skills-4-SE

# Option 2: Install just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/ArabelaTso/Skills-4-SE "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/test-guided-bug-detector" ~/.claude/skills/arabelatso-skills-4-se-test-guided-bug-detector && rm -rf "$T"
```

`skills/test-guided-bug-detector/SKILL.md`

# Test-Guided Bug Detector
Analyze failing tests to detect and explain functional bugs in code.
## Overview
When tests fail, they provide valuable clues about bugs in the code. This skill analyzes:
- Test failure output - Error messages, stack traces, assertion failures
- Test expectations - What the test expects to happen
- Actual behavior - What actually happened
- Code execution path - Which code was executed
- Suspicious patterns - Common bug patterns that match the failure
The goal is to identify the root cause bug and explain why the test exposes it.
## Bug Detection Workflow

```
Failing Test Output
        ↓
Parse Failure Information
        ↓
Identify Test Expectations
        ↓
Trace Execution Path
        ↓
Analyze Discrepancy
        ↓
Identify Suspicious Code
        ↓
Explain Bug Mechanism
        ↓
Suggest Fix
```
## Analysis Process

### Step 1: Parse Test Failure
Extract key information from test output:
What to extract:
- Test name and location
- Failure type (assertion, exception, timeout, etc.)
- Expected vs actual values
- Stack trace
- Error messages
Example:
```
FAILED tests/test_calculator.py::test_divide - AssertionError: assert 0 == 5
Expected: 5
Actual: 0

Stack trace:
  File "tests/test_calculator.py", line 15, in test_divide
    assert divide(10, 2) == 5
  File "src/calculator.py", line 8, in divide
    return a // b
```
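The extraction above can be partially automated. The sketch below is a minimal, hypothetical helper (the `parse_pytest_failure` name and the regex are assumptions; real pytest output varies by version and verbosity):

```python
import re

def parse_pytest_failure(line: str) -> dict:
    """Extract test location, error type, and detail from a pytest FAILED summary line.

    A minimal sketch only; it handles the short one-line summary format shown above.
    """
    m = re.search(r"FAILED (\S+?)::(\S+) - (\w+): (.*)", line)
    if not m:
        return {}
    path, test, error_type, detail = m.groups()
    return {"file": path, "test": test, "error": error_type, "detail": detail}

line = "FAILED tests/test_calculator.py::test_divide - AssertionError: assert 0 == 5"
info = parse_pytest_failure(line)
# info["test"] is "test_divide", info["error"] is "AssertionError"
```

From here, `info["detail"]` carries the expected-vs-actual comparison used in the next steps.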
### Step 2: Understand Test Intent
Determine what the test is trying to verify:
Questions to answer:
- What functionality is being tested?
- What are the inputs?
- What is the expected output?
- What properties should hold?
Example:
```python
def test_divide():
    # Intent: Verify division returns correct result
    result = divide(10, 2)
    assert result == 5  # Expects 10 / 2 = 5
```
### Step 3: Trace Execution Path
Follow the code path from test to failure:
Trace elements:
- Function calls in stack trace
- Control flow decisions
- Data transformations
- Return values
Example trace:
```
test_divide()
  → divide(10, 2)
  → return a // b    (integer division)
  → returns 5
  → assert 5 == 5    ✓ Should pass!
```
### Step 4: Identify Discrepancy
Find where expected and actual diverge:
Common discrepancies:
- Wrong operator (// vs /)
- Off-by-one errors
- Null/None handling
- Type mismatches
- Logic errors
Example:
```python
# Expected: 10 / 2 = 5.0
# Actual: 10 // 2 = 5 (but the test got 0?)
# Discrepancy: Something else is wrong!
```
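To make the `/` vs `//` discrepancy concrete, a quick check shows how the two operators differ:

```python
# True division (/) always returns a float; floor division (//) floors the result.
print(10 / 2)    # 5.0
print(10 // 2)   # 5
print(7 // 2)    # 3
print(-7 // 2)   # -4 (floors toward negative infinity, not toward zero)
```

Note that neither operator explains the observed `0`, which is the cue to widen the search beyond the operator itself.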
### Step 5: Analyze Suspicious Code
Examine code for bug patterns:
Bug patterns to check:
- Uninitialized variables
- Wrong operators
- Missing return statements
- Incorrect conditions
- Edge case handling
Example analysis:
```python
def divide(a, b):
    result = 0     # BUG: Initialized but never updated!
    return a // b  # This line is unreachable? No, wait...
                   # Actually, this returns correctly, but...
```
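As written above, `return a // b` would in fact yield 5, so it cannot explain the observed `0 == 5` failure. One hypothetical variant that does reproduce the observed value is returning the uninitialized accumulator (a sketch for illustration, not the actual buggy source):

```python
def divide(a, b):
    result = 0       # BUG: accumulator is never updated
    # intended: result = a / b
    return result    # always returns 0

print(divide(10, 2))  # 0 — matches the failing test's actual value
```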
### Step 6: Explain Bug Mechanism
Describe how the bug causes the failure:
Explanation structure:
- What the code does
- What it should do
- Why there's a mismatch
- How the test exposes it
### Step 7: Suggest Fix
Propose concrete fix with explanation:
Fix components:
- Code change
- Why it fixes the bug
- How to verify the fix
## Common Bug Patterns
For detailed bug patterns and detection strategies, see references/bug_patterns.md.
Categories include:
- Logic errors (wrong operators, conditions)
- State management (uninitialized, stale state)
- Boundary conditions (off-by-one, edge cases)
- Type errors (implicit conversions, null handling)
- Concurrency bugs (race conditions, deadlocks)
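As one example from the state-management category, Python's mutable default arguments create stale state that leaks across calls (a minimal illustration; the `append_item` name is invented for this sketch):

```python
def append_item(item, bucket=[]):  # BUG: the default list is created once and shared
    bucket.append(item)
    return bucket

print(append_item(1))  # [1]
print(append_item(2))  # [1, 2] — stale state from the first call leaks in
```

The conventional fix is a `None` default with `bucket = [] if bucket is None else bucket` inside the function.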
## Failure Type Analysis
For analyzing different types of test failures, see references/failure_types.md.
Failure types:
- Assertion failures
- Exceptions and errors
- Timeouts
- Unexpected behavior
- Flaky tests
## Example Analysis

**Input:** Failing test
```python
# Test file: tests/test_list_utils.py
def test_remove_duplicates():
    input_list = [1, 2, 2, 3, 3, 3, 4]
    result = remove_duplicates(input_list)
    assert result == [1, 2, 3, 4]
    assert input_list == [1, 2, 2, 3, 3, 3, 4]  # Original unchanged

# Test output:
# FAILED - AssertionError: assert [1, 2, 3, 4] == [1, 2, 2, 3, 3, 3, 4]
# The second assertion failed!

# Implementation: src/list_utils.py
def remove_duplicates(lst):
    seen = set()
    i = 0
    while i < len(lst):
        if lst[i] in seen:
            lst.pop(i)  # BUG: Modifies input list!
        else:
            seen.add(lst[i])
            i += 1
    return lst
```
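Running the implementation above reproduces the reported failure: the deduplicated result is correct, but the caller's list has been mutated:

```python
def remove_duplicates(lst):
    seen = set()
    i = 0
    while i < len(lst):
        if lst[i] in seen:
            lst.pop(i)  # mutates the caller's list
        else:
            seen.add(lst[i])
            i += 1
    return lst

input_list = [1, 2, 2, 3, 3, 3, 4]
result = remove_duplicates(input_list)
print(result)      # [1, 2, 3, 4]
print(input_list)  # [1, 2, 3, 4] — the original was modified in place
```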
**Output:** Bug analysis
# Bug Analysis Report

## Test Failure Summary

**Test:** test_remove_duplicates
**Location:** tests/test_list_utils.py:2
**Failure Type:** Assertion failure
**Failed Assertion:** `assert input_list == [1, 2, 2, 3, 3, 3, 4]`

## Expected vs Actual

**Expected:** Original list unchanged: `[1, 2, 2, 3, 3, 3, 4]`
**Actual:** Original list modified: `[1, 2, 3, 4]`

## Root Cause

**Bug Location:** src/list_utils.py:7
**Bug Type:** Unintended side effect (input mutation)

**Problematic Code:**

```python
lst.pop(i)  # Modifies the input list directly
```
## Bug Mechanism

- **What happens:** The function modifies the input list in place using `lst.pop(i)`
- **Why it's wrong:** The test expects the original list to remain unchanged
- **How the test exposes it:** The second assertion checks that `input_list` is unmodified
- **Why it fails:** Python passes lists by reference, so modifications to `lst` affect the original `input_list`
## Execution Trace

```
test_remove_duplicates()
    input_list = [1, 2, 2, 3, 3, 3, 4]
    ↓
remove_duplicates(input_list)   # lst points to the same list as input_list
    i=0: lst[0]=1, not in seen, add to seen, i=1
    i=1: lst[1]=2, not in seen, add to seen, i=2
    i=2: lst[2]=2, in seen, lst.pop(2)   # Removes from input_list!
         # Now lst = input_list = [1, 2, 3, 3, 3, 4]
    i=2: lst[2]=3, not in seen, add to seen, i=3
    i=3: lst[3]=3, in seen, lst.pop(3)   # Removes from input_list!
         # Now lst = input_list = [1, 2, 3, 3, 4]
    i=3: lst[3]=3, in seen, lst.pop(3)   # Removes from input_list!
         # Now lst = input_list = [1, 2, 3, 4]
    i=3: lst[3]=4, not in seen, add to seen, i=4
    return lst   # Returns [1, 2, 3, 4]
    ↓
result = [1, 2, 3, 4]        ✓ First assertion passes
input_list = [1, 2, 3, 4]    ✗ Second assertion fails!
```
## Suspicious Code Regions

**Primary Suspect:** src/list_utils.py:7

```python
lst.pop(i)  # Direct mutation of input
```

**Suspicion Level:** HIGH
**Reason:** Modifies the input list, violating the immutability expectation

**Secondary Suspect:** src/list_utils.py:11

```python
return lst  # Returns reference to modified input
```

**Suspicion Level:** MEDIUM
**Reason:** Returns the same object as the input, not a new list
## Recommended Fix

**Option 1: Create a copy (Recommended)**

```python
def remove_duplicates(lst):
    result = []  # Create new list
    seen = set()
    for item in lst:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```
Why this fixes it:
- Creates new list instead of modifying input
- Original list remains unchanged
- Clearer intent
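A quick check confirms that the copying version satisfies both of the test's assertions:

```python
def remove_duplicates(lst):
    result = []   # build a new list instead of mutating the input
    seen = set()
    for item in lst:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

input_list = [1, 2, 2, 3, 3, 3, 4]
assert remove_duplicates(input_list) == [1, 2, 3, 4]
assert input_list == [1, 2, 2, 3, 3, 3, 4]  # original unchanged
```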
**Option 2: Explicit copy**

```python
def remove_duplicates(lst):
    lst = lst.copy()  # Work on a copy
    seen = set()
    i = 0
    while i < len(lst):
        if lst[i] in seen:
            lst.pop(i)
        else:
            seen.add(lst[i])
            i += 1
    return lst
```

Why this fixes it:
- `lst.copy()` creates a shallow copy
- Modifications don't affect the original
- Preserves the original algorithm structure
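One caveat worth noting for Option 2: `list.copy()` is a shallow copy, so nested mutable elements are still shared with the original (harmless for the integer lists in this example, but relevant for lists of lists):

```python
inner = [1, 2]
outer = [inner, inner]
shallow = outer.copy()   # new outer list, same inner objects

shallow[0].append(3)
print(inner)             # [1, 2, 3] — mutating through the copy affects the original
print(shallow is outer)  # False: the outer list itself was copied
print(shallow[0] is inner)  # True: the nested list was not
```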
## Verification

To verify the fix:

1. Run the failing test:

   ```bash
   pytest tests/test_list_utils.py::test_remove_duplicates
   ```

   Both assertions should pass.

2. Add an additional test for immutability:

   ```python
   def test_remove_duplicates_immutable():
       original = [1, 2, 2, 3]
       original_copy = original.copy()
       result = remove_duplicates(original)
       assert original == original_copy  # Verify no mutation
   ```
## Related Issues

This bug could affect:
- Any code that assumes `remove_duplicates` doesn't modify its input
- Functions that reuse the input list after calling `remove_duplicates`
- Concurrent code where multiple threads access the same list
## Analysis Strategies

For detailed analysis strategies by language and framework, see [references/analysis_strategies.md](references/analysis_strategies.md).

Strategies include:
- Python (pytest, unittest)
- JavaScript (Jest, Mocha)
- Java (JUnit)
- C/C++ (Google Test)
- Go (testing package)

## Best Practices

1. **Start with the failure message** - It often points directly to the bug
2. **Understand test intent** - Know what should happen
3. **Trace execution carefully** - Follow the actual code path
4. **Look for common patterns** - Many bugs follow known patterns
5. **Consider edge cases** - Bugs often hide at boundaries
6. **Check assumptions** - Verify what the code assumes
7. **Explain clearly** - Make the bug mechanism understandable

## Red Flags

Watch for these suspicious patterns:

**High-priority red flags:**
- Uninitialized variables
- Missing return statements
- Wrong operators (`==` vs `=`, `//` vs `/`)
- Off-by-one errors (`<` vs `<=`)
- Null/None without checks
- Mutable default arguments
- Side effects in pure functions

**Medium-priority warnings:**
- Complex conditionals
- Nested loops with breaks
- Exception swallowing
- Type conversions
- Global state access

## Report Template

```markdown
# Bug Analysis Report

## Test Failure Summary
- Test name and location
- Failure type
- Failed assertion/error

## Expected vs Actual
- What should happen
- What actually happened

## Root Cause
- Bug location (file:line)
- Bug type
- Problematic code snippet

## Bug Mechanism
- Step-by-step explanation
- Why it's wrong
- How test exposes it

## Execution Trace
- Detailed trace from test to failure
- Variable values at key points

## Suspicious Code Regions
- Primary suspects with evidence
- Secondary suspects

## Recommended Fix
- Proposed code change
- Explanation of why it fixes the bug
- How to verify

## Related Issues
- Other code that might be affected
```
## Additional Resources
For detailed guidance:
- Bug Patterns - Common bug patterns and detection
- Failure Types - Analyzing different failure types
- Analysis Strategies - Language-specific strategies