# Forensic Test Analysis

From the claude-skill-registry. Use when investigating test suite issues, reducing CI/CD time, identifying brittle tests, finding test duplication, or analyzing test maintenance burden. Reveals test code quality problems through git history analysis.

Install the full registry:

```bash
git clone https://github.com/majiayu000/claude-skill-registry
```

Or install only this skill:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/forensic-test-analysis" ~/.claude/skills/majiayu000-claude-skill-registry-forensic-test-analysis && rm -rf "$T"
```

Skill file: `skills/data/forensic-test-analysis/SKILL.md`
## 🎯 When You Use This Skill
State explicitly: "Using forensic-test-analysis pattern"
Then follow these steps:
- Calculate test change frequency vs production code changes
- Identify brittle tests (coupling ratio >2x = test changes more than prod)
- Find large test files (>500 LOC = maintenance burden)
- Cite research when presenting findings (brittle tests = 2-3x maintenance cost)
- Suggest integration with hotspot-finder and complexity-trends at end
## Overview
Test analysis examines test code quality through git forensics. Unlike static test coverage tools, this reveals:
- Brittle tests - Change more frequently than production code
- Over-coupled tests - Break with every production change
- Test hotspots - High-churn test files requiring constant fixes
- Duplicate test logic - Copy-paste test code (maintenance burden)
- Large test files - Unmaintainable test suites
- Slow tests - Impact CI/CD cycle time
Core principle: Good tests are stable. If tests change more than production code (ratio >2x), they're brittle and expensive.
## When to Use
- Investigating slow or flaky CI/CD pipelines
- Reducing test maintenance burden
- Before refactoring test suites
- Diagnosing frequent "broken test" tickets
- Quarterly test health checks
- After major refactoring (did tests improve?)
- Justifying test refactoring investment
## When NOT to Use
- Insufficient git history (<6 months unreliable)
- No test files (obviously)
- Greenfield projects (no patterns yet)
- When you need test coverage metrics (use coverage tools)
- When you need defect correlation (use hotspot analysis)
## Core Pattern

### ⚡ THE TEST BRITTLENESS FORMULA (USE THIS)
This is the test health metric - don't create custom ratios:
```
Test Brittleness Ratio = test_changes / production_changes

Interpretation:
- >2.0: BRITTLE (test changes more than prod - expensive)
- 1.0-2.0: NORMAL (tests evolve with production)
- 0.5-1.0: GOOD (stable tests, well-designed)
- <0.5: UNDER-TESTED or integration tests (fewer changes expected)

Test File Size Risk:
- >500 LOC: CRITICAL (unmaintainable)
- 300-500 LOC: HIGH (should split)
- 150-300 LOC: MODERATE (monitor)
- <150 LOC: GOOD (focused tests)

Test Hotspot = Brittle (>2x) + High Changes (>20 commits/year)
```
Critical: Ratio >2x indicates tests are MORE expensive to maintain than production code.
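As a quick sanity check before the full implementation below, a minimal sketch of the formula for a single test/production pair (the file paths are illustrative, not from any real project):

```python
import subprocess

def commits_touching(path, since="12 months ago"):
    """Count commits that changed a file in the given window."""
    log = subprocess.run(
        ["git", "log", "--oneline", f"--since={since}", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return len(log.splitlines())

# Paths are illustrative - substitute a real test/production pair
test_changes = commits_touching("src/auth/login.test.js")
prod_changes = commits_touching("src/auth/login.js")
ratio = test_changes / prod_changes if prod_changes else float("inf")
print(f"Brittleness ratio: {ratio:.1f}x")
```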
### 📊 Research Benchmarks (CITE THESE)
Always reference research when presenting test findings:
| Finding | Impact | Source | When to Cite |
|---|---|---|---|
| Brittle tests | 2-3x maintenance cost | Google Testing Blog | "Brittle tests cost 2-3x more to maintain (Google)" |
| Test duplication | 40-60% wasted effort | Microsoft DevOps | "Test duplication wastes 40-60% of test effort (Microsoft)" |
| Slow tests | 20-30 min daily waste per dev | Continuous Delivery | "Slow tests waste 20-30 min/developer/day (CD research)" |
Always cite the source when justifying test refactoring investment.
## Quick Reference

### Essential Git Commands
| Purpose | Command |
|---|---|
| Test change frequency | `git log --since="12 months ago" --oneline -- "*.test.*" \| wc -l` |
| Production changes | `git log --since="12 months ago" --oneline -- . ':!*.test.*' ':!*.spec.*' \| wc -l` |
| Test-only commits | `git log --since="12 months ago" --name-only --pretty=format:'%h %s'` (keep commits whose changed files are all tests; see Step 3) |
| Test file sizes | `git ls-files '*.test.*' '*.spec.*' \| xargs wc -l` |

File patterns are illustrative - adapt them to your project's test naming conventions.
### Test Health Classification
| Brittleness Ratio | File Size (LOC) | Changes/Year | Classification | Action |
|---|---|---|---|---|
| >2.0 | >500 | >20 | CRITICAL | Urgent refactoring |
| 1.5-2.0 | 300-500 | 15-20 | HIGH | Schedule refactoring |
| 1.0-1.5 | 150-300 | 10-15 | MODERATE | Monitor trends |
| <1.0 | <150 | <10 | GOOD | Maintain standards |
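The table can be applied mechanically. A minimal sketch, assuming any single threshold breach is enough to escalate a tier (the table itself does not specify how the columns combine, so that rule is an assumption):

```python
def classify_test_health(ratio, loc, changes_per_year):
    """Apply the health table above; any single threshold breach escalates."""
    if ratio > 2.0 or loc > 500 or changes_per_year > 20:
        return "CRITICAL", "Urgent refactoring"
    if ratio > 1.5 or loc > 300 or changes_per_year > 15:
        return "HIGH", "Schedule refactoring"
    if ratio > 1.0 or loc > 150 or changes_per_year > 10:
        return "MODERATE", "Monitor trends"
    return "GOOD", "Maintain standards"
```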
### Common Test Anti-Patterns
| Pattern | Indicator | Fix |
|---|---|---|
| Brittle snapshots | "update snapshots" commits | Use semantic assertions |
| Test-only commits | "fix failing test" commits | Decouple from implementation |
| Large test files | >500 LOC | Split by feature/scenario |
| Duplicate setup | Repeated beforeEach code | Extract test helpers |
## Implementation

### Step 1: Identify Test Files
Gather test file list:
```bash
# Find all test files (adapt patterns to your project)
# Note: the name tests must be grouped so -type f applies to all of them
test_files=$(find . -type f \( \
  -name "*.test.js" -o \
  -name "*.test.ts" -o \
  -name "*.spec.js" -o \
  -name "*_test.py" -o \
  -name "*Test.java" \))

# Get corresponding production files
# (remove .test/.spec from filename)
```
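The production-file mapping left as a comment above can be sketched as follows. The naming conventions (`.test.`/`.spec.` infixes, `_test.py` and `Test.java` suffixes) are assumptions to adapt to your project:

```python
import re

def production_file_for(test_file):
    """Guess the production counterpart of a test file by stripping test markers."""
    candidates = [
        re.sub(r"\.(?:test|spec)\.([jt]sx?)$", r".\1", test_file),  # login.test.js -> login.js
        re.sub(r"_test\.py$", ".py", test_file),                    # foo_test.py -> foo.py
        re.sub(r"Test\.java$", ".java", test_file),                 # FooTest.java -> Foo.java
    ]
    for candidate in candidates:
        if candidate != test_file:  # a substitution actually applied
            return candidate
    return None  # no known convention matched
```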
### Step 2: Calculate Brittleness Ratio
For each test file:
```python
# Brittleness calculation (runnable sketch; requires git on PATH)
import subprocess

def git_log_count(path, since="12 months ago"):
    """Count commits that touched a file since the given date."""
    out = subprocess.run(
        ["git", "log", "--oneline", f"--since={since}", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return len(out.splitlines())

def calculate_brittleness(test_file, production_file):
    # Count test file changes
    test_changes = git_log_count(test_file, since="12 months ago")
    # Count production file changes
    prod_changes = git_log_count(production_file, since="12 months ago")

    if prod_changes == 0:
        return None  # No production changes to compare

    # Calculate ratio
    brittleness_ratio = test_changes / prod_changes

    # Classify
    if brittleness_ratio > 2.0:
        classification, severity = "BRITTLE", "CRITICAL"
    elif brittleness_ratio > 1.5:
        classification, severity = "BRITTLE", "HIGH"
    elif brittleness_ratio > 1.0:
        classification, severity = "MODERATE", "MEDIUM"
    else:
        classification, severity = "GOOD", "LOW"

    return {
        'test_changes': test_changes,
        'prod_changes': prod_changes,
        'ratio': brittleness_ratio,
        'classification': classification,
        'severity': severity,
    }
```
### Step 3: Detect Test-Only Commits
Identify pure test maintenance:
```python
import subprocess

def is_test_file(path):
    """Heuristic test-file check; adapt patterns to your project."""
    name = path.lower()
    return (".test." in name or ".spec." in name
            or name.endswith("_test.py") or name.endswith("test.java"))

def git_log(since="12 months ago"):
    """Yield commits as dicts: hash, message (subject line), changed files."""
    out = subprocess.run(
        ["git", "log", f"--since={since}", "--name-only",
         "--pretty=format:%x1e%H%x1f%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    for record in out.split("\x1e"):
        if not record.strip():
            continue
        header, _, body = record.partition("\n")
        sha, _, subject = header.partition("\x1f")
        files = [f.strip() for f in body.splitlines() if f.strip()]
        yield {"hash": sha, "message": subject, "files": files}

def find_test_only_commits(since="12 months ago"):
    test_only_commits = []
    for commit in git_log(since=since):
        changed_files = commit["files"]
        if not changed_files:
            continue  # e.g. merge commits list no files
        # Check if only test files changed
        all_tests = all(is_test_file(f) for f in changed_files)
        # Check for brittle test keywords
        brittle_keywords = ['fix failing test', 'update snapshot',
                            'fix flaky', 'fix test', 'test fix']
        is_brittle = any(kw in commit["message"].lower()
                         for kw in brittle_keywords)
        if all_tests and is_brittle:
            test_only_commits.append({
                'hash': commit["hash"],
                'message': commit["message"],
                'files': changed_files,
                'category': 'BRITTLE_TEST_MAINTENANCE',
            })
    return test_only_commits
```
High count of test-only commits = brittle test suite
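A small follow-on sketch that turns the commit list from `find_test_only_commits` above into per-file fix counts, feeding the "Top Brittle Tests" ranking in the output format below:

```python
from collections import Counter

# Aggregate fix commits per test file
fix_counts = Counter()
for commit in find_test_only_commits():
    for path in commit['files']:
        fix_counts[path] += 1

for path, n in fix_counts.most_common(10):
    print(f"{n:3d} fix commits  {path}")
```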
### Step 4: Analyze Test File Size
Flag large test files:
```python
import os

def count_lines(path):
    """Count physical lines in a file."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return sum(1 for _ in f)

def find_test_files(root="."):
    """Walk the tree for test files (reuses is_test_file from Step 3)."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if is_test_file(name):
                yield os.path.join(dirpath, name)

def analyze_test_sizes():
    large_tests = []
    for test_file in find_test_files():
        loc = count_lines(test_file)
        if loc > 500:
            severity = "CRITICAL"
        elif loc > 300:
            severity = "HIGH"
        elif loc > 150:
            severity = "MODERATE"
        else:
            severity = "LOW"
        if severity in ["CRITICAL", "HIGH"]:
            large_tests.append({
                'file': test_file,
                'loc': loc,
                'severity': severity,
                'recommendation': 'Split into smaller test files',
            })
    return large_tests
```
## Output Format

### 1. Executive Summary
```
Test Suite Health Assessment (forensic-test-analysis pattern)

Test Files: 247
Production Files: 312
Test-to-Production Ratio: 0.79:1

KEY FINDINGS:
Brittle Tests (>2x changes): 18 files (7%)
Large Test Files (>500 LOC): 12 files
Test-Only Commits: 89 commits (23% of test commits)
Test Hotspots (brittle + high-churn): 8 files

Research shows brittle tests cost 2-3x more to maintain (Google).

Estimated Annual Test Maintenance Cost: $45,000
- Brittle test fixes: $28,000
- Large file maintenance: $12,000
- Duplicate code: $5,000
```
### 2. Test Hotspots (Brittle + High-Churn)
```
Rank | Test File          | Test Chg | Prod Chg | Ratio | LOC | Status
-----|--------------------|----------|----------|-------|-----|------------
1    | auth/login.test.js | 42       | 15       | 2.8x  | 687 | 🚨 CRITICAL
2    | api/users.spec.js  | 35       | 18       | 1.9x  | 523 | ❌ HIGH
3    | checkout.test.ts   | 48       | 22       | 2.2x  | 445 | ❌ HIGH
4    | Form.test.tsx      | 38       | 14       | 2.7x  | 392 | ❌ HIGH
```
### 3. Detailed Test Analysis
```
=== TEST HOTSPOT #1: auth/login.test.js ===

Brittleness Metrics:
  Test Changes (12mo): 42 commits
  Production Changes: 15 commits (login.js)
  Brittleness Ratio: 2.8x (CRITICAL - tests change faster than prod)
  Lines of Code: 687 (CRITICAL - unmaintainable size)

Research: Brittle tests cost 2-3x more to maintain (Google).

Change Pattern Analysis:
- 14 commits: "fix failing test" (33% - pure maintenance)
- 11 commits: "update snapshots" (26% - brittle snapshots)
- 10 commits: aligned with production (24% - expected)
- 7 commits: "refactor tests" (17%)

Issues Identified:
⚠️ Brittle: 2.8x change ratio (expected ~1.0x)
⚠️ Large: 687 LOC (expected <300 LOC)
⚠️ Snapshot-heavy: 26% of changes are snapshot updates
⚠️ Maintenance burden: 33% pure test fixes

RECOMMENDATIONS:
1. IMMEDIATE: Replace snapshots with semantic assertions
2. SHORT-TERM: Split into 3 smaller test files (~200 LOC each)
3. MEDIUM-TERM: Decouple tests from implementation details
4. PROCESS: Add test brittleness check to CI

Expected Impact: -60% maintenance cost, -70% brittleness ratio
```
### 4. Test-Only Commit Analysis
```
Brittle Test Maintenance (Test-Only Commits):

Total Test Commits: 387
Test-Only Commits: 89 (23% - maintenance overhead)

Top Brittle Tests (by fix commits):
1. auth/login.test.js: 14 "fix" commits
2. api/users.spec.js: 11 "fix" commits
3. checkout.test.ts: 9 "fix" commits

Pattern: 23% of test effort is pure maintenance (not new tests)
Impact: Wasted effort, developer frustration
Research: Brittle tests cost 2-3x more to maintain (Google).
```
## Common Mistakes

### Mistake 1: Ignoring brittleness ratio
Problem: Only looking at test change count, not comparing to production.
```
# ❌ BAD: Just count test changes
high_churn_tests = tests with >20 changes

# ✅ GOOD: Calculate brittleness ratio
brittle_tests = tests where (test_changes / prod_changes) > 2.0
```
Fix: Always calculate ratio - 30 test changes with 30 prod changes is normal, not brittle.
### Mistake 2: Treating all snapshot commits as bad
Problem: Flagging legitimate snapshot updates as brittle.
Fix: Distinguish between:
- Legitimate: Snapshot updates with corresponding UI changes
- Brittle: Frequent snapshot updates without meaningful prod changes (>5 per year)
- Always check: an "update snapshots" commit with NO production changes = brittle (see the sketch below)
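A minimal sketch of that last check, reusing `git_log` and `is_test_file` from Step 3 and treating Jest-style `.snap` files as test artifacts (a heuristic, not a rule):

```python
def brittle_snapshot_commits(since="12 months ago"):
    """Flag snapshot-update commits that touch no production files."""
    flagged = []
    for commit in git_log(since=since):
        if "snapshot" not in commit["message"].lower():
            continue
        files = commit["files"]
        # All changed files are tests or .snap artifacts -> no prod change
        if files and all(is_test_file(f) or f.endswith(".snap") for f in files):
            flagged.append(commit)
    return flagged
```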
### Mistake 3: Not checking test file size
Problem: Focusing only on change frequency, missing unmaintainable large files.
```
# ❌ BAD: Only brittleness
flag tests with ratio > 2.0

# ✅ GOOD: Combine brittleness + size
flag tests where (ratio > 2.0 OR size > 500)
```
Fix: Always check file size - large files (>500 LOC) are maintenance burdens even if stable.
### Mistake 4: Not estimating test maintenance cost
Problem: Identifying brittle tests without quantifying business impact.
Fix: Calculate cost:
- Average commit time: 30 minutes
- Brittle test commits: 89 per year
- Cost: 89 × 0.5 hours × $100/hour = $4,450/year in pure test maintenance
- Always translate to dollars for executive justification (see the sketch below)
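A minimal cost sketch using the figures above (30 minutes per fix and $100/hour are placeholders; substitute your team's actual numbers):

```python
def annual_maintenance_cost(fix_commits_per_year, hours_per_fix=0.5,
                            hourly_rate=100):
    """Translate brittle-test fix commits into an annual dollar figure."""
    return fix_commits_per_year * hours_per_fix * hourly_rate

# 89 test-only fix commits/year at 30 min each and $100/hour
print(annual_maintenance_cost(89))  # 4450.0
```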
## ⚡ After Running Test Analysis (DO THIS)
Immediately suggest these next steps to the user:
1. Correlate with production hotspots (use forensic-hotspot-finder)
   - Are brittle tests testing hotspot code?
   - Hotspot + brittle test = double maintenance burden
   - Prioritize refactoring both together

2. Check test complexity trends (use forensic-complexity-trends)
   - Are test files growing in complexity?
   - Track whether test refactoring is working
   - Set up monitoring for test file sizes

3. Calculate refactoring ROI (use forensic-refactoring-roi)
   - Test maintenance cost = annual waste
   - Test refactoring investment = effort estimation
   - ROI typically very high (brittle tests are expensive)

4. Track test health over time
   - Re-run test analysis quarterly
   - Monitor brittleness ratio trends
   - Early warning for emerging brittle tests
## Example: Complete Test Analysis Workflow
"Using forensic-test-analysis pattern, I analyzed 247 test files. TEST HEALTH ASSESSMENT: Brittle Tests: 18 files (7% of test suite) - Brittleness ratio >2.0x (tests change faster than production) - Research shows 2-3x higher maintenance cost (Google) TOP BRITTLE TEST: auth/login.test.js: - Ratio: 2.8x (42 test changes vs 15 prod changes) - Size: 687 LOC (CRITICAL) - Pattern: 33% "fix failing test" commits - Cost: ~$8,400/year in maintenance ESTIMATED ANNUAL COST: $45,000 in brittle test maintenance RECOMMENDATIONS: 1. Replace snapshot tests with semantic assertions 2. Split large test files (>500 LOC) 3. Decouple tests from implementation details NEXT STEPS: 1. Check production hotspots (forensic-hotspot-finder) - Testing hotspot code? 2. Track complexity trends (forensic-complexity-trends) - Are tests growing? 3. Calculate ROI (forensic-refactoring-roi) - Business case for cleanup Would you like me to proceed with hotspot correlation?"
Always provide this integration guidance - test issues often indicate production code quality problems.
## Advanced Patterns

### Test-Production Co-Change Analysis
Find which tests always change with production:
```
Co-Change Pattern: login.test.js ↔ login.js:
- 15 commits changed both together (expected)
- 27 commits changed ONLY login.test.js (brittle!)

Ratio Analysis:
- Expected: 1:1 co-change
- Actual: 1:2.8 (test changes 2.8x more)

Conclusion: Tests over-coupled to implementation details
```
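A minimal sketch of the co-change count, reusing `git_log` from Step 3 (the paths passed in must match the repo-relative paths git reports):

```python
def co_change_counts(test_file, prod_file, since="12 months ago"):
    """Count commits touching both files vs. only the test file."""
    both = test_only = 0
    for commit in git_log(since=since):
        files = set(commit["files"])
        if test_file in files and prod_file in files:
            both += 1  # expected co-change
        elif test_file in files:
            test_only += 1  # test churn with no prod change
    return {"co_change": both, "test_only": test_only}

# e.g. co_change_counts("src/auth/login.test.js", "src/auth/login.js")
```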
### Test Refactoring Impact Validation
Measure before/after:
```
Before Refactoring (auth/login.test.js):
- Brittleness: 2.8x
- Size: 687 LOC
- Maintenance commits: 14/year

After Refactoring (Q2 2024):
- Brittleness: 1.1x (-61%)
- Size: 245 LOC (-64%)
- Maintenance commits: 2/year (-86%)

VALIDATION: ✅ Refactoring successful
Annual savings: $7,200 (from $8,400 to $1,200)
```
### Flaky Test Detection

If test execution data is available:
```
Flaky Tests (intermittent failures):

checkout.test.ts:
- 12 "fix flaky test" commits
- Pattern: Failures on CI but pass locally
- Root cause: Race conditions, timing dependencies

Impact: Developer context switching, CI/CD unreliability
Fix: Condition-based waiting, not arbitrary timeouts
```
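When execution data is not available, commit messages are a rough proxy. A minimal sketch, reusing `git_log` and `is_test_file` from Step 3:

```python
from collections import Counter

def flaky_fix_counts(since="12 months ago"):
    """Count 'flaky' fix commits per test file as a flakiness proxy."""
    counts = Counter()
    for commit in git_log(since=since):
        if "flaky" in commit["message"].lower():
            for f in commit["files"]:
                if is_test_file(f):
                    counts[f] += 1
    return counts.most_common(10)
```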
## Research Background
Key studies:
1. Google Testing Blog (2017): Test brittleness cost
   - Brittle tests cost 2-3x more to maintain than stable tests
   - Snapshot tests are particularly brittle
   - Recommendation: Use semantic assertions, not snapshots

2. Microsoft DevOps (2019): Test duplication impact
   - 40-60% of test effort wasted on duplicate test logic
   - Copy-paste tests create maintenance burden
   - Recommendation: Extract test helpers, reduce duplication

3. Continuous Delivery (Humble & Farley): Slow test impact
   - Slow tests waste 20-30 minutes per developer per day
   - Developers skip running tests if they're too slow
   - Recommendation: Optimize test execution, parallelize

4. Test Maintenance Research (Garousi et al., 2013): Test code quality
   - Test code quality predicts test effectiveness
   - Large test files correlate with defects
   - Recommendation: Apply same quality standards to test code
Why test quality matters: Poor test quality wastes developer time, reduces confidence, and creates maintenance burden exceeding test value.
## Integration with Other Techniques
Combine test analysis with:
- forensic-hotspot-finder: Brittle tests on hotspot code = double maintenance burden
- forensic-complexity-trends: Track test complexity over time
- forensic-refactoring-roi: Test refactoring typically has very high ROI
- forensic-debt-quantification: Test maintenance is quantifiable technical debt
Why: Test quality affects developer productivity - poor tests slow everyone down.