# Forensic Test Analysis

From the claude-skill-registry. Use when investigating test suite issues, reducing CI/CD time, identifying brittle tests, finding test duplication, or analyzing test maintenance burden. Reveals test code quality problems through git history analysis.

Install the full registry:

```bash
git clone https://github.com/majiayu000/claude-skill-registry
```

Or install only this skill:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/forensic-test-analysis" ~/.claude/skills/majiayu000-claude-skill-registry-forensic-test-analysis && rm -rf "$T"
```

Skill file: `skills/data/forensic-test-analysis/SKILL.md`
## 🎯 When You Use This Skill
State explicitly: "Using forensic-test-analysis pattern"
Then follow these steps:
- Calculate test change frequency vs production code changes
- Identify brittle tests (coupling ratio >2x = test changes more than prod)
- Find large test files (>500 LOC = maintenance burden)
- Cite research when presenting findings (brittle tests = 2-3x maintenance cost)
- Suggest integration with hotspot-finder and complexity-trends at end
## Overview
Test analysis examines test code quality through git forensics. Unlike static test coverage tools, this reveals:
- Brittle tests - Change more frequently than production code
- Over-coupled tests - Break with every production change
- Test hotspots - High-churn test files requiring constant fixes
- Duplicate test logic - Copy-paste test code (maintenance burden)
- Large test files - Unmaintainable test suites
- Slow tests - Impact CI/CD cycle time
Core principle: Good tests are stable. If tests change more than production code (ratio >2x), they're brittle and expensive.
## When to Use
- Investigating slow or flaky CI/CD pipelines
- Reducing test maintenance burden
- Before refactoring test suites
- Diagnosing frequent "broken test" tickets
- Quarterly test health checks
- After major refactoring (did tests improve?)
- Justifying test refactoring investment
## When NOT to Use
- Insufficient git history (<6 months unreliable)
- No test files (obviously)
- Greenfield projects (no patterns yet)
- When you need test coverage metrics (use coverage tools)
- When you need defect correlation (use hotspot analysis)
## Core Pattern

### ⚡ THE TEST BRITTLENESS FORMULA (USE THIS)
This is the test health metric - don't create custom ratios:
```
Test Brittleness Ratio = test_changes / production_changes

Interpretation:
- >2.0: BRITTLE (test changes more than prod - expensive)
- 1.0-2.0: NORMAL (tests evolve with production)
- 0.5-1.0: GOOD (stable tests, well-designed)
- <0.5: UNDER-TESTED or integration tests (fewer changes expected)

Test File Size Risk:
- >500 LOC: CRITICAL (unmaintainable)
- 300-500 LOC: HIGH (should split)
- 150-300 LOC: MODERATE (monitor)
- <150 LOC: GOOD (focused tests)

Test Hotspot = Brittle (>2x) + High Changes (>20 commits/year)
```
Critical: Ratio >2x indicates tests are MORE expensive to maintain than production code.
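As a quick sanity check before the full implementation below, a minimal sketch of the formula for a single test/production pair (the file paths are illustrative, not from any real project):

```python
import subprocess

def commits_touching(path, since="12 months ago"):
    """Count commits that changed a file in the given window."""
    log = subprocess.run(
        ["git", "log", "--oneline", f"--since={since}", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return len(log.splitlines())

# Paths are illustrative - substitute a real test/production pair
test_changes = commits_touching("src/auth/login.test.js")
prod_changes = commits_touching("src/auth/login.js")
ratio = test_changes / prod_changes if prod_changes else float("inf")
print(f"Brittleness ratio: {ratio:.1f}x")
```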
### 📊 Research Benchmarks (CITE THESE)
Always reference research when presenting test findings:
| Finding | Impact | Source | When to Cite |
|---|---|---|---|
| Brittle tests | 2-3x maintenance cost | Google Testing Blog | "Brittle tests cost 2-3x more to maintain (Google)" |
| Test duplication | 40-60% wasted effort | Microsoft DevOps | "Test duplication wastes 40-60% of test effort (Microsoft)" |
| Slow tests | 20-30 min daily waste per dev | Continuous Delivery | "Slow tests waste 20-30 min/developer/day (CD research)" |
Always cite the source when justifying test refactoring investment.
## Quick Reference

### Essential Git Commands
| Purpose | Command |
|---|---|
| Test change frequency | `git log --since="12 months ago" --oneline -- "*.test.*" \| wc -l` |
| Production changes | `git log --since="12 months ago" --oneline -- . ':!*.test.*' ':!*.spec.*' \| wc -l` |
| Test-only commits | `git log --since="12 months ago" --name-only --pretty=format:'%h %s'` (keep commits whose changed files are all tests; see Step 3) |
| Test file sizes | `git ls-files '*.test.*' '*.spec.*' \| xargs wc -l` |

File patterns are illustrative - adapt them to your project's test naming conventions.
### Test Health Classification
| Brittleness Ratio | File Size (LOC) | Changes/Year | Classification | Action |
|---|---|---|---|---|
| >2.0 | >500 | >20 | CRITICAL | Urgent refactoring |
| 1.5-2.0 | 300-500 | 15-20 | HIGH | Schedule refactoring |
| 1.0-1.5 | 150-300 | 10-15 | MODERATE | Monitor trends |
| <1.0 | <150 | <10 | GOOD | Maintain standards |
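The table can be applied mechanically. A minimal sketch, assuming any single threshold breach is enough to escalate a tier (the table itself does not specify how the columns combine, so that rule is an assumption):

```python
def classify_test_health(ratio, loc, changes_per_year):
    """Apply the health table above; any single threshold breach escalates."""
    if ratio > 2.0 or loc > 500 or changes_per_year > 20:
        return "CRITICAL", "Urgent refactoring"
    if ratio > 1.5 or loc > 300 or changes_per_year > 15:
        return "HIGH", "Schedule refactoring"
    if ratio > 1.0 or loc > 150 or changes_per_year > 10:
        return "MODERATE", "Monitor trends"
    return "GOOD", "Maintain standards"
```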
### Common Test Anti-Patterns
| Pattern | Indicator | Fix |
|---|---|---|
| Brittle snapshots | "update snapshots" commits | Use semantic assertions |
| Test-only commits | "fix failing test" commits | Decouple from implementation |
| Large test files | >500 LOC | Split by feature/scenario |
| Duplicate setup | Repeated beforeEach code | Extract test helpers |
## Implementation

### Step 1: Identify Test Files
Gather test file list:
```bash
# Find all test files (adapt patterns to your project)
# Note: the name tests must be grouped so -type f applies to all of them
test_files=$(find . -type f \( \
  -name "*.test.js" -o \
  -name "*.test.ts" -o \
  -name "*.spec.js" -o \
  -name "*_test.py" -o \
  -name "*Test.java" \))

# Get corresponding production files
# (remove .test/.spec from filename)
```
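The production-file mapping left as a comment above can be sketched as follows. The naming conventions (`.test.`/`.spec.` infixes, `_test.py` and `Test.java` suffixes) are assumptions to adapt to your project:

```python
import re

def production_file_for(test_file):
    """Guess the production counterpart of a test file by stripping test markers."""
    candidates = [
        re.sub(r"\.(?:test|spec)\.([jt]sx?)$", r".\1", test_file),  # login.test.js -> login.js
        re.sub(r"_test\.py$", ".py", test_file),                    # foo_test.py -> foo.py
        re.sub(r"Test\.java$", ".java", test_file),                 # FooTest.java -> Foo.java
    ]
    for candidate in candidates:
        if candidate != test_file:  # a substitution actually applied
            return candidate
    return None  # no known convention matched
```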
### Step 2: Calculate Brittleness Ratio
For each test file:
```python
# Brittleness calculation (runnable sketch; requires git on PATH)
import subprocess

def git_log_count(path, since="12 months ago"):
    """Count commits that touched a file since the given date."""
    out = subprocess.run(
        ["git", "log", "--oneline", f"--since={since}", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return len(out.splitlines())

def calculate_brittleness(test_file, production_file):
    # Count test file changes
    test_changes = git_log_count(test_file, since="12 months ago")
    # Count production file changes
    prod_changes = git_log_count(production_file, since="12 months ago")

    if prod_changes == 0:
        return None  # No production changes to compare

    # Calculate ratio
    brittleness_ratio = test_changes / prod_changes

    # Classify
    if brittleness_ratio > 2.0:
        classification, severity = "BRITTLE", "CRITICAL"
    elif brittleness_ratio > 1.5:
        classification, severity = "BRITTLE", "HIGH"
    elif brittleness_ratio > 1.0:
        classification, severity = "MODERATE", "MEDIUM"
    else:
        classification, severity = "GOOD", "LOW"

    return {
        'test_changes': test_changes,
        'prod_changes': prod_changes,
        'ratio': brittleness_ratio,
        'classification': classification,
        'severity': severity,
    }
```
### Step 3: Detect Test-Only Commits
Identify pure test maintenance:
```python
import subprocess

def is_test_file(path):
    """Heuristic test-file check; adapt patterns to your project."""
    name = path.lower()
    return (".test." in name or ".spec." in name
            or name.endswith("_test.py") or name.endswith("test.java"))

def git_log(since="12 months ago"):
    """Yield commits as dicts: hash, message (subject line), changed files."""
    out = subprocess.run(
        ["git", "log", f"--since={since}", "--name-only",
         "--pretty=format:%x1e%H%x1f%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    for record in out.split("\x1e"):
        if not record.strip():
            continue
        header, _, body = record.partition("\n")
        sha, _, subject = header.partition("\x1f")
        files = [f.strip() for f in body.splitlines() if f.strip()]
        yield {"hash": sha, "message": subject, "files": files}

def find_test_only_commits(since="12 months ago"):
    test_only_commits = []
    for commit in git_log(since=since):
        changed_files = commit["files"]
        if not changed_files:
            continue  # e.g. merge commits list no files
        # Check if only test files changed
        all_tests = all(is_test_file(f) for f in changed_files)
        # Check for brittle test keywords
        brittle_keywords = ['fix failing test', 'update snapshot',
                            'fix flaky', 'fix test', 'test fix']
        is_brittle = any(kw in commit["message"].lower()
                         for kw in brittle_keywords)
        if all_tests and is_brittle:
            test_only_commits.append({
                'hash': commit["hash"],
                'message': commit["message"],
                'files': changed_files,
                'category': 'BRITTLE_TEST_MAINTENANCE',
            })
    return test_only_commits
```
High count of test-only commits = brittle test suite
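A small follow-on sketch that turns the commit list from `find_test_only_commits` above into per-file fix counts, feeding the "Top Brittle Tests" ranking in the output format below:

```python
from collections import Counter

# Aggregate fix commits per test file
fix_counts = Counter()
for commit in find_test_only_commits():
    for path in commit['files']:
        fix_counts[path] += 1

for path, n in fix_counts.most_common(10):
    print(f"{n:3d} fix commits  {path}")
```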
### Step 4: Analyze Test File Size
Flag large test files:
```python
import os

def count_lines(path):
    """Count physical lines in a file."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return sum(1 for _ in f)

def find_test_files(root="."):
    """Walk the tree for test files (reuses is_test_file from Step 3)."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if is_test_file(name):
                yield os.path.join(dirpath, name)

def analyze_test_sizes():
    large_tests = []
    for test_file in find_test_files():
        loc = count_lines(test_file)
        if loc > 500:
            severity = "CRITICAL"
        elif loc > 300:
            severity = "HIGH"
        elif loc > 150:
            severity = "MODERATE"
        else:
            severity = "LOW"
        if severity in ["CRITICAL", "HIGH"]:
            large_tests.append({
                'file': test_file,
                'loc': loc,
                'severity': severity,
                'recommendation': 'Split into smaller test files',
            })
    return large_tests
```
## Output Format

### 1. Executive Summary
```
Test Suite Health Assessment (forensic-test-analysis pattern)

Test Files: 247
Production Files: 312
Test-to-Production Ratio: 0.79:1

KEY FINDINGS:
Brittle Tests (>2x changes): 18 files (7%)
Large Test Files (>500 LOC): 12 files
Test-Only Commits: 89 commits (23% of test commits)
Test Hotspots (brittle + high-churn): 8 files

Research shows brittle tests cost 2-3x more to maintain (Google).

Estimated Annual Test Maintenance Cost: $45,000
- Brittle test fixes: $28,000
- Large file maintenance: $12,000
- Duplicate code: $5,000
```
### 2. Test Hotspots (Brittle + High-Churn)
```
Rank | Test File          | Test Chg | Prod Chg | Ratio | LOC | Status
-----|--------------------|----------|----------|-------|-----|------------
1    | auth/login.test.js | 42       | 15       | 2.8x  | 687 | 🚨 CRITICAL
2    | api/users.spec.js  | 35       | 18       | 1.9x  | 523 | ❌ HIGH
3    | checkout.test.ts   | 48       | 22       | 2.2x  | 445 | ❌ HIGH
4    | Form.test.tsx      | 38       | 14       | 2.7x  | 392 | ❌ HIGH
```
### 3. Detailed Test Analysis
```
=== TEST HOTSPOT #1: auth/login.test.js ===

Brittleness Metrics:
  Test Changes (12mo): 42 commits
  Production Changes: 15 commits (login.js)
  Brittleness Ratio: 2.8x (CRITICAL - tests change faster than prod)
  Lines of Code: 687 (CRITICAL - unmaintainable size)

Research: Brittle tests cost 2-3x more to maintain (Google).

Change Pattern Analysis:
- 14 commits: "fix failing test" (33% - pure maintenance)
- 11 commits: "update snapshots" (26% - brittle snapshots)
- 10 commits: aligned with production (24% - expected)
- 7 commits: "refactor tests" (17%)

Issues Identified:
⚠️ Brittle: 2.8x change ratio (expected ~1.0x)
⚠️ Large: 687 LOC (expected <300 LOC)
⚠️ Snapshot-heavy: 26% of changes are snapshot updates
⚠️ Maintenance burden: 33% pure test fixes

RECOMMENDATIONS:
1. IMMEDIATE: Replace snapshots with semantic assertions
2. SHORT-TERM: Split into 3 smaller test files (~200 LOC each)
3. MEDIUM-TERM: Decouple tests from implementation details
4. PROCESS: Add test brittleness check to CI

Expected Impact: -60% maintenance cost, -70% brittleness ratio
```
### 4. Test-Only Commit Analysis
```
Brittle Test Maintenance (Test-Only Commits):

Total Test Commits: 387
Test-Only Commits: 89 (23% - maintenance overhead)

Top Brittle Tests (by fix commits):
1. auth/login.test.js: 14 "fix" commits
2. api/users.spec.js: 11 "fix" commits
3. checkout.test.ts: 9 "fix" commits

Pattern: 23% of test effort is pure maintenance (not new tests)
Impact: Wasted effort, developer frustration
Research: Brittle tests cost 2-3x more to maintain (Google).
```
## Common Mistakes

### Mistake 1: Ignoring brittleness ratio
Problem: Only looking at test change count, not comparing to production.
```
# ❌ BAD: Just count test changes
high_churn_tests = tests with >20 changes

# ✅ GOOD: Calculate brittleness ratio
brittle_tests = tests where (test_changes / prod_changes) > 2.0
```
Fix: Always calculate ratio - 30 test changes with 30 prod changes is normal, not brittle.
### Mistake 2: Treating all snapshot commits as bad
Problem: Flagging legitimate snapshot updates as brittle.
Fix: Distinguish between:
- Legitimate: Snapshot updates with corresponding UI changes
- Brittle: Frequent snapshot updates without meaningful prod changes (>5 per year)
- Always check: an "update snapshots" commit with NO production changes = brittle (see the sketch below)
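A minimal sketch of that last check, reusing `git_log` and `is_test_file` from Step 3 and treating Jest-style `.snap` files as test artifacts (a heuristic, not a rule):

```python
def brittle_snapshot_commits(since="12 months ago"):
    """Flag snapshot-update commits that touch no production files."""
    flagged = []
    for commit in git_log(since=since):
        if "snapshot" not in commit["message"].lower():
            continue
        files = commit["files"]
        # All changed files are tests or .snap artifacts -> no prod change
        if files and all(is_test_file(f) or f.endswith(".snap") for f in files):
            flagged.append(commit)
    return flagged
```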
### Mistake 3: Not checking test file size
Problem: Focusing only on change frequency, missing unmaintainable large files.
```
# ❌ BAD: Only brittleness
flag tests with ratio > 2.0

# ✅ GOOD: Combine brittleness + size
flag tests where (ratio > 2.0 OR size > 500)
```
Fix: Always check file size - large files (>500 LOC) are maintenance burdens even if stable.
### Mistake 4: Not estimating test maintenance cost
Problem: Identifying brittle tests without quantifying business impact.
Fix: Calculate cost:
- Average commit time: 30 minutes
- Brittle test commits: 89 per year
- Cost: 89 × 0.5 hours × $100/hour = $4,450/year in pure test maintenance
- Always translate to dollars for executive justification (see the sketch below)
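A minimal cost sketch using the figures above (30 minutes per fix and $100/hour are placeholders; substitute your team's actual numbers):

```python
def annual_maintenance_cost(fix_commits_per_year, hours_per_fix=0.5,
                            hourly_rate=100):
    """Translate brittle-test fix commits into an annual dollar figure."""
    return fix_commits_per_year * hours_per_fix * hourly_rate

# 89 test-only fix commits/year at 30 min each and $100/hour
print(annual_maintenance_cost(89))  # 4450.0
```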
## ⚡ After Running Test Analysis (DO THIS)
Immediately suggest these next steps to the user:
1. Correlate with production hotspots (use forensic-hotspot-finder)
   - Are brittle tests testing hotspot code?
   - Hotspot + brittle test = double maintenance burden
   - Prioritize refactoring both together

2. Check test complexity trends (use forensic-complexity-trends)
   - Are test files growing in complexity?
   - Track whether test refactoring is working
   - Set up monitoring for test file sizes

3. Calculate refactoring ROI (use forensic-refactoring-roi)
   - Test maintenance cost = annual waste
   - Test refactoring investment = effort estimation
   - ROI typically very high (brittle tests are expensive)

4. Track test health over time
   - Re-run test analysis quarterly
   - Monitor brittleness ratio trends
   - Early warning for emerging brittle tests
## Example: Complete Test Analysis Workflow
"Using forensic-test-analysis pattern, I analyzed 247 test files. TEST HEALTH ASSESSMENT: Brittle Tests: 18 files (7% of test suite) - Brittleness ratio >2.0x (tests change faster than production) - Research shows 2-3x higher maintenance cost (Google) TOP BRITTLE TEST: auth/login.test.js: - Ratio: 2.8x (42 test changes vs 15 prod changes) - Size: 687 LOC (CRITICAL) - Pattern: 33% "fix failing test" commits - Cost: ~$8,400/year in maintenance ESTIMATED ANNUAL COST: $45,000 in brittle test maintenance RECOMMENDATIONS: 1. Replace snapshot tests with semantic assertions 2. Split large test files (>500 LOC) 3. Decouple tests from implementation details NEXT STEPS: 1. Check production hotspots (forensic-hotspot-finder) - Testing hotspot code? 2. Track complexity trends (forensic-complexity-trends) - Are tests growing? 3. Calculate ROI (forensic-refactoring-roi) - Business case for cleanup Would you like me to proceed with hotspot correlation?"
Always provide this integration guidance - test issues often indicate production code quality problems.
## Advanced Patterns

### Test-Production Co-Change Analysis
Find which tests always change with production:
```
Co-Change Pattern: login.test.js ↔ login.js:
- 15 commits changed both together (expected)
- 27 commits changed ONLY login.test.js (brittle!)

Ratio Analysis:
- Expected: 1:1 co-change
- Actual: 1:2.8 (test changes 2.8x more)

Conclusion: Tests over-coupled to implementation details
```
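A minimal sketch of the co-change count, reusing `git_log` from Step 3 (the paths passed in must match the repo-relative paths git reports):

```python
def co_change_counts(test_file, prod_file, since="12 months ago"):
    """Count commits touching both files vs. only the test file."""
    both = test_only = 0
    for commit in git_log(since=since):
        files = set(commit["files"])
        if test_file in files and prod_file in files:
            both += 1  # expected co-change
        elif test_file in files:
            test_only += 1  # test churn with no prod change
    return {"co_change": both, "test_only": test_only}

# e.g. co_change_counts("src/auth/login.test.js", "src/auth/login.js")
```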
### Test Refactoring Impact Validation
Measure before/after:
```
Before Refactoring (auth/login.test.js):
- Brittleness: 2.8x
- Size: 687 LOC
- Maintenance commits: 14/year

After Refactoring (Q2 2024):
- Brittleness: 1.1x (-61%)
- Size: 245 LOC (-64%)
- Maintenance commits: 2/year (-86%)

VALIDATION: ✅ Refactoring successful
Annual savings: $7,200 (from $8,400 to $1,200)
```
### Flaky Test Detection

If test execution data is available:
```
Flaky Tests (intermittent failures):

checkout.test.ts:
- 12 "fix flaky test" commits
- Pattern: Failures on CI but pass locally
- Root cause: Race conditions, timing dependencies

Impact: Developer context switching, CI/CD unreliability
Fix: Condition-based waiting, not arbitrary timeouts
```
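When execution data is not available, commit messages are a rough proxy. A minimal sketch, reusing `git_log` and `is_test_file` from Step 3:

```python
from collections import Counter

def flaky_fix_counts(since="12 months ago"):
    """Count 'flaky' fix commits per test file as a flakiness proxy."""
    counts = Counter()
    for commit in git_log(since=since):
        if "flaky" in commit["message"].lower():
            for f in commit["files"]:
                if is_test_file(f):
                    counts[f] += 1
    return counts.most_common(10)
```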
## Research Background
Key studies:
1. Google Testing Blog (2017): Test brittleness cost
   - Brittle tests cost 2-3x more to maintain than stable tests
   - Snapshot tests are particularly brittle
   - Recommendation: Use semantic assertions, not snapshots

2. Microsoft DevOps (2019): Test duplication impact
   - 40-60% of test effort wasted on duplicate test logic
   - Copy-paste tests create maintenance burden
   - Recommendation: Extract test helpers, reduce duplication

3. Continuous Delivery (Humble & Farley): Slow test impact
   - Slow tests waste 20-30 minutes per developer per day
   - Developers skip running tests if they're too slow
   - Recommendation: Optimize test execution, parallelize

4. Test Maintenance Research (Garousi et al., 2013): Test code quality
   - Test code quality predicts test effectiveness
   - Large test files correlate with defects
   - Recommendation: Apply same quality standards to test code
Why test quality matters: Poor test quality wastes developer time, reduces confidence, and creates maintenance burden exceeding test value.
## Integration with Other Techniques
Combine test analysis with:
- forensic-hotspot-finder: Brittle tests on hotspot code = double maintenance burden
- forensic-complexity-trends: Track test complexity over time
- forensic-refactoring-roi: Test refactoring typically has very high ROI
- forensic-debt-quantification: Test maintenance is quantifiable technical debt
Why: Test quality affects developer productivity - poor tests slow everyone down.