Skills-4-SE regression-consistency-checker
Checks whether a new version of a repository preserves the behavior observed by tests on the old version. Use this skill when comparing two versions of code to detect regressions, verify refactoring safety, validate bug fixes don't break existing functionality, or ensure backward compatibility. Detects differences in function outputs, exceptions, observable states, and performance between versions. Generates reports highlighting potential regressions (critical, high, medium, low severity), improvements, and areas requiring verification. Triggers when users ask to check for regressions between versions, compare test behavior across versions, verify behavior preservation, or validate that changes don't break existing tests.
```bash
# Clone the full repository
git clone https://github.com/ArabelaTso/Skills-4-SE

# Or install just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/ArabelaTso/Skills-4-SE "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/regression-consistency-checker" ~/.claude/skills/arabelatso-skills-4-se-regression-consistency-checker && rm -rf "$T"
```
skills/regression-consistency-checker/SKILL.md

Regression Consistency Checker
Check whether a new version of a repository preserves the behavior observed by tests on the old version.
Workflow
1. Prepare Versions
Set up old version:
```bash
# Tag or note the old version
git tag old-version

# Or checkout a specific commit
git checkout <old-commit-hash>
```
Set up new version:
```bash
# Tag the new version
git tag new-version

# Or checkout the new commit
git checkout <new-commit-hash>
```
Ensure clean environment:
- Same dependencies installed
- Same test configuration
- Same environment variables
- Deterministic test execution (fix random seeds, mock time)
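A minimal sketch of pinning down nondeterminism before a run, assuming Python tests (`seed_everything` is a hypothetical helper, not part of this skill):

```python
import os
import random

def seed_everything(seed: int = 42) -> None:
    """Pin the common sources of nondeterminism before a test run."""
    random.seed(seed)  # Python's RNG
    # Hash randomization is only effective if set before interpreter start,
    # e.g. PYTHONHASHSEED=42 pytest ...
    os.environ["PYTHONHASHSEED"] = str(seed)
    # If the project uses NumPy, also seed numpy.random here.

# Calling it twice should reproduce identical random sequences
seed_everything(123)
first = [random.random() for _ in range(3)]
seed_everything(123)
second = [random.random() for _ in range(3)]
```

Mocking time and external APIs follows the same idea: replace every input that varies between runs with a fixed value.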
2. Run Tests on Old Version
Capture baseline results:
```bash
# Python (pytest with JSON report)
git checkout old-version
pytest --json-report --json-report-file=old_results.json

# JavaScript (Jest with JSON report)
git checkout old-version
npm test -- --json --outputFile=old_results.json

# Run multiple times to check stability
pytest --json-report --json-report-file=old_results_1.json
pytest --json-report --json-report-file=old_results_2.json
# Compare the two runs to confirm the baseline is deterministic
```
Verify baseline stability:
- All tests should pass (or document known failures)
- Results should be consistent across runs
- No flaky tests
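A stability check between two baseline runs can be sketched as follows (this assumes the pytest-json-report layout, where each entry under `tests` carries a `nodeid` and an `outcome`; verify the key names against your actual report):

```python
def outcomes(report: dict) -> dict:
    """Map each test's nodeid to its outcome (passed/failed/skipped)."""
    return {t["nodeid"]: t["outcome"] for t in report.get("tests", [])}

def unstable_tests(run1: dict, run2: dict) -> list:
    """Tests whose outcome differs between two runs of the SAME version."""
    a, b = outcomes(run1), outcomes(run2)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))

# Inline example; for real runs, load the files with json.load()
r1 = {"tests": [{"nodeid": "test_a", "outcome": "passed"},
                {"nodeid": "test_b", "outcome": "passed"}]}
r2 = {"tests": [{"nodeid": "test_a", "outcome": "passed"},
                {"nodeid": "test_b", "outcome": "failed"}]}
flaky = unstable_tests(r1, r2)  # → ["test_b"]
```

Any test this flags should be fixed or excluded before the versions are compared, since its failures cannot be attributed to the new version.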
3. Run Tests on New Version
Capture new results:
```bash
# Python
git checkout new-version
pytest --json-report --json-report-file=new_results.json

# JavaScript
git checkout new-version
npm test -- --json --outputFile=new_results.json
```
Note any immediate failures:
- Tests that now fail
- New errors or exceptions
- Changed behavior
4. Compare Results
Use comparison script:
```bash
python scripts/compare_results.py old_results.json new_results.json

# With custom tolerance for floats
python scripts/compare_results.py old_results.json new_results.json --tolerance 0.001

# Save detailed report
python scripts/compare_results.py old_results.json new_results.json --output regression_report.json
```
Script detects:
- 🔴 Critical: Tests that passed now fail, missing tests
- 🟠 High: Different outputs for same inputs
- 🟡 Medium: Different exception types
- 🔵 Low: Changed error messages
- ✅ Improvements: Tests that now pass, bug fixes
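The severity buckets above can be sketched as a classification rule (a simplified model for illustration; the actual logic in `compare_results.py` may differ):

```python
def classify(old: dict, new: dict) -> str:
    """Assign a severity bucket to one test's old/new results.

    Each result dict may carry: outcome, output, exception, message.
    """
    if new is None:
        return "critical"      # test is missing in the new version
    if old["outcome"] == "passed" and new["outcome"] == "failed":
        return "critical"      # pass -> fail
    if old["outcome"] == "failed" and new["outcome"] == "passed":
        return "improvement"   # now passes
    if old.get("output") != new.get("output"):
        return "high"          # different output for the same input
    if old.get("exception") != new.get("exception"):
        return "medium"        # different exception type
    if old.get("message") != new.get("message"):
        return "low"           # changed error message
    return "unchanged"

result = classify(
    {"outcome": "passed", "output": "100.00"},
    {"outcome": "failed", "output": "100.01"},
)  # → "critical"
```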
5. Analyze Regressions
For each regression, determine:
Is it a true regression?
- Unintended behavior change
- Bug introduced
- Performance degradation
- Breaking change
Or is it expected?
- Intentional behavior change
- Bug fix that changes output
- Improved error handling
- Refactoring with equivalent behavior
Review strategies in detection_strategies.md.
6. Investigate Root Causes
For critical regressions:
```bash
# Find the commit that caused the regression
git bisect start
git bisect bad new-version
git bisect good old-version

# Test each commit automatically
git bisect run pytest path/to/failing_test.py
```
For output differences:
- Compare function inputs/outputs
- Check for changed algorithms
- Verify data transformations
- Review calculation logic
For exception changes:
- Check error handling code
- Verify exception types
- Review validation logic
7. Document Findings
Create regression report:
```
REGRESSION ANALYSIS REPORT
==========================
Version Comparison: v1.0.0 → v1.1.0
Date: 2024-01-15
Tests Run: 156

SUMMARY
-------
Critical Regressions: 2
High Severity: 5
Medium Severity: 3
Low Severity: 8
Improvements: 4
Unchanged: 134

CRITICAL REGRESSIONS
--------------------
1. test_user_authentication
   - Status: PASS → FAIL
   - Error: KeyError: 'user_id'
   - Root Cause: Removed field from response
   - Action: Restore field or update API contract

2. test_payment_processing
   - Status: PASS → FAIL
   - Error: AssertionError: expected 100.00, got 100.01
   - Root Cause: Rounding change in calculation
   - Action: Fix rounding logic

HIGH SEVERITY REGRESSIONS
-------------------------
1. test_data_export
   - Output changed: CSV format → JSON format
   - Impact: Breaking change for consumers
   - Action: Maintain backward compatibility

[... continue for all regressions ...]

EXPECTED CHANGES
----------------
1. test_error_messages
   - Error messages now include more context
   - Intentional improvement
   - Action: Update baseline

RECOMMENDATIONS
---------------
1. Fix critical regressions before release
2. Review high severity changes with team
3. Document breaking changes in changelog
4. Update tests for intentional changes
```
8. Fix or Accept Changes
Fix true regressions:
```bash
# Fix the code
git checkout new-version
# Make fixes, then commit
git commit -m "Fix: regression in user authentication"

# Re-run tests
pytest --json-report --json-report-file=fixed_results.json

# Verify the fix
python scripts/compare_results.py old_results.json fixed_results.json
```
Accept intentional changes:
```bash
# Update the baseline
cp new_results.json baseline_results.json

# Document in the changelog
echo "- Changed: CSV export now returns JSON" >> CHANGELOG.md
```
Quick Reference
Regression Types
Output Regressions:
- Function returns different values
- Data format changes
- Calculation differences
Exception Regressions:
- New exceptions raised
- Different exception types
- Changed error messages
State Regressions:
- Different database state
- Different files created
- Different side effects
Performance Regressions:
- Slower execution
- Higher memory usage
- More API calls
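Performance regressions can be flagged from per-test durations, sketched below (assuming durations in seconds; the 2x and 50-100% thresholds mirror the severity levels defined in this document):

```python
def perf_severity(old_s: float, new_s: float, min_seconds: float = 0.05) -> str:
    """Classify a duration change; skip tests too fast to measure reliably."""
    if old_s < min_seconds:
        return "ignore"
    ratio = new_s / old_s
    if ratio > 2.0:
        return "high"    # major degradation (>2x)
    if ratio > 1.5:
        return "medium"  # moderate degradation (50-100%)
    if ratio > 1.0:
        return "low"     # minor degradation (<50%)
    return "ok"

slow = perf_severity(1.0, 2.5)  # → "high"
fine = perf_severity(1.0, 0.9)  # → "ok"
```

Timing is inherently noisy, so run each version several times and compare medians rather than single measurements.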
Severity Levels
Critical (block release):
- Test passed → failed
- Data corruption
- Security issues
- Crashes
High (fix before release):
- Wrong outputs
- Breaking API changes
- Major performance degradation (>2x)
Medium (review and decide):
- Minor output changes
- Moderate performance degradation (50-100%)
- Changed error messages
Low (document):
- Cosmetic changes
- Minor performance changes (<50%)
- Log message changes
Comparison Strategies
Exact comparison:
```python
old_output == new_output
```
Approximate comparison (floats):
```python
abs(old_output - new_output) < tolerance
```
Structural comparison (ignore fields):
```python
# Ignore volatile fields such as timestamps and IDs
compare_ignoring_fields(old, new, ['timestamp', 'id'])
```
Semantic comparison (order-independent):
```python
# Compare as sets (order-independent)
set(old_list) == set(new_list)
```
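The strategies above can be sketched as small helpers (`compare_ignoring_fields` here is an illustrative implementation; the skill's own helper may behave differently):

```python
import math

def approx_equal(old: float, new: float, tolerance: float = 1e-6) -> bool:
    """Approximate comparison for floats."""
    return math.isclose(old, new, abs_tol=tolerance)

def compare_ignoring_fields(old: dict, new: dict, ignore: list) -> bool:
    """Structural comparison that drops volatile fields before comparing."""
    strip = lambda d: {k: v for k, v in d.items() if k not in ignore}
    return strip(old) == strip(new)

def same_elements(old_list, new_list) -> bool:
    """Semantic comparison: order-independent (note: ignores duplicates)."""
    return set(old_list) == set(new_list)

ok = compare_ignoring_fields(
    {"value": 1, "timestamp": "2024-01-15", "id": 7},
    {"value": 1, "timestamp": "2024-01-16", "id": 9},
    ["timestamp", "id"],
)  # → True
```

Note that the set-based comparison also ignores duplicate counts; use `collections.Counter` instead when multiplicity matters.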
Helper Script
The compare_results.py script automates comparison:
```bash
# Basic comparison
python scripts/compare_results.py old_results.json new_results.json

# Custom float tolerance
python scripts/compare_results.py old_results.json new_results.json --tolerance 0.001

# Save detailed report
python scripts/compare_results.py old_results.json new_results.json --output report.json
```
Supported formats:
- pytest JSON report
- Jest JSON report
- Generic JSON format
Output includes:
- Categorized regressions by severity
- Specific test failures
- Output diffs
- Exception changes
- Improvements
Best Practices
Ensure deterministic tests:
- Fix random seeds
- Mock current time
- Mock external APIs
- Sort non-deterministic outputs
Run multiple times:
- Verify baseline stability
- Catch flaky tests
- Ensure reproducibility
Isolate changes:
- Test one change at a time
- Use git bisect for root cause
- Compare specific commits
Document expectations:
- Maintain changelog
- Note intentional changes
- Update test baselines
Automate checks:
- Run in CI/CD pipeline
- Block on critical regressions
- Generate reports automatically
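A minimal CI gate can be sketched as a script that fails the build when critical regressions are present (this assumes the report written by `--output` contains a `critical` list; adapt the key names to the actual report schema):

```python
import json
import tempfile

def gate(report_path: str) -> int:
    """Return a nonzero exit code if the report contains critical regressions."""
    with open(report_path) as f:
        report = json.load(f)
    critical = report.get("critical", [])
    if critical:
        print(f"BLOCKING: {len(critical)} critical regression(s):")
        for item in critical:
            print(f"  - {item}")
        return 1
    print("No critical regressions.")
    return 0

# Demo with a synthetic report file; in CI, pass the real report path
# and feed the return value to sys.exit() so the pipeline fails.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"critical": ["test_user_authentication"]}, f)
    path = f.name
code = gate(path)  # → 1 (build should fail)
```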