Aiwg regression-check

Compare current behavior against baseline to detect regressions

install

source · Clone the upstream repo

git clone https://github.com/jmagly/aiwg

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agentic/code/frameworks/sdlc-complete/skills/regression-check" ~/.claude/skills/jmagly-aiwg-regression-check-e1eccd && rm -rf "$T"

manifest: agentic/code/frameworks/sdlc-complete/skills/regression-check/SKILL.md

source content

Regression Check Command

Compare current system behavior against a baseline to detect regressions using executable feedback patterns.

Task

I'll perform comprehensive regression detection by:

Establishing baseline comparison point (branch, tag, commit, or timestamp)
Identifying scope of changes to test
Executing relevant test suites with baseline comparison
Analyzing execution feedback for behavioral changes
Reporting detected regressions with severity and impact
Optionally correlating with bug reports

Arguments

```
--baseline <ref>
```
- Comparison baseline (branch name, git tag, commit hash, or ISO timestamp)
- Examples:
```
--baseline main
```
  ,
```
--baseline v2026.1.4
```
  ,
```
--baseline abc123
```
  ,
```
--baseline 2026-01-20T10:00:00Z
```
- Default:
```
main
```
  branch or most recent release tag
```
--scope <level>
```
- Testing scope
- ```
full
```
  - Run complete test suite (default)
- ```
module <name>
```
  - Test specific module/component
- ```
changed-files
```
  - Test only files changed since baseline
- Example:
```
--scope module auth
```
```
--format <type>
```
- Output format
- ```
summary
```
  - High-level regression report (default)
- ```
detailed
```
  - Full execution logs with diffs
- ```
json
```
  - Machine-readable format for CI/CD integration
```
--bug-report <path>
```
- Correlate with bug report
- Verifies bug fix doesn't introduce regressions
- Cross-references test failures with reported issues
- Example:
```
--bug-report .aiwg/bugs/BUG-042.md
```
```
--save-baseline
```
- Save current execution as new baseline
- Useful after confirming changes are intentional
- Updates regression detection baseline for future runs

Process

Step 1: Baseline Establishment

# Determine baseline reference
BASELINE_REF="${--baseline:-main}"

# Checkout baseline (in temporary worktree)
git worktree add /tmp/baseline-check "$BASELINE_REF"

# Capture baseline test results
cd /tmp/baseline-check
npm test -- --json > /tmp/baseline-results.json

Step 2: Scope Identification

Full Scope:

Run entire test suite in both baseline and current
Compare all test outcomes

Module Scope:

Identify module-specific tests
Run subset of test suite
Focus regression detection on specified component

Changed Files Scope:

# Identify changed files
git diff --name-only "$BASELINE_REF" > /tmp/changed-files.txt

# Map to test files
# Example: src/auth/login.ts → test/unit/auth/login.test.ts

# Run only affected tests

Step 3: Execution Comparison

Execute tests in current state:

# Run tests with same configuration as baseline
npm test -- --json > /tmp/current-results.json

# Capture additional metrics
# - Performance timing
# - Code coverage deltas
# - Error types and frequencies

Step 4: Regression Analysis

Compare execution results:

regression_detection:
  new_failures:
    - test: "test/unit/auth/login.test.ts::validateCredentials"
      baseline_status: PASS
      current_status: FAIL
      error: "Expected 200, received 401"
      severity: CRITICAL

  behavior_changes:
    - test: "test/integration/api/users.test.ts::createUser"
      baseline_output: "User created in 150ms"
      current_output: "User created in 450ms"
      delta: +200%
      severity: MAJOR

  fixed_tests:
    - test: "test/unit/utils/parser.test.ts::edgeCase"
      baseline_status: FAIL
      current_status: PASS
      note: "Intentional fix"

Step 5: Reporting

Summary Format:

## Regression Check Report

**Baseline**: main (commit abc123)
**Scope**: full
**Date**: 2026-01-25T15:30:00Z

### Summary

- **New Failures**: 3 tests
- **Behavior Changes**: 5 tests (performance degradation)
- **Fixed Tests**: 2 tests
- **Overall**: ⚠️ REGRESSIONS DETECTED

### Critical Regressions

1. **auth/login.test.ts::validateCredentials**
   - Status: PASS → FAIL
   - Error: "Expected 200, received 401"
   - Impact: Breaks user authentication
   - Action: BLOCK MERGE

2. **payments/process.test.ts::chargeCard**
   - Status: PASS → FAIL
   - Error: "Transaction timeout"
   - Impact: Payment processing broken
   - Action: BLOCK MERGE

### Behavior Changes

1. **api/users.test.ts::createUser**
   - Performance: 150ms → 450ms (+200%)
   - Severity: MAJOR
   - Action: INVESTIGATE

[... detailed report continues ...]

JSON Format:

{
  "baseline": {
    "ref": "main",
    "commit": "abc123",
    "timestamp": "2026-01-20T10:00:00Z"
  },
  "current": {
    "commit": "def456",
    "timestamp": "2026-01-25T15:30:00Z"
  },
  "summary": {
    "new_failures": 3,
    "behavior_changes": 5,
    "fixed_tests": 2,
    "total_tests": 247
  },
  "regressions": [
    {
      "test": "auth/login.test.ts::validateCredentials",
      "type": "failure",
      "severity": "critical",
      "baseline_status": "PASS",
      "current_status": "FAIL",
      "error": "Expected 200, received 401",
      "action": "block"
    }
  ],
  "verdict": "FAIL"
}

Bug Report Integration

When

--bug-report

is provided:

# Load bug report
BUG_REPORT=$(cat .aiwg/bugs/BUG-042.md)

# Extract:
# - Bug description
# - Affected components
# - Expected fix behavior

# Cross-reference regression results:
# 1. Did fix resolve reported bug? (test now passes)
# 2. Did fix introduce new failures? (regressions)
# 3. Did fix change behavior elsewhere? (side effects)

# Generate correlation report:

Example Output:

## Bug Fix Verification: BUG-042

**Bug**: Login fails for users with special characters in email

**Fix Verification**: ✅ PASS
- Test `auth/login.test.ts::specialCharactersInEmail` now passes
- Previously failing with "Email validation error"

**Regression Check**: ⚠️ WARNING
- **New failure detected**: `auth/login.test.ts::standardLogin`
  - Error: "Unexpected character escape"
  - Likely caused by overly aggressive input sanitization
  - **Action**: Refine fix to avoid breaking standard login flow

**Recommendation**: REFINE FIX before merge

Baseline Modes

Branch Baseline

# Compare against branch
/regression-check --baseline develop --scope changed-files

# Use case: Feature branch testing
# Ensures changes don't break existing functionality

Tag Baseline

# Compare against release
/regression-check --baseline v2026.1.4 --format detailed

# Use case: Release regression testing
# Verifies new version doesn't break previous release behavior

Commit Baseline

# Compare against specific commit
/regression-check --baseline abc123def --scope module auth

# Use case: Pinpoint regression introduction
# Identifies exact commit that introduced regression

Timestamp Baseline

# Compare against point in time
/regression-check --baseline 2026-01-20T10:00:00Z

# Use case: Time-based regression detection
# Finds when behavior changed over time

Executable Feedback Pattern

Implements REF-013 MetaGPT executable feedback loop:

executable_feedback_loop:
  1_execute_baseline:
    - checkout_baseline_ref
    - run_test_suite
    - capture_execution_results

  2_execute_current:
    - run_test_suite_current
    - capture_execution_results

  3_compare_feedback:
    - detect_new_failures
    - detect_behavior_changes
    - detect_performance_regressions

  4_analyze_root_cause:
    - diff_source_changes
    - identify_likely_culprits
    - suggest_fixes

  5_iterate_or_report:
    - if_regressions_critical: block_merge
    - if_regressions_minor: warn_and_document
    - if_no_regressions: approve

Usage Examples

Example 1: Pre-Merge Regression Check

# Before merging feature branch
/regression-check --baseline main --scope changed-files --format summary

# Expected outcome:
# - Fast execution (only changed files tested)
# - Summary of any regressions introduced
# - Recommendation: merge/block/investigate

Example 2: Bug Fix Verification

# After implementing bug fix
/regression-check --baseline v2026.1.4 --bug-report .aiwg/bugs/BUG-042.md

# Expected outcome:
# - Verifies bug is fixed (test now passes)
# - Checks for side effects (new failures)
# - Provides merge recommendation

Example 3: Release Regression Testing

# Before cutting release
/regression-check --baseline main --scope full --format detailed

# Expected outcome:
# - Comprehensive test suite comparison
# - Detailed logs for all detected regressions
# - Complete report for release decision

Example 4: Module-Specific Testing

# After refactoring auth module
/regression-check --baseline main --scope module auth --format json

# Expected outcome:
# - Focused regression detection
# - JSON output for CI/CD integration
# - Fast feedback on module changes

Example 5: Performance Regression Detection

# Detect performance regressions
/regression-check --baseline v2026.1.3 --format detailed

# Expected outcome:
# - Timing comparisons for all tests
# - Flags tests with >20% performance degradation
# - Detailed analysis of slow tests

Integration with CI/CD

# .github/workflows/regression-check.yml
name: Regression Check

on:
  pull_request:
    branches: [main]

jobs:
  regression:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0  # Full history for baseline comparison

      - name: Run regression check
        run: |
          aiwg regression-check \
            --baseline main \
            --scope changed-files \
            --format json \
            > regression-results.json

      - name: Post results
        if: failure()
        uses: actions/github-script@v6
        with:
          script: |
            const results = require('./regression-results.json');
            // Post comment to PR with regression report

Output Artifacts

Generated files:

.aiwg/testing/regression-reports/
├── regression-{timestamp}.md         # Human-readable report
├── regression-{timestamp}.json       # Machine-readable results
├── baseline-execution.log           # Baseline test output
├── current-execution.log            # Current test output
└── diff-analysis.md                 # Source code diff analysis

Exit Codes

0  - No regressions detected
1  - Critical regressions (blocking)
2  - Major regressions (warning)
3  - Minor regressions (informational)
4  - Execution error (baseline/current test failure)

References

@$AIWG_ROOT/docs/references/REF-013-metagpt-multi-agent-framework.md - Executable feedback pattern
@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/executable-feedback.md - Feedback loop implementation rules
@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/templates/test/regression-test-set-card.md - Regression test template
@.aiwg/testing/ - Test artifacts location
@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/executable-feedback.md - Test-first principles