Aiwg regression-baseline

Create and maintain regression test baselines for comparison and drift detection across versioned snapshots

install

source · Clone the upstream repo

git clone https://github.com/jmagly/aiwg

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.agents/skills/regression-baseline" ~/.claude/skills/jmagly-aiwg-regression-baseline && rm -rf "$T"

manifest: .agents/skills/regression-baseline/SKILL.md

source content

regression-baseline

Create and maintain regression test baselines for comparison and drift detection.

Triggers

Alternate expressions and non-obvious activations (primary phrases are matched automatically from the skill description):

"snapshot tests" → baseline creation shorthand
"capture current behavior" → baseline establishment

Purpose

This skill manages regression baselines by:

Capturing known-good system states as baselines
Storing snapshots of expected outputs
Maintaining versioned baseline history
Comparing current state to baseline
Detecting drift from established baselines
Managing baseline updates and approvals

Behavior

When triggered, this skill:

Identifies baseline scope:
- Determine what to baseline (tests, outputs, performance)
- Select baseline type (functional, visual, performance, API)
- Identify affected components
- Check for existing baselines
Captures baseline data:
- Run tests and capture outputs
- Take snapshots (visual, data, API responses)
- Measure performance metrics
- Record system configuration
- Generate checksums/hashes
Stores baseline:
- Version baseline with semantic versioning
- Tag with git commit/release
- Store in `.aiwg/testing/baselines/`
- Update baseline manifest
- Link to requirements/features
Validates baseline quality:
- Ensure baseline is stable (run multiple times)
- Verify baseline is reproducible
- Check for known issues in baseline
- Require approval for critical baselines
Documents baseline:
- Record what is baselined
- Note when and why baseline was created
- Document known deviations
- Link to issues/requirements
Manages baseline lifecycle:
- Archive old baselines
- Update baselines on intentional changes
- Track baseline drift over time
- Alert on excessive drift
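
The capture, checksum, and stability steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the skill's actual implementation: `capture_baseline`, `checksum`, and `run_suite` are hypothetical names, and hashing canonical JSON with SHA-256 is one possible choice the source does not specify.

```python
import hashlib
import json
from typing import Any, Callable


def checksum(output: Any) -> str:
    """Hash a JSON-serializable output in canonical form (sorted keys)."""
    canonical = json.dumps(output, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def capture_baseline(run_suite: Callable[[], Any], runs: int = 3) -> dict:
    """Run the suite several times and accept the output only if it is stable.

    `run_suite` is a hypothetical callable returning the captured test
    outputs as JSON-serializable data.
    """
    outputs = [run_suite() for _ in range(runs)]
    digests = {checksum(o) for o in outputs}
    if len(digests) != 1:
        raise ValueError(
            f"unstable baseline: {len(digests)} distinct outputs in {runs} runs"
        )
    return {"output": outputs[0], "sha256": digests.pop(), "runs": runs}


if __name__ == "__main__":
    baseline = capture_baseline(lambda: {"status": 200, "body": {"ok": True}})
    print(baseline["sha256"][:12], "stable over", baseline["runs"], "runs")
```

Sorting keys before hashing means two outputs that differ only in key order produce the same checksum, so key ordering alone is not reported as instability.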

Baseline Types

Functional Baseline

functional_baseline:
  description: Expected behavior outputs
  scope: Unit and integration tests
  format: JSON snapshots

  capture:
    - test_outputs
    - assertions
    - mock_data
    - expected_errors

  example:
    test: "user registration"
    baseline: "baselines/functional/user-registration-v1.json"
    includes:
      - success_response_format
      - validation_error_messages
      - database_state_after_registration

Visual Baseline

visual_baseline:
  description: UI screenshots for visual regression
  scope: Frontend components
  format: PNG images with metadata

  capture:
    - component_screenshots
    - page_screenshots
    - responsive_breakpoints
    - interaction_states

  example:
    component: "LoginForm"
    baseline: "baselines/visual/login-form-v2.png"
    metadata:
      viewport: 1920x1080
      theme: dark
      state: default

Performance Baseline

performance_baseline:
  description: Performance benchmarks
  scope: API response times, load tests
  format: JSON metrics

  capture:
    - response_times_p50_p95_p99
    - throughput_requests_per_second
    - resource_utilization
    - database_query_times

  example:
    endpoint: "/api/users"
    baseline: "baselines/performance/api-users-v1.json"
    metrics:
      p50_response_time_ms: 45
      p95_response_time_ms: 120
      p99_response_time_ms: 250
      throughput_rps: 1000

API Contract Baseline

api_baseline:
  description: API request/response contracts
  scope: REST/GraphQL APIs
  format: OpenAPI/JSON schemas

  capture:
    - request_schemas
    - response_schemas
    - status_codes
    - headers

  example:
    endpoint: "POST /api/auth/login"
    baseline: "baselines/api/auth-login-v3.yaml"
    contract:
      request_body_schema: LoginRequest
      success_response: 200 with token
      error_responses: [400, 401, 429]

Baseline Manifest

# .aiwg/testing/baselines/manifest.yaml

baselines:
  - id: functional-auth-v1
    type: functional
    created: 2026-01-28T10:00:00Z
    created_by: test-engineer
    git_commit: abc123def
    release: v1.2.0
    scope: Authentication flows
    files:
      - baselines/functional/login-v1.json
      - baselines/functional/logout-v1.json
      - baselines/functional/register-v1.json
    status: active
    approved_by: tech-lead
    notes: Initial baseline for auth module

  - id: visual-dashboard-v2
    type: visual
    created: 2026-01-20T14:30:00Z
    created_by: frontend-engineer
    git_commit: def456ghi
    release: v1.1.0
    scope: Dashboard UI components
    files:
      - baselines/visual/dashboard-desktop-v2.png
      - baselines/visual/dashboard-mobile-v2.png
    status: active
    approved_by: design-lead
    previous_baseline: visual-dashboard-v1
    changes: Updated color scheme per design system v2

  - id: performance-api-v1
    type: performance
    created: 2026-01-15T09:00:00Z
    created_by: devops-engineer
    git_commit: ghi789jkl
    release: v1.0.0
    scope: Core API endpoints
    files:
      - baselines/performance/api-benchmarks-v1.json
    status: active
    approved_by: architect
    notes: Pre-optimization baseline
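
As an illustration of how the manifest might be queried, the sketch below works on a plain dictionary shaped like the parsed YAML. YAML parsing is left out to avoid a third-party dependency, and the helper names `find_baseline`, `active_baselines`, and `lineage` are hypothetical, not part of the aiwg tooling.

```python
from typing import List, Optional

# Two entries from the manifest above, as they might look once parsed.
MANIFEST = {
    "baselines": [
        {"id": "functional-auth-v1", "type": "functional",
         "status": "active", "release": "v1.2.0"},
        {"id": "visual-dashboard-v2", "type": "visual",
         "status": "active", "release": "v1.1.0",
         "previous_baseline": "visual-dashboard-v1"},
    ]
}


def find_baseline(manifest: dict, baseline_id: str) -> Optional[dict]:
    """Return the manifest entry with the given id, or None if absent."""
    for entry in manifest["baselines"]:
        if entry["id"] == baseline_id:
            return entry
    return None


def active_baselines(manifest: dict, baseline_type: str) -> List[str]:
    """List ids of active baselines of one type."""
    return [
        e["id"] for e in manifest["baselines"]
        if e["type"] == baseline_type and e.get("status") == "active"
    ]


def lineage(manifest: dict, baseline_id: str) -> List[str]:
    """Follow previous_baseline links, newest first.

    Ids that are referenced but no longer listed (for example archived
    baselines) are included, and the walk stops there.
    """
    chain: List[str] = []
    current: Optional[str] = baseline_id
    while current and current not in chain:
        chain.append(current)
        entry = find_baseline(manifest, current)
        current = entry.get("previous_baseline") if entry else None
    return chain


if __name__ == "__main__":
    print(active_baselines(MANIFEST, "visual"))
    print(lineage(MANIFEST, "visual-dashboard-v2"))
```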

Baseline Comparison Report

# Baseline Comparison Report

**Date**: 2026-01-28
**Baseline**: functional-auth-v1
**Current State**: HEAD (commit xyz789)
**Comparison Type**: Functional

## Executive Summary

**Status**: ⚠️ Drift Detected
**Changes**: 3 outputs differ from baseline
**Severity**: Medium - Unexpected behavior changes

| Metric | Baseline | Current | Drift |
|--------|----------|---------|-------|
| Tests Passing | 45/45 | 43/45 | -2 |
| Output Matches | 100% | 93.3% | -6.7% |
| New Failures | 0 | 2 | +2 |

## Detailed Comparison

### Test: user-login

**Status**: ⚠️ Drift

**Baseline** (functional-auth-v1):
```json
{
  "status": 200,
  "body": {
    "token": "jwt.header.payload.signature",
    "user": {
      "id": "uuid",
      "email": "user@example.com"
    }
  }
}
```

Current:

{
  "status": 200,
  "body": {
    "token": "jwt.header.payload.signature",
    "user": {
      "id": "uuid",
      "email": "user@example.com",
      "name": "John Doe"  // NEW FIELD
    }
  }
}

Analysis: New `name` field added to response
Impact: Breaking change for clients expecting exact schema
Recommendation: Update baseline if intentional, or fix if bug

Test: user-registration-invalid-email

Status: ❌ Failure

Baseline:

{
  "status": 400,
  "body": {
    "error": "Invalid email format"
  }
}

Current:

{
  "status": 500,
  "body": {
    "error": "Internal server error"
  }
}

Analysis: Validation now returns 500 instead of 400
Impact: Critical - Server error on invalid input
Recommendation: Fix immediately - regression in error handling

Test: user-logout

Status: ✅ Match

Output matches baseline exactly. No drift detected.

Drift Summary

| Test | Status | Drift Type | Severity |
|------|--------|------------|----------|
| user-login | ⚠️ Drift | Schema change | Medium |
| user-registration | ✅ Match | None | - |
| user-registration-invalid | ❌ Fail | Error code | High |
| user-logout | ✅ Match | None | - |
| token-refresh | ✅ Match | None | - |

Root Cause Analysis

Likely Causes

user-login drift: Intentional feature addition (commit abc123)
- PR #789: "Add user name to login response"
- Needs baseline update
invalid-email failure: Unintentional regression
- Validation middleware broken
- Returns 500 instead of 400
- Introduced in commit def456

Recommendations

Immediate Actions

Fix validation error handling (returns 500 instead of 400)
Add test for proper error codes
Verify error handling across all validation

Baseline Updates

Update baseline for user-login if name field is intentional
Document breaking change in API changelog
Notify API consumers of schema change

Process Improvements

Require baseline review for API changes
Add baseline diff to PR template
Run baseline comparison in CI

Approval Required

To update baseline with new outputs:

# Review changes
aiwg baseline compare functional-auth-v1

# Update if intentional
aiwg baseline update functional-auth-v1 \
  --approve-changes user-login \
  --justification "Added name field per REQ-123"

# Or create new baseline version
aiwg baseline create functional-auth-v2 \
  --based-on v1 \
  --changes "Added user name to login response"
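
The strict functional comparison illustrated by the sample report above can be approximated with a small recursive diff. This is a sketch only: `diff_json` is a hypothetical helper, not the comparison logic used by aiwg, and it treats lists as opaque values rather than diffing them element by element.

```python
from typing import Any, List


def diff_json(baseline: Any, current: Any, path: str = "") -> List[str]:
    """Return human-readable drift entries between two JSON-like values."""
    if isinstance(baseline, dict) and isinstance(current, dict):
        drifts: List[str] = []
        for key in sorted(baseline.keys() | current.keys()):
            child = f"{path}.{key}" if path else key
            if key not in current:
                drifts.append(f"removed: {child}")
            elif key not in baseline:
                drifts.append(f"added: {child}")
            else:
                drifts.extend(diff_json(baseline[key], current[key], child))
        return drifts
    # Scalars and lists are compared as whole values.
    if baseline != current:
        return [f"changed: {path}: {baseline!r} -> {current!r}"]
    return []


if __name__ == "__main__":
    base = {"status": 400, "body": {"error": "Invalid email format"}}
    curr = {"status": 500, "body": {"error": "Internal server error"}}
    for line in diff_json(base, curr):
        print(line)
```

Applied to the user-login payloads from the report, this yields a single entry, `added: body.user.name`, which is the kind of schema drift that strict matching flags.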


## Usage Examples

### Create New Baseline

User: "Create baseline for authentication tests"

Skill executes:

Identify auth test suite
Run tests and capture outputs
Validate stability (run 3 times)
Store baseline with version
Update manifest

Output: "Baseline Created: functional-auth-v1

Scope: Authentication tests (45 tests)
Format: JSON snapshots
Location: .aiwg/testing/baselines/functional/auth/

Files:

login-v1.json
logout-v1.json
register-v1.json
token-refresh-v1.json

Status: Active
Git Commit: abc123
Release: v1.2.0

Next: Baseline will be used for regression detection"


### Update Existing Baseline

User: "Update baseline for dashboard UI"

Skill checks current vs baseline: "Comparing to visual-dashboard-v1:

5 screenshots differ
Changes: Color scheme updated

Approve changes?

dashboard-desktop.png
dashboard-mobile.png
widget-panel.png

Reason for update?"

User: "Design system v2 rollout"

Skill creates: "Baseline Updated: visual-dashboard-v2

Previous: visual-dashboard-v1 (archived)
Changes: Design system v2 colors
Approved by: [current user]
Files updated: 5

New baseline active for regression tests"


### Compare to Baseline

User: "Compare current state to baseline"

Skill analyzes: "Baseline Comparison: functional-auth-v1

Status: ⚠️ Drift Detected

Matches: 43/45 tests (95.6%)
Drifts: 2 tests show differences

Drift 1: user-login

New field 'name' in response
Severity: Medium
Likely intentional (commit abc123)

Drift 2: invalid-email handling

Returns 500 instead of 400
Severity: High
Likely regression (needs fix)

See full report: .aiwg/testing/baseline-comparison.md"


## Integration

This skill uses:
- `regression-bisect`: Find commits that broke baseline
- `regression-metrics`: Track baseline drift over time
- `test-coverage`: Ensure baselines cover critical paths
- `project-awareness`: Detect test framework and conventions

## Agent Orchestration

```yaml
agents:
  creation:
    agent: test-engineer
    focus: Baseline capture and validation

  analysis:
    agent: test-architect
    focus: Drift analysis and recommendations

  approval:
    agent: tech-lead
    focus: Baseline update approval
```

Configuration

Baseline Storage

baseline_config:
  storage_path: .aiwg/testing/baselines/
  structure:
    - functional/
    - visual/
    - performance/
    - api/

  versioning: semantic  # v1, v2, v3
  compression: gzip     # for large baselines
  retention: 10         # keep last 10 versions
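
One way to read `retention: 10` is that the ten newest versions stay in place and older ones move to the archive. The sketch below follows that reading, which is an assumption, as is the `vN` version format taken from the comment in the config. `split_by_retention` is a hypothetical helper.

```python
import re
from typing import List, Tuple


def version_number(version: str) -> int:
    """Parse 'v12' into 12; reject anything that is not 'v' plus digits."""
    match = re.fullmatch(r"v(\d+)", version)
    if match is None:
        raise ValueError(f"unexpected version label: {version!r}")
    return int(match.group(1))


def split_by_retention(versions: List[str], retention: int = 10) -> Tuple[List[str], List[str]]:
    """Return (kept, to_archive), keeping the newest `retention` versions."""
    ordered = sorted(versions, key=version_number)  # numeric, so v10 follows v9
    cut = max(len(ordered) - retention, 0)
    return ordered[cut:], ordered[:cut]


if __name__ == "__main__":
    kept, to_archive = split_by_retention([f"v{n}" for n in range(1, 13)])
    print("keep:", kept)
    print("archive:", to_archive)
```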

Drift Thresholds

drift_thresholds:
  functional:
    exact_match_required: true
    allow_new_fields: false  # strict schema matching

  visual:
    pixel_diff_threshold: 0.1%  # 0.1% difference allowed
    ignore_areas: [timestamp, dynamic-content]

  performance:
    p50_tolerance: 10%   # ±10% from baseline
    p95_tolerance: 15%   # ±15% from baseline
    p99_tolerance: 20%   # ±20% from baseline
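
To make the performance tolerances concrete, the sketch below checks measured percentiles against the figures from the Performance Baseline example. It assumes the ± tolerance is symmetric, so a result that is much faster than the baseline is also reported. `performance_drift` is a hypothetical helper.

```python
from typing import Dict

# Response times in ms, from the performance baseline example.
BASELINE = {"p50": 45, "p95": 120, "p99": 250}
# Relative tolerances, from drift_thresholds.performance.
TOLERANCE = {"p50": 0.10, "p95": 0.15, "p99": 0.20}


def performance_drift(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: Dict[str, float],
) -> Dict[str, float]:
    """Return {metric: relative change} for metrics outside their tolerance."""
    violations: Dict[str, float] = {}
    for metric, base in baseline.items():
        change = (current[metric] - base) / base
        if abs(change) > tolerance[metric]:
            violations[metric] = round(change, 3)
    return violations


if __name__ == "__main__":
    measured = {"p50": 49, "p95": 140, "p99": 260}
    print(performance_drift(BASELINE, measured, TOLERANCE))
```

With these measurements only p95 is reported: 140 ms is about 16.7% above the 120 ms baseline, which exceeds the 15% tolerance, while p50 and p99 stay within theirs.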

Approval Requirements

approval_config:
  require_approval_for:
    - baseline_creation: true
    - baseline_updates: true
    - baseline_deletion: true

  approvers:
    - tech-lead
    - architect
    - test-lead

  approval_via:
    - pr_review
    - issue_comment
    - command_flag: --approved-by
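
A minimal sketch of how these approval rules might be enforced is shown below. The values are copied from the configuration above, while `may_proceed` and the decision to require approval for unlisted actions are assumptions made for illustration.

```python
from typing import Optional

# Values copied from approval_config above.
APPROVERS = {"tech-lead", "architect", "test-lead"}
REQUIRE_APPROVAL_FOR = {
    "baseline_creation": True,
    "baseline_updates": True,
    "baseline_deletion": True,
}


def may_proceed(action: str, approved_by: Optional[str] = None) -> bool:
    """Decide whether a baseline action may run.

    Unlisted actions are treated as requiring approval (an assumption,
    chosen so that unknown actions fail closed).
    """
    if not REQUIRE_APPROVAL_FOR.get(action, True):
        return True
    return approved_by in APPROVERS


if __name__ == "__main__":
    print(may_proceed("baseline_updates", approved_by="tech-lead"))
    print(may_proceed("baseline_updates"))
```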

Output Locations

- Baselines: `.aiwg/testing/baselines/{type}/`
- Manifest: `.aiwg/testing/baselines/manifest.yaml`
- Comparisons: `.aiwg/testing/baseline-comparisons/`
- Archived: `.aiwg/testing/baselines/archive/`

References

@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/test/baseline-schema.yaml
@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/commands/baseline-create.md
@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/agents/test-engineer.md