Aiwg regression-auto-baseline

Automatically manage regression test baseline lifecycle triggered by releases, deployments, and quality gates

install

source · Clone the upstream repo

git clone https://github.com/jmagly/aiwg

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agentic/code/frameworks/sdlc-complete/skills/regression-auto-baseline" ~/.claude/skills/jmagly-aiwg-regression-auto-baseline-1d2263 && rm -rf "$T"

manifest: agentic/code/frameworks/sdlc-complete/skills/regression-auto-baseline/SKILL.md

source content

regression-auto-baseline

Automatically manage regression test baseline lifecycle with triggers for releases, deployments, and quality gates.

Triggers

Alternate expressions and non-obvious activations (primary phrases are matched automatically from the skill description):

"auto-update baselines" → automatic baseline management
"baseline on release" → release-triggered baseline update

Purpose

This skill automates regression baseline management by:

Triggering baseline updates on semantic version tags
Updating baselines after successful merges to main/master
Creating baselines on deployment success
Scheduling periodic baseline refreshes
Managing baseline retention and archival
Validating baseline quality before activation
Integrating with CI/CD for seamless automation

Behavior

When triggered or activated automatically, this skill:

Detects trigger conditions:
- Semantic version tag created (v*..)
- Merge to protected branch completed
- Deployment succeeded to environment
- Quality gate passed
- Manual approval received
- Scheduled time reached
Validates pre-conditions:
- Tests are passing (minimum threshold met)
- No critical issues open
- Quality gates passed
- Human approval obtained (if required)
- No concurrent baseline updates in progress
Captures new baseline:
- Run full regression test suite
- Capture all artifact types (functional, visual, performance, API)
- Verify baseline stability (run multiple times if configured)
- Generate checksums for integrity
- Tag with git commit, release version, timestamp
Validates baseline quality:
- Compare to previous baseline
- Detect unexpected changes
- Verify all expected outputs present
- Check for baseline drift
- Flag suspicious differences for review
Activates baseline:
- Archive previous baseline version
- Set new baseline as active
- Update baseline manifest
- Notify relevant teams
- Create baseline comparison report
Manages lifecycle:
- Archive old baselines per retention policy
- Compress large baselines
- Upload to cloud storage if configured
- Prune outdated baselines
- Maintain audit trail

Trigger Types

Semantic Version Tag

trigger:
  type: git_tag
  pattern: "^v[0-9]+\\.[0-9]+\\.[0-9]+$"
  description: "Baseline on semantic version tags"

  behavior:
    on_tag_create:
      - validate_tag_format
      - check_if_tests_passing
      - create_baseline:
          name: "release-{tag}"
          auto_activate: true
      - tag_baseline_with_git_tag

  example:
    tag: v2.1.0
    baseline_id: release-v2.1.0
    status: auto-created

Merge to Main

trigger:
  type: git_merge
  branches:
    - main
    - master
  description: "Baseline after merge to protected branch"

  behavior:
    on_merge_complete:
      - wait_for_ci_success
      - check_test_pass_rate:
          minimum: 95%
      - create_baseline:
          name: "main-{commit_short}"
          auto_activate: false  # Require approval
      - request_approval:
          approvers: [tech-lead, test-lead]

  example:
    merge_commit: abc123def
    baseline_id: main-abc123d
    approval_required: true

Deployment Success

trigger:
  type: deployment
  environments:
    - production
    - staging
  description: "Baseline after successful deployment"

  behavior:
    on_deployment_success:
      - wait_for_smoke_tests
      - verify_health_checks
      - create_baseline:
          name: "{environment}-{deploy_id}"
          auto_activate: true
      - link_to_deployment:
          id: deploy_id
          timestamp: deployment_timestamp

  example:
    environment: production
    deploy_id: prod-2026-01-28-01
    baseline_id: production-prod-2026-01-28-01

Manual Approval

trigger:
  type: manual
  description: "Baseline after explicit human approval"

  behavior:
    on_approval_received:
      - validate_approver_permissions
      - capture_approval_metadata
      - create_baseline:
          name: "approved-{timestamp}"
          auto_activate: true
      - record_approver:
          name: approver_name
          reason: approval_reason

  example:
    approver: tech-lead
    reason: "Post-release validation complete"
    baseline_id: approved-2026-01-28T15-30

Scheduled Updates

trigger:
  type: schedule
  description: "Baseline on fixed schedule"

  schedules:
    nightly:
      cron: "0 2 * * *"
      description: "Nightly baseline capture"
      auto_activate: false

    weekly:
      cron: "0 3 * * 0"
      description: "Weekly baseline for long-term tracking"
      auto_activate: false

    monthly:
      cron: "0 4 1 * *"
      description: "Monthly baseline archive"
      auto_activate: false

  behavior:
    on_schedule_trigger:
      - check_if_tests_healthy
      - create_baseline:
          name: "{schedule}-{date}"
          auto_activate: false
      - generate_drift_report

Baseline Storage Strategies

Local File Storage

storage:
  type: local
  location: .aiwg/testing/baselines/
  structure:
    - functional/
    - visual/
    - performance/
    - api/

  retention:
    active: 1        # Keep 1 active baseline per type
    archive: 10      # Keep 10 archived versions
    compress: true   # Compress archives with gzip

  cleanup:
    prune_after_days: 90
    keep_tagged: true  # Never prune tagged baselines

Git LFS

storage:
  type: git_lfs
  description: "Use Git LFS for large baselines (screenshots, large JSON)"

  configuration:
    track_patterns:
      - "*.png"
      - "*.jpg"
      - "baselines/**/*.json"
    max_file_size: 10MB

  behavior:
    on_baseline_create:
      - compress_if_needed
      - add_to_git_lfs
      - commit_with_metadata

Cloud Storage (S3/GCS)

storage:
  type: cloud
  provider: s3  # or gcs, azure_blob

  configuration:
    bucket: my-project-baselines
    path_prefix: baselines/{environment}/
    encryption: AES256

  behavior:
    on_baseline_create:
      - upload_to_cloud
      - generate_signed_url:
          expiry: 7 days
      - store_url_in_manifest
      - keep_local_copy: false  # Optional: delete after upload

  retention:
    lifecycle_policy:
      - after_30_days: transition_to_infrequent_access
      - after_90_days: transition_to_glacier
      - after_365_days: delete

Artifact Registry

storage:
  type: artifact_registry
  description: "Integrate with artifact registries (npm, Maven, Docker)"

  configuration:
    registry_url: https://registry.example.com
    repository: baseline-artifacts
    versioning: semver  # Follow package versioning

  behavior:
    on_baseline_create:
      - package_baseline:
          format: tarball
          name: "{project}-baseline"
          version: "{tag_version}"
      - publish_to_registry
      - tag_as_latest: true

Baseline Validation

Pre-Activation Checks

validation:
  before_activation:
    - test_pass_rate:
        minimum: 95%
        description: "At least 95% of tests must pass"

    - quality_gates:
        all_must_pass: true
        description: "All quality gates must be green"

    - stability_check:
        runs: 3
        consistency: 100%
        description: "Baseline must be 100% consistent across 3 runs"

    - critical_issues:
        max_open: 0
        description: "No critical issues can be open"

    - human_approval:
        required_for:
          - production_baselines
          - breaking_changes
        approvers: [tech-lead, qa-lead]

Baseline Diff Detection

validation:
  diff_detection:
    compare_to_previous: true
    flag_changes:
      - new_failures
      - performance_degradation
      - unexpected_schema_changes
      - visual_differences

    severity_classification:
      critical:
        - new_test_failures
        - security_regressions
      high:
        - performance_degradation_gt_20_percent
        - breaking_api_changes
      medium:
        - new_warnings
        - minor_visual_differences
      low:
        - timestamp_changes
        - non-functional_metadata

    action_on_critical:
      - block_activation
      - notify_stakeholders
      - require_manual_review

Environment-Specific Baselines

environments:
  development:
    auto_baseline: true
    trigger: merge_to_develop
    retention: 5
    approval_required: false

  staging:
    auto_baseline: true
    trigger: deployment_success
    retention: 10
    approval_required: true
    approvers: [qa-lead]

  production:
    auto_baseline: true
    trigger: semantic_version_tag
    retention: 20
    approval_required: true
    approvers: [tech-lead, release-manager]
    validation_level: strict

Selective Baseline Updates

selective_updates:
  description: "Update only specific baseline types"

  update_policies:
    functional:
      trigger: every_merge
      auto_activate: true

    visual:
      trigger: design_system_update
      auto_activate: false
      require_approval: true

    performance:
      trigger: weekly
      auto_activate: false
      threshold_change: 10%  # Only update if >10% improvement

    api:
      trigger: version_tag
      auto_activate: true
      breaking_changes_require_approval: true

Integration with CI/CD

GitHub Actions Example

# .github/workflows/auto-baseline.yml

name: Auto-Baseline Management

on:
  push:
    tags:
      - 'v*.*.*'
  workflow_dispatch:
    inputs:
      baseline_type:
        description: 'Baseline type to update'
        required: true
        type: choice
        options:
          - all
          - functional
          - visual
          - performance
          - api

jobs:
  auto_baseline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Run tests and capture baseline
        run: |
          npm test -- --coverage
          npm run test:integration
          npm run test:e2e

      - name: Create baseline
        run: |
          npx aiwg baseline create auto-${GITHUB_REF_NAME} \
            --type all \
            --tag ${GITHUB_REF_NAME} \
            --auto-activate

      - name: Upload to S3
        run: |
          aws s3 sync .aiwg/testing/baselines/ \
            s3://my-baselines/baselines/${GITHUB_REF_NAME}/

      - name: Update manifest
        run: |
          git add .aiwg/testing/baselines/manifest.yaml
          git commit -m "chore: update baseline for ${GITHUB_REF_NAME}"
          git push

GitLab CI Example

# .gitlab-ci.yml

baseline_on_release:
  stage: deploy
  only:
    - tags
  script:
    - npm test
    - npx aiwg baseline create release-${CI_COMMIT_TAG}
    - npx aiwg baseline upload --storage s3
  artifacts:
    paths:
      - .aiwg/testing/baselines/
    expire_in: 30 days

Conflict Resolution

Concurrent Baseline Updates

concurrency_control:
  mode: lock_file
  lock_file: .aiwg/testing/baselines/.lock

  on_conflict:
    - check_lock_file
    - if_locked:
        wait_timeout: 300  # 5 minutes
        action_on_timeout: fail_with_message

  distributed_lock:
    enabled: true
    backend: redis
    key: "baseline_update_lock"
    ttl: 600  # 10 minutes

Merge Conflicts

merge_strategy:
  on_baseline_conflict:
    - prefer: newer_timestamp
    - fallback: manual_resolution

  resolution_workflow:
    - detect_conflict
    - preserve_both_baselines
    - notify_team_for_manual_merge
    - provide_diff_comparison

Baseline Rollback

rollback:
  enabled: true
  preserve_rollback_history: true

  rollback_triggers:
    - manual_command
    - quality_regression_detected
    - deployment_failure

  rollback_process:
    - identify_previous_baseline
    - validate_previous_baseline_integrity
    - deactivate_current_baseline
    - activate_previous_baseline
    - update_manifest
    - notify_stakeholders
    - record_rollback_reason

  example:
    current: baseline-v2.1.0
    rollback_to: baseline-v2.0.5
    reason: "Critical regression in v2.1.0"
    triggered_by: tech-lead

Baseline Diff Reports

# Baseline Diff Report

**Date**: 2026-01-28
**Previous Baseline**: functional-auth-v2
**New Baseline**: functional-auth-v3
**Trigger**: semantic_version_tag v2.1.0

## Executive Summary

**Changes Detected**: 12 differences
**Severity**: Medium
**Recommendation**: Approve with review of API changes

| Category | Previous | New | Change |
|----------|----------|-----|--------|
| Tests Passing | 45/45 | 47/47 | +2 new tests |
| Functional Outputs | 100% match | 98% match | 2 API changes |
| Performance | Baseline | +5% faster | Improvement |
| Visual | Baseline | 100% match | No change |

## Detailed Changes

### Functional Changes

#### Change 1: New field in login response

**Type**: API Schema Change
**Severity**: Medium - Breaking for strict clients

**Previous** (baseline-v2):
```json
{
  "token": "jwt.header.payload",
  "user": {
    "id": "uuid",
    "email": "user@example.com"
  }
}

New (baseline-v3):

{
  "token": "jwt.header.payload",
  "user": {
    "id": "uuid",
    "email": "user@example.com",
    "display_name": "John Doe"
  }
}

Impact: Clients using strict schema validation may fail Recommendation: Document as breaking change in CHANGELOG

Change 2: Error message improved

Type: Error Message Update Severity: Low - Informational

Previous:

Error: Invalid input

New:

ValidationError: Email format is invalid. Expected format: user@example.com

Impact: Better user experience, non-breaking Recommendation: Approve

Performance Changes

Metric	Previous	New	Change	Status
Login p50	45ms	42ms	-7%	✅ Improved
Login p95	120ms	115ms	-4%	✅ Improved
Registration p50	85ms	80ms	-6%	✅ Improved

New Tests Added

```
test/auth/display-name-validation.test.ts
```
- Tests display name field validation
- Covers new feature
```
test/auth/error-message-quality.test.ts
```
- Tests improved error messaging
- Regression test for clarity

Recommendations

Immediate Actions

Review API schema change for breaking impact
Update API documentation with new field
Notify API consumers via changelog
Approve baseline if change is intentional

Follow-up

Add versioning to API if not present
Consider deprecation period for schema changes
Update client SDKs with new field

Approval

To approve this baseline:

aiwg baseline approve functional-auth-v3 \
  --reviewed-by "tech-lead" \
  --reason "API schema change documented and approved"

To reject and rollback:

aiwg baseline rollback functional-auth-v3 \
  --to functional-auth-v2 \
  --reason "Breaking change not acceptable"


## Usage Examples

### Auto-Baseline on Release Tag

Event: Git tag v2.1.0 created

Skill activates automatically:

Detect tag creation event
Wait for CI to complete
Check test pass rate: 100% (45/45)
Run baseline capture
Validate baseline stability (3 runs)
Compare to previous baseline
Generate diff report

Output: "Auto-Baseline Created: release-v2.1.0

Trigger: Git tag v2.1.0 Previous: release-v2.0.5 Changes: 3 API fields added, 2 tests added Quality: 100% tests passing Stability: 100% consistent (3/3 runs)

Status: Pending approval Approvers: tech-lead, release-manager

Diff report: .aiwg/testing/baseline-diff-v2.1.0.md

Approve with: aiwg baseline approve release-v2.1.0"


### Auto-Baseline on Merge to Main

Event: PR #456 merged to main

Skill activates:

Detect merge event
Wait for CI success
Check test pass rate: 98% (44/45) - 1 flaky test
Decision: Pass rate below 100%, require approval

Output: "Auto-Baseline Blocked: main-abc123d

Trigger: Merge to main (commit abc123d) Test pass rate: 98% (44/45) Reason: 1 test flaky, below 100% threshold

Action required:

Stabilize flaky test: test/api/timeout.test.ts
Re-run baseline after fix

Or override with: aiwg baseline create main-abc123d
--force
--approved-by tech-lead
--reason 'Known flaky test acceptable'"


### Scheduled Weekly Baseline

Event: Weekly schedule (Sunday 3 AM)

Skill activates:

Detect schedule trigger
Run full test suite
Capture baseline: weekly-2026-01-28
Compare to last week
Generate drift report

Output: "Weekly Baseline Created: weekly-2026-01-28

Previous: weekly-2026-01-21 Drift: Minimal (0.2% performance improvement) New tests: 3 added Test health: 100% passing

Auto-activated: No (scheduled baselines require approval)

Drift report: .aiwg/testing/weekly-drift-2026-01-28.md

Review and approve: aiwg baseline review weekly-2026-01-28"


## Integration

This skill uses:
- `regression-baseline`: Core baseline creation and comparison
- `regression-bisect`: Find commits that changed baseline
- `regression-metrics`: Track baseline drift over time
- `test-coverage`: Ensure baseline captures critical paths
- `issue-auto-sync`: Create issues for baseline approval
- `project-awareness`: Detect CI/CD configuration

## Agent Orchestration

```yaml
agents:
  baseline_creation:
    agent: test-engineer
    focus: Capture and validate baseline

  diff_analysis:
    agent: test-architect
    focus: Analyze baseline changes

  approval:
    agent: tech-lead
    focus: Review and approve baselines

  automation:
    agent: devops-engineer
    focus: CI/CD integration

Configuration

Auto-Baseline Rules

# aiwg.yml

regression:
  auto_baseline:
    enabled: true

    triggers:
      semantic_version_tag:
        enabled: true
        pattern: "^v[0-9]+\\.[0-9]+\\.[0-9]+$"
        auto_activate: false
        approval_required: true

      merge_to_main:
        enabled: true
        branches: [main, master]
        auto_activate: false
        test_pass_threshold: 100%

      deployment_success:
        enabled: true
        environments: [production, staging]
        auto_activate: true

      scheduled:
        enabled: true
        nightly:
          cron: "0 2 * * *"
          auto_activate: false
        weekly:
          cron: "0 3 * * 0"
          auto_activate: false

    validation:
      minimum_test_pass_rate: 95%
      stability_runs: 3
      require_human_approval_for:
        - production_baselines
        - breaking_changes
        - critical_severity_changes

    storage:
      primary: local
      backup: s3
      retention:
        active: 1
        archive: 10
        compress: true

    notifications:
      on_baseline_created: [slack, email]
      on_approval_needed: [slack, issue_comment]
      on_baseline_activated: [slack]

Retention Policy

retention:
  by_tag:
    release_tags: keep_forever
    development_tags: 30_days
    untagged: 7_days

  by_environment:
    production: 365_days
    staging: 90_days
    development: 30_days

  by_type:
    functional: 90_days
    visual: 60_days
    performance: 120_days
    api: 90_days

  compression:
    after_days: 7
    algorithm: gzip
    keep_uncompressed: false

Notification Templates

Baseline Created

📸 Auto-Baseline Created

Baseline: {baseline_id}
Trigger: {trigger_type}
Environment: {environment}
Status: {status}

Changes from previous:
• {change_count} differences detected
• Severity: {max_severity}

Diff report: {diff_report_url}

{approval_required ? "Approval required from: {approvers}" : "Auto-activated"}

Approval Needed

⚠️ Baseline Approval Required

Baseline: {baseline_id}
Reason: {approval_reason}

Changes:
{change_summary}

Review:
{diff_report_url}

Approve:
aiwg baseline approve {baseline_id} --reason "..."

Reject:
aiwg baseline reject {baseline_id} --reason "..."

Expires: {approval_timeout}

Baseline Activated

✅ Baseline Activated

Baseline: {baseline_id}
Activated at: {activation_timestamp}
Replaces: {previous_baseline_id}

This baseline is now active for regression testing.

All future test runs will be compared against this baseline.

Output Locations

Baselines:
```
.aiwg/testing/baselines/{type}/
```
Manifest:
```
.aiwg/testing/baselines/manifest.yaml
```
Diff reports:
```
.aiwg/testing/baseline-diffs/
```
Approval requests:
```
.aiwg/testing/baseline-approvals/
```
Archive:
```
.aiwg/testing/baselines/archive/
```
Locks:
```
.aiwg/testing/baselines/.lock
```

References

@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/skills/regression-baseline/SKILL.md - Manual baseline creation
@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/skills/regression-bisect/SKILL.md - Find breaking commits
@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/skills/regression-metrics/SKILL.md - Baseline drift tracking
@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/testing/regression.yaml - Regression schema
@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/commands/regression-check.md - Regression testing command
@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/agents/test-engineer.md - Test Engineer agent
@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/executable-feedback.md - Executable feedback rules