Aiwg regression-auto-baseline

Automatically manage regression test baseline lifecycle triggered by releases, deployments, and quality gates

install
source · Clone the upstream repo
git clone https://github.com/jmagly/aiwg
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.agents/skills/regression-auto-baseline" ~/.claude/skills/jmagly-aiwg-regression-auto-baseline && rm -rf "$T"
manifest: .agents/skills/regression-auto-baseline/SKILL.md

regression-auto-baseline

Automatically manage regression test baseline lifecycle with triggers for releases, deployments, and quality gates.

Triggers

Alternate expressions and non-obvious activations (primary phrases are matched automatically from the skill description):

  • "auto-update baselines" → automatic baseline management
  • "baseline on release" → release-triggered baseline update

Purpose

This skill automates regression baseline management by:

  • Triggering baseline updates on semantic version tags
  • Updating baselines after successful merges to main/master
  • Creating baselines on deployment success
  • Scheduling periodic baseline refreshes
  • Managing baseline retention and archival
  • Validating baseline quality before activation
  • Integrating with CI/CD for seamless automation

Behavior

When triggered or activated automatically, this skill:

  1. Detects trigger conditions:

    • Semantic version tag created (v*.*.*)
    • Merge to protected branch completed
    • Deployment succeeded to environment
    • Quality gate passed
    • Manual approval received
    • Scheduled time reached
  2. Validates pre-conditions:

    • Tests are passing (minimum threshold met)
    • No critical issues open
    • Quality gates passed
    • Human approval obtained (if required)
    • No concurrent baseline updates in progress
  3. Captures new baseline:

    • Run full regression test suite
    • Capture all artifact types (functional, visual, performance, API)
    • Verify baseline stability (run multiple times if configured)
    • Generate checksums for integrity
    • Tag with git commit, release version, timestamp
  4. Validates baseline quality:

    • Compare to previous baseline
    • Detect unexpected changes
    • Verify all expected outputs present
    • Check for baseline drift
    • Flag suspicious differences for review
  5. Activates baseline:

    • Archive previous baseline version
    • Set new baseline as active
    • Update baseline manifest
    • Notify relevant teams
    • Create baseline comparison report
  6. Manages lifecycle:

    • Archive old baselines per retention policy
    • Compress large baselines
    • Upload to cloud storage if configured
    • Prune outdated baselines
    • Maintain audit trail
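
The integrity checksums mentioned in step 3 could be produced along these lines. This is a minimal sketch rather than the skill's actual implementation: the function names `checksum_baseline` and `verify_baseline`, and the choice of SHA-256, are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def checksum_baseline(baseline_dir: str) -> dict[str, str]:
    """Map each file under baseline_dir (relative POSIX path) to its SHA-256 hex digest."""
    root = Path(baseline_dir)
    checksums = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            checksums[path.relative_to(root).as_posix()] = digest
    return checksums


def verify_baseline(baseline_dir: str, expected: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose contents no longer match."""
    actual = checksum_baseline(baseline_dir)
    return sorted(p for p in expected if actual.get(p) != expected[p])
```

Storing the resulting mapping alongside the baseline would let a later rollback or activation confirm that the archived files are unchanged.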

Trigger Types

Semantic Version Tag

trigger:
  type: git_tag
  pattern: "^v[0-9]+\\.[0-9]+\\.[0-9]+$"
  description: "Baseline on semantic version tags"

  behavior:
    on_tag_create:
      - validate_tag_format
      - check_if_tests_passing
      - create_baseline:
          name: "release-{tag}"
          auto_activate: true
      - tag_baseline_with_git_tag

  example:
    tag: v2.1.0
    baseline_id: release-v2.1.0
    status: auto-created
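
To illustrate how the tag pattern above filters tags, here is a small sketch. The helper name `baseline_id_for_tag` is hypothetical; only the pattern and the `release-{tag}` naming come from the configuration.

```python
import re
from typing import Optional

# Same pattern as the trigger config (the YAML string "\\." unescapes to "\.").
SEMVER_TAG = re.compile(r"^v[0-9]+\.[0-9]+\.[0-9]+$")


def baseline_id_for_tag(tag: str) -> Optional[str]:
    """Return "release-{tag}" for a matching tag, or None if the tag is ignored."""
    if SEMVER_TAG.fullmatch(tag) is None:
        return None
    return f"release-{tag}"
```

Note that pre-release tags such as `v2.1.0-rc1` do not match this pattern, so under this sketch they would not trigger a baseline.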

Merge to Main

trigger:
  type: git_merge
  branches:
    - main
    - master
  description: "Baseline after merge to protected branch"

  behavior:
    on_merge_complete:
      - wait_for_ci_success
      - check_test_pass_rate:
          minimum: 95%
      - create_baseline:
          name: "main-{commit_short}"
          auto_activate: false  # Require approval
      - request_approval:
          approvers: [tech-lead, test-lead]

  example:
    merge_commit: abc123def
    baseline_id: main-abc123d
    approval_required: true
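
The merge flow above might be expressed as a single decision function. This is a sketch under stated assumptions: `merge_baseline_plan` is a hypothetical name, the commit short form is taken to be the first 7 characters of the hash (matching the example), and the defaults mirror the values in the config.

```python
def merge_baseline_plan(commit_sha: str, passed: int, total: int,
                        minimum_pass_rate: float = 0.95) -> dict:
    """Decide whether a merge commit qualifies for a candidate baseline."""
    pass_rate = passed / total if total else 0.0
    if pass_rate < minimum_pass_rate:
        return {
            "create": False,
            "reason": f"pass rate {pass_rate:.0%} below {minimum_pass_rate:.0%}",
        }
    return {
        "create": True,
        "baseline_id": f"main-{commit_sha[:7]}",  # {commit_short}: first 7 characters
        "auto_activate": False,                   # activation waits for approval
        "approvers": ["tech-lead", "test-lead"],
    }
```

For example, `merge_baseline_plan("abc123def", 44, 45)` yields a candidate named `main-abc123d` that still needs approval, while passing `minimum_pass_rate=1.0` would block the same commit.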

Deployment Success

trigger:
  type: deployment
  environments:
    - production
    - staging
  description: "Baseline after successful deployment"

  behavior:
    on_deployment_success:
      - wait_for_smoke_tests
      - verify_health_checks
      - create_baseline:
          name: "{environment}-{deploy_id}"
          auto_activate: true
      - link_to_deployment:
          id: deploy_id
          timestamp: deployment_timestamp

  example:
    environment: production
    deploy_id: prod-2026-01-28-01
    baseline_id: production-prod-2026-01-28-01

Manual Approval

trigger:
  type: manual
  description: "Baseline after explicit human approval"

  behavior:
    on_approval_received:
      - validate_approver_permissions
      - capture_approval_metadata
      - create_baseline:
          name: "approved-{timestamp}"
          auto_activate: true
      - record_approver:
          name: approver_name
          reason: approval_reason

  example:
    approver: tech-lead
    reason: "Post-release validation complete"
    baseline_id: approved-2026-01-28T15-30

Scheduled Updates

trigger:
  type: schedule
  description: "Baseline on fixed schedule"

  schedules:
    nightly:
      cron: "0 2 * * *"
      description: "Nightly baseline capture"
      auto_activate: false

    weekly:
      cron: "0 3 * * 0"
      description: "Weekly baseline for long-term tracking"
      auto_activate: false

    monthly:
      cron: "0 4 1 * *"
      description: "Monthly baseline archive"
      auto_activate: false

  behavior:
    on_schedule_trigger:
      - check_if_tests_healthy
      - create_baseline:
          name: "{schedule}-{date}"
          auto_activate: false
      - generate_drift_report

Baseline Storage Strategies

Local File Storage

storage:
  type: local
  location: .aiwg/testing/baselines/
  structure:
    - functional/
    - visual/
    - performance/
    - api/

  retention:
    active: 1        # Keep 1 active baseline per type
    archive: 10      # Keep 10 archived versions
    compress: true   # Compress archives with gzip

  cleanup:
    prune_after_days: 90
    keep_tagged: true  # Never prune tagged baselines
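
A pruning pass over archived baselines could look like the following sketch. The config does not say how `archive: 10` and `prune_after_days: 90` interact, so this example assumes either condition is enough to prune; the function name and the entry shape (`id`, `created`, `tagged`) are also assumptions.

```python
from datetime import datetime, timedelta


def select_prunable(archived: list[dict], now: datetime,
                    keep_archived: int = 10, prune_after_days: int = 90) -> list[str]:
    """Return ids of archived baselines to delete, newest first.

    Each entry is a dict with "id" (str), "created" (datetime) and "tagged" (bool).
    """
    cutoff = now - timedelta(days=prune_after_days)
    newest_first = sorted(archived, key=lambda b: b["created"], reverse=True)
    prunable = []
    for position, baseline in enumerate(newest_first):
        if baseline["tagged"]:
            continue  # keep_tagged: true, so tagged baselines are never pruned
        over_limit = position >= keep_archived
        too_old = baseline["created"] < cutoff
        if over_limit or too_old:
            prunable.append(baseline["id"])
    return prunable
```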

Git LFS

storage:
  type: git_lfs
  description: "Use Git LFS for large baselines (screenshots, large JSON)"

  configuration:
    track_patterns:
      - "*.png"
      - "*.jpg"
      - "baselines/**/*.json"
    max_file_size: 10MB

  behavior:
    on_baseline_create:
      - compress_if_needed
      - add_to_git_lfs
      - commit_with_metadata

Cloud Storage (S3/GCS)

storage:
  type: cloud
  provider: s3  # or gcs, azure_blob

  configuration:
    bucket: my-project-baselines
    path_prefix: baselines/{environment}/
    encryption: AES256

  behavior:
    on_baseline_create:
      - upload_to_cloud
      - generate_signed_url:
          expiry: 7 days
      - store_url_in_manifest
      - keep_local_copy: false  # Optional: delete after upload

  retention:
    lifecycle_policy:
      - after_30_days: transition_to_infrequent_access
      - after_90_days: transition_to_glacier
      - after_365_days: delete

Artifact Registry

storage:
  type: artifact_registry
  description: "Integrate with artifact registries (npm, Maven, Docker)"

  configuration:
    registry_url: https://registry.example.com
    repository: baseline-artifacts
    versioning: semver  # Follow package versioning

  behavior:
    on_baseline_create:
      - package_baseline:
          format: tarball
          name: "{project}-baseline"
          version: "{tag_version}"
      - publish_to_registry
      - tag_as_latest: true

Baseline Validation

Pre-Activation Checks

validation:
  before_activation:
    - test_pass_rate:
        minimum: 95%
        description: "At least 95% of tests must pass"

    - quality_gates:
        all_must_pass: true
        description: "All quality gates must be green"

    - stability_check:
        runs: 3
        consistency: 100%
        description: "Baseline must be 100% consistent across 3 runs"

    - critical_issues:
        max_open: 0
        description: "No critical issues can be open"

    - human_approval:
        required_for:
          - production_baselines
          - breaking_changes
        approvers: [tech-lead, qa-lead]
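
The checks above can be read as a single gate that collects every failed condition. The sketch below is illustrative only: the function name and parameters are hypothetical, and stability is approximated by comparing one digest per run.

```python
def pre_activation_failures(passed: int, total: int, gates: dict[str, bool],
                            run_digests: list[str], open_critical_issues: int,
                            needs_approval: bool, approved: bool,
                            minimum_pass_rate: float = 0.95,
                            required_runs: int = 3) -> list[str]:
    """Return the names of the failed checks; an empty list means activation may proceed."""
    failures = []
    if total == 0 or passed / total < minimum_pass_rate:
        failures.append("test_pass_rate")
    if not all(gates.values()):
        failures.append("quality_gates")
    # Stability: enough runs were made and all of them produced the same digest.
    if len(run_digests) < required_runs or len(set(run_digests)) != 1:
        failures.append("stability_check")
    if open_critical_issues > 0:
        failures.append("critical_issues")
    if needs_approval and not approved:
        failures.append("human_approval")
    return failures
```

Returning all failures at once, rather than stopping at the first, gives reviewers a complete picture in one notification.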

Baseline Diff Detection

validation:
  diff_detection:
    compare_to_previous: true
    flag_changes:
      - new_failures
      - performance_degradation
      - unexpected_schema_changes
      - visual_differences

    severity_classification:
      critical:
        - new_test_failures
        - security_regressions
      high:
        - performance_degradation_gt_20_percent
        - breaking_api_changes
      medium:
        - new_warnings
        - minor_visual_differences
      low:
        - timestamp_changes
        - non-functional_metadata

    action_on_critical:
      - block_activation
      - notify_stakeholders
      - require_manual_review
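
As a sketch of how the severity table might drive activation, the following maps each change kind to its severity and applies `action_on_critical`. The function name `assess_diff` is hypothetical, and treating unknown change kinds as medium is an assumption that the config does not state.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]

SEVERITY_BY_CHANGE = {
    "new_test_failures": "critical",
    "security_regressions": "critical",
    "performance_degradation_gt_20_percent": "high",
    "breaking_api_changes": "high",
    "new_warnings": "medium",
    "minor_visual_differences": "medium",
    "timestamp_changes": "low",
    "non-functional_metadata": "low",
}


def assess_diff(changes: list[str]) -> dict:
    """Return the highest severity among the changes and the resulting actions."""
    # Assumption: change kinds missing from the table are treated as medium.
    severities = [SEVERITY_BY_CHANGE.get(change, "medium") for change in changes]
    max_severity = max(severities, key=SEVERITY_ORDER.index, default=None)
    actions = []
    if max_severity == "critical":
        actions = ["block_activation", "notify_stakeholders", "require_manual_review"]
    return {"max_severity": max_severity, "actions": actions}
```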

Environment-Specific Baselines

environments:
  development:
    auto_baseline: true
    trigger: merge_to_develop
    retention: 5
    approval_required: false

  staging:
    auto_baseline: true
    trigger: deployment_success
    retention: 10
    approval_required: true
    approvers: [qa-lead]

  production:
    auto_baseline: true
    trigger: semantic_version_tag
    retention: 20
    approval_required: true
    approvers: [tech-lead, release-manager]
    validation_level: strict

Selective Baseline Updates

selective_updates:
  description: "Update only specific baseline types"

  update_policies:
    functional:
      trigger: every_merge
      auto_activate: true

    visual:
      trigger: design_system_update
      auto_activate: false
      require_approval: true

    performance:
      trigger: weekly
      auto_activate: false
      threshold_change: 10%  # Only update if >10% improvement

    api:
      trigger: version_tag
      auto_activate: true
      breaking_changes_require_approval: true
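
The `threshold_change` rule for performance baselines could be checked as below. This sketch assumes a lower-is-better metric such as latency in milliseconds; the function name is hypothetical.

```python
def should_update_performance_baseline(baseline_ms: float, candidate_ms: float,
                                       threshold: float = 0.10) -> bool:
    """True when the candidate is more than `threshold` faster than the baseline."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    improvement = (baseline_ms - candidate_ms) / baseline_ms
    return improvement > threshold
```

Under this rule a drop from 45ms to 42ms (about 7%) would leave the existing performance baseline in place.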

Integration with CI/CD

GitHub Actions Example

# .github/workflows/auto-baseline.yml

name: Auto-Baseline Management

on:
  push:
    tags:
      - 'v*.*.*'
  workflow_dispatch:
    inputs:
      baseline_type:
        description: 'Baseline type to update'
        required: true
        type: choice
        options:
          - all
          - functional
          - visual
          - performance
          - api

jobs:
  auto_baseline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Run tests and capture baseline
        run: |
          npm test -- --coverage
          npm run test:integration
          npm run test:e2e

      - name: Create baseline
        run: |
          npx aiwg baseline create auto-${GITHUB_REF_NAME} \
            --type all \
            --tag ${GITHUB_REF_NAME} \
            --auto-activate

      - name: Upload to S3
        run: |
          aws s3 sync .aiwg/testing/baselines/ \
            s3://my-baselines/baselines/${GITHUB_REF_NAME}/

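      - name: Update manifest
        run: |
          # A fresh runner has no git identity, so set one before committing
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add .aiwg/testing/baselines/manifest.yaml
          git commit -m "chore: update baseline for ${GITHUB_REF_NAME}"
          # A tag checkout is a detached HEAD; name the target branch explicitly (assumes main)
          git push origin HEAD:main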

GitLab CI Example

# .gitlab-ci.yml

baseline_on_release:
  stage: deploy
  only:
    - tags
  script:
    - npm test
    - npx aiwg baseline create release-${CI_COMMIT_TAG}
    - npx aiwg baseline upload --storage s3
  artifacts:
    paths:
      - .aiwg/testing/baselines/
    expire_in: 30 days

Conflict Resolution

Concurrent Baseline Updates

concurrency_control:
  mode: lock_file
  lock_file: .aiwg/testing/baselines/.lock

  on_conflict:
    - check_lock_file
    - if_locked:
        wait_timeout: 300  # 5 minutes
        action_on_timeout: fail_with_message

  distributed_lock:
    enabled: true
    backend: redis
    key: "baseline_update_lock"
    ttl: 600  # 10 minutes
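
The lock-file mode can be sketched with an atomic exclusive create. The function names are hypothetical, and the example covers only the local lock file, not the Redis-backed distributed lock.

```python
import os
import time


def acquire_lock(lock_path: str, wait_timeout: float = 300.0,
                 poll_interval: float = 1.0) -> bool:
    """Try to create lock_path exclusively; return False if wait_timeout elapses first."""
    deadline = time.monotonic() + wait_timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so creation is atomic.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval)
            continue
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))  # record the owner for debugging
        return True


def release_lock(lock_path: str) -> None:
    """Remove the lock file; a missing file is treated as already released."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass
```

A lock file only coordinates processes that share a filesystem. CI runners on separate machines would need the distributed lock instead.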

Merge Conflicts

merge_strategy:
  on_baseline_conflict:
    - prefer: newer_timestamp
    - fallback: manual_resolution

  resolution_workflow:
    - detect_conflict
    - preserve_both_baselines
    - notify_team_for_manual_merge
    - provide_diff_comparison

Baseline Rollback

rollback:
  enabled: true
  preserve_rollback_history: true

  rollback_triggers:
    - manual_command
    - quality_regression_detected
    - deployment_failure

  rollback_process:
    - identify_previous_baseline
    - validate_previous_baseline_integrity
    - deactivate_current_baseline
    - activate_previous_baseline
    - update_manifest
    - notify_stakeholders
    - record_rollback_reason

  example:
    current: baseline-v2.1.0
    rollback_to: baseline-v2.0.5
    reason: "Critical regression in v2.1.0"
    triggered_by: tech-lead
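
The core of the rollback process, swapping the active baseline and recording the reason, might look like this sketch. The manifest shape (`active`, `archive`, `withdrawn`, `history`) is an assumption for illustration and not the actual manifest schema; integrity validation and notifications are omitted.

```python
def rollback(manifest: dict, reason: str, triggered_by: str) -> dict:
    """Return a new manifest with the most recently archived baseline made active.

    Assumes manifest["archive"] is ordered oldest to newest. The input is not mutated.
    """
    if not manifest["archive"]:
        raise ValueError("no previous baseline to roll back to")
    previous = manifest["archive"][-1]
    current = manifest["active"]
    return {
        "active": previous,
        "archive": manifest["archive"][:-1],
        # Keep the deactivated baseline out of the archive so that a second
        # rollback does not reactivate it.
        "withdrawn": manifest.get("withdrawn", []) + [current],
        "history": manifest.get("history", []) + [{
            "from": current,
            "to": previous,
            "reason": reason,
            "triggered_by": triggered_by,
        }],
    }
```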

Baseline Diff Reports

# Baseline Diff Report

**Date**: 2026-01-28
**Previous Baseline**: functional-auth-v2
**New Baseline**: functional-auth-v3
**Trigger**: semantic_version_tag v2.1.0

## Executive Summary

**Changes Detected**: 12 differences
**Severity**: Medium
**Recommendation**: Approve with review of API changes

| Category | Previous | New | Change |
|----------|----------|-----|--------|
| Tests Passing | 45/45 | 47/47 | +2 new tests |
| Functional Outputs | 100% match | 98% match | 2 API changes |
| Performance | Baseline | +5% faster | Improvement |
| Visual | Baseline | 100% match | No change |

## Detailed Changes

### Functional Changes

#### Change 1: New field in login response

**Type**: API Schema Change
**Severity**: Medium - Breaking for strict clients

**Previous** (baseline-v2):
```json
{
  "token": "jwt.header.payload",
  "user": {
    "id": "uuid",
    "email": "user@example.com"
  }
}
```

**New** (baseline-v3):
```json
{
  "token": "jwt.header.payload",
  "user": {
    "id": "uuid",
    "email": "user@example.com",
    "display_name": "John Doe"
  }
}
```

**Impact**: Clients using strict schema validation may fail
**Recommendation**: Document as breaking change in CHANGELOG

#### Change 2: Error message improved

**Type**: Error Message Update
**Severity**: Low - Informational

**Previous**:
```
Error: Invalid input
```

**New**:
```
ValidationError: Email format is invalid. Expected format: user@example.com
```

**Impact**: Better user experience, non-breaking
**Recommendation**: Approve

### Performance Changes

| Metric | Previous | New | Change | Status |
|--------|----------|-----|--------|--------|
| Login p50 | 45ms | 42ms | -7% | ✅ Improved |
| Login p95 | 120ms | 115ms | -4% | ✅ Improved |
| Registration p50 | 85ms | 80ms | -6% | ✅ Improved |

### New Tests Added

1. test/auth/display-name-validation.test.ts
   - Tests display name field validation
   - Covers new feature

2. test/auth/error-message-quality.test.ts
   - Tests improved error messaging
   - Regression test for clarity

## Recommendations

### Immediate Actions

- Review API schema change for breaking impact
- Update API documentation with new field
- Notify API consumers via changelog
- Approve baseline if change is intentional

### Follow-up

- Add versioning to API if not present
- Consider deprecation period for schema changes
- Update client SDKs with new field

## Approval

To approve this baseline:

```bash
aiwg baseline approve functional-auth-v3 \
  --reviewed-by "tech-lead" \
  --reason "API schema change documented and approved"
```

To reject and roll back:

```bash
aiwg baseline rollback functional-auth-v3 \
  --to functional-auth-v2 \
  --reason "Breaking change not acceptable"
```

Usage Examples

Auto-Baseline on Release Tag

Event: Git tag v2.1.0 created

Skill activates automatically:

  1. Detect tag creation event
  2. Wait for CI to complete
  3. Check test pass rate: 100% (45/45)
  4. Run baseline capture
  5. Validate baseline stability (3 runs)
  6. Compare to previous baseline
  7. Generate diff report

Output: "Auto-Baseline Created: release-v2.1.0

Trigger: Git tag v2.1.0
Previous: release-v2.0.5
Changes: 3 API fields added, 2 tests added
Quality: 100% tests passing
Stability: 100% consistent (3/3 runs)

Status: Pending approval
Approvers: tech-lead, release-manager

Diff report: .aiwg/testing/baseline-diff-v2.1.0.md

Approve with:
aiwg baseline approve release-v2.1.0"


Auto-Baseline on Merge to Main

Event: PR #456 merged to main

Skill activates:

  1. Detect merge event
  2. Wait for CI success
  3. Check test pass rate: 98% (44/45) - 1 flaky test
  4. Decision: Pass rate below 100%, require approval

Output: "Auto-Baseline Blocked: main-abc123d

Trigger: Merge to main (commit abc123d)
Test pass rate: 98% (44/45)
Reason: 1 test flaky, below 100% threshold

Action required:

  • Stabilize flaky test: test/api/timeout.test.ts
  • Re-run baseline after fix

Or override with:
aiwg baseline create main-abc123d \
  --force \
  --approved-by tech-lead \
  --reason 'Known flaky test acceptable'"


Scheduled Weekly Baseline

Event: Weekly schedule (Sunday 3 AM)

Skill activates:

  1. Detect schedule trigger
  2. Run full test suite
  3. Capture baseline: weekly-2026-01-28
  4. Compare to last week
  5. Generate drift report

Output: "Weekly Baseline Created: weekly-2026-01-28

Previous: weekly-2026-01-21
Drift: Minimal (0.2% performance improvement)
New tests: 3 added
Test health: 100% passing

Auto-activated: No (scheduled baselines require approval)

Drift report: .aiwg/testing/weekly-drift-2026-01-28.md

Review and approve:
aiwg baseline review weekly-2026-01-28"


Integration

This skill uses:

  • regression-baseline: Core baseline creation and comparison
  • regression-bisect: Find commits that changed baseline
  • regression-metrics: Track baseline drift over time
  • test-coverage: Ensure baseline captures critical paths
  • issue-auto-sync: Create issues for baseline approval
  • project-awareness: Detect CI/CD configuration

Agent Orchestration

agents:
  baseline_creation:
    agent: test-engineer
    focus: Capture and validate baseline

  diff_analysis:
    agent: test-architect
    focus: Analyze baseline changes

  approval:
    agent: tech-lead
    focus: Review and approve baselines

  automation:
    agent: devops-engineer
    focus: CI/CD integration

Configuration

Auto-Baseline Rules

# aiwg.yml

regression:
  auto_baseline:
    enabled: true

    triggers:
      semantic_version_tag:
        enabled: true
        pattern: "^v[0-9]+\\.[0-9]+\\.[0-9]+$"
        auto_activate: false
        approval_required: true

      merge_to_main:
        enabled: true
        branches: [main, master]
        auto_activate: false
        test_pass_threshold: 100%

      deployment_success:
        enabled: true
        environments: [production, staging]
        auto_activate: true

      scheduled:
        enabled: true
        nightly:
          cron: "0 2 * * *"
          auto_activate: false
        weekly:
          cron: "0 3 * * 0"
          auto_activate: false

    validation:
      minimum_test_pass_rate: 95%
      stability_runs: 3
      require_human_approval_for:
        - production_baselines
        - breaking_changes
        - critical_severity_changes

    storage:
      primary: local
      backup: s3
      retention:
        active: 1
        archive: 10
        compress: true

    notifications:
      on_baseline_created: [slack, email]
      on_approval_needed: [slack, issue_comment]
      on_baseline_activated: [slack]

Retention Policy

retention:
  by_tag:
    release_tags: keep_forever
    development_tags: 30_days
    untagged: 7_days

  by_environment:
    production: 365_days
    staging: 90_days
    development: 30_days

  by_type:
    functional: 90_days
    visual: 60_days
    performance: 120_days
    api: 90_days

  compression:
    after_days: 7
    algorithm: gzip
    keep_uncompressed: false
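
The policy above gives three retention rules that can apply to the same baseline, but it does not say which one wins. The sketch below assumes the most conservative reading, in which the longest applicable retention applies and `keep_forever` overrides everything; the function names are hypothetical.

```python
from typing import Optional

RETENTION = {
    "by_tag": {"release_tags": "keep_forever", "development_tags": "30_days", "untagged": "7_days"},
    "by_environment": {"production": "365_days", "staging": "90_days", "development": "30_days"},
    "by_type": {"functional": "90_days", "visual": "60_days", "performance": "120_days", "api": "90_days"},
}


def parse_retention(value: str) -> Optional[int]:
    """Convert "90_days" to 90; "keep_forever" becomes None (no expiry)."""
    if value == "keep_forever":
        return None
    return int(value.removesuffix("_days"))


def retention_days(tag_class: str, environment: str, baseline_type: str) -> Optional[int]:
    """Longest applicable retention in days, or None if any rule says keep forever."""
    limits = [
        parse_retention(RETENTION["by_tag"][tag_class]),
        parse_retention(RETENTION["by_environment"][environment]),
        parse_retention(RETENTION["by_type"][baseline_type]),
    ]
    if None in limits:
        return None
    return max(limits)
```

Choosing the shortest applicable retention instead would be an equally valid reading; it only changes `max` to `min` and the handling of `None`.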

Notification Templates

Baseline Created

📸 Auto-Baseline Created

Baseline: {baseline_id}
Trigger: {trigger_type}
Environment: {environment}
Status: {status}

Changes from previous:
• {change_count} differences detected
• Severity: {max_severity}

Diff report: {diff_report_url}

{approval_required ? "Approval required from: {approvers}" : "Auto-activated"}

Approval Needed

⚠️ Baseline Approval Required

Baseline: {baseline_id}
Reason: {approval_reason}

Changes:
{change_summary}

Review:
{diff_report_url}

Approve:
aiwg baseline approve {baseline_id} --reason "..."

Reject:
aiwg baseline reject {baseline_id} --reason "..."

Expires: {approval_timeout}

Baseline Activated

✅ Baseline Activated

Baseline: {baseline_id}
Activated at: {activation_timestamp}
Replaces: {previous_baseline_id}

This baseline is now active for regression testing.

All future test runs will be compared against this baseline.

Output Locations

  • Baselines:
    .aiwg/testing/baselines/{type}/
  • Manifest:
    .aiwg/testing/baselines/manifest.yaml
  • Diff reports:
    .aiwg/testing/baseline-diffs/
  • Approval requests:
    .aiwg/testing/baseline-approvals/
  • Archive:
    .aiwg/testing/baselines/archive/
  • Locks:
    .aiwg/testing/baselines/.lock

References

  • @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/skills/regression-baseline/SKILL.md - Manual baseline creation
  • @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/skills/regression-bisect/SKILL.md - Find breaking commits
  • @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/skills/regression-metrics/SKILL.md - Baseline drift tracking
  • @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/testing/regression.yaml - Regression schema
  • @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/commands/regression-check.md - Regression testing command
  • @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/agents/test-engineer.md - Test Engineer agent
  • @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/executable-feedback.md - Executable feedback rules