Aiwg regression-auto-baseline
Automatically manage regression test baseline lifecycle triggered by releases, deployments, and quality gates
git clone https://github.com/jmagly/aiwg
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agentic/code/frameworks/sdlc-complete/skills/regression-auto-baseline" ~/.claude/skills/jmagly-aiwg-regression-auto-baseline-1d2263 && rm -rf "$T"
agentic/code/frameworks/sdlc-complete/skills/regression-auto-baseline/SKILL.mdregression-auto-baseline
Automatically manage regression test baseline lifecycle with triggers for releases, deployments, and quality gates.
Triggers
Alternate expressions and non-obvious activations (primary phrases are matched automatically from the skill description):
- "auto-update baselines" → automatic baseline management
- "baseline on release" → release-triggered baseline update
Purpose
This skill automates regression baseline management by:
- Triggering baseline updates on semantic version tags
- Updating baselines after successful merges to main/master
- Creating baselines on deployment success
- Scheduling periodic baseline refreshes
- Managing baseline retention and archival
- Validating baseline quality before activation
- Integrating with CI/CD for seamless automation
Behavior
When triggered or activated automatically, this skill:
-
Detects trigger conditions:
- Semantic version tag created (v*..)
- Merge to protected branch completed
- Deployment succeeded to environment
- Quality gate passed
- Manual approval received
- Scheduled time reached
-
Validates pre-conditions:
- Tests are passing (minimum threshold met)
- No critical issues open
- Quality gates passed
- Human approval obtained (if required)
- No concurrent baseline updates in progress
-
Captures new baseline:
- Run full regression test suite
- Capture all artifact types (functional, visual, performance, API)
- Verify baseline stability (run multiple times if configured)
- Generate checksums for integrity
- Tag with git commit, release version, timestamp
-
Validates baseline quality:
- Compare to previous baseline
- Detect unexpected changes
- Verify all expected outputs present
- Check for baseline drift
- Flag suspicious differences for review
-
Activates baseline:
- Archive previous baseline version
- Set new baseline as active
- Update baseline manifest
- Notify relevant teams
- Create baseline comparison report
-
Manages lifecycle:
- Archive old baselines per retention policy
- Compress large baselines
- Upload to cloud storage if configured
- Prune outdated baselines
- Maintain audit trail
Trigger Types
Semantic Version Tag
trigger: type: git_tag pattern: "^v[0-9]+\\.[0-9]+\\.[0-9]+$" description: "Baseline on semantic version tags" behavior: on_tag_create: - validate_tag_format - check_if_tests_passing - create_baseline: name: "release-{tag}" auto_activate: true - tag_baseline_with_git_tag example: tag: v2.1.0 baseline_id: release-v2.1.0 status: auto-created
Merge to Main
trigger: type: git_merge branches: - main - master description: "Baseline after merge to protected branch" behavior: on_merge_complete: - wait_for_ci_success - check_test_pass_rate: minimum: 95% - create_baseline: name: "main-{commit_short}" auto_activate: false # Require approval - request_approval: approvers: [tech-lead, test-lead] example: merge_commit: abc123def baseline_id: main-abc123d approval_required: true
Deployment Success
trigger: type: deployment environments: - production - staging description: "Baseline after successful deployment" behavior: on_deployment_success: - wait_for_smoke_tests - verify_health_checks - create_baseline: name: "{environment}-{deploy_id}" auto_activate: true - link_to_deployment: id: deploy_id timestamp: deployment_timestamp example: environment: production deploy_id: prod-2026-01-28-01 baseline_id: production-prod-2026-01-28-01
Manual Approval
trigger: type: manual description: "Baseline after explicit human approval" behavior: on_approval_received: - validate_approver_permissions - capture_approval_metadata - create_baseline: name: "approved-{timestamp}" auto_activate: true - record_approver: name: approver_name reason: approval_reason example: approver: tech-lead reason: "Post-release validation complete" baseline_id: approved-2026-01-28T15-30
Scheduled Updates
trigger: type: schedule description: "Baseline on fixed schedule" schedules: nightly: cron: "0 2 * * *" description: "Nightly baseline capture" auto_activate: false weekly: cron: "0 3 * * 0" description: "Weekly baseline for long-term tracking" auto_activate: false monthly: cron: "0 4 1 * *" description: "Monthly baseline archive" auto_activate: false behavior: on_schedule_trigger: - check_if_tests_healthy - create_baseline: name: "{schedule}-{date}" auto_activate: false - generate_drift_report
Baseline Storage Strategies
Local File Storage
storage: type: local location: .aiwg/testing/baselines/ structure: - functional/ - visual/ - performance/ - api/ retention: active: 1 # Keep 1 active baseline per type archive: 10 # Keep 10 archived versions compress: true # Compress archives with gzip cleanup: prune_after_days: 90 keep_tagged: true # Never prune tagged baselines
Git LFS
storage: type: git_lfs description: "Use Git LFS for large baselines (screenshots, large JSON)" configuration: track_patterns: - "*.png" - "*.jpg" - "baselines/**/*.json" max_file_size: 10MB behavior: on_baseline_create: - compress_if_needed - add_to_git_lfs - commit_with_metadata
Cloud Storage (S3/GCS)
storage: type: cloud provider: s3 # or gcs, azure_blob configuration: bucket: my-project-baselines path_prefix: baselines/{environment}/ encryption: AES256 behavior: on_baseline_create: - upload_to_cloud - generate_signed_url: expiry: 7 days - store_url_in_manifest - keep_local_copy: false # Optional: delete after upload retention: lifecycle_policy: - after_30_days: transition_to_infrequent_access - after_90_days: transition_to_glacier - after_365_days: delete
Artifact Registry
storage: type: artifact_registry description: "Integrate with artifact registries (npm, Maven, Docker)" configuration: registry_url: https://registry.example.com repository: baseline-artifacts versioning: semver # Follow package versioning behavior: on_baseline_create: - package_baseline: format: tarball name: "{project}-baseline" version: "{tag_version}" - publish_to_registry - tag_as_latest: true
Baseline Validation
Pre-Activation Checks
validation: before_activation: - test_pass_rate: minimum: 95% description: "At least 95% of tests must pass" - quality_gates: all_must_pass: true description: "All quality gates must be green" - stability_check: runs: 3 consistency: 100% description: "Baseline must be 100% consistent across 3 runs" - critical_issues: max_open: 0 description: "No critical issues can be open" - human_approval: required_for: - production_baselines - breaking_changes approvers: [tech-lead, qa-lead]
Baseline Diff Detection
validation: diff_detection: compare_to_previous: true flag_changes: - new_failures - performance_degradation - unexpected_schema_changes - visual_differences severity_classification: critical: - new_test_failures - security_regressions high: - performance_degradation_gt_20_percent - breaking_api_changes medium: - new_warnings - minor_visual_differences low: - timestamp_changes - non-functional_metadata action_on_critical: - block_activation - notify_stakeholders - require_manual_review
Environment-Specific Baselines
environments: development: auto_baseline: true trigger: merge_to_develop retention: 5 approval_required: false staging: auto_baseline: true trigger: deployment_success retention: 10 approval_required: true approvers: [qa-lead] production: auto_baseline: true trigger: semantic_version_tag retention: 20 approval_required: true approvers: [tech-lead, release-manager] validation_level: strict
Selective Baseline Updates
selective_updates: description: "Update only specific baseline types" update_policies: functional: trigger: every_merge auto_activate: true visual: trigger: design_system_update auto_activate: false require_approval: true performance: trigger: weekly auto_activate: false threshold_change: 10% # Only update if >10% improvement api: trigger: version_tag auto_activate: true breaking_changes_require_approval: true
Integration with CI/CD
GitHub Actions Example
# .github/workflows/auto-baseline.yml name: Auto-Baseline Management on: push: tags: - 'v*.*.*' workflow_dispatch: inputs: baseline_type: description: 'Baseline type to update' required: true type: choice options: - all - functional - visual - performance - api jobs: auto_baseline: runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 - name: Run tests and capture baseline run: | npm test -- --coverage npm run test:integration npm run test:e2e - name: Create baseline run: | npx aiwg baseline create auto-${GITHUB_REF_NAME} \ --type all \ --tag ${GITHUB_REF_NAME} \ --auto-activate - name: Upload to S3 run: | aws s3 sync .aiwg/testing/baselines/ \ s3://my-baselines/baselines/${GITHUB_REF_NAME}/ - name: Update manifest run: | git add .aiwg/testing/baselines/manifest.yaml git commit -m "chore: update baseline for ${GITHUB_REF_NAME}" git push
GitLab CI Example
# .gitlab-ci.yml baseline_on_release: stage: deploy only: - tags script: - npm test - npx aiwg baseline create release-${CI_COMMIT_TAG} - npx aiwg baseline upload --storage s3 artifacts: paths: - .aiwg/testing/baselines/ expire_in: 30 days
Conflict Resolution
Concurrent Baseline Updates
concurrency_control: mode: lock_file lock_file: .aiwg/testing/baselines/.lock on_conflict: - check_lock_file - if_locked: wait_timeout: 300 # 5 minutes action_on_timeout: fail_with_message distributed_lock: enabled: true backend: redis key: "baseline_update_lock" ttl: 600 # 10 minutes
Merge Conflicts
merge_strategy: on_baseline_conflict: - prefer: newer_timestamp - fallback: manual_resolution resolution_workflow: - detect_conflict - preserve_both_baselines - notify_team_for_manual_merge - provide_diff_comparison
Baseline Rollback
rollback: enabled: true preserve_rollback_history: true rollback_triggers: - manual_command - quality_regression_detected - deployment_failure rollback_process: - identify_previous_baseline - validate_previous_baseline_integrity - deactivate_current_baseline - activate_previous_baseline - update_manifest - notify_stakeholders - record_rollback_reason example: current: baseline-v2.1.0 rollback_to: baseline-v2.0.5 reason: "Critical regression in v2.1.0" triggered_by: tech-lead
Baseline Diff Reports
# Baseline Diff Report **Date**: 2026-01-28 **Previous Baseline**: functional-auth-v2 **New Baseline**: functional-auth-v3 **Trigger**: semantic_version_tag v2.1.0 ## Executive Summary **Changes Detected**: 12 differences **Severity**: Medium **Recommendation**: Approve with review of API changes | Category | Previous | New | Change | |----------|----------|-----|--------| | Tests Passing | 45/45 | 47/47 | +2 new tests | | Functional Outputs | 100% match | 98% match | 2 API changes | | Performance | Baseline | +5% faster | Improvement | | Visual | Baseline | 100% match | No change | ## Detailed Changes ### Functional Changes #### Change 1: New field in login response **Type**: API Schema Change **Severity**: Medium - Breaking for strict clients **Previous** (baseline-v2): ```json { "token": "jwt.header.payload", "user": { "id": "uuid", "email": "user@example.com" } }
New (baseline-v3):
{ "token": "jwt.header.payload", "user": { "id": "uuid", "email": "user@example.com", "display_name": "John Doe" } }
Impact: Clients using strict schema validation may fail Recommendation: Document as breaking change in CHANGELOG
Change 2: Error message improved
Type: Error Message Update Severity: Low - Informational
Previous:
Error: Invalid input
New:
ValidationError: Email format is invalid. Expected format: user@example.com
Impact: Better user experience, non-breaking Recommendation: Approve
Performance Changes
| Metric | Previous | New | Change | Status |
|---|---|---|---|---|
| Login p50 | 45ms | 42ms | -7% | ✅ Improved |
| Login p95 | 120ms | 115ms | -4% | ✅ Improved |
| Registration p50 | 85ms | 80ms | -6% | ✅ Improved |
New Tests Added
-
test/auth/display-name-validation.test.ts- Tests display name field validation
- Covers new feature
-
test/auth/error-message-quality.test.ts- Tests improved error messaging
- Regression test for clarity
Recommendations
Immediate Actions
- Review API schema change for breaking impact
- Update API documentation with new field
- Notify API consumers via changelog
- Approve baseline if change is intentional
Follow-up
- Add versioning to API if not present
- Consider deprecation period for schema changes
- Update client SDKs with new field
Approval
To approve this baseline:
aiwg baseline approve functional-auth-v3 \ --reviewed-by "tech-lead" \ --reason "API schema change documented and approved"
To reject and rollback:
aiwg baseline rollback functional-auth-v3 \ --to functional-auth-v2 \ --reason "Breaking change not acceptable"
## Usage Examples ### Auto-Baseline on Release Tag
Event: Git tag v2.1.0 created
Skill activates automatically:
- Detect tag creation event
- Wait for CI to complete
- Check test pass rate: 100% (45/45)
- Run baseline capture
- Validate baseline stability (3 runs)
- Compare to previous baseline
- Generate diff report
Output: "Auto-Baseline Created: release-v2.1.0
Trigger: Git tag v2.1.0 Previous: release-v2.0.5 Changes: 3 API fields added, 2 tests added Quality: 100% tests passing Stability: 100% consistent (3/3 runs)
Status: Pending approval Approvers: tech-lead, release-manager
Diff report: .aiwg/testing/baseline-diff-v2.1.0.md
Approve with: aiwg baseline approve release-v2.1.0"
### Auto-Baseline on Merge to Main
Event: PR #456 merged to main
Skill activates:
- Detect merge event
- Wait for CI success
- Check test pass rate: 98% (44/45) - 1 flaky test
- Decision: Pass rate below 100%, require approval
Output: "Auto-Baseline Blocked: main-abc123d
Trigger: Merge to main (commit abc123d) Test pass rate: 98% (44/45) Reason: 1 test flaky, below 100% threshold
Action required:
- Stabilize flaky test: test/api/timeout.test.ts
- Re-run baseline after fix
Or override with:
aiwg baseline create main-abc123d
--force
--approved-by tech-lead
--reason 'Known flaky test acceptable'"
### Scheduled Weekly Baseline
Event: Weekly schedule (Sunday 3 AM)
Skill activates:
- Detect schedule trigger
- Run full test suite
- Capture baseline: weekly-2026-01-28
- Compare to last week
- Generate drift report
Output: "Weekly Baseline Created: weekly-2026-01-28
Previous: weekly-2026-01-21 Drift: Minimal (0.2% performance improvement) New tests: 3 added Test health: 100% passing
Auto-activated: No (scheduled baselines require approval)
Drift report: .aiwg/testing/weekly-drift-2026-01-28.md
Review and approve: aiwg baseline review weekly-2026-01-28"
## Integration This skill uses: - `regression-baseline`: Core baseline creation and comparison - `regression-bisect`: Find commits that changed baseline - `regression-metrics`: Track baseline drift over time - `test-coverage`: Ensure baseline captures critical paths - `issue-auto-sync`: Create issues for baseline approval - `project-awareness`: Detect CI/CD configuration ## Agent Orchestration ```yaml agents: baseline_creation: agent: test-engineer focus: Capture and validate baseline diff_analysis: agent: test-architect focus: Analyze baseline changes approval: agent: tech-lead focus: Review and approve baselines automation: agent: devops-engineer focus: CI/CD integration
Configuration
Auto-Baseline Rules
# aiwg.yml regression: auto_baseline: enabled: true triggers: semantic_version_tag: enabled: true pattern: "^v[0-9]+\\.[0-9]+\\.[0-9]+$" auto_activate: false approval_required: true merge_to_main: enabled: true branches: [main, master] auto_activate: false test_pass_threshold: 100% deployment_success: enabled: true environments: [production, staging] auto_activate: true scheduled: enabled: true nightly: cron: "0 2 * * *" auto_activate: false weekly: cron: "0 3 * * 0" auto_activate: false validation: minimum_test_pass_rate: 95% stability_runs: 3 require_human_approval_for: - production_baselines - breaking_changes - critical_severity_changes storage: primary: local backup: s3 retention: active: 1 archive: 10 compress: true notifications: on_baseline_created: [slack, email] on_approval_needed: [slack, issue_comment] on_baseline_activated: [slack]
Retention Policy
retention: by_tag: release_tags: keep_forever development_tags: 30_days untagged: 7_days by_environment: production: 365_days staging: 90_days development: 30_days by_type: functional: 90_days visual: 60_days performance: 120_days api: 90_days compression: after_days: 7 algorithm: gzip keep_uncompressed: false
Notification Templates
Baseline Created
📸 Auto-Baseline Created Baseline: {baseline_id} Trigger: {trigger_type} Environment: {environment} Status: {status} Changes from previous: • {change_count} differences detected • Severity: {max_severity} Diff report: {diff_report_url} {approval_required ? "Approval required from: {approvers}" : "Auto-activated"}
Approval Needed
⚠️ Baseline Approval Required Baseline: {baseline_id} Reason: {approval_reason} Changes: {change_summary} Review: {diff_report_url} Approve: aiwg baseline approve {baseline_id} --reason "..." Reject: aiwg baseline reject {baseline_id} --reason "..." Expires: {approval_timeout}
Baseline Activated
✅ Baseline Activated Baseline: {baseline_id} Activated at: {activation_timestamp} Replaces: {previous_baseline_id} This baseline is now active for regression testing. All future test runs will be compared against this baseline.
Output Locations
- Baselines:
.aiwg/testing/baselines/{type}/ - Manifest:
.aiwg/testing/baselines/manifest.yaml - Diff reports:
.aiwg/testing/baseline-diffs/ - Approval requests:
.aiwg/testing/baseline-approvals/ - Archive:
.aiwg/testing/baselines/archive/ - Locks:
.aiwg/testing/baselines/.lock
References
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/skills/regression-baseline/SKILL.md - Manual baseline creation
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/skills/regression-bisect/SKILL.md - Find breaking commits
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/skills/regression-metrics/SKILL.md - Baseline drift tracking
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/testing/regression.yaml - Regression schema
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/commands/regression-check.md - Regression testing command
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/agents/test-engineer.md - Test Engineer agent
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/executable-feedback.md - Executable feedback rules