# aiwg regression-metrics

Track and analyze regression statistics, trends, hotspots, and health indicators across test suites.

Repository:

```shell
git clone https://github.com/jmagly/aiwg
```

Install as a Claude skill:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.agents/skills/regression-metrics" ~/.claude/skills/jmagly-aiwg-regression-metrics && rm -rf "$T"
```

Skill source: `.agents/skills/regression-metrics/SKILL.md`

# regression-metrics
Track and analyze regression statistics, trends, and health indicators.
## Triggers
Alternate expressions and non-obvious activations (primary phrases are matched automatically from the skill description):
- "regression KPIs" → regression metric dashboard
- "flakiness score" → test stability metrics
## Purpose
This skill provides regression analytics by:
- Tracking regression occurrence rates
- Measuring time-to-detection and time-to-fix
- Analyzing regression patterns and hotspots
- Identifying high-risk areas
- Trending regression metrics over time
- Generating regression health dashboards
## Behavior

When triggered, this skill:

1. **Collects regression data**:
   - Parse regression test results
   - Load historical regression records
   - Gather bisect findings
   - Import baseline comparisons
   - Aggregate issue tracker data
2. **Calculates key metrics**:
   - Regression rate (per sprint/release)
   - Mean time to detect (MTTD)
   - Mean time to fix (MTTF)
   - Regression recurrence rate
   - Escape rate (production regressions)
3. **Identifies patterns**:
   - Common root causes
   - High-regression components
   - Time-of-day and sprint patterns
   - Correlation with code changes
4. **Analyzes trends**:
   - Regression rate over time
   - Detection speed improvements
   - Fix time trends
   - Quality trajectory
5. **Generates visualizations**:
   - Regression heatmaps
   - Trend charts
   - Burn-down tracking
   - Risk matrices
6. **Produces actionable insights**:
   - Prioritize high-risk areas
   - Recommend test improvements
   - Suggest process changes
   - Set quality goals
## Key Metrics

### Regression Rate

```yaml
regression_rate:
  description: Number of regressions per time period
  formula: regressions_detected / time_period
  units: regressions per sprint/week/release
  targets:
    excellent: "< 2 per sprint"
    good: "2-5 per sprint"
    acceptable: "5-10 per sprint"
    poor: "> 10 per sprint"
  calculation:
    count: new regressions introduced
    period: sprint, release, or month
    exclude: known issues, flaky tests
```
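The calculation rules above (count per period, exclude known issues and flaky tests) can be sketched in Python. This is illustrative only: the record fields `detected`, `flaky`, and `known_issue` are assumptions, not the skill's actual schema.

```python
from datetime import date

def regression_rate(regressions, period_start, period_end, exclude_flaky=True):
    """Count new regressions detected within a period, excluding known
    issues and (optionally) flaky tests, per the rules above."""
    return sum(
        1 for r in regressions
        if period_start <= r["detected"] <= period_end
        and not r.get("known_issue", False)
        and not (exclude_flaky and r.get("flaky", False))
    )

# Hypothetical sprint records
sprint = [
    {"detected": date(2026, 1, 5)},
    {"detected": date(2026, 1, 12), "flaky": True},  # excluded as flaky
    {"detected": date(2026, 1, 20)},
    {"detected": date(2026, 2, 3)},                  # outside the period
]
rate = regression_rate(sprint, date(2026, 1, 1), date(2026, 1, 31))  # 2
```

With 2 regressions this sprint the result would land in the "excellent/good" boundary region of the targets table.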
### Mean Time to Detect (MTTD)

```yaml
mttd:
  description: Average time from regression introduction to detection
  formula: sum(detection_time) / regression_count
  units: hours or days
  targets:
    excellent: "< 4 hours"
    good: "< 24 hours"
    acceptable: "< 7 days"
    poor: "> 7 days"
  calculation:
    detection_time: commit_time_to_failure_report
    includes: automated and manual detection
```
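The MTTD formula amounts to averaging commit-to-report intervals. A minimal sketch, assuming hypothetical `committed`/`reported` timestamp fields:

```python
from datetime import datetime

def mean_time_to_detect(regressions):
    """Average hours from the introducing commit to the failure report,
    covering both automated and manual detection."""
    if not regressions:
        return 0.0
    hours = [(r["reported"] - r["committed"]).total_seconds() / 3600.0
             for r in regressions]
    return sum(hours) / len(hours)

# Hypothetical records: one caught by CI in 4h, one by manual testing in 13h
found = [
    {"committed": datetime(2026, 1, 10, 9, 0),
     "reported":  datetime(2026, 1, 10, 13, 0)},
    {"committed": datetime(2026, 1, 12, 8, 0),
     "reported":  datetime(2026, 1, 12, 21, 0)},
]
mttd = mean_time_to_detect(found)  # 8.5 hours -> "good" (< 24h)
```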
### Mean Time to Fix (MTTF)

```yaml
mttf:
  description: Average time from detection to fix deployment
  formula: sum(fix_time) / regression_count
  units: hours or days
  targets:
    critical: "< 4 hours"
    high: "< 24 hours"
    medium: "< 7 days"
    low: "< 30 days"
  calculation:
    fix_time: detection_to_fix_deployed
    severity_weighted: true
```
### Escape Rate

```yaml
escape_rate:
  description: Percentage of regressions reaching production
  formula: (production_regressions / total_regressions) * 100
  units: percentage
  targets:
    excellent: "< 5%"
    good: "5-10%"
    acceptable: "10-20%"
    poor: "> 20%"
  calculation:
    production_regressions: found by users/monitoring
    total_regressions: all detected including pre-release
```
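The escape-rate formula as a sketch; the `found_in` field and its values (`ci`, `staging`, `production`) are illustrative, not the skill's real schema:

```python
def escape_rate(regressions):
    """Percentage of all detected regressions first found in production
    (by users or monitoring) rather than pre-release."""
    if not regressions:
        return 0.0
    escaped = sum(1 for r in regressions if r["found_in"] == "production")
    return 100.0 * escaped / len(regressions)

# Hypothetical period: 25 regressions total, 3 found only in production
period = ([{"found_in": "ci"}] * 20
          + [{"found_in": "staging"}] * 2
          + [{"found_in": "production"}] * 3)
rate = escape_rate(period)  # 12.0 -> above the "< 10%" good target
```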
### Recurrence Rate

```yaml
recurrence_rate:
  description: Percentage of regressions that recur after fix
  formula: (recurring_regressions / total_fixed) * 100
  units: percentage
  targets:
    excellent: "< 5%"
    good: "5-10%"
    acceptable: "10-15%"
    poor: "> 15%"
  indicates:
    - insufficient test coverage
    - lack of regression tests
    - poor fix quality
```
## Metrics Dashboard

````markdown
# Regression Metrics Dashboard

**Period**: Last 30 Days (2025-12-29 to 2026-01-28)
**Project**: User Service

## Executive Summary

| Metric | Current | Target | Status | Trend |
|--------|---------|--------|--------|-------|
| Regression Rate | 4.2/sprint | < 5 | ✅ Good | ↓ Improving |
| MTTD | 8.5 hours | < 24h | ✅ Good | ↓ Improving |
| MTTF | 18.7 hours | < 24h | ⚠️ Close | → Stable |
| Escape Rate | 12% | < 10% | ⚠️ Above Target | ↑ Worsening |
| Recurrence Rate | 7% | < 10% | ✅ Good | → Stable |

**Overall Health**: ⚠️ Good with Concerns
**Priority Focus**: Reduce production escapes

## Regression Trend (Last 6 Sprints)

```
Sprint 8:  ██████████ 10 regressions
Sprint 9:  ████████ 8 regressions
Sprint 10: ██████ 6 regressions
Sprint 11: █████ 5 regressions
Sprint 12: ████ 4 regressions
Sprint 13: ████ 4 regressions

↓ -60% improvement since Sprint 8
```

**Analysis**: Significant improvement trend. Stabilizing around 4-5 per sprint.

## Detection Speed Trend

```
Week 1: 24h ████████████████████████
Week 2: 18h ██████████████████
Week 3: 12h ████████████
Week 4:  9h █████████
Week 5:  8h ████████

↓ -67% improvement in 5 weeks
```

**Analysis**: Automation improvements paying off. Most regressions now caught within hours.

## Component Heatmap

Regressions by component (last 30 days):

| Component | Regressions | Change | Risk Level |
|-----------|-------------|--------|------------|
| src/auth/ | 🔴🔴🔴 3 | +1 | High |
| src/api/ | 🟡🟡 2 | 0 | Medium |
| src/db/ | 🟡🟡 2 | -1 | Medium |
| src/user/ | 🟡 1 | -2 | Low |
| src/utils/ | 🟢 0 | 0 | Low |

**Hotspot Alert**: `src/auth/` showing increased regression rate

## Root Cause Analysis

| Root Cause | Count | % | Trend |
|------------|-------|---|-------|
| Missing test coverage | 5 | 42% | → |
| Integration not tested | 3 | 25% | ↑ |
| Edge case not considered | 2 | 17% | ↓ |
| Flaky test masking issue | 1 | 8% | → |
| Breaking dependency change | 1 | 8% | → |

**Insight**: 67% of regressions preventable with better coverage/integration testing

## Severity Distribution

| Severity | Count | MTTF | Status |
|----------|-------|------|--------|
| Critical | 1 | 3.2h | ✅ Fast response |
| High | 4 | 12.5h | ✅ Within target |
| Medium | 6 | 28.4h | ⚠️ Above target |
| Low | 1 | 72h | ✅ Acceptable |

## Time-to-Detection Analysis

```
Detection Method:
  Automated Tests: 75% (avg 4.2h detection)
  Manual Testing:  17% (avg 32h detection)
  Production:       8% (avg 96h detection)
```

**Insight**: Automation catching most issues early. Need to reduce production escapes.

## Time-to-Fix Analysis

```
Fix Duration by Severity:
  Critical: ▓▓▓ 3.2h (target: 4h) ✅
  High:     ▓▓▓▓▓▓ 12.5h (target: 24h) ✅
  Medium:   ▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 28.4h (target: 24h) ⚠️
  Low:      ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 72h ✅
```

**Issue**: Medium-severity regressions taking slightly longer than target

## Regression Recurrence

| Original Issue | Recurred | Reason |
|----------------|----------|--------|
| AUTH-101 | ✅ Yes | Missing regression test |
| API-205 | ❌ No | Regression test added |
| DB-089 | ❌ No | Regression test added |
| USER-145 | ❌ No | Regression test added |

**Recurrence Rate**: 25% (1 of 4) - one regression lacked a test

## Production Escapes

Regressions that reached production:

| Issue | Severity | Detection | Impact | MTTD |
|-------|----------|-----------|--------|------|
| AUTH-203 | High | User report | 500 users | 12h |

**Analysis**: 1 escape this period. Auth module regression bypassed staging tests.

## Recommendations

### High Priority

1. **Add integration tests for auth flows**
   - Reason: 3 regressions in auth, 1 production escape
   - Impact: Reduce auth regressions by ~60%
   - Effort: 2 days
2. **Improve staging test coverage**
   - Reason: Production escape indicates gap
   - Impact: Reduce escape rate to <5%
   - Effort: 1 week
3. **Reduce medium-severity MTTF**
   - Reason: 28.4h vs 24h target
   - Impact: Faster user impact resolution
   - Effort: Process improvement

### Medium Priority

4. **Add regression tests for all fixes**
   - Reason: 25% recurrence rate on fixes without tests
   - Impact: Zero recurrence for tested fixes
   - Effort: Ongoing discipline
5. **Monitor auth module closely**
   - Reason: Highest regression count
   - Impact: Early detection of issues
   - Effort: Weekly review

## Historical Comparison

| Period | Reg Rate | MTTD | MTTF | Escape % |
|--------|----------|------|------|----------|
| 3 months ago | 8.2 | 36h | 48h | 18% |
| 2 months ago | 6.5 | 24h | 36h | 15% |
| 1 month ago | 5.1 | 12h | 24h | 13% |
| Current | 4.2 | 8.5h | 18.7h | 12% |

**Trend**: All metrics improving. Regression rate down 49%, detection 76% faster.

## Goals for Next Period

| Metric | Current | Goal | Strategy |
|--------|---------|------|----------|
| Regression Rate | 4.2 | < 4 | Improve auth testing |
| MTTD | 8.5h | < 8h | Add more automation |
| MTTF | 18.7h | < 18h | Faster review process |
| Escape Rate | 12% | < 10% | Better staging tests |

## Data Sources

- Regression tests: `.aiwg/testing/regression-results/`
- Bisect reports: `.aiwg/testing/regression-bisect-*/`
- Baseline comparisons: `.aiwg/testing/baseline-comparisons/`
- Issue tracker: GitHub Issues (label: regression)
- CI/CD logs: GitHub Actions
````
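A collector might aggregate the on-disk data sources like so. Only the `.aiwg/testing/regression-results/` directory comes from the list above; the assumption that each file is a JSON array of record objects is illustrative, so adapt it to the actual report schema.

```python
import json
from pathlib import Path

def load_regression_records(root=".aiwg/testing"):
    """Gather regression records from the skill's results directory.
    Assumes (hypothetically) each *.json file holds a list of records."""
    records = []
    results_dir = Path(root) / "regression-results"
    if results_dir.is_dir():
        for path in sorted(results_dir.glob("*.json")):
            records.extend(json.loads(path.read_text()))
    return records
```

On a project without any results yet, this simply returns an empty list.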
## Usage Examples

### View Current Metrics

```
User: "Show regression metrics"

Skill executes:
1. Aggregate data from last 30 days
2. Calculate key metrics
3. Generate dashboard
4. Identify trends

Output:
"Regression Metrics (Last 30 Days)

Overall Health: ⚠️ Good with Concerns

Key Metrics:
- Regression Rate: 4.2/sprint ✅ (target < 5)
- MTTD: 8.5 hours ✅ (target < 24h)
- MTTF: 18.7 hours ⚠️ (target < 24h)
- Escape Rate: 12% ⚠️ (target < 10%)

Hotspots:
🔴 src/auth/ - 3 regressions this period
🟡 src/api/ - 2 regressions

Top Recommendation: Add integration tests for auth

Full dashboard: .aiwg/testing/regression-metrics-dashboard.md"
```
### Regression Trends

```
User: "Regression trends over time"

Skill analyzes:
- Last 6 sprints of data
- Calculate trend direction
- Identify patterns

Output:
"Regression Trends (Last 6 Sprints)

Sprint 8:  10 regressions
Sprint 9:  8 regressions (-20%)
Sprint 10: 6 regressions (-25%)
Sprint 11: 5 regressions (-17%)
Sprint 12: 4 regressions (-20%)
Sprint 13: 4 regressions (stable)

Overall: ↓ -60% improvement
Status: Stabilizing around 4-5/sprint

MTTD: 36h → 8.5h (-76%)
MTTF: 48h → 18.7h (-61%)

Conclusion: Strong improvement trend. Approaching best-in-class levels."
```
### Component Heatmap

```
User: "Which components have most regressions?"

Skill generates:
"Component Regression Heatmap (Last 30 Days)

High Risk:
🔴 src/auth/ - 3 regressions (+1 from last period)
   Most common: Missing integration tests

Medium Risk:
🟡 src/api/ - 2 regressions (no change)
🟡 src/db/ - 2 regressions (-1 from last period)

Low Risk:
🟢 src/user/ - 1 regression (-2 from last period)
🟢 src/utils/ - 0 regressions

Recommendation: Focus testing efforts on auth module"
```
## Integration

This skill uses:

- `regression-bisect`: Import bisect findings
- `regression-baseline`: Analyze baseline drift patterns
- `test-coverage`: Correlate coverage with regression rates
- `project-awareness`: Detect sprint/release boundaries
## Agent Orchestration

```yaml
agents:
  analysis:
    agent: metrics-analyst
    focus: Statistical analysis and trends
  visualization:
    agent: technical-writer
    focus: Dashboard and report generation
  recommendations:
    agent: test-architect
    focus: Process improvement suggestions
```
## Configuration

### Metric Collection

```yaml
collection_config:
  data_sources:
    - regression_test_results
    - bisect_reports
    - baseline_comparisons
    - issue_tracker
    - ci_cd_logs
  update_frequency: daily
  retention: 90 days
  aggregation: sprint, week, month
```
### Thresholds

```yaml
thresholds:
  regression_rate:
    excellent: 2
    good: 5
    acceptable: 10
  mttd_hours:
    excellent: 4
    good: 24
    acceptable: 168  # 7 days
  mttf_hours:
    critical: 4
    high: 24
    medium: 168  # 7 days
  escape_rate_percent:
    excellent: 5
    good: 10
    acceptable: 20
```
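One way to map a metric value onto these tiered thresholds (a sketch; the tier table below just mirrors the YAML above, with anything beyond the last tier classified as "poor"):

```python
# Tiers mirror the thresholds config: value < limit -> label.
TIERS = {
    "regression_rate":      [(2, "excellent"), (5, "good"), (10, "acceptable")],
    "mttd_hours":           [(4, "excellent"), (24, "good"), (168, "acceptable")],
    "escape_rate_percent":  [(5, "excellent"), (10, "good"), (20, "acceptable")],
}

def status(metric, value):
    """Classify a metric value against its tiered thresholds."""
    for limit, label in TIERS[metric]:
        if value < limit:
            return label
    return "poor"
```

For the sample dashboard: `status("regression_rate", 4.2)` is "good" and `status("escape_rate_percent", 12)` is only "acceptable", matching the ⚠️ flag.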
### Alert Rules

```yaml
alerts:
  regression_spike:
    condition: regression_rate > 10
    severity: high
    notification: team-channel
  escape_rate_high:
    condition: escape_rate > 20%
    severity: critical
    notification: leadership
  mttd_degrading:
    condition: mttd_trend_increase > 50%
    severity: medium
    notification: test-team
```
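These rules could be evaluated roughly as follows (a sketch: notification delivery is out of scope, and the argument names are illustrative rather than the skill's real interface):

```python
def check_alerts(metrics, mttd_trend_increase_pct):
    """Evaluate the alert rules above against current metric values.
    Returns a list of (rule, severity, notification_target) tuples."""
    triggered = []
    if metrics["regression_rate"] > 10:
        triggered.append(("regression_spike", "high", "team-channel"))
    if metrics["escape_rate"] > 20:
        triggered.append(("escape_rate_high", "critical", "leadership"))
    if mttd_trend_increase_pct > 50:
        triggered.append(("mttd_degrading", "medium", "test-team"))
    return triggered

# Sample-dashboard values trigger nothing; a bad period trips all three rules
quiet = check_alerts({"regression_rate": 4.2, "escape_rate": 12}, 10)   # []
noisy = check_alerts({"regression_rate": 12, "escape_rate": 25}, 60)    # 3 alerts
```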
## Output Locations

- Dashboards: `.aiwg/testing/regression-metrics-dashboard.md`
- Trends: `.aiwg/testing/regression-trends.json`
- Heatmaps: `.aiwg/testing/regression-heatmap.json`
- Historical data: `.aiwg/testing/metrics-history/`
## References
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/metrics/regression-metrics-schema.yaml
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/agents/metrics-analyst.md
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/commands/metrics-dashboard.md