Aiwg regression-performance

Detect performance regressions by comparing benchmarks across versions with latency, throughput, and statistical significance analysis

install

source · Clone the upstream repo

git clone https://github.com/jmagly/aiwg

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.agents/skills/regression-performance" ~/.claude/skills/jmagly-aiwg-regression-performance && rm -rf "$T"

manifest: .agents/skills/regression-performance/SKILL.md

source content

regression-performance

Detect performance regressions by comparing benchmarks across versions, analyzing latency/throughput degradation, and providing statistical significance testing.

Triggers

Alternate expressions and non-obvious activations (primary phrases are matched automatically from the skill description):

"latency regression" → performance benchmark comparison
"p99" / "p95" → percentile-based performance metrics
"benchmark diff" → performance baseline comparison

Purpose

This skill detects performance regressions across software versions by:

Comparing latency metrics (p50, p95, p99) between baseline and current versions
Detecting throughput regressions (requests/sec, transactions/sec)
Identifying memory regressions (heap growth, memory leaks)
Analyzing resource utilization (CPU, disk I/O, network)
Running benchmark comparisons with statistical significance testing
Generating performance regression reports with visualizations

Behavior

When triggered, this skill:

Identifies baseline version:
- Detect last known good version from git tags
- Load baseline benchmark results
- Extract performance metrics from monitoring
Runs performance benchmarks:
- Execute load tests using k6, Artillery, or wrk
- Capture latency distributions (p50, p95, p99)
- Measure throughput (req/s, TPS)
- Profile memory usage and heap growth
- Monitor CPU and I/O utilization
Performs statistical comparison:
- Calculate delta and percentage change
- Apply statistical significance tests (t-test, Mann-Whitney U)
- Determine if degradation exceeds threshold
- Account for variance and noise
Detects regression patterns:
- Latency spikes at specific percentiles
- Throughput capacity reduction
- Memory leak indicators (growing heap)
- CPU saturation points
- I/O bottlenecks
Generates regression report:
- Performance comparison tables
- Percentile distribution graphs
- Time-series trend analysis
- Root cause indicators
- Recommendations
Logs regression findings:
- Create regression register entry
- Tag commits with performance impact
- Alert on threshold violations

Performance Metrics Model

┌─────────────────────┐
│   BASELINE v2.3.0   │
├─────────────────────┤
│ p50:  45ms          │
│ p95: 120ms          │
│ p99: 180ms          │
│ RPS: 2500           │
│ Mem: 256MB          │
└─────────────────────┘
         │
         ▼ Compare
┌─────────────────────┐
│   CURRENT v2.4.0    │
├─────────────────────┤
│ p50:  52ms (+15%)   │  ⚠️ REGRESSION
│ p95: 145ms (+21%)   │  ⚠️ REGRESSION
│ p99: 220ms (+22%)   │  ⚠️ REGRESSION
│ RPS: 2100 (-16%)    │  ⚠️ REGRESSION
│ Mem: 312MB (+22%)   │  ⚠️ REGRESSION
└─────────────────────┘
         │
         ▼
┌─────────────────────┐
│ REGRESSION REPORT   │
│                     │
│ Type: Latency       │
│ Severity: HIGH      │
│ Confidence: 99.5%   │
│ Root Cause: TBD     │
└─────────────────────┘

Metric Categories

Latency Metrics

Metric	Description	Threshold	Tool
p50 (median)	50th percentile latency	+10%	k6, Artillery, wrk
p95	95th percentile latency	+15%	k6, Artillery, wrk
p99	99th percentile latency	+20%	k6, Artillery, wrk
max	Maximum observed latency	+30%	k6, Artillery, wrk

Throughput Metrics

Metric	Description	Threshold	Tool
Requests/sec	HTTP requests per second	-10%	k6, wrk, ab
Transactions/sec	Business transactions per second	-10%	Custom
Bytes/sec	Network throughput	-15%	iperf3, iftop
Queries/sec	Database query throughput	-10%	pgbench, sysbench

Memory Metrics

Metric	Description	Threshold	Tool
Heap size	JavaScript heap usage	+20%	Node.js heap snapshot
RSS	Resident set size	+20%	ps, top
Memory growth rate	MB/hour increase	>10 MB/hour	Continuous profiling
GC pressure	Garbage collection frequency	+30%	Node.js --trace-gc

Resource Metrics

Metric	Description	Threshold	Tool
CPU utilization	Average CPU usage	+20%	mpstat, top
Disk I/O wait	I/O wait percentage	+25%	iostat
Network bandwidth	Network utilization	+15%	iftop, nethogs
File descriptors	Open file handles	+30%	lsof

Benchmark Tools Integration

k6 Load Testing

// benchmark.k6.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export let options = {
  stages: [
    { duration: '2m', target: 100 },   // Ramp up
    { duration: '5m', target: 100 },   // Steady state
    { duration: '2m', target: 0 },     // Ramp down
  ],
  thresholds: {
    'http_req_duration': ['p(50)<100', 'p(95)<200', 'p(99)<300'],
    'http_req_failed': ['rate<0.01'],
  },
};

export default function () {
  const res = http.get('https://api.example.com/endpoint');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 200ms': (r) => r.timings.duration < 200,
  });
  sleep(1);
}

Running comparison:

# Baseline
k6 run --out json=baseline-results.json benchmark.k6.js

# Current version
k6 run --out json=current-results.json benchmark.k6.js

# Compare
./compare-k6-results.sh baseline-results.json current-results.json

Artillery Load Testing

# artillery-config.yml
config:
  target: 'https://api.example.com'
  phases:
    - duration: 120
      arrivalRate: 10
      rampTo: 50
    - duration: 300
      arrivalRate: 50
    - duration: 120
      arrivalRate: 50
      rampTo: 0
  plugins:
    metrics-by-endpoint:
      stripQueryString: true

scenarios:
  - name: "API Performance Test"
    flow:
      - get:
          url: "/api/users"
      - get:
          url: "/api/products"
      - post:
          url: "/api/orders"
          json:
            product_id: 123
            quantity: 2

Running comparison:

# Baseline
artillery run --output baseline.json artillery-config.yml

# Current
artillery run --output current.json artillery-config.yml

# Compare
artillery report baseline.json --output baseline-report.html
artillery report current.json --output current-report.html
./compare-artillery-results.sh baseline.json current.json

wrk HTTP Benchmarking

# Simple throughput test
wrk_benchmark() {
  local version=$1
  local output_file=$2

  wrk -t12 -c400 -d30s \
      --latency \
      --timeout 10s \
      https://api.example.com/endpoint \
      > "$output_file"
}

# Baseline
wrk_benchmark "v2.3.0" "wrk-baseline.txt"

# Current
wrk_benchmark "v2.4.0" "wrk-current.txt"

# Compare
./parse-wrk-results.sh wrk-baseline.txt wrk-current.txt

Apache Bench (ab)

# Quick regression check
ab_compare() {
  local baseline_version=$1
  local current_version=$2

  echo "=== Baseline ${baseline_version} ==="
  ab -n 10000 -c 100 https://api.example.com/ > ab-baseline.txt

  echo "=== Current ${current_version} ==="
  ab -n 10000 -c 100 https://api.example.com/ > ab-current.txt

  # Extract key metrics
  echo "Comparison:"
  echo "Baseline RPS: $(grep 'Requests per second' ab-baseline.txt | awk '{print $4}')"
  echo "Current RPS: $(grep 'Requests per second' ab-current.txt | awk '{print $4}')"
}

Hyperfine (CLI tool benchmarking)

# For CLI tool performance
hyperfine \
  --warmup 3 \
  --export-json comparison.json \
  'git checkout v2.3.0 && npm run build && npm test' \
  'git checkout v2.4.0 && npm run build && npm test'

Statistical Significance Testing

T-Test for Mean Comparison

function detectLatencyRegression(
  baseline: number[],
  current: number[]
): RegressionAnalysis {
  const baselineMean = mean(baseline);
  const currentMean = mean(current);
  const delta = currentMean - baselineMean;
  const deltaPercent = (delta / baselineMean) * 100;

  // Perform Welch's t-test
  const tTestResult = welchTTest(baseline, current);

  return {
    metric: 'latency',
    baseline: {
      mean: baselineMean,
      stddev: stddev(baseline),
      n: baseline.length,
    },
    current: {
      mean: currentMean,
      stddev: stddev(current),
      n: current.length,
    },
    delta,
    deltaPercent,
    pValue: tTestResult.pValue,
    isSignificant: tTestResult.pValue < 0.05,
    isRegression: deltaPercent > 10 && tTestResult.pValue < 0.05,
    confidence: 1 - tTestResult.pValue,
  };
}

Mann-Whitney U Test (Non-Parametric)

function detectThroughputRegression(
  baseline: number[],
  current: number[]
): RegressionAnalysis {
  const baselineMedian = median(baseline);
  const currentMedian = median(current);
  const delta = currentMedian - baselineMedian;
  const deltaPercent = (delta / baselineMedian) * 100;

  // Use Mann-Whitney U for non-normal distributions
  const uTestResult = mannWhitneyU(baseline, current);

  return {
    metric: 'throughput',
    baseline: {
      median: baselineMedian,
      iqr: iqr(baseline),
      n: baseline.length,
    },
    current: {
      median: currentMedian,
      iqr: iqr(current),
      n: current.length,
    },
    delta,
    deltaPercent,
    pValue: uTestResult.pValue,
    isSignificant: uTestResult.pValue < 0.05,
    isRegression: deltaPercent < -10 && uTestResult.pValue < 0.05,
    confidence: 1 - uTestResult.pValue,
  };
}

Multiple Comparison Correction

// Apply Bonferroni correction when testing multiple metrics
function detectMultiMetricRegressions(
  metrics: MetricComparison[],
  familyAlpha: number = 0.05
): RegressionResult[] {
  const adjustedAlpha = familyAlpha / metrics.length;

  return metrics.map(metric => ({
    ...metric,
    adjustedAlpha,
    isSignificantCorrected: metric.pValue < adjustedAlpha,
  }));
}

Threshold Configuration

Configure regression detection thresholds in

.aiwg/config/performance-thresholds.yaml

performance_thresholds:
  latency:
    p50:
      warning: 10%    # Warn if p50 increases by >10%
      critical: 20%   # Critical if p50 increases by >20%
    p95:
      warning: 15%
      critical: 25%
    p99:
      warning: 20%
      critical: 30%

  throughput:
    requests_per_sec:
      warning: -10%   # Warn if RPS decreases by >10%
      critical: -20%
    transactions_per_sec:
      warning: -10%
      critical: -20%

  memory:
    heap_size:
      warning: 20%
      critical: 40%
    growth_rate:
      warning: 10      # MB/hour
      critical: 25

  resources:
    cpu_utilization:
      warning: 20%
      critical: 40%
    io_wait:
      warning: 25%
      critical: 50%

  statistical:
    significance_level: 0.05  # p-value threshold
    min_sample_size: 100      # Minimum observations
    multiple_comparison_correction: bonferroni

Memory Leak Detection

Heap Snapshot Comparison

interface HeapAnalysis {
  version: string;
  timestamp: string;
  totalSize: number;
  usedSize: number;
  objectCounts: Record<string, number>;
  retainedPaths: RetainedPath[];
}

function detectMemoryLeak(
  baseline: HeapAnalysis,
  current: HeapAnalysis
): MemoryLeakReport {
  const growthRate = (current.usedSize - baseline.usedSize) /
                     (current.timestamp - baseline.timestamp);

  const suspectObjects = Object.keys(current.objectCounts)
    .filter(type => {
      const baseCount = baseline.objectCounts[type] || 0;
      const currCount = current.objectCounts[type];
      const growth = currCount - baseCount;
      return growth > 1000; // >1000 additional objects
    })
    .map(type => ({
      type,
      baseline: baseline.objectCounts[type] || 0,
      current: current.objectCounts[type],
      delta: current.objectCounts[type] - (baseline.objectCounts[type] || 0),
    }));

  return {
    isLeak: growthRate > 10 * 1024 * 1024, // >10MB/hour
    growthRate,
    suspectObjects,
    recommendation: suspectObjects.length > 0
      ? `Investigate ${suspectObjects[0].type} accumulation`
      : 'Run extended profiling session',
  };
}

Continuous Memory Monitoring

#!/bin/bash
# monitor-memory-regression.sh

monitor_memory() {
  local duration_minutes=$1
  local interval_seconds=$2
  local output_file=$3

  echo "timestamp,rss,heap_used,heap_total,external" > "$output_file"

  for i in $(seq 1 $((duration_minutes * 60 / interval_seconds))); do
    local stats=$(curl -s http://localhost:3000/metrics/memory)
    echo "$(date +%s),$stats" >> "$output_file"
    sleep "$interval_seconds"
  done
}

# Run for 30 minutes, sample every 10 seconds
monitor_memory 30 10 memory-profile.csv

# Analyze for leaks
python analyze-memory-trend.py memory-profile.csv

Performance Regression Report Format

Executive Summary

# Performance Regression Report

**Project**: my-api-service
**Analysis Date**: 2024-01-25
**Baseline Version**: v2.3.0
**Current Version**: v2.4.0
**Test Duration**: 10 minutes
**Load Profile**: 100 concurrent users, steady state

## Executive Summary

**Regression Detected**: YES (HIGH severity)
**Metrics Affected**: 5 of 8 metrics show significant degradation
**Statistical Confidence**: 99.5%
**Recommendation**: DO NOT DEPLOY - requires investigation

| Category | Status | Details |
|----------|--------|---------|
| Latency | ⚠️ REGRESSION | p50: +15%, p95: +21%, p99: +22% |
| Throughput | ⚠️ REGRESSION | -16% requests/sec |
| Memory | ⚠️ REGRESSION | +22% heap, potential leak |
| CPU | ✅ PASS | Within acceptable range |
| I/O | ✅ PASS | No significant change |

Detailed Metrics Comparison

## Latency Metrics

| Metric | Baseline | Current | Delta | % Change | p-value | Regression? |
|--------|----------|---------|-------|----------|---------|-------------|
| p50 | 45ms | 52ms | +7ms | +15.6% | 0.001 | ⚠️ YES |
| p75 | 78ms | 92ms | +14ms | +17.9% | 0.002 | ⚠️ YES |
| p90 | 105ms | 128ms | +23ms | +21.9% | 0.001 | ⚠️ YES |
| p95 | 120ms | 145ms | +25ms | +20.8% | 0.001 | ⚠️ YES |
| p99 | 180ms | 220ms | +40ms | +22.2% | 0.001 | ⚠️ YES |
| max | 450ms | 580ms | +130ms | +28.9% | 0.023 | ⚠️ YES |

**Statistical Test**: Welch's t-test
**Significance Level**: α = 0.05 (Bonferroni corrected: 0.0083)
**Confidence**: 99.9% that degradation is real

### Latency Distribution

Baseline v2.3.0 Current v2.4.0 │ │ 600 │ │ * │ │ 500 │ │ * │ │ * 400 │ │ * │ │ 300 │ │ ** │ │ * 200 │ * │ ** │ ** │ ** 100 │ **** │ ** │ ***** │** 0 └───────────────── └───────────────── 0 25 50 75 95 99 0 25 50 75 95 99 Percentile Percentile

Throughput Regression

## Throughput Metrics

| Metric | Baseline | Current | Delta | % Change | p-value | Regression? |
|--------|----------|---------|-------|----------|---------|-------------|
| Requests/sec | 2500 | 2100 | -400 | -16.0% | 0.003 | ⚠️ YES |
| Total requests | 150000 | 126000 | -24000 | -16.0% | - | - |
| Failed requests | 150 (0.1%) | 1260 (1.0%) | +1110 | +740% | 0.001 | ⚠️ YES |

**Statistical Test**: Mann-Whitney U test
**Confidence**: 99.7% that throughput decreased

### Time Series Comparison

Requests per Second Over Time

3000 ┤ │ Baseline ───── 2500 │ ████████████████████████████████████ │ 2000 │ Current ─ ─ ─ ─ │ ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌ 1500 │ │ 1000 │ │ 500 │ │ 0 └──────────────────────────────────────── 0 2m 4m 6m 8m 10m Time

Memory Analysis

## Memory Metrics

| Metric | Baseline | Current | Delta | % Change | Leak Detected? |
|--------|----------|---------|-------|----------|----------------|
| Heap Used | 256 MB | 312 MB | +56 MB | +21.9% | ⚠️ Suspected |
| RSS | 384 MB | 468 MB | +84 MB | +21.9% | - |
| Growth Rate | 2 MB/hour | 18 MB/hour | +16 MB/hour | +800% | ⚠️ YES |
| GC Frequency | 12/min | 18/min | +6/min | +50% | ⚠️ Pressure |

**Memory Leak Analysis**:
- **Leak Detected**: YES (high confidence)
- **Growth Rate**: 18 MB/hour (threshold: 10 MB/hour)
- **Suspect Objects**: Array (+12500 instances), Closure (+8200 instances)
- **Recommendation**: Take heap snapshot and analyze retained paths

### Heap Growth Over Time

Heap Size (MB)

400 ┤ Current ╱╱ │ ╱╱╱╱╱ 350 │ ╱╱╱╱╱ │ ╱╱╱╱╱ 300 │ ╱╱╱╱╱ │ ╱╱╱╱╱ 250 │ ╱╱╱╱╱ Baseline ──── │ ╱╱╱╱╱ ────────────────────────── 200 │ │ 150 │ │ 100 │ └──────────────────────────────────────── 0 10 20 30 40 50 60 Time (minutes)

Root Cause Indicators

## Root Cause Analysis

**Automated Analysis**:

### Likely Culprits

1. **Synchronous I/O in Request Path** (HIGH confidence)
   - Evidence: p99 latency +22%, I/O wait +15%
   - Pattern: Blocking operations correlate with latency spikes
   - Recommendation: Profile with `clinic doctor` or `0x`

2. **Memory Accumulation** (HIGH confidence)
   - Evidence: Linear heap growth at 18 MB/hour
   - Pattern: Array and Closure object growth
   - Recommendation: Take heap snapshots at t=0 and t=30min, analyze diff

3. **Increased Computation** (MEDIUM confidence)
   - Evidence: CPU +12%, throughput -16%
   - Pattern: Less throughput despite similar CPU usage
   - Recommendation: Profile with `perf` or `clinic flame`

### Recommended Investigation Steps

1. **Git Bisect**: Identify introducing commit
   ```bash
   git bisect start HEAD v2.3.0
   git bisect run ./performance-test.sh

Heap Snapshot Diff:

node --heap-prof app.js  # Baseline
# Run load test
node --heap-prof app.js  # Current
# Compare snapshots

Flame Graph Analysis:

clinic flame -- node app.js
# Analyze hot paths


### Recommendations

```markdown
## Recommendations

### Immediate Actions (DO NOT DEPLOY)

- [ ] **Do not deploy v2.4.0** - Severity is HIGH
- [ ] Run git bisect to identify introducing commit
- [ ] Take heap snapshots and analyze memory accumulation
- [ ] Profile CPU with flame graphs to identify hot paths

### Investigation Priority

| Priority | Action | Owner | ETA |
|----------|--------|-------|-----|
| P0 | Identify regression commit via bisect | Regression Analyst | 2 hours |
| P0 | Analyze heap snapshots for leak | Performance Engineer | 4 hours |
| P1 | Profile CPU for hot paths | Performance Engineer | 4 hours |
| P1 | Review recent I/O changes | Developer | 2 hours |

### Prevention (After Fix)

- [ ] Add performance regression tests to CI
- [ ] Set up continuous memory profiling
- [ ] Enable automated alerts for p99 > 200ms
- [ ] Add load testing to pull request checks

Usage Examples

Full Performance Comparison

User: "Performance regression check between v2.3.0 and v2.4.0"

Skill executes:
1. Checkout v2.3.0, run k6 benchmark → baseline-results.json
2. Checkout v2.4.0, run k6 benchmark → current-results.json
3. Statistical comparison → 5 of 8 metrics regressed
4. Memory profiling → leak detected (+18 MB/hour)
5. Generate report → .aiwg/reports/performance-regression-20240125.md

Output:
"⚠️ PERFORMANCE REGRESSION DETECTED

Severity: HIGH
Confidence: 99.5%
Metrics Affected: Latency (+15-22%), Throughput (-16%), Memory Leak

Report: .aiwg/reports/performance-regression-20240125.md

DO NOT DEPLOY - Investigation required"

Quick Latency Check

User: "Check latency regression"

Skill executes:
1. Compare current p99 against baseline
2. Statistical t-test for significance
3. Quick report

Output:
"Latency Regression Detected:
  p50: 52ms (baseline: 45ms) → +15.6% ⚠️
  p95: 145ms (baseline: 120ms) → +20.8% ⚠️
  p99: 220ms (baseline: 180ms) → +22.2% ⚠️

Statistical confidence: 99.9%
Exceeds threshold: YES (>10%)
Recommendation: Investigate before deploying"

Memory Leak Detection

User: "Detect memory leak"

Skill executes:
1. Monitor memory for 30 minutes
2. Calculate growth rate
3. Identify accumulating objects

Output:
"⚠️ MEMORY LEAK DETECTED

Growth Rate: 18 MB/hour (threshold: 10 MB/hour)
Suspect Objects:
  - Array: +12500 instances
  - Closure: +8200 instances

Heap snapshots saved:
  - .aiwg/profiling/heap-t0.heapsnapshot
  - .aiwg/profiling/heap-t30.heapsnapshot

Recommendation: Analyze with Chrome DevTools Memory Profiler"

Benchmark Comparison

User: "Compare performance with main branch"

Skill executes:
1. Stash current changes
2. Benchmark main branch
3. Restore changes
4. Benchmark current branch
5. Statistical comparison

Output:
"Performance Comparison: feature-branch vs main

Latency (p99):
  main: 180ms
  feature-branch: 185ms
  Delta: +2.8% (within threshold ✓)

Throughput:
  main: 2500 req/s
  feature-branch: 2480 req/s
  Delta: -0.8% (within threshold ✓)

Verdict: NO REGRESSION DETECTED
Safe to merge"

Integration

This skill uses:

```
artifact-metadata
```
: Load benchmark results from previous runs
```
project-awareness
```
: Detect git tags and version history
```
regression-analyst
```
agent: For detailed root cause analysis when regression detected

This skill is used by:

CI/CD pipelines for automated regression gates
Developers for pre-merge performance validation
Release managers for go/no-go decisions

Automated Detection in CI

GitHub Actions Example

# .github/workflows/performance-gate.yml
name: Performance Regression Gate

on:
  pull_request:
    branches: [main]

jobs:
  performance-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0  # Full history for baseline

      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm ci

      - name: Benchmark baseline (main branch)
        run: |
          git checkout main
          npm run build
          k6 run --out json=baseline.json benchmark.k6.js

      - name: Benchmark current (PR branch)
        run: |
          git checkout ${{ github.head_ref }}
          npm run build
          k6 run --out json=current.json benchmark.k6.js

      - name: Performance regression analysis
        id: regression
        run: |
          npm run analyze:performance -- \
            --baseline baseline.json \
            --current current.json \
            --threshold 10 \
            --output regression-report.md

      - name: Comment PR with results
        uses: actions/github-script@v6
        with:
          script: |
            const fs = require('fs');
            const report = fs.readFileSync('regression-report.md', 'utf8');
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: report
            });

      - name: Fail if regression detected
        if: steps.regression.outputs.regression == 'true'
        run: |
          echo "Performance regression detected - see report"
          exit 1

Output Locations

Performance reports:

.aiwg/reports/performance-regression-{date}.md

Benchmark data:

.aiwg/benchmarks/{version}-{timestamp}.json

Heap snapshots:

.aiwg/profiling/heap-{version}-{timestamp}.heapsnapshot

Flame graphs:

.aiwg/profiling/flame-{version}-{timestamp}.html

Regression register:

.aiwg/testing/regression-register/REG-PERF-{id}.yaml

References

@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/agents/regression-analyst.md - Regression analysis agent
@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/agents/performance-engineer.md - Performance optimization agent
@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/testing/regression.yaml - Regression schema
@.aiwg/research/findings/REF-013-metagpt.md - Debug memory pattern for performance history
@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/executable-feedback.md - Execution validation requirements