Agentic-qe test-metrics-dashboard
Use when querying test history, analyzing flakiness rates, tracking MTTR, or building quality trend dashboards from test execution data.
install
source · Clone the upstream repo
```shell
git clone https://github.com/proffesor-for-testing/agentic-qe
```
Claude Code · Install into ~/.claude/skills/
```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/proffesor-for-testing/agentic-qe "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/.claude/skills/test-metrics-dashboard" ~/.claude/skills/proffesor-for-testing-agentic-qe-test-metrics-dashboard \
  && rm -rf "$T"
```
manifest:
`.claude/skills/test-metrics-dashboard/SKILL.md` (source content)
Test Metrics Dashboard
Data & Analysis skill for querying test execution history, identifying trends, and surfacing actionable quality metrics.
Activation
```
/test-metrics-dashboard
```
Key Metrics
Test Health Metrics
| Metric | Formula | Target | Alert |
|---|---|---|---|
| Pass Rate | Passed / Total | > 95% | < 90% |
| Flakiness Rate | Flaky / Total | < 5% | > 10% |
| MTTR | Avg time from failure to fix | < 4 hours | > 24 hours |
| Execution Time | Total suite duration | < 10 min | > 20 min |
| Coverage Delta | Current - Previous | >= 0% | < -2% |
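The targets and alert thresholds above can be checked mechanically. A minimal sketch, using hypothetical run counts (not from a real run) and the pass-rate thresholds from the table:

```shell
# Hypothetical example values; in practice, pull these from the jq output below.
passed=190
total=200

# awk handles the floating-point division the shell lacks
pass_rate=$(awk -v p="$passed" -v t="$total" 'BEGIN { printf "%.1f", p / t * 100 }')
echo "pass_rate=${pass_rate}%"

# Thresholds from the Key Metrics table: target > 95%, alert < 90%
status=$(awk -v r="$pass_rate" 'BEGIN {
  if (r < 90)      print "ALERT: pass rate below 90%"
  else if (r < 95) print "WARN: pass rate below 95% target"
  else             print "OK"
}')
echo "$status"
```

The same pattern extends to the other rows of the table by swapping in the relevant fields and thresholds.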
Data Collection
```shell
# Export Jest results to JSON
npx jest --json --outputFile=test-results/$(date +%Y-%m-%d).json

# Parse results for dashboard
jq '{
  date: .startTime,
  total: .numTotalTests,
  passed: .numPassedTests,
  failed: .numFailedTests,
  duration_ms: (.testResults | map(.endTime - .startTime) | add),
  pass_rate: ((.numPassedTests / .numTotalTests) * 100),
  flaky: [.testResults[] | select(.numPendingTests > 0)] | length
}' test-results/$(date +%Y-%m-%d).json
```
Trend Analysis
```shell
# Compare the last 5 runs
for f in $(ls -t test-results/*.json | head -5); do
  jq --arg file "$f" '{
    file: $file,
    pass_rate: ((.numPassedTests / .numTotalTests) * 100 | floor),
    duration_s: ((.testResults | map(.endTime - .startTime) | add) / 1000 | floor)
  }' "$f"
done
```
Top Failing Tests
```shell
# Find the most frequently failing tests across runs
for f in test-results/*.json; do
  jq -r '.testResults[] | select(.numFailingTests > 0) | .testFilePath' "$f"
done | sort | uniq -c | sort -rn | head -10
```
Run History
Store dashboard data in `${CLAUDE_PLUGIN_DATA}/test-metrics.log`:

```
2026-03-18|95.2|4.1|312|82.5|3
```

Format: `date|pass_rate|flakiness_rate|duration_s|coverage_pct|failed_count`
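Appending a run's metrics in that format is a one-line `printf`. A sketch with hypothetical values (in practice, derive them from the jq output above); the `/tmp` fallback for `CLAUDE_PLUGIN_DATA` is only for local testing:

```shell
# Fallback path is an assumption for local testing, not part of the skill
CLAUDE_PLUGIN_DATA="${CLAUDE_PLUGIN_DATA:-/tmp/plugin-data}"
mkdir -p "$CLAUDE_PLUGIN_DATA"
log="$CLAUDE_PLUGIN_DATA/test-metrics.log"

# date|pass_rate|flakiness_rate|duration_s|coverage_pct|failed_count
printf '%s|%s|%s|%s|%s|%s\n' \
  "$(date +%Y-%m-%d)" "95.2" "4.1" "312" "82.5" "3" >> "$log"

tail -1 "$log"
```

Using a single `printf` per run keeps each record on one line, so the log stays trivially parseable with `awk -F'|'`.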
Read history for trend detection:
```shell
# Lowest coverage_pct (field 5) among the last 5 runs
tail -5 "${CLAUDE_PLUGIN_DATA}/test-metrics.log" | awk -F'|' '{print $5}' | sort -n | head -1
```
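The minimum alone does not tell you the direction of change. One way to flag a downward coverage trend is to compare the oldest and newest of the recent entries, reusing the Coverage Delta alert threshold (-2%) from the table. A sketch; the log path fallback and sample rows are assumptions for the demo:

```shell
log="${CLAUDE_PLUGIN_DATA:-/tmp/plugin-data}/test-metrics.log"
mkdir -p "$(dirname "$log")"
# Sample history (hypothetical); normally this file is built up run by run
printf '%s\n' \
  '2026-03-14|96.0|3.0|300|84.0|2' \
  '2026-03-15|95.5|3.5|305|83.2|2' \
  '2026-03-16|95.2|4.1|312|82.5|3' > "$log"

first=$(tail -5 "$log" | head -1 | awk -F'|' '{print $5}')   # oldest recent coverage
last=$(tail -1 "$log" | awk -F'|' '{print $5}')              # newest coverage

trend=$(awk -v a="$first" -v b="$last" 'BEGIN {
  if (b < a - 2)   print "ALERT: coverage dropped more than 2 points"
  else if (b < a)  print "WARN: coverage trending down"
  else             print "OK"
}')
echo "$trend"
```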
Composition
Feeds into:
- `qe-quality-assessment` — quality gate decisions based on metrics
- `test-failure-investigator` — investigate top failing tests
- `coverage-drop-investigator` — when coverage trends down
Gotchas
- Metrics without baselines are meaningless — establish baselines before tracking trends
- Flakiness rate is underreported — a test that fails once in 100 runs still breaks a pipeline that runs 100+ times a week
- Duration trends upward over time as test count grows — set alerts on rate of increase, not absolute value
- Agent may report metrics from a single run as "trends" — need 5+ data points for meaningful trends
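The last gotcha is easy to guard against in the dashboard itself: count log entries before reporting anything as a trend. A minimal sketch, with the log path fallback and two sample rows as assumptions for the demo:

```shell
log="${CLAUDE_PLUGIN_DATA:-/tmp/plugin-data}/test-metrics.log"
mkdir -p "$(dirname "$log")"
# Hypothetical two-run history, deliberately below the 5-point minimum
printf '%s\n' \
  '2026-03-17|95.8|3.9|310|83.0|2' \
  '2026-03-18|95.2|4.1|312|82.5|3' > "$log"

points=$(wc -l < "$log")
if [ "$points" -lt 5 ]; then
  echo "insufficient data: need 5+ runs for a trend"
else
  echo "ok to analyze trend over $points runs"
fi
```

Refusing to emit a "trend" below the 5-run threshold keeps single-run noise out of quality-gate decisions downstream.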