# harness-engineering / harness-perf

<!-- Generated by harness generate-slash-commands. Do not edit. -->

Clone the repository:

```sh
git clone https://github.com/Intense-Visions/harness-engineering
```

Or install just this skill into `~/.claude/skills`:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/Intense-Visions/harness-engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/commands/codex/harness/harness-perf" ~/.claude/skills/intense-visions-harness-engineering-harness-perf && rm -rf "$T"
```

Source: `agents/commands/codex/harness/harness-perf/SKILL.md`

# Harness Perf

Performance enforcement and benchmark management. Tier-based gates block commits and merges based on complexity, coupling, and runtime regression severity.
## When to Use
- After code changes to verify performance hasn't degraded
- On PRs to enforce performance budgets
- For periodic performance audits
- NOT for initial development (use harness-tdd for that)
- NOT for brainstorming performance improvements (use harness-brainstorming)
## Process

### Iron Law

No merge with Tier 1 performance violations. No commit with cyclomatic complexity exceeding the error threshold.

Tier 1 violations are non-negotiable blockers. If a Tier 1 violation is detected, execution halts and the violation must be resolved before any further progress. Do not attempt workarounds.
### Phase 1: ANALYZE — Structural and Coupling Checks

1. Run structural checks. Execute `harness check-perf --structural` to compute complexity metrics for all changed files:
   - Cyclomatic complexity per function
   - Nesting depth per function
   - File length (lines of code)
   - Parameter count per function
2. Run coupling checks. Execute `harness check-perf --coupling` to compute coupling metrics:
   - Fan-in and fan-out per module
   - Afferent and efferent coupling
   - Transitive dependency depth
   - Circular dependency detection
3. Classify violations by tier:
   - Tier 1 (error, blocks commit): cyclomatic complexity > 15, circular dependencies, hotspot in top 5%
   - Tier 2 (warning, blocks merge): complexity > 10, nesting > 4, fan-out > 10, size budget exceeded
   - Tier 3 (info, no gate): file length > 300, fan-in > 20, transitive depth > 30
4. If Tier 1 violations are found, report them immediately and STOP. Do not proceed to benchmarks. The violations must be fixed first.
5. If no violations are found, proceed to Phase 2.
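The tier rules above can be sketched as a small classifier. This is an illustrative sketch only, not the harness implementation: the `StructuralMetrics` shape and field names are assumptions, and graph-dependent Tier 1 checks (hotspots, circular dependencies) are omitted because they need coupling data.

```typescript
// Hypothetical per-function metrics; the real harness CLI computes these internally.
interface StructuralMetrics {
  file: string;
  fn: string;
  complexity: number;   // cyclomatic complexity
  nestingDepth: number;
  fanOut: number;
  fileLength: number;   // lines of code
}

type Tier = 1 | 2 | 3;

interface Violation {
  file: string;
  fn: string;
  metric: string;
  value: number;
  threshold: number;
  tier: Tier;
}

function classify(m: StructuralMetrics): Violation[] {
  const v: Violation[] = [];
  if (m.complexity > 15) {
    // Tier 1 (error, blocks commit)
    v.push({ file: m.file, fn: m.fn, metric: "complexity", value: m.complexity, threshold: 15, tier: 1 });
  } else if (m.complexity > 10) {
    // Tier 2 (warning, blocks merge)
    v.push({ file: m.file, fn: m.fn, metric: "complexity", value: m.complexity, threshold: 10, tier: 2 });
  }
  if (m.nestingDepth > 4) {
    v.push({ file: m.file, fn: m.fn, metric: "nestingDepth", value: m.nestingDepth, threshold: 4, tier: 2 });
  }
  if (m.fanOut > 10) {
    v.push({ file: m.file, fn: m.fn, metric: "fanOut", value: m.fanOut, threshold: 10, tier: 2 });
  }
  if (m.fileLength > 300) {
    // Tier 3 (info, no gate)
    v.push({ file: m.file, fn: m.fn, metric: "fileLength", value: m.fileLength, threshold: 300, tier: 3 });
  }
  return v;
}
```

Note that a complexity of 18 produces only the Tier 1 violation, not a Tier 2 duplicate: the most severe applicable tier wins per metric.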
### Graph Availability

Hotspot scoring and coupling analysis benefit from the knowledge graph but work without it.

Staleness sensitivity: Medium — auto-refresh if >10 commits stale. Hotspot scoring uses churn data, which does not change rapidly.

| Feature | With Graph | Without Graph |
|---|---|---|
| Hotspot scoring (churn x complexity) | Computed from graph nodes | Per-file commit count from git history; complexity from structural output; multiply manually |
| Coupling ratio | Computed from graph edges | Parse import statements, count fan-out/fan-in per file |
| Critical path resolution | Graph inference (high fan-in) + annotations | Annotations only; grep for decorator/comment |
| Transitive dep depth | Graph BFS depth | Import chain follow, 2 levels deep |

Notice when running without graph: "Running without graph (run `harness scan` to enable hotspot scoring and coupling analysis)"

Impact on tiers: Without the graph, Tier 1 hotspot detection is degraded. Hotspot scoring falls back to churn-only (no complexity multiplication). This limitation is documented in the performance report output.
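The churn-only fallback can be illustrated with a short sketch. The `hotspotScore` and `top5PercentCutoff` helpers and their inputs are hypothetical, assumed here for illustration; harness computes hotspot scores internally.

```typescript
interface FileStats {
  churn: number;        // commits touching the file
  complexity?: number;  // available only when graph/structural data is present
}

// With the graph: churn x complexity. Without it: churn-only fallback,
// which is why Tier 1 hotspot detection is degraded in that mode.
function hotspotScore(s: FileStats): number {
  return s.complexity !== undefined ? s.churn * s.complexity : s.churn;
}

// The Tier 1 hotspot gate triggers for scores in the top 5%; this returns
// the smallest score that still falls inside that top slice.
function top5PercentCutoff(scores: number[]): number {
  const sorted = [...scores].sort((a, b) => b - a);
  const k = Math.max(1, Math.ceil(sorted.length * 0.05));
  return sorted[k - 1];
}
```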
### Phase 2: BENCHMARK — Runtime Performance

This phase runs only when `.bench.ts` files exist in the project. If none are found, skip to Phase 3.

1. Check baseline lock-in. Before running benchmarks, verify baselines are kept in sync:
   - List all `.bench.ts` files changed in this PR: `git diff --name-only | grep '.bench.ts'`
   - If any `.bench.ts` files are new or modified:
     - Check if `.harness/perf/baselines.json` is also modified in this PR.
     - If NOT modified: flag a Tier 2 warning: "Benchmark files changed but baselines not updated. Run `harness perf baselines update` and commit the result."
     - If modified: verify that the updated baselines include entries for all changed benchmarks.
   - If no `.bench.ts` files changed: skip this check.
   - This check also runs standalone via the `--check-baselines` flag.
2. Check for benchmark files. Scan the project for `*.bench.ts` files. If none exist, skip this phase entirely.
3. Verify a clean working tree. Run `git status --porcelain`. If there are uncommitted changes, STOP. Benchmarks on dirty trees produce unreliable results.
4. Run benchmarks. Execute `harness perf bench` to run all benchmark suites.
5. Load baselines. Read `.harness/perf/baselines.json` for previous benchmark results. If no baselines exist, treat this as a baseline-capture run.
6. Compare results against baselines using the `RegressionDetector`:
   - Calculate the percentage change for each benchmark
   - Apply the noise margin (default: 3%) before flagging regressions
   - Distinguish between critical-path and non-critical-path benchmarks
7. Resolve critical paths via the `CriticalPathResolver`:
   - Check `@perf-critical` annotations in source files
   - Check graph fan-in data (functions called by many consumers)
   - Functions in the critical path set have stricter thresholds
8. Flag regressions by tier:
   - Tier 1: >5% regression on a critical-path benchmark
   - Tier 2: >10% regression on a non-critical-path benchmark
   - Tier 3: >5% regression on a non-critical-path benchmark (after the noise margin is applied)
9. If this is a baseline-capture run, report results without regression comparison. Recommend running `harness perf baselines update` to persist.
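The noise-margin and tiering rules in the steps above can be sketched as follows. This is a minimal illustration under the thresholds listed here; the `BenchResult` and `regressionTier` names are assumptions, not the harness `RegressionDetector` API.

```typescript
interface BenchResult {
  name: string;
  baselineMs: number;
  currentMs: number;
  critical: boolean; // from @perf-critical annotation or graph fan-in
}

const NOISE_MARGIN_PCT = 3; // default noise margin

// Returns the violation tier, or null when the change is within the noise
// margin or under every threshold.
function regressionTier(r: BenchResult): 1 | 2 | 3 | null {
  const deltaPct = ((r.currentMs - r.baselineMs) / r.baselineMs) * 100;
  if (deltaPct <= NOISE_MARGIN_PCT) return null; // within noise margin
  if (r.critical && deltaPct > 5) return 1;      // critical path, >5%
  if (!r.critical && deltaPct > 10) return 2;    // non-critical, >10%
  if (!r.critical && deltaPct > 5) return 3;     // non-critical, >5%
  return null;
}
```

For example, the critical-path `parseDocument` case from the Examples section (4.2ms to 4.8ms, about +14.3%) exceeds both the noise margin and the 5% critical-path threshold, so it lands in Tier 1.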
### Phase 3: REPORT — Generate Performance Report

1. Format violations by tier. Present Tier 1 violations first (most severe), then Tier 2, then Tier 3. Each violation entry includes:
   - File path and function name
   - Metric name and current value
   - Threshold that was exceeded
   - Tier classification and gate impact
2. Show hotspot scores for top functions if knowledge graph data is available:
   - Query the graph for functions with high churn + high fan-in
   - Rank by composite hotspot score
   - Flag any hotspots that also have performance violations
3. Show a benchmark regression summary if benchmarks ran:
   - Table of benchmark name, baseline, current, delta percentage, tier
   - Highlight critical-path benchmarks with a marker
   - Show the noise margin and whether the regression exceeds it
4. Recommend specific actions for each Tier 1 and Tier 2 violation:
   - For high complexity: suggest extract-method or strategy-pattern refactoring
   - For high coupling: suggest interface extraction or dependency inversion
   - For benchmark regressions: suggest profiling the specific code path
   - For size budget violations: suggest module decomposition
5. Output the report in structured markdown suitable for PR comments or CI output.
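A minimal sketch of the tier-ordered layout, assuming a simple violation record. The real report generator also includes the hotspot and benchmark sections described above; the names here are illustrative.

```typescript
interface ReportViolation {
  tier: 1 | 2 | 3;
  file: string;
  fn: string;
  metric: string;
  value: number;
  threshold: number;
}

// Emits markdown with Tier 1 first, then Tier 2, then Tier 3, skipping
// empty tiers, so the most severe findings lead the PR comment.
function formatReport(violations: ReportViolation[]): string {
  const lines: string[] = ["# Performance Report", ""];
  for (const tier of [1, 2, 3] as const) {
    const vs = violations.filter(v => v.tier === tier);
    if (vs.length === 0) continue;
    const label =
      tier === 1 ? "TIER 1 VIOLATIONS" : tier === 2 ? "TIER 2 WARNINGS" : "TIER 3 INFO";
    lines.push(`## ${label} (${vs.length})`);
    for (const v of vs) {
      lines.push(`- ${v.file}:${v.fn} -- ${v.metric} ${v.value} > ${v.threshold}`);
    }
    lines.push("");
  }
  return lines.join("\n");
}
```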
### Phase 4: ENFORCE — Apply Gate Decisions

1. Tier 1 violations present — FAIL. Block commit and merge. List all Tier 1 violations with their locations and values. The developer must fix these before proceeding.
2. Tier 2 violations present, no Tier 1 — WARN. Allow commit but block merge until addressed. List all Tier 2 violations. These must be resolved before the PR can be merged.
3. Only Tier 3 or no violations — PASS. Proceed normally. Log Tier 3 violations as informational notes.
4. Record the gate decision in `.harness/state.json` under a `perfGate` key:

   ```json
   {
     "perfGate": {
       "result": "pass|warn|fail",
       "tier1Count": 0,
       "tier2Count": 0,
       "tier3Count": 0,
       "timestamp": "ISO-8601"
     }
   }
   ```

5. Exit with the appropriate code: 0 for pass, 0 for warn (with warning output), 1 for fail.
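The gate decision and `perfGate` record can be sketched as follows. The JSON shape follows the document; the function names are assumptions and the actual write to `.harness/state.json` is omitted.

```typescript
interface PerfGate {
  result: "pass" | "warn" | "fail";
  tier1Count: number;
  tier2Count: number;
  tier3Count: number;
  timestamp: string; // ISO-8601
}

// Tier 1 fails, Tier 2 (without Tier 1) warns, anything else passes.
function gateDecision(t1: number, t2: number, t3: number): PerfGate {
  const result = t1 > 0 ? "fail" : t2 > 0 ? "warn" : "pass";
  return {
    result,
    tier1Count: t1,
    tier2Count: t2,
    tier3Count: t3,
    timestamp: new Date().toISOString(),
  };
}

// Exit code is 1 only on fail; warn still exits 0 but prints warnings.
function exitCode(g: PerfGate): 0 | 1 {
  return g.result === "fail" ? 1 : 0;
}
```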
Harness Integration
— Primary command for all performance checks. Runs structural and coupling analysis.harness check-perf
— Run only structural complexity checks.harness check-perf --structural
— Run only coupling analysis.harness check-perf --coupling
— Run benchmarks only. Requires clean working tree.harness perf bench
— View current benchmark baselines.harness perf baselines show
— Persist current benchmark results as new baselines.harness perf baselines update
-- Verify baseline file is updated when benchmarks change. Runs the baseline lock-in check standalone.harness perf --check-baselines
— View the current critical path set and how it was determined.harness perf critical-paths
— Run after enforcement to verify overall project health.harness validate
— Refresh knowledge graph for accurate hotspot scoring.harness graph scan
## Tier Classification
| Tier | Severity | Gate | Examples |
|---|---|---|---|
| 1 | error | Block commit | Cyclomatic complexity > 15, >5% regression on critical path, hotspot in top 5%, circular dependency |
| 2 | warning | Block merge | Complexity > 10, nesting > 4, >10% regression elsewhere, fan-out > 10, size budget exceeded |
| 3 | info | None | File length > 300 lines, fan-in > 20, transitive depth > 30, >5% non-critical regression |
## Success Criteria

- All Tier 1 violations are resolved before proceeding
- The performance report follows the structured format with tier classification
- Benchmark regressions are compared against the noise margin before flagging
- The gate decision is recorded in state
- `harness validate` passes after enforcement
## Examples

### Example: PR with High Complexity Function

```text
Phase 1: ANALYZE
  harness check-perf --structural
  Result: processOrderBatch() in src/orders/processor.ts has cyclomatic complexity 18 (Tier 1, threshold: 15)

Phase 2: BENCHMARK — skipped (Tier 1 violation found)

Phase 3: REPORT
  TIER 1 VIOLATIONS (1):
  - src/orders/processor.ts:processOrderBatch — complexity 18 > 15
  Recommendation: Extract validation and transformation into separate functions

Phase 4: ENFORCE
  Result: FAIL — 1 Tier 1 violation. Commit blocked.
```

### Example: Benchmark Regression on Critical Path

```text
Phase 1: ANALYZE — no structural violations

Phase 2: BENCHMARK
  harness perf bench
  Baseline: parseDocument 4.2ms, current: 4.8ms (+14.3%)
  parseDocument is @perf-critical — Tier 1 threshold applies (>5%)

Phase 3: REPORT
  TIER 1 VIOLATIONS (1):
  - parseDocument: 14.3% regression on critical path (threshold: 5%)
  Recommendation: Profile parseDocument to identify the regression source

Phase 4: ENFORCE
  Result: FAIL — 1 Tier 1 violation. Merge blocked.
```

### Example: Clean PR with Minor Warnings

```text
Phase 1: ANALYZE
  harness check-perf --structural --coupling
  Result: src/utils/formatter.ts has 320 lines (Tier 3, threshold: 300)

Phase 2: BENCHMARK
  harness perf bench — all within noise margin

Phase 3: REPORT
  TIER 3 INFO (1):
  - src/utils/formatter.ts: 320 lines > 300 line threshold
  No Tier 1 or Tier 2 violations.

Phase 4: ENFORCE
  Result: PASS — no blocking violations.
```
## Rationalizations to Reject
| Rationalization | Reality |
|---|---|
| "The cyclomatic complexity is 16 but the function is straightforward, so I can override the Tier 1 threshold" | Tier 1 violations are non-negotiable blockers. No merge with Tier 1 performance violations. If a threshold needs adjustment, reconfigure with documented justification. |
| "The benchmark regression is only 6% and it is probably just noise" | The noise margin (default 3%) is applied before flagging. A 6% regression on a perf-critical path exceeds the Tier 1 threshold even after noise consideration. |
| "The working tree has a small uncommitted change but it should not affect benchmark results" | No running benchmarks with a dirty working tree. Uncommitted changes invalidate benchmark results. |
| "I will update the baselines to match the new performance numbers rather than fixing the regression" | Baselines must come from fresh runs against committed code. Silently moving the goalposts defeats the purpose of performance gates. |
## Gates
- No ignoring Tier 1 violations. They must be fixed or the threshold must be reconfigured (with documented justification).
- No running benchmarks with dirty working tree. Uncommitted changes invalidate benchmark results.
- No updating baselines without running benchmarks. Baselines must come from fresh runs against committed code.
- No suppressing violations without documentation. If a threshold is relaxed, the rationale must be documented in the project configuration.
## Escalation

- When Tier 1 violations cannot be fixed within the current task: propose refactoring the function into smaller units, or raising the threshold with a documented justification. Do not silently skip the violation.
- When benchmark results are noisy or inconsistent: increase warmup iterations, pin the runtime environment, or run benchmarks in isolation. Report the noise level so the developer can make an informed decision.
- When critical path detection seems wrong: check `@perf-critical` annotations in source files and verify graph fan-in thresholds. The critical path set can be overridden in `.harness/perf/critical-paths.json`.
- When a violation is a false positive: document it with a `// perf-ignore: <reason>` comment and add the exception to `.harness/perf/exceptions.json`.