Harness Perf

Install

Source · Clone the upstream repo:

  git clone https://github.com/Intense-Visions/harness-engineering

Claude Code · Install into ~/.claude/skills/:

  T=$(mktemp -d) && git clone --depth=1 https://github.com/Intense-Visions/harness-engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/skills/claude-code/harness-perf" ~/.claude/skills/intense-visions-harness-engineering-harness-perf-9f062b && rm -rf "$T"

Manifest: agents/skills/claude-code/harness-perf/SKILL.md

Source content

Harness Perf

Performance enforcement and benchmark management. Tier-based gates block commits and merges based on complexity, coupling, and runtime regression severity.

When to Use

  • After code changes to verify performance hasn't degraded
  • On PRs to enforce performance budgets
  • For periodic performance audits
  • NOT for initial development (use harness-tdd for that)
  • NOT for brainstorming performance improvements (use harness-brainstorming)

Process

Iron Law

No merge with Tier 1 performance violations. No commit with cyclomatic complexity exceeding the error threshold.

Tier 1 violations are non-negotiable blockers. If a Tier 1 violation is detected, execution halts and the violation must be resolved before any further progress. Do not attempt workarounds.


Phase 1: ANALYZE — Structural and Coupling Checks

  1. Run structural checks. Execute harness check-perf --structural to compute complexity metrics for all changed files:

    • Cyclomatic complexity per function
    • Nesting depth per function
    • File length (lines of code)
    • Parameter count per function
  2. Run coupling checks. Execute harness check-perf --coupling to compute coupling metrics:

    • Fan-in and fan-out per module
    • Afferent and efferent coupling
    • Transitive dependency depth
    • Circular dependency detection
  3. Classify violations by tier:

    • Tier 1 (error, block commit): Cyclomatic complexity > 15, circular dependencies, hotspot in top 5%
    • Tier 2 (warning, block merge): Complexity > 10, nesting > 4, fan-out > 10, size budget exceeded
    • Tier 3 (info, no gate): File length > 300, fan-in > 20, transitive depth > 30
  4. If Tier 1 violations found, report them immediately and STOP. Do not proceed to benchmarks. The violations must be fixed first.

  5. If no violations found, proceed to Phase 2.
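The tier classification in step 3 can be sketched as a pure function over the structural metrics. This is an illustrative sketch only (the interface name, field names, and the subset of checks shown are assumptions, not the harness's actual API); the thresholds are the documented ones.

```typescript
// Illustrative sketch of Phase 1 tier classification (not the real
// harness API). Only a subset of the documented checks is shown.
interface StructuralMetrics {
  cyclomaticComplexity: number;
  nestingDepth: number;
  fanOut: number;
  fileLength: number;
  inCircularDependency: boolean;
  hotspotPercentile: number; // 5 or lower = hotspot in top 5%
}

// Returns 1, 2, or 3 for the violation tier, or 0 for no violation.
function classifyTier(m: StructuralMetrics): 0 | 1 | 2 | 3 {
  if (m.cyclomaticComplexity > 15 || m.inCircularDependency || m.hotspotPercentile <= 5) {
    return 1; // error: block commit
  }
  if (m.cyclomaticComplexity > 10 || m.nestingDepth > 4 || m.fanOut > 10) {
    return 2; // warning: block merge
  }
  if (m.fileLength > 300) {
    return 3; // info: no gate
  }
  return 0;
}
```

Note that a function with complexity 18 classifies as Tier 1 even if every other metric is clean, which is exactly the case that halts Phase 1 before benchmarks run.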


Graph Availability

Hotspot scoring and coupling analysis benefit from the knowledge graph but work without it.

Staleness sensitivity: Medium — auto-refresh if >10 commits stale. Hotspot scoring uses churn data, which does not change rapidly.

| Feature | With Graph | Without Graph |
|---|---|---|
| Hotspot scoring (churn x complexity) | GraphComplexityAdapter computes from graph nodes | git log --format="%H" -- <file> for per-file commit count; complexity from check-perf --structural output; multiply manually |
| Coupling ratio | GraphCouplingAdapter computes from graph edges | Parse import statements, count fan-out/fan-in per file |
| Critical path resolution | Graph inference (high fan-in) + @perf-critical annotations | @perf-critical annotations only; grep for decorator/comment |
| Transitive dep depth | Graph BFS depth | Follow import chains, 2 levels deep |

Notice when running without graph: "Running without graph (run harness scan to enable hotspot scoring and coupling analysis)"

Impact on tiers: Without graph, Tier 1 hotspot detection is degraded. Hotspot scoring falls back to churn-only (no complexity multiplication). This limitation is documented in the performance report output.
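The fallback can be pictured as a two-term score that loses one term. A minimal sketch (the function name and signature are assumptions; churn would come from per-file commit counts, complexity from check-perf --structural output):

```typescript
// Sketch of hotspot scoring degradation. With graph data the score is
// churn x complexity; without it, only churn is available, so two files
// with equal churn but very different complexity rank identically.
function hotspotScore(churn: number, complexity?: number): number {
  return complexity === undefined ? churn : churn * complexity;
}
```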


Phase 2: BENCHMARK — Runtime Performance

This phase runs only when .bench.ts files exist in the project. If none are found, skip to Phase 3.

  1. Check baseline lock-in. Before running benchmarks, verify baselines are kept in sync:

    • List all .bench.ts files changed in this PR: git diff --name-only | grep '\.bench\.ts$'
    • If any .bench.ts files are new or modified:
      • Check whether .harness/perf/baselines.json is also modified in this PR
      • If NOT modified: flag as a Tier 2 warning: "Benchmark files changed but baselines not updated. Run harness perf baselines update and commit the result."
      • If modified: verify the updated baselines include entries for all changed benchmarks
    • If no .bench.ts files changed: skip this check
    • This check also runs standalone via the --check-baselines flag
  2. Check for benchmark files. Scan the project for *.bench.ts files. If none exist, skip this phase entirely.

  3. Verify clean working tree. Run git status --porcelain. If there are uncommitted changes, STOP. Benchmarks on dirty trees produce unreliable results.

  4. Run benchmarks. Execute harness perf bench to run all benchmark suites.

  5. Load baselines. Read .harness/perf/baselines.json for previous benchmark results. If no baselines exist, treat this as a baseline-capture run.

  6. Compare results against baselines using the RegressionDetector:

    • Calculate the percentage change for each benchmark
    • Apply the noise margin (default: 3%) before flagging regressions
    • Distinguish between critical-path and non-critical-path benchmarks
  7. Resolve critical paths via the CriticalPathResolver:

    • Check @perf-critical annotations in source files
    • Check graph fan-in data (functions called by many consumers)
    • Functions in the critical path set have stricter thresholds
  8. Flag regressions by tier:

    • Tier 1: >5% regression on a critical-path benchmark
    • Tier 2: >10% regression on a non-critical-path benchmark
    • Tier 3: >5% regression on a non-critical-path benchmark (once it exceeds the noise margin)
  9. If this is a baseline-capture run, report results without regression comparison. Recommend running harness perf baselines update to persist them.
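Steps 6 through 8 reduce to a small amount of arithmetic. The sketch below mirrors the documented thresholds (3% noise margin, >5% Tier 1 on critical paths, >10% Tier 2 and >5% Tier 3 elsewhere); the function names are illustrative, not the actual RegressionDetector interface.

```typescript
// Sketch of the regression check: percent change, noise margin, then
// tier assignment. Thresholds match the documented tier table.
const NOISE_MARGIN = 3; // percent, default

function percentChange(baselineMs: number, currentMs: number): number {
  return ((currentMs - baselineMs) / baselineMs) * 100;
}

// Returns the violation tier (1/2/3) or 0 if within budget.
function regressionTier(deltaPct: number, criticalPath: boolean): 0 | 1 | 2 | 3 {
  if (deltaPct <= NOISE_MARGIN) return 0; // within noise, never flagged
  if (criticalPath && deltaPct > 5) return 1;
  if (!criticalPath && deltaPct > 10) return 2;
  if (!criticalPath && deltaPct > 5) return 3;
  return 0;
}
```

Note that a 6% regression is Tier 1 on a critical path but only Tier 3 elsewhere, which is why critical-path resolution (step 7) must happen before flagging.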


Phase 3: REPORT — Generate Performance Report

  1. Format violations by tier. Present Tier 1 violations first (most severe), then Tier 2, then Tier 3. Each violation entry includes:

    • File path and function name
    • Metric name and current value
    • Threshold that was exceeded
    • Tier classification and gate impact
  2. Show hotspot scores for top functions if knowledge graph data is available:

    • Query the graph for functions with high churn + high fan-in
    • Rank by composite hotspot score
    • Flag any hotspots that also have performance violations
  3. Show benchmark regression summary if benchmarks ran:

    • Table of benchmark name, baseline, current, delta percentage, tier
    • Highlight critical-path benchmarks with a marker
    • Show noise margin and whether the regression exceeds it
  4. Recommend specific actions for each Tier 1 and Tier 2 violation:

    • For high complexity: suggest extract-method or strategy pattern refactoring
    • For high coupling: suggest interface extraction or dependency inversion
    • For benchmark regressions: suggest profiling the specific code path
    • For size budget violations: suggest module decomposition
  5. Output the report in structured markdown format suitable for PR comments or CI output.
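As a sketch of how step 1's per-violation entries might render to markdown (the Violation shape and layout here are illustrative assumptions; the harness defines its own report format):

```typescript
// Illustrative formatter: violations grouped by tier, most severe
// first, as markdown suitable for a PR comment or CI output.
interface Violation {
  file: string;
  fn: string;
  metric: string;
  value: number;
  threshold: number;
  tier: 1 | 2 | 3;
}

function formatReport(violations: Violation[]): string {
  const lines: string[] = [];
  for (const tier of [1, 2, 3] as const) {
    const group = violations.filter((v) => v.tier === tier);
    if (group.length === 0) continue;
    lines.push(`### Tier ${tier} (${group.length})`);
    for (const v of group) {
      lines.push(`- ${v.file}:${v.fn}: ${v.metric} ${v.value} > ${v.threshold}`);
    }
  }
  return lines.join("\n");
}
```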


Phase 4: ENFORCE — Apply Gate Decisions

  1. Tier 1 violations present — FAIL. Block commit and merge. List all Tier 1 violations with their locations and values. The developer must fix these before proceeding.

  2. Tier 2 violations present, no Tier 1 — WARN. Allow commit but block merge until addressed. List all Tier 2 violations. These must be resolved before the PR can be merged.

  3. Only Tier 3 or no violations — PASS. Proceed normally. Log Tier 3 violations as informational notes.

  4. Record the gate decision in .harness/state.json under a perfGate key:

    {
      "perfGate": {
        "result": "pass|warn|fail",
        "tier1Count": 0,
        "tier2Count": 0,
        "tier3Count": 0,
        "timestamp": "ISO-8601"
      }
    }
    
  5. Exit with the appropriate code: 0 for pass or warn (warn still emits warning output), 1 for fail.
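The whole gate decision is a pure function of the three tier counts. A minimal sketch (the decideGate name, and the return shape beyond perfGate, are assumptions):

```typescript
// Sketch of the Phase 4 gate decision: tier counts in, gate result and
// exit code out. perfGate matches the shape recorded in .harness/state.json.
type GateResult = "pass" | "warn" | "fail";

function decideGate(tier1: number, tier2: number, tier3: number) {
  const result: GateResult = tier1 > 0 ? "fail" : tier2 > 0 ? "warn" : "pass";
  return {
    perfGate: {
      result,
      tier1Count: tier1,
      tier2Count: tier2,
      tier3Count: tier3,
      timestamp: new Date().toISOString(),
    },
    exitCode: result === "fail" ? 1 : 0, // warn still exits 0, with output
  };
}
```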


Harness Integration

  • harness check-perf — Primary command for all performance checks. Runs structural and coupling analysis.
  • harness check-perf --structural — Run only structural complexity checks.
  • harness check-perf --coupling — Run only coupling analysis.
  • harness perf bench — Run benchmarks only. Requires a clean working tree.
  • harness perf baselines show — View current benchmark baselines.
  • harness perf baselines update — Persist current benchmark results as new baselines.
  • harness perf --check-baselines — Verify the baseline file is updated when benchmarks change. Runs the baseline lock-in check standalone.
  • harness perf critical-paths — View the current critical path set and how it was determined.
  • harness validate — Run after enforcement to verify overall project health.
  • harness graph scan — Refresh the knowledge graph for accurate hotspot scoring.

Tier Classification

| Tier | Severity | Gate | Examples |
|---|---|---|---|
| 1 | error | Block commit | Cyclomatic complexity > 15, >5% regression on critical path, hotspot in top 5%, circular dependency |
| 2 | warning | Block merge | Complexity > 10, nesting > 4, >10% regression elsewhere, fan-out > 10, size budget exceeded |
| 3 | info | None | File length > 300 lines, fan-in > 20, transitive depth > 30, >5% non-critical regression |

Success Criteria

  • All Tier 1 violations are resolved before proceeding
  • Performance report follows structured format with tier classification
  • Benchmark regressions are compared against noise margin before flagging
  • Gate decision is recorded in state
  • harness validate
    passes after enforcement

Rationalizations to Reject

These are common rationalizations that sound reasonable but lead to incorrect results. When you catch yourself thinking any of these, stop and follow the documented process instead.

| Rationalization | Why It Is Wrong |
|---|---|
| "The cyclomatic complexity is 16 but the function is straightforward, so I can override the Tier 1 threshold" | Tier 1 violations are non-negotiable blockers. No merge with Tier 1 performance violations. If a threshold needs adjustment, reconfigure it with documented justification. |
| "The benchmark regression is only 6% and it is probably just noise" | The noise margin (default 3%) is applied before flagging. A 6% regression on a perf-critical path exceeds the Tier 1 threshold even after noise is accounted for. |
| "The working tree has a small uncommitted change but it should not affect benchmark results" | No running benchmarks with a dirty working tree. Uncommitted changes invalidate benchmark results. |
| "I will update the baselines to match the new performance numbers rather than fixing the regression" | Baselines must come from fresh runs against committed code. Silently moving the goalposts defeats the purpose of performance gates. |

Examples

Example: PR with High Complexity Function

Phase 1: ANALYZE
  harness check-perf --structural
  Result: processOrderBatch() in src/orders/processor.ts has cyclomatic complexity 18 (Tier 1, threshold: 15)

Phase 2: BENCHMARK — skipped (Tier 1 violation found)

Phase 3: REPORT
  TIER 1 VIOLATIONS (1):
    - src/orders/processor.ts:processOrderBatch — complexity 18 > 15
      Recommendation: Extract validation and transformation into separate functions

Phase 4: ENFORCE
  Result: FAIL — 1 Tier 1 violation. Commit blocked.

Example: Benchmark Regression on Critical Path

Phase 1: ANALYZE — no structural violations

Phase 2: BENCHMARK
  harness perf bench
  Baseline: parseDocument 4.2ms, current: 4.8ms (+14.3%)
  parseDocument is @perf-critical — Tier 1 threshold applies (>5%)

Phase 3: REPORT
  TIER 1 VIOLATIONS (1):
    - parseDocument: 14.3% regression on critical path (threshold: 5%)
      Recommendation: Profile parseDocument to identify the regression source

Phase 4: ENFORCE
  Result: FAIL — 1 Tier 1 violation. Merge blocked.

Example: Clean PR with Minor Warnings

Phase 1: ANALYZE
  harness check-perf --structural --coupling
  Result: src/utils/formatter.ts has 320 lines (Tier 3, threshold: 300)

Phase 2: BENCHMARK
  harness perf bench — all within noise margin

Phase 3: REPORT
  TIER 3 INFO (1):
    - src/utils/formatter.ts: 320 lines > 300 line threshold
  No Tier 1 or Tier 2 violations.

Phase 4: ENFORCE
  Result: PASS — no blocking violations.

Gates

  • No ignoring Tier 1 violations. They must be fixed or the threshold must be reconfigured (with documented justification).
  • No running benchmarks with dirty working tree. Uncommitted changes invalidate benchmark results.
  • No updating baselines without running benchmarks. Baselines must come from fresh runs against committed code.
  • No suppressing violations without documentation. If a threshold is relaxed, the rationale must be documented in the project configuration.

Escalation

  • When Tier 1 violations cannot be fixed within the current task: Propose refactoring the function into smaller units, or raising the threshold with a documented justification. Do not silently skip the violation.
  • When benchmark results are noisy or inconsistent: Increase warmup iterations, pin the runtime environment, or run benchmarks in isolation. Report the noise level so the developer can make an informed decision.
  • When critical path detection seems wrong: Check @perf-critical annotations in source files and verify graph fan-in thresholds. The critical path set can be overridden in .harness/perf/critical-paths.json.
  • When a violation is a false positive: Document it with a // perf-ignore: <reason> comment and add the exception to .harness/perf/exceptions.json.
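The exception file's schema is not specified above; as a purely hypothetical illustration (every field name here is an assumption, not the harness's actual format), an entry pairing a perf-ignore comment with its documented reason might look like:

```json
{
  "exceptions": [
    {
      "file": "src/legacy/parser.ts",
      "function": "tokenize",
      "metric": "cyclomaticComplexity",
      "reason": "Generated state machine; extract-method refactor planned"
    }
  ]
}
```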