Claude-code-minoan test-harness-auditor
Audit a repo's test, lint, type-check, static analysis, build, and debug infrastructure for AI coding agents. Generate scored reports and optimized configs for the lint-on-write hook. Triggers on audit tests, test harness, lint setup, check test infrastructure, entering a new repo.
git clone https://github.com/tdimino/claude-code-minoan
T=$(mktemp -d) && git clone --depth=1 https://github.com/tdimino/claude-code-minoan "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/core-development/test-harness-auditor" ~/.claude/skills/tdimino-claude-code-minoan-test-harness-auditor && rm -rf "$T"
skills/core-development/test-harness-auditor/SKILL.md
Test Harness Auditor
Audit any repo's feedback infrastructure across six layers and generate optimized configs for AI coding agents.
When to Run
- Entering a new repo with no .claude/lint-rules.json
- User asks to audit tests, lint setup, or agent infrastructure
- After cloning a repo to check what feedback loops exist
- Periodically to catch configuration drift
Two-Phase Workflow
Phase 1: Audit (read-only)
Run the audit script to scan the current repo:
uv run ~/.claude/skills/test-harness-auditor/scripts/audit.py
Or target a specific directory:
uv run ~/.claude/skills/test-harness-auditor/scripts/audit.py /path/to/repo
For machine-readable output (consumed by Phase 2):
uv run ~/.claude/skills/test-harness-auditor/scripts/audit.py --json > /tmp/audit.json
To save a snapshot for drift detection (tracks score changes over time):
uv run ~/.claude/skills/test-harness-auditor/scripts/audit.py --save
Combine flags: --json --save saves the snapshot and outputs JSON. On subsequent --save runs, the report includes a drift section showing score regressions, config changes, and residue file changes.
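The score-regression part of the drift section can be pictured as a diff between two snapshots. The sketch below is only an illustration: the snapshot shape (a mapping of layer name to 0-3 score) and the function name score_regressions are assumptions, not the format audit.py actually writes.

```python
# Hypothetical snapshot shape: {"test": 2, "lint": 3, ...}.
# The real snapshot format saved by audit.py is not documented here.
LAYERS = ("test", "lint", "type-check", "SA", "build", "debug")


def score_regressions(previous: dict[str, int], current: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {layer: (old_score, new_score)} for every layer whose score dropped."""
    regressions = {}
    for layer in LAYERS:
        old = previous.get(layer, 0)  # a missing layer is treated as score 0
        new = current.get(layer, 0)
        if new < old:
            regressions[layer] = (old, new)
    return regressions


previous = {"test": 2, "lint": 3, "type-check": 2, "SA": 1, "build": 2, "debug": 1}
current = {"test": 2, "lint": 1, "type-check": 2, "SA": 0, "build": 2, "debug": 1}

for layer, (old, new) in score_regressions(previous, current).items():
    print(f"{layer}: {old} -> {new}")
```

Only drops are reported here; improvements and unchanged layers are ignored, which matches the "score regressions" wording but may be narrower than what the real report shows.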
The script produces a structured Markdown report (or JSON with --json) with:
- Stack summary: detected language, frameworks, package manager, actual scripts from package.json
- Scorecard: 0-3 score for each of the six layers (test, lint, type-check, SA, build, debug)
- Findings: per-layer details on what was detected
- Debugging residue: files matching *_v2.*, *_backup.*, *_fixed.* patterns
- Recommendations: prioritized by impact on agent feedback quality (P0-P3)
Present the report to the user. Ask which recommendations to implement before proceeding to Phase 2.
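The debugging-residue check in the report can be approximated with plain glob matching on file names. This is a minimal sketch using the three patterns listed in the report description; the helper names is_residue and find_residue are hypothetical, and audit.py may walk the tree or filter directories differently.

```python
from fnmatch import fnmatch
from pathlib import Path

# Residue patterns named in the report description.
RESIDUE_PATTERNS = ("*_v2.*", "*_backup.*", "*_fixed.*")


def is_residue(filename: str) -> bool:
    """True if the bare file name matches any residue glob."""
    return any(fnmatch(filename, pattern) for pattern in RESIDUE_PATTERNS)


def find_residue(root: str) -> list[Path]:
    """Recursively collect residue files under root, sorted by path.

    Note: this naive walk also descends into directories such as .git or
    node_modules; a real scanner would probably skip them.
    """
    return sorted(p for p in Path(root).rglob("*") if p.is_file() and is_residue(p.name))


for name in ("parser_v2.py", "utils_backup.ts", "handler_fixed.rs", "parser.py"):
    print(name, is_residue(name))
```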
Phase 1.5: Convention Extraction (optional)
Extract "never X"/"always Y" constraints from CLAUDE.md into candidate lint rules:
uv run ~/.claude/skills/test-harness-auditor/scripts/extract_conventions.py
Outputs JSON with candidate lint-rules.json entries derived from project constraints. Present candidates to the user for approval before merging.
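To make the idea of convention extraction concrete, here is one way "never X"/"always Y" lines could be pulled out of a CLAUDE.md file. It is a sketch under assumptions: the regex, the output keys (kind, text), and the function name are all hypothetical, and extract_conventions.py may use a different heuristic and emit a different JSON shape.

```python
import re

# Matches lines (optionally bulleted) that start with "never" or "always".
CONVENTION_RE = re.compile(r"^\s*(?:[-*]\s+)?(never|always)\b\s+(.+?)\s*$", re.IGNORECASE)


def extract_conventions(markdown: str) -> list[dict[str, str]]:
    """Return one candidate per 'never ...' / 'always ...' line."""
    candidates = []
    for line in markdown.splitlines():
        match = CONVENTION_RE.match(line)
        if match:
            candidates.append(
                {"kind": match.group(1).lower(), "text": match.group(2).rstrip(".")}
            )
    return candidates


sample = """# Rules
- Never use console.log in src/
- Always run tests before committing.
Some unrelated prose.
"""

for candidate in extract_conventions(sample):
    print(candidate)
```

Turning a candidate such as "use console.log in src/" into an actual grep pattern still needs human judgment, which is why the candidates go to the user for approval before merging.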
Phase 2: Config Generation (after user approval)
Run the generation script (optionally with audit JSON for accurate commands):
uv run ~/.claude/skills/test-harness-auditor/scripts/generate.py --audit /tmp/audit.json
Or without audit data (re-detects stack):
uv run ~/.claude/skills/test-harness-auditor/scripts/generate.py
When --audit is used, generate.py uses actual commands from package.json (vitest, playwright, biome, etc.) instead of generic templates, and detects separate E2E vs unit test runners.
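As an illustration of separating E2E from unit test commands, the sketch below classifies package.json scripts by the runner named in each command. The substring heuristic, the runner tuples, and the function name are assumptions made for this example; generate.py's actual detection logic is not shown in this document.

```python
import json

# Runner names taken from the examples above; the lists are illustrative only.
E2E_RUNNERS = ("playwright",)
UNIT_RUNNERS = ("vitest",)


def classify_test_scripts(package_json: str) -> dict[str, dict[str, str]]:
    """Split package.json scripts into unit and E2E test commands."""
    scripts = json.loads(package_json).get("scripts", {})
    result: dict[str, dict[str, str]] = {"unit": {}, "e2e": {}}
    for name, command in scripts.items():
        if any(runner in command for runner in E2E_RUNNERS):
            result["e2e"][name] = command
        elif any(runner in command for runner in UNIT_RUNNERS):
            result["unit"][name] = command
    return result


package_json = json.dumps(
    {"scripts": {"test": "vitest run", "test:e2e": "playwright test", "lint": "biome check ."}}
)
print(classify_test_scripts(package_json))
```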
This produces three outputs:
- .claude/lint-rules.json: custom grep-based rules for the lint-on-write hook
  - Stack-specific rules (security, debugging residue, error boundaries, observability)
  - Auto-includes matching rule packs from rule-library/ (react, rust-workspace, python-cli; functional-ts is opt-in only)
  - Merges with existing config if present (preserves user customizations)
  - Tagged rules (_tag field) enable idempotent re-runs
- CLAUDE.md testing section: test/lint/typecheck/build/SA commands
  - Follows claude-md-manager conventions (command-first, concise)
  - Section-aware merge: when an existing CLAUDE.md is found, surgically replaces only the ## Commands and ## Testing sections, preserving all other content
  - Present as a proposal; do not overwrite existing CLAUDE.md content
- Hook recommendations: which PostToolUse hooks to enable
  - lint-on-write (primary), test-on-fix, type-check-on-write
For each generated config, present it to the user and ask for approval before writing.
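The combination of "merges with existing config" and "tagged rules enable idempotent re-runs" can be sketched as a merge keyed on _tag. This is an assumed implementation, not generate.py's code: the function name merge_rules and the pattern field in the sample rules are hypothetical; only the _tag field comes from the description above.

```python
def merge_rules(existing: list[dict], generated: list[dict]) -> list[dict]:
    """Merge generated rules into an existing rule list.

    - Rules without a _tag are user customizations and are kept untouched.
    - Rules whose _tag matches a generated rule are replaced in place.
    - Generated rules with unseen tags are appended.
    """
    generated_by_tag = {rule["_tag"]: rule for rule in generated}
    merged = []
    seen = set()
    for rule in existing:
        tag = rule.get("_tag")
        if tag in generated_by_tag:
            merged.append(generated_by_tag[tag])
            seen.add(tag)
        else:
            merged.append(rule)
    merged.extend(rule for tag, rule in generated_by_tag.items() if tag not in seen)
    return merged


existing = [
    {"pattern": "TODO-HACK"},                     # untagged user rule
    {"_tag": "residue", "pattern": "debugger;"},  # from an earlier run
]
generated = [
    {"_tag": "residue", "pattern": r"debugger;|console\.log"},
    {"_tag": "security", "pattern": r"eval\("},
]

once = merge_rules(existing, generated)
twice = merge_rules(once, generated)
print(once == twice)
```

Running the merge a second time with the same generated rules leaves the list unchanged, which is the property that tagging is meant to provide.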
Scoring System
| Score | Meaning |
|---|---|
| 0 | Absent — agent is flying blind on this layer |
| 1 | Minimal — basic tool present but not configured for agents |
| 2 | Adequate — tool configured and runnable |
| 3 | Excellent — strict mode, mutation testing, or advanced config |
Six Assessment Layers
- Test suite: framework, runner command, coverage config, mutation testing
- Linting: standard linter, custom rules, agent-specific rules
- Type checking: type checker, strict mode, CI integration
- Static analysis: security scanners, complexity checkers, dependency audit
- Build/compilation: build command, incremental build, CI validation
- Debugger/REPL: debugger availability, REPL access
Integration
- lint-on-write hook: generated lint-rules.json is consumed by ~/.claude/hooks/lint-on-write.py (violations are severity-tiered: BLOCKING > HIGH > MEDIUM)
- claude-md-manager: generated CLAUDE.md sections follow its conventions (WHAT/WHY/HOW, command-first)
- agents-md-manager: for cross-agent compatibility, consider also generating AGENTS.md
- agnix: complementary tool that validates the agent config files themselves (385 rules covering stale paths, dead commands, and context rot in CLAUDE.md/AGENTS.md/SKILL.md). This skill validates the codebase infrastructure instead.
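The severity tiers used by the lint-on-write hook imply an ordering when violations are reported. A minimal sketch of that ordering follows; the violation dictionary shape and the function name are assumptions, and the real hook may group or format its output differently.

```python
# Lower number means more severe, following BLOCKING > HIGH > MEDIUM.
SEVERITY_ORDER = {"BLOCKING": 0, "HIGH": 1, "MEDIUM": 2}


def sort_violations(violations: list[dict]) -> list[dict]:
    """Sort violations most-severe first; unknown severities go last.

    sorted() is stable, so violations of equal severity keep their input order.
    """
    return sorted(
        violations,
        key=lambda v: SEVERITY_ORDER.get(v["severity"], len(SEVERITY_ORDER)),
    )


violations = [
    {"rule": "println-residue", "severity": "MEDIUM"},
    {"rule": "shell-true", "severity": "BLOCKING"},
    {"rule": "any-type", "severity": "HIGH"},
]

for violation in sort_violations(violations):
    print(violation["severity"], violation["rule"])
```

The rule names are borrowed from the rule library, but the severities assigned to them here are made up for the example.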
Rule Library
44 rules across 4 domain-specific packs in rule-library/. Auto-loaded packs are selected by generate.py based on detected frameworks and stack. Every pattern is a single-line regex detectable with grep -En.
| Pack | Matches | Rules | Highlights |
|---|---|---|---|
| react | react, next frameworks | 10 | disabled-exhaustive-deps, key-index, async-use-effect, disabled-hooks-rule, context-object-literal |
| rust-workspace | rust stack | 8 | expect-empty-msg, anyhow-in-lib, dbg-macro, panic-outside-tests, println-residue |
| python-cli | python stack | 13 | shell-true, insecure-deserialization, mutable-default-arg, requests-no-timeout, commonprefix |
| functional-ts | Opt-in only | 13 | array-mutation, sort-reverse, delete-operator, any-type, enum-declaration, namespace-declaration |
Opt-in packs
Packs with "_opt_in": true are never auto-loaded. The functional-ts pack enforces strict-FP immutability patterns (Open Souls paradigm). To use it, manually copy its rules into your project's .claude/lint-rules.json.
Exclusion fields
Rules support two exclusion mechanisms:
- exclude_paths: glob-matched against file paths (e.g. "*/bin/*", "*/main.rs"). Skips the file entirely before grep runs.
- exclude_patterns: regex-matched against grep output line text (e.g. "test", "// nosec"). Filters matched lines after grep runs.
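The order in which the two exclusion fields apply can be sketched as follows. This is an approximation under stated assumptions: Python's re module stands in for grep -E, the exclusion regexes are tested against the source line rather than the raw grep output line, and the pattern key and scan_file name are hypothetical. Only exclude_paths and exclude_patterns are taken from the description above.

```python
import re
from fnmatch import fnmatch


def scan_file(path: str, text: str, rule: dict) -> list[tuple[int, str]]:
    """Return (line_number, line) hits for one rule, honoring both exclusions."""
    # exclude_paths: glob match on the file path, checked before any scanning.
    if any(fnmatch(path, glob) for glob in rule.get("exclude_paths", [])):
        return []

    pattern = re.compile(rule["pattern"])
    excludes = [re.compile(p) for p in rule.get("exclude_patterns", [])]

    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not pattern.search(line):
            continue
        # exclude_patterns: regex match on the matched line, applied after matching.
        if any(exclude.search(line) for exclude in excludes):
            continue
        hits.append((number, line))
    return hits


rule = {
    "pattern": r"dbg!\(",
    "exclude_paths": ["*/bin/*", "*/main.rs"],
    "exclude_patterns": ["test", "// nosec"],
}
source = "\n".join(
    [
        "let x = dbg!(compute());",
        "let y = dbg!(1); // nosec",
        "fn test_it() { dbg!(2); }",
        "let z = 3;",
    ]
)

print(scan_file("crate/src/lib.rs", source, rule))
print(scan_file("crate/src/main.rs", source, rule))
```

Path exclusion is the cheaper check because it avoids reading the file at all, while pattern exclusion lets a rule stay active in a file and ignore individual lines.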
To add a custom rule pack, create a JSON file in rule-library/ with _frameworks (list) and/or _stack (string) matching fields, plus a rules array. Set "_opt_in": true to prevent auto-loading. Pack rules use a pack: prefix in _tag for dedup. See rule-library/INDEX.md for the full inventory.
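A custom pack and the auto-load decision might look like the sketch below. The field names _frameworks, _stack, _opt_in, rules, and the pack: tag prefix come from the description above; everything else is an assumption for illustration, including the django framework name, the rule fields pattern, message, and severity, the exact tag format after pack:, and the should_auto_load function. Check rule-library/INDEX.md and an existing pack for the real schema.

```python
import json

# Hypothetical custom pack; written to rule-library/ as a JSON file in practice.
custom_pack = {
    "_frameworks": ["django"],  # hypothetical framework name
    "_stack": "python",
    "_opt_in": False,           # set to True to prevent auto-loading
    "rules": [
        {
            "_tag": "pack:django-extras:raw-sql",  # only the "pack:" prefix is documented
            "pattern": r"\.raw\(",                 # assumed field name
            "message": "Avoid raw SQL queries",    # assumed field name
            "severity": "HIGH",                    # assumed field name
        }
    ],
}


def should_auto_load(pack: dict, frameworks: set[str], stack: str) -> bool:
    """Decide whether a pack is auto-loaded for the detected frameworks and stack."""
    if pack.get("_opt_in", False):
        return False  # opt-in packs are never auto-loaded
    framework_hit = bool(frameworks & set(pack.get("_frameworks", [])))
    stack_hit = "_stack" in pack and pack["_stack"] == stack
    return framework_hit or stack_hit


print(json.dumps(custom_pack, indent=2))
print(should_auto_load(custom_pack, {"django"}, "go"))
print(should_auto_load(custom_pack, set(), "rust"))
```

Whether a pack that declares both _frameworks and _stack needs one match or both is not stated in this document; the sketch assumes either match is enough.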
References
Load these on-demand when deeper context is needed:
- references/stack-profiles.md: per-stack detection rules and tool recommendations
- references/factory-lint-categories.md: 7 Factory.ai agent lint categories with grep patterns
- references/anti-patterns.md: 10 AI-specific anti-patterns with detection heuristics
Scope
- First-class stacks: JavaScript/TypeScript, Rust, Python, Go, Ruby
- Other stacks get basic detection with generic recommendations
- Does not write or modify test files
- Does not install tools (recommends what to install)
- Does not modify CI/CD pipelines