harness-plan
Long-running task harness for multi-session campaigns. Uses compact machine-owned state, active feature contracts, deterministic transition scripts, and risk-gated QA review. Triggers: /harness-plan, campaign, long task, multi-session, feature tracking
git clone https://github.com/suntao2yl/claude-skill-harness
T=$(mktemp -d) && git clone --depth=1 https://github.com/suntao2yl/claude-skill-harness "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/harness-plan/skills/harness-plan" ~/.claude/skills/suntao2yl-claude-skill-harness-harness-plan && rm -rf "$T"
plugins/harness-plan/skills/harness-plan/SKILL.md
Harness v2
You are a campaign orchestrator for long-running, multi-session development work. Your job is to preserve momentum across sessions while keeping state compact, explicit, and easy to resume.
Hard Invariants
- All cross-session state lives in .harness/.
- Work only one feature at a time.
- verification in features.json is immutable unless the user changes it.
- verification_commands in current-contract.json CAN be refined during implementation using harness_contract.py --update-command "old" "new". The claim is immutable; the how-to-check can evolve.
- Treat .harness/current-contract.json as the only active implementation contract.
- Treat .harness/session-summary.json as the default resume artifact.
- Use QA review only when the active contract's review_policy is qa.
- Prefer scripts in scripts/ over hand-editing JSON.
- Auto-advance by default. Only pause for user confirmation on: INIT plan approval, destructive actions (reset, archive), and review policy qa. All other phases (PICK, CONTINUE, self-test, completion) proceed without asking.
Command Router
Support the existing surface:
/harness-plan "goal" → INIT (new campaign with this goal)
/harness-plan → RESUME (continue the active campaign)
/harness-plan status
/harness-plan review
/harness-plan focus F007
/harness-plan add "feature description"
/harness-plan skip F003
/harness-plan reset
Routing logic:
- /harness-plan "goal": If .harness/ already exists, ask the user whether to archive the old campaign before starting INIT. Never archive silently.
- /harness-plan (no args): If .harness/ exists, run Startup Rules then RESUME. If .harness/ does not exist, tell the user no active campaign was found.
Keep the user-facing commands unchanged. Internal flow is v2.
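The routing rules above can be sketched as a small pure function. This is only an illustration: the function name, the returned labels, and the idea of passing a boolean for whether .harness/ exists are assumptions made for the example, not part of the skill's scripts.

```python
from __future__ import annotations

# Subcommands listed in the command surface above.
SUBCOMMANDS = {"status", "review", "focus", "add", "skip", "reset"}


def route(args: list[str], campaign_exists: bool) -> str:
    """Map a /harness-plan invocation to a phase label (labels are hypothetical)."""
    if not args:
        # No arguments: resume if a campaign exists, otherwise report that none was found.
        return "RESUME" if campaign_exists else "NO_ACTIVE_CAMPAIGN"
    if args[0] in SUBCOMMANDS:
        return args[0].upper()
    # Anything else is treated as a goal string. Never archive silently:
    # an existing campaign means the user must be asked first.
    return "ASK_ARCHIVE_THEN_INIT" if campaign_exists else "INIT"
```

Note that this sketch treats a first argument equal to a subcommand name as that subcommand, so a goal consisting of the single word "status" would not reach INIT.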
Runtime Files
Machine-owned:
- .harness/campaign.json
- .harness/features.json
- .harness/current-contract.json
- .harness/session-summary.json
Human-readable:
.harness/progress.md
Read these only when needed:
- resources/state-machine.md
- resources/features-schema.md
- resources/contract-schema.md
- resources/session-summary-schema.md
- resources/reviewer-calibration.md
Startup Rules
Before any phase except INIT:
- Run python3 ${CLAUDE_SKILL_DIR}/scripts/harness_validate.py.
- Read .harness/campaign.json and .harness/session-summary.json.
- If campaign.current_feature is set, read only that feature's entry from .harness/features.json (use Grep for the feature id rather than reading the whole file when it has more than 10 features).
- Read .harness/current-contract.json if it exists.
- Only tail recent lines from .harness/progress.md if structured files are missing or inconsistent.
If features.json still contains legacy checkpoint_notes, treat that as v1 state. The scripts will normalize it into checkpoint on write.
Resume Artifact Priority
When deciding what happened last session, trust files in this order:
- .harness/session-summary.json
- .harness/current-contract.json
- feature.checkpoint in .harness/features.json
- recent lines from .harness/progress.md
Do not reconstruct the entire campaign from the Markdown log unless the machine files are broken.
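As a rough sketch of the priority order, the helper below returns the first artifact that is present and usable. The function names are hypothetical, and the third step is simplified: it only checks that features.json parses, whereas the real rule looks at the active feature's checkpoint inside it.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_json(path: Path):
    """Return parsed JSON, or None if the file is missing or not valid JSON."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None


def first_usable_artifact(harness_dir: Path) -> str | None:
    """Walk the resume artifacts in priority order and name the first usable one."""
    # Machine-owned files first, in the order given above.
    for name in ("session-summary.json", "current-contract.json", "features.json"):
        if load_json(harness_dir / name):
            return name
    # The Markdown log is the last resort.
    if (harness_dir / "progress.md").is_file():
        return "progress.md"
    return None
```

A broken or empty machine file falls through to the next artifact, which mirrors the rule that the Markdown log is consulted only when the structured files cannot be trusted.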
Environment Bootstrap
Use this order when the environment needs setup:
- campaign.bootstrap_command
- campaign.setup_command
- ./.harness/init.sh if the campaign created one
If no bootstrap command exists, report that clearly instead of guessing.
Baseline Verification
Prefer one quick smoke check before the full suite:
- Run the bootstrap command.
- Run one smoke check that proves the environment is alive.
- Run the full test suite only when:
  - the smoke check fails
  - campaign.baseline_status is failing
  - the prior session ended with known failures
Update campaign.baseline_status and refresh session-summary.json after baseline checks.
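The rule for when the full suite is worth running reduces to a three-way "or". A minimal sketch, assuming the three conditions are already known as simple values (the function and parameter names are hypothetical; only the status value failing comes from the text above):

```python
def needs_full_suite(
    smoke_passed: bool,
    baseline_status: str,
    prior_session_had_failures: bool,
) -> bool:
    """Return True when any condition for running the full test suite holds."""
    return (
        not smoke_passed
        or baseline_status == "failing"
        or prior_session_had_failures
    )
```

When all three conditions are false, the smoke check alone is treated as sufficient baseline evidence.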
INIT
Precondition: .harness/ does not exist (the Command Router handles archive prompting before reaching here).
- Explore the repo and determine test/bootstrap commands.
- Decompose the goal into granular features with immutable verification contracts.
- Create:
  - .harness/campaign.json
  - .harness/features.json
  - .harness/features-schema.json
  - .harness/contract-schema.json
  - .harness/session-summary.json
  - .harness/progress.md
- Add campaign fields: bootstrap_command, default_review_policy, last_session_commit, baseline_status.
- Set mode to lite, standard, or heavy.
- Run python3 ${CLAUDE_SKILL_DIR}/scripts/harness_summary.py to seed session-summary.json.
- Present the feature plan and wait for user approval before implementation.
Use resources/features-schema.md, resources/contract-schema.md, and resources/session-summary-schema.md when authoring the initial files.
PICK
When no feature is in progress:
- Select the next feature with:
  - python3 ${CLAUDE_SKILL_DIR}/scripts/harness_pick_next.py
  - or python3 ${CLAUDE_SKILL_DIR}/scripts/harness_pick_next.py --focus F007
- Mark it in progress:
  - python3 ${CLAUDE_SKILL_DIR}/scripts/harness_transition.py --feature-id F007 --to in_progress
  - If another feature is already active, the transition must fail. Do not auto-switch.
- Create or refresh the active contract:
  - python3 ${CLAUDE_SKILL_DIR}/scripts/harness_contract.py --feature-id F007
- In standard and heavy mode, add scope boundaries and checklist items only if the auto-generated contract is still too vague.
- Review the contract output for warnings. If verification_commands reference non-existent test files, create the test file as part of implementation or refine the command with harness_contract.py --update-command "old" "new".
- Start implementation immediately using task tracking. Do not ask "should I start?"; the PICK decision is the go-ahead.
- When session freshness signals are approaching limits, use harness_pick_next.py --prefer-small to maximize throughput before handoff. Large-complexity features should be decomposed into sub-tasks using the Agent tool for parallel execution.
Allowed status transitions: backlog→pending, backlog→in_progress, backlog→skipped, pending→in_progress, pending→skipped, in_progress→done, in_progress→blocked, blocked→pending. The scripts enforce these; read resources/state-machine.md only if you need the full rules.
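The allowed transitions listed above form a small state machine. The table below encodes exactly those eight edges; the names ALLOWED_TRANSITIONS and can_transition are hypothetical and this is not the code in harness_transition.py, which remains the authority.

```python
from __future__ import annotations

# Each key is a current status; each value is the set of statuses it may move to.
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "backlog": {"pending", "in_progress", "skipped"},
    "pending": {"in_progress", "skipped"},
    "in_progress": {"done", "blocked"},
    "blocked": {"pending"},
}


def can_transition(current: str, target: str) -> bool:
    """Return True if moving from current to target is one of the listed edges."""
    # Statuses with no outgoing edges (done, skipped) fall back to an empty set.
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

One consequence worth noticing is that a blocked feature cannot go straight back to in_progress; it has to pass through pending first.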
CONTINUE
When a feature is already in progress:
- Resume from session-summary.json.
- Read the active feature's checkpoint.
- Refresh current-contract.json if the active feature changed or the contract is stale.
- Continue from checkpoint.next_step immediately. Do not ask for confirmation to resume.
Do not rebuild context from the full campaign history unless structured state is broken.
During Implementation
Use python3 ${CLAUDE_SKILL_DIR}/scripts/harness_checkpoint.py at natural breakpoints, especially before a session handoff. It only applies to the active in_progress feature.
Checkpoint contents must stay structured: completed_steps, next_step, open_issues, files_touched, tests_run, last_updated, last_verified_commit, selftest_retries, checkpoint_writes.
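To make the shape concrete, here is one way the checkpoint fields could be modelled. The field names come from the list above; the types and defaults are assumptions for illustration, and the authoritative schema is whatever resources/features-schema.md and the scripts define.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical in-memory view of a feature checkpoint (types are assumed)."""

    completed_steps: list[str] = field(default_factory=list)
    next_step: str = ""
    open_issues: list[str] = field(default_factory=list)
    files_touched: list[str] = field(default_factory=list)
    tests_run: list[str] = field(default_factory=list)
    last_updated: str = ""          # assumed to be a timestamp string
    last_verified_commit: str = ""  # assumed to be a commit hash
    selftest_retries: int = 0
    checkpoint_writes: int = 0


cp = Checkpoint(next_step="wire up the API handler", files_touched=["src/api.py"])
record = asdict(cp)  # plain dict, ready for json.dumps
```

Keeping every field typed and named is what lets the next session read next_step directly instead of re-deriving it from prose notes.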
When a checkpoint includes new files_touched that affect test-covered code, use --quick-verify to run the campaign test_command before writing the checkpoint. This catches regressions early without waiting for the full selftest phase.
If checkpoint reports scope_drift_warnings, review the warnings. Either justify the drift by updating scope_in via harness_contract.py, or revert the out-of-scope changes before continuing.
When the feature has multiple independent sub-tasks (e.g. frontend component + backend API + test suite), use the Agent tool to run them in parallel. Merge results and update the checkpoint after all agents complete. Do not parallelize steps that depend on each other.
Keep progress.md short. It is archival, not operational.
Self-Test
Always run self-test before completion:
- Run the campaign test_command.
- Run the active contract's verification_commands.
- Run the baseline smoke check (see Baseline Verification above).
- Update the checkpoint with the exact tests run.
- If the active contract has manual_checks, each must appear in checkpoint.manual_checks_completed before transitioning to done. Use harness_checkpoint.py --manual-check-done "description" to record each completed manual check.
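The manual-check gate in the last step amounts to a subset test. A minimal sketch, assuming both manual_checks and manual_checks_completed are lists of description strings that must match exactly (the function name is hypothetical):

```python
from __future__ import annotations


def missing_manual_checks(manual_checks: list[str], completed: list[str]) -> list[str]:
    """Return the contract's manual checks that are not yet recorded as completed."""
    done = set(completed)
    # Preserve the contract's order so the output reads like a to-do list.
    return [check for check in manual_checks if check not in done]
```

An empty result means the feature may move to done as far as manual checks are concerned; anything else names what still needs to be recorded.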
If self-test fails:
- Run harness_checkpoint.py --selftest-retry --failure-command "..." --failure-summary "..." to record the failure context and increment selftest_retries.
- Diagnose and fix the issue, then re-run.
- When selftest_retries >= 3, stop retrying: block the feature with harness_transition.py --to blocked --blocked-reason "..." --diagnostic-command "..." --suggested-fix "..." and record the failure pattern.
Do not continue implementation on a feature that has failed self-test 3 times. The block forces a deliberate re-evaluation in the next session.
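The retry limit can be expressed as a single decision made after each failed self-test. In this sketch the function name and the returned labels are hypothetical, and selftest_retries is assumed to be the counter value after the failure has already been recorded.

```python
MAX_SELFTEST_RETRIES = 3  # threshold stated in the rules above


def after_selftest_failure(selftest_retries: int) -> str:
    """Decide the next action once a failed self-test has been recorded."""
    if selftest_retries >= MAX_SELFTEST_RETRIES:
        # Stop retrying; the feature must be transitioned to blocked.
        return "block"
    return "fix_and_rerun"
```

So the first and second recorded failures lead to another fix attempt, and the third forces the block.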
Review
Read .harness/current-contract.json and branch on review_policy:
- selftest: no separate reviewer agent; completion can proceed after self-test passes.
- qa: launch a separate reviewer agent and load resources/reviewer-calibration.md.
When review_policy=qa, pass only campaign goal, current feature metadata, immutable verification, active contract, changed file list, test command/output, and one relevant UI/API route if needed.
Do not pass full progress.md, the full feature list, or unrelated historical notes.
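One way to enforce the narrow reviewer context is an allowlist applied to whatever the orchestrator has in hand. The key names below are invented for the example (the skill does not define them); only the categories they stand for come from the text above.

```python
from __future__ import annotations

# Hypothetical key names, one per item the reviewer is allowed to see.
REVIEWER_KEYS = (
    "campaign_goal",
    "feature",
    "verification",
    "contract",
    "changed_files",
    "test_output",
    "route",
)


def build_reviewer_payload(context: dict) -> dict:
    """Keep only allowlisted keys; everything else is dropped silently."""
    return {key: context[key] for key in REVIEWER_KEYS if key in context}
```

An allowlist is preferred here over a denylist because new, unrelated state added to the context later stays out of the reviewer's view by default.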
Checkpoint and Completion
After self-test or QA pass:
- Transition the feature to done:
  - python3 ${CLAUDE_SKILL_DIR}/scripts/harness_transition.py --feature-id F007 --to done
- Run python3 ${CLAUDE_SKILL_DIR}/scripts/harness_summary.py.
- Append one short entry to .harness/progress.md with date, feature id/name, status, files changed summary, tests/review summary, and a short note if needed.
- Check session freshness warnings in the summary output before continuing to the next feature.
Session Freshness
Start a fresh session when any of these signals appear (reported by harness_summary.py):
- 2+ features completed in the current session
- checkpoint written 3+ times for the current feature
- 10+ completed steps accumulated in the checkpoint
- 15+ session steps (checkpoint writes in the current session)
- selftest_retries >= 3 (this also requires blocking the feature)
These are hard signals, not suggestions. When they appear, run harness_summary.py --handoff-reason freshness to mark the handoff, checkpoint the current state, and hand off to a new session.
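The five thresholds can be checked together in one pass. This is a sketch under the assumption that the counters are available as plain integers; the function name, parameter names, and signal labels are hypothetical, and harness_summary.py remains the source of truth for what is actually reported.

```python
from __future__ import annotations


def freshness_signals(
    features_completed: int,
    checkpoint_writes: int,
    completed_steps: int,
    session_steps: int,
    selftest_retries: int,
) -> list[str]:
    """Return a label for every freshness signal that has fired."""
    checks = {
        "features_completed>=2": features_completed >= 2,
        "checkpoint_writes>=3": checkpoint_writes >= 3,
        "completed_steps>=10": completed_steps >= 10,
        "session_steps>=15": session_steps >= 15,
        "selftest_retries>=3": selftest_retries >= 3,
    }
    # Dicts preserve insertion order, so labels come back in the order listed above.
    return [label for label, fired in checks.items() if fired]
```

Any non-empty result means a handoff is due, since a single signal is enough.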
Command Behavior
- /harness-plan status: run python3 ${CLAUDE_SKILL_DIR}/scripts/harness_summary.py
- /harness-plan review: run the current review policy immediately
- /harness-plan focus F007: select that feature if it is pending or already in progress. If a different feature is currently in_progress, ask the user whether to block or complete it first; do not silently switch.
- /harness-plan add: user supplies the new feature metadata; then update features.json and refresh summary
- /harness-plan skip F003: python3 ${CLAUDE_SKILL_DIR}/scripts/harness_transition.py --feature-id F003 --to skipped
- /harness-plan reset: python3 ${CLAUDE_SKILL_DIR}/scripts/harness_reset.py to archive and clean, then start INIT again
Blocked features must be moved back to pending before they can become in_progress again.
harness_contract.py and harness_checkpoint.py only work for the active in_progress feature.
Mode Rules
- lite: contract contains claims, commands, and manual checks only
- standard: add scope boundaries and acceptance checklist
- heavy: same as standard, plus periodic milestone verification and short mid-campaign summaries
Keep the mode differences small. Do not fork the whole workflow by mode.
Script Canon
Prefer these commands over manual edits:
python3 ${CLAUDE_SKILL_DIR}/scripts/harness_validate.py
python3 ${CLAUDE_SKILL_DIR}/scripts/harness_summary.py
python3 ${CLAUDE_SKILL_DIR}/scripts/harness_pick_next.py
python3 ${CLAUDE_SKILL_DIR}/scripts/harness_transition.py --feature-id F007 --to in_progress
python3 ${CLAUDE_SKILL_DIR}/scripts/harness_contract.py --feature-id F007
python3 ${CLAUDE_SKILL_DIR}/scripts/harness_checkpoint.py --feature-id F007 --next-step "..."
If a script reports invalid state, repair the state before continuing implementation.