Harness Execution

install

source · Clone the upstream repo

git clone https://github.com/Intense-Visions/harness-engineering

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/Intense-Visions/harness-engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/skills/claude-code/harness-execution" ~/.claude/skills/intense-visions-harness-engineering-harness-execution-087981 && rm -rf "$T"

manifest: agents/skills/claude-code/harness-execution/SKILL.md

Execute a plan task by task with atomic commits, checkpoint protocol, and persistent knowledge capture. Stop on blockers. Do not guess.

When to Use

When an approved plan exists (output of harness-planning) and implementation should begin
When resuming execution of a previously started plan after a context reset
When `on_new_feature` or `on_bug_fix` triggers fire and a plan is already in place
NOT when no plan exists (use harness-planning first)
NOT when the plan needs revision (update the plan first, then resume execution)
NOT when exploring or brainstorming (use harness-brainstorming)
NOT for ad-hoc single-task work that does not follow a plan

Process

Iron Law

Execute the plan as written. If the plan is wrong, stop and fix the plan — do not improvise.

Deviating mid-execution introduces untested assumptions, breaks atomicity, and makes progress untraceable. If a task cannot be completed as written, that is a blocker. Record it and stop.

Argument Resolution

When invoked by autopilot (or with explicit arguments), resolve paths before starting:

Session slug: If a `session-slug` argument is provided, set `{sessionDir} = .harness/sessions/<session-slug>/` and pass it to `gather_context({ session: "<session-slug>" })`. All state/handoff writes go to `{sessionDir}/`.
Plan path: If a `plan-path` argument is provided, read the plan from that path. Otherwise, discover it from `{sessionDir}/handoff.json` (the upstream planning output) or search `docs/plans/`.

When no arguments are provided (standalone invocation), discover the plan from `docs/plans/` or prompt for it. Global `.harness/` paths are used as the fallback.
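
The resolution rules above can be sketched as a small pure function. This is an illustrative sketch only: `ExecutionArgs`, `ResolvedPaths`, and `resolvePaths` are hypothetical names, and checking the session handoff before `docs/plans/` is an assumed ordering, not something the skill mandates.

```typescript
// Hypothetical argument shape mirroring the `session-slug` and `plan-path` arguments.
interface ExecutionArgs {
  sessionSlug?: string;
  planPath?: string;
}

interface ResolvedPaths {
  stateDir: string;          // where state and handoff writes go
  planPath: string | null;   // null means the plan must be discovered
  planSearchOrder: string[]; // places to look when planPath is null (assumed order)
}

function resolvePaths(args: ExecutionArgs): ResolvedPaths {
  // Session-scoped directory when a slug is given, global .harness/ otherwise.
  const stateDir = args.sessionSlug
    ? ".harness/sessions/" + args.sessionSlug + "/"
    : ".harness/";

  if (args.planPath) {
    return { stateDir: stateDir, planPath: args.planPath, planSearchOrder: [] };
  }

  // No explicit plan: try the session handoff (if any), then docs/plans/.
  const planSearchOrder: string[] = [];
  if (args.sessionSlug) {
    planSearchOrder.push(stateDir + "handoff.json");
  }
  planSearchOrder.push("docs/plans/");
  return { stateDir: stateDir, planPath: null, planSearchOrder: planSearchOrder };
}
```

A standalone invocation (`resolvePaths({})`) yields `.harness/` as the state directory and `docs/plans/` as the only search location; prompting the user when nothing is found is left out of the sketch.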

Phase 1: PREPARE — Load State and Verify Prerequisites

Load the plan. If a `plan-path` argument was resolved, read from that path. Otherwise read from `docs/plans/`. Identify the total task count and checkpoints.

Gather context in one call. Use `gather_context` to load all working context:

gather_context({
  path: "<project-root>",
  intent: "Execute plan tasks starting from current position",
  skill: "harness-execution",
  session: "<session-slug-if-known>",
  include: ["state", "learnings", "handoff", "validation"]
})

If the session slug is known, include `session` to scope reads/writes to `.harness/sessions/<slug>/`. If unknown, omit it; the call falls back to `.harness/`. Returns `state` (current position; null = fresh start), `learnings` (prior insights; do not ignore), `handoff` (context from previous skill), and `validation` (project health). Failed constituents return null with errors in `meta.errors`.

Load session summary for cold start. If resuming (session slug known):
- Call `listActiveSessions()` to read the session index.
- Call `loadSessionSummary()` for the target session.
- If ambiguous, present the index and ask which session to resume.
Check for known dead ends. Review `learnings` tagged `[outcome:failure]`. Warn if any match current plan approaches.
Verify prerequisites for the current task:
- Dependency tasks marked complete in state?
- Referenced files exist?
- Test suite passes? Run `harness validate` for clean baseline.
If prerequisites fail, do not proceed. Report what is missing and which task is blocked.
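
As a rough illustration of the dependency check, the sketch below lists which dependencies of a task are not yet marked complete. `unmetDependencies` is a hypothetical helper, and the `progress` map is assumed to have the same shape as the `progress` field of the state file; how a plan declares dependencies is not specified here, so they are passed in as a plain list.

```typescript
type TaskStatus = "complete" | "in_progress";

// Returns the dependencies that are NOT marked complete in state.
// A dependency missing from the map counts as unmet.
function unmetDependencies(
  progress: { [task: string]: TaskStatus },
  dependsOn: string[]
): string[] {
  return dependsOn.filter(function (dep) {
    return progress[dep] !== "complete";
  });
}

// Example: Task 4 depends on Tasks 1-3, but Task 3 is still in progress.
const blockedBy = unmetDependencies(
  { "Task 1": "complete", "Task 2": "complete", "Task 3": "in_progress" },
  ["Task 1", "Task 2", "Task 3"]
);
```

A non-empty result means the task is blocked and should be reported rather than worked around.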

Graph-Enhanced Context (when available)

When a knowledge graph exists at `.harness/graph/`:

`query_graph`: check file overlap between tasks for conflict detection
`get_impact`: understand blast radius before executing a task

Fall back to file-based commands if no graph is available.

Uncertainty Surfacing

When you encounter an unknown during task execution, classify it immediately:

Blocking: Cannot complete the task as written without resolving this (e.g., referenced file doesn't exist, spec behavior undefined for this scenario). STOP. Record as a blocker and report.
Assumption: Can proceed if assumption is stated (e.g., "the API returns JSON, not XML"). Document the assumption in the commit message. If wrong, the task must be revisited.
Deferrable: Does not affect the current task (e.g., whether a later task will need a different approach). Note in learnings for future tasks.

Do not improvise past unknowns. An assumption that turns out wrong is cheaper than an improvised solution that hides the unknown.
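
One way to picture the three-way classification is as a tiny decision function. This is a sketch under assumptions: the two boolean questions are a simplification of the descriptions above, and `UnknownItem`, `classify`, and `ACTIONS` are hypothetical names.

```typescript
type Uncertainty = "blocking" | "assumption" | "deferrable";

interface UnknownItem {
  affectsCurrentTask: boolean;   // does the unknown touch the task being executed?
  completableIfAssumed: boolean; // can the task proceed once an assumption is stated?
}

function classify(u: UnknownItem): Uncertainty {
  if (!u.affectsCurrentTask) return "deferrable";
  return u.completableIfAssumed ? "assumption" : "blocking";
}

// What each classification requires, paraphrased from the list above.
const ACTIONS: Record<Uncertainty, string> = {
  blocking: "stop, record as blocker, report",
  assumption: "proceed, document the assumption in the commit message",
  deferrable: "proceed, note in learnings for future tasks",
};
```

For example, a referenced file that does not exist affects the current task and cannot be assumed away, so it classifies as `blocking`.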

Read-only constraint for Phase 1: Phase 1 PREPARE is research and state loading. Do not write production code, create files, or make commits during PREPARE. If prerequisites fail, report the failure — do not attempt to fix prerequisites yourself.

Phase 2: EXECUTE — Implement Tasks Atomically

Report progress with:

**[Phase N/M]** Task N — <description>

For each task, starting from current position:

Read task instructions completely before writing any code.
Follow instructions exactly. The plan contains exact file paths, code, and commands. Execute as written.
TDD rhythm:
- Write the test as specified
- Run test — observe it fail (for the right reason)
- Write the implementation as specified
- Run test — observe it pass
- Run `harness validate`
Commit atomically. One commit per task. Use the plan's commit message, or write a descriptive one.

Run mechanical gate. After each commit, run `assess_project`:

assess_project({ path: "<project-root>", checks: ["validate", "deps", "lint"], mode: "summary" })

Then run the test suite. Binary pass/fail:

All pass → proceed to next task.
Any fail → retry with error context (max 2 attempts).
Still failing → record in `.harness/failures.md`, escalate, stop.
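
The pass/retry/escalate rule can be sketched as a bounded loop. This sketch reads "max 2 attempts" as two retries after the first failure (three gate runs in total, which lines up with the "three consecutive failures" stopping condition); that reading is an assumption. `runWithRetries` and the `runGate` callback are hypothetical and stand in for running `assess_project` plus the test suite.

```typescript
interface GateOutcome {
  passed: boolean;
  attempts: number;  // total gate runs, including the first
  escalate: boolean; // true when retries are exhausted
}

// runGate receives the previous error (or null on the first run) as retry context.
function runWithRetries(
  runGate: (previousError: string | null) => { ok: boolean; error?: string },
  maxRetries: number = 2
): GateOutcome {
  let previousError: string | null = null;
  for (let attempt = 1; attempt <= 1 + maxRetries; attempt++) {
    const result = runGate(previousError);
    if (result.ok) {
      return { passed: true, attempts: attempt, escalate: false };
    }
    previousError = result.error || "unknown failure";
  }
  // Still failing: the caller records the failure, escalates, and stops.
  return { passed: false, attempts: 1 + maxRetries, escalate: true };
}
```

When `escalate` is true, the caller would record the attempted approach in the failures file and stop instead of trying a fourth time.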

Update state after each task. Write to `.harness/state.json`:

{
  "schemaVersion": 1,
  "position": { "phase": "execute", "task": "Task N" },
  "progress": { "Task 1": "complete", "Task 2": "complete", "Task 3": "in_progress" },
  "lastSession": { "date": "YYYY-MM-DD", "summary": "Completed Tasks 1-2, starting Task 3" }
}
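
A minimal sketch of the state transition after a task, modeled on the JSON shape above. `HarnessState` and `advance` are hypothetical names, the summary wording is illustrative, and writing the result to disk is deliberately left out.

```typescript
interface HarnessState {
  schemaVersion: number;
  position: { phase: string; task: string };
  progress: { [task: string]: string };
  lastSession: { date: string; summary: string };
}

// Returns a NEW state with `done` marked complete and `next` in progress.
// The input state is not mutated.
function advance(state: HarnessState, done: string, next: string, date: string): HarnessState {
  return {
    schemaVersion: state.schemaVersion,
    position: { phase: "execute", task: next },
    progress: { ...state.progress, [done]: "complete", [next]: "in_progress" },
    lastSession: { date: date, summary: "Completed " + done + ", starting " + next },
  };
}
```

Returning a fresh object keeps the previous state available for comparison if a later write fails.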

Handle checkpoints per the checkpoint protocol below.

Checkpoint Protocol

Three checkpoint types. Each requires pausing execution.

`[checkpoint:human-verify]`: Show and Confirm

Stop. Present via `emit_interaction`:

emit_interaction({
  path: "<project-root>",
  type: "confirmation",
  confirmation: {
    text: "Task N complete. Output: <summary>. Continue to Task N+1?",
    context: "<test output or diff summary>",
    impact: "Continuing proceeds to next task. Declining pauses for review.",
    risk: "low"
  }
})

Wait for human confirmation.

`[checkpoint:decision]`: Present Options and Wait

Stop. Present via `emit_interaction`:

emit_interaction({
  path: "<project-root>",
  type: "question",
  question: {
    text: "Task N requires a decision: <description>",
    options: [
      { label: "<option A>", pros: ["..."], cons: ["..."], risk: "low", effort: "low" },
      { label: "<option B>", pros: ["..."], cons: ["..."], risk: "medium", effort: "medium" }
    ],
    recommendation: { optionIndex: 0, reason: "<why>", confidence: "medium" }
  }
})

Wait for human choice.

`[checkpoint:human-action]`: Instruct and Wait

Stop. Tell the human exactly what to do (e.g., "Create an API key at [URL] and paste it here"). State: "Task N requires your action: [instructions]. Let me know when done." Wait for confirmation.
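
Detecting which checkpoint a task carries could look roughly like the following. It is a sketch that assumes the marker appears literally in the task text; `findCheckpoint` and `INTERACTION_TYPE` are hypothetical names, and the mapping simply restates the three sections above.

```typescript
type CheckpointKind = "human-verify" | "decision" | "human-action";

// Returns the checkpoint kind found in a task's text, or null when there is no known marker.
function findCheckpoint(taskText: string): CheckpointKind | null {
  const match = /\[checkpoint:(human-verify|decision|human-action)\]/.exec(taskText);
  return match ? (match[1] as CheckpointKind) : null;
}

// emit_interaction type used per checkpoint; null where the text above
// describes a plain instruction to the human instead.
const INTERACTION_TYPE: Record<CheckpointKind, string | null> = {
  "human-verify": "confirmation",
  "decision": "question",
  "human-action": null,
};
```

Whatever the kind, a non-null result from `findCheckpoint` means execution pauses until the human responds.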

Phase 3: VERIFY — Two-Tier Validation

Quick gate (default): The mechanical gate in Phase 2 Step 5 IS the standard verification. Every task commit must pass it. No additional step needed for normal execution.

Deep audit (on-demand): When `--deep` is passed or at milestone boundaries, invoke `harness-verification` for a 3-level audit:

EXISTS — Do claimed artifacts actually exist?
SUBSTANTIVE — Do they contain meaningful, correct content (not stubs)?
WIRED — Are they integrated (imported, routed, tested, reachable)?

If deep audit fails, treat as blocker. Record and stop.

After all tasks pass:

emit_interaction({
  path: "<project-root>",
  type: "transition",
  transition: {
    completedPhase: "execution",
    suggestedNext: "verification",
    reason: "All plan tasks executed and verified",
    artifacts: ["<created/modified files>"],
    qualityGate: {
      checks: [
        { name: "all-tasks-complete", passed: true, detail: "<N>/<N> tasks" },
        { name: "harness-validate", passed: true },
        { name: "tests-pass", passed: true }
      ],
      allPassed: true
    }
  }
})

Phase 4: PERSIST — Save Progress and Learnings

All session-scoped files use `{sessionDir}/` when the session is known, otherwise `.harness/`. Session-scoped files include `handoff.json`, `state.json`, `learnings.md`, and `artifacts.json`.

Update state with current position, progress, and `lastSession`:

{ "lastSession": { "lastSkill": "harness-execution", "pendingTasks": ["Task 4", "Task 5"] } }

Graph Refresh: If `.harness/graph/` exists, run `harness scan [path]` after code changes. Skipping this causes stale graph query results.

Append tagged learnings to `learnings.md`. Tag every entry:

## YYYY-MM-DD — Task N: <task name>

- [skill:harness-execution] [outcome:success] What was accomplished
- [skill:harness-execution] [outcome:gotcha] What was surprising
- [skill:harness-execution] [outcome:decision] What was decided and why
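
A small formatter for entries in this shape might look like the sketch below. `formatLearnings` and its parameters are hypothetical; the `failure` outcome is included because `[outcome:failure]` tags are reviewed during PREPARE. The caller is expected to append the result to the learnings file and never rewrite earlier entries.

```typescript
type Outcome = "success" | "gotcha" | "decision" | "failure";

interface Note {
  outcome: Outcome;
  text: string;
}

function formatLearnings(date: string, taskNumber: number, taskName: string, notes: Note[]): string {
  // "\u2014" is the dash used in the heading template above.
  const header = "## " + date + " \u2014 Task " + taskNumber + ": " + taskName;
  const lines = notes.map(function (n) {
    return "- [skill:harness-execution] [outcome:" + n.outcome + "] " + n.text;
  });
  return [header, ""].concat(lines).join("\n");
}
```

Keeping the skill and outcome tags machine-readable is what lets a later session filter for dead ends.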

Record failures in `failures.md` if any task was escalated after retry exhaustion. Include the approach attempted and why it failed.

Write handoff. Write to the session-scoped path when session slug is known, otherwise fall back to global:

Session-scoped (preferred): `.harness/sessions/<session-slug>/handoff.json`
Global (fallback, deprecated): `.harness/handoff.json`

[DEPRECATED] Writing to `.harness/handoff.json` is deprecated. In autopilot sessions, always write to `.harness/sessions/<slug>/handoff.json`.

{
  "fromSkill": "harness-execution",
  "timestamp": "YYYY-MM-DDTHH:MM:SSZ",
  "summary": "Completed Tasks 1-3. Task 4 blocked on missing API endpoint.",
  "pendingTasks": ["Task 4", "Task 5"],
  "blockers": ["Task 4: /api/notifications endpoint not implemented"],
  "learnings": ["Date comparison needs UTC normalization"]
}

Write session summary for cold-start restoration via

writeSessionSummary(projectPath, sessionSlug, { session, lastActive, skill, phase, status, spec, plan, keyContext, nextStep })

Sync roadmap (mandatory when present). If `docs/roadmap.md` exists, call `manage_roadmap` with `sync` and `apply: true`. Do not use `force_sync: true`. If `manage_roadmap` is unavailable, fall back to `syncRoadmap()` from core and warn. If there is no roadmap, skip silently.
Learnings are append-only. Never edit or delete previous learnings.

Auto-transition to verification. When ALL tasks complete (not mid-plan), call:

emit_interaction({ type: "transition", transition: { completedPhase: "execution", suggestedNext: "verification", requiresConfirmation: false, summary: "<tasks completed summary>", qualityGate: { checks: [{ name: "all-tasks-complete", passed: true }, { name: "harness-validate", passed: true }, { name: "tests-pass", passed: true }, { name: "no-blockers", passed: true }], allPassed: true } } })

Immediately invoke harness-verification without waiting for user input.

Important: Only emit when all tasks complete. If stopped due to blocker/checkpoint/partial completion, write handoff and stop instead.
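
The rule for when to emit the transition can be expressed as a single predicate. This is only a sketch: `RunSummary` and `nextStep` are hypothetical names, and the fields are an assumed summary of the run rather than a real harness structure.

```typescript
interface RunSummary {
  totalTasks: number;
  completedTasks: number;
  blockers: string[];
  stoppedAtCheckpoint: boolean;
}

// Transition only when every task is complete and nothing interrupted the run;
// in every other case, write the handoff and stop.
function nextStep(run: RunSummary): "emit-transition" | "write-handoff-and-stop" {
  const allDone = run.completedTasks === run.totalTasks;
  if (allDone && run.blockers.length === 0 && !run.stoppedAtCheckpoint) {
    return "emit-transition";
  }
  return "write-handoff-and-stop";
}
```

A run with 3 of 5 tasks complete and one blocker therefore ends with a handoff, never with a transition.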

Stopping Conditions

Non-negotiable. When any condition is met, stop immediately.

Hit a blocker. Task cannot be completed as written. Do not guess or improvise. Record and report: "Blocked on Task N: [issue]. The plan needs to be updated."
Test failure after implementation. Do not retry blindly. Diagnose root cause. Fix if within task scope; otherwise stop.
Unclear instruction. Do not interpret ambiguity. Ask: "Task N says [quote]. I interpret this as [interpretation]. Correct?"
Harness validation failure. Do not proceed. Fix the violation before moving on.
Three consecutive failures. Task design is likely wrong. Report: "Task N failed 3 times. Root cause: [analysis]. Plan may need revision."

Session State

This skill reads/writes session sections via `manage_state`:

Section	R/W	Purpose
terminology	both	Domain terms for consistent naming; adds terms discovered during implementation
decisions	both	Planning decisions for context; records implementation decisions
constraints	both	Constraints to respect boundaries; adds constraints discovered during coding
risks	both	Risks for awareness; updates status as mitigated or realized
openQuestions	both	Questions for context; resolves questions answered by implementation
evidence	both	Prior evidence; writes file:line citations, test outputs, diff references

Write: After each task, append relevant entries. Write evidence for every significant technical assertion. Mark openQuestions as resolved when answered.

Read: During PREPARE, read all sections via `gather_context` with `include: ["sessions"]`.

Evidence Requirements

Claims about task completion, test results, or code behavior MUST cite evidence:

File reference: `file:line` format (e.g., `src/services/notification-service.ts:42`)
Test output: Actual command and output (e.g., `$ npx vitest run ... → PASS (8 tests)`)
Diff evidence: Before/after with file path for modifications
Harness output: `harness validate` output as project health evidence
Session evidence: Write to `evidence` section via `manage_state` after each task

When to cite: After every task completion. Every commit claim must be backed by test output or file reference.

Uncited claims: Prefix with `[UNVERIFIED]`. Uncited claims are flagged during review.
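
As an illustration of the citation rule, the sketch below prefixes any claim that has no evidence attached. `Claim`, `tagClaim`, and `isFileLineRef` are hypothetical helpers; rendering evidence in trailing parentheses and the loose `file:line` pattern are illustrative choices, not a format the harness defines.

```typescript
interface Claim {
  text: string;
  evidence: string[]; // file:line references, test output, diff references
}

// Claims with no evidence get the [UNVERIFIED] prefix; others list their evidence.
function tagClaim(claim: Claim): string {
  return claim.evidence.length > 0
    ? claim.text + " (" + claim.evidence.join("; ") + ")"
    : "[UNVERIFIED] " + claim.text;
}

// Loose check for the file:line format, e.g. src/services/notification-service.ts:42
function isFileLineRef(ref: string): boolean {
  return /^[^\s:]+:\d+$/.test(ref);
}
```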

Harness Integration

`harness validate`: Run after every task. Mandatory. No task is complete without passing.
`gather_context`: Used in the PREPARE phase to load state, learnings, handoff, and validation in one call.
`harness check-deps`: Run when tasks add new imports/modules.
`harness state show`: View current position and progress.
`harness state learn "<message>"`: Append a learning from the CLI.
State/Learnings files: Session-scoped when the session is known, otherwise `.harness/`. State is updated after every task; learnings are append-only.
Roadmap sync: After plan completion, call `manage_roadmap sync` with `apply: true`. Mandatory when a roadmap exists. Do not use `force_sync: true`.
`emit_interaction`: Auto-transition to harness-verification at plan completion.

Success Criteria

Every task executed in order, atomically, one commit per task
`.harness/state.json` accurately reflects position and progress
`.harness/learnings.md` has entries for sessions with non-trivial discoveries
`harness validate` passes after every task
Checkpoints honored: execution paused at every `[checkpoint:*]` marker
No improvisation: tasks executed as written, or stopped with blocker reported
All stopping conditions respected

Red Flags

Flag	Corrective Action
"The plan says X but Y would be cleaner — I'll improvise"	STOP. Iron Law: execute the plan as written. If the plan is wrong, stop and fix the plan. Improvising introduces untested assumptions.
"I'll skip the test for this task since it's just configuration"	STOP. The TDD rhythm is not optional. Configuration changes need tests too — they prove the config does what the task requires.
"I'll handle this edge case the plan didn't mention"	STOP. Unplanned work is scope creep. If the edge case matters, it's a plan deficiency — record it as a blocker.
`// TODO: come back to this` or `// skipped for now` in committed code	STOP. Every commit must be atomic and complete for its task. TODOs in committed code are incomplete tasks disguised as progress.

Rationalizations to Reject

Rationalization	Reality
"The plan says to do X, but doing Y would be cleaner -- I will improvise"	The Iron Law states: execute the plan as written. If the plan is wrong, stop and fix the plan. Improvising mid-execution introduces untested assumptions.
"This task depends on Task 3 which I know is done, so I can skip verifying prerequisites"	Prerequisites must be verified mechanically, not from memory. Check that dependency tasks are marked complete in state and that referenced files exist.
"The checkpoint is just a confirmation step and the output looks correct, so I will auto-continue"	Checkpoints are non-negotiable pause points. If a task has a checkpoint marker, execution must pause.
"Harness validate passed on the previous task and nothing changed structurally, so I can skip it for this one"	Validation runs after every task with no exceptions. Each task may introduce subtle architectural drift that only harness validate catches.
"The task failed but I can see the fix — I'll apply it and move on without recording a blocker"	A failed task is a blocker. Record it, report it, and stop. Applying unplanned fixes mid-execution makes progress untraceable and may cascade into later tasks.
"Phase 1 prerequisites are missing but I can create them as part of this task"	PREPARE is read-only. Missing prerequisites mean a prior task or the plan is deficient. Report the gap — do not fix prerequisites during execution setup.

Examples

Example: Executing a 5-Task Notification Plan

Session Start (fresh):

Read plan: docs/plans/2026-03-14-notifications-plan.md (5 tasks)
Read state: .harness/state.json — not found (fresh start, Task 1)
Read learnings: .harness/learnings.md — not found
Run: harness validate — passes. Clean baseline.

Task 1: Define notification types

1. Create src/types/notification.ts with Notification interface
2. harness validate — passes
3. Commit: "feat(notifications): define Notification type"
4. Update state: { position: Task 2, progress: { "Task 1": "complete" } }

Task 2: Create notification service (TDD)

1. Write test: src/services/notification-service.test.ts
2. Run test: FAIL — NotificationService not defined (correct)
3. Implement: src/services/notification-service.ts
4. Run test: PASS
5. harness validate — passes
6. Commit: "feat(notifications): add NotificationService.create"
7. Update state: { position: Task 3, Tasks 1-2 complete }

Task 3: Add list and expiry (TDD) — has checkpoint

[checkpoint:human-verify] — "Tasks 1-2 complete. Tests pass. Continue to Task 3?"
Human: "Continue."
1. Write tests: list by userId, filter expired
2. Run tests: FAIL (not implemented)
3. Implement list() and isExpired()
4. Run tests: PASS
5. harness validate — passes
6. Commit: "feat(notifications): add list and expiry"
7. Append learning: [gotcha] Date comparison needed UTC normalization

Context reset (resume at Task 4):

Read state: position Task 4, Tasks 1-3 complete
Read learnings: "Date comparison needed UTC normalization"
harness validate — passes. Resume Task 4.

Gates

Hard stops. Violating any gate means the process has broken down.

Phase 1 PREPARE is read-only. Do not write production code, create files, or commit during preparation. If prerequisites are missing, report the gap — do not fix it yourself.
No execution without a plan. If no plan exists, do not start. Use harness-planning.
No improvisation. Execute as written. Do not add "improvements" not in the plan.
No skipping tasks. Tasks are dependency-ordered. Execute in order.
No skipping validation. `harness validate` after every task. No exceptions.
No ignoring checkpoints. `[checkpoint:*]` markers require pausing. No auto-continue.
No guessing past blockers. Cannot complete as written? Stop. Report. Do not invent workarounds.
State must be updated. After every task, state must reflect new position.

Escalation

Task fails, fix outside scope: "Task N failed because [reason]. Fix requires changes to [outside scope]. Plan needs updating at Tasks [X, Y]."
Plan references missing files: "Task N references [file] which does not exist. Plan may need regeneration."
Tests pass but behavior seems wrong: "Task N passes all tests, but I notice [observation]. Should I investigate?"
State corrupted: If state says Task 5 complete but code missing, report inconsistency. Re-verify from Task 1 if needed.
Human wants to skip ahead: "Skipping Task N means Tasks [X, Y] may fail. Update the plan to remove the dependency?" Get explicit approval.

Trace Output (Optional)

When `.harness/gate.json` has `"trace": true` or `--verbose` is passed, append to `.harness/trace.md`:

**[PREPARE 14:32:07]** Loaded plan with 5 tasks, resuming from Task 3.
**[EXECUTE 14:32:15]** Task 3 committed; gate passed first attempt.
**[VERIFY 14:35:42]** Deep audit at milestone; all 3 levels passed.
**[PERSIST 14:35:50]** State updated, handoff written with 2 pending tasks.

For human debugging only. Not required for normal execution.
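
A formatter for trace lines in the shape shown above could be as small as this sketch. `traceLine` and `pad2` are hypothetical helpers, and using local time for the timestamp is an assumption; the examples above do not say which clock is used.

```typescript
type Phase = "PREPARE" | "EXECUTE" | "VERIFY" | "PERSIST";

// Zero-pads a number to two digits.
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Builds one trace line: **[PHASE HH:MM:SS]** message (local time, assumed).
function traceLine(phase: Phase, at: Date, message: string): string {
  const time = pad2(at.getHours()) + ":" + pad2(at.getMinutes()) + ":" + pad2(at.getSeconds());
  return "**[" + phase + " " + time + "]** " + message;
}
```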