```shell
git clone https://github.com/Intense-Visions/harness-engineering
```

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/Intense-Visions/harness-engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/skills/claude-code/harness-execution" ~/.claude/skills/intense-visions-harness-engineering-harness-execution-087981 && rm -rf "$T"
```
`agents/skills/claude-code/harness-execution/SKILL.md`

# Harness Execution
Execute a plan task by task with atomic commits, checkpoint protocol, and persistent knowledge capture. Stop on blockers. Do not guess.
## When to Use
- When an approved plan exists (output of harness-planning) and implementation should begin
- When resuming execution of a previously started plan after a context reset
- When `on_new_feature` or `on_bug_fix` triggers fire and a plan is already in place
- NOT when no plan exists (use harness-planning first)
- NOT when the plan needs revision (update the plan first, then resume execution)
- NOT when exploring or brainstorming (use harness-brainstorming)
- NOT for ad-hoc single-task work that does not follow a plan
## Process
### Iron Law
Execute the plan as written. If the plan is wrong, stop and fix the plan — do not improvise.
Deviating mid-execution introduces untested assumptions, breaks atomicity, and makes progress untraceable. If a task cannot be completed as written, that is a blocker. Record it and stop.
### Argument Resolution
When invoked by autopilot (or with explicit arguments), resolve paths before starting:
- Session slug: If a `session-slug` argument is provided, set `{sessionDir} = .harness/sessions/<session-slug>/`. Pass it to `gather_context({ session: "<session-slug>" })`. All state/handoff writes go to `{sessionDir}/`.
- Plan path: If a `plan-path` argument is provided, read the plan from that path. Otherwise, discover it from `{sessionDir}/handoff.json` (read upstream planning output) or search `docs/plans/`.
When no arguments are provided (standalone invocation), discover the plan from `docs/plans/` or prompt. Global `.harness/` paths are used as fallback.
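The resolution order can be sketched as a pure helper; the parameter names and return shape here are illustrative, not the harness's actual interface:

```typescript
// Sketch of the argument-resolution rules above (illustrative types, not the real harness API).
interface ResolvedPaths {
  sessionDir: string;  // where all state/handoff writes go
  planSource: string;  // explicit path, session handoff, or docs/plans/ discovery
}

function resolveArguments(sessionSlug?: string, planPath?: string): ResolvedPaths {
  // A session slug scopes writes to its session directory; otherwise fall back to global .harness/.
  const sessionDir = sessionSlug ? `.harness/sessions/${sessionSlug}/` : ".harness/";
  // An explicit plan path wins; otherwise discover via the session handoff, then docs/plans/.
  const planSource = planPath
    ?? (sessionSlug ? `${sessionDir}handoff.json` : "docs/plans/");
  return { sessionDir, planSource };
}
```

For example, `resolveArguments("notifications")` would scope everything under `.harness/sessions/notifications/` and look for the plan in that session's `handoff.json`.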
### Phase 1: PREPARE — Load State and Verify Prerequisites
1. Load the plan. If a `plan-path` argument was resolved, read from that path. Otherwise read from `docs/plans/`. Identify total task count and checkpoints.
2. Gather context in one call. Use `gather_context` to load all working context:

   ```
   gather_context({
     path: "<project-root>",
     intent: "Execute plan tasks starting from current position",
     skill: "harness-execution",
     session: "<session-slug-if-known>",
     include: ["state", "learnings", "handoff", "validation"]
   })
   ```

   If the session slug is known, include `session` to scope reads/writes to `.harness/sessions/<slug>/`. If unknown, omit it — falls back to `.harness/`. Returns `state` (current position, null = fresh start), `learnings` (prior insights — do not ignore), `handoff` (context from previous skill), and `validation` (project health). Failed constituents return null with errors in `.meta.errors`.
3. Load session summary for cold start. If resuming (session slug known):
   - Call `listActiveSessions()` to read the session index.
   - Call `loadSessionSummary()` for the target session.
   - If ambiguous, present the index and ask which session to resume.
4. Check for known dead ends. Review `learnings` tagged `[outcome:failure]`. Warn if any match current plan approaches.
5. Verify prerequisites for the current task:
   - Dependency tasks marked complete in state?
   - Referenced files exist?
   - Test suite passes? Run `harness validate` for a clean baseline.
6. If prerequisites fail, do not proceed. Report what is missing and which task is blocked.
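The prerequisite verification above can be made mechanical rather than memory-based. In this sketch, the task and progress shapes are assumptions modeled on this skill's `state.json` example, and `fileExists` is injected so the check stays pure:

```typescript
// Sketch: verify prerequisites for the current task against loaded state.
// Shapes are assumptions modeled on the state.json example in this skill.
interface PlanTask {
  name: string;
  dependsOn: string[];       // names of tasks that must be complete first
  referencedFiles: string[]; // files the task instructions mention
}

function missingPrerequisites(
  task: PlanTask,
  progress: Record<string, string>,       // e.g. { "Task 1": "complete" }
  fileExists: (path: string) => boolean,  // injected so the check is testable
): string[] {
  const missing: string[] = [];
  for (const dep of task.dependsOn) {
    if (progress[dep] !== "complete") missing.push(`dependency not complete: ${dep}`);
  }
  for (const file of task.referencedFiles) {
    if (!fileExists(file)) missing.push(`referenced file missing: ${file}`);
  }
  return missing; // empty = prerequisites satisfied; non-empty = report and stop
}
```

An empty result means execution may begin; anything else is reported verbatim as the blocked-task message.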
### Graph-Enhanced Context (when available)
When a knowledge graph exists at `.harness/graph/`:
- `query_graph` — check file overlap between tasks for conflict detection
- `get_impact` — understand blast radius before executing a task
Fall back to file-based commands if no graph is available.
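When falling back to file-based checks, the conflict detection `query_graph` provides reduces to set overlap between the files two tasks touch; a minimal sketch:

```typescript
// Sketch: file-based fallback for task conflict detection.
// Two tasks conflict when their touched-file sets overlap.
function conflictingFiles(taskA: string[], taskB: string[]): string[] {
  const b = new Set(taskB);
  return taskA.filter((file) => b.has(file));
}
```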
### Uncertainty Surfacing
When you encounter an unknown during task execution, classify it immediately:
- Blocking: Cannot complete the task as written without resolving this (e.g., referenced file doesn't exist, spec behavior undefined for this scenario). STOP. Record as a blocker and report.
- Assumption: Can proceed if assumption is stated (e.g., "the API returns JSON, not XML"). Document the assumption in the commit message. If wrong, the task must be revisited.
- Deferrable: Does not affect the current task (e.g., whether a later task will need a different approach). Note in learnings for future tasks.
Do not improvise past unknowns. An assumption that turns out wrong is cheaper than an improvised solution that hides the unknown.
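The classification above can be captured as a tiny decision table; the `Disposition` shape is illustrative, not a harness type:

```typescript
// Sketch: classify an unknown and derive whether execution may continue.
// The three kinds mirror the classification above; the shape is illustrative.
type UnknownKind = "blocking" | "assumption" | "deferrable";

interface Disposition {
  halt: boolean;  // does execution stop here?
  record: string; // where the unknown gets written down
}

function classifyUnknown(kind: UnknownKind): Disposition {
  switch (kind) {
    case "blocking":
      return { halt: true, record: "blocker report" };
    case "assumption":
      return { halt: false, record: "commit message" };
    case "deferrable":
      return { halt: false, record: "learnings" };
  }
}
```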
Read-only constraint for Phase 1: Phase 1 PREPARE is research and state loading. Do not write production code, create files, or make commits during PREPARE. If prerequisites fail, report the failure — do not attempt to fix prerequisites yourself.
### Phase 2: EXECUTE — Implement Tasks Atomically
Report progress with:
**[Phase N/M]** Task N — <description>
For each task, starting from current position:
1. Read task instructions completely before writing any code.
2. Follow instructions exactly. The plan contains exact file paths, code, and commands. Execute as written.
3. TDD rhythm:
   - Write the test as specified
   - Run the test — observe it fail (for the right reason)
   - Write the implementation as specified
   - Run the test — observe it pass
   - Run `harness validate`
4. Commit atomically. One commit per task. Use the plan's commit message, or write a descriptive one.
5. Run the mechanical gate. After each commit, run `assess_project`:

   ```
   assess_project({
     path: "<project-root>",
     checks: ["validate", "deps", "lint"],
     mode: "summary"
   })
   ```

   Then run the test suite. Binary pass/fail:
   - All pass → proceed to the next task.
   - Any fail → retry with error context (max 2 attempts).
   - Still failing → record in `.harness/failures.md`, escalate, stop.
6. Update state after each task. Write to `.harness/state.json`:

   ```json
   {
     "schemaVersion": 1,
     "position": { "phase": "execute", "task": "Task N" },
     "progress": { "Task 1": "complete", "Task 2": "complete", "Task 3": "in_progress" },
     "lastSession": { "date": "YYYY-MM-DD", "summary": "Completed Tasks 1-2, starting Task 3" }
   }
   ```

7. Handle checkpoints per the checkpoint protocol below.
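The gate's retry budget and escalation rule can be sketched as a small loop; `runGate` is a stand-in for running `assess_project` plus the test suite, not a harness API:

```typescript
// Sketch of the mechanical gate: retry on failure at most twice, then escalate.
// runGate() stands in for assess_project + the test suite run.
type GateResult = { passed: boolean; errors: string[] };

function runMechanicalGate(
  runGate: (errorContext: string[]) => GateResult,
  maxAttempts = 3, // initial run plus 2 retries
): { outcome: "proceed" | "escalate"; attempts: number } {
  let errorContext: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = runGate(errorContext);
    if (result.passed) return { outcome: "proceed", attempts: attempt };
    errorContext = result.errors; // feed the errors into the retry
  }
  // Still failing: record in .harness/failures.md, escalate, stop.
  return { outcome: "escalate", attempts: maxAttempts };
}
```

The key property is that the loop is bounded: a gate that never passes produces an `escalate` outcome rather than an infinite retry.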
### Checkpoint Protocol
Three checkpoint types. Each requires pausing execution.
#### `[checkpoint:human-verify]` — Show and Confirm
Stop. Present via `emit_interaction`:

```
emit_interaction({
  path: "<project-root>",
  type: "confirmation",
  confirmation: {
    text: "Task N complete. Output: <summary>. Continue to Task N+1?",
    context: "<test output or diff summary>",
    impact: "Continuing proceeds to next task. Declining pauses for review.",
    risk: "low"
  }
})
```
Wait for human confirmation.
#### `[checkpoint:decision]` — Present Options and Wait
Stop. Present via `emit_interaction`:

```
emit_interaction({
  path: "<project-root>",
  type: "question",
  question: {
    text: "Task N requires a decision: <description>",
    options: [
      { label: "<option A>", pros: ["..."], cons: ["..."], risk: "low", effort: "low" },
      { label: "<option B>", pros: ["..."], cons: ["..."], risk: "medium", effort: "medium" }
    ],
    recommendation: { optionIndex: 0, reason: "<why>", confidence: "medium" }
  }
})
```
Wait for human choice.
#### `[checkpoint:human-action]` — Instruct and Wait
Stop. Tell the human exactly what to do (e.g., "Create an API key at [URL] and paste it here"). State: "Task N requires your action: [instructions]. Let me know when done." Wait for confirmation.
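All three markers share the same contract: pause, interact, wait. A sketch of the dispatch; the `instruction` label for human-action is my own (the protocol presents that case as plain text, not an `emit_interaction` type):

```typescript
// Sketch: every checkpoint marker pauses execution; only the interaction differs.
type CheckpointKind = "human-verify" | "decision" | "human-action";

function interactionTypeFor(kind: CheckpointKind): "confirmation" | "question" | "instruction" {
  switch (kind) {
    case "human-verify": return "confirmation"; // show output, wait for confirm
    case "decision":     return "question";     // present options, wait for choice
    case "human-action": return "instruction";  // instruct, wait for "done"
  }
}
```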
### Phase 3: VERIFY — Two-Tier Validation
Quick gate (default): The mechanical gate in Phase 2 Step 5 IS the standard verification. Every task commit must pass it. No additional step needed for normal execution.
Deep audit (on-demand): When `--deep` is passed or at milestone boundaries, invoke harness-verification for a 3-level audit:
- EXISTS — Do claimed artifacts actually exist?
- SUBSTANTIVE — Do they contain meaningful, correct content (not stubs)?
- WIRED — Are they integrated (imported, routed, tested, reachable)?
If deep audit fails, treat as blocker. Record and stop.
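The three levels can be checked in order, failing at the first gap. In this sketch the check functions are injected stand-ins for whatever the real verifier does, not harness APIs:

```typescript
// Sketch of the 3-level audit: an artifact passes only if it exists,
// has substantive content, and is wired into the codebase.
interface AuditChecks {
  exists: (path: string) => boolean;
  isSubstantive: (path: string) => boolean; // meaningful content, not a stub
  isWired: (path: string) => boolean;       // imported, routed, tested, reachable
}

function auditArtifact(
  path: string,
  checks: AuditChecks,
): { level: "EXISTS" | "SUBSTANTIVE" | "WIRED"; passed: boolean } {
  if (!checks.exists(path)) return { level: "EXISTS", passed: false };
  if (!checks.isSubstantive(path)) return { level: "SUBSTANTIVE", passed: false };
  if (!checks.isWired(path)) return { level: "WIRED", passed: false };
  return { level: "WIRED", passed: true };
}
```

The ordering matters: a stub that is not wired should be reported as SUBSTANTIVE first, since an unwired stub has two defects and the earlier one is cheaper to diagnose.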
After all tasks pass:
```
emit_interaction({
  path: "<project-root>",
  type: "transition",
  transition: {
    completedPhase: "execution",
    suggestedNext: "verification",
    reason: "All plan tasks executed and verified",
    artifacts: ["<created/modified files>"],
    qualityGate: {
      checks: [
        { name: "all-tasks-complete", passed: true, detail: "<N>/<N> tasks" },
        { name: "harness-validate", passed: true },
        { name: "tests-pass", passed: true }
      ],
      allPassed: true
    }
  }
})
```
### Phase 4: PERSIST — Save Progress and Learnings
All session-scoped files use `{sessionDir}/` when the session is known, otherwise `.harness/`. Session-scoped files include: `handoff.json`, `state.json`, `learnings.md`, `artifacts.json`.
1. Update state with current position, progress, and `lastSession`:

   ```json
   { "lastSession": { "lastSkill": "harness-execution", "pendingTasks": ["Task 4", "Task 5"] } }
   ```

   Graph refresh: If `.harness/graph/` exists, run `harness scan [path]` after code changes. Skipping causes stale graph query results.
2. Append tagged learnings to `learnings.md`. Tag every entry:

   ```markdown
   ## YYYY-MM-DD — Task N: <task name>
   - [skill:harness-execution] [outcome:success] What was accomplished
   - [skill:harness-execution] [outcome:gotcha] What was surprising
   - [skill:harness-execution] [outcome:decision] What was decided and why
   ```
3. Record failures in `failures.md` if any task was escalated after retry exhaustion. Include the approach attempted and why it failed.
4. Write handoff. Write to the session-scoped path when the session slug is known, otherwise fall back to global:
   - Session-scoped (preferred): `.harness/sessions/<session-slug>/handoff.json`
   - Global (fallback, deprecated): `.harness/handoff.json`

   [DEPRECATED] Writing to `.harness/handoff.json` is deprecated. In autopilot sessions, always write to `.harness/sessions/<slug>/handoff.json`.

   ```json
   {
     "fromSkill": "harness-execution",
     "timestamp": "YYYY-MM-DDTHH:MM:SSZ",
     "summary": "Completed Tasks 1-3. Task 4 blocked on missing API endpoint.",
     "pendingTasks": ["Task 4", "Task 5"],
     "blockers": ["Task 4: /api/notifications endpoint not implemented"],
     "learnings": ["Date comparison needs UTC normalization"]
   }
   ```
5. Write session summary for cold-start restoration via `writeSessionSummary(projectPath, sessionSlug, { session, lastActive, skill, phase, status, spec, plan, keyContext, nextStep })`.
6. Sync roadmap (mandatory when present). If `docs/roadmap.md` exists, call `manage_roadmap` with `sync` and `apply: true`. Do not use `force_sync: true`. If unavailable, fall back to `syncRoadmap()` from core and warn. If no roadmap, skip silently.
7. Learnings are append-only. Never edit or delete previous learnings.
8. Auto-transition to verification. When ALL tasks complete (not mid-plan), call:

   ```
   emit_interaction({
     type: "transition",
     transition: {
       completedPhase: "execution",
       suggestedNext: "verification",
       requiresConfirmation: false,
       summary: "<tasks completed summary>",
       qualityGate: {
         checks: [
           { name: "all-tasks-complete", passed: true },
           { name: "harness-validate", passed: true },
           { name: "tests-pass", passed: true },
           { name: "no-blockers", passed: true }
         ],
         allPassed: true
       }
     }
   })
   ```

   Immediately invoke harness-verification without waiting for user input.
Important: Only emit when all tasks complete. If stopped due to blocker/checkpoint/partial completion, write handoff and stop instead.
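The state updates performed during EXECUTE and PERSIST share one read-modify-write shape. A sketch, with the `State` interface assumed from the `state.json` examples in this skill rather than a published harness type:

```typescript
// Sketch: advance .harness/state.json after completing a task.
// The State shape is assumed from this skill's state.json examples.
interface State {
  schemaVersion: number;
  position: { phase: string; task: string };
  progress: Record<string, string>;
  lastSession: { date: string; summary: string };
}

function advanceState(state: State, completed: string, next: string, date: string): State {
  // Pure update: return a new object so the caller controls when the file is written.
  return {
    ...state,
    position: { phase: "execute", task: next },
    progress: { ...state.progress, [completed]: "complete", [next]: "in_progress" },
    lastSession: { date, summary: `Completed ${completed}, starting ${next}` },
  };
}
```

Keeping the update pure makes "state must reflect the new position after every task" trivially checkable, since the write is a single serialization of the returned object.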
## Stopping Conditions
Non-negotiable. When any condition is met, stop immediately.
- Hit a blocker. Task cannot be completed as written. Do not guess or improvise. Record and report: "Blocked on Task N: [issue]. The plan needs to be updated."
- Test failure after implementation. Do not retry blindly. Diagnose root cause. Fix if within task scope; otherwise stop.
- Unclear instruction. Do not interpret ambiguity. Ask: "Task N says [quote]. I interpret this as [interpretation]. Correct?"
- Harness validation failure. Do not proceed. Fix the violation before moving on.
- Three consecutive failures. Task design is likely wrong. Report: "Task N failed 3 times. Root cause: [analysis]. Plan may need revision."
## Session State
This skill reads/writes session sections via `manage_state`:
| Section | R/W | Purpose |
|---|---|---|
| terminology | both | Domain terms for consistent naming; adds terms discovered during implementation |
| decisions | both | Planning decisions for context; records implementation decisions |
| constraints | both | Constraints to respect boundaries; adds constraints discovered during coding |
| risks | both | Risks for awareness; updates status as mitigated or realized |
| openQuestions | both | Questions for context; resolves questions answered by implementation |
| evidence | both | Prior evidence; writes file:line citations, test outputs, diff references |
Write: After each task, append relevant entries. Write evidence for every significant technical assertion. Mark openQuestions as resolved when answered.
Read: During PREPARE, read all sections via `gather_context` with `include: ["sessions"]`.
## Evidence Requirements
Claims about task completion, test results, or code behavior MUST cite evidence:
- File reference: `file:line` format (e.g., `src/services/notification-service.ts:42`)
- Test output: Actual command and output (e.g., `$ npx vitest run ... → PASS (8 tests)`)
- Diff evidence: Before/after with file path for modifications
- Harness output: `harness validate` output as project health evidence
- Session evidence: Write to the `evidence` section via `manage_state` after each task
When to cite: After every task completion. Every commit claim must be backed by test output or file reference.
Uncited claims: Prefix with `[UNVERIFIED]`. Uncited claims are flagged during review.
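A minimal sketch of the citation rule, assuming a claim either carries a `file:line` citation or receives the `[UNVERIFIED]` prefix; `citeClaim` is a hypothetical helper, not part of the harness:

```typescript
// Sketch: claims without evidence get the [UNVERIFIED] prefix.
// citeClaim is a hypothetical helper illustrating the rule above.
function citeClaim(claim: string, evidence?: { file: string; line: number }): string {
  return evidence
    ? `${claim} (${evidence.file}:${evidence.line})` // file:line citation
    : `[UNVERIFIED] ${claim}`;                        // flagged during review
}
```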
## Harness Integration
- `harness validate` — Run after every task. Mandatory. No task complete without passing.
- `gather_context` — PREPARE phase: load state, learnings, handoff, validation in one call.
- `harness check-deps` — Run when tasks add new imports/modules.
- `harness state show` — View current position and progress.
- `harness state learn "<message>"` — Append a learning from the CLI.
- State/Learnings files — Session-scoped when session known, otherwise `.harness/`. State updated after every task; learnings append-only.
- Roadmap sync — After plan completion, `manage_roadmap sync` with `apply: true`. Mandatory when roadmap exists. No `force_sync: true`.
- `emit_interaction` — Auto-transition to harness-verification at plan completion.
## Success Criteria
- Every task executed in order, atomically, one commit per task
- `.harness/state.json` accurately reflects position and progress
- `.harness/learnings.md` has entries for sessions with non-trivial discoveries
- `harness validate` passes after every task
- Checkpoints honored: execution paused at every `[checkpoint:*]` marker
- No improvisation: tasks executed as written, or stopped with blocker reported
- All stopping conditions respected
## Red Flags
| Flag | Corrective Action |
|---|---|
| "The plan says X but Y would be cleaner — I'll improvise" | STOP. Iron Law: execute the plan as written. If the plan is wrong, stop and fix the plan. Improvising introduces untested assumptions. |
| "I'll skip the test for this task since it's just configuration" | STOP. The TDD rhythm is not optional. Configuration changes need tests too — they prove the config does what the task requires. |
| "I'll handle this edge case the plan didn't mention" | STOP. Unplanned work is scope creep. If the edge case matters, it's a plan deficiency — record it as a blocker. |
| `TODO` or `FIXME` markers in committed code | STOP. Every commit must be atomic and complete for its task. TODOs in committed code are incomplete tasks disguised as progress. |
## Rationalizations to Reject
| Rationalization | Reality |
|---|---|
| "The plan says to do X, but doing Y would be cleaner -- I will improvise" | The Iron Law states: execute the plan as written. If the plan is wrong, stop and fix the plan. Improvising mid-execution introduces untested assumptions. |
| "This task depends on Task 3 which I know is done, so I can skip verifying prerequisites" | Prerequisites must be verified mechanically, not from memory. Check that dependency tasks are marked complete in state and that referenced files exist. |
| "The checkpoint is just a confirmation step and the output looks correct, so I will auto-continue" | Checkpoints are non-negotiable pause points. If a task has a checkpoint marker, execution must pause. |
| "Harness validate passed on the previous task and nothing changed structurally, so I can skip it for this one" | Validation runs after every task with no exceptions. Each task may introduce subtle architectural drift that only harness validate catches. |
| "The task failed but I can see the fix — I'll apply it and move on without recording a blocker" | A failed task is a blocker. Record it, report it, and stop. Applying unplanned fixes mid-execution makes progress untraceable and may cascade into later tasks. |
| "Phase 1 prerequisites are missing but I can create them as part of this task" | PREPARE is read-only. Missing prerequisites mean a prior task or the plan is deficient. Report the gap — do not fix prerequisites during execution setup. |
## Examples
### Example: Executing a 5-Task Notification Plan
Session Start (fresh):
```
Read plan: docs/plans/2026-03-14-notifications-plan.md (5 tasks)
Read state: .harness/state.json — not found (fresh start, Task 1)
Read learnings: .harness/learnings.md — not found
Run: harness validate — passes. Clean baseline.
```
Task 1: Define notification types
```
1. Create src/types/notification.ts with Notification interface
2. harness validate — passes
3. Commit: "feat(notifications): define Notification type"
4. Update state: { position: Task 2, progress: { "Task 1": "complete" } }
```
Task 2: Create notification service (TDD)
```
1. Write test: src/services/notification-service.test.ts
2. Run test: FAIL — NotificationService not defined (correct)
3. Implement: src/services/notification-service.ts
4. Run test: PASS
5. harness validate — passes
6. Commit: "feat(notifications): add NotificationService.create"
7. Update state: { position: Task 3, Tasks 1-2 complete }
```
Task 3: Add list and expiry (TDD) — has checkpoint
```
[checkpoint:human-verify] — "Tasks 1-2 complete. Tests pass. Continue to Task 3?"
Human: "Continue."
1. Write tests: list by userId, filter expired
2. Run tests: FAIL (not implemented)
3. Implement list() and isExpired()
4. Run tests: PASS
5. harness validate — passes
6. Commit: "feat(notifications): add list and expiry"
7. Append learning: [gotcha] Date comparison needed UTC normalization
```
Context reset (resume at Task 4):
```
Read state: position Task 4, Tasks 1-3 complete
Read learnings: "Date comparison needed UTC normalization"
harness validate — passes. Resume Task 4.
```
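The learning appended at the end of Task 3 follows the Phase 4 tag template. Generating such an entry can be sketched mechanically; `formatLearning` is a hypothetical helper, not a harness API:

```typescript
// Sketch: format an append-only learnings.md entry with the required tags.
// formatLearning is a hypothetical helper mirroring the Phase 4 template.
type Outcome = "success" | "gotcha" | "decision" | "failure";

function formatLearning(date: string, task: string, items: Array<[Outcome, string]>): string {
  const lines = items.map(
    ([outcome, text]) => `- [skill:harness-execution] [outcome:${outcome}] ${text}`,
  );
  return [`## ${date} — ${task}`, ...lines].join("\n");
}
```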
## Gates
Hard stops. Violating any gate means the process has broken down.
- Phase 1 PREPARE is read-only. Do not write production code, create files, or commit during preparation. If prerequisites are missing, report the gap — do not fix it yourself.
- No execution without a plan. If no plan exists, do not start. Use harness-planning.
- No improvisation. Execute as written. Do not add "improvements" not in the plan.
- No skipping tasks. Tasks are dependency-ordered. Execute in order.
- No skipping validation. `harness validate` after every task. No exceptions.
- No ignoring checkpoints. `[checkpoint:*]` markers require pausing. No auto-continue.
- No guessing past blockers. Cannot complete as written? Stop. Report. Do not invent workarounds.
- State must be updated. After every task, state must reflect new position.
## Escalation
- Task fails, fix outside scope: "Task N failed because [reason]. Fix requires changes to [outside scope]. Plan needs updating at Tasks [X, Y]."
- Plan references missing files: "Task N references [file] which does not exist. Plan may need regeneration."
- Tests pass but behavior seems wrong: "Task N passes all tests, but I notice [observation]. Should I investigate?"
- State corrupted: If state says Task 5 complete but code missing, report inconsistency. Re-verify from Task 1 if needed.
- Human wants to skip ahead: "Skipping Task N means Tasks [X, Y] may fail. Update the plan to remove the dependency?" Get explicit approval.
## Trace Output (Optional)
When `.harness/gate.json` has `"trace": true` or `--verbose` is passed, append to `.harness/trace.md`:
```
**[PREPARE 14:32:07]** Loaded plan with 5 tasks, resuming from Task 3.
**[EXECUTE 14:32:15]** Task 3 committed; gate passed first attempt.
**[VERIFY 14:35:42]** Deep audit at milestone; all 3 levels passed.
**[PERSIST 14:35:50]** State updated, handoff written with 2 pending tasks.
```
For human debugging only. Not required for normal execution.