git clone https://github.com/Intense-Visions/harness-engineering
T=$(mktemp -d) && git clone --depth=1 https://github.com/Intense-Visions/harness-engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/skills/claude-code/harness-autopilot" ~/.claude/skills/intense-visions-harness-engineering-harness-autopilot-15acd5 && rm -rf "$T"
agents/skills/claude-code/harness-autopilot/SKILL.md
Harness Autopilot
Lightweight orchestrator — dispatches isolated phase-agents, tracks state, chains artifacts between phases. Delegates all planning/execution/verification/review to dedicated persona agents.
When to Use
- After a multi-phase spec is approved and you want automated phase execution
- When a project has 2+ implementation phases requiring repeated skill invocations
- NOT for single-phase work (use harness-execution directly)
- NOT when the spec is not yet approved (use harness-brainstorming first)
- NOT for CI/headless execution (conversational skill)
Persona Agents
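| Skill | `subagent_type` | State(s) |
|---|---|---|
| harness-planning | `harness-planner` | PLAN |
| harness-execution | `harness-task-executor` | EXECUTE |
| harness-verification | `harness-verifier` | VERIFY |
| harness-code-review | `harness-code-reviewer` | REVIEW, FINAL_REVIEW |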
Iron Law: Autopilot delegates, never reimplements. If you find yourself writing plan/execute/verify/review logic, STOP and delegate via `subagent_type`. Always use dedicated persona agents, never general-purpose agents.
Rigor Levels
Set at INIT (`--fast` / `--thorough`); persists for the session. Default: `standard`.
| State | `fast` | `standard` | `thorough` |
|---|---|---|---|
| PLAN | Skip skeleton pass | Default | Always skeleton with approval |
| APPROVE_PLAN | Auto-approve, skip signals | Signal-based | Force human review |
| EXECUTE | Skip scratchpad | Scratchpad >500 words | Verbose scratchpad |
| VERIFY | `harness validate` only | Full pipeline | Expanded checks |
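As a rough illustration of how the rigor flags might be resolved at INIT, here is a minimal TypeScript sketch. `resolveRigor` is a hypothetical helper, not part of the skill, and it assumes that the "both flags together" rejection stated under INIT refers to `--fast` combined with `--thorough`.

```typescript
type RigorLevel = "fast" | "standard" | "thorough";

// Hypothetical helper: maps invocation flags to the session's rigor level.
function resolveRigor(flags: string[]): RigorLevel {
  const fast = flags.indexOf("--fast") !== -1;
  const thorough = flags.indexOf("--thorough") !== -1;
  // Assumption: the two rigor flags are the mutually exclusive pair.
  if (fast && thorough) {
    throw new Error("--fast and --thorough cannot be combined");
  }
  if (fast) return "fast";
  if (thorough) return "thorough";
  return "standard"; // documented default
}
```

The result would be stored once in the session state, since the rigor level persists for the whole session.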
State Machine
INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW → PHASE_COMPLETE
PHASE_COMPLETE → [next phase?] → ASSESS (more phases) or FINAL_REVIEW → DONE (no more phases)
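The diagram can be read as a small transition table. The sketch below encodes only the forward edges shown in the diagram; it leaves out the shortcut and re-entry paths described in the state sections (for example ASSESS → APPROVE_PLAN when a plan already exists, or re-entering EXECUTE after a fix). `NEXT`, `canTransition`, and `afterPhaseComplete` are hypothetical names.

```typescript
type State =
  | "INIT" | "ASSESS" | "PLAN" | "APPROVE_PLAN" | "EXECUTE"
  | "VERIFY" | "REVIEW" | "PHASE_COMPLETE" | "FINAL_REVIEW" | "DONE";

// Forward edges only, as drawn in the diagram.
const NEXT: Record<State, State[]> = {
  INIT: ["ASSESS"],
  ASSESS: ["PLAN"],
  PLAN: ["APPROVE_PLAN"],
  APPROVE_PLAN: ["EXECUTE"],
  EXECUTE: ["VERIFY"],
  VERIFY: ["REVIEW"],
  REVIEW: ["PHASE_COMPLETE"],
  PHASE_COMPLETE: ["ASSESS", "FINAL_REVIEW"],
  FINAL_REVIEW: ["DONE"],
  DONE: [],
};

function canTransition(from: State, to: State): boolean {
  return NEXT[from].indexOf(to) !== -1;
}

// currentPhase is zero-based, matching the initial value written at INIT.
function afterPhaseComplete(currentPhase: number, totalPhases: number): State {
  return currentPhase + 1 < totalPhases ? "ASSESS" : "FINAL_REVIEW";
}
```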
INIT
- Resolve spec path (argument or prompt).
- Derive session slug: strip `docs/`, drop `.md`, replace `/` and `.` with `--`, lowercase. Set `sessionDir = .harness/sessions/<slug>/`.
- Check for existing state: read `{sessionDir}/autopilot-state.json`. If present and not DONE: report "Resuming from `{currentState}`, phase {N}: {name}." Apply schema migration if `schemaVersion < 5` (backfill missing fields). Jump to recorded state.
- Fresh start: read spec, parse `## Implementation Order` for phases (`### Phase N: Name` + `<!-- complexity: low|medium|high -->`, default: `medium`). Capture `startingCommit` via `git rev-parse HEAD`. Write `autopilot-state.json` (schemaVersion: 5, currentState: "ASSESS", currentPhase: 0).
- Flags: `--fast` → `rigorLevel: "fast"`. `--thorough` → `rigorLevel: "thorough"`. `--review-plans` → `reviewPlans: true`. Both flags together → reject with error.
- Call `gather_context({ path, skill: "harness-autopilot", session: slug, include: ["state", "learnings", "handoff", "validation"] })`.
- → ASSESS.
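The slug and phase-parsing rules in INIT can be sketched as follows. This is a minimal illustration under stated assumptions, not the skill's implementation: `deriveSlug`, `parsePhases`, and the `Phase` shape are hypothetical, `docs/` is treated as a leading prefix and `.md` as a trailing extension, and the complexity comment is assumed to sit on the phase heading line or on a line after it.

```typescript
type Complexity = "low" | "medium" | "high";

interface Phase {
  index: number;
  name: string;
  complexity: Complexity;
}

// Strip docs/, drop .md, replace / and . with --, lowercase.
function deriveSlug(specPath: string): string {
  return specPath
    .replace(/^docs\//, "")
    .replace(/\.md$/, "")
    .replace(/[\/.]/g, "--")
    .toLowerCase();
}

// Collect "### Phase N: Name" headings under "## Implementation Order".
function parsePhases(spec: string): Phase[] {
  const phases: Phase[] = [];
  let current: Phase | null = null;
  let inSection = false;
  for (const line of spec.split("\n")) {
    if (/^## /.test(line)) {
      // Any other level-2 heading ends the section.
      inSection = /^## Implementation Order\s*$/.test(line);
      continue;
    }
    if (!inSection) continue;
    const heading = /^### Phase (\d+): (.+)$/.exec(line);
    if (heading) {
      const name = heading[2].replace(/<!--.*?-->/g, "").trim();
      current = { index: Number(heading[1]), name: name, complexity: "medium" };
      phases.push(current);
    }
    const note = /<!--\s*complexity:\s*(low|medium|high)\s*-->/.exec(line);
    if (note && current !== null) {
      current.complexity = note[1] as Complexity;
    }
  }
  return phases;
}
```

With these rules, the spec path used in the Examples section, `docs/changes/security-scanner/proposal.md`, would yield the slug `changes--security-scanner--proposal`.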
ASSESS
- Read current phase at `currentPhase`.
- If `planPath` set and file exists: → APPROVE_PLAN.
- Complexity routing:
  - `low`/`medium`: auto-plan via harness-planner → PLAN.
  - `high`: pause. Instruct: "Run `/harness:planning` interactively, then re-invoke `/harness:autopilot`." Wait for re-invocation.
- Update `currentState: "PLAN"`.
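The ASSESS routing can be summarized as a single decision function. The sketch below is illustrative only; `routeAssess`, `PhaseInfo`, and `AssessRoute` are hypothetical names, and the file-existence check is passed in as a boolean so the example stays self-contained.

```typescript
type Complexity = "low" | "medium" | "high";

interface PhaseInfo {
  complexity: Complexity;
  planPath: string | null;
}

type AssessRoute =
  | { next: "APPROVE_PLAN" }
  | { next: "PLAN"; mode: "auto" | "interactive" };

function routeAssess(phase: PhaseInfo, planFileExists: boolean): AssessRoute {
  // An existing plan short-circuits planning entirely.
  if (phase.planPath !== null && planFileExists) {
    return { next: "APPROVE_PLAN" };
  }
  // low/medium auto-plan; high pauses for interactive planning.
  return { next: "PLAN", mode: phase.complexity === "high" ? "interactive" : "auto" };
}
```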
PLAN
Auto-plan (low/medium): Dispatch harness-planner:
subagent_type: "harness-planner" prompt: "Phase {N}: {name}. Spec: {specPath}. Session: {sessionSlug}. Rigor: {rigorLevel}. Follow harness-planning. Write plan to docs/plans/. Write {sessionDir}/handoff.json when done."
On return: read `planPath` from `{sessionDir}/handoff.json`. Complexity override check: low + tasks>10 or checkpoints>3 → "medium"; tasks>20 or checkpoints>6 → "high". Update state `planPath`. → APPROVE_PLAN.
Interactive plan (high): Check for plan file at `docs/plans/*{phase-name}*` or `planPath` in handoff. If found: update `planPath` → APPROVE_PLAN. If not: remind and wait.
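The complexity override check is terse, so one possible reading is sketched below. It assumes the higher threshold (tasks>20 or checkpoints>6) applies to any phase not already declared `high` and is checked first, while the lower threshold applies only to `low` phases. `complexityOverride` is a hypothetical function; a `null` result means no override.

```typescript
type Complexity = "low" | "medium" | "high";

function complexityOverride(
  declared: Complexity,
  tasks: number,
  checkpoints: number
): Complexity | null {
  // Assumption: check the larger threshold first so big plans are not capped at "medium".
  if (declared !== "high" && (tasks > 20 || checkpoints > 6)) return "high";
  if (declared === "low" && (tasks > 10 || checkpoints > 3)) return "medium";
  return null;
}
```

A non-null result would feed the `phase.complexityOverride !== null` signal checked in APPROVE_PLAN.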
APPROVE_PLAN
- Gather: task count, checkpoint count, concerns from `{sessionDir}/handoff.json` (default `[]`).
- `"fast"` → auto-approve, record `"auto_approved_plan_fast"`, → EXECUTE.
- `"thorough"` → force `shouldPauseForReview = true`.
- Signals (any true → pause; all false → auto-approve):
  - `reviewPlans: true`
  - `phase.complexity === "high"`
  - `phase.complexityOverride !== null`
  - Handoff `concerns` non-empty
  - Task count > 15
- Auto-approve: emit report (mode, complexity, concerns, task count). Record decision with signal snapshot in `decisions[]`. → EXECUTE.
- Pause: show triggered signals. Ask "Approve? (yes / revise / skip phase / stop)." Record decision. Route accordingly.
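The approval rules above amount to a small predicate over the gathered inputs. The following sketch is one way to express it; `evaluateApproval`, `ApprovalInput`, and the signal labels are hypothetical, and the returned `triggered` list stands in for the signal snapshot recorded in `decisions[]`.

```typescript
type RigorLevel = "fast" | "standard" | "thorough";
type Complexity = "low" | "medium" | "high";

interface ApprovalInput {
  rigorLevel: RigorLevel;
  reviewPlans: boolean;
  complexity: Complexity;
  complexityOverride: Complexity | null;
  concerns: string[];
  taskCount: number;
}

interface ApprovalDecision {
  pause: boolean;
  triggered: string[];
}

function evaluateApproval(input: ApprovalInput): ApprovalDecision {
  // "fast" auto-approves and skips signal evaluation.
  if (input.rigorLevel === "fast") return { pause: false, triggered: [] };
  const signals: Array<[string, boolean]> = [
    ["reviewPlans", input.reviewPlans],
    ["highComplexity", input.complexity === "high"],
    ["complexityOverride", input.complexityOverride !== null],
    ["concerns", input.concerns.length > 0],
    ["taskCount>15", input.taskCount > 15],
  ];
  const triggered = signals.filter((s) => s[1]).map((s) => s[0]);
  // "thorough" forces a pause even when no signal fires.
  const pause = input.rigorLevel === "thorough" || triggered.length > 0;
  return { pause: pause, triggered: triggered };
}
```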
EXECUTE
Dispatch harness-task-executor:
subagent_type: "harness-task-executor" prompt: "Phase {N}: {name}. Plan: {planPath}. Session: {sessionSlug}. Rigor: {rigorLevel}. Update {sessionDir}/state.json per task. Write {sessionDir}/handoff.json when done or blocked."
Checkpoints:
- `[checkpoint:human-verify]` → show output, confirm, resume.
- `[checkpoint:decision]` → present options, record choice, resume.
- `[checkpoint:human-action]` → instruct user, wait for confirmation, resume.
After each passing checkpoint: `commitAtCheckpoint()`.
Outcome: All tasks complete → VERIFY. Task fails → retry logic:
- Attempt 1: read error, apply obvious fix, re-dispatch for failed task.
- Attempt 2: expand context: read related files, check `learnings.md`, re-dispatch.
- Attempt 3: full context: test output, imports, plan instructions, re-dispatch.
- Budget exhausted: recovery commit (`[autopilot][recovery]` prefix in message), record in `.harness/failures.md`. Ask: "fix manually and continue / revise plan / stop."
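The retry ladder can be pictured as a bounded loop in which each attempt carries more context than the last. This sketch is illustrative: `retryTask` and `CONTEXT_BY_ATTEMPT` are hypothetical, the `redispatch` callback stands in for re-dispatching the task executor, and treating each attempt's context as cumulative is an assumption.

```typescript
interface RetryResult {
  ok: boolean;
  attemptsUsed: number;
}

// Assumption: later attempts keep the earlier context and add to it.
const CONTEXT_BY_ATTEMPT: string[][] = [
  ["error output"],
  ["error output", "related files", "learnings.md"],
  ["error output", "related files", "learnings.md", "test output", "imports", "plan instructions"],
];

function retryTask(redispatch: (context: string[]) => boolean): RetryResult {
  let attempts = 0;
  for (const context of CONTEXT_BY_ATTEMPT) {
    attempts += 1;
    if (redispatch(context)) return { ok: true, attemptsUsed: attempts };
  }
  // Budget exhausted: the caller makes the recovery commit and asks the user.
  return { ok: false, attemptsUsed: attempts };
}
```

The loop never runs a fourth attempt; when `ok` is false, control returns to the human rather than continuing silently.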
VERIFY
`"fast"`: run `harness validate`. Pass → REVIEW. Fail → surface to user.
`"standard"`/`"thorough"`: dispatch harness-verifier:
subagent_type: "harness-verifier" prompt: "Phase {N}: {name}. Session: {sessionSlug}. Rigor: {rigorLevel}. Verify and report pass/fail with findings."
Pass → REVIEW. Fail → ask "fix / skip verification / stop." `fix`: re-enter EXECUTE (retry budget resets).
REVIEW
Dispatch harness-code-reviewer:
subagent_type: "harness-code-reviewer" prompt: "Phase {N}: {name}. Session: {sessionSlug}. Follow harness-code-review. Report findings (critical / important / suggestion)."
Persist findings to `{sessionDir}/phase-{N}-review.json`. No blocking → PHASE_COMPLETE. Blocking → ask "fix / override / stop." `fix`: re-enter EXECUTE. `override`: record decision in `decisions[]` → PHASE_COMPLETE.
PHASE_COMPLETE
- Present summary: name, tasks completed, retries used, verification result, review findings count, elapsed time.
- Record in `history[]`: phase index, name, startedAt, completedAt, tasksCompleted, retriesUsed, verificationPassed, reviewFindings.
- Mark phase `complete` in state. Clear scratchpad: `clearScratchpad({ session, phase, projectPath })`.
- Sync roadmap: `manage_roadmap sync apply:true` (skip if no roadmap; never `force_sync: true`).
- Write session summary: `writeSessionSummary(projectPath, sessionSlug, { session, lastActive, skill: "harness-autopilot", phase, status, spec, plan, keyContext, nextStep })`.
- More phases: "Phase {N} complete. Next: {N+1}: {name} ({complexity}). Continue? (yes / stop)."
  - `yes` → increment `currentPhase`, reset `retryBudget`, → ASSESS.
  - `stop` → save and exit.
- No more phases: → FINAL_REVIEW.
FINAL_REVIEW
- Set `currentState: "FINAL_REVIEW"`, `finalReview.status: "in_progress"`.
- Gather per-phase findings from `{sessionDir}/phase-{N}-review.json` files.
- Dispatch harness-code-reviewer:
subagent_type: "harness-code-reviewer" prompt: "Final cross-phase review. Diff: git diff {startingCommit}..HEAD. Session: {sessionSlug}. Prior findings: {collected}. Focus on cross-phase coherence: naming, duplicated utilities, architectural drift. Report findings (critical / important / suggestion)."
- No blocking: store in `finalReview.findings`, set `"passed"` → DONE.
- Blocking: ask "fix / override / stop."
  - `fix`: increment `finalReview.retryCount` (max 3). Dispatch harness-task-executor: "Fix these blocking findings: {findings with file, line, title}. Session: {sessionSlug}. Commit each fix atomically." Run `harness validate`. Re-run FINAL_REVIEW from step 1. If retryCount > 3: stop, record in `.harness/failures.md`.
  - `override`: record rationale in `decisions[]`. Set `"overridden"` → DONE.
  - `stop`: save state and exit (resumable).
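The FINAL_REVIEW branching, including the bounded fix cycle, could be modeled as a pure function from the current review state and the user's choice to an outcome. All names here (`handleFinalReview`, `FinalReview`, the outcome labels) are hypothetical; only the status values `"in_progress"`, `"passed"`, and `"overridden"` and the `retryCount` limit come from the text above.

```typescript
interface FinalReview {
  status: "in_progress" | "passed" | "overridden";
  retryCount: number;
}

type Choice = "fix" | "override" | "stop";
type Outcome = "DONE" | "RERUN_FINAL_REVIEW" | "EXIT_RESUMABLE" | "STOP_BUDGET_EXHAUSTED";

interface FinalReviewResult {
  outcome: Outcome;
  review: FinalReview;
}

function handleFinalReview(
  review: FinalReview,
  blockingCount: number,
  choice: Choice
): FinalReviewResult {
  if (blockingCount === 0) {
    return { outcome: "DONE", review: { status: "passed", retryCount: review.retryCount } };
  }
  if (choice === "override") {
    return { outcome: "DONE", review: { status: "overridden", retryCount: review.retryCount } };
  }
  if (choice === "stop") {
    return { outcome: "EXIT_RESUMABLE", review: review };
  }
  // "fix": increment first, then stop once the count exceeds 3.
  const retryCount = review.retryCount + 1;
  const next: FinalReview = { status: "in_progress", retryCount: retryCount };
  return { outcome: retryCount > 3 ? "STOP_BUDGET_EXHAUSTED" : "RERUN_FINAL_REVIEW", review: next };
}
```

Under this reading, three fix cycles are allowed and a fourth request stops the run and is recorded as a failure.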
DONE
- Present: total phases, tasks, retries, time, `finalReview.status` + findings count, any overridden findings.
- Ask "Create a PR? (yes / no)."
- Write final handoff to `{sessionDir}/handoff.json`. Append learnings to `.harness/learnings.md`. Call `promoteSessionLearnings(projectPath, sessionSlug)`. If learnings count > 30, suggest `harness learnings prune`.
- If `docs/roadmap.md` exists: call `manage_roadmap update` to set feature done. Skip if not found.
- Write final `writeSessionSummary()`. Set `currentState: "DONE"` in autopilot-state.json.
Process
- INIT — Resolve spec, derive session slug, check for existing state, parse phases.
- ASSESS — Route by complexity: low/medium auto-plans, high pauses for interactive planning.
- PLAN → APPROVE — Dispatch harness-planner, check approval signals, auto-approve or pause.
- EXECUTE — Dispatch harness-task-executor with plan path, handle checkpoints and retries (max 3).
- VERIFY → REVIEW — Dispatch harness-verifier and harness-code-reviewer, fix blocking findings.
- PHASE_COMPLETE — Summarize, sync roadmap, loop to ASSESS for next phase or proceed to FINAL_REVIEW.
- FINAL_REVIEW → DONE — Cross-phase review, offer PR creation, write final handoff.
Harness Integration
- State: `{sessionDir}/autopilot-state.json` (orchestration) + `{sessionDir}/state.json` (task-level, written by harness-execution).
- Handoff: `{sessionDir}/handoff.json`, written by each delegated skill and read by the next. Autopilot writes final handoff at DONE.
- Checkpoint commits: `commitAtCheckpoint()` after passing checkpoints. Recovery commits use `[autopilot][recovery]` prefix.
- Scratchpad: cleared at PHASE_COMPLETE via `clearScratchpad()`. Skipped at `rigorLevel: "fast"`.
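For orientation, the orchestration state file might look roughly like the interface below, with a migration step that backfills missing fields for older schema versions. The field names are those mentioned in this document; the exact shape, the element types of `decisions` and `history`, and the default values used for backfilling are assumptions, and `migrateState` is a hypothetical helper.

```typescript
const CURRENT_SCHEMA_VERSION = 5;

// Assumed shape of autopilot-state.json, built from fields named in this document.
interface AutopilotState {
  schemaVersion: number;
  currentState: string;
  currentPhase: number;
  startingCommit: string;
  rigorLevel: "fast" | "standard" | "thorough";
  reviewPlans: boolean;
  decisions: unknown[];
  history: unknown[];
}

function migrateState(raw: Partial<AutopilotState>): AutopilotState {
  // Assumed defaults for backfilling; the real migration may differ.
  const defaults: AutopilotState = {
    schemaVersion: CURRENT_SCHEMA_VERSION,
    currentState: "ASSESS",
    currentPhase: 0,
    startingCommit: "",
    rigorLevel: "standard",
    reviewPlans: false,
    decisions: [],
    history: [],
  };
  const version = raw.schemaVersion === undefined ? 0 : raw.schemaVersion;
  const merged: AutopilotState = { ...defaults, ...raw };
  if (version < CURRENT_SCHEMA_VERSION) {
    merged.schemaVersion = CURRENT_SCHEMA_VERSION;
  }
  return merged;
}
```

Recorded values always win over defaults, so a resumed session keeps its `currentState` and `currentPhase`.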
Gates
- No reimplementing delegated skills. Writing planning/execution/verification/review logic → STOP. Delegate via `subagent_type`.
- No executing without plan approval. Every plan passes APPROVE_PLAN. No exceptions.
- No skipping VERIFY or REVIEW. Human can override findings; steps cannot be skipped.
- No infinite retries. EXECUTE budget: 3 attempts. FINAL_REVIEW: 3 cycles. If exhausted, stop and surface.
- No modifying state files manually. If corrupted, start fresh.
Escalation
- Spec missing Implementation Order: Cannot identify phases. Ask user to add phase annotations or provide roadmap.
- Delegated skill fails to produce output: Check `{sessionDir}/handoff.json`. Report and ask: retry or stop.
- User wants to reorder phases mid-run: Update `phases[]` (mark skipped, adjust `currentPhase`). Do not re-run completed phases.
- Context limits approaching: Persist state immediately. "State saved. Re-invoke `/harness:autopilot` to continue."
- 2 consecutive phase failures: Suggest reviewing spec for systemic issues.
Rationalizations to Reject
| Rationalization | Reality |
|---|---|
| "Low complexity means I can skip APPROVE_PLAN" | Low complexity means auto-approval only when no signals fire. Signals override complexity. |
| "I can inline planning logic instead of dispatching to harness-planner" | Iron Law. Autopilot delegates, never reimplements. No exceptions. |
| "Retry budget exhausted but one more approach might work" | 3-attempt budget prevents compounding failure. Exceeding it without human input is unrecoverable. |
| "Keeping research in conversation is faster than scratchpad" | Scratchpad gated by rigor level. At standard/thorough, >500 words must go to scratchpad. |
| "Plan auto-approved, so I can skip recording the decision" | Every approval, auto or manual, is recorded in `decisions[]`. That array is the audit trail. |
Success Criteria
- All phases in the spec are executed in order with plan → execute → verify → review per phase
- Every plan approval is recorded in `decisions[]` (auto or manual)
- Retry budget (3 attempts) is enforced: exhausted retries surface to user, never silently continue
- FINAL_REVIEW runs on `startingCommit..HEAD` diff and catches cross-phase coherence issues
- State is persisted to `autopilot-state.json` after every state transition, so re-invocation resumes correctly
- `harness validate` passes after every phase
Examples
Invocation:
/harness:autopilot docs/changes/security-scanner/proposal.md
INIT: 3 phases found: Phase 1: Core Scanner (low), Phase 2: Rule Engine (high), Phase 3: CLI Integration (low).
Phase 1 - ASSESS → PLAN: harness-planner dispatched. Returns plan: `docs/plans/2026-03-19-core-scanner-plan.md` (8 tasks).
Phase 1 - APPROVE_PLAN (auto): All signals false. "Auto-approved Phase 1: Core Scanner | auto | low | no concerns | 8 tasks."
Phase 1 - EXECUTE: harness-task-executor dispatched with plan path + session. 8 tasks complete. 2 checkpoint commits.
Phase 1 - VERIFY: harness-verifier dispatched. Pass. REVIEW: harness-code-reviewer. 0 blocking, 2 notes.
Phase 1 - PHASE_COMPLETE: "Phase 1 complete. Next: Phase 2: Rule Engine (high). Continue? → yes"
Phase 2 - ASSESS: High complexity. "Run `/harness:planning` interactively, then re-invoke." [User plans interactively. Re-invokes.]
INIT (resume): "Resuming from PLAN, phase 2: Rule Engine. Found plan: docs/plans/2026-03-19-rule-engine-plan.md"
Phase 2 - APPROVE_PLAN (paused): Complexity: high triggered. "Approve? → yes" EXECUTE → VERIFY → REVIEW → PHASE_COMPLETE. 14 tasks, 1 retry.
Phase 3: auto-plans and executes. FINAL_REVIEW: harness-code-reviewer on `startingCommit..HEAD`. 0 blocking, 1 warning. Passed.
DONE: 3 phases, 30 tasks, 1 retry. "Create PR? → yes"
Retry exhaustion (during any phase):
Task 4 fails
→ Retry 1/3: obvious fix applied, still fails
→ Retry 2/3: expanded context (related files + learnings), still fails
→ Retry 3/3: full context gather (test output + imports + plan), still fails
Budget exhausted. Recorded in `.harness/failures.md`. Fix manually and continue / revise plan / stop?