Harness Planning
Implementation planning with atomic tasks, goal-backward must-haves, and complete executable instructions. Every task fits in one context window.
When to Use
- After a design spec is approved (output of harness-brainstorming) and implementation needs planning
- When starting a new feature or project needing structured task decomposition
- When `on_new_feature` or `on_project_init` triggers fire and the work is non-trivial
- When resuming a stalled project that needs a fresh plan
- NOT for small tasks (under 15 minutes, single file — just do it)
- NOT for problem exploration (use harness-brainstorming)
- NOT when a plan exists and needs execution (use harness-execution)
Process
Iron Law
Every task in the plan must be completable in one context window (2-5 minutes). If a task is larger, split it.
A plan with vague tasks like "add validation" or "implement the service" is not a plan — it is a wish list. Every task must contain exact file paths, exact commands, and complete code snippets.
Rigor Levels
The `rigorLevel` is passed by autopilot (or set via `--fast`/`--thorough` flags). Default is `standard`.
| Phase | `fast` | `standard` (default) | `thorough` |
|---|---|---|---|
| SCOPE | No change. | No change. | No change. |
| DECOMPOSE | Skip skeleton. Full tasks directly after file map. | Skeleton if tasks >= 8; full tasks if < 8. | Always skeleton. Require approval before expanding. |
| SEQUENCE | No change. | No change. | No change. |
| VALIDATE | No change. | No change. | No change. |
The skeleton pass is the primary rigor lever. Fast mode goes straight to full detail. Thorough mode validates direction before investing tokens in expansion.
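The gating rule in the DECOMPOSE row can be sketched as a small function. This is a minimal illustration, not the harness implementation: the names `RigorLevel`, `SKELETON_TASK_THRESHOLD`, and `shouldProduceSkeleton` are hypothetical, and only the three levels and the threshold of 8 tasks come from the table.

```typescript
// Hypothetical names; the levels and the >= 8 threshold are taken from the Rigor Levels table.
type RigorLevel = "fast" | "standard" | "thorough";

const SKELETON_TASK_THRESHOLD = 8;

// Returns true when a skeleton pass should precede full task expansion.
function shouldProduceSkeleton(rigor: RigorLevel, estimatedTasks: number): boolean {
  switch (rigor) {
    case "fast":
      return false; // full tasks directly after the file map
    case "thorough":
      return true; // always skeleton, approved before expanding
    case "standard":
      return estimatedTasks >= SKELETON_TASK_THRESHOLD;
  }
}
```

For example, a 6-task plan in `standard` mode skips the skeleton, while the same plan in `thorough` mode does not.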
Argument Resolution
When invoked by autopilot (or with explicit arguments), resolve paths before starting:
- Session slug: If `session-slug` argument provided, set `{sessionDir} = .harness/sessions/<session-slug>/`. Pass to `gather_context({ session: "<session-slug>" })`. All handoff writes go to `{sessionDir}/handoff.json`.
- Spec path: If `spec-path` argument provided, read spec from that path. Otherwise, discover from `{sessionDir}/handoff.json` (read upstream brainstorming output) or prompt the user.
- Rigor level: If `fast`/`thorough` argument provided, use it. Otherwise default to `standard`.

When no arguments are provided (standalone invocation), discover spec from context or prompt. Global `.harness/` paths are used as fallback.
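A minimal sketch of this resolution order, assuming the arguments arrive as a plain object. The `PlanningArgs` and `ResolvedArgs` shapes and the function name are hypothetical; the paths and the `standard` default are the ones listed above.

```typescript
type RigorLevel = "fast" | "standard" | "thorough";

// Hypothetical argument shape; the harness may pass these differently.
interface PlanningArgs {
  sessionSlug?: string;
  specPath?: string;
  rigorLevel?: RigorLevel;
}

interface ResolvedArgs {
  sessionDir: string; // session-scoped directory, or the global fallback
  handoffPath: string;
  specPath: string | null; // null: discover from handoff.json or prompt the user
  rigorLevel: RigorLevel;
}

function resolvePlanningArgs(args: PlanningArgs): ResolvedArgs {
  const sessionDir = args.sessionSlug
    ? `.harness/sessions/${args.sessionSlug}/`
    : ".harness/";
  return {
    sessionDir,
    handoffPath: `${sessionDir}handoff.json`,
    specPath: args.specPath !== undefined ? args.specPath : null,
    rigorLevel: args.rigorLevel !== undefined ? args.rigorLevel : "standard",
  };
}
```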
Phase 1: SCOPE — Derive Must-Haves from Goals
Work backward from the goal. Start with "what must be true when we are done?"
1. State the goal. One sentence. What does the system do when this plan is complete?
2. Derive observable truths. What can be observed (running a command, opening a browser, reading a file) that proves the goal is met? Be specific:
   - BAD: "The API handles errors"
   - GOOD: "GET /api/users/nonexistent returns 404 with `{ error: 'User not found' }` body"
3. Derive required artifacts. For each truth, what files must exist? What functions? What tests pass? List exact file paths.
4. Identify key links. How do artifacts connect? What imports what? What calls what?
5. Apply YAGNI. For every artifact: "Is this required for an observable truth?" If not, cut it.
6. Surface uncertainties. Before proceeding to Phase 2, explicitly list what you do NOT know. For each uncertainty, classify it:
   - Blocking: Cannot decompose tasks without resolving this. Escalate to user.
   - Assumption: Can proceed with a stated assumption. Document it. If wrong, specific tasks will need revision.
   - Deferrable: Does not affect task decomposition. Note for execution phase.

   Format:

   ```
   ## Uncertainties
   - [BLOCKING] How should the API handle partial failures? (Spec does not define.)
   - [ASSUMPTION] Database supports transactions. (If not, Task 3 needs redesign.)
   - [DEFERRABLE] Exact error message wording. (Can be finalized during implementation.)
   ```

Read-only constraint: Steps 1-5 above are research and analysis. Do not propose task structure, file organization, or implementation approaches during SCOPE. Record what must be true (observable truths) and what you do not know (uncertainties). Solutions belong in DECOMPOSE.
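The uncertainty classification lends itself to a small data model. The sketch below is illustrative only: the `Uncertainty` shape, the `resolved` flag, and both function names are assumptions, while the three tags and the rule that blocking items stop Phase 2 come from this document.

```typescript
type UncertaintyKind = "BLOCKING" | "ASSUMPTION" | "DEFERRABLE";

// Hypothetical shape for one entry in the uncertainties list.
interface Uncertainty {
  kind: UncertaintyKind;
  question: string;
  note: string; // why it matters, or what breaks if the assumption is wrong
  resolved?: boolean; // assumed flag: set once the user has answered a blocking item
}

// Renders the list in the "## Uncertainties" format shown above.
function formatUncertainties(items: Uncertainty[]): string {
  const lines = items.map((u) => `- [${u.kind}] ${u.question} (${u.note})`);
  return ["## Uncertainties"].concat(lines).join("\n");
}

// Phase 2 may start only if the list is non-empty and no blocking item is open.
function canStartDecompose(items: Uncertainty[]): boolean {
  if (items.length === 0) return false; // zero uncertainties means the step was skipped
  return items.every((u) => u.kind !== "BLOCKING" || u.resolved === true);
}
```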
When scope is ambiguous, use `emit_interaction`:

```
emit_interaction({
  path: "<project-root>",
  type: "question",
  question: {
    text: "The spec mentions X but does not define behavior for Y. Should we:",
    options: [
      {
        label: "A) Include Y in this plan",
        pros: ["Complete feature in one pass", "No follow-up coordination"],
        cons: ["Increases scope and time", "May delay delivery"],
        risk: "medium",
        effort: "high"
      },
      {
        label: "B) Defer Y to a follow-up plan",
        pros: ["Keeps current plan focused", "Ship sooner"],
        cons: ["Y remains unhandled", "May need rework when Y is added"],
        risk: "low",
        effort: "low"
      },
      {
        label: "C) Update the spec first",
        pros: ["Design is complete before planning", "No surprises during execution"],
        cons: ["Blocks planning until spec is updated", "Extra round-trip"],
        risk: "low",
        effort: "medium"
      }
    ],
    recommendation: {
      optionIndex: 1,
      reason: "Keeping the current plan focused reduces risk. Y can be addressed in a follow-up.",
      confidence: "medium"
    }
  }
})
```
EARS Requirement Patterns
Use EARS (Easy Approach to Requirements Syntax) when writing observable truths. These patterns eliminate ambiguity via consistent grammatical structure.
| Pattern | Template | Use When |
|---|---|---|
| Ubiquitous | The system shall [behavior]. | Always applies, unconditionally |
| Event-driven | When [trigger], the system shall [response]. | Triggered by a specific event |
| State-driven | While [state], the system shall [behavior]. | Only during a certain state |
| Optional | Where [feature is enabled], the system shall [behavior]. | Gated by config or feature flag |
| Unwanted | If [condition], then the system shall not [behavior]. | Preventing undesirable behavior |
Worked Examples:
- Ubiquitous: "The system shall return JSON responses with `Content-Type: application/json` header."
- Event-driven: "When a user submits an invalid form, the system shall display field-level error messages within 200ms."
- State-driven: "While the database connection is unavailable, the system shall serve cached responses and log reconnection attempts."
- Optional: "Where rate limiting is enabled, the system shall reject requests exceeding 100/minute per API key with HTTP 429."
- Unwanted: "If the request body exceeds 10MB, then the system shall not attempt to parse it — return HTTP 413 immediately."
Apply EARS for behavioral requirements, not structural checks (e.g., file existence does not need EARS framing).
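The five templates can be expressed as a discriminated union so that each pattern carries exactly the slots it needs. This is a sketch for illustration; the type and function names are hypothetical, and the sentence templates are copied from the table above.

```typescript
// One variant per EARS pattern; field names are illustrative.
type EarsRequirement =
  | { pattern: "ubiquitous"; behavior: string }
  | { pattern: "event-driven"; trigger: string; response: string }
  | { pattern: "state-driven"; state: string; behavior: string }
  | { pattern: "optional"; feature: string; behavior: string }
  | { pattern: "unwanted"; condition: string; behavior: string };

// Fills the matching template from the EARS table.
function renderEars(req: EarsRequirement): string {
  switch (req.pattern) {
    case "ubiquitous":
      return `The system shall ${req.behavior}.`;
    case "event-driven":
      return `When ${req.trigger}, the system shall ${req.response}.`;
    case "state-driven":
      return `While ${req.state}, the system shall ${req.behavior}.`;
    case "optional":
      return `Where ${req.feature}, the system shall ${req.behavior}.`;
    case "unwanted":
      return `If ${req.condition}, then the system shall not ${req.behavior}.`;
  }
}
```

Modeling the patterns this way makes a missing trigger or state a compile-time error rather than a vague requirement.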
Graph-Enhanced Context (when available)
When a knowledge graph exists at `.harness/graph/`, use graph queries for faster context:
- `query_graph`: discover module dependencies for realistic task decomposition
- `get_impact`: estimate which modules a feature touches
Fall back to file-based commands if no graph is available.
Phase 2: DECOMPOSE — Map File Structure and Create Tasks
Report progress:
**[Phase 2/4]** DECOMPOSE — mapping file structure and creating tasks
1. Map the file structure first. List every file to create or modify before writing tasks:

   ```
   CREATE src/services/notification-service.ts
   CREATE src/services/notification-service.test.ts
   MODIFY src/services/index.ts (add export)
   CREATE src/types/notification.ts
   MODIFY src/api/routes/users.ts (add notification trigger)
   ```

2. Skeleton pass (rigor-gated). A lightweight skeleton (~200 tokens) validates direction before full expansion. Gating follows the Rigor Levels table.

   Format: Numbered logical groups with task count and time. No file paths, code, or details.

   ```
   1. Foundation types and interfaces (~3 tasks, ~10 min)
   2. Core scoring module with TDD (~2 tasks, ~8 min)
   3. CLI integration and flag parsing (~4 tasks, ~15 min)

   **Estimated total:** 9 tasks, ~33 minutes
   ```

   Approval gate: Present via `emit_interaction` (type: `confirmation`, text: "Approve skeleton direction?"). If approved, proceed to step 3. If rejected, revise and re-present.

3. Decompose into atomic tasks. Each task must:
   - Be completable in 2-5 minutes and fit in a single context window
   - Have a clear, testable outcome
   - Follow TDD: write test, fail, implement, pass, commit
   - Produce one atomic commit
4. Write complete instructions for each task. Not summaries, but complete executable instructions:
   - Exact file paths to create or modify
   - Exact code to write (not "add validation logic"; write the actual code)
   - Exact test commands (e.g., `npx vitest run src/services/notification-service.test.ts`)
   - Exact commit message
   - `harness validate` as the final step
5. Include checkpoints. Mark tasks requiring human input:
   - `[checkpoint:human-verify]`: Pause, show result, wait for confirmation
   - `[checkpoint:decision]`: Pause, present options, wait for choice
   - `[checkpoint:human-action]`: Pause, instruct human on required action
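The per-task requirements above can be checked mechanically before a plan is presented. The following is a rough sketch under stated assumptions: the `PlanTask` shape and `taskProblems` are hypothetical and not part of the harness, and the check only looks for a `harness validate` step somewhere in the instructions.

```typescript
type Checkpoint = "human-verify" | "decision" | "human-action";

// Hypothetical in-memory shape of one planned task.
interface PlanTask {
  id: number;
  title: string;
  files: string[]; // exact paths to create or modify
  steps: string[]; // complete instructions, in order
  producesCode: boolean;
  testCommand?: string; // expected whenever producesCode is true
  commitMessage: string;
  estimateMinutes: number;
  checkpoint?: Checkpoint;
}

// Returns a list of rule violations; an empty list means the task looks atomic.
function taskProblems(task: PlanTask): string[] {
  const problems: string[] = [];
  if (task.estimateMinutes < 2 || task.estimateMinutes > 5) {
    problems.push("estimate is outside the 2-5 minute window");
  }
  if (task.files.length === 0) problems.push("no exact file paths listed");
  if (task.producesCode && !task.testCommand) {
    problems.push("code-producing task has no test command");
  }
  if (task.commitMessage.trim() === "") problems.push("no commit message");
  const runsValidate = task.steps.some((s) => s.indexOf("harness validate") !== -1);
  if (!runsValidate) problems.push("no harness validate step");
  return problems;
}
```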
Phase 3: SEQUENCE — Order Tasks and Identify Dependencies
- Order by dependency. Types before implementations. Implementations before integrations. Tests alongside implementations (same task, TDD style).
- Identify parallel opportunities. Tasks touching different subsystems with no shared state can be marked parallelizable.
- Number tasks sequentially. Use `Task 1`, `Task 2`, etc. Dependencies reference task numbers.
- Estimate total time. Sum 2-5 minutes per task. If total exceeds available time, identify a milestone boundary for pausing.
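Ordering by dependency is a topological sort. The sketch below uses Kahn's algorithm, which is one reasonable choice rather than anything this skill prescribes; `TaskNode` and `orderTasks` are hypothetical names, and the code assumes every id in `dependsOn` refers to a task in the list (an unknown id is reported the same way as a cycle).

```typescript
interface TaskNode {
  id: number;
  dependsOn: number[]; // task numbers that must finish first
}

// Kahn's algorithm: repeatedly emit tasks whose dependencies are all satisfied.
function orderTasks(tasks: TaskNode[]): number[] {
  const unmet: Record<number, number> = {}; // id -> count of unfinished dependencies
  const dependents: Record<number, number[]> = {}; // id -> tasks waiting on it
  for (const task of tasks) {
    unmet[task.id] = task.dependsOn.length;
    for (const dep of task.dependsOn) {
      (dependents[dep] = dependents[dep] || []).push(task.id);
    }
  }

  const ready = tasks.filter((t) => t.dependsOn.length === 0).map((t) => t.id);
  const ordered: number[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as number;
    ordered.push(id);
    for (const next of dependents[id] || []) {
      unmet[next] -= 1;
      if (unmet[next] === 0) ready.push(next);
    }
  }

  // Anything left over is part of a cycle (or depends on an unknown task).
  if (ordered.length !== tasks.length) {
    throw new Error("Dependency cycle detected: extract a shared type or interface to break it");
  }
  return ordered;
}
```

Tasks that become ready in the same round have no dependency on each other, which makes them candidates for the parallel marking described above.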
Phase 4: VALIDATE — Review and Finalize the Plan
1. Verify completeness. Every observable truth from Phase 1 must trace to specific task(s) that deliver it.
2. Verify task sizing. Could an agent complete each task in one context window without exploring or deciding? If not, split it.
3. Verify TDD compliance. Every code-producing task must include a test step. No "write tests later."
4. Run `harness validate` to verify project health before writing the plan.
5. Check failures log. Read `.harness/failures.md`. If planned approaches match known failures, flag them.
6. Run soundness review. Invoke `harness-soundness-review --mode plan` against the draft. Do not proceed until the review converges with no remaining issues.
7. Write the plan to `docs/plans/`. Naming: `YYYY-MM-DD-<feature-name>-plan.md`. Create directory if needed.
8. Write handoff. Write to the session-scoped path when session slug is known, otherwise fall back to global path:
   - Session-scoped (preferred): `.harness/sessions/<session-slug>/handoff.json`
   - Global (fallback, deprecated): `.harness/handoff.json`

   [DEPRECATED] Writing to `.harness/handoff.json` is deprecated. In autopilot sessions, always use `.harness/sessions/<slug>/handoff.json` to prevent cross-session contamination.

   Fields: `fromSkill`, `phase`, `summary`, `completed`, `pending`, `concerns`, `decisions`, `contextKeywords`.
9. Write session summary (if session is known). Call `writeSessionSummary` with skill, status, plan path, keyContext, nextStep. Skip if no session slug.
10. Request plan sign-off: Use `emit_interaction` (type: `confirmation`) with plan path, task count, and time estimate.
11. Suggest transition to execution. After approval, call `emit_interaction` with type: `transition`, `completedPhase: "planning"`, `suggestedNext: "execution"`, `requiresConfirmation: true`. Include `qualityGate` with checks: plan-written, harness-validate, observable-truths-traced, human-approved. If confirmed: invoke harness-execution. If declined: stop (handoff already written).
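The completeness check in step 1 amounts to confirming that every numbered observable truth is claimed by at least one task. A minimal sketch, assuming each task records which truths it delivers; the `delivers` field and both names are hypothetical, since the plan format above does not define such a field.

```typescript
// Hypothetical: each task lists the 1-based numbers of the observable truths it delivers.
interface TraceableTask {
  id: number;
  delivers: number[];
}

// Returns the numbers of observable truths that no task delivers.
function untracedTruths(truthCount: number, tasks: TraceableTask[]): number[] {
  const covered: Record<number, boolean> = {};
  for (const task of tasks) {
    for (const n of task.delivers) covered[n] = true;
  }
  const missing: number[] = [];
  for (let n = 1; n <= truthCount; n++) {
    if (!covered[n]) missing.push(n);
  }
  return missing;
}
```

A non-empty result means the plan is incomplete: either a task is missing or a truth was never required.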
Plan Document Structure
```markdown
# Plan: <Feature Name>

**Date:** YYYY-MM-DD | **Spec:** (if applicable) | **Tasks:** N | **Time:** N min

## Goal

One sentence.

## Observable Truths (Acceptance Criteria)

1. [observable truth]

## File Map

- CREATE path/to/file.ts
- MODIFY path/to/other-file.ts

## Skeleton (if produced)

1. <group name> (~N tasks, ~N min)

_Skeleton approved: yes/no._

## Tasks

### Task 1: <descriptive name>

**Depends on:** none | **Files:** path/to/file.ts, path/to/file.test.ts

1. Create test file with exact test code
2. Run test, observe failure
3. Create implementation with exact code
4. Run test, observe pass
5. Run: `harness validate`
6. Commit: `feat(scope): descriptive message`

### Task 2: <descriptive name>

[checkpoint:human-verify]

...
```
Session State
| Section | Read | Write | Purpose |
|---|---|---|---|
| terminology | yes | no | Consistent language in plan |
| decisions | yes | yes | Brainstorming decisions; planning-phase decisions |
| constraints | yes | yes | Existing constraints; constraints discovered during decomposition |
| risks | yes | yes | Existing risks; implementation risks from task design |
| openQuestions | yes | yes | Unresolved questions; new questions; resolve answered ones |
| evidence | yes | yes | Prior evidence; file:line citations for task specs |
When to write: Phase 1 — constraints and risks. Phase 2 — decisions about task structure. Phase 4 — resolve questions.
When to read: Start of Phase 1 via `gather_context` with `include: ["sessions"]` to inherit brainstorming context.
Evidence Requirements
When referencing existing code in task specs, cite evidence using `file:line` format, code pattern references, or test output. Write to the `evidence` session section via `manage_state`.
When to cite: Phase 1 (existing files), Phase 2 (file paths and patterns), file map (existing files for modification).
Uncited claims: Prefix with `[UNVERIFIED]`.
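As a small illustration of the citation rule, the helper below appends a `file:line` reference when one is available and otherwise applies the `[UNVERIFIED]` prefix. The `Claim` shape, the function name, and the choice to put the citation in trailing parentheses are all assumptions made for this sketch.

```typescript
// Hypothetical shape: a statement about existing code, optionally backed by a location.
interface Claim {
  text: string;
  file?: string;
  line?: number;
}

// Cited claims get a trailing file:line reference; uncited ones get the [UNVERIFIED] prefix.
function citeClaim(claim: Claim): string {
  if (claim.file !== undefined && claim.line !== undefined) {
    return `${claim.text} (${claim.file}:${claim.line})`;
  }
  return `[UNVERIFIED] ${claim.text}`;
}
```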
Harness Integration
- `harness validate`: Run in Phase 4 (before writing plan) and included in every task.
- `harness check-deps`: Referenced in tasks adding imports or creating modules.
- Plan location: `docs/plans/YYYY-MM-DD-<feature-name>-plan.md`.
- Handoff: Once approved, invoke harness-execution for task-by-task implementation.
- Session directory: Session-scoped writes go to `.harness/sessions/<slug>/`. Structure: `handoff.json`, `state.json`, `artifacts.json` (registry of spec/plan paths and produced file lists). Global `.harness/handoff.json` is deprecated for session-aware invocations.
- `emit_interaction`: Call at end of Phase 4 to suggest transitioning to execution (confirmed transition).
- Rigor levels: `--fast`/`--thorough` control skeleton pass. See Rigor Levels table.
- Two-pass planning: Skeleton (~200 tokens) before full expansion. Catches directional errors early.
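The plan location convention can be sketched as a path builder. The slug rule (lowercase, hyphen-separated) and the use of UTC dates are assumptions of this example, not documented behavior, and `planPath` is a hypothetical name.

```typescript
// Builds docs/plans/YYYY-MM-DD-<feature-name>-plan.md.
// Assumptions: the date is formatted in UTC, and the feature name is slugified
// to lowercase alphanumerics joined by hyphens.
function planPath(featureName: string, date: Date): string {
  const pad = (n: number): string => (n < 10 ? `0${n}` : String(n));
  const stamp = `${date.getUTCFullYear()}-${pad(date.getUTCMonth() + 1)}-${pad(date.getUTCDate())}`;
  const slug = featureName
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `docs/plans/${stamp}-${slug}-plan.md`;
}
```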
Change Specifications
When planning changes to existing functionality (not greenfield), express requirements as deltas:
- [ADDED] — New behavior that does not exist today
- [MODIFIED] — Existing behavior that changes
- [REMOVED] — Existing behavior that goes away
Example:
```markdown
## Changes to User Authentication

- [ADDED] OAuth2 refresh tokens with 7-day expiry
- [MODIFIED] Login endpoint returns `refreshToken` alongside `accessToken`
- [MODIFIED] Token validation accepts both JWT and OAuth2 tokens
- [REMOVED] Legacy API key authentication (deprecated in v2.1)
```
Only apply when modifying existing documented behavior. When `docs/changes/` exists, produce `docs/changes/<feature>/delta.md` alongside the task plan.
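A delta can be held as structured entries and rendered into the format shown in the example. This is only a sketch: `ChangeEntry` and `renderDelta` are hypothetical names, and grouping entries in ADDED, MODIFIED, REMOVED order is an assumption that happens to match the example.

```typescript
type ChangeKind = "ADDED" | "MODIFIED" | "REMOVED";

interface ChangeEntry {
  kind: ChangeKind;
  description: string;
}

// Renders a delta section, grouping entries as ADDED, then MODIFIED, then REMOVED.
function renderDelta(area: string, changes: ChangeEntry[]): string {
  const order: ChangeKind[] = ["ADDED", "MODIFIED", "REMOVED"];
  const lines: string[] = [`## Changes to ${area}`, ""];
  for (const kind of order) {
    for (const change of changes) {
      if (change.kind === kind) lines.push(`- [${kind}] ${change.description}`);
    }
  }
  return lines.join("\n");
}
```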
Success Criteria
- Plan document exists in `docs/plans/` with all required sections
- Every task completable in 2-5 minutes (one context window)
- Every task includes exact file paths, exact code, and exact commands
- Every code-producing task follows TDD: test first, fail, implement, pass
- Observable truths trace to specific tasks
- File map lists every file to create or modify
- Checkpoints marked where human input is required
- `harness validate` passes before plan is written and is in every task
- Human has reviewed and approved the plan
- Rigor level rules followed: fast skips skeleton; thorough always skeletons with approval; standard skeletons at >= 8 tasks
Red Flags
| Flag | Corrective Action |
|---|---|
| "I know the implementation well enough to skip reading the spec" | STOP. Phase 1 SCOPE starts by reading the spec. Assumptions about spec content lead to plans that implement the wrong thing. |
| "This task is self-explanatory, no need for exact file paths and commands" | STOP. Iron Law: every task must contain exact file paths, exact commands, and complete code snippets. "Implement the service" is a wish, not a task. |
| "I'll plan the happy path now and add error handling tasks later" | STOP. Error handling is not optional. The spec's success criteria include error scenarios. Plan them alongside the happy path. |
| Placeholder markers in task descriptions that defer detail to execution | STOP. A task that defers detail to execution is a vague task. If you cannot write the exact steps now, you do not understand the task well enough to plan it. |
Rationalizations to Reject
| Rationalization | Reality |
|---|---|
| "The task is conceptually clear so I do not need to include exact code in the plan" | Every task must have exact file paths, exact code, and exact commands. If you cannot write the code in the plan, you do not understand the task well enough to plan it. |
| "This task touches 5 files but it is logically one unit of work, so splitting it would add overhead" | Tasks touching more than 3 files must be split. The overhead of splitting is far less than the cost of a failed oversized task. |
| "Tests for this task can be added in a follow-up task since the implementation is straightforward" | No skipping TDD in tasks. Every code-producing task must start with writing a test. "Add tests later" is explicitly forbidden. |
| "The spec does not cover this edge case, but I can fill in the gap during planning" | When the spec is missing information, do not fill in the gaps yourself. Escalate. Filling gaps silently creates undocumented design decisions that no one reviewed. |
| "I discovered we need an additional file during decomposition, but updating the file map is just bookkeeping" | The file map must be complete. Every file that will be created or modified must appear in the file map before task decomposition. |
| "There are no real uncertainties — the spec is clear enough" | Every plan has unknowns. If you listed zero uncertainties, you skipped the step. Re-read the spec and list what is assumed but not stated. |
| "I already know how to structure this, no need to finish scoping" | Premature decomposition anchors on the first approach found. Complete SCOPE (observable truths + uncertainties) before proposing any task structure. |
| "The skeleton pass adds overhead for a plan this size — I will go straight to full tasks" | Rigor level rules are not optional. In thorough mode, the skeleton is always required. In standard mode, 8+ tasks require a skeleton. Skipping it risks task-level misalignment with the goal. |
| "I will write implementation code in the plan to make the tasks more concrete" | Planning produces a plan document, not code. Writing code during planning violates the phase boundary — code belongs in execution. Exact snippets in task descriptions are plan content, not executed code. |
Examples
Example: Planning a User Notification Feature
Goal: Users receive email and in-app notifications when their account is modified.
Observable Truths:
- `POST /api/users/:id` with changed fields triggers a notification record in the database
- `GET /api/notifications?userId=:id` returns notification with type, message, timestamp
- Notification email sent via existing email utility (verified by mock in test)
- `npx vitest run src/services/notification-service.test.ts` passes with 8+ tests
- `harness validate` passes
File Map:
```
CREATE src/types/notification.ts
CREATE src/services/notification-service.ts
CREATE src/services/notification-service.test.ts
MODIFY src/services/index.ts
MODIFY src/api/routes/users.ts
MODIFY src/api/routes/users.test.ts
```
Skeleton: Not produced — task count (6) below threshold (8).
Task 1: Define notification types
```
Files: src/types/notification.ts

1. Create src/types/notification.ts:

   export interface Notification {
     id: string;
     userId: string;
     type: 'account_modified';
     message: string;
     read: boolean;
     createdAt: Date;
     expiresAt: Date;
   }

2. Run: harness validate
3. Commit: "feat(notifications): define Notification type"
```
Task 2 (TDD): Write test for NotificationService.create(). Observe failure. Implement. Observe pass. Validate. Commit.
Task 3 (TDD, `[checkpoint:human-verify]`): Write tests for list() and isExpired(). Observe failures. Implement. Observe pass. Validate + check-deps. Commit.
Example: Skeleton (thorough mode)
Goal: Add rate limiting to all API endpoints.
Skeleton: 1) Rate limit types (~2 tasks, ~7 min) 2) Middleware with Redis (~3 tasks, ~12 min) 3) Route integration (~4 tasks, ~15 min) 4) Integration tests (~3 tasks, ~10 min). Total: 12 tasks, ~44 min. Presented for approval. Approved. Expanded to full tasks.
Gates
- No vague tasks. Every task must have exact file paths, exact code, and exact commands. If you cannot write the code, you do not understand the task well enough.
- No tasks larger than one context window. If a task requires exploring, deciding, or touching more than 3 files, split it.
- No skipping TDD. Every code-producing task starts with a test. "Add tests later" is not allowed.
- No plan without observable truths. Must start with goal-backward acceptance criteria.
- No implementation during planning. Write the plan, get approval, then use harness-execution.
- File map must be complete. Every file to create or modify must appear before task decomposition.
- Uncertainties must be surfaced. Phase 1 must produce an uncertainties list. Zero uncertainties means the step was skipped. Blocking uncertainties must be resolved before Phase 2.
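Two of these gates, file map completeness and the 3-file limit per task, are easy to check once the plan is held as data. The sketch below is illustrative only; the `FileMapEntry` and `TaskFiles` shapes and the function name are hypothetical.

```typescript
interface FileMapEntry {
  action: "CREATE" | "MODIFY";
  path: string;
}

// Hypothetical: the files each task touches, as listed in its "Files:" line.
interface TaskFiles {
  id: number;
  files: string[];
}

// Flags tasks that touch more than 3 files or reference files absent from the file map.
function fileMapProblems(fileMap: FileMapEntry[], tasks: TaskFiles[]): string[] {
  const mapped: Record<string, boolean> = {};
  for (const entry of fileMap) mapped[entry.path] = true;

  const problems: string[] = [];
  for (const task of tasks) {
    if (task.files.length > 3) {
      problems.push(`Task ${task.id} touches ${task.files.length} files; split it`);
    }
    for (const file of task.files) {
      if (!mapped[file]) {
        problems.push(`Task ${task.id} uses ${file}, which is missing from the file map`);
      }
    }
  }
  return problems;
}
```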
Escalation
- Cannot write exact code for a task: Design is underspecified. Return to spec or brainstorm. Do not write vague placeholders.
- Task count exceeds 20: Consider splitting into multiple plans with milestone boundaries.
- Dependencies form a cycle: Re-examine file map. Break the cycle by extracting a shared type or interface.
- Spec is missing information: Do not fill gaps yourself. Escalate: "The spec does not define behavior for [scenario]. This blocks Task N."
- Estimated time exceeds available time: Identify a milestone boundary for pausing. Propose delivering in phases, each producing a usable increment.