Crucible build
Use when starting any feature development, building new functionality, implementing a design, or going from idea to working code. Triggers on "build", "implement", "add feature", or any task requiring design-through-execution.
git clone https://github.com/raddue/crucible
T=$(mktemp -d) && git clone --depth=1 https://github.com/raddue/crucible "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/build" ~/.claude/skills/raddue-crucible-build && rm -rf "$T"
skills/build/SKILL.md

Build
Overview
<!-- CANONICAL: shared/dispatch-convention.md -->
All subagent dispatches use disk-mediated dispatch. See shared/dispatch-convention.md for the full protocol.
End-to-end development pipeline: interactive design, autonomous planning with adversarial review, team-based execution with per-task code and test review. One command, idea to completion.
Announce at start: "I'm using the build skill to run the full development pipeline."
Session index event: At startup, if session indexing is active (session index path discoverable via glob), emit a skill_start event to the outbox: {"ts":"<now>","seq":0,"type":"skill_start","summary":"Starting /build for <user goal>","detail":{"skill":"build","goal":"<user goal>"}}. See skills/shared/session-index-convention.md for the outbox pattern.
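Under the session-index convention, the emission could be sketched as follows. This is a hypothetical illustration: the `OUTBOX` path and goal string are stand-ins, not part of the convention.

```shell
# Hypothetical sketch: append a skill_start event to the session-index outbox.
# OUTBOX and GOAL are illustrative stand-ins; the real outbox path comes from
# the session-index convention doc.
OUTBOX="${OUTBOX:-/tmp/session-index-outbox.jsonl}"
GOAL="add rate limiting to the API"
TS=$(date -u +%Y-%m-%dT%H:%M:%S)
printf '{"ts":"%s","seq":0,"type":"skill_start","summary":"Starting /build for %s","detail":{"skill":"build","goal":"%s"}}\n' \
  "$TS" "$GOAL" "$GOAL" >> "$OUTBOX"
```

Appending (rather than overwriting) keeps earlier events in the outbox intact.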
Guiding principle: Quality over velocity. This pipeline produces correct, well-integrated, maintainable output — even if slower. Parallel execution is available for independent work, but sequential with quality gates is the default.
<!-- Trust framework: see [skills/getting-started/trust-hierarchy.md](../getting-started/trust-hierarchy.md). -->
Communication Requirement (Non-Negotiable)
Between every agent dispatch and every agent completion, output a status update to the user. This is NOT optional — the user cannot see agent activity without your narration.
Every status update must include:
- Current phase — Which pipeline phase you're in
- What just completed — What the last agent reported
- What's being dispatched next — What you're about to do and why
- Task checklist — Current status of all tasks (pending/in-progress/complete)
After compaction: If you just experienced context compaction, re-read the task list from disk and output current status before continuing. Do NOT proceed silently.
Examples of GOOD narration:
"Phase 3, Task 4 complete. Reviewer found 2 Important issues — dispatching implementer to fix. Tasks: [1] ✓ [2] ✓ [3] ✓ [4] fixing [5-8] pending"
"Phase 2 complete. Plan passed review with 0 issues on round 2. Dispatching innovate on the plan."
This requirement exists because: Long-running autonomous pipelines can run for hours. Without narration, the user sees nothing but a spinner. They can't assess progress, can't decide whether to intervene, and can't learn from the pipeline's decisions.
Pipeline Discipline (Non-Negotiable)
NEVER skip quality gate steps. Every artifact must pass its quality gate before proceeding to the next phase. No exceptions, no shortcuts.
BLOCK semantics: Phase transitions are gated. You CANNOT proceed from Phase 1→2, 2→3, or 3→4 without the gate for that phase passing. If a gate fails, fix the issues and re-run the gate. Do not silently skip a gate because "it looks fine" or "we already reviewed it."
If you find yourself about to skip a gate: STOP. Re-read this section. The gate exists because skipping it has caused real production incidents and hours of wasted time. Run the gate.
Anti-Rationalization Table — build
| Rationalization | Rebuttal | Rule |
|---|---|---|
| "This task is small/simple/trivial, the quality gate would just find nits." | Small changes have the same bug density per line as large ones. QG has never run on a Crucible artifact without finding at least one real issue. | Run the quality gate on every phase artifact, regardless of size. |
| "Phase N looks fine, I can skip the gate and move on." | Self-assessment of artifact quality is exactly the bias the gate exists to counter. "Looks fine" is the failure mode, not a pass criterion. | Phase transitions are BLOCKED without a verified PASS verdict marker for the prior phase. |
| "The fix agent addressed the findings, so the gate is done." | Fixing is not passing. Fix rounds routinely introduce new issues or incompletely resolve old ones. A clean verification round is required. | The gate is only complete after a fresh red-team round returns 0 Fatal, 0 Significant. |
| "The user said 'looks good' / 'move on' — that's approval to skip the gate." | General feedback is not skip approval. Only an unambiguous instruction that explicitly references the gate counts. | Require literal (or equivalent explicit phrase) before recording . |
| "I can fix this one finding myself instead of dispatching a fix agent." | Orchestrator-applied fixes conflate coordination with remediation and bypass the fix journal. Every fix — even trivial — goes through a fix agent. | Orchestrator never edits the artifact directly; always dispatch the fix agent. |
| "Innovate/red-team seem redundant on top of the quality gate, I'll skip them." | They are not redundant. Innovate is divergent; red-team is adversarial; QG is iterative remediation. Skipping any one of them is a documented regression (). | Run innovate and red-team on every artifact, every time. |
| "I'll just finish the task list and narrate at the end." | Long-running autonomous pipelines are invisible without narration. Silent runs prevent the user from intervening or learning. | Narrate before every dispatch and after every completion — non-negotiable. |
Gate Ledger Protocol
Tamper-evident audit trail for phase transitions and gate verdicts. This is defense-in-depth — it raises the cost of gate-skipping from zero to nonzero by requiring structured state to be maintained and verified. An external enforcement hook (gate-ledger-guard.sh) provides mechanical enforcement by blocking unauthorized PASS writes.
File location: ~/.claude/projects/<project-hash>/memory/build-gate-ledger.md
Relationship to pipeline-status.md: pipeline-status.md is ambient user awareness (overwritten at every narration point). build-gate-ledger.md is the gate verdict audit trail (updated per phase as gates pass). Both are needed; neither replaces the other.
PipelineID Generation
At pipeline start, generate a PipelineID via date -u +build-%Y%m%d-%H%M%S. This ID:
- Is persisted in the ledger header
- Is passed to quality-gate invocations as pipeline_id
- Is used by the enforcement hook to cross-check verdict markers
- Is unique per build run (timestamp-based)
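A quick sketch of the generation step, kept in a shell variable for later cross-checks:

```shell
# Generate the PipelineID once at pipeline start (timestamp-based, so unique
# per run) and keep it for quality-gate invocations and marker cross-checks.
PIPELINE_ID=$(date -u +build-%Y%m%d-%H%M%S)
echo "$PIPELINE_ID"
```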
Ledger Format
```
# Build Gate Ledger
Run: <ISO-8601 timestamp>
PipelineID: <build-YYYYMMDD-HHMMSS>
Goal: <user request>
Mode: <feature | refactor>

## Phase 1: Design
Status: NOT_STARTED

## Phase 2: Plan
Status: NOT_STARTED

## Phase 3: Execute
Status: NOT_STARTED

## Phase 4: Completion
Status: NOT_STARTED
```
Format constraints:
- One key-value pair per line: Key: value
- Fixed key names: Status, Gate, Artifact, Tasks, Reason, Acknowledged, PipelineID
- Status values: NOT_STARTED, IN_PROGRESS, PASS, COMPLETE, FAIL, SKIPPED, INFERRED
- Phase headers are ## Phase N: Name — always 4 phases, always in order
- No prose, no paragraphs, no nested structure
Ledger Initialization
Runs during build startup, after mode detection but before Phase 1 begins:
- Check for existing ledger at canonical path
- If found: run Run Isolation checks (see below)
- If not found (or user chose "start fresh"): write new ledger including Run, PipelineID, Goal, and Mode header fields, then all four phases with Status: NOT_STARTED
- The ledger MUST exist before Phase 1 transitions to IN_PROGRESS
After writing any ledger (fresh or reconstructed), immediately re-read the ledger header to extract the PipelineID into the active in-memory state. This is a defensive consistency practice — ensures the in-memory value always matches the persisted value.
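As a shape-only illustration of the write-then-re-read sequence (the real pipeline must write the ledger with the Write tool, not Bash, and the canonical path replaces the temp file used here; the goal string is a placeholder):

```shell
# Shape-only sketch of ledger initialization plus the defensive header re-read.
# A temp file stands in for the canonical ~/.claude/... path, where Bash is
# disallowed and the Write/Read tools must be used instead.
LEDGER=$(mktemp)
PIPELINE_ID=$(date -u +build-%Y%m%d-%H%M%S)
cat > "$LEDGER" <<EOF
# Build Gate Ledger
Run: $(date -u +%Y-%m-%dT%H:%M:%S)
PipelineID: $PIPELINE_ID
Goal: add rate limiting to the API
Mode: feature

## Phase 1: Design
Status: NOT_STARTED

## Phase 2: Plan
Status: NOT_STARTED

## Phase 3: Execute
Status: NOT_STARTED

## Phase 4: Completion
Status: NOT_STARTED
EOF
# Re-read the persisted header so in-memory state matches the file.
ACTIVE_ID=$(sed -n 's/^PipelineID: //p' "$LEDGER")
```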
Run Isolation
Stale detection prevents cross-run contamination:
- Compaction recovery (same run): If pipeline-status.md's Started timestamp matches the ledger's Run timestamp, this is the same build run recovering from compaction. Auto-resume without prompting.
- New session with existing ledger: If the ledger exists but pipeline-status.md is missing or its Started timestamp doesn't match the ledger's Run, prompt: "Found existing ledger for '[goal]' (started [timestamp], Phase N [status]). Resume this run? [y/n]". On "no", archive the old ledger via Bash mv to build-gate-ledger-<old-timestamp>.md. If the target filename already exists, append a counter suffix (-2, -3, etc.).
- No existing ledger: Create fresh.
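The archival rename might be sketched like this; `archive_ledger` is a hypothetical helper, not part of the skill:

```shell
# Hypothetical helper: archive an old ledger, appending -2, -3, ... when the
# target name is already taken.
archive_ledger() {
  src=$1; stamp=$2
  dir=$(dirname "$src")
  dest="$dir/build-gate-ledger-$stamp.md"
  n=2
  while [ -e "$dest" ]; do
    dest="$dir/build-gate-ledger-$stamp-$n.md"
    n=$((n + 1))
  done
  mv "$src" "$dest"
}
```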
Orphan Cleanup
Requires: Active PipelineID established (from Ledger Initialization + Run Isolation). This step runs AFTER the resume/fresh decision is resolved.
Scan ~/.claude/projects/<project-hash>/memory/quality-gate/gate-verdict-*.md for verdict markers. Delete any whose PipelineID does not match the active PipelineID. If resuming: use the resumed build's PipelineID (from the existing ledger). If starting fresh: use the newly generated PipelineID.
Note: If the session is recovering via INFERRED reconstruction (new PipelineID generated), markers from the old run will be cleaned up. This is intentional — the design requires a fresh QG run for INFERRED→PASS upgrade, not reuse of old markers.
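A sketch of the cleanup, assuming each marker file carries a `PipelineID: <id>` line (consistent with the verification rules below); `cleanup_orphans` is a hypothetical helper:

```shell
# Hypothetical helper: delete verdict markers whose PipelineID does not match
# the active build's PipelineID.
cleanup_orphans() {
  dir=$1; active=$2
  for marker in "$dir"/gate-verdict-*.md; do
    [ -e "$marker" ] || continue          # glob matched nothing
    marker_id=$(sed -n 's/^PipelineID: //p' "$marker")
    [ "$marker_id" = "$active" ] || rm "$marker"   # orphan from an old run
  done
}
```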
Timestamps and File Operations
- Timestamps: Obtained via Bash date -u +%Y-%m-%dT%H:%M:%S (Bash is allowed for date commands that don't reference .claude/ paths).
- Ledger archival (rename): Uses Bash mv since Write/Read/Edit/Glob have no rename capability.
- All other ledger operations (create, read, update): MUST use Write and Read tools, NOT Bash. This is a hard constraint due to .claude/ path restrictions.
Enforcement Rules
Before each phase transition, read build-gate-ledger.md and check the previous phase's status:
- Gate check: If the previous phase's Status is NOT in {PASS, COMPLETE (Phase 3 only), SKIPPED with Acknowledged: true}, output:
  PHASE GATE BLOCKED: Cannot start Phase N — Phase N-1 gate has not passed.
  Current state: [status]
  Run the quality gate on Phase N-1's artifact before proceeding.
  This means INFERRED, IN_PROGRESS, FAIL, and NOT_STARTED all trigger BLOCKED.
- Phase 1 exception: Phase 1 (Design) has no predecessor gate — it always starts.
- Phase 3 exception: Phase 3 transitions to COMPLETE (not PASS) when all tasks are done and per-task code reviews pass. COMPLETE satisfies the gate requirement for Phase 4. No verdict marker is required for Phase 3.
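The ledger check could be sketched as follows; `phase_status` and `gate_open` are hypothetical helpers, and the SKIPPED-with-acknowledgment case is omitted for brevity:

```shell
# Hypothetical helper: print the Status value recorded under "## Phase N:".
phase_status() {
  awk -v hdr="## Phase $2:" '
    index($0, hdr) == 1 { found = 1; next }
    found && /^Status: / { print $2; exit }
  ' "$1"
}

# Gate check for entering phase N: the previous phase must be PASS, or
# COMPLETE when the previous phase is Phase 3. (SKIPPED + Acknowledged and the
# Phase 1 no-predecessor exception are omitted in this sketch.)
gate_open() {
  prev=$(( $2 - 1 ))
  s=$(phase_status "$1" "$prev")
  [ "$s" = "PASS" ] || { [ "$prev" -eq 3 ] && [ "$s" = "COMPLETE" ]; }
}
```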
Verdict Marker Verification
After quality-gate returns with a verdict, verify the verdict marker before writing to the ledger:
- Glob for verdict markers: ~/.claude/projects/<project-hash>/memory/quality-gate/gate-verdict-*.md
- Filter by PipelineID match — only markers with the current build's PipelineID
- Sort by the Timestamp field value inside the marker file (parsed as ISO-8601), take the most recent
- Verify: marker exists, Verdict is PASS, PipelineID matches current build's PipelineID
- If verification passes: write PASS to the ledger with Gate timestamp and Artifact path
- If verification fails:
- Normal flow (marker missing/mismatched after a just-run gate): do NOT write PASS. Output warning and re-invoke quality-gate on the same artifact.
- INFERRED recovery (PipelineID mismatch or missing marker on an INFERRED phase): prompt the user for the artifact path, then offer to run the gate or type SKIP GATE.
- After writing the ledger entry, delete the verdict marker (it has served its purpose). This applies to all verdict outcomes — PASS, FAIL, STAGNATION, and ESCALATED markers are all deleted after the corresponding ledger entry is written. [PLAN ADDITION — extends the design doc's PASS-only deletion to all verdict outcomes for cleanliness.]
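The glob/filter/sort steps might look like this sketch; `latest_marker` is a hypothetical helper, and it assumes `PipelineID:` and `Timestamp:` lines inside each marker (ISO-8601 timestamps compare correctly as plain strings):

```shell
# Hypothetical helper: among markers for the active PipelineID, print the one
# whose Timestamp field is most recent. ISO-8601 sorts lexically, so plain
# string comparison suffices.
latest_marker() {
  dir=$1; active=$2
  best=""; best_ts=""
  for m in "$dir"/gate-verdict-*.md; do
    [ -e "$m" ] || continue
    [ "$(sed -n 's/^PipelineID: //p' "$m")" = "$active" ] || continue
    ts=$(sed -n 's/^Timestamp: //p' "$m")
    if [ -z "$best" ] || [[ "$ts" > "$best_ts" ]]; then
      best=$m; best_ts=$ts
    fi
  done
  printf '%s\n' "$best"
}
```

An empty result means no marker for the active PipelineID exists, which is the "verification fails" branch above.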
Skip Escape Hatch
If the user explicitly wants to bypass a gate:
Example of a SKIPPED phase in the ledger:
```
## Phase 2: Plan
Status: SKIPPED
Gate: 2026-04-13T15:00:00
Reason: User requested skip
Acknowledged: true
```
Confirmation protocol: [Default: option (a) — separate-turn required, matching the design doc's two-step flow. User may override to option (b) before implementation.]
1. The orchestrator outputs: "Gate skip requested. Type SKIP GATE to confirm. This will be logged."
2. The orchestrator halts execution and waits. The user's NEXT message must contain exactly SKIP GATE. A SKIP GATE token in the same message as the skip request does NOT satisfy the confirmation requirement.
3. The orchestrator writes Status: SKIPPED with a Reason field to the ledger.
Per-phase acknowledgment: SKIPPED requires one acknowledgment per phase, not per boundary. Before starting Phase N, the orchestrator checks all prior phases. Any prior phase with Status: SKIPPED that has not yet been Acknowledged: true triggers the BLOCKED message. The user types SKIP GATE once per skipped phase, and the ledger records Acknowledged: true. Subsequent boundaries do not re-prompt for already-acknowledged skips.
Missing artifact handling: If a phase was SKIPPED because no artifact was produced, retroactive gating requires the user to supply the artifact path: "To run the gate on Phase N, provide the artifact path." If no artifact exists, retroactive gating is not possible — the phase remains SKIPPED.
Recovery from SKIPPED: If the user later wants to properly gate a skipped phase, they can ask to "run the gate on Phase N." The orchestrator transitions SKIPPED → IN_PROGRESS, runs the quality gate on the phase's artifact, and writes the result normally.
Phase 4 completion warning: If ANY prior phase has Status: SKIPPED, Phase 4 outputs a prominent warning listing all skipped gates before presenting finish options.
State Machine
```
Phase 1: Design
  NOT_STARTED → IN_PROGRESS  (design skill starts)
  IN_PROGRESS → PASS         (quality gate verdict marker verified)
  IN_PROGRESS → FAIL         (quality gate escalates — stagnation/regression)
  FAIL → IN_PROGRESS         (user directs re-work)
  * → SKIPPED                (user types SKIP GATE — does NOT unlock next phase without acknowledgment)
  SKIPPED → IN_PROGRESS      (user asks to run the gate retroactively)
  INFERRED → IN_PROGRESS     (user runs gate after compaction recovery)
  INFERRED → SKIPPED         (user types SKIP GATE after compaction recovery)

Phase 2: Plan
  NOT_STARTED → IN_PROGRESS  (requires Phase 1 Status = PASS or SKIPPED+Acknowledged)
  [same transitions as Phase 1]

Phase 3: Execute (no quality gate — uses COMPLETE instead of PASS)
  NOT_STARTED → IN_PROGRESS  (requires Phase 2 Status = PASS or SKIPPED+Acknowledged)
  IN_PROGRESS → COMPLETE     (all tasks done, per-task reviews passed, verification gates green)
  IN_PROGRESS → FAIL         (task failures, user escalation)
  FAIL → IN_PROGRESS         (user directs re-work)
  * → SKIPPED                (user types SKIP GATE)
  SKIPPED → IN_PROGRESS      (user asks to run retroactively)
  Note: Phase 3 has no QG invocation. COMPLETE satisfies Phase 4's gate requirement.

Phase 4: Completion
  NOT_STARTED → IN_PROGRESS  (requires Phase 3 Status = COMPLETE or SKIPPED+Acknowledged. PASS is unreachable for Phase 3.)
  IN_PROGRESS → PASS         (quality gate verdict marker verified)
  IN_PROGRESS → FAIL         (quality gate escalates)
  FAIL → IN_PROGRESS         (user directs re-work)
  * → SKIPPED                (user types SKIP GATE)
  SKIPPED → IN_PROGRESS      (user asks to run retroactively)
  IN_PROGRESS includes: emit skip warnings if any prior phase SKIPPED
```
Compaction Recovery (Ledger)
build-gate-ledger.md is on disk and survives compaction. Recovery precedence when state is partial:
- Ledger exists, handoff manifest missing: Use ledger to determine which phase to resume from. Prompt: "Gate ledger shows Phase N passed, but the phase handoff context was lost. Confirm resume from Phase N+1?" If PASS but no handoff, also prompt for Phase N inputs (design doc path, plan path, etc.) before proceeding.
- Handoff manifest exists, ledger missing: Reconstruct ledger from manifests. Mark the current phase as INFERRED (not PASS). Mark predecessor phases as PASS (handoff existence proves the boundary was crossed). Generate a new PipelineID and write it to the reconstructed ledger header. After writing, re-read the ledger header to extract the PipelineID into active state. INFERRED phases trigger the gate-blocked check — the orchestrator must run a fresh quality gate (with matching PipelineID) or the user must type SKIP GATE.
- Both missing: Fresh start. Prompt user.
Quality Gate Requirement (Non-Negotiable)
Every quality gate in this pipeline MUST run to completion. This is NOT optional — you may NOT self-assess whether a quality gate is "needed" based on task size, complexity, or scope.
Quality gates are unconditional at all three gate points:
- Phase 1, Step 2 — Design doc gate
- Phase 2, Step 3 — Plan gate
- Phase 4, Step 6 — Implementation gate
Common rationalizations that are NEVER valid reasons to skip:
- "This is a small change"
- "This is trivial / simple / straightforward"
- "This is just a config change / documentation update / one-liner"
- "The quality gate won't find anything on something this simple"
- "I fixed the findings, so the gate is done" — fixing findings is NOT the same as passing the gate. The iteration loop must complete with a clean verification round (0 Fatal, 0 Significant on a fresh review). Fix agents introduce new issues or incompletely resolve old ones — that is why fresh-eyes re-review exists.
This requirement exists because: Quality gates consistently find issues the pipeline misses regardless of task size. There is no category of task that is immune. In observed runs, tasks self-assessed as "trivial" had the same defect rate as complex tasks. The only way to skip a quality gate is with explicit user approval — an unambiguous instruction specifically referencing the gate, not general feedback like "looks good" or "move on."
Pipeline Status
Write a status file to ~/.claude/projects/<hash>/memory/pipeline-status.md at every narration point. This file is overwritten (not appended) and provides ambient awareness for the user in a second terminal.
Write Triggers
Write the status file at every point where the Communication Requirement mandates narration: before dispatch, after completion, phase transitions, health changes, escalations, and after compaction recovery.
Status File Format
The status file uses this structure (overwritten in full each time):
```
# Pipeline Status

**Updated:** <current timestamp>
**Started:** <timestamp from first write — persisted across compaction>
**Skill:** build
**Phase:** <current phase, e.g. "3 — Execute (Autonomous)">
**Health:** <GREEN|YELLOW|RED>
**Suggested Action:** <omit when GREEN; concrete one-sentence action when YELLOW/RED>
**Elapsed:** <computed from Started>

## Recent Events
- [HH:MM] <most recent event>
- [HH:MM] <previous event>
(last 5 events, newest first)
```
Skill-Specific Body
Append after the shared header:
```
## Task Progress
| # | Task | Tier | Status | Duration |
|---|------|------|--------|----------|
| 1 | Auth middleware | T3 | DONE | 12m |
| 2 | Route handlers | T2 | IN REVIEW (code, pass 1) | 18m+ |
| 3 | Database layer | T1 | PENDING | — |

## Quality Gates
- Design: PASSED (2 rounds)
- Plan: PASSED (1 round)
- Task tiers: 1x T1, 1x T2, 1x T3
- Code: not yet reached

## Checkpoints
- Last checkpoint: pre-wave-3 (12:45:30)
- Total checkpoints: 7
- Shadow repo: healthy

## Compression State
Goal: [original user request]
Key Decisions:
- [accumulated decisions, max 10]
Active Constraints:
- [constraints affecting remaining work]
Next Steps:
1. [immediate next action]
2. [subsequent actions]
```
The Compression State section is a semantic subset of the full Compression State Block emitted into the conversation. It omits Files Modified (recoverable from git) and Scratch State (fixed per skill). It is the first section read during compaction recovery.
Health State Machine
Health transitions are one-directional within a phase: GREEN -> YELLOW -> RED. Phase boundaries reset to GREEN.
- Phase boundaries (reset to GREEN): Phase 1->2, 2->3, 3->4
- YELLOW: review loop round 3+, quality gate round 5+, retry in progress
- RED: escalation pending, stagnation detected, test suite failure unresolved
When health is YELLOW or RED, include **Suggested Action:** with a concrete, context-specific sentence (e.g., "Code review looping on Task 4. Check recent events for recurring patterns.").
Inline CLI Format
Output concise inline status alongside the status file write:
- Minor transitions (dispatch, completion): one-liner, e.g. Phase 3 [4/8] Task 4 IN REVIEW (pass 1) | GREEN | 1h 12m
- Phase changes and escalations: expanded block with --- separators
- Health transitions: always expanded with old -> new health
Compaction Recovery
After compaction, before re-writing the status file:
0. Read the ## Compression State section from pipeline-status.md — recover Goal, Key Decisions, Active Constraints, and Next Steps. If the section is absent (pre-update pipeline), skip to step 1.
<!-- TRUST: dispatch manifest is L2 — produced by prior pipeline stage; prefer most recent if conflicting. -->
0.5. Check for handoff manifests (handoff-*-to-*.md) in the scratch directory. If the most recent manifest exists, use its Inputs, Decisions, and Constraints to reconstruct state for the current phase — this supersedes the Compression State section for phase-boundary recovery. If no manifest exists, continue with CSB-based recovery.
1. Read the rest of pipeline-status.md to recover the Started timestamp and the Recent Events buffer
2. Reconstruct phase, health, and skill-specific body from internal state files
3. If crucible:checkpoint was used: verify checkpoint availability by checking for the shadow repo at the computed path. Log available checkpoint count. Do not restore — just confirm checkpoints are recoverable.
4. Emit a Compression State Block into the conversation to seed the new context window with recovered state
4.5. Read session index summary (supplementary): If the CSB Scratch State contains a Session Index: path, or if globbing ~/.claude/projects/<hash>/memory/session-index/*/summary.md finds a recent file, read summary.md. Include the Activity Timeline, Files Modified, and Key Decisions sections in the post-compaction narration. If no session index exists, skip silently — this step is purely additive. If summary.md lacks detail for a specific event type (e.g., errors, decisions, file changes), use /recall to query events.jsonl with filters for targeted recovery.
5. Write the updated status file
6. Output inline status to CLI
Compression State Block
At checkpoint boundaries (see Checkpoint Timing below), emit the following structured block into the conversation. This block signals to the auto-compactor which state is critical to preserve. Also persist the semantic subset (Goal, Key Decisions, Active Constraints, Next Steps) to the ## Compression State section of pipeline-status.md.
```
===COMPRESSION_STATE===
Goal: [original user request, one sentence]
Skill: [skill name]
Phase: [current phase identifier]
Health: [GREEN|YELLOW|RED]
Mode: [skill-specific mode if applicable, omit otherwise]
Progress:
- [completed milestone 1]
- [completed milestone 2]
- [current work in progress]
Key Decisions (this session):
- [DEC-1] [decision]: [reasoning, one line]
- [DEC-2] [decision]: [reasoning, one line]
Active Constraints:
- [constraint that affects remaining work]
- [constraint from prior phase that still applies]
Files Modified:
- [file path]: [what changed, one line]
Scratch State:
- Location: [scratch directory path]
- Session Index: [~/.claude/projects/<hash>/memory/session-index/<session-id>/ if active, omit if not]
- Recovery: [which files to read first, in order]
Next Steps:
1. [immediate next action]
2. [action after that]
3. [remaining work summary]
===END_COMPRESSION_STATE===
```
Rules:
- Key Decisions list is capped at 10. When adding an 11th, compress the oldest low-impact decision into a single-line Progress entry annotated "[compressed from decisions]".
- Each Compression State Block includes the FULL accumulated decision list, not just new decisions since the last block. Decisions accumulate across compressions.
- Progress entries are cumulative — include all completed milestones, not just since the last block.
- Files Modified lists only files changed since the last block emission. On first block of a session, list all files changed so far.
- Goal must be the original user request verbatim or a faithful one-sentence paraphrase. Do not let it drift across compressions.
Checkpoint Timing
Emit a Compression State Block into the conversation AND update the ## Compression State section in pipeline-status.md at these points:
- Phase transitions: 1→2, 2→3, 3→4 — emit a Phase Handoff Manifest (see below) instead of a Compression State Block at these points
- Phase 3 progress: After every 3 task completions
- Quality gate entry/exit: Before first quality gate round dispatch and after gate completes (pass or escalation)
- Escalations: Before any escalation to user
- Health transitions: On any GREEN->YELLOW or YELLOW->RED transition
These triggers are a superset of the existing pipeline-status.md write triggers. The Compression State Block is emitted alongside (not instead of) the normal narration and status file write.
Phase Handoff Manifest
At phase boundaries (1→2, 2→3, 3→4), write a handoff manifest to the scratch directory instead of emitting a Compression State Block. The manifest defines exactly what the next phase needs — an allowlist, not a denylist. Everything not on the manifest is shed.
Format:
```
# Phase Handoff: N → M

**Timestamp:** ISO-8601
**Goal:** [original user request, verbatim]
**Mode:** feature | refactor

## Inputs for Phase M
- **[Input name]:** [disk path or inline value]

## Decisions Carried Forward
- [DEC-N] [decision]: [reasoning, one line]

## Active Constraints
- [constraint affecting remaining work]

## Shed Receipt
- [what was shed] → [where it lives on disk]
```
Rules:
- After writing the manifest, emit an explicit shed statement: list what context is no longer needed, where it lives on disk, and that the orchestrator operates from manifest inputs only going forward.
- After writing the manifest, update the ## Compression State section in pipeline-status.md with the manifest contents (Goal, Decisions, Constraints, and the Inputs as Next Steps). This ensures compaction recovery can reconstruct state even if the manifest is lost.
- CSBs continue at all non-boundary checkpoint triggers (intra-phase progress, quality gate entry/exit, escalations, health transitions).
- Backward compatibility: If a handoff manifest does not exist at a recovery point, fall back to CSB-based recovery (existing behavior).
Mode Detection
Before dispatching the design skill, determine whether this build is:
- Feature mode (default) — adding new capability. Success = new acceptance tests pass.
- Refactor mode — restructuring existing code. Success = existing behavior preserved + structural goals met.
Detection: If the user's intent is ambiguous, ask directly before proceeding:
"Is this adding new behavior, or restructuring existing code without changing what it does?"
The user's answer sets the mode for the entire pipeline. No special syntax needed.
Mode Propagation
Propagate refactor mode to subagents through:
- New refactor-specific prompt templates — contract-test-writer-prompt.md and refactor-implementer-addendum.md are standalone files used only in refactor mode. Select these instead of (or in addition to) the feature-mode equivalents.
- Appended context blocks — For existing prompts that serve both modes (plan-writer-prompt.md, build-implementer-prompt.md), append a "Refactor Mode Context" section when composing the dispatch file. The templates remain flat markdown — the orchestrator decides what to include.
- Scratch file for compaction recovery — Persist the current mode in /tmp/crucible-build-mode.md containing mode: refactor or mode: feature plus the baseline commit SHA. Only one build runs per session, so a well-known filename is sufficient.
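A sketch of the mode-file round trip; the `baseline:` key name is an assumption, since only the contents (mode plus baseline SHA) are specified:

```shell
# Persist mode + baseline SHA for compaction recovery. The "baseline:" key
# name is illustrative; the file format is only loosely specified. The git
# fallback keeps the sketch runnable outside a repository.
MODE_FILE=${MODE_FILE:-/tmp/crucible-build-mode.md}
BASELINE_SHA=${BASELINE_SHA:-$(git rev-parse HEAD 2>/dev/null || echo unknown)}
printf 'mode: refactor\nbaseline: %s\n' "$BASELINE_SHA" > "$MODE_FILE"

# On recovery: read the mode back, defaulting to feature if the file is gone.
MODE=$(sed -n 's/^mode: //p' "$MODE_FILE" 2>/dev/null)
MODE=${MODE:-feature}
```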
Compaction Recovery
Build's existing compaction step must read the Compression State FIRST (step 0 from Pipeline Status Compaction Recovery), then the mode file, before re-reading the task list or any other state. On resumption after compaction:
0. Read ## Compression State from pipeline-status.md — recover goal, decisions, constraints, next steps.
0.5. Check for handoff manifests (handoff-*-to-*.md) in the scratch directory. If the most recent manifest exists, use its Inputs and Mode to bootstrap recovery — this supersedes the mode file for phase-boundary state.
1. Read /tmp/crucible-build-mode.md — recover mode and baseline commit SHA.
   - If the file is missing: Default to feature mode and warn.
   - If mode is refactor: Verify the baseline commit SHA exists.
2. Read build-gate-ledger.md — if it exists, apply Gate Ledger Compaction Recovery (see the Compaction Recovery subsection under Gate Ledger Protocol). Use the ledger's phase statuses to determine the resume point. If the ledger is missing but handoff manifests exist, reconstruct with INFERRED status.
3. After mode and ledger are recovered: Proceed with general state reconstruction (task list, phase, health).
Phase 1: Design (Interactive)
Step -1: Resume Detection and Pipeline-Active Marker
Before any design or dispatch work, check for a crashed prior pipeline:
- Check <scratch>/.pipeline-active (where <scratch> is ~/.claude/projects/<hash>/memory/)
- Not found: Write the pipeline-active marker (JSON with pipeline_id set to current session ID, skill set to "build", phase set to "1", start_time set to current ISO-8601 timestamp, scratch_dir set to the scratch directory path, dispatch_dir set to the dispatch directory path, branch from git branch --show-current, baseline_sha from git rev-parse HEAD). Proceed to Step 0.
- Found, same pipeline_id as current session: This is a compaction recovery scenario. Follow existing compaction recovery procedures. Do not re-write the marker.
- Found, different pipeline_id:
  a. Branch guard: Compare the marker's branch field against current git branch --show-current. If they differ, warn: "Previous build on branch [marker.branch] crashed at Phase [phase]. You are currently on [current-branch]. Switch to [marker.branch] before resuming? [switch+resume / start fresh / abort]". Do NOT offer resume on the wrong branch.
  b. Read manifest.jsonl from the marker's dispatch_dir (or from the scratch directory copy if /tmp was lost).
  c. Identify the last successful phase boundary by scanning manifest entries grouped by phase. A phase boundary is verified when all dispatches in that phase have status: "completed".
  d. Present resume option to the user: "Previous build on branch [marker.branch] crashed at Phase [N], [context]. Resume from [last good boundary] ([checkpoint reason], [estimated time preserved] of work preserved)? [yes / no / fresh]"
  e. User accepts: Invoke crucible:replay in resume mode, passing the scratch directory path. The replay skill handles checkpoint restore, state reconstruction, and re-dispatch. The build pipeline does not continue — replay takes over.
  f. User declines (fresh): Delete the stale .pipeline-active marker. Write a fresh marker with the current session. Proceed to Step 0 as a new pipeline run.
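A shape-only sketch of the marker write, using a temp file in place of `<scratch>/.pipeline-active` and placeholder paths; the field names follow the list above:

```shell
# Shape-only sketch of the .pipeline-active marker. A temp file stands in for
# <scratch>/.pipeline-active; /path/to/* values are placeholders. Git fallbacks
# keep the sketch runnable outside a repository.
MARKER=$(mktemp)
cat > "$MARKER" <<EOF
{
  "pipeline_id": "session-1234",
  "skill": "build",
  "phase": "1",
  "start_time": "$(date -u +%Y-%m-%dT%H:%M:%S)",
  "scratch_dir": "/path/to/scratch",
  "dispatch_dir": "/path/to/dispatch",
  "branch": "$(git branch --show-current 2>/dev/null || echo main)",
  "baseline_sha": "$(git rev-parse HEAD 2>/dev/null || echo unknown)"
}
EOF
```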
Marker updates during pipeline: Update the phase field in .pipeline-active at each phase boundary (1->2, 2->3, 3->4) to track progress for crash detection.
Marker cleanup: Delete .pipeline-active at Phase 4 step 12 (after finish skill completes).
Gate Ledger Initialization: After the pipeline-active marker is written (or recovered) and mode detection is complete, run the Gate Ledger Protocol's Ledger Initialization and Orphan Cleanup steps. The ledger must exist before Phase 1 transitions to IN_PROGRESS.
Step 0: Pre-Existing Doc Detection
Before running interactive design, check whether /spec (or a prior /build run) already produced design artifacts for this ticket.
- Scan for pre-existing spec docs: Search `docs/plans/` for design docs (`*-design.md`) with a matching `ticket` field in YAML frontmatter. Also check for corresponding `*-implementation-plan.md` and `*-contract.yaml` files with the same ticket field.
- Conflict detection: If multiple design docs match the same `ticket` field, escalate to user: "Found multiple design docs for ticket #NNN: [list files]. Which should I use?" Do not proceed until the user resolves the conflict.
- Full match (design doc + implementation plan + contract all present):
- Skip interactive design (the Phase 1 design sub-skill below) — design doc already exists
- Security review check: If the contract contains a `security_review` field, note it in the Phase 1→2 handoff manifest under Active Constraints: "Contract requires security review (`security_review.status: [required|recommended]`) — siege will be evaluated in Phase 4 Step 5.5." This ensures the directive survives phase handoffs and compaction recovery.
- Quality-gate the existing design doc with staleness context: "This design doc is pre-existing from /spec and may be stale — verify against current codebase state before proceeding"
- Staleness rejection: If the quality gate finds that the design doc references files, interfaces, or modules that no longer exist in the codebase, reject the doc as fundamentally stale. Fall back to running Phase 1 interactively. Inform user: "Pre-existing design doc for #NNN is fundamentally stale (references [specific items] that no longer exist). Running interactive design instead."
- If quality gate passes: Run Phase 2 on the pre-existing implementation plan — skip Plan Writer (plan already exists), but run Plan Reviewer + innovate + quality-gate on the existing plan. This ensures the plan gets the same review rigor as a freshly written plan.
- If quality gate fails (non-staleness issues): fix or escalate
- Proceed to Phase 3 when the plan passes review
- Partial match (design doc present but implementation plan or contract missing):
- Use the existing design doc (quality-gate it as above, including staleness rejection)
- Run the missing phases normally: if no implementation plan, run Plan Writer in Phase 2; if no contract, proceed without contract awareness for this ticket
- Inform user which artifacts were found and which are being generated fresh: "Found pre-existing design doc for #NNN. Implementation plan is missing — will generate in Phase 2." (or similar)
- Not found: Proceed with normal Phase 1 (interactive design below).
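The scan and conflict check above can be sketched as follows. This is a simplified illustration: file discovery and YAML parsing are reduced to a dict of file contents and a regex, and all names are illustrative.

```python
import re
from collections import defaultdict

def find_design_doc(files, ticket):
    """files: {path: file_text}. Returns the matching design doc path or None.

    Raises on conflict, mirroring the escalate-to-user rule: never guess
    between multiple design docs claiming the same ticket.
    """
    by_ticket = defaultdict(list)
    for path, text in files.items():
        if not path.endswith("-design.md"):
            continue
        # look for a "ticket: NNN" line in the frontmatter
        m = re.search(r"^ticket:\s*#?(\w+)\s*$", text, re.MULTILINE)
        if m:
            by_ticket[m.group(1)].append(path)
    matches = by_ticket.get(ticket, [])
    if len(matches) > 1:
        raise ValueError(f"Found multiple design docs for ticket #{ticket}: {matches}")
    return matches[0] if matches else None
```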
Phase 1: Design (Interactive)
- Model: Opus (creative/architectural work needs the best model)
- Mode: Interactive with the user
- RECOMMENDED SUB-SKILL: Use crucible:forge (feed-forward mode) — consult past lessons before starting
- RECOMMENDED SUB-SKILL: Use crucible:cartographer (consult mode) — review codebase map for structural awareness
- REQUIRED SUB-SKILL: Use crucible:design
- Follow design skill for design refinement, section-by-section validation, and saving the design doc
- OVERRIDE: When design completes and the design doc is saved, do NOT follow design's "Implementation" section (do not chain into planning or worktree from there). Return control to this build skill — Phase 2 handles planning with its own subagent-based approach.
- Phase ends when user approves the design (says "go", "looks good", "proceed", etc.)
- Everything after this point is autonomous — tell the user: "Design approved. Starting autonomous pipeline — I'll only interrupt for escalations."
Step 2: Innovate and Red-Team the Design
After the user approves the design and before starting Phase 2:
RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-design-gate" before dispatching innovate and quality-gate on the design doc.
- Innovate: Dispatch `crucible:innovate` on the design doc. Plan Writer incorporates the proposal.
- Write Phase 1 IN_PROGRESS to the gate ledger (after ledger initialization).
- REQUIRED SUB-SKILL: Use crucible:quality-gate on the (potentially updated) design doc with artifact type "design". Include in the dispatch context: `Phase: design` and `PipelineID: <current PipelineID>`. Iterates until clean or stagnation. (Non-negotiable — see Quality Gate Requirement.)
- If the quality gate requires changes, the Plan Writer updates the design doc and re-commits.
- Verify verdict marker and write Phase 1 PASS to the gate ledger (see Verdict Marker Verification). Delete the verdict marker after writing the ledger entry.
- Design doc is now finalized — proceed to acceptance tests.
Step 2.5: Generate PRD
After the design doc is finalized (Step 2 complete), generate a stakeholder-facing PRD:
- Dispatch a PRD Writer subagent (Sonnet) using `./prd-writer-prompt.md`
- Input: finalized design doc
- Output: PRD in standard format (problem statement, user stories, requirements, scope, out-of-scope, success metrics, technical notes, dependencies)
- Save to `docs/prds/YYYY-MM-DD-<topic>-prd.md`
- Commit: `docs: add PRD for [feature]`
This step runs by default. The PRD is a reformatting of the design doc for non-technical stakeholders — it does not introduce new decisions or requirements. Skip only in refactor mode (refactoring has no stakeholder-facing PRD).
Step 3: Generate Acceptance Tests (RED)
Before planning, define "done" with executable tests:
- Dispatch an Acceptance Test Writer subagent (Opus) using `./acceptance-test-writer-prompt.md`
- Input: finalized design doc (especially acceptance criteria)
- Output: integration-level test file(s) that verify feature behavior end-to-end
- Run the acceptance tests — verify they FAIL (the feature doesn't exist yet)
- If tests pass: something is wrong — investigate before proceeding
- If tests error (won't compile): this is expected in typed languages — note which tests exist and what they verify. They become the first implementation task.
- Commit: `test: add acceptance tests for [feature] (RED)`
These tests define the feature-level RED-GREEN cycle that wraps the entire pipeline. The pipeline is done when these tests pass.
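The "verify they FAIL" step can be sketched as a small guard. `confirm_red` is a hypothetical helper, not part of the skill; the demo command stands in for a real test-runner invocation.

```python
import subprocess
import sys

def confirm_red(test_cmd):
    """Run the acceptance tests and confirm they FAIL (RED).

    A zero exit code means the not-yet-built feature already passes,
    which is the "something is wrong" case: stop and investigate.
    """
    result = subprocess.run(test_cmd, capture_output=True)
    if result.returncode == 0:
        raise RuntimeError("acceptance tests pass before implementation, investigate")
    return "RED confirmed"

# demo with a stand-in command that exits non-zero, like a failing test run
print(confirm_red([sys.executable, "-c", "raise SystemExit(1)"]))  # prints "RED confirmed"
```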
Refactor Mode: Phase 1 Changes
When in refactor mode, Phase 1 shifts from "what should we build?" to "what are we changing and what could break?"
Blast Radius Analysis
After the user describes the refactoring intent, the design phase:
- Identify the target — What code is being restructured? (module, interface, data representation, file organization, etc.)
- Trace the blast radius using cartographer (if available) or fallback exploration:
- Direct consumers — code that imports/calls/references the target
- Indirect dependents — code that depends on consumers (transitive)
- Test coverage — which tests exercise the target behavior
- Configuration/wiring — DI registrations, config files, build scripts that reference the target
- Fallback when cartographer is unavailable: Use language-aware symbol search via agent exploration. Grep for symbol references (imports, type annotations, function calls) using language-specific patterns. The impact manifest's confidence field reflects reduced precision.
- Present an impact manifest to the user:
```markdown
### Impact Manifest

**Target:** [what's being restructured]
**Structural goal:** [what the code should look like after]

**Direct consumers:** N files
- path/to/consumer1.py (calls TargetClass.method)
- path/to/consumer2.py (imports TargetClass)

**Indirect dependents:** N files
- path/to/dependent.py (depends on consumer1)

**Test coverage:**
- N tests directly exercise target behavior
- N tests exercise consumers
- Gap: no tests cover [specific seam]

**Risk assessment:** [Low/Medium/High] based on consumer count and coverage gaps
**Confidence:** [High/Medium/Low] — High if cartographer used, Medium/Low if fallback
```
When confidence is Low, require explicit user confirmation before proceeding. The user must review the impact manifest and confirm the blast radius is complete.
- Design the structural goal — what should the code look like after the refactoring? User validates the target state.
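The consumer/dependent split above amounts to a reverse reachability walk over the import graph. A minimal sketch, with the graph supplied as plain dicts rather than discovered by cartographer or grep:

```python
from collections import defaultdict, deque

def blast_radius(imports, target):
    """imports: {file: set of files it imports}.

    Returns (direct consumers, indirect dependents): files that import the
    target, then everything reachable by following reversed edges (BFS).
    """
    reverse = defaultdict(set)
    for src, deps in imports.items():
        for dep in deps:
            reverse[dep].add(src)
    direct = set(reverse[target])
    seen, queue = set(direct), deque(direct)
    while queue:
        node = queue.popleft()
        for dependent in reverse[node]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return direct, seen - direct
```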
Acceptance Tests (Refactor Mode)
Instead of writing NEW acceptance tests (Step 3 above), the pipeline:
- Dispatch the contract test writer using `./contract-test-writer-prompt.md` — a single agent handles gap identification AND gap filling. Input: impact manifest + blast radius file list. The agent maps existing tests to behavioral seams, identifies untested seams, and writes contract tests for each gap.
- Run all contract tests GREEN — contract tests must pass before any refactoring begins.
- If a contract test FAILS: The contract test writer investigates:
- Test defect (wrong assertion, bad setup) — fix the test and re-run
- Latent codebase bug — report to user with options: (a) fix the bug first, (b) exclude this seam and accept the risk, (c) abort the refactoring. Never silently drop a failing contract test.
- Commit: `test: add contract tests for [target] refactoring (GREEN — locking existing behavior)`
Proportionality Escape Valve
Contract test writing must remain proportional to the refactoring scope. Trigger a scope check when any of these thresholds are hit:
- Count threshold: More than 15 contract tests needed
- Effort threshold: Contract test writer reports context pressure, or estimated total contract test LOC exceeds ~2x the estimated refactoring scope LOC
When triggered:
- Present the full gap list to the user with estimated effort per gap
- User selects which gaps to fill and which to accept as uncovered risk
- Proceed with only user-selected contract tests
The impact manifest records which gaps the user chose to leave uncovered.
Phase Handoff: 1 → 2
Before dispatching the Plan Writer, verify the gate ledger and write a handoff manifest:
- Gate ledger check: Read `build-gate-ledger.md` and verify Phase 1 Status is `PASS`. If not, follow Enforcement Rules.
- Write `handoff-1-to-2.md` with:
- Goal: original user request, verbatim
- Mode: feature or refactor
- Inputs for Phase 2: design doc path, acceptance test paths (or contract tests in refactor mode), PRD path (if generated), conventions path (from cartographer, if loaded)
- Decisions Carried Forward: accumulated decisions from Phase 1
- Active Constraints: constraints affecting planning
- Shed Receipt: design iteration history, innovate proposals, quality gate round details → design doc on disk captures the outcome
- Emit shed statement: "Phase 1 context shed. Design doc, acceptance tests, and PRD are on disk. Design iteration history, innovate proposals, and gate round details are not carried forward."
- Update `## Compression State` in pipeline-status.md with manifest contents.
- Do NOT emit a Compression State Block (manifest replaces it at this boundary).
- Session index event: Emit a `phase_change` event to the outbox: `{"ts":"<now>","seq":0,"type":"phase_change","summary":"Build: Phase 1 -> Phase 2 (Plan)","detail":{"skill":"build","from":"1","to":"2"}}`.
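The outbox write can be sketched as follows, assuming a JSON-Lines outbox file (one event per line); the real outbox path and sequencing rules come from the session-index convention, not from this snippet.

```python
import json
import time

def emit_event(outbox_path, type_, summary, detail, seq=0):
    """Append one session-index event to a JSON-Lines outbox file."""
    event = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "seq": seq,
        "type": type_,
        "summary": summary,
        "detail": detail,
    }
    with open(outbox_path, "a") as f:
        f.write(json.dumps(event) + "\n")
    return event
```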
Phase 2: Plan (Autonomous)
Step 1: Write the Plan
Dispatch a Plan Writer subagent (Opus):
- Read the design doc produced in Phase 1 and the acceptance tests from Step 3
- Write an implementation plan following the `crucible:planning` format
- If acceptance tests couldn't compile (typed language), Task 1 should create the interfaces/stubs needed for them to compile and fail correctly
- Include per-task metadata: Files (with count), Complexity (Low/Medium/High), Dependencies
- Save to `docs/plans/YYYY-MM-DD-<topic>-implementation-plan.md`
- Plan tasks should be scoped to 2-3 per subagent, ~10 files max (context budget awareness)
Use the `./plan-writer-prompt.md` template for the dispatch prompt.
Step 2: Review the Plan
Dispatch a Plan Reviewer subagent:
Reviewer model selection:
- Plan touches 4+ systems or has 10+ tasks → Opus
- Plan touches 1-3 systems with <10 tasks → Sonnet
- When in doubt → Opus
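The selection rules above can be sketched as a small decision function (illustrative only; the thresholds are the ones stated in the list):

```python
def plan_reviewer_model(systems_touched, task_count):
    """Pick the Plan Reviewer model from plan scope."""
    if systems_touched >= 4 or task_count >= 10:
        return "opus"
    if 1 <= systems_touched <= 3 and task_count < 10:
        return "sonnet"
    return "opus"  # when in doubt
```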
Review protocol (iterative):
- Dispatch Plan Reviewer to check plan against design doc
- If issues found: record issue count, dispatch Plan Writer to revise
- Dispatch NEW fresh Plan Reviewer on revised plan (no anchoring)
- Compare issue count to prior round:
- Strictly fewer issues → progress, loop again
- Same or more issues → stagnation, escalate to user with findings from both rounds
- Loop until plan passes with no issues
- Architectural concerns bypass the loop — immediate escalation regardless of round
Use the `./plan-reviewer-prompt.md` template for the dispatch prompt.
Step 3: Innovate and Red-Team the Plan
After the plan passes review:
RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-plan-gate" before dispatching innovate and quality-gate on the plan.
- Write Phase 2 IN_PROGRESS to the gate ledger.
- Innovate: Dispatch `crucible:innovate` on the approved plan. Plan Writer incorporates the proposal into the plan.
- REQUIRED SUB-SKILL: Use crucible:quality-gate on the (potentially updated) plan with artifact type "plan". Include in the dispatch context: `Phase: plan` and `PipelineID: <current PipelineID>`. Provides the plan and design doc as context. (Non-negotiable — see Quality Gate Requirement.)
- Verify verdict marker and write Phase 2 PASS to the gate ledger (see Verdict Marker Verification). Delete the verdict marker after writing the ledger entry.
The quality gate handles the iterative red-team loop — fresh review each round, weighted stagnation detection, 15-round safety limit, escalation. See `crucible:quality-gate` for details.
Phase Handoff: 2 → 3
Before creating the team and task list, write a handoff manifest. Step 3.4 above already verified the verdict marker, wrote PASS to the ledger, and deleted the marker. The handoff manifest is written AFTER the ledger PASS — this sequencing ensures compaction recovery finds a consistent state (ledger shows PASS, handoff exists).
Write a handoff manifest:
- Write `handoff-2-to-3.md` with:
- Goal: original user request, verbatim
- Mode: feature or refactor
- Inputs for Phase 3: plan path, design doc path, acceptance test paths (or contract tests), contract YAML path (if exists), baseline SHA (current HEAD), cartographer context paths (module files, conventions.md, landmines.md)
- Decisions Carried Forward: accumulated decisions from Phases 1-2
- Active Constraints: constraints affecting execution
- Shed Receipt: plan review iterations, innovate proposals, quality gate round history → plan on disk captures the outcome
- Emit shed statement: "Phase 2 context shed. Plan, design doc, and acceptance tests are on disk. Plan review rounds, innovate proposals, and gate details are not carried forward."
- Update `## Compression State` in pipeline-status.md with manifest contents.
- Do NOT emit a Compression State Block.
- Session index event: Emit a `phase_change` event to the outbox: `{"ts":"<now>","seq":0,"type":"phase_change","summary":"Build: Phase 2 -> Phase 3 (Execute)","detail":{"skill":"build","from":"2","to":"3"}}`.
Phase 3: Execute (Autonomous, Team-Based)
Step 0: Load Module Context for Subagents
- RECOMMENDED SUB-SKILL: Use crucible:cartographer (load mode) — when dispatching implementers and reviewers, include relevant module files, conventions.md, and landmines.md in their dispatch files
- Defect signature loading (for implementers only):
  - Glob `defect-signatures/*.md` (excluding `*.non-matches.md`) from the cartographer storage directory
  - For each signature, read its `Modules` field and match against the task's target modules:
    - Read each cartographer module file's `Path:` field
    - A task's file is in a module if the file path starts with the module's `Path:` value
    - When a task spans multiple modules, load signatures for all matched modules
    - Directory prefix fallback: When no cartographer modules exist, match if any target file path starts with any of the signature's `Modules` directory prefixes
  - For matching signatures, validate all file paths still exist on disk — drop stale entries silently
  - Inject into the `[DEFECT_SIGNATURES]` section of `build-implementer-prompt.md`:
    - Generalized pattern (always)
    - Confirmed siblings list (always)
    - Unresolved siblings list (always — these are known live defects; produces a stronger warning)
    - Non-match companion files are NOT loaded for implementers
  - `Last loaded` update: Loading is pure-read. After all implementer dispatches for the current phase complete, batch-update the `Last loaded` field to today on all signatures that were loaded. Do NOT update during dispatch — defer to after all subagents are dispatched.
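The path-prefix matching rule can be sketched as follows; module and signature shapes are simplified to plain dicts, and all names are illustrative.

```python
def matching_signatures(task_files, modules, signatures):
    """modules: {module_name: path_prefix}; signatures: {sig_name: [module_names]}.

    A task file belongs to a module when its path starts with the module's
    Path: prefix; a signature loads when any of its modules is matched.
    """
    task_modules = {
        name
        for f in task_files
        for name, prefix in modules.items()
        if f.startswith(prefix)
    }
    return [sig for sig, mods in signatures.items() if task_modules & set(mods)]
```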
Step 0.5: Gate Ledger — Phase 3 Start
Write Phase 3 IN_PROGRESS to the gate ledger (after Phase 2 PASS verification).
Step 1: Create Team and Task List
Create a team using TeamCreate:

```
team_name: "build-<feature-name>"
description: "Building <feature description>"
```
Read the approved plan. Create tasks via TaskCreate for each plan task, including:
- Subject from plan task title
- Description with full plan task text (subagents should never read the plan file)
- Dependencies via `TaskUpdate` with `addBlockedBy`
Agent Teams Fallback
If TeamCreate fails (agent teams not available), output a clear one-time warning:
⚠️ Agent teams are not available. Recommended: set `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`. Falling back to sequential subagent dispatch via Agent tool.
Then fall back to sequential subagent dispatch via the regular Task tool (without `team_name`). Everything still works — independent tasks run sequentially instead of in parallel via teammates.
What changes in fallback mode:
- Tasks are dispatched via the `Agent` tool instead of as teammates
- Independent tasks that would run in parallel now run sequentially
- Task tracking still uses `TaskCreate`/`TaskUpdate` for state management
- All other pipeline behavior (TDD, review, de-sloppify, quality gates) is unchanged
Step 2: Analyze Dependencies and Execution Order
Before dispatching:
- Map the dependency graph from plan task metadata
- Identify independent tasks (no shared files, no sequential dependencies)
- Group into execution waves — independent tasks parallel, dependent tasks sequential
- Assess complexity per task for reviewer model selection
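The wave grouping above is a layered topological sort: every task whose dependencies are all satisfied joins the current wave. A minimal sketch, assuming the dependency sets are already extracted from plan metadata:

```python
def execution_waves(deps):
    """deps: {task: set of tasks it is blocked by} -> list of waves.

    Tasks in the same wave have no dependencies on each other and may run
    in parallel; later waves wait on earlier ones.
    """
    remaining = {t: set(d) for t, d in deps.items()}
    waves = []
    while remaining:
        wave = sorted(t for t, d in remaining.items() if not d)
        if not wave:
            raise ValueError("dependency cycle in plan tasks")
        waves.append(wave)
        for t in wave:
            del remaining[t]
        for d in remaining.values():
            d.difference_update(wave)
    return waves
```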
Step 3: Execute Tasks
For each task (or wave of parallel tasks):
RECOMMENDED SUB-SKILL: Before dispatching each execution wave, use crucible:checkpoint — create checkpoint with reason "pre-wave-N" (where N is the wave number). This captures the working directory state after the prior wave's verification gate passed.
- Mark task `in_progress` via `TaskUpdate`
- Spawn Implementer teammate (Opus) via Task tool with `team_name` and `subagent_type="general-purpose"`
  - Use `./build-implementer-prompt.md` template
  - Pass full task text, file paths, project conventions
  - Contract-aware dispatch (when a contract exists for this ticket): Include the contract YAML alongside the design doc and task description. See "Contract-Aware Implementer Guidance" below.
- Implementer follows TDD, writes tests, runs tests, commits, self-reviews
- When Implementer reports completion, run De-Sloppify Cleanup (see below)
- After cleanup completes, spawn Reviewer teammate
  - Use `./build-reviewer-prompt.md` template
- Tier-aware review routing: Read the task's `Review-Tier` from plan metadata.
  - Tier 1: Dispatch single-pass code reviewer (Sonnet). If Clean or Minor-only: task complete. If Critical/Important: dispatch implementer fix, then task complete. If Architectural Concern: escalate.
  - Tier 2: Dispatch iterative code review (per existing loop). Then dispatch single-pass test reviewer. If test review surfaces Critical findings, escalate to Tier 3. Then dispatch adversarial tester (per existing logic). Task complete.
  - Tier 3: Follow current full pipeline (no changes to existing flow).
Contract-Aware Implementer Guidance
When a contract YAML exists for the current ticket (detected during Step 0 or produced by `/spec`), the implementer receives the contract alongside the design doc and task description. The contract uses the schema defined in crucible:spec/contract-schema.md (version 1.0). Implementers must treat contract elements as follows:
- `api_surface` declarations are binding. The implementer must match the declared function signatures, class interfaces, endpoint shapes, parameter names, types, and return types exactly. Deviations from the contract's API surface are implementation errors.
- `checkable` invariants are binding. The implementer must satisfy all declared constraints (e.g., "must not import X", "must be idempotent"). The `check_method` field (`grep`, `code-inspection`, `file-structure`) indicates how the quality gate will verify compliance — the implementer should self-check against these before committing.
- `testable` invariants require tagged tests. For each `testable` invariant, the implementer must write a test tagged with the declared `test_tag` (pattern: `contract:<category>:<id>`) that validates the invariant. These tests are checked by the quality gate and reviewers — they must exist and pass.
- `integration_points` are informational. These indicate which other components and contracts this ticket interacts with. The implementer should be aware of referenced components and ensure compatibility, but integration points are not binding constraints — they provide context for making good implementation decisions.
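A testable invariant and its tagged test might look like the following sketch. The invariant, the function, and the tag-attachment mechanism are all hypothetical; real projects may attach the `contract:<category>:<id>` tag via pytest markers or naming conventions instead of a function attribute.

```python
# Hypothetical testable invariant: "cache keys are order-independent",
# declared in the contract with test_tag contract:behavior:cache-key-deterministic.
def make_cache_key(parts):
    # sorting makes the key independent of argument order
    return "|".join(sorted(parts))

def test_cache_key_deterministic():
    assert make_cache_key(["b", "a"]) == make_cache_key(["a", "b"])

# illustrative tag attachment so gate/reviewers can find the contract test
test_cache_key_deterministic.tags = ["contract:behavior:cache-key-deterministic"]
test_cache_key_deterministic()  # runs clean: the invariant holds
```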
De-Sloppify Cleanup
<!-- TRUST: subagent report is L4 — cross-check file paths and claims against L3 source before acting. -->After the implementer reports completion and before dispatching the reviewer:
RECOMMENDED: Use crucible:checkpoint — create checkpoint with reason "pre-cleanup-task-N" before dispatching the cleanup agent. If cleanup removes something needed, restore to this checkpoint.
- Record the pre-cleanup commit SHA
- Dispatch a fresh Cleanup Agent (Opus) using `./cleanup-prompt.md`
  - Input: `git diff <pre-task-sha>..HEAD` (the implementer's committed changes)
  - The orchestrator provides the pre-task commit SHA to the cleanup agent
- Cleanup agent reviews changes, removes unnecessary code (see allowlist), runs tests
- If cleanup made changes, commits separately: `refactor: cleanup task N implementation`
- If cleanup found nothing to remove, reports "No cleanup needed" and proceeds
Reviewer Model Selection (Lead Decides Per-Task)
| Task Complexity | Reviewer Model |
|---|---|
| Low (1-3 files, straightforward) | Sonnet |
| Medium (3-6 files, some cross-system) | Lead decides (default Opus) |
| High (6+ files, refactoring, deep chains) | Opus |
| When in doubt | Opus |
Two-Pass Review Cycle
Each task gets TWO review passes before completion:
```dot
digraph review {
  "Implementer builds + tests" -> "De-sloppify cleanup";
  "De-sloppify cleanup" -> "Pass 1: Code Review";
  "Pass 1: Code Review" -> "Implementer fixes code findings";
  "Implementer fixes code findings" -> "Pass 2: Test Quality Review";
  "Pass 2: Test Quality Review" -> "Implementer fixes test findings";
  "Implementer fixes test findings" -> "Test Alignment Audit (crucible:test-coverage)";
  "Test Alignment Audit (crucible:test-coverage)" -> "Test Gap Writer";
  "Test Gap Writer" -> "Adversarial Tester";
  "Adversarial Tester" -> "Task complete";
}
```
Pass 1 — Code Review: Architecture, patterns, correctness, wiring (actually connected, not just existing?)
Pass 2 — Test Quality Review: Test independence? Determinism? Edge cases? Integration tests where mocks are masking real behavior? AAA pattern? Correct test level? (Staleness and alignment checks are handled by the test-coverage dispatch below.)
Review Tier Routing
Each task's `Review-Tier` (from the plan) determines which review steps execute. Phase 4 full-implementation gates are NOT affected by per-task tiers.
| Step | Tier 1 | Tier 2 | Tier 3 |
|---|---|---|---|
| Implementer | Yes | Yes | Yes |
| De-sloppify cleanup | Yes | Yes | Yes |
| Pass 1: Code review | Single pass | Iterative | Iterative |
| Implementer fixes (code) | If findings | If findings | If findings |
| Pass 2: Test quality review | SKIP | Single pass (non-iterative) | Iterative |
| Implementer fixes (test) | SKIP | If critical findings only | If findings |
| Test alignment audit | SKIP | SKIP | Yes |
| Test gap writer | SKIP | SKIP | Yes |
| Adversarial tester | SKIP | Yes | Yes |
Tier 1 "single pass" code review: Dispatch one reviewer. If findings are Clean, task is complete. If findings include Critical or Important issues, dispatch implementer to fix, then the task is complete (no re-review). If findings include an Architectural Concern, escalate as normal.
Tier 2 "single pass" test review: Dispatch one test quality reviewer. Report findings but do NOT enter the iterative review loop. If the single pass surfaces Critical findings, escalate the task to Tier 3 for full iterative treatment.
Tier 2 "iterative" code review: Same as current behavior — fresh reviewer each round, track issue count, loop until clean or stagnation.
Runtime Tier Escalation
The orchestrator may escalate a task's review tier during execution. Escalation is one-directional (up only).
Triggers:
- Implementer reports unexpected complexity or cross-system interaction not anticipated in the plan
- Single-pass reviewer (Tier 1 code review or Tier 2 test review) reports Critical findings
- Implementer touches significantly more files than the plan specified
Process:
- Log escalation to decision journal: `[timestamp] DECISION: review-tier | choice=escalate T1->T2 | reason=<trigger> | alternatives=none`
- Execute the additional review steps for the new tier (from the point where the current tier's pipeline diverges)
- Update the task status display to show the escalated tier
Contract-Aware Reviewer Guidance
When a contract YAML exists for the current ticket, reviewers receive the contract alongside the implementation and must add the following checks to both review passes:
- API surface compliance: Do the implemented public interfaces match the `api_surface` declarations in the contract? Check function signatures, class interfaces, endpoint shapes, parameter names/types, and return types. Any deviation from the contract's declared API surface is a blocking finding.
- Checkable invariant satisfaction: Are all `checkable` invariants satisfied per their declared `check_method`?
  - `grep`: verify the pattern match (or absence) in production code
  - `code-inspection`: read and reason about code to confirm the invariant holds
  - `file-structure`: check file existence/organization matches the constraint
  Any unsatisfied checkable invariant is a blocking finding.
- Testable invariant test existence: Does a test exist for each `testable` invariant, tagged with the correct `test_tag` (pattern: `contract:<category>:<id>`)? A missing tagged test is a blocking finding.
- Test correctness: Do the tagged tests actually validate the invariant they claim to cover? A test that exists but does not meaningfully exercise the invariant (e.g., a trivially passing assertion, a test that tests something unrelated despite having the right tag) is a blocking finding.
Severity: All contract-related review findings are classified as blocking — the same severity as contract violations in the quality gate. Contract findings must be resolved before the task is marked complete.
Test Alignment Audit
After the implementer addresses Pass 2 findings, invoke `crucible:test-coverage` against the task's changes:
- Code diff: `git diff <pre-task-sha>..HEAD`
- Affected test files: test files touched or related to the task
- Context: "Build task N: [task description]"
The test-coverage skill audits existing tests for staleness (wrong assertions, misleading descriptions, dead tests, coincidence tests) and handles its own fix dispatch and revert-on-failure logic. It returns a structured report. Note: the diff includes review fix commits — the audit agent should focus on behavioral changes to source files, not changes that only touch test files.
Skip this step if the task made no behavioral source changes (only `.md`, `.json`, config files).
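The skip decision can be sketched as an extension filter. The extension list is illustrative, not normative; a real project would align it with its own config file types.

```python
import os

# file types that never carry behavioral source changes (illustrative)
NON_BEHAVIORAL = {".md", ".json", ".yaml", ".yml", ".toml", ".ini"}

def has_behavioral_change(changed_paths):
    """True when any touched file is not a docs/config file."""
    return any(os.path.splitext(p)[1].lower() not in NON_BEHAVIORAL
               for p in changed_paths)
```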
Test Gap Writer
After test-coverage completes (or is skipped), dispatch a Test Gap Writer (Opus) using `./test-gap-writer-prompt.md`:
- Input: Pass 2 test reviewer's missing coverage findings + implementer's changes + test-coverage audit report (if available)
- The test gap writer writes tests ONLY for gaps the reviewer identified — no scope creep. Before writing a new test for a flagged gap, verify no existing test already covers this path (it may have been updated by the test-coverage audit).
- Tests should pass immediately (the behavior already exists from implementation)
- The test gap writer reports per-test PASS/FAIL results (see prompt template for report format)
- Commits new tests: `test: fill coverage gaps for task N`
If all tests PASS: Continue to adversarial tester.
If some tests FAIL (gaps reveal genuinely missing implementation):
- Dispatch a fresh implementer (Opus) with the failing test(s), their failure messages, and the gap descriptions from the reviewer
- Implementer fixes the missing behavior, then re-runs ALL test gap writer tests (not just the failures — catches regressions from the fix)
- If all tests pass after fix: commit (`fix: address test gap failures for task N`), continue to adversarial tester
- If tests still fail after one fix attempt: escalate to user with:
- Which coverage gaps the reviewer identified
- Which tests the gap writer wrote (per-test PASS/FAIL)
- What the implementer attempted to fix
- Which tests still fail and their current failure messages
Skip this step if the Pass 2 test reviewer reported zero missing coverage gaps.
Adversarial Tester
After the test gap writer completes (or is skipped), dispatch an Adversarial Tester (Opus) using `skills/adversarial-tester/break-it-prompt.md`:
- Input: Full diff of the task's changes (`git diff <pre-task-sha>..HEAD`), project test conventions, cartographer module context (if available)
- The adversarial tester identifies the top 5 most likely failure modes, writes one test per mode, and runs them
- Outcome handling:
- All tests PASS: Implementation is robust. Log results and proceed to task complete.
- Some tests FAIL: Real weaknesses found. Dispatch implementer to fix. Re-run all tests (including adversarial). If pass → task complete. If fail → one more fix attempt, then escalate to user.
- Tests ERROR (won't compile): Adversarial tester mistake. Discard broken tests, log, proceed to task complete.
- Quality bypass prevention: If the implementer's fix touches more than 3 files, route through a lightweight code review before completing.
- Commit adversarial tests: `test: adversarial tests for task N`
Skip this step when:
- The task diff contains no behavioral source files (only `.md`, `.json`, `.yaml`, `.uss`, `.uxml`)
- No tests were written during implementation (pure scaffolding)
Iterative Review Loop
Each review pass (code and test) uses the iterative loop:
- After fixes, dispatch a NEW fresh Reviewer (no anchoring to prior findings)
- Track issue count between rounds
- Strictly fewer issues → progress, loop again
- Same or more issues → stagnation, escalate to user
- Loop until clean
- Architectural concerns → immediate escalation regardless of round
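The stopping rule above can be sketched as a pure function over the per-round issue counts. The round limit here is borrowed from the quality-gate description and is illustrative for this loop:

```python
def review_outcome(issue_counts, limit=15):
    """issue_counts: issues found per review round, in order.

    Strictly decreasing counts mean progress; a flat or rising count is
    stagnation and escalates. Zero issues in the latest round is clean.
    """
    for prev, cur in zip(issue_counts, issue_counts[1:]):
        if cur >= prev:
            return "escalate: stagnation"
    if issue_counts and issue_counts[-1] == 0:
        return "clean"
    if len(issue_counts) >= limit:
        return "escalate: round limit"
    return "continue"
```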
Verification Gates
After each wave completes:
- Run full test suite (not just current wave's tests)
- Check compilation
- Failures → identify which task caused regression before fixing
- Clean → proceed to next wave
Refactor Mode: Phase 3 Changes
When in refactor mode, Phase 3 execution differs from feature mode in several ways.
Pre-Execution Coverage Check
Before the first task executes:
- Run all contract tests from Phase 1 — confirm GREEN
- Run the full test suite — confirm GREEN (pre-execution baseline)
- Record the "baseline commit" SHA in `/tmp/crucible-build-mode.md` — this is the rollback target
Tiered Test Strategy
Running the full test suite after every atomic step is prohibitively expensive. Instead:
- (a) After each atomic task: Run blast-radius tests + direct consumer tests only (tests identified in the impact manifest)
- (b) After each execution wave: Run the full test suite (matches existing verification gate between waves)
- (c) Full suite checkpoints: Pre-execution baseline and Phase 4 final verification always run the full suite
Coordinated-Atomic Execution
When the executor encounters a task marked `atomic: true`:
- Record pre-task commit SHA
- Implementer makes ALL changes (multiple files) — dispatch with `./refactor-implementer-addendum.md` appended
- Run blast-radius tests + direct consumer tests (per tiered strategy)
- If GREEN: Commit all files together in a single commit
- If FAIL: Revert ALL files to pre-task SHA. Dispatch one retry with a fresh implementer that receives the failure context and test output. If second attempt also fails, revert to pre-task SHA and escalate to user (see Rollback Policy below).
Key difference from feature mode: Feature mode does RED-GREEN-REFACTOR. Refactor mode for atomic steps does GREEN-GREEN — tests are green before, tests must be green after. No RED phase because no new behavior is being added.
After a successful atomic commit (step 4), the rest of the per-task pipeline continues as normal: de-sloppify cleanup, two-pass review cycle, test alignment audit, test gap writer, and adversarial tester (unless skipped per restructuring-only annotation below).
Non-atomic refactoring tasks follow normal execution — structural changes that don't break intermediate states (e.g., extracting a private method, adding a module nothing imports yet). These use standard TDD if they introduce new abstractions, or GREEN-GREEN if they are pure restructuring.
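The commit/revert/retry decision for a coordinated-atomic task can be sketched as a small state function (the name, input shape, and return values are illustrative only):

```python
def atomic_task_outcome(attempt_results):
    """Resolve a coordinated-atomic task from its test-run results.

    attempt_results: list of booleans (True = blast-radius tests GREEN), one per
    implementation attempt. At most two attempts are allowed before escalation.
    Returns ("commit" | "retry" | "escalate", attempts_used).
    """
    if not attempt_results:
        raise ValueError("at least one attempt is required")
    if attempt_results[0]:
        return ("commit", 1)    # GREEN on first try: commit all files together
    if len(attempt_results) == 1:
        return ("retry", 1)     # FAIL: revert to pre-task SHA, dispatch one fresh implementer
    if attempt_results[1]:
        return ("commit", 2)    # retry went GREEN: commit
    return ("escalate", 2)      # second failure: revert to pre-task SHA, escalate to user
```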
Phase 3 Adaptations for Existing Steps
- Adversarial tester: The planner annotates each task with `restructuring-only: true/false`. If `restructuring-only: true`, adversarial testing is skipped. Tasks with `restructuring-only: false` still get adversarial testing. When in doubt, default to `false`.
  - `restructuring-only: true` examples: renames where all call sites are mechanically updated, file moves with updated paths, extract-method where the extracted method is private and preserves the original call signature
  - `restructuring-only: false` examples: extract-class where callers must change call targets, splitting a module where consumers must update imports, any change where the consumer-facing API surface shifts
- De-sloppify cleanup: Gains a new removal category: dead compatibility shims. After a refactoring task, look for leftover adapter code, re-export aliases, or compatibility layers introduced during migration but no longer referenced. Detection scope: code added after the baseline commit SHA that re-exports, aliases, or wraps symbols under old names, AND where no code outside the refactoring's changed files references the old names. String-based references: When the target was registered by name in a configuration system, flag the shim as UNCERTAIN and defer to the reviewer rather than removing it.
Refactoring Rollback Policy
Baseline Commit
The orchestrator records the baseline commit SHA before the first refactoring task executes (during pre-execution coverage check). Persisted in `/tmp/crucible-build-mode.md`.
Per-Task Rollback
When a single task fails after the executor's retry attempt:
- Revert that task's changes to the pre-task commit SHA
- Escalate to user with failure context and test output
- User chooses: skip this task and continue (orchestrator also skips all tasks that depend on the skipped task, and informs the user which tasks were transitively skipped), retry with guidance, or revert all tasks to baseline
Full Rollback to Baseline
When the user chooses full rollback (or cascading failures make forward progress impossible):
- Perform `git reset --hard <baseline-SHA>` to restore pre-refactoring state
- Re-run all contract tests to confirm known-good state
- Report what was reverted and why
Safe Partial States
The planner annotates tasks with `safe-partial: true/false`. A task is `safe-partial: true` if the codebase is in a valid, shippable state after that task completes (all tests green, no dangling references). When a later task fails, the orchestrator can offer to keep changes through the last safe-partial task.
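A minimal sketch of picking the rollback point from safe-partial annotations (the dict shape is an assumption; the skill only defines the annotation itself):

```python
def keep_through_task(completed_tasks):
    """Given completed tasks in execution order, return the id of the last task
    that left the codebase in a shippable state (safe-partial: true), or None
    if no partial state is safe to keep.

    completed_tasks: list of {"id": str, "safe_partial": bool}.
    """
    last_safe = None
    for task in completed_tasks:
        if task["safe_partial"]:
            last_safe = task["id"]  # later safe points supersede earlier ones
    return last_safe
```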
Architectural Checkpoint
For plans with 10+ tasks, at ~50% completion or after a major subsystem:
- Dispatch architecture reviewer using `./architecture-reviewer-prompt.md`
- Design drift → escalate to user
- Minor concerns → adjust prompts for remaining tasks
- All clear → continue
Noticed Reconciliation
After all implementers in Phase 3 report back and before writing the Phase 3 COMPLETE ledger entry, aggregate their `### Noticed But Not Touching` sections into a single `docs/plans/<YYYY-MM-DD>-<ticket-slug>-noticed.md` artifact.
Scope discipline: Notice, do not act. If an implementer sees an out-of-scope issue during implementation, it must be logged under `### Noticed But Not Touching` in their report — NOT fixed in their diff. Acting on noticed items in the same task is a scope-discipline failure. The orchestrator enforces this via reconciliation: noticed entries are surfaced here and converted to follow-up tickets later (see /finish).
7-step reconciliation process:
1. Collect each implementer's `### Noticed But Not Touching` section from every Phase 3 implementer report.
2. Skip any section whose body is `*(none)*`.
3. Dedupe entries using the canonical dedupe key: `sha256( normalize(file_path) + "|" + line_range + "|" + noticed[:40] )`, where `normalize(file_path)` is the repo-relative POSIX path lowercased.
4. Sort the deduped entries by file path, then line range.
5. If any entries remain, write `docs/plans/<YYYY-MM-DD>-<ticket-slug>-noticed.md` matching the canonical filename regex `^docs/plans/\d{4}-\d{2}-\d{2}-[a-z0-9-]+-noticed\.md$`. Use the date embedded in the sibling plan filename (not wall-clock date) so all sibling artifacts share a date; the slug matches the ticket being built. Frontmatter and body must follow the Canonical Constants template exactly:

   ```
   ---
   pipeline_id: "<build-YYYYMMDD-HHMMSS>"
   date: "YYYY-MM-DD"
   ticket: "#NNN"
   ---

   # Noticed But Not Touching — <ticket-slug>

   - **file:** `path:L<start>-L<end>`
     **noticed:** <desc>
     **why it matters:** <risk/opportunity>
     **suggested follow-up:** <optional>
   ```

6. Idempotent overwrite: If the target `-noticed.md` already exists (same-ticket re-run on the same date), merge the existing entries with the newly collected entries, run the full dedupe (same key), sort, and overwrite the file in one write. No append mode; the on-disk file is always the full deduped set for that date+ticket.
7. Stage the `-noticed.md` file so it lands in the PR commit.

Skip the write entirely if zero entries remain after dedupe — do not produce an empty `-noticed.md`.
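The canonical dedupe key translates directly to code. A sketch, assuming backslash-to-slash mapping as part of path normalization (the skill only specifies lowercasing) and sorting on the normalized path:

```python
import hashlib

def dedupe_key(file_path, line_range, noticed):
    """Canonical dedupe key:
    sha256( normalize(file_path) + "|" + line_range + "|" + noticed[:40] )."""
    normalized = file_path.replace("\\", "/").lower()  # assumption: unify separators
    material = normalized + "|" + line_range + "|" + noticed[:40]
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

def reconcile(entries):
    """Dedupe (first occurrence wins), then sort by file path, then line range."""
    unique = {}
    for e in entries:
        unique.setdefault(dedupe_key(e["file"], e["range"], e["noticed"]), e)
    return sorted(unique.values(),
                  key=lambda e: (e["file"].lower(), e["range"]))
```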
Gate Ledger — Phase 3 Complete
After the last task wave's verification gate passes and all tasks are marked complete — but BEFORE the Phase 3→4 handoff — write `Status: COMPLETE` and `Tasks: N/N complete` to the Phase 3 ledger entry. If any task is in a retry/re-dispatch loop, COMPLETE is NOT written until retries resolve.
Phase Handoff: 3 → 4
Before running acceptance tests and code review, verify the gate ledger and write a handoff manifest:
- Gate ledger check: Read `build-gate-ledger.md` and verify Phase 3 Status is `COMPLETE`. If not, follow Enforcement Rules.
Write the handoff manifest:
- Write `handoff-3-to-4.md` with:
  - Goal: original user request, verbatim
  - Mode: feature or refactor
  - Inputs for Phase 4: HEAD SHA (all tasks committed), design doc path, acceptance test paths (or contract tests), baseline SHA (for `git diff` scope), task summary (completed count, escalation outcomes)
  - Decisions Carried Forward: accumulated decisions from Phases 1-3
  - Active Constraints: constraints affecting completion review
  - Shed Receipt: per-task review rounds, implementer context, wave verification details → task completion status in task list; per-task review details are shed
- Emit shed statement: "Phase 3 context shed. Working code at HEAD, design doc, and acceptance tests on disk. Per-task implementation context, review rounds, and verification details are not carried forward."
- Update `## Compression State` in pipeline-status.md with manifest contents.
- Do NOT emit a Compression State Block.
- Session index event: Emit a `phase_change` event to the outbox: `{"ts":"<now>","seq":0,"type":"phase_change","summary":"Build: Phase 3 -> Phase 4 (Completion)","detail":{"skill":"build","from":"3","to":"4"}}`.
Phase 4: Completion
After all tasks complete:
- Write Phase 4 IN_PROGRESS to the gate ledger (after Phase 3 COMPLETE verification).
- Feature mode: Run acceptance tests from Phase 1 Step 3 — verify they PASS (GREEN). Refactor mode: Run all contract tests from Phase 1 — verify they PASS (GREEN).
  - If any fail: implementation is incomplete. Identify what's missing, dispatch implementer to fix, re-run.
  - If all pass: feature is verifiably done. Proceed.
- Run full test suite (unit + integration)
- RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-code-review" before dispatching code review. If the iterative review fix cycle introduces regressions, this is the rollback target.
- REQUIRED SUB-SKILL: Use crucible:code-review on full implementation (iterative until clean)
- RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-inquisitor" before dispatching inquisitor. If the inquisitor's fix cycle produces regressions, this is the rollback target.
- REQUIRED SUB-SKILL: Use crucible:inquisitor on full implementation (dispatches 5 parallel dimensions against full feature diff)
  - Input: `git diff <base-sha>..HEAD` where base-sha is the commit before Phase 3 execution began
  - Runs after code review (obvious issues already fixed) and before quality gate (gate reviews final state)
  - The inquisitor manages its own fix cycle internally — do not intervene unless it escalates
  - See `crucible:inquisitor` for full process
- Conditional: If the inquisitor's fix cycle produced any code changes, re-run crucible:code-review scoped to the inquisitor fix commits only (`git diff <pre-inquisitor-sha>..HEAD`)
  - This is NOT a full implementation re-review — scope it to only the fixer's changes
  - Iterative until clean, same as step 3
  - Skip if the inquisitor reported all PASS (no fixes were needed)

5.5. CONDITIONAL: Security review via crucible:siege
a. Contract check: If a contract YAML exists for this ticket with `security_review.status: "required"`, siege is mandatory — skip to step (d).
b. Code scan: If no contract directive (or contract has `security_review.status: "recommended"` or the field absent), scan for siege activation signals:
- Scan targets: design doc content + `git diff <base-sha>..HEAD` (changed file contents)
- Method: Case-insensitive keyword matching using the 7-category keyword lists from `shared/security-signals.md`
- Count distinct categories matched (one hit per category is sufficient)
c. Threshold evaluation:
- 0 signals: Skip siege silently. No narration needed.
- 1 signal: Log in narration: "1 security signal detected ([category]) — skipping siege. Invoke `/siege --force` manually if needed." Record in manifest and decision journal: `security-review | choice=skip | reason=1 signal ([category])`.
- 2+ signals: Proceed to step (d).
d. Dispatch siege:
- RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-siege" before dispatching siege. If siege's fix cycle produces regressions, this is the rollback target.
- Dispatch `crucible:siege` with:
  - Target: design doc + full implementation diff (artifact type: `mixed`)
  - `deployment_context`: from contract `security_review.deployment_context` if present, else unset (siege defaults to `public`)
- Narration: "Security signals detected: [list categories]. Dispatching siege."
- Decision journal: `security-review | choice=dispatch | reason=[N] signals ([categories]) [or contract-required]`
- Session index event: Emit to outbox: `{"ts":"<now>","seq":0,"type":"security_review","summary":"Siege dispatched: [N] signals detected","detail":{"skill":"build","signals":[categories]}}`
e. Blocking behavior: Siege iterates internally until zero Critical + zero High.
- If siege completes clean: continue to step 6 (quality-gate)
- If siege escalates (stagnation, user input needed): escalate to user with siege context
- If siege's fix cycle produced code changes: re-run crucible:code-review scoped to siege fix commits only (`git diff <pre-siege-sha>..HEAD`). Same pattern as the post-inquisitor conditional review at step 5.
f. Escape hatches: User can override automatic siege behavior:
- `--force-siege` — Dispatch siege regardless of signal count. Maps to siege's `--force` flag. Decision journal: `security-review | choice=force-dispatch | reason=user --force-siege flag`
- `--skip-siege` — Suppress siege even when signals/contract require it. Maps to siege's `--skip` flag. Decision journal: `security-review | choice=force-skip | reason=user --skip-siege flag`
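The threshold logic in steps (a)-(c) and the escape hatches in (f) can be sketched as one decision function. Flag precedence over contract over signal count follows the text above; the function name and return shape are assumptions:

```python
def siege_decision(matched_categories, contract_status=None, force=False, skip=False):
    """Evaluate siege activation.

    matched_categories: distinct signal categories hit during the code scan.
    Returns ("dispatch" | "skip", reason).
    """
    if skip:
        return ("skip", "user --skip-siege flag")       # suppresses even contract-required
    if force:
        return ("dispatch", "user --force-siege flag")
    if contract_status == "required":
        return ("dispatch", "contract-required")
    categories = set(matched_categories)
    if len(categories) == 0:
        return ("skip", "0 signals")                    # silent skip, no narration
    if len(categories) == 1:
        return ("skip", "1 signal (%s)" % next(iter(categories)))
    return ("dispatch", "%d signals" % len(categories))
```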
- RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-impl-gate" before dispatching the implementation quality gate. If gate fix rounds degrade the code, this is the rollback target.
- REQUIRED SUB-SKILL: Use crucible:quality-gate on full implementation (artifact type: "code"). Include in the dispatch context: `Phase: code` and `PipelineID: <current PipelineID>`. Iterates until clean or stagnation. (Non-negotiable — see Quality Gate Requirement.)

6b. Verify verdict marker and write Phase 4 PASS to the gate ledger (see Verdict Marker Verification). Delete the verdict marker after writing the ledger entry.
- RECOMMENDED SUB-SKILL: Use crucible:forge (retrospective mode) — capture what happened vs what was planned

7.5. Chronicle signal fallback: If the forge retrospective was skipped (user declined, session ending), append a minimal chronicle signal directly:
  - Read the metrics log at `/tmp/crucible-metrics-<session-id>.log` for duration and subagent counts
  - Construct signal: `v=1`, `ts=now`, `skill="build"`, `outcome` from acceptance test results, `duration_m` from metrics log, `branch` from git, `files_touched` from `git diff <base-sha>..HEAD --name-only`, `metrics={mode, tasks count, tasks_passed count from task list, stagnation=false}`
  - Append as a single JSON line to `~/.claude/projects/<hash>/memory/chronicle/signals.jsonl`
  - If the forge retrospective DID run, skip this step (forge Step 8.5 already emitted the signal)
- RECOMMENDED SUB-SKILL: Use crucible:cartographer (record mode) — persist any new codebase knowledge discovered during build
- Compile summary: what was built, acceptance tests passing, review findings addressed, inquisitor findings, concerns
- Report to user

10.5. Session index event: Emit a `skill_end` event to the outbox: `{"ts":"<now>","seq":0,"type":"skill_end","summary":"/build complete: <outcome summary>","detail":{"skill":"build","outcome":"success|failure|escalated"}}`.

- REQUIRED SUB-SKILL: Use crucible:finish — skip finish's Step 2.5 (test-coverage) since test-coverage ran per-task in Phase 3, and skip finish's Step 3 (red-team) since quality-gate already ran at step 6. Tell finish to skip both.
- Delete pipeline-active marker: Remove `<scratch>/.pipeline-active`. This signals that the pipeline completed successfully. If deletion fails (permissions, missing file), log a warning but do not fail the pipeline.
Session Metrics
Throughout the pipeline, the orchestrator appends timestamped entries to `/tmp/crucible-metrics-<session-id>.log` on each subagent dispatch and completion.
Dispatch measurement protocol: On every subagent dispatch, the orchestrator follows the enriched manifest protocol from `shared/dispatch-convention.md`:
- Before dispatching: Measure the dispatch file size in characters. Record `input_chars` and `model_tier` in the manifest entry.
- After dispatch returns: Measure the subagent response length in characters. Record `output_chars` and `tool_calls` (if available) in the manifest completion entry.
At completion (before reporting to user, i.e. step 9), read the metrics log and manifest, then compute:
```
-- Pipeline Complete ----------------------------------------
Subagents dispatched: 23 (14 Opus, 7 Sonnet, 2 Haiku)
Active work time: 2h 47m
Wall clock time: 11h 13m
Quality gate rounds: 4 (design: 2, plan: 1, impl: 1)
Siege: dispatched (3 agents, 2 rounds, 0 Critical, 0 High)
       | skipped (0 signals) | skipped (1 signal: auth)
Task tiers: 3 Tier 1, 3 Tier 2, 2 Tier 3
Subagent savings: ~21 dispatches skipped vs all-Tier-3
Est. input tokens: ~32,100 (128,400 chars)
Est. output tokens: ~20,500 (82,000 chars)
Token estimate note: Based on dispatch file sizes (chars/4).
Actual consumption may vary +/-30%.
-------------------------------------------------------------
```
Metrics tracked:
- Total subagents dispatched (by type and model tier: Opus/Sonnet/Haiku)
- Active work time (merge overlapping parallel intervals — NOT naive sum)
- Wall clock time (first dispatch to final completion)
- Quality gate rounds (per gate: design, plan, implementation)
- Siege status (dispatched with agent count, rounds, and final severity counts — or skipped with signal count and reason)
- Estimated input tokens (sum of `input_chars` from manifest / 4)
- Estimated output tokens (sum of `output_chars` from manifest / 4)
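The active-work-time metric is the only one with a subtle computation: overlapping parallel dispatch intervals must be merged before summing. A standard interval merge, sketched with a minutes-since-start input shape (an assumption):

```python
def active_work_minutes(intervals):
    """Merge overlapping [start, end] dispatch intervals (minutes since pipeline
    start) and sum the merged lengths, NOT the naive per-agent sum."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # overlaps previous: extend it
        else:
            merged.append([start, end])              # gap: open a new interval
    return sum(end - start for start, end in merged)
```

Two agents running in parallel over the same 10 minutes contribute 10 minutes of active work, not 20.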
Efficiency summary computation: Read `manifest.jsonl` from the dispatch directory. Sum `input_chars` and `output_chars` across all completed entries (skip nulls). Divide each by 4 for token estimates. Count dispatches grouped by `model_tier`. Include these in the pipeline completion report alongside existing metrics.
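A sketch of that computation over `manifest.jsonl` lines (the field names follow the text above; the return shape is illustrative):

```python
import json

def efficiency_summary(manifest_lines):
    """Aggregate manifest.jsonl into the completion-report numbers: summed
    input/output chars (nulls skipped), chars/4 token estimates, and dispatch
    counts grouped by model_tier."""
    input_chars = output_chars = 0
    by_tier = {}
    for line in manifest_lines:
        entry = json.loads(line)
        input_chars += entry.get("input_chars") or 0    # "or 0" skips JSON nulls
        output_chars += entry.get("output_chars") or 0
        tier = entry.get("model_tier", "unknown")
        by_tier[tier] = by_tier.get(tier, 0) + 1
    return {
        "est_input_tokens": input_chars // 4,
        "est_output_tokens": output_chars // 4,
        "dispatches_by_tier": by_tier,
    }
```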
Gate tracking verification: Before compiling the pipeline summary (Phase 4 Step 9), verify that all three gate categories (design, plan, implementation) show round count >= 1 with clean final rounds (0 Fatal, 0 Significant). If any gate was skipped with explicit user approval, record it as `USER_SKIP` in the metrics. A zero without user approval indicates a gate was dropped — report this in the summary.
Pipeline Decision Journal
Alongside the metrics log, maintain a decision journal at `/tmp/crucible-decisions-<session-id>.log`. Append a structured entry for every non-trivial routing decision:
[timestamp] DECISION: <type> | choice=<what> | reason=<why> | alternatives=<rejected>
Decision types to capture:
- `reviewer-model` — why Opus vs Sonnet for this reviewer
- `review-tier` — tier assignment read from plan, runtime escalation reason if applicable
- `gate-round` — issue count, severity shifts, progress/stagnation per round
- `escalation` — why the orchestrator escalated to user (and user's decision)
- `task-grouping` — parallelism decisions for wave execution
- `cleanup-removal` — what de-sloppify removed and accept/reject decision
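A sketch of emitting one journal line in the format above (the ISO-style timestamp format is an assumption; the skill only fixes the field layout):

```python
from datetime import datetime, timezone

def journal_entry(decision_type, choice, reason, alternatives="none"):
    """Format one decision-journal line:
    [timestamp] DECISION: <type> | choice=<what> | reason=<why> | alternatives=<rejected>
    """
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return "[%s] DECISION: %s | choice=%s | reason=%s | alternatives=%s" % (
        ts, decision_type, choice, reason, alternatives)
```

In practice each line would be appended to `/tmp/crucible-decisions-<session-id>.log`.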
Escalation Triggers (Any Phase)
STOP and ask the user when:
- Architectural concerns in plan or code review
- Review loop stagnation (same or more issues after fixes — any phase)
- Test suite failures not obviously fixable
- Multiple teammates fail on different tasks
- Teammate reports context pressure at 50%+ with significant work remaining
- When escalating for regression or stagnation AND a checkpoint exists for the current phase boundary: include "A checkpoint from [reason] is available. Restore to pre-regression state?" in the escalation message.
Minor issues: Log, work around, include in final report.
What the Lead Should NOT Do
- Implement code (dispatch implementers)
- Read large files (spawn Haiku researcher)
- Debug failing tests (dispatch implementer)
- Make architectural decisions (escalate to user)
Context Management
- One task per agent — always spawn a fresh implementer for each task. Never send a second task to a running agent via SendMessage. Reusing agents accumulates context and causes exhaustion.
- "2-3 per subagent, ~10 files max" refers to plan design — group small steps into one task at planning time, not sequential dispatch to a running agent
- Lead stays thin — coordination only
- All important state on disk (plan files, task list)
- Teammates report at 50%+ context usage
- Lead compaction acceptable — task list is source of truth
- Agent teams unavailable: If agent teams are not enabled, the lead dispatches tasks sequentially via Agent tool. Task tracking still uses TaskCreate/TaskUpdate. The pipeline is slower but functionally identical.
Prompt Templates
- `./acceptance-test-writer-prompt.md` — Phase 1 acceptance test generation
- `./prd-writer-prompt.md` — Phase 1 PRD generation from design doc
- `./plan-writer-prompt.md` — Phase 2 plan writer dispatch
- `./plan-reviewer-prompt.md` — Phase 2 plan reviewer dispatch
- `./build-implementer-prompt.md` — Phase 3 implementer dispatch
- `./build-reviewer-prompt.md` — Phase 3 reviewer dispatch
- `./cleanup-prompt.md` — Phase 3 de-sloppify cleanup dispatch
- `./test-gap-writer-prompt.md` — Phase 3 test gap writer dispatch
- `./architecture-reviewer-prompt.md` — Mid-plan checkpoint
- `./contract-test-writer-prompt.md` — Phase 1 refactor-mode contract test generation
- `./refactor-implementer-addendum.md` — Phase 3 refactor-mode implementer addendum (appended to build-implementer-prompt)
Red-team, innovate, adversarial tester, and inquisitor prompts live in their respective skills:
- crucible:red-team — `skills/red-team/red-team-prompt.md`
- crucible:innovate — `skills/innovate/innovate-prompt.md`
- crucible:adversarial-tester — `skills/adversarial-tester/break-it-prompt.md`
- crucible:inquisitor — `skills/inquisitor/inquisitor-prompt.md`
Quality Gate Orchestration
Build is the outermost orchestrator and controls all quality gates via `crucible:quality-gate`. Quality gate wraps `crucible:red-team` internally — do NOT invoke red-team separately at these points.
Gate points in the pipeline:
| Pipeline Stage | Artifact Type | Replaces |
|---|---|---|
| Phase 1, Step 2 (after design) | design | Existing on design |
| Phase 2, Step 3 (after plan review) | plan | Existing on plan |
| Phase 4, Step 6 (after inquisitor + conditional re-review) | code | Existing on implementation |
Code review (`crucible:code-review`) and inquisitor (`crucible:inquisitor`) remain separate from the quality gate — code-review does structured quality checks, inquisitor writes cross-component adversarial tests, and the quality gate does adversarial artifact review. All three serve distinct purposes.
Contract-Aware Quality Gate
When a contract YAML exists for the current ticket, the quality gate adds contract verification to its checks. This applies at all gate points (design, plan, and code), though most contract checks are only meaningful at the code gate (Phase 4, Step 6).
- Version check: Before processing a contract, verify the `version` field is `"1.0"`. If the version is missing or unrecognized, reject the contract with a clear error: "Contract version [X] is not supported. Expected version 1.0." Do not proceed with contract-aware checks — fall back to standard quality gate behavior without contract awareness.
Checkable invariant verification: For each
invariant in the contract, verify satisfaction using the declaredcheckable
:check_method
— pattern match (or absence) in production code. Run the grep and confirm the result matches the invariant'sgrep
description.verification
— read and reason about the relevant code to confirm the invariant holds (e.g., idempotency, no side effects).code-inspection
— check that file existence, location, or organization matches the constraint.file-structure
-
Testable invariant verification: For each
invariant in the contract:testable- Verify that a test tagged with the declared
(pattern:test_tag
) exists in the test suite.contract:<category>:<id> - Verify that the tagged test passes when run.
- A missing or failing tagged test is a contract violation.
- Verify that a test tagged with the declared
- Contract violations are blocking issues. Contract violations are NOT warnings — they have the same severity as architectural concerns and must be resolved before the gate passes. The quality gate's iterative fix loop applies: dispatch fixes, re-check, track progress/stagnation as normal.
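Two of the verification methods are mechanical enough to sketch. The function names and the `must_match` flag below are assumptions standing in for the invariant's `verification` description, not part of the contract schema:

```python
import re

def check_grep_invariant(source_text, pattern, must_match=True):
    """Verify a grep-method checkable invariant: the pattern must appear
    (must_match=True) or be absent (must_match=False) in production code."""
    found = re.search(pattern, source_text) is not None
    return found == must_match

def check_test_tag(test_names, test_tag):
    """Verify a testable invariant's tagged test exists. A missing tag is a
    contract violation (the pass/fail of the test is checked separately)."""
    return any(test_tag in name for name in test_names)
```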
Red Flags
- Skipping Compression State Block emission at checkpoint boundaries
- Emitting a Compression State Block at a phase boundary (1→2, 2→3, 3→4) instead of writing a handoff manifest
- Skipping the shed statement after a manifest write
- Emitting a Compression State Block with stale or missing Key Decisions (decisions must be cumulative across all prior blocks)
- Allowing the Goal field to drift across successive Compression State Blocks (must match original user request)
- Exceeding 10 entries in the Key Decisions list without overflow-compressing the oldest
- Skipping a REQUIRED quality gate because the task seems "small", "simple", or "trivial"
- Self-assessing that a quality gate is unnecessary based on perceived task complexity
- Rationalizing that quality-gate findings would be "minor" as justification to skip
- Declaring a quality gate "done" after fixing findings without a clean verification round (fixing is not passing)
- Short-circuiting the quality-gate iteration loop by assuming fixes are self-evidently correct
- Interpreting general user feedback as approval to skip a quality gate that has not yet run — once a gate has run and presented findings to the user, the user's decision to proceed is authoritative. Pre-gate skip approval must be an unambiguous instruction specifically referencing the gate.
- Treating session index summary as authoritative over CSB state (session index is supplementary narrative, CSB is authoritative state)
Integration
Required sub-skills:
- crucible:design — Phase 1
- crucible:finish — Phase 4
- crucible:quality-gate — Iterative red-teaming at each quality gate point
- crucible:red-team — Adversarial review engine (invoked by quality-gate)
- crucible:innovate — Creative enhancement before quality gates
- crucible:inquisitor — Full-feature cross-component adversarial testing (Phase 4, after code-review, before quality-gate)
Recommended sub-skills:
- crucible:forge — Feed-forward at Phase 1 start, retrospective at Phase 4 completion
- crucible:cartographer — Consult at Phase 1 start, load at Phase 3 dispatches, record at Phase 4
- crucible:checkpoint — Shadow git checkpoints at pipeline boundaries (pre-design-gate, pre-plan-gate, pre-wave-N, pre-cleanup-task-N, pre-code-review, pre-inquisitor, pre-impl-gate)
Recon/assay context: Inherits recon/assay context through /design (Phase 1). No direct dispatch. When design integrates recon, build benefits automatically. See #147 for rationale.
Phase 3 sub-skills (dispatched per-task):
- crucible:test-coverage — Test alignment audit after each task's test quality review (staleness, dead tests, coincidence tests)
Implementer sub-skills:
- crucible:test-driven-development — TDD within each task
- crucible:source-driven-development — Detect → Fetch → Implement → Cite loop for non-trivial external API usage (≥ 5 LOC touching a detected framework); invoked by the implementer prompt's Source Consultation block. Recommended — skipped for pure internal refactors or trivial edits.
Contract consumption:
- crucible:spec — Consumes contract YAML files produced by `/spec` (schema version 1.0). Contracts are read from `docs/plans/*-contract.yaml` and feed into pre-existing doc detection (Phase 1 Step 0), implementer dispatch (Phase 3), reviewer checks (Phase 3), and quality gate verification (all gate points). See `crucible:spec/contract-schema.md` for field definitions.