Crucible build

Use when starting any feature development, building new functionality, implementing a design, or going from idea to working code. Triggers on "build", "implement", "add feature", or any task requiring design-through-execution.

install

source · Clone the upstream repo

git clone https://github.com/raddue/crucible

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/raddue/crucible "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/build" ~/.claude/skills/raddue-crucible-build && rm -rf "$T"

manifest: skills/build/SKILL.md

source content

Build

Overview

All subagent dispatches use disk-mediated dispatch. See

shared/dispatch-convention.md

for the full protocol.

End-to-end development pipeline: interactive design, autonomous planning with adversarial review, team-based execution with per-task code and test review. One command, idea to completion.

Announce at start: "I'm using the build skill to run the full development pipeline."

Session index event: At startup, if session indexing is active (session index path discoverable via glob), emit a

skill_start

event to the outbox:

{"ts":"<now>","seq":0,"type":"skill_start","summary":"Starting /build for <user goal>","detail":{"skill":"build","goal":"<user goal>"}}

. See

skills/shared/session-index-convention.md

for the outbox pattern.

Guiding principle: Quality over velocity. This pipeline produces correct, well-integrated, maintainable output — even if slower. Parallel execution is available for independent work, but sequential with quality gates is the default.

Communication Requirement (Non-Negotiable)

Between every agent dispatch and every agent completion, output a status update to the user. This is NOT optional — the user cannot see agent activity without your narration.

Every status update must include:

Current phase — Which pipeline phase you're in
What just completed — What the last agent reported
What's being dispatched next — What you're about to do and why
Task checklist — Current status of all tasks (pending/in-progress/complete)

After compaction: If you just experienced context compaction, re-read the task list from disk and output current status before continuing. Do NOT proceed silently.

Examples of GOOD narration:

"Phase 3, Task 4 complete. Reviewer found 2 Important issues — dispatching implementer to fix. Tasks: [1] ✓ [2] ✓ [3] ✓ [4] fixing [5-8] pending"

"Phase 2 complete. Plan passed review with 0 issues on round 2. Dispatching innovate on the plan."

This requirement exists because: Long-running autonomous pipelines can run for hours. Without narration, the user sees nothing but a spinner. They can't assess progress, can't decide whether to intervene, and can't learn from the pipeline's decisions.

Pipeline Discipline (Non-Negotiable)

NEVER skip quality gate steps. Every artifact must pass its quality gate before proceeding to the next phase. No exceptions, no shortcuts.

BLOCK semantics: Phase transitions are gated. You CANNOT proceed from Phase 1→2, 2→3, or 3→4 without the gate for that phase passing. If a gate fails, fix the issues and re-run the gate. Do not silently skip a gate because "it looks fine" or "we already reviewed it."

If you find yourself about to skip a gate: STOP. Re-read this section. The gate exists because skipping it has caused real production incidents and hours of wasted time. Run the gate.

Anti-Rationalization Table — build

Rationalization	Rebuttal	Rule
"This task is small/simple/trivial, the quality gate would just find nits."	Small changes have the same bug density per line as large ones. QG has never run on a Crucible artifact without finding at least one real issue.	Run the quality gate on every phase artifact, regardless of size.
"Phase N looks fine, I can skip the gate and move on."	Self-assessment of artifact quality is exactly the bias the gate exists to counter. "Looks fine" is the failure mode, not a pass criterion.	Phase transitions are BLOCKED without a verified PASS verdict marker for the prior phase.
"The fix agent addressed the findings, so the gate is done."	Fixing is not passing. Fix rounds routinely introduce new issues or incompletely resolve old ones. A clean verification round is required.	The gate is only complete after a fresh red-team round returns 0 Fatal, 0 Significant.
"The user said 'looks good' / 'move on' — that's approval to skip the gate."	General feedback is not skip approval. Only an unambiguous instruction that explicitly references the gate counts.	Require literal `SKIP GATE` (or equivalent explicit phrase) before recording `Status: SKIPPED` .
"I can fix this one finding myself instead of dispatching a fix agent."	Orchestrator-applied fixes conflate coordination with remediation and bypass the fix journal. Every fix — even trivial — goes through a fix agent.	Orchestrator never edits the artifact directly; always dispatch the fix agent.
"Innovate/red-team seem redundant on top of the quality gate, I'll skip them."	They are not redundant. Innovate is divergent; red-team is adversarial; QG is iterative remediation. Skipping any one of them is a documented regression ( `feedback_never_skip_gates` ).	Run innovate and red-team on every artifact, every time.
"I'll just finish the task list and narrate at the end."	Long-running autonomous pipelines are invisible without narration. Silent runs prevent the user from intervening or learning.	Narrate before every dispatch and after every completion — non-negotiable.

Gate Ledger Protocol

Tamper-evident audit trail for phase transitions and gate verdicts. This is defense-in-depth — it raises the cost of gate-skipping from zero to nonzero by requiring structured state to be maintained and verified. An external enforcement hook (

gate-ledger-guard.sh

) provides mechanical enforcement by blocking unauthorized PASS writes.

File location:

~/.claude/projects/<project-hash>/memory/build-gate-ledger.md

Relationship to pipeline-status.md: pipeline-status.md is ambient user awareness (overwritten at every narration point). build-gate-ledger.md is the gate verdict audit trail (updated per phase as gates pass). Both are needed; neither replaces the other.

PipelineID Generation

At pipeline start, generate a PipelineID via

date -u +build-%Y%m%d-%H%M%S

. This ID:

Is persisted in the ledger header
Is passed to quality-gate invocations as
```
pipeline_id
```
Is used by the enforcement hook to cross-check verdict markers
Is unique per build run (timestamp-based)

Ledger Format

# Build Gate Ledger
Run: <ISO-8601 timestamp>
PipelineID: <build-YYYYMMDD-HHMMSS>
Goal: <user request>
Mode: <feature | refactor>

## Phase 1: Design
Status: NOT_STARTED

## Phase 2: Plan
Status: NOT_STARTED

## Phase 3: Execute
Status: NOT_STARTED

## Phase 4: Completion
Status: NOT_STARTED

Format constraints:

One key-value pair per line:
```
Key: value
```

Fixed key names:

Status

Gate

Artifact

Tasks

Reason

Acknowledged

PipelineID

Status values:

NOT_STARTED

IN_PROGRESS

PASS

COMPLETE

FAIL

SKIPPED

INFERRED

Phase headers are
```
## Phase N: Name
```
— always 4 phases, always in order
No prose, no paragraphs, no nested structure

Ledger Initialization

Runs during build startup, after mode detection but before Phase 1 begins:

Check for existing ledger at canonical path
If found: run Run Isolation checks (see below)
If not found (or user chose "start fresh"): write new ledger including
```
Run
```
,
```
PipelineID
```
,
```
Goal
```
, and
```
Mode
```
header fields, then all four phases with
```
Status: NOT_STARTED
```
The ledger MUST exist before Phase 1 transitions to
```
IN_PROGRESS
```

After writing any ledger (fresh or reconstructed), immediately re-read the ledger header to extract the PipelineID into the active in-memory state. This is a defensive consistency practice — ensures the in-memory value always matches the persisted value.

Run Isolation

Stale detection prevents cross-run contamination:

Compaction recovery (same run): If pipeline-status.md
```
Started
```
timestamp matches the ledger's
```
Run
```
timestamp, this is the same build run recovering from compaction. Auto-resume without prompting.
New session with existing ledger: If the ledger exists but pipeline-status.md is missing or its
```
Started
```
timestamp doesn't match the ledger's
```
Run
```
, prompt: "Found existing ledger for '[goal]' (started [timestamp], Phase N [status]). Resume this run? [y/n]". On "no", archive the old ledger via Bash
```
mv
```
to
```
build-gate-ledger-<old-timestamp>.md
```
. If the target filename already exists, append a counter suffix (
```
-2
```
,
```
-3
```
, etc.).
No existing ledger: Create fresh.

Orphan Cleanup

Requires: Active PipelineID established (from Ledger Initialization + Run Isolation). This step runs AFTER the resume/fresh decision is resolved.

Scan

~/.claude/projects/<project-hash>/memory/quality-gate/gate-verdict-*.md

for verdict markers. Delete any whose

PipelineID

does not match the active PipelineID. If resuming: use the resumed build's PipelineID (from the existing ledger). If starting fresh: use the newly generated PipelineID.

Note: If the session is recovering via INFERRED reconstruction (new PipelineID generated), markers from the old run will be cleaned up. This is intentional — the design requires a fresh QG run for INFERRED→PASS upgrade, not reuse of old markers.

Timestamps and File Operations

Timestamps: Obtained via Bash
```
date -u +%Y-%m-%dT%H:%M:%S
```
(Bash is allowed for
```
date
```
commands that don't reference
```
.claude/
```
paths)
Ledger archival (rename): Uses Bash
```
mv
```
since Write/Read/Edit/Glob have no rename capability
All other ledger operations (create, read, update): MUST use Write and Read tools, NOT Bash. This is a hard constraint due to
```
.claude/
```
path restrictions.

Enforcement Rules

Before each phase transition, read

build-gate-ledger.md

and check the previous phase's status:

Gate check: If the previous phase's Status is NOT in {

PASS

COMPLETE

(Phase 3 only),

SKIPPED

with

Acknowledged: true

}, output:

PHASE GATE BLOCKED: Cannot start Phase N — Phase N-1 gate has not passed.
Current state: [status]
Run the quality gate on Phase N-1's artifact before proceeding.

This means

INFERRED

IN_PROGRESS

FAIL

, and

NOT_STARTED

all trigger BLOCKED.

Phase 1 exception: Phase 1 (Design) has no predecessor gate — it always starts.
Phase 3 exception: Phase 3 transitions to
```
COMPLETE
```
(not
```
PASS
```
) when all tasks are done and per-task code reviews pass.
```
COMPLETE
```
satisfies the gate requirement for Phase 4. No verdict marker is required for Phase 3.

Verdict Marker Verification

After quality-gate returns with a verdict, verify the verdict marker before writing to the ledger:

Glob for verdict markers:

~/.claude/projects/<project-hash>/memory/quality-gate/gate-verdict-*.md

Filter by
```
PipelineID
```
match — only markers with the current build's PipelineID
Sort by the
```
Timestamp
```
field value inside the marker file (parsed as ISO-8601), take the most recent
Verify: marker exists,
```
Verdict
```
is
```
PASS
```
,
```
PipelineID
```
matches current build's PipelineID
If verification passes: write
```
PASS
```
to the ledger with
```
Gate
```
timestamp and
```
Artifact
```
path
If verification fails:
- Normal flow (marker missing/mismatched after a just-run gate): do NOT write PASS. Output warning and re-invoke quality-gate on the same artifact.
- INFERRED recovery (PipelineID mismatch or missing marker on an INFERRED phase): prompt the user for the artifact path, then offer to run the gate or type SKIP GATE.
After writing the ledger entry, delete the verdict marker (it has served its purpose). This applies to all verdict outcomes — PASS, FAIL, STAGNATION, and ESCALATED markers are all deleted after the corresponding ledger entry is written. [PLAN ADDITION — extends the design doc's PASS-only deletion to all verdict outcomes for cleanliness.]

Skip Escape Hatch

If the user explicitly wants to bypass a gate:

Example of a SKIPPED phase in the ledger:

## Phase 2: Plan
Status: SKIPPED
Gate: 2026-04-13T15:00:00
Reason: User requested skip
Acknowledged: true

Confirmation protocol: [Default: option (a) — separate-turn required, matching the design doc's two-step flow. User may override to option (b) before implementation.]

The orchestrator outputs: "Gate skip requested. Type
```
SKIP GATE
```
to confirm. This will be logged."
The orchestrator halts execution and waits. The user's NEXT message must contain exactly
```
SKIP GATE
```
. A
```
SKIP GATE
```
token in the same message as the skip request does NOT satisfy the confirmation requirement.
The orchestrator writes
```
Status: SKIPPED
```
with
```
Reason
```
field to the ledger.

Per-phase acknowledgment: SKIPPED requires one acknowledgment per phase, not per boundary. Before starting Phase N, the orchestrator checks all prior phases. Any prior phase with

Status: SKIPPED

that has not yet been

Acknowledged: true

triggers the BLOCKED message. The user types

SKIP GATE

once per skipped phase, and the ledger records

Acknowledged: true

. Subsequent boundaries do not re-prompt for already-acknowledged skips.

Missing artifact handling: If a phase was SKIPPED because no artifact was produced, retroactive gating requires the user to supply the artifact path: "To run the gate on Phase N, provide the artifact path." If no artifact exists, retroactive gating is not possible — the phase remains SKIPPED.

Recovery from SKIPPED: If the user later wants to properly gate a skipped phase, they can ask to "run the gate on Phase N." The orchestrator transitions

SKIPPED → IN_PROGRESS

, runs the quality gate on the phase's artifact, and writes the result normally.

Phase 4 completion warning: If ANY prior phase has

Status: SKIPPED

, Phase 4 outputs a prominent warning listing all skipped gates before presenting finish options.

State Machine

Phase 1: Design
  NOT_STARTED → IN_PROGRESS (design skill starts)
  IN_PROGRESS → PASS (quality gate verdict marker verified)
  IN_PROGRESS → FAIL (quality gate escalates — stagnation/regression)
  FAIL → IN_PROGRESS (user directs re-work)
  * → SKIPPED (user types SKIP GATE — does NOT unlock next phase without acknowledgment)
  SKIPPED → IN_PROGRESS (user asks to run the gate retroactively)
  INFERRED → IN_PROGRESS (user runs gate after compaction recovery)
  INFERRED → SKIPPED (user types SKIP GATE after compaction recovery)

Phase 2: Plan
  NOT_STARTED → IN_PROGRESS (requires Phase 1 Status = PASS or SKIPPED+Acknowledged)
  [same transitions as Phase 1]

Phase 3: Execute (no quality gate — uses COMPLETE instead of PASS)
  NOT_STARTED → IN_PROGRESS (requires Phase 2 Status = PASS or SKIPPED+Acknowledged)
  IN_PROGRESS → COMPLETE (all tasks done, per-task reviews passed, verification gates green)
  IN_PROGRESS → FAIL (task failures, user escalation)
  FAIL → IN_PROGRESS (user directs re-work)
  * → SKIPPED (user types SKIP GATE)
  SKIPPED → IN_PROGRESS (user asks to run retroactively)
  Note: Phase 3 has no QG invocation. COMPLETE satisfies Phase 4's gate requirement.

Phase 4: Completion
  NOT_STARTED → IN_PROGRESS (requires Phase 3 Status = COMPLETE or SKIPPED+Acknowledged. PASS is unreachable for Phase 3.)
  IN_PROGRESS → PASS (quality gate verdict marker verified)
  IN_PROGRESS → FAIL (quality gate escalates)
  FAIL → IN_PROGRESS (user directs re-work)
  * → SKIPPED (user types SKIP GATE)
  SKIPPED → IN_PROGRESS (user asks to run retroactively)
  IN_PROGRESS includes: emit skip warnings if any prior phase SKIPPED

Compaction Recovery (Ledger)

build-gate-ledger.md is on disk and survives compaction. Recovery precedence when state is partial:

Ledger exists, handoff manifest missing: Use ledger to determine which phase to resume from. Prompt: "Gate ledger shows Phase N passed, but the phase handoff context was lost. Confirm resume from Phase N+1?" If PASS but no handoff, also prompt for Phase N inputs (design doc path, plan path, etc.) before proceeding.
Handoff manifest exists, ledger missing: Reconstruct ledger from manifests. Mark the current phase as
```
INFERRED
```
(not
```
PASS
```
). Mark predecessor phases as
```
PASS
```
(handoff existence proves the boundary was crossed). Generate a new PipelineID and write it to the reconstructed ledger header. After writing, re-read the ledger header to extract the PipelineID into active state. INFERRED phases trigger the gate-blocked check — the orchestrator must run a fresh quality gate (with matching PipelineID) or the user must type SKIP GATE.
Both missing: Fresh start. Prompt user.

Quality Gate Requirement (Non-Negotiable)

Every quality gate in this pipeline MUST run to completion. This is NOT optional — you may NOT self-assess whether a quality gate is "needed" based on task size, complexity, or scope.

Quality gates are unconditional at all three gate points:

Phase 1, Step 2 — Design doc gate
Phase 2, Step 3 — Plan gate
Phase 4, Step 6 — Implementation gate

Common rationalizations that are NEVER valid reasons to skip:

"This is a small change"
"This is trivial / simple / straightforward"
"This is just a config change / documentation update / one-liner"
"The quality gate won't find anything on something this simple"
"I fixed the findings, so the gate is done" — fixing findings is NOT the same as passing the gate. The iteration loop must complete with a clean verification round (0 Fatal, 0 Significant on a fresh review). Fix agents introduce new issues or incompletely resolve old ones — that is why fresh-eyes re-review exists.

This requirement exists because: Quality gates consistently find issues the pipeline misses regardless of task size. There is no category of task that is immune. In observed runs, tasks self-assessed as "trivial" had the same defect rate as complex tasks. The only way to skip a quality gate is with explicit user approval — an unambiguous instruction specifically referencing the gate, not general feedback like "looks good" or "move on."

Pipeline Status

Write a status file to

~/.claude/projects/<hash>/memory/pipeline-status.md

at every narration point. This file is overwritten (not appended) and provides ambient awareness for the user in a second terminal.

Write Triggers

Write the status file at every point where the Communication Requirement mandates narration: before dispatch, after completion, phase transitions, health changes, escalations, and after compaction recovery.

Status File Format

The status file uses this structure (overwritten in full each time):

# Pipeline Status
**Updated:** <current timestamp>
**Started:** <timestamp from first write — persisted across compaction>
**Skill:** build
**Phase:** <current phase, e.g. "3 — Execute (Autonomous)">
**Health:** <GREEN|YELLOW|RED>
**Suggested Action:** <omit when GREEN; concrete one-sentence action when YELLOW/RED>
**Elapsed:** <computed from Started>

## Recent Events
- [HH:MM] <most recent event>
- [HH:MM] <previous event>
(last 5 events, newest first)

Skill-Specific Body

Append after the shared header:

## Task Progress
| # | Task | Tier | Status | Duration |
|---|------|------|--------|----------|
| 1 | Auth middleware | T3 | DONE | 12m |
| 2 | Route handlers | T2 | IN REVIEW (code, pass 1) | 18m+ |
| 3 | Database layer | T1 | PENDING | — |

## Quality Gates
- Design: PASSED (2 rounds)
- Plan: PASSED (1 round)
- Task tiers: 1x T1, 1x T2, 1x T3
- Code: not yet reached

## Checkpoints
- Last checkpoint: pre-wave-3 (12:45:30)
- Total checkpoints: 7
- Shadow repo: healthy

## Compression State
Goal: [original user request]
Key Decisions:
- [accumulated decisions, max 10]
Active Constraints:
- [constraints affecting remaining work]
Next Steps:
1. [immediate next action]
2. [subsequent actions]

The Compression State section is a semantic subset of the full Compression State Block emitted into the conversation. It omits Files Modified (recoverable from git) and Scratch State (fixed per skill). It is the first section read during compaction recovery.

Health State Machine

Health transitions are one-directional within a phase: GREEN -> YELLOW -> RED. Phase boundaries reset to GREEN.

Phase boundaries (reset to GREEN): Phase 1->2, 2->3, 3->4
YELLOW: review loop round 3+, quality gate round 5+, retry in progress
RED: escalation pending, stagnation detected, test suite failure unresolved

When health is YELLOW or RED, include

**Suggested Action:**

with a concrete, context-specific sentence (e.g., "Code review looping on Task 4. Check recent events for recurring patterns.").

Inline CLI Format

Output concise inline status alongside the status file write:

Minor transitions (dispatch, completion): one-liner, e.g.

Phase 3 [4/8] Task 4 IN REVIEW (pass 1) | GREEN | 1h 12m

Phase changes and escalations: expanded block with
```
---
```
separators
Health transitions: always expanded with old -> new health

Compaction Recovery

After compaction, before re-writing the status file: 0. Read the

## Compression State

section from

pipeline-status.md

— recover Goal, Key Decisions, Active Constraints, and Next Steps. If the section is absent (pre-update pipeline), skip to step 1.

0.5. Check for handoff manifests (

handoff-*-to-*.md

) in the scratch directory. If the most recent manifest exists, use its Inputs, Decisions, and Constraints to reconstruct state for the current phase — this supersedes the Compression State section for phase-boundary recovery. If no manifest exists, continue with CSB-based recovery.

Read the rest of
```
pipeline-status.md
```
to recover
```
Started
```
timestamp and
```
Recent Events
```
buffer
Reconstruct phase, health, and skill-specific body from internal state files
If crucible:checkpoint was used: verify checkpoint availability by checking for the shadow repo at the computed path. Log available checkpoint count. Do not restore — just confirm checkpoints are recoverable.
Emit a Compression State Block into the conversation to seed the new context window with recovered state 4.5. Read session index summary (supplementary): If the CSB Scratch State contains a
```
Session Index:
```
path, or if globbing
```
~/.claude/projects/<hash>/memory/session-index/*/summary.md
```
finds a recent file, read
```
summary.md
```
. Include the Activity Timeline, Files Modified, and Key Decisions sections in the post-compaction narration. If no session index exists, skip silently — this step is purely additive. If
```
summary.md
```
lacks detail for a specific event type (e.g., errors, decisions, file changes), use
```
/recall
```
to query
```
events.jsonl
```
with filters for targeted recovery.
Write the updated status file
Output inline status to CLI

Compression State Block

At checkpoint boundaries (see Checkpoint Timing below), emit the following structured block into the conversation. This block signals to the auto-compactor which state is critical to preserve. Also persist the semantic subset (Goal, Key Decisions, Active Constraints, Next Steps) to the

## Compression State

section of pipeline-status.md.

===COMPRESSION_STATE===
Goal: [original user request, one sentence]
Skill: [skill name]
Phase: [current phase identifier]
Health: [GREEN|YELLOW|RED]
Mode: [skill-specific mode if applicable, omit otherwise]

Progress:
- [completed milestone 1]
- [completed milestone 2]
- [current work in progress]

Key Decisions (this session):
- [DEC-1] [decision]: [reasoning, one line]
- [DEC-2] [decision]: [reasoning, one line]

Active Constraints:
- [constraint that affects remaining work]
- [constraint from prior phase that still applies]

Files Modified:
- [file path]: [what changed, one line]

Scratch State:
- Location: [scratch directory path]
- Session Index: [~/.claude/projects/<hash>/memory/session-index/<session-id>/ if active, omit if not]
- Recovery: [which files to read first, in order]

Next Steps:
1. [immediate next action]
2. [action after that]
3. [remaining work summary]
===END_COMPRESSION_STATE===

Rules:

Key Decisions list is capped at 10. When adding an 11th, compress the oldest low-impact decision into a single-line Progress entry annotated "[compressed from decisions]".
Each Compression State Block includes the FULL accumulated decision list, not just new decisions since the last block. Decisions accumulate across compressions.
Progress entries are cumulative — include all completed milestones, not just since the last block.
Files Modified lists only files changed since the last block emission. On first block of a session, list all files changed so far.
Goal must be the original user request verbatim or a faithful one-sentence paraphrase. Do not let it drift across compressions.

Checkpoint Timing

Emit a Compression State Block into the conversation AND update the

## Compression State

section in pipeline-status.md at these points:

Phase transitions: 1→2, 2→3, 3→4 — emit a Phase Handoff Manifest (see below) instead of a Compression State Block at these points
Phase 3 progress: After every 3 task completions
Quality gate entry/exit: Before first quality gate round dispatch and after gate completes (pass or escalation)
Escalations: Before any escalation to user
Health transitions: On any GREEN->YELLOW or YELLOW->RED transition

These triggers are a superset of the existing pipeline-status.md write triggers. The Compression State Block is emitted alongside (not instead of) the normal narration and status file write.

Phase Handoff Manifest

At phase boundaries (1→2, 2→3, 3→4), write a handoff manifest to the scratch directory instead of emitting a Compression State Block. The manifest defines exactly what the next phase needs — an allowlist, not a denylist. Everything not on the manifest is shed.

Format:

# Phase Handoff: N → M
**Timestamp:** ISO-8601
**Goal:** [original user request, verbatim]
**Mode:** feature | refactor

## Inputs for Phase M
- **[Input name]:** [disk path or inline value]

## Decisions Carried Forward
- [DEC-N] [decision]: [reasoning, one line]

## Active Constraints
- [constraint affecting remaining work]

## Shed Receipt
- [what was shed] → [where it lives on disk]

Rules:

After writing the manifest, emit an explicit shed statement: list what context is no longer needed, where it lives on disk, and that the orchestrator operates from manifest inputs only going forward.
After writing the manifest, update the
```
## Compression State
```
section in pipeline-status.md with the manifest contents (Goal, Decisions, Constraints, and the Inputs as Next Steps). This ensures compaction recovery can reconstruct state even if the manifest is lost.
CSBs continue at all non-boundary checkpoint triggers (intra-phase progress, quality gate entry/exit, escalations, health transitions).
Backward compatibility: If a handoff manifest does not exist at a recovery point, fall back to CSB-based recovery (existing behavior).

Mode Detection

Before dispatching the design skill, determine whether this build is:

Feature mode (default) — adding new capability. Success = new acceptance tests pass.
Refactor mode — restructuring existing code. Success = existing behavior preserved + structural goals met.

Detection: If the user's intent is ambiguous, ask directly before proceeding:

"Is this adding new behavior, or restructuring existing code without changing what it does?"

The user's answer sets the mode for the entire pipeline. No special syntax needed.

Mode Propagation

Propagate refactor mode to subagents through:

New refactor-specific prompt templates —
```
contract-test-writer-prompt.md
```
and
```
refactor-implementer-addendum.md
```
are standalone files used only in refactor mode. Select these instead of (or in addition to) the feature-mode equivalents.
Appended context blocks — For existing prompts that serve both modes (
```
plan-writer-prompt.md
```
,
```
build-implementer-prompt.md
```
), append a "Refactor Mode Context" section when composing the dispatch file. The templates remain flat markdown — the orchestrator decides what to include.
Scratch file for compaction recovery — Persist the current mode in
```
/tmp/crucible-build-mode.md
```
containing
```
mode: refactor
```
or
```
mode: feature
```
plus the baseline commit SHA. Only one build runs per session, so a well-known filename is sufficient.

Compaction Recovery

Build's existing compaction step must read the Compression State FIRST (step 0 from Pipeline Status Compaction Recovery), then the mode file, before re-reading the task list or any other state. On resumption after compaction:

Read
## Compression State
from pipeline-status.md — recover goal, decisions, constraints, next steps. 0.5. Check for handoff manifests (
```
handoff-*-to-*.md
```
) in the scratch directory. If the most recent manifest exists, use its Inputs and Mode to bootstrap recovery — this supersedes the mode file for phase-boundary state.
Read
/tmp/crucible-build-mode.md
— recover mode and baseline commit SHA.
If file is missing: Default to feature mode and warn.
If mode is
refactor
: Verify baseline commit SHA exists.
Read
build-gate-ledger.md
— if it exists, apply Gate Ledger Compaction Recovery (see Compaction Recovery subsection under Gate Ledger Protocol). Use the ledger's phase statuses to determine the resume point. If the ledger is missing but handoff manifests exist, reconstruct with INFERRED status.
After mode and ledger are recovered: Proceed with general state reconstruction (task list, phase, health).

Phase 1: Design (Interactive)

Step -1: Resume Detection and Pipeline-Active Marker

Before any design or dispatch work, check for a crashed prior pipeline:

Check
<scratch>/.pipeline-active
(where

<scratch>

~/.claude/projects/<hash>/memory/

)

Not found: Write the pipeline-active marker (JSON with
```
pipeline_id
```
set to current session ID,
```
skill
```
set to
```
"build"
```
,
```
phase
```
set to
```
"1"
```
,
```
start_time
```
set to current ISO-8601 timestamp,
```
scratch_dir
```
set to the scratch directory path,
```
dispatch_dir
```
set to the dispatch directory path,
```
branch
```
from
```
git branch --show-current
```
,
```
baseline_sha
```
from
```
git rev-parse HEAD
```
). Proceed to Step 0.
Found, same
pipeline_id
as current session: This is a compaction recovery scenario. Follow existing compaction recovery procedures. Do not re-write the marker.
Found, different
pipeline_id
: a. Branch guard: Compare marker's
```
branch
```
field against current
```
git branch --show-current
```
. If they differ, warn: "Previous build on branch [marker.branch] crashed at Phase [phase]. You are currently on [current-branch]. Switch to [marker.branch] before resuming? [switch+resume / start fresh / abort]". Do NOT offer resume on the wrong branch. b. Read
```
manifest.jsonl
```
from the marker's
```
dispatch_dir
```
(or from the scratch directory copy if
```
/tmp
```
was lost) c. Identify the last successful phase boundary by scanning manifest entries grouped by phase. A phase boundary is verified when all dispatches in that phase have
```
status: "completed"
```
. d. Present resume option to the user:
"Previous build on branch [marker.branch] crashed at Phase [N], [context]. Resume from [last good boundary] ([checkpoint reason], [estimated time preserved] of work preserved)? [yes / no / fresh]" e. User accepts: Invoke
```
crucible:replay
```
in resume mode, passing the scratch directory path. The replay skill handles checkpoint restore, state reconstruction, and re-dispatch. The build pipeline does not continue -- replay takes over. f. User declines (fresh): Delete the stale
```
.pipeline-active
```
marker. Write a fresh marker with the current session. Proceed to Step 0 as a new pipeline run.

Marker updates during pipeline: Update the

phase

field in

.pipeline-active

at each phase boundary (1->2, 2->3, 3->4) to track progress for crash detection.

Marker cleanup: Delete

.pipeline-active

at Phase 4 step 12 (after finish skill completes).

Gate Ledger Initialization: After the pipeline-active marker is written (or recovered) and mode detection is complete, run the Gate Ledger Protocol's Ledger Initialization and Orphan Cleanup steps. The ledger must exist before Phase 1 transitions to IN_PROGRESS.

Step 0: Pre-Existing Doc Detection

Before running interactive design, check whether

/spec

(or a prior

/build

run) already produced design artifacts for this ticket.

Scan for pre-existing spec docs: Search
```
docs/plans/
```
for design docs (
```
*-design.md
```
) with a matching
```
ticket
```
field in YAML frontmatter. Also check for corresponding
```
*-implementation-plan.md
```
and
```
*-contract.yaml
```
files with the same ticket field.
Conflict detection: If multiple design docs match the same
```
ticket
```
field, escalate to user: "Found multiple design docs for ticket #NNN: [list files]. Which should I use?" Do not proceed until the user resolves the conflict.
Full match (design doc + implementation plan + contract all present):
- Skip interactive design (the Phase 1 design sub-skill below) — design doc already exists
- Security review check: If the contract contains
```
security_review
```
  field, note it in the Phase 1→2 handoff manifest under Active Constraints: "Contract requires security review (
```
security_review.status: [required|recommended]
```
  ) — siege will be evaluated in Phase 4 Step 5.5." This ensures the directive survives phase handoffs and compaction recovery.
- Quality-gate the existing design doc with staleness context: "This design doc is pre-existing from /spec and may be stale — verify against current codebase state before proceeding"
- Staleness rejection: If the quality gate finds that the design doc references files, interfaces, or modules that no longer exist in the codebase, reject the doc as fundamentally stale. Fall back to running Phase 1 interactively. Inform user: "Pre-existing design doc for #NNN is fundamentally stale (references [specific items] that no longer exist). Running interactive design instead."
- If quality gate passes: Run Phase 2 on the pre-existing implementation plan — skip Plan Writer (plan already exists), but run Plan Reviewer + innovate + quality-gate on the existing plan. This ensures the plan gets the same review rigor as a freshly written plan.
- If quality gate fails (non-staleness issues): fix or escalate
- Proceed to Phase 3 when the plan passes review
Partial match (design doc present but implementation plan or contract missing):
- Use the existing design doc (quality-gate it as above, including staleness rejection)
- Run the missing phases normally: if no implementation plan, run Plan Writer in Phase 2; if no contract, proceed without contract awareness for this ticket
- Inform user which artifacts were found and which are being generated fresh: "Found pre-existing design doc for #NNN. Implementation plan is missing — will generate in Phase 2." (or similar)
Not found: Proceed with normal Phase 1 (interactive design below).

Model: Opus (creative/architectural work needs the best model)
Mode: Interactive with the user
RECOMMENDED SUB-SKILL: Use crucible:forge (feed-forward mode) — consult past lessons before starting
RECOMMENDED SUB-SKILL: Use crucible:cartographer (consult mode) — review codebase map for structural awareness
REQUIRED SUB-SKILL: Use crucible:design
Follow design skill for design refinement, section-by-section validation, and saving the design doc
OVERRIDE: When design completes and the design doc is saved, do NOT follow design's "Implementation" section (do not chain into planning or worktree from there). Return control to this build skill — Phase 2 handles planning with its own subagent-based approach.
Phase ends when user approves the design (says "go", "looks good", "proceed", etc.)
Everything after this point is autonomous — tell the user: "Design approved. Starting autonomous pipeline — I'll only interrupt for escalations."

Step 2: Innovate and Red-Team the Design

After the user approves the design and before starting Phase 2:

RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-design-gate" before dispatching innovate and quality-gate on the design doc.

Innovate: Dispatch
```
crucible:innovate
```
on the design doc. Plan Writer incorporates the proposal.
Write Phase 1 IN_PROGRESS to the gate ledger (after ledger initialization).
REQUIRED SUB-SKILL: Use crucible:quality-gate on the (potentially updated) design doc with artifact type "design". Include in the dispatch context:
```
Phase: design
```
and
```
PipelineID: <current PipelineID>
```
. Iterates until clean or stagnation. (Non-negotiable — see Quality Gate Requirement.)
If the quality gate requires changes, the Plan Writer updates the design doc and re-commits.
Verify verdict marker and write Phase 1 PASS to the gate ledger (see Verdict Marker Verification). Delete the verdict marker after writing the ledger entry.
Design doc is now finalized — proceed to acceptance tests.

Step 2.5: Generate PRD

After the design doc is finalized (Step 2 complete), generate a stakeholder-facing PRD:

Dispatch a PRD Writer subagent (Sonnet) using
```
./prd-writer-prompt.md
```
- Input: finalized design doc
- Output: PRD in standard format (problem statement, user stories, requirements, scope, out-of-scope, success metrics, technical notes, dependencies)
Save to
```
docs/prds/YYYY-MM-DD-<topic>-prd.md
```
Commit:
```
docs: add PRD for [feature]
```

This step runs by default. The PRD is a reformatting of the design doc for non-technical stakeholders — it does not introduce new decisions or requirements. Skip only in refactor mode (refactoring has no stakeholder-facing PRD).

Step 3: Generate Acceptance Tests (RED)

Before planning, define "done" with executable tests:

Dispatch an Acceptance Test Writer subagent (Opus) using
```
./acceptance-test-writer-prompt.md
```
- Input: finalized design doc (especially acceptance criteria)
- Output: integration-level test file(s) that verify feature behavior end-to-end
Run the acceptance tests — verify they FAIL (the feature doesn't exist yet)
- If tests pass: something is wrong — investigate before proceeding
- If tests error (won't compile): this is expected in typed languages — note which tests exist and what they verify. They become the first implementation task.

Commit:

test: add acceptance tests for [feature] (RED)

These tests define the feature-level RED-GREEN cycle that wraps the entire pipeline. The pipeline is done when these tests pass.

Refactor Mode: Phase 1 Changes

When in refactor mode, Phase 1 shifts from "what should we build?" to "what are we changing and what could break?"

Blast Radius Analysis

After the user describes the refactoring intent, the design phase:

Identify the target — What code is being restructured? (module, interface, data representation, file organization, etc.)
Trace the blast radius using cartographer (if available) or fallback exploration:
- Direct consumers — code that imports/calls/references the target
- Indirect dependents — code that depends on consumers (transitive)
- Test coverage — which tests exercise the target behavior
- Configuration/wiring — DI registrations, config files, build scripts that reference the target
- Fallback when cartographer is unavailable: Use language-aware symbol search via agent exploration. Grep for symbol references (imports, type annotations, function calls) using language-specific patterns. The impact manifest's confidence field reflects reduced precision.
Present an impact manifest to the user:

### Impact Manifest

**Target:** [what's being restructured]
**Structural goal:** [what the code should look like after]

**Direct consumers:** N files
- path/to/consumer1.py (calls TargetClass.method)
- path/to/consumer2.py (imports TargetClass)

**Indirect dependents:** N files
- path/to/dependent.py (depends on consumer1)

**Test coverage:**
- N tests directly exercise target behavior
- N tests exercise consumers
- Gap: no tests cover [specific seam]

**Risk assessment:** [Low/Medium/High] based on consumer count and coverage gaps
**Confidence:** [High/Medium/Low] — High if cartographer used, Medium/Low if fallback

When confidence is Low, require explicit user confirmation before proceeding. The user must review the impact manifest and confirm the blast radius is complete.

Design the structural goal — what should the code look like after the refactoring? User validates the target state.

Acceptance Tests (Refactor Mode)

Instead of writing NEW acceptance tests (Step 3 above), the pipeline:

Dispatch the contract test writer using
```
./contract-test-writer-prompt.md
```
— a single agent handles gap identification AND gap filling. Input: impact manifest + blast radius file list. The agent maps existing tests to behavioral seams, identifies untested seams, and writes contract tests for each gap.
Run all contract tests GREEN — contract tests must pass before any refactoring begins.
If a contract test FAILS: The contract test writer investigates:
- Test defect (wrong assertion, bad setup) — fix the test and re-run
- Latent codebase bug — report to user with options: (a) fix the bug first, (b) exclude this seam and accept the risk, (c) abort the refactoring. Never silently drop a failing contract test.

Commit:

test: add contract tests for [target] refactoring (GREEN — locking existing behavior)

Proportionality Escape Valve

Contract test writing must remain proportional to the refactoring scope. Trigger a scope check when any of these thresholds are hit:

Count threshold: More than 15 contract tests needed
Effort threshold: Contract test writer reports context pressure, or estimated total contract test LOC exceeds ~2x the estimated refactoring scope LOC

When triggered:

Present the full gap list to the user with estimated effort per gap
User selects which gaps to fill and which to accept as uncovered risk
Proceed with only user-selected contract tests

The impact manifest records which gaps the user chose to leave uncovered.

Phase Handoff: 1 → 2

Before dispatching the Plan Writer, verify the gate ledger and write a handoff manifest:

Gate ledger check: Read
```
build-gate-ledger.md
```
and verify Phase 1 Status is
```
PASS
```
. If not, follow Enforcement Rules.
Write
```
handoff-1-to-2.md
```
with:
- Goal: original user request, verbatim
- Mode: feature or refactor
- Inputs for Phase 2: design doc path, acceptance test paths (or contract tests in refactor mode), PRD path (if generated), conventions path (from cartographer, if loaded)
- Decisions Carried Forward: accumulated decisions from Phase 1
- Active Constraints: constraints affecting planning
- Shed Receipt: design iteration history, innovate proposals, quality gate round details → design doc on disk captures the outcome
Emit shed statement: "Phase 1 context shed. Design doc, acceptance tests, and PRD are on disk. Design iteration history, innovate proposals, and gate round details are not carried forward."
Update
```
## Compression State
```
in pipeline-status.md with manifest contents.
Do NOT emit a Compression State Block (manifest replaces it at this boundary).

Session index event: Emit a

phase_change

event to the outbox:

{"ts":"<now>","seq":0,"type":"phase_change","summary":"Build: Phase 1 -> Phase 2 (Plan)","detail":{"skill":"build","from":"1","to":"2"}}

Phase 2: Plan (Autonomous)

Step 1: Write the Plan

Dispatch a Plan Writer subagent (Opus):

Read the design doc produced in Phase 1 and the acceptance tests from Step 3
Write an implementation plan following the
```
crucible:planning
```
format
If acceptance tests couldn't compile (typed language), Task 1 should create the interfaces/stubs needed for them to compile and fail correctly
Include per-task metadata: Files (with count), Complexity (Low/Medium/High), Dependencies

Save to

docs/plans/YYYY-MM-DD-<topic>-implementation-plan.md

Plan tasks should be scoped to 2-3 per subagent, ~10 files max (context budget awareness)

Use

./plan-writer-prompt.md

template for the dispatch prompt.

Step 2: Review the Plan

Dispatch a Plan Reviewer subagent:

Reviewer model selection:

Plan touches 4+ systems or has 10+ tasks → Opus
Plan touches 1-3 systems with <10 tasks → Sonnet
When in doubt → Opus

Review protocol (iterative):

Dispatch Plan Reviewer to check plan against design doc
If issues found: record issue count, dispatch Plan Writer to revise
Dispatch NEW fresh Plan Reviewer on revised plan (no anchoring)
Compare issue count to prior round:
- Strictly fewer issues → progress, loop again
- Same or more issues → stagnation, escalate to user with findings from both rounds
Loop until plan passes with no issues
Architectural concerns bypass the loop — immediate escalation regardless of round

Use

./plan-reviewer-prompt.md

template for the dispatch prompt.

Step 3: Innovate and Red-Team the Plan

After the plan passes review:

RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-plan-gate" before dispatching innovate and quality-gate on the plan.

Write Phase 2 IN_PROGRESS to the gate ledger.
Innovate: Dispatch
```
crucible:innovate
```
on the approved plan. Plan Writer incorporates the proposal into the plan.
REQUIRED SUB-SKILL: Use crucible:quality-gate on the (potentially updated) plan with artifact type "plan". Include in the dispatch context:
```
Phase: plan
```
and
```
PipelineID: <current PipelineID>
```
. Provides the plan and design doc as context. (Non-negotiable — see Quality Gate Requirement.)
Verify verdict marker and write Phase 2 PASS to the gate ledger (see Verdict Marker Verification). Delete the verdict marker after writing the ledger entry.

The quality gate handles the iterative red-team loop — fresh review each round, weighted stagnation detection, 15-round safety limit, escalation. See

crucible:quality-gate

for details.

Phase Handoff: 2 → 3

Before creating the team and task list, write a handoff manifest. Step 3.4 above already verified the verdict marker, wrote PASS to the ledger, and deleted the marker. The handoff manifest is written AFTER the ledger PASS — this sequencing ensures compaction recovery finds a consistent state (ledger shows PASS, handoff exists).

Write a handoff manifest:

Write
```
handoff-2-to-3.md
```
with:
- Goal: original user request, verbatim
- Mode: feature or refactor
- Inputs for Phase 3: plan path, design doc path, acceptance test paths (or contract tests), contract YAML path (if exists), baseline SHA (current HEAD), cartographer context paths (module files, conventions.md, landmines.md)
- Decisions Carried Forward: accumulated decisions from Phases 1-2
- Active Constraints: constraints affecting execution
- Shed Receipt: plan review iterations, innovate proposals, quality gate round history → plan on disk captures the outcome
Emit shed statement: "Phase 2 context shed. Plan, design doc, and acceptance tests are on disk. Plan review rounds, innovate proposals, and gate details are not carried forward."
Update
```
## Compression State
```
in pipeline-status.md with manifest contents.
Do NOT emit a Compression State Block.

Session index event: Emit a

phase_change

event to the outbox:

{"ts":"<now>","seq":0,"type":"phase_change","summary":"Build: Phase 2 -> Phase 3 (Execute)","detail":{"skill":"build","from":"2","to":"3"}}

Phase 3: Execute (Autonomous, Team-Based)

Step 0: Load Module Context for Subagents

RECOMMENDED SUB-SKILL: Use crucible:cartographer (load mode) — when dispatching implementers and reviewers, include relevant module files, conventions.md, and landmines.md in their dispatch files
Defect signature loading (for implementers only):
1. Glob
```
defect-signatures/*.md
```
  (excluding
```
*.non-matches.md
```
  ) from the cartographer storage directory
2. For each signature, read its
```
Modules
```
  field and match against the task's target modules:
  - Read each cartographer module file's
```
Path:
```
    field
  - A task's file is in a module if the file path starts with the module's
```
Path:
```
    value
  - When a task spans multiple modules, load signatures for all matched modules
  - Directory prefix fallback: When no cartographer modules exist, match if any target file path starts with any of the signature's
```
Modules
```
    directory prefixes
3. For matching signatures, validate all file paths still exist on disk — drop stale entries silently
4. Inject into the
```
[DEFECT_SIGNATURES]
```
  section of
```
build-implementer-prompt.md
```
  :
  - Generalized pattern (always)
  - Confirmed siblings list (always)
  - Unresolved siblings list (always — these are known live defects; produces a stronger warning)
  - Non-match companion files are NOT loaded for implementers
5. Last loaded
  update: Loading is pure-read. After all implementer dispatches for the current phase complete, batch-update the
```
Last loaded
```
  field to today on all signatures that were loaded. Do NOT update during dispatch — defer to after all subagents are dispatched.

Step 0.5: Gate Ledger — Phase 3 Start

Write Phase 3 IN_PROGRESS to the gate ledger (after Phase 2 PASS verification).

Step 1: Create Team and Task List

Create a team using

TeamCreate

team_name: "build-<feature-name>"
description: "Building <feature description>"

Read the approved plan. Create tasks via

TaskCreate

for each plan task, including:

Subject from plan task title
Description with full plan task text (subagents should never read the plan file)
Dependencies via
```
TaskUpdate
```
with
```
addBlockedBy
```

Agent Teams Fallback

TeamCreate

fails (agent teams not available), output a clear one-time warning:

⚠️ Agent teams are not available. Recommended: set
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
Falling back to sequential subagent dispatch via Agent tool.

Then fall back to sequential subagent dispatch via the regular Task tool (without

team_name

). Everything still works — independent tasks run sequentially instead of in parallel via teammates.

What changes in fallback mode:

Tasks are dispatched via
```
Agent
```
tool instead of as teammates
Independent tasks that would run in parallel now run sequentially
Task tracking still uses
```
TaskCreate
```
/
```
TaskUpdate
```
for state management
All other pipeline behavior (TDD, review, de-sloppify, quality gates) is unchanged

Step 2: Analyze Dependencies and Execution Order

Before dispatching:

Map the dependency graph from plan task metadata
Identify independent tasks (no shared files, no sequential dependencies)
Group into execution waves — independent tasks parallel, dependent tasks sequential
Assess complexity per task for reviewer model selection

Step 3: Execute Tasks

For each task (or wave of parallel tasks):

RECOMMENDED SUB-SKILL: Before dispatching each execution wave, use crucible:checkpoint — create checkpoint with reason "pre-wave-N" (where N is the wave number). This captures the working directory state after the prior wave's verification gate passed.

Mark task
```
in_progress
```
via
```
TaskUpdate
```
Spawn Implementer teammate (Opus) via Task tool with
```
team_name
```
and
```
subagent_type="general-purpose"
```
- Use
```
./build-implementer-prompt.md
```
  template
- Pass full task text, file paths, project conventions
- Contract-aware dispatch (when a contract exists for this ticket): Include the contract YAML alongside the design doc and task description. See "Contract-Aware Implementer Guidance" below.
- Implementer follows TDD, writes tests, runs tests, commits, self-reviews
When Implementer reports completion, run De-Sloppify Cleanup (see below)
After cleanup completes, spawn Reviewer teammate
- Use
```
./build-reviewer-prompt.md
```
  template
Tier-aware review routing: Read the task's
```
Review-Tier
```
from plan metadata.
- Tier 1: Dispatch single-pass code reviewer (Sonnet). If Clean or Minor-only: task complete. If Critical/Important: dispatch implementer fix, then task complete. If Architectural Concern: escalate.
- Tier 2: Dispatch iterative code review (per existing loop). Then dispatch single-pass test reviewer. If test review surfaces Critical findings, escalate to Tier 3. Then dispatch adversarial tester (per existing logic). Task complete.
- Tier 3: Follow current full pipeline (no changes to existing flow).

Contract-Aware Implementer Guidance

When a contract YAML exists for the current ticket (detected during Step 0 or produced by

/spec

), the implementer receives the contract alongside the design doc and task description. The contract uses the schema defined in

crucible:spec/contract-schema.md

(version

1.0

). Implementers must treat contract elements as follows:

```
api_surface
```
declarations are binding. The implementer must match the declared function signatures, class interfaces, endpoint shapes, parameter names, types, and return types exactly. Deviations from the contract's API surface are implementation errors.
```
checkable
```
invariants are binding. The implementer must satisfy all declared constraints (e.g., "must not import X", "must be idempotent"). The
```
check_method
```
field (
```
grep
```
,
```
code-inspection
```
,
```
file-structure
```
) indicates how the quality gate will verify compliance — the implementer should self-check against these before committing.
```
testable
```
invariants require tagged tests. For each
```
testable
```
invariant, the implementer must write a test tagged with the declared
```
test_tag
```
(pattern:
```
contract:<category>:<id>
```
) that validates the invariant. These tests are checked by the quality gate and reviewers — they must exist and pass.
```
integration_points
```
are informational. These indicate which other components and contracts this ticket interacts with. The implementer should be aware of referenced components and ensure compatibility, but integration points are not binding constraints — they provide context for making good implementation decisions.

De-Sloppify Cleanup

After the implementer reports completion and before dispatching the reviewer:

RECOMMENDED: Use crucible:checkpoint — create checkpoint with reason "pre-cleanup-task-N" before dispatching the cleanup agent. If cleanup removes something needed, restore to this checkpoint.

Record the pre-cleanup commit SHA
Dispatch a fresh Cleanup Agent (Opus) using
```
./cleanup-prompt.md
```
- Input:
```
git diff <pre-task-sha>..HEAD
```
  (the implementer's committed changes)
- The orchestrator provides the pre-task commit SHA to the cleanup agent
Cleanup agent reviews changes, removes unnecessary code (see allowlist), runs tests
If cleanup made changes, commits separately:
```
refactor: cleanup task N implementation
```
If cleanup found nothing to remove, reports "No cleanup needed" and proceeds

Reviewer Model Selection (Lead Decides Per-Task)

Task Complexity	Reviewer Model
Low (1-3 files, straightforward)	Sonnet
Medium (3-6 files, some cross-system)	Lead decides (default Opus)
High (6+ files, refactoring, deep chains)	Opus
When in doubt	Opus

Two-Pass Review Cycle

Each task gets TWO review passes before completion:

digraph review {
  "Implementer builds + tests" -> "De-sloppify cleanup";
  "De-sloppify cleanup" -> "Pass 1: Code Review";
  "Pass 1: Code Review" -> "Implementer fixes code findings";
  "Implementer fixes code findings" -> "Pass 2: Test Quality Review";
  "Pass 2: Test Quality Review" -> "Implementer fixes test findings";
  "Implementer fixes test findings" -> "Test Alignment Audit (crucible:test-coverage)";
  "Test Alignment Audit (crucible:test-coverage)" -> "Test Gap Writer";
  "Test Gap Writer" -> "Adversarial Tester";
  "Adversarial Tester" -> "Task complete";
}

Pass 1 — Code Review: Architecture, patterns, correctness, wiring (actually connected, not just existing?)

Pass 2 — Test Quality Review: Test independence? Determinism? Edge cases? Integration tests where mocks are masking real behavior? AAA pattern? Correct test level? (Staleness and alignment checks are handled by the test-coverage dispatch below.)

Review Tier Routing

Each task's

Review-Tier

(from the plan) determines which review steps execute. Phase 4 full-implementation gates are NOT affected by per-task tiers.

Step	Tier 1	Tier 2	Tier 3
Implementer	Yes	Yes	Yes
De-sloppify cleanup	Yes	Yes	Yes
Pass 1: Code review	Single pass	Iterative	Iterative
Implementer fixes (code)	If findings	If findings	If findings
Pass 2: Test quality review	SKIP	Single pass (non-iterative)	Iterative
Implementer fixes (test)	SKIP	If critical findings only	If findings
Test alignment audit	SKIP	SKIP	Yes
Test gap writer	SKIP	SKIP	Yes
Adversarial tester	SKIP	Yes	Yes

Tier 1 "single pass" code review: Dispatch one reviewer. If findings are Clean, task is complete. If findings include Critical or Important issues, dispatch implementer to fix, then the task is complete (no re-review). If findings include an Architectural Concern, escalate as normal.

Tier 2 "single pass" test review: Dispatch one test quality reviewer. Report findings but do NOT enter the iterative review loop. If the single pass surfaces Critical findings, escalate the task to Tier 3 for full iterative treatment.

Tier 2 "iterative" code review: Same as current behavior -- fresh reviewer each round, track issue count, loop until clean or stagnation.

Runtime Tier Escalation

The orchestrator may escalate a task's review tier during execution. Escalation is one-directional (up only).

Triggers:

Implementer reports unexpected complexity or cross-system interaction not anticipated in the plan
Single-pass reviewer (Tier 1 code review or Tier 2 test review) reports Critical findings
Implementer touches significantly more files than the plan specified

Process:

Log escalation to decision journal:

[timestamp] DECISION: review-tier | choice=escalate T1->T2 | reason=<trigger> | alternatives=none

Execute the additional review steps for the new tier (from the point where the current tier's pipeline diverges)
Update the task status display to show the escalated tier

Contract-Aware Reviewer Guidance

When a contract YAML exists for the current ticket, reviewers receive the contract alongside the implementation and must add the following checks to both review passes:

API surface compliance: Do the implemented public interfaces match the
```
api_surface
```
declarations in the contract? Check function signatures, class interfaces, endpoint shapes, parameter names/types, and return types. Any deviation from the contract's declared API surface is a blocking finding.
Checkable invariant satisfaction: Are all
```
checkable
```
invariants satisfied per their declared
```
check_method
```
?
- ```
grep
```
  : verify the pattern match (or absence) in production code
- ```
code-inspection
```
  : read and reason about code to confirm the invariant holds
- ```
file-structure
```
  : check file existence/organization matches the constraint Any unsatisfied checkable invariant is a blocking finding.
Testable invariant test existence: Does a test exist for each
```
testable
```
invariant, tagged with the correct
```
test_tag
```
(pattern:
```
contract:<category>:<id>
```
)? A missing tagged test is a blocking finding.
Test correctness: Do the tagged tests actually validate the invariant they claim to cover? A test that exists but does not meaningfully exercise the invariant (e.g., a trivially passing assertion, a test that tests something unrelated despite having the right tag) is a blocking finding.

Severity: All contract-related review findings are classified as blocking — the same severity as contract violations in the quality gate. Contract findings must be resolved before the task is marked complete.

Test Alignment Audit

After the implementer addresses Pass 2 findings, invoke

crucible:test-coverage

against the task's changes:

Code diff:
```
git diff <pre-task-sha>..HEAD
```
Affected test files: test files touched or related to the task
Context: "Build task N: [task description]"

The test-coverage skill audits existing tests for staleness (wrong assertions, misleading descriptions, dead tests, coincidence tests) and handles its own fix dispatch and revert-on-failure logic. It returns a structured report. Note: the diff includes review fix commits — the audit agent should focus on behavioral changes to source files, not changes that only touch test files.

Skip this step if the task made no behavioral source changes (only

.md

.json

, config files).

Test Gap Writer

After test-coverage completes (or is skipped), dispatch a Test Gap Writer (Opus) using

./test-gap-writer-prompt.md

Input: Pass 2 test reviewer's missing coverage findings + implementer's changes + test-coverage audit report (if available)
The test gap writer writes tests ONLY for gaps the reviewer identified — no scope creep. Before writing a new test for a flagged gap, verify no existing test already covers this path (it may have been updated by the test-coverage audit).
Tests should pass immediately (the behavior already exists from implementation)
The test gap writer reports per-test PASS/FAIL results (see prompt template for report format)
Commits new tests:
```
test: fill coverage gaps for task N
```

If all tests PASS: Continue to adversarial tester.

If some tests FAIL (gaps reveal genuinely missing implementation):

Dispatch a fresh implementer (Opus) with the failing test(s), their failure messages, and the gap descriptions from the reviewer
Implementer fixes the missing behavior, then re-runs ALL test gap writer tests (not just the failures — catches regressions from the fix)
If all tests pass after fix: commit (
```
fix: address test gap failures for task N
```
), continue to adversarial tester
If tests still fail after one fix attempt: escalate to user with:
- Which coverage gaps the reviewer identified
- Which tests the gap writer wrote (per-test PASS/FAIL)
- What the implementer attempted to fix
- Which tests still fail and their current failure messages

Skip this step if the Pass 2 test reviewer reported zero missing coverage gaps.

Adversarial Tester

After the test gap writer completes (or is skipped), dispatch an Adversarial Tester (Opus) using

skills/adversarial-tester/break-it-prompt.md

Input: Full diff of the task's changes (
```
git diff <pre-task-sha>..HEAD
```
), project test conventions, cartographer module context (if available)
The adversarial tester identifies the top 5 most likely failure modes, writes one test per mode, and runs them
Outcome handling:
- All tests PASS: Implementation is robust. Log results and proceed to task complete.
- Some tests FAIL: Real weaknesses found. Dispatch implementer to fix. Re-run all tests (including adversarial). If pass → task complete. If fail → one more fix attempt, then escalate to user.
- Tests ERROR (won't compile): Adversarial tester mistake. Discard broken tests, log, proceed to task complete.
Quality bypass prevention: If the implementer's fix touches more than 3 files, route through a lightweight code review before completing.
Commit adversarial tests:
```
test: adversarial tests for task N
```

Skip this step when:

The task diff contains no behavioral source files (only
```
.md
```
,
```
.json
```
,
```
.yaml
```
,
```
.uss
```
,
```
.uxml
```
)
No tests were written during implementation (pure scaffolding)

Iterative Review Loop

Each review pass (code and test) uses the iterative loop:

After fixes, dispatch a NEW fresh Reviewer (no anchoring to prior findings)
Track issue count between rounds
Strictly fewer issues → progress, loop again
Same or more issues → stagnation, escalate to user
Loop until clean
Architectural concerns → immediate escalation regardless of round

Verification Gates

After each wave completes:

Run full test suite (not just current wave's tests)
Check compilation
Failures → identify which task caused regression before fixing
Clean → proceed to next wave

Refactor Mode: Phase 3 Changes

When in refactor mode, Phase 3 execution differs from feature mode in several ways.

Pre-Execution Coverage Check

Before the first task executes:

Run all contract tests from Phase 1 — confirm GREEN
Run the full test suite — confirm GREEN (pre-execution baseline)
Record the "baseline commit" SHA in
```
/tmp/crucible-build-mode.md
```
— this is the rollback target

Tiered Test Strategy

Running the full test suite after every atomic step is prohibitively expensive. Instead:

(a) After each atomic task: Run blast-radius tests + direct consumer tests only (tests identified in the impact manifest)
(b) After each execution wave: Run the full test suite (matches existing verification gate between waves)
(c) Full suite checkpoints: Pre-execution baseline and Phase 4 final verification always run the full suite

Coordinated-Atomic Execution

When the executor encounters a task marked

atomic: true

Record pre-task commit SHA
Implementer makes ALL changes (multiple files) — dispatch with
```
./refactor-implementer-addendum.md
```
appended
Run blast-radius tests + direct consumer tests (per tiered strategy)
If GREEN: Commit all files together in a single commit
If FAIL: Revert ALL files to pre-task SHA. Dispatch one retry with a fresh implementer that receives the failure context and test output. If second attempt also fails, revert to pre-task SHA and escalate to user (see Rollback Policy below).

Key difference from feature mode: Feature mode does RED-GREEN-REFACTOR. Refactor mode for atomic steps does GREEN-GREEN — tests are green before, tests must be green after. No RED phase because no new behavior is being added.

After a successful atomic commit (step 4), the rest of the per-task pipeline continues as normal: de-sloppify cleanup, two-pass review cycle, test alignment audit, test gap writer, and adversarial tester (unless skipped per restructuring-only annotation below).

Non-atomic refactoring tasks follow normal execution — structural changes that don't break intermediate states (e.g., extracting a private method, adding a module nothing imports yet). These use standard TDD if they introduce new abstractions, or GREEN-GREEN if they are pure restructuring.

Phase 3 Adaptations for Existing Steps

Adversarial tester: The planner annotates each task with
```
restructuring-only: true/false
```
. If
```
restructuring-only: true
```
, adversarial testing is skipped. Tasks with
```
restructuring-only: false
```
still get adversarial testing. When in doubt, default to
```
false
```
.
- ```
restructuring-only: true
```
  examples: renames where all call sites are mechanically updated, file moves with updated paths, extract-method where the extracted method is private and preserves the original call signature
- ```
restructuring-only: false
```
  examples: extract-class where callers must change call targets, splitting a module where consumers must update imports, any change where the consumer-facing API surface shifts
De-sloppify cleanup: Gains a new removal category: dead compatibility shims. After a refactoring task, look for leftover adapter code, re-export aliases, or compatibility layers introduced during migration but no longer referenced. Detection scope: code added after the baseline commit SHA that re-exports, aliases, or wraps symbols under old names, AND where no code outside the refactoring's changed files references the old names. String-based references: When the target was registered by name in a configuration system, flag the shim as UNCERTAIN and defer to the reviewer rather than removing it.

Refactoring Rollback Policy

Baseline Commit

The orchestrator records the baseline commit SHA before the first refactoring task executes (during pre-execution coverage check). Persisted in

/tmp/crucible-build-mode.md

Per-Task Rollback

When a single task fails after the executor's retry attempt:

Revert that task's changes to the pre-task commit SHA
Escalate to user with failure context and test output
User chooses: skip this task and continue (orchestrator also skips all tasks that depend on the skipped task, and informs the user which tasks were transitively skipped), retry with guidance, or revert all tasks to baseline

Full Rollback to Baseline

When the user chooses full rollback (or cascading failures make forward progress impossible):

Perform
```
git reset --hard <baseline-SHA>
```
to restore pre-refactoring state
Re-run all contract tests to confirm known-good state
Report what was reverted and why

Safe Partial States

The planner annotates tasks with

safe-partial: true/false

. A task is

safe-partial: true

if the codebase is in a valid, shippable state after that task completes (all tests green, no dangling references). When a later task fails, the orchestrator can offer to keep changes through the last safe-partial task.

Architectural Checkpoint

For plans with 10+ tasks, at ~50% completion or after a major subsystem:

Dispatch architecture reviewer using
```
./architecture-reviewer-prompt.md
```
Design drift → escalate to user
Minor concerns → adjust prompts for remaining tasks
All clear → continue

Noticed Reconciliation

After all implementers in Phase 3 report back and before writing the Phase 3 COMPLETE ledger entry, aggregate their

### Noticed But Not Touching

sections into a single

docs/plans/<YYYY-MM-DD>-<ticket-slug>-noticed.md

artifact.

Scope discipline: Notice, do not act. If an implementer sees an out-of-scope issue during implementation, it must be logged under

### Noticed But Not Touching

in their report — NOT fixed in their diff. Acting on noticed items in the same task is a scope-discipline failure. The orchestrator enforces this via reconciliation: noticed entries are surfaced here and converted to follow-up tickets later (see

/finish

7-step reconciliation process:

Collect each implementer's
```
### Noticed But Not Touching
```
section from every Phase 3 implementer report.
Skip any section whose body is
```
*(none)*
```
.
Dedupe entries using the canonical dedupe key:
```
sha256( normalize(file_path) + "|" + line_range + "|" + noticed[:40] )
```
, where
```
normalize(file_path)
```
is the repo-relative POSIX path lowercased.
Sort the deduped entries by file path, then line range.

If any entries remain, write

docs/plans/<YYYY-MM-DD>-<ticket-slug>-noticed.md

matching the canonical filename regex

^docs/plans/\d{4}-\d{2}-\d{2}-[a-z0-9-]+-noticed\.md$

. Use the date embedded in the sibling plan filename (not wall-clock date) so all sibling artifacts share a date; slug matches the ticket being built. Frontmatter and body must follow the Canonical Constants template exactly:

---
pipeline_id: "<build-YYYYMMDD-HHMMSS>"
date: "YYYY-MM-DD"
ticket: "#NNN"
---

# Noticed But Not Touching — <ticket-slug>

- **file:** `path:L<start>-L<end>`
  **noticed:** <desc>
  **why it matters:** <risk/opportunity>
  **suggested follow-up:** <optional>

Idempotent overwrite: If the target
```
-noticed.md
```
already exists (same-ticket re-run on the same date), merge the existing entries with the newly collected entries, run the full dedupe (same key), sort, and overwrite the file in one write. No append-mode; the on-disk file is always the full deduped set for that date+ticket.
Stage the
```
-noticed.md
```
file so it lands in the PR commit.

Skip the write entirely if zero entries remain after dedupe — do not produce an empty

-noticed.md

Gate Ledger — Phase 3 Complete

After the last task wave's verification gate passes and all tasks are marked complete — but BEFORE the Phase 3→4 handoff — write

Status: COMPLETE

and

Tasks: N/N complete

to the Phase 3 ledger entry. If any task is in a retry/re-dispatch loop, COMPLETE is NOT written until retries resolve.

Phase Handoff: 3 → 4

Before running acceptance tests and code review, verify the gate ledger and write a handoff manifest:

Gate ledger check: Read
```
build-gate-ledger.md
```
and verify Phase 3 Status is
```
COMPLETE
```
. If not, follow Enforcement Rules.

Write the handoff manifest:

Write
```
handoff-3-to-4.md
```
with:
- Goal: original user request, verbatim
- Mode: feature or refactor
- Inputs for Phase 4: HEAD SHA (all tasks committed), design doc path, acceptance test paths (or contract tests), baseline SHA (for
```
git diff
```
  scope), task summary (completed count, escalation outcomes)
- Decisions Carried Forward: accumulated decisions from Phases 1-3
- Active Constraints: constraints affecting completion review
- Shed Receipt: per-task review rounds, implementer context, wave verification details → task completion status in task list; per-task review details are shed
Emit shed statement: "Phase 3 context shed. Working code at HEAD, design doc, and acceptance tests on disk. Per-task implementation context, review rounds, and verification details are not carried forward."
Update
```
## Compression State
```
in pipeline-status.md with manifest contents.
Do NOT emit a Compression State Block.

Session index event: Emit a

phase_change

event to the outbox:

{"ts":"<now>","seq":0,"type":"phase_change","summary":"Build: Phase 3 -> Phase 4 (Completion)","detail":{"skill":"build","from":"3","to":"4"}}

Phase 4: Completion

After all tasks complete:

Write Phase 4 IN_PROGRESS to the gate ledger (after Phase 3 COMPLETE verification).
Feature mode: Run acceptance tests from Phase 1 Step 3 — verify they PASS (GREEN). Refactor mode: Run all contract tests from Phase 1 — verify they PASS (GREEN).
- If any fail: implementation is incomplete. Identify what's missing, dispatch implementer to fix, re-run.
- If all pass: feature is verifiably done. Proceed.
Run full test suite (unit + integration)
RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-code-review" before dispatching code review. If the iterative review fix cycle introduces regressions, this is the rollback target.
REQUIRED SUB-SKILL: Use crucible:code-review on full implementation (iterative until clean)
RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-inquisitor" before dispatching inquisitor. If the inquisitor's fix cycle produces regressions, this is the rollback target.
REQUIRED SUB-SKILL: Use crucible:inquisitor on full implementation (dispatches 5 parallel dimensions against full feature diff)
- Input:
```
git diff <base-sha>..HEAD
```
  where base-sha is the commit before Phase 3 execution began
- Runs after code review (obvious issues already fixed) and before quality gate (gate reviews final state)
- The inquisitor manages its own fix cycle internally — do not intervene unless it escalates
- See
```
crucible:inquisitor
```
  for full process
Conditional: If the inquisitor's fix cycle produced any code changes, re-run crucible:code-review scoped to the inquisitor fix commits only (
```
git diff <pre-inquisitor-sha>..HEAD
```
)
- This is NOT a full implementation re-review — scope it to only the fixer's changes
- Iterative until clean, same as step 3
- Skip if the inquisitor reported all PASS (no fixes were needed) 5.5. CONDITIONAL: Security review via crucible:siege

a. Contract check: If a contract YAML exists for this ticket with
```
security_review.status: "required"
```
, siege is mandatory — skip to step (d). b. Code scan: If no contract directive (or contract has
```
security_review.status: "recommended"
```
or field absent), scan for siege activation signals:
- Scan targets: design doc content +
```
git diff <base-sha>..HEAD
```
  (changed file contents)
- Method: Case-insensitive keyword matching using the 7-category keyword lists from
```
shared/security-signals.md
```
- Count distinct categories matched (one hit per category is sufficient) c. Threshold evaluation:
- 0 signals: Skip siege silently. No narration needed.
- 1 signal: Log in narration: "1 security signal detected ([category]) — skipping siege. Invoke
```
/siege --force
```
  manually if needed." Record in manifest and decision journal:
```
security-review | choice=skip | reason=1 signal ([category])
```
  .
- 2+ signals: Proceed to step (d). d. Dispatch siege:
- RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-siege" before dispatching siege. If siege's fix cycle produces regressions, this is the rollback target.
- Dispatch
```
crucible:siege
```
  with:
  - Target: design doc + full implementation diff (artifact type:
```
mixed
```
    )
  - ```
  deployment_context
```
  : from contract
```
  security_review.deployment_context
```
  if present, else unset (siege defaults to
```
  public
```
  )
- Narration: "Security signals detected: [list categories]. Dispatching siege."
- Decision journal:
```
security-review | choice=dispatch | reason=[N] signals ([categories]) [or contract-required]
```
- Session index event: Emit to outbox:
```
{"ts":"<now>","seq":0,"type":"security_review","summary":"Siege dispatched: [N] signals detected","detail":{"skill":"build","signals":[categories]}}
```
  e. Blocking behavior: Siege iterates internally until zero Critical + zero High.
- If siege completes clean: continue to step 6 (quality-gate)
- If siege escalates (stagnation, user input needed): escalate to user with siege context
- If siege's fix cycle produced code changes: re-run crucible:code-review scoped to siege fix commits only (
```
git diff <pre-siege-sha>..HEAD
```
  ). Same pattern as post-inquisitor conditional review at step 5. f. Escape hatches: User can override automatic siege behavior:
- ```
--force-siege
```
  — Dispatch siege regardless of signal count. Maps to siege's
```
--force
```
  flag. Decision journal:
```
security-review | choice=force-dispatch | reason=user --force-siege flag
```
- ```
--skip-siege
```
  — Suppress siege even when signals/contract require it. Maps to siege's
```
--skip
```
  flag. Decision journal:
```
security-review | choice=force-skip | reason=user --skip-siege flag
```
RECOMMENDED SUB-SKILL: Use crucible:checkpoint — create checkpoint with reason "pre-impl-gate" before dispatching the implementation quality gate. If gate fix rounds degrade the code, this is the rollback target.
REQUIRED SUB-SKILL: Use crucible:quality-gate on full implementation (artifact type: "code"). Include in the dispatch context:
```
Phase: code
```
and
```
PipelineID: <current PipelineID>
```
. Iterates until clean or stagnation. (Non-negotiable — see Quality Gate Requirement.) 6b. Verify verdict marker and write Phase 4 PASS to the gate ledger (see Verdict Marker Verification). Delete the verdict marker after writing the ledger entry.
RECOMMENDED SUB-SKILL: Use crucible:forge (retrospective mode) — capture what happened vs what was planned 7.5. Chronicle signal fallback: If forge retrospective was skipped (user declined, session ending), append a minimal chronicle signal directly:
- Read the metrics log at
```
/tmp/crucible-metrics-<session-id>.log
```
  for duration and subagent counts
- Construct signal:
```
v=1
```
  ,
```
ts=now
```
  ,
```
skill="build"
```
  ,
```
outcome
```
  from acceptance test results,
```
duration_m
```
  from metrics log,
```
branch
```
  from git,
```
files_touched
```
  from
```
git diff <base-sha>..HEAD --name-only
```
  ,
```
metrics={mode, tasks count, tasks_passed count from task list, stagnation=false}
```
- Append as a single JSON line to
```
~/.claude/projects/<hash>/memory/chronicle/signals.jsonl
```
- If forge retrospective DID run, skip this step (forge Step 8.5 already emitted the signal)
RECOMMENDED SUB-SKILL: Use crucible:cartographer (record mode) — persist any new codebase knowledge discovered during build
Compile summary: what was built, acceptance tests passing, review findings addressed, inquisitor findings, concerns

Report to user 10.5. Session index event: Emit a

skill_end

event to the outbox:

{"ts":"<now>","seq":0,"type":"skill_end","summary":"/build complete: <outcome summary>","detail":{"skill":"build","outcome":"success|failure|escalated"}}

REQUIRED SUB-SKILL: Use crucible:finish — skip finish's Step 2.5 (test-coverage) since test-coverage ran per-task in Phase 3, and skip finish's Step 3 (red-team) since quality-gate already ran at step 6. Tell finish to skip both.
Delete pipeline-active marker: Remove
```
<scratch>/.pipeline-active
```
. This signals that the pipeline completed successfully. If deletion fails (permissions, missing file), log a warning but do not fail the pipeline.

Session Metrics

Throughout the pipeline, the orchestrator appends timestamped entries to

/tmp/crucible-metrics-<session-id>.log

on each subagent dispatch and completion.

Dispatch measurement protocol: On every subagent dispatch, the orchestrator follows the enriched manifest protocol from

shared/dispatch-convention.md

Before dispatching: Measure the dispatch file size in characters. Record
```
input_chars
```
and
```
model_tier
```
in the manifest entry.
After dispatch returns: Measure the subagent response length in characters. Record
```
output_chars
```
and
```
tool_calls
```
(if available) in the manifest completion entry.

At completion (before reporting to user, i.e. step 9), read the metrics log and manifest, then compute:

-- Pipeline Complete ----------------------------------------
  Subagents dispatched:  23 (14 Opus, 7 Sonnet, 2 Haiku)
  Active work time:      2h 47m
  Wall clock time:       11h 13m
  Quality gate rounds:   4 (design: 2, plan: 1, impl: 1)
  Siege:                 dispatched (3 agents, 2 rounds, 0 Critical, 0 High) | skipped (0 signals) | skipped (1 signal: auth)
  Task tiers:           3 Tier 1, 3 Tier 2, 2 Tier 3
  Subagent savings:     ~21 dispatches skipped vs all-Tier-3
  Est. input tokens:    ~32,100 (128,400 chars)
  Est. output tokens:   ~20,500 (82,000 chars)
  Token estimate note:  Based on dispatch file sizes (chars/4). Actual consumption may vary +/-30%.
-------------------------------------------------------------

Metrics tracked:

Total subagents dispatched (by type and model tier: Opus/Sonnet/Haiku)
Active work time (merge overlapping parallel intervals — NOT naive sum)
Wall clock time (first dispatch to final completion)
Quality gate rounds (per gate: design, plan, implementation)
Siege status (dispatched with agent count, rounds, and final severity counts — or skipped with signal count and reason)
Estimated input tokens (sum of
```
input_chars
```
from manifest / 4)
Estimated output tokens (sum of
```
output_chars
```
from manifest / 4)

Efficiency summary computation: Read

manifest.jsonl

from the dispatch directory. Sum

input_chars

and

output_chars

across all completed entries (skip nulls). Divide each by 4 for token estimates. Count dispatches grouped by

model_tier

. Include these in the pipeline completion report alongside existing metrics.

Gate tracking verification: Before compiling the pipeline summary (Phase 4 Step 9), verify that all three gate categories (design, plan, implementation) show round count >= 1 with clean final rounds (0 Fatal, 0 Significant). If any gate was skipped with explicit user approval, record it as

USER_SKIP

in the metrics. A zero without user approval indicates a gate was dropped — report this in the summary.

Pipeline Decision Journal

Alongside the metrics log, maintain a decision journal at

/tmp/crucible-decisions-<session-id>.log

. Append a structured entry for every non-trivial routing decision:

[timestamp] DECISION: <type> | choice=<what> | reason=<why> | alternatives=<rejected>

Decision types to capture:

```
reviewer-model
```
— why Opus vs Sonnet for this reviewer
```
review-tier
```
-- tier assignment read from plan, runtime escalation reason if applicable
```
gate-round
```
— issue count, severity shifts, progress/stagnation per round
```
escalation
```
— why the orchestrator escalated to user (and user's decision)
```
task-grouping
```
— parallelism decisions for wave execution
```
cleanup-removal
```
— what de-sloppify removed and accept/reject decision

Escalation Triggers (Any Phase)

STOP and ask the user when:

Architectural concerns in plan or code review
Review loop stagnation (same or more issues after fixes — any phase)
Test suite failures not obviously fixable
Multiple teammates fail on different tasks
Teammate reports context pressure at 50%+ with significant work remaining
When escalating for regression or stagnation AND a checkpoint exists for the current phase boundary: include "A checkpoint from [reason] is available. Restore to pre-regression state?" in the escalation message.

Minor issues: Log, work around, include in final report.

What the Lead Should NOT Do

Implement code (dispatch implementers)
Read large files (spawn Haiku researcher)
Debug failing tests (dispatch implementer)
Make architectural decisions (escalate to user)

Context Management

One task per agent — always spawn a fresh implementer for each task. Never send a second task to a running agent via SendMessage. Reusing agents accumulates context and causes exhaustion.
"2-3 per subagent, ~10 files max" refers to plan design — group small steps into one task at planning time, not sequential dispatch to a running agent
Lead stays thin — coordination only
All important state on disk (plan files, task list)
Teammates report at 50%+ context usage
Lead compaction acceptable — task list is source of truth
Agent teams unavailable: If agent teams are not enabled, the lead dispatches tasks sequentially via Agent tool. Task tracking still uses TaskCreate/TaskUpdate. The pipeline is slower but functionally identical.

Prompt Templates

```
./acceptance-test-writer-prompt.md
```
— Phase 1 acceptance test generation
```
./prd-writer-prompt.md
```
— Phase 1 PRD generation from design doc
```
./plan-writer-prompt.md
```
— Phase 2 plan writer dispatch
```
./plan-reviewer-prompt.md
```
— Phase 2 plan reviewer dispatch
```
./build-implementer-prompt.md
```
— Phase 3 implementer dispatch
```
./build-reviewer-prompt.md
```
— Phase 3 reviewer dispatch
```
./cleanup-prompt.md
```
— Phase 3 de-sloppify cleanup dispatch
```
./test-gap-writer-prompt.md
```
— Phase 3 test gap writer dispatch
```
./architecture-reviewer-prompt.md
```
— Mid-plan checkpoint
```
./contract-test-writer-prompt.md
```
— Phase 1 refactor-mode contract test generation
```
./refactor-implementer-addendum.md
```
— Phase 3 refactor-mode implementer addendum (appended to build-implementer-prompt)

Red-team, innovate, adversarial tester, and inquisitor prompts live in their respective skills:

crucible:red-team

—

skills/red-team/red-team-prompt.md

crucible:innovate

—

skills/innovate/innovate-prompt.md

crucible:adversarial-tester

—

skills/adversarial-tester/break-it-prompt.md

crucible:inquisitor

—

skills/inquisitor/inquisitor-prompt.md

Quality Gate Orchestration

Build is the outermost orchestrator and controls all quality gates via

crucible:quality-gate

. Quality gate wraps

crucible:red-team

internally — do NOT invoke red-team separately at these points.

Gate points in the pipeline:

Pipeline Stage	Artifact Type	Replaces
Phase 1, Step 2 (after design)	design	Existing `crucible:red-team` on design
Phase 2, Step 3 (after plan review)	plan	Existing `crucible:red-team` on plan
Phase 4, Step 6 (after inquisitor + conditional re-review)	code	Existing `crucible:red-team` on implementation

Code review (

crucible:code-review

) and inquisitor (

crucible:inquisitor

) remain separate from the quality gate — code-review does structured quality checks, inquisitor writes cross-component adversarial tests, and the quality gate does adversarial artifact review. All three serve distinct purposes.

Contract-Aware Quality Gate

When a contract YAML exists for the current ticket, the quality gate adds contract verification to its checks. This applies at all gate points (design, plan, and code), though most contract checks are only meaningful at the code gate (Phase 4, Step 6).

Version check: Before processing a contract, verify the
```
version
```
field is
```
"1.0"
```
. If the version is missing or unrecognized, reject the contract with a clear error: "Contract version [X] is not supported. Expected version 1.0." Do not proceed with contract-aware checks — fall back to standard quality gate behavior without contract awareness.
Checkable invariant verification: For each
```
checkable
```
invariant in the contract, verify satisfaction using the declared
```
check_method
```
:
- ```
grep
```
  — pattern match (or absence) in production code. Run the grep and confirm the result matches the invariant's
```
verification
```
  description.
- ```
code-inspection
```
  — read and reason about the relevant code to confirm the invariant holds (e.g., idempotency, no side effects).
- ```
file-structure
```
  — check that file existence, location, or organization matches the constraint.
Testable invariant verification: For each
```
testable
```
invariant in the contract:
- Verify that a test tagged with the declared
```
test_tag
```
  (pattern:
```
contract:<category>:<id>
```
  ) exists in the test suite.
- Verify that the tagged test passes when run.
- A missing or failing tagged test is a contract violation.
Contract violations are blocking issues. Contract violations are NOT warnings — they have the same severity as architectural concerns and must be resolved before the gate passes. The quality gate's iterative fix loop applies: dispatch fixes, re-check, track progress/stagnation as normal.

Red Flags

Skipping Compression State Block emission at checkpoint boundaries
Emitting a Compression State Block at a phase boundary (1→2, 2→3, 3→4) instead of writing a handoff manifest
Skipping the shed statement after a manifest write
Emitting a Compression State Block with stale or missing Key Decisions (decisions must be cumulative across all prior blocks)
Allowing the Goal field to drift across successive Compression State Blocks (must match original user request)
Exceeding 10 entries in the Key Decisions list without overflow-compressing the oldest
Skipping a REQUIRED quality gate because the task seems "small", "simple", or "trivial"
Self-assessing that a quality gate is unnecessary based on perceived task complexity
Rationalizing that quality-gate findings would be "minor" as justification to skip
Declaring a quality gate "done" after fixing findings without a clean verification round (fixing is not passing)
Short-circuiting the quality-gate iteration loop by assuming fixes are self-evidently correct
Interpreting general user feedback as approval to skip a quality gate that has not yet run — once a gate has run and presented findings to the user, the user's decision to proceed is authoritative. Pre-gate skip approval must be an unambiguous instruction specifically referencing the gate.
Treating session index summary as authoritative over CSB state (session index is supplementary narrative, CSB is authoritative state)

Integration

Required sub-skills:

crucible:design — Phase 1
crucible:finish — Phase 4
crucible:quality-gate — Iterative red-teaming at each quality gate point
crucible:red-team — Adversarial review engine (invoked by quality-gate)
crucible:innovate — Creative enhancement before quality gates
crucible:inquisitor — Full-feature cross-component adversarial testing (Phase 4, after code-review, before quality-gate)

Recommended sub-skills:

crucible:forge — Feed-forward at Phase 1 start, retrospective at Phase 4 completion
crucible:cartographer — Consult at Phase 1 start, load at Phase 3 dispatches, record at Phase 4
crucible:checkpoint — Shadow git checkpoints at pipeline boundaries (pre-design-gate, pre-plan-gate, pre-wave-N, pre-cleanup-task-N, pre-code-review, pre-inquisitor, pre-impl-gate)

Recon/assay context: Inherits recon/assay context through /design (Phase 1). No direct dispatch. When design integrates recon, build benefits automatically. See #147 for rationale.

Phase 3 sub-skills (dispatched per-task):

crucible:test-coverage — Test alignment audit after each task's test quality review (staleness, dead tests, coincidence tests)

Implementer sub-skills:

crucible:test-driven-development — TDD within each task
crucible:source-driven-development — Detect → Fetch → Implement → Cite loop for non-trivial external API usage (≥ 5 LOC touching a detected framework); invoked by the implementer prompt's Source Consultation block. Recommended — skipped for pure internal refactors or trivial edits.

Contract consumption:

crucible:spec — Consumes contract YAML files produced by
```
/spec
```
(schema version 1.0). Contracts are read from
```
docs/plans/*-contract.yaml
```
and feed into pre-existing doc detection (Phase 1 Step 0), implementer dispatch (Phase 3), reviewer checks (Phase 3), and quality gate verification (all gate points). See
```
crucible:spec/contract-schema.md
```
for field definitions.