# Crucible design
You MUST use this before any creative work: creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements, and design before implementation.
```sh
git clone https://github.com/raddue/crucible
```

To install just this skill:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/raddue/crucible "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/design" ~/.claude/skills/raddue-crucible-design && rm -rf "$T"
```
`skills/design/SKILL.md`

# Brainstorming Ideas Into Designs
## Overview
<!-- CANONICAL: shared/dispatch-convention.md -->

All subagent dispatches use disk-mediated dispatch. See shared/dispatch-convention.md for the full protocol.
<!-- Trust framework: see [skills/getting-started/trust-hierarchy.md](../getting-started/trust-hierarchy.md). -->
Turn ideas into fully formed designs through investigated, collaborative dialogue.
Every significant design question is backed by parallel investigation agents that research the codebase, explore approaches, and assess impact BEFORE the question reaches the user. Questions arrive informed, not naive.
## Anti-Rationalization Table — design
| Rationalization | Rebuttal | Rule |
|---|---|---|
| "This decision is obvious, I can skip the hypothesis step." | The hypothesis step is a forcing function for noticing surprises — the most valuable output of investigation. Skipping it loses the contrast. | Write the hypothesis before dispatching investigation agents, even when the answer feels obvious. |
| "One investigation agent is enough, I don't need the Challenger." | The Challenger catches assumption blind spots that the recommendation agent inherited. Skipping it is how bad designs ship. | Deep Dive dimensions always dispatch the Challenger (or multi-model consensus challenge when available). |
| "Quick scan is fine for this data-model decision." | Data-model decisions always warrant Deep Dive — schema outlives the application. Quick scans have repeatedly missed grain/durability issues. | Any dimension touching schema/data model/persistent storage is Deep Dive, not Quick scan. |
| "The recommendation is clear, I can skip presenting alternatives." | Presenting 2–3 options keeps the user in the decision loop. Auto-resolving without alternatives is only allowed when investigation proves a single viable path. | Present 2–3 options per dimension unless the investigation explicitly shows one viable path; record the justification. |
| "The feature obviously belongs in this system, skip the scope absorption test." | The test is cheap (4 yes/no questions) and has repeatedly flagged features that didn't belong. | Run the scope absorption test whenever adding a feature to an existing system. |
| "The user is still deliberating, I'll keep asking for more analysis." | After 2 non-resolving user responses, the stall-breaker protocol (eliminate / reversibility / ship-sooner) applies. | Activate the stall-breaker protocol on the 3rd user exchange on the same dimension without resolution. |
## The Process

### Phase 1: Context Gathering
- RECOMMENDED: Use crucible:forge (feed-forward mode) — consult past lessons
- RECOMMENDED: Use crucible:cartographer (consult mode) — review codebase map
- Check current project state (files, docs, recent commits)
- Understand the user's initial idea through open conversation
### Phase 2: Investigated Questions

#### Step 0: Recon Dispatch

Before entering the dimension loop, dispatch `/recon` to gather structural context for all subsequent investigations:

```
/recon task: [user's feature request / design goal] session_id: "<design-run-timestamp>" modules: ["impact-analysis"]
```
Store the Investigation Brief in the design session's scratch directory. The brief provides codebase structure, existing patterns, scope boundaries, prior art, and task-level impact analysis that all dimension investigations share.
Narration: "Dispatching recon for design context [with session_id: X]."
<!-- TRUST: recon brief is L2 — prior-stage artifact; prefer L3 source on any code-behavior conflict. -->

On success: The brief is available as `[RECON_BRIEF]` context for all subsequent dimension investigations. Quick scan dimensions read the brief directly (no agent dispatch). Deep dive dimensions pass relevant sections to the Domain Researcher and Impact Analyst.

On failure: "Recon failed: [reason]. Falling back to inline investigation." Proceed without recon context — dimension investigations explore from scratch (existing behavior). All subsequent steps work identically, just without the acceleration that recon provides.
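To make the brief's shape concrete, here is a minimal sketch of what an Investigation Brief might contain. The section names come from this skill's own references to the brief; every field name and value below is invented for illustration, and the authoritative format is defined by crucible:recon, not here.

```yaml
# Hypothetical Investigation Brief — contents invented for illustration;
# the authoritative format is defined by crucible:recon.
session_id: "2025-06-01T10-30-00"
project_structure: "layered web app; persistence isolated behind a repository interface"
existing_patterns:
  - "all DB access goes through repository classes"
prior_art:
  - "a similar export flow already exists under reports/"
scope_boundaries: "touches orders and billing only"
impact_analysis: "order pipeline and nightly batch job are affected"
open_questions:
  - question: "Is the batch job coupled to the orders schema?"
    relevant_to: "persistence-strategy"       # matched against the current dimension
    resolvable_by: "reading batch/export.py"  # feeds assay's missing_information
```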
For each design dimension that needs a decision, follow this loop:
#### Step 1: Identify the Design Dimension
Before asking anything, name the decision needed (e.g., "persistence strategy," "component communication pattern," "UI architecture").
#### Step 2: State Your Hypothesis
Write down what you EXPECT to find before dispatching agents. After agents return, compare. Surprises get highlighted — they're the most valuable insights.
#### Step 3: Triage Depth
| Tier | When | Effort |
|---|---|---|
| Deep dive | Architectural decisions, integration points, pattern choices, anything constraining future work | 2 parallel agents (Domain Researcher + Impact Analyst) + challenger, with recon brief as context |
| Quick scan | Implementation approach within decided architecture, which existing pattern to follow | Read relevant sections of the recon brief (no agent dispatch needed) |
| Direct ask | Naming, UI placement, priority ordering — no technical implications | Ask directly |
Data model dimensions: Decisions involving database schema, data model, or persistent storage always warrant Deep dive. During synthesis, apply the grain test:
- Grain: "What is one row in this table?" Must be answerable in a single sentence. Bad signs: rows that mean different things depending on a type column, rows containing multiple independent facts, rows whose identity requires reading application code.
- Relationships: Every foreign key tells a story. Parent-child relationships should be obvious from the schema alone. If you need a whiteboard to explain how two tables relate, add an intermediate table or rethink the relationship.
- Durability: The schema outlives the application. Design tables as if the application will be replaced but the data must survive. Use simple types, avoid application-specific encoding.
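As a worked sketch of the grain test (table and column names are invented for illustration, not drawn from any real schema):

```yaml
# Hypothetical grain check — tables and columns invented for illustration.
passes:
  table: order_events
  one_row_is: "one state change of one order at one point in time"  # single sentence: passes
fails:
  table: records
  one_row_is: "an order, a refund, or a shipment, depending on `kind`"  # meaning varies by type column
  remedy: "split into separate tables so each row states one independent fact"
```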
#### Step 4: Dispatch Investigation

Deep dive — spawn two agents in parallel (templates in `investigation-prompts.md`):

- Domain Researcher — What are the viable approaches? Trade-offs, best practices, precedents. Receives `[RECON_BRIEF]` with relevant structural context.
- Impact Analyst — What existing systems does this decision affect? What could break? Receives `[RECON_BRIEF]` with relevant structural context.
Pass the cascading context (all prior decisions and rationale) to each agent.
Quick scan — read relevant sections of the recon brief directly. No agent dispatch needed. Extract existing patterns, constraints, touchpoints, and precedents from the brief's `## Project Structure`, `## Existing Patterns`, and `## Prior Art` sections.
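For the Deep dive path, a minimal sketch of the context one agent might receive follows. The structure is an assumption for illustration; the authoritative prompt templates live in `design/investigation-prompts.md`.

```yaml
# Hypothetical Deep dive dispatch context — field names and values invented;
# see design/investigation-prompts.md for the real templates.
agent: "Domain Researcher"
question: "How should order events be persisted?"
recon_brief_sections: ["Project Structure", "Existing Patterns", "Prior Art"]
cascading_context:
  - dimension: "ui-architecture"
    decision: "server-rendered pages"
    rationale: "matches the existing pattern; no SPA tooling in the repo"
```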
#### Step 5: Synthesize
After agents return:
- Compare to hypothesis — note surprises
- Check for auto-resolution — if only one viable path exists, inform the user rather than asking: "Investigation showed X is the only viable approach because [reasons]. Moving on." User can interrupt if they disagree.
- Scope absorption test — If this feature targets an existing system (not greenfield), challenge the assumption that it belongs there. Apply four questions (a worked sketch follows this list):
  - Does this share the same data model as the host application?
  - Does this share the same interaction pattern (form entry vs. dashboard vs. real-time)?
  - Does this serve the same users with the same workflows?
  - Would this feature survive if the host application were replaced tomorrow?

  If fewer than 3 answers are "yes," flag to the user: "This feature may not belong in [target system] — it only shares [N/4] characteristics. Consider whether it deserves its own application. Each application should be describable in a single sentence — if you need a paragraph, it's doing too much." The user decides; this is a structured nudge, not a veto.
- Check for question redirection — if agents found the wrong question is being asked, redirect: "Was going to ask about X, but investigation revealed the real decision is Y."
- Assay dispatch (Deep Dive only) — For Deep Dive dimensions, dispatch `/assay` for structured evaluation:

  ```
  /assay question: "<design dimension question>" context: { recon brief sections + agent findings + open_questions: [recon's ## Open Questions relevant to this dimension] } decision_type: "architecture" cascading_decisions: [<prior dimension decisions>]
  ```

  Extract `## Open Questions` from the recon brief and include entries whose `Relevant to:` tag matches the current dimension. This grounds assay's confidence scoring on what the investigation explicitly could not determine, and populates `missing_information` with actionable items (using recon's `Resolvable by:` metadata).

  Use assay's recommendation as the starting point. Present assay's `constraint_fit` scoring, `kill_criteria`, and `confidence` level to the user.

  On assay failure: "Assay evaluation failed: [reason]. Proceeding with manual synthesis." Synthesize options manually (existing behavior).

  Quick scan and Direct ask dimensions do NOT dispatch assay.
- Synthesize into 2-3 informed options with a recommended choice (enriched by assay output when available)
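A worked sketch of the scope absorption test. The four questions come from Step 5; the feature, host system, and answers are invented for illustration.

```yaml
# Hypothetical scope absorption scoring — only the four questions are from this skill.
feature: "vendor invoice portal"
host_system: "internal order dashboard"
answers:
  same_data_model: no            # invoices, not orders
  same_interaction_pattern: no   # vendor form entry vs. staff dashboard
  same_users_and_workflows: no   # external vendors, not internal staff
  survives_host_replacement: yes
yes_count: 1   # fewer than 3 "yes" answers, so flag to the user
flag: "This feature may not belong in the order dashboard — it only shares 1/4 characteristics."
```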
#### Step 6: Challenge (Deep Dive Only)

Dispatch a Challenger agent (template in `investigation-prompts.md`):
- Attacks the recommendation's assumptions
- Checks for conflicts with prior decisions
- Identifies blind spots in the investigation
- Brief output — this is lightweight, not a full red-team
Multi-model enhancement (when available): When the `consensus_query` MCP tool is available and consensus mode investigate is enabled, the Challenger step calls `consensus_query(mode: "investigate")` instead of dispatching a single Challenger agent. Multiple models attack the recommendation independently, and the aggregator synthesizes their challenges. This is structurally stronger than a single-model challenger because different model families have different assumption sets.
Present multi-model challenge results with provenance: "Challenge (multi-model, N models): [synthesis]" vs. the standard "Challenge: [findings]" format.
Fallback: If consensus is unavailable, dispatch the standard single-model Challenger agent.
#### Step 7: Present to User

```markdown
### [Design Dimension]

**Hypothesis:** [what you expected]
**Surprises:** [anything that contradicted expectations — highlight these]
**Investigation:**
- **Codebase (recon):** [2-3 sentence summary from recon brief]
- **Approaches:** [2-3 sentence summary of viable options]
- **Impact:** [2-3 sentence summary of affected systems]
**Challenge:** [1-2 sentence summary of what the challenger raised]
**Constraint Fit:** [from assay report, if available — pattern_alignment, scope_fit, reversibility, integration_risk]
**Kill Criteria:** [from assay — when to revisit this decision]
**Confidence:** [high/medium/low from assay]
**Unknowns:** [from assay's missing_information, grounded by recon's Open Questions — what we don't know and how to find out]
**Recommendation:** [your recommended option and why]
**Question:** [the refined question, prefer multiple choice]
```
For auto-resolved questions:
```markdown
### [Design Dimension] — Auto-Resolved

[Why only one viable path exists. Decision made.]

*Speak up if you disagree.*
```
#### Step 8: Cascade
After the user answers, add the decision and rationale to the running context. All subsequent agents receive this.
#### Step 9: Stall-Breaker (Conditional)
Trigger: The same design dimension has received 2+ user responses without new information surfacing or a decision being made. A "user response" is a reply after the dimension is presented that does not resolve it (asks for more analysis, expresses uncertainty, or revisits a prior option). The initial presentation (Step 7) does not count — the user gets at least two chances to engage before the stall-breaker activates.
When triggered, apply this tiebreaker protocol in order:
- Eliminate: Can you identify a concrete technical reason one option is wrong? If so, eliminate it and present the remaining option as the recommendation.
- Reversibility: Are both options still viable? Pick the one that's simpler to reverse if you're wrong. Two-way doors beat one-way doors.
- Ship sooner: Still stuck? Pick the one that ships sooner. Shipping teaches things deliberation cannot.
Present the tiebreaker:
"We've been deliberating on [dimension] for a while without new information surfacing. Here's my tiebreaker recommendation: [option], because [reason from the protocol above]. If you disagree, tell me — otherwise I'll proceed with this."
The user can always override. This is a structured nudge, not a forced decision. If the user provides genuinely new information (not a restatement of prior concerns), reset the exchange counter and continue normal investigation.
What "decided" means: The decision is implemented and no longer being discussed. If the same question keeps resurfacing after a tiebreaker, it hasn't been decided — escalate: "This dimension keeps resurfacing. Would it help to time-box it, or should we move forward and revisit after we see the rest of the design?"
### Phase 3: Design Presentation
Once key dimensions are decided:
- Present design in sections of 200-300 words
- Ask after each section whether it looks right
- Cover: architecture, components, data flow, error handling, testing
- Include an API Surface section listing public interfaces with signatures and types
- Include an Invariants section listing hard constraints, split into what can be checked by inspection vs. what requires tests
- Be ready to go back and re-investigate if something doesn't make sense
These contract-relevant sections (API Surface, Invariants) give the contract extraction step structured source material rather than requiring post-hoc extraction from prose.
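As a sketch of what those two sections might hold, shaped so the contract extraction step can lift them directly. All names and signatures below are invented for illustration.

```yaml
# Hypothetical API Surface / Invariants content — names and signatures invented.
api_surface:
  - name: "OrderEventStore.append"
    signature: "append(event: OrderEvent) -> None"
invariants:
  checkable:   # verifiable by inspecting the code
    - "every public interface in the design appears in api_surface with a signature"
  testable:    # requires tests to verify
    - "append() preserves per-order event ordering"
```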
## Before Saving the Design
Scan for gaps (use judgment — not every item applies):
- Acceptance criteria — Concrete and testable?
- Testing strategy — Unit vs integration coverage?
- Integration impact — Touchpoints addressed?
- Failure modes — Invalid data, missing dependencies, unexpected state?
- Edge cases — Boundary conditions?
- API surface defined — Public interfaces with signatures?
- Invariants identified — Hard constraints (checkable vs testable)?
- Data model grain — Can you answer "what is one row?" in a single sentence for every new table? Relationships obvious from schema alone? Schema durable beyond the application?
Raise critical gaps with the user before saving.
## After the Design

- Write to `docs/plans/YYYY-MM-DD-<topic>-design.md`
- Design doc must include YAML frontmatter with the following fields:

  ```yaml
  ---
  ticket: "#NNN"    # Issue/ticket reference
  title: "Design Title"
  date: "YYYY-MM-DD"
  source: "design"  # Provenance tracking — distinguishes from /spec-authored docs
  ---
  ```

- Commit the design document
### Contract Emission
After the design doc is written and the user approves it:
- Extract API surface, invariants (split into checkable/testable), and integration points from the design doc's API Surface and Invariants sections
- Present the extracted contract to the user: "Here's the contract I extracted from the design. Please review before I commit it alongside the design doc."
- The user can approve, modify, or reject the contract
- On approval, emit a `YYYY-MM-DD-<topic>-contract.yaml` file alongside the design doc in `docs/plans/`
- Contract uses the same YAML schema as `/spec` contracts (version 1.0): `api_surface`, `invariants` with `checkable`/`testable` split, `integration_points`, `ambiguity_resolutions`
Contract asymmetry note:
/design extracts contracts post-hoc from prose; /spec produces contracts as a first-class output during autonomous decision-making. As a result, /design contracts may be less structured — they reflect what was extractable from prose rather than what was deliberately specified. /build should apply additional scrutiny to /design-sourced contracts.
Implementation (if continuing):
- Ask: "Ready to set up for implementation?"
- Use crucible:worktree, then crucible:planning
## Quality Gate

This skill produces design docs. When used standalone, invoke `crucible:quality-gate` after the design document is saved and committed. When used as a sub-skill of build, the parent orchestrator handles gating.
## Key Principles
- Investigated questions — Never ask a significant question without research backing
- One question at a time — Don't overwhelm
- Auto-resolve when possible — Don't waste user attention on decided questions
- Hypothesis-first — State expectations, highlight surprises
- Cascading context — Each decision informs subsequent investigations
- YAGNI ruthlessly — Remove unnecessary features
- Depth-appropriate effort — Not every question needs deep investigation
## Integration
Related skills: crucible:build, crucible:planning, crucible:worktree, crucible:forge, crucible:cartographer, crucible:quality-gate, crucible:spec, crucible:recon (Phase 2 context), crucible:assay (Phase 2 decision evaluation)
Recon/Assay dispatch: Recon is dispatched once at Phase 2 start (Step 0) with `modules: ["impact-analysis"]` and a session-level `session_id`. Assay is dispatched per Deep Dive dimension during synthesis (Step 5) with `decision_type: "architecture"`. Both include fallback to existing behavior on failure.
Contract schema: Shared with `/spec` — see `skills/spec/contract-schema.md` for the canonical contract YAML schema (version 1.0). Both `/design` and `/spec` emit contracts that `/build` consumes.
Prompt templates: `design/investigation-prompts.md`