Claude-skill-registry implementing-features

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/implementing-features" ~/.claude/skills/majiayu000-claude-skill-registry-implementing-features && rm -rf "$T"

manifest: skills/data/implementing-features/SKILL.md

Invariant Principles

Discovery Before Design: Research codebase patterns, resolve ambiguities, validate assumptions BEFORE creating artifacts. Uninformed design produces rework.
Subagents Invoke Skills: Every subagent prompt tells agent to invoke skill via Skill tool. Prompts provide CONTEXT only. Never duplicate skill instructions in prompts.
Quality Gates Block Progress: Each phase has mandatory verification. 100% score required to proceed. Bypass only with explicit user consent.
Completion Means Evidence: "Done" requires traced verification through code. Trust execution paths, not file names or comments.
Autonomous Means Thorough: In autonomous mode, treat suggestions as mandatory. Fix root causes, not symptoms. Choose highest-quality fixes.

Skill Invocation Pattern

<CRITICAL> ALL subagents MUST invoke skills explicitly using the Skill tool. Do NOT embed or duplicate skill instructions in subagent prompts. </CRITICAL>

Correct Pattern:

Task (or subagent simulation):
  prompt: |
    First, invoke the [skill-name] skill using the Skill tool.
    Then follow its complete workflow.

    ## Context for the Skill
    [Only the context the skill needs to do its job]

WRONG Pattern:

Task (or subagent simulation):
  prompt: |
    Use the [skill-name] skill to do X.
    [Then duplicating the skill's instructions here]  <-- WRONG

Subagent Prompt Length Verification: Before dispatching ANY subagent:

Count lines in subagent prompt
Estimate tokens:
```
lines * 7
```
If > 200 lines and no valid justification: compress before dispatch
Most subagent prompts should be OPTIMAL (< 150 lines) since they provide CONTEXT and invoke skills

Reasoning Schema

<analysis>Before each phase, state: inputs available, gaps identified, decisions required.</analysis> <reflection>After each phase, verify: outputs produced, quality gates passed, no TBD items remain.</reflection>

Inputs

Input	Required	Description
`user_request`	Yes	Feature description, wish, or requirement from user
`motivation`	Inferred	WHY the feature is needed (ask if not evident in request)
`escape_hatch.design_doc`	No	Path to existing design document to skip Phase 2
`escape_hatch.impl_plan`	No	Path to existing implementation plan to skip Phases 2-3
`codebase_access`	Yes	Ability to read/search project files

Outputs

Output	Type	Description
`understanding_doc`	File	Research findings at `~/.local/spellbook/docs/<project>/understanding/`
`design_doc`	File	Design document at `~/.local/spellbook/docs/<project>/plans/`
`impl_plan`	File	Implementation plan at `~/.local/spellbook/docs/<project>/plans/`
`implementation`	Code	Feature code committed to branch
`test_suite`	Code	Tests verifying feature behavior

Workflow Overview

Phase 0: Configuration Wizard
  ├─ 0.1: Escape hatch detection
  ├─ 0.2: Motivation clarification (WHY)
  ├─ 0.3: Core feature clarification (WHAT)
  └─ 0.4: Workflow preferences + store SESSION_PREFERENCES
    ↓
Phase 1: Research
  ├─ 1.1: Research strategy planning
  ├─ 1.2: Execute research (subagent)
  ├─ 1.3: Ambiguity extraction
  └─ 1.4: GATE: Research Quality Score = 100%
    ↓
Phase 1.5: Informed Discovery
  ├─ 1.5.0: Disambiguation session (resolve ambiguities)
  ├─ 1.5.1: Generate 7-category discovery questions
  ├─ 1.5.2: Conduct discovery wizard (AskUserQuestion + ARH)
  ├─ 1.5.3: Build glossary
  ├─ 1.5.4: Synthesize design_context
  ├─ 1.5.5: GATE: Completeness Score = 100% (11 validation functions)
  ├─ 1.5.6: Create Understanding Document
  └─ 1.6: Invoke devils-advocate skill
    ↓
Phase 2: Design (skip if escape hatch)
  ├─ 2.1: Subagent invokes brainstorming (SYNTHESIS MODE)
  ├─ 2.2: Subagent invokes reviewing-design-docs
  ├─ 2.3: GATE: User approval (interactive) or auto-proceed (autonomous)
  └─ 2.4: Subagent invokes executing-plans to fix
    ↓
Phase 3: Implementation Planning (skip if impl plan escape hatch)
  ├─ 3.1: Subagent invokes writing-plans
  ├─ 3.2: Subagent invokes reviewing-impl-plans
  ├─ 3.3: GATE: User approval per mode
  ├─ 3.4: Subagent invokes executing-plans to fix
  ├─ 3.4.5: Execution mode analysis (tokens/tasks/tracks → swarmed|delegated|direct)
  ├─ 3.5: Generate work packets (if swarmed)
  └─ 3.6: Session handoff (TERMINAL - if swarmed, EXIT here)
    ↓
Phase 4: Implementation (if delegated/direct)
  ├─ 4.1: Setup worktree(s) per preference
  ├─ 4.2: Execute tasks (per worktree strategy)
  ├─ 4.2.5: Smart merge (if per_parallel_track worktrees)
  ├─ For each task:
  │   ├─ 4.3: Subagent invokes test-driven-development
  │   ├─ 4.4: Implementation completion verification
  │   ├─ 4.5: Subagent invokes requesting-code-review
  │   └─ 4.5.1: Subagent invokes fact-checking
  ├─ 4.6.1: Comprehensive implementation audit
  ├─ 4.6.2: Run test suite (invoke systematic-debugging if failures)
  ├─ 4.6.3: Subagent invokes audit-green-mirage
  ├─ 4.6.4: Comprehensive fact-checking
  ├─ 4.6.5: Pre-PR fact-checking
  └─ 4.7: Subagent invokes finishing-a-development-branch

Session State Data Structures

interface SessionPreferences {
  autonomous_mode: "autonomous" | "interactive" | "mostly_autonomous";
  parallelization: "maximize" | "conservative" | "ask";
  worktree: "single" | "per_parallel_track" | "none";
  worktree_paths: string[];  // Filled during Phase 4.1 if per_parallel_track
  post_impl: "offer_options" | "auto_pr" | "stop";
  escape_hatch: null | {
    type: "design_doc" | "impl_plan";
    path: string;
    handling: "review_first" | "treat_as_ready";
  };
  execution_mode?: "swarmed" | "sequential" | "delegated" | "direct";
  estimated_tokens?: number;
  feature_stats?: {
    num_tasks: number;
    num_files: number;
    num_parallel_tracks: number;
  };
  refactoring_mode?: boolean;
}

interface SessionContext {
  motivation: {
    driving_reason: string;
    category: string;  // user_pain | performance | tech_debt | business | security | dx
    success_criteria: string[];
  };
  feature_essence: string;  // 1-2 sentence description
  research_findings: {
    findings: ResearchFinding[];
    patterns_discovered: Pattern[];
    unknowns: string[];
  };
  design_context: DesignContext;  // THE KEY CONTEXT FOR SUBAGENTS
}

interface DesignContext {
  feature_essence: string;
  research_findings: {
    patterns: string[];
    integration_points: string[];
    constraints: string[];
    precedents: string[];
  };
  disambiguation_results: {
    [ambiguity: string]: {
      clarification: string;
      source: string;
      confidence: string;
    };
  };
  discovery_answers: {
    architecture: {
      chosen_approach: string;
      rationale: string;
      alternatives: string[];
      validated_assumptions: string[];
    };
    scope: {
      in_scope: string[];
      out_of_scope: string[];
      mvp_definition: string;
      boundary_conditions: string[];
    };
    integration: {
      integration_points: Array<{name: string; validated: boolean}>;
      dependencies: string[];
      interfaces: string[];
    };
    failure_modes: {
      edge_cases: string[];
      failure_scenarios: string[];
    };
    success_criteria: {
      metrics: Array<{name: string; threshold: string}>;
      observability: string[];
    };
    vocabulary: Record<string, string>;
    assumptions: {
      validated: Array<{assumption: string; confidence: string}>;
    };
  };
  glossary: {
    [term: string]: {
      definition: string;
      source: "user" | "research" | "codebase";
      context: "feature-specific" | "project-wide";
      aliases: string[];
    };
  };
  validated_assumptions: string[];
  explicit_exclusions: string[];
  mvp_definition: string;
  success_metrics: Array<{name: string; threshold: string}>;
  quality_scores: {
    research_quality: number;
    completeness: number;
    overall_confidence: number;
  };
  devils_advocate_critique?: {
    missing_edge_cases: string[];
    implicit_assumptions: string[];
    integration_risks: string[];
    scope_gaps: string[];
    oversimplifications: string[];
  };
}

Quality Gate Thresholds

Gate	Threshold	Bypass
Research Quality	100%	User consent
Completeness	100% (11/11)	User consent
Implementation Completion	All items COMPLETE	Never
Tests	All passing	Never
Green Mirage Audit	Clean	Never
Claim Validation	No false claims	Never

Phase 0: Configuration Wizard

<CRITICAL> The Configuration Wizard MUST be completed before any other work. This is NOT optional. All preferences are collected upfront to enable fully autonomous mode. </CRITICAL>

0.1 Detect Escape Hatches

<RULE>Parse user's initial message for escape hatches BEFORE asking questions.</RULE>

Pattern Detected	Action
"using design doc <path>"	Skip Phase 2, load existing design, start at Phase 3
"using impl plan <path>"	Skip Phases 2-3, load existing plan, start at Phase 4
"just implement, no docs"	Skip Phases 2-3, create minimal inline plan, start Phase 4

If escape hatch detected, ask via AskUserQuestion:

## Existing Document Detected

I see you have an existing [design doc/impl plan] at <path>.

Header: "Document handling"
Question: "How should I handle this existing document?"

Options:
- Review first (Recommended): Run the reviewer skill before proceeding
- Treat as ready: Accept this document as-is and proceed directly

Handle by choice:

Review first (design doc): Skip 2.1, load doc, jump to 2.2 (review)
Review first (impl plan): Skip 2.1-3.1, load doc, jump to 3.2 (review)
Treat as ready (design doc): Skip entire Phase 2, start at Phase 3
Treat as ready (impl plan): Skip Phases 2-3, start at Phase 4

0.2 Clarify Motivation (WHY)

<RULE>Before diving into WHAT to build, understand WHY. Motivation shapes every subsequent decision.</RULE>

When to Ask:

Request Type	Motivation Clear?	Action
"Add a logout button"	No - why now?	Ask
"Users are getting stuck, add logout"	Yes - user friction	Proceed
"Implement caching for the API"	No - performance? cost?	Ask
"API calls cost $500/day, add caching"	Yes - perf + cost	Proceed

How to Ask (AskUserQuestion):

What's driving this request? Understanding the "why" helps me ask better questions and make better design decisions.

Suggested reasons (select or describe your own):
- [ ] Users requested/complained about this
- [ ] Performance or cost issue
- [ ] Technical debt / maintainability concern
- [ ] New business requirement
- [ ] Security or compliance need
- [ ] Developer experience improvement
- [ ] Other: ___

Motivation Categories:

Category	Typical Signals	Key Questions to Ask Later
User Pain	complaints, confusion	What's the current user journey? Failure mode?
Performance	slow, expensive, timeout	Current metrics? Target?
Technical Debt	fragile, hard to maintain	What breaks when touched?
Business Need	new requirement, stakeholder	Deadline? Priority?
Security/Compliance	audit, vulnerability	Threat model? Requirement?
Developer Experience	tedious, error-prone	How often? Workaround?

Store in

SESSION_CONTEXT.motivation

0.3 Clarify the Feature (WHAT)

<RULE>Collect only the CORE essence. Detailed discovery happens in Phase 1.5 after research.</RULE>

Ask via AskUserQuestion:

What is the feature's core purpose? (1-2 sentences)
Are there any resources, links, or docs to review during research?

Store in

SESSION_CONTEXT.feature_essence

0.4 Collect Workflow Preferences

<CRITICAL> Use AskUserQuestion to collect ALL preferences in a single wizard interaction. These preferences govern behavior for the ENTIRE session. </CRITICAL>

## Configuration Wizard

### Question 1: Autonomous Mode
Header: "Execution mode"
Question: "Should I run fully autonomous after this wizard, or pause for approval at checkpoints?"

Options:
- Fully autonomous (Recommended): Proceed without pausing, automatically fix all issues
- Interactive: Pause after each review phase for explicit approval
- Mostly autonomous: Only pause for critical blockers I cannot resolve

### Question 2: Parallelization Strategy
Header: "Parallelization"
Question: "When tasks can run in parallel, how should I handle it?"

Options:
- Maximize parallel (Recommended): Spawn parallel subagents for independent tasks
- Conservative: Default to sequential, only parallelize when clearly beneficial
- Ask each time: Present opportunities and let you decide

### Question 3: Git Worktree Strategy
Header: "Worktree"
Question: "How should I handle git worktrees?"

Options:
- Single worktree (Recommended): One worktree; all tasks share it
- Worktree per parallel track: Separate worktrees per parallel group; smart merge after
- No worktree: Work in current directory

### Question 4: Post-Implementation Handling
Header: "After completion"
Question: "After implementation completes, how should I handle PR/merge?"

Options:
- Offer options (Recommended): Use finishing-a-development-branch skill
- Create PR automatically: Push and create PR without asking
- Just stop: Stop after implementation; you handle PR manually

Store all preferences in

SESSION_PREFERENCES

Important: If

worktree == "per_parallel_track"

, automatically set

parallelization = "maximize"

0.5 Continuation Detection

<CRITICAL> This phase detects session continuation and enables zero-intervention recovery. Execute BEFORE the Configuration Wizard questions if continuation signals detected. </CRITICAL>

Continuation Signals (any of):

User prompt contains: "continue", "resume", "pick up", "where we left off", "compacted"
MCP
```
<system-reminder>
```
contains
```
**Skill Phase:**
```
with implementing-features phase

MCP

<system-reminder>

contains

**Active Skill:** implementing-features

Artifacts exist in expected locations for current project

If NO continuation signals: Proceed to Phase 0.1 (escape hatch detection)

If continuation signals detected:

Step 1: Parse Recovery Context

Extract from

<system-reminder>

(if present):

```
active_skill
```
: Confirms implementing-features was active
```
skill_phase
```
: Highest phase reached (e.g., "Phase 2: Design")
```
todos
```
: In-progress work items with status
```
exact_position
```
: Recent tool actions for position verification

Step 2: Verify Artifact Existence

Check for expected artifacts based on

skill_phase

Phase Reached	Expected Artifacts
Phase 1.5+	Understanding doc at `~/.local/spellbook/docs/<project>/understanding/`
Phase 2+	Design doc at `~/.local/spellbook/docs/<project>/plans/*-design.md`
Phase 3+	Impl plan at `~/.local/spellbook/docs/<project>/plans/*-impl.md`
Phase 4+	Worktree at `.worktrees/<feature>/`

If artifacts missing but phase suggests they should exist:

## Missing Artifacts

I'm resuming from {skill_phase}, but expected artifacts are missing:
- [ ] Design doc (expected for Phase 2+)
- [ ] Impl plan (expected for Phase 3+)

Options:
1. Regenerate missing artifacts using recovered context
2. Start fresh from Phase 0

Step 3: Quick Preferences Check

Since SESSION_PREFERENCES are not stored in the soul database, re-ask ONLY the 4 preference questions:

## Quick Preferences Check

I'm resuming your session but need to confirm a few preferences:

### Execution Mode
- [ ] Fully autonomous: Proceed without pausing
- [ ] Interactive: Pause for approval at checkpoints
- [ ] Mostly autonomous: Only pause for critical blockers

### Parallelization
- [ ] Maximize parallel
- [ ] Conservative (sequential)
- [ ] Ask each time

### Worktree Strategy
- [ ] Single worktree (detected: {worktree_exists ? "exists" : "none"})
- [ ] Worktree per parallel track
- [ ] No worktree

### Post-Implementation
- [ ] Offer options (finishing-a-development-branch)
- [ ] Create PR automatically
- [ ] Just stop

Your choices: ___

Important: Skip motivation/feature questions if design doc exists.

Step 4: Synthesize Resume Point

Based on verified state, determine exact resume point:

Find in-progress todo (most precise position)
If no in-progress todo, use
```
skill_phase
```
(phase-level precision)
If no skill_phase, infer from artifacts

Step 5: Confirm and Resume

## Session Continuation Detected

I'm resuming your implementing-features session:

**Prior Progress:**
- Reached: {skill_phase}
- Design Doc: {path or "Not yet created"}
- Impl Plan: {path or "Not yet created"}
- Worktree: {path or "Not yet created"}

**Current Task:** {in_progress_todo or "Beginning of " + skill_phase}

Resuming at {resume_point}...

Then jump directly to the appropriate phase using the Phase Jump Mechanism.

Phase Jump Mechanism

When resuming, the skill MUST:

Determine target phase from
```
skill_phase
```
and artifact verification
Skip all prior phases by checking phase number
Execute only from target phase forward

Display on resume:

## Resuming Session

**Skipping completed phases:**
- [SKIPPED] Phase 0: Configuration Wizard
- [SKIPPED] Phase 1: Research
- [SKIPPED] Phase 1.5: Informed Discovery

**Resuming at:**
- [CURRENT] Phase 2: Design (Step 2.2: Review Design Document)

Proceeding...

Artifact-Only Fallback

When MCP soul data is unavailable, infer phase from artifacts alone:

Artifact Pattern	Inferred Phase	Confidence
No artifacts found	Phase 0 (fresh start)	HIGH
Understanding doc exists, no design doc	Phase 1.5 complete, resume at Phase 2	HIGH
Design doc exists, no impl plan	Phase 2 complete, resume at Phase 3	HIGH
Design doc + impl plan exist, no worktree	Phase 3 complete, resume at Phase 4.1	HIGH
Worktree exists with uncommitted changes	Phase 4 in progress	MEDIUM
Worktree exists with commits, no PR	Phase 4 late stages	MEDIUM
PR exists for feature branch	Phase 4.7 (finishing)	HIGH

0.6 Detect Refactoring Mode

<RULE>Activate when: "refactor", "reorganize", "extract", "migrate", "split", "consolidate" appear in request.</RULE>

if (request.match(/refactor|reorganize|extract|migrate|split|consolidate/i)) {
  SESSION_PREFERENCES.refactoring_mode = true;
}

Refactoring is NOT greenfield. Behavior preservation is the primary constraint. See Refactoring Mode section below.

Phase 1: Research & Ambiguity Detection

<CRITICAL> Systematically explore codebase and surface unknowns BEFORE design work. All research findings must achieve 100% quality score to proceed. </CRITICAL>

1.1 Research Strategy Planning

INPUT: User feature request + motivation OUTPUT: Research strategy with specific questions

Process:

Analyze feature request for technical domains
Generate codebase questions:
- Which files/modules handle similar features?
- What patterns exist for this type of work?
- What integration points are relevant?
- What edge cases have been handled before?
Identify knowledge gaps explicitly

Example Questions:

Feature: "Add JWT authentication for mobile API"

Generated Questions:
1. Where is authentication currently handled in the codebase?
2. Are there existing JWT implementations we can reference?
3. What mobile API endpoints exist that will need auth?
4. How are other features securing API access?
5. What session management patterns exist?

1.2 Execute Research (Subagent)

SUBAGENT DISPATCH: YES REASON: Exploration with uncertain scope. Subagent reads N files, returns synthesis.

Task (or subagent simulation):
  description: "Research Agent - Codebase Patterns"
  prompt: |
    You are a research agent. Your job is to answer these specific questions about
    the codebase. For each question:

    1. Search systematically using search tools (grep, glob, search_file_content)
    2. Read relevant files
    3. Extract patterns, conventions, precedents
    4. FLAG any ambiguities or conflicting patterns
    5. EXPLICITLY state 'UNKNOWN' if evidence is insufficient

    CRITICAL: Mark confidence level for each answer:
    - HIGH: Direct evidence found (specific file references)
    - MEDIUM: Inferred from related code
    - LOW: Educated guess based on conventions
    - UNKNOWN: No evidence found

    QUESTIONS TO ANSWER:
    [Insert questions from Phase 1.1]

    RETURN FORMAT (strict JSON):
    {
      "findings": [
        {
          "question": "...",
          "answer": "...",
          "confidence": "HIGH|MEDIUM|LOW|UNKNOWN",
          "evidence": ["file:line", ...],
          "ambiguities": ["..."]
        }
      ],
      "patterns_discovered": [
        {
          "name": "...",
          "files": ["..."],
          "description": "..."
        }
      ],
      "unknowns": ["..."]
    }

ERROR HANDLING:

If subagent fails: Retry once with same instructions
If second failure: Return findings with all items marked UNKNOWN
Note: "Research failed after 2 attempts: [error]"
Do NOT block progress - user chooses to proceed or retry

TIMEOUT: 120 seconds per subagent

1.3 Ambiguity Extraction

INPUT: Research findings from subagent OUTPUT: Categorized ambiguities

Process:

Extract all MEDIUM/LOW/UNKNOWN confidence items
Extract all flagged ambiguities
Categorize by type:
- Technical: How it works (e.g., "Two auth patterns found - which to use?")
- Scope: What to include (e.g., "Unclear if feature includes password reset")
- Integration: How it connects (e.g., "Multiple integration points - which is primary?")
- Terminology: What terms mean (e.g., "'Session' used inconsistently")
Prioritize by impact on design (HIGH/MEDIUM/LOW)

Example Output:

Categorized Ambiguities:

TECHNICAL (HIGH impact):
- Ambiguity: Two authentication patterns found (JWT in 8 files, OAuth in 5 files)
  Source: Research finding #3 (MEDIUM confidence)
  Impact: Determines entire auth architecture

SCOPE (MEDIUM impact):
- Ambiguity: Similar features handle password reset, unclear if in scope
  Source: Research finding #7 (LOW confidence)
  Impact: Affects feature completeness

1.4 Research Quality Score

SCORING FORMULAS:

// 1. COVERAGE SCORE
function coverageScore(findings: Finding[], questions: string[]): number {
  const highCount = findings.filter(f => f.confidence === "HIGH").length;
  if (questions.length === 0) return 100;
  return (highCount / questions.length) * 100;
}

// 2. AMBIGUITY RESOLUTION SCORE
function ambiguityResolutionScore(ambiguities: Ambiguity[]): number {
  if (ambiguities.length === 0) return 100;
  const categorized = ambiguities.filter(a => a.category && a.impact);
  return (categorized.length / ambiguities.length) * 100;
}

// 3. EVIDENCE QUALITY SCORE
function evidenceQualityScore(findings: Finding[]): number {
  const answerable = findings.filter(f => f.confidence !== "UNKNOWN");
  if (answerable.length === 0) return 0;
  const withEvidence = answerable.filter(f => f.evidence.length > 0);
  return (withEvidence.length / answerable.length) * 100;
}

// 4. UNKNOWN DETECTION SCORE
function unknownDetectionScore(findings: Finding[], flaggedUnknowns: string[]): number {
  const lowOrUnknown = findings.filter(f =>
    f.confidence === "UNKNOWN" || f.confidence === "LOW"
  );
  if (lowOrUnknown.length === 0) return 100;
  return (flaggedUnknowns.length / lowOrUnknown.length) * 100;
}

// OVERALL SCORE: Weakest link determines quality
function overallScore(...scores: number[]): number {
  return Math.min(...scores);  // All must be 100%
}

DISPLAY FORMAT:

Research Quality Score: [X]%

Breakdown:
✓/✗ Coverage: [X]% ([N]/[M] questions with HIGH confidence)
✓/✗ Ambiguity Resolution: [X]% ([N]/[M] ambiguities categorized)
✓/✗ Evidence Quality: [X]% ([N]/[M] findings have file references)
✓/✗ Unknown Detection: [X]% ([N]/[M] unknowns explicitly flagged)

Overall: [X]% (minimum of all criteria)

GATE BEHAVIOR:

IF SCORE < 100%:

Research Quality Score: [X]% - Below threshold

OPTIONS:
A) Continue anyway (bypass gate, accept risk)
B) Iterate: Add more research questions and re-dispatch
C) Skip ambiguous areas (reduce scope, remove low-confidence items)

Your choice: ___

IF SCORE = 100%:

Display: "✓ Research Quality Score: 100% - All criteria met"
Proceed to Phase 1.5

Phase 1.5: Informed Discovery & Validation

<CRITICAL> Use research findings to generate informed questions. Apply Adaptive Response Handler (ARH) pattern for intelligent response processing. All discovery must achieve 100% completeness score before proceeding to design. </CRITICAL>

Adaptive Response Handler (ARH) Pattern

The ARH pattern provides intelligent handling of user responses during discovery. Instead of requiring exact answers, it adapts to various response types:

Response Type	Detection Pattern	Action
DIRECT_ANSWER	Matches option (A, B, C, D) or clear selection	Accept answer, update context, continue
RESEARCH_REQUEST	"research this", "look into", "find out"	Dispatch research subagent, regenerate question with findings
UNKNOWN	"I don't know", "not sure", "unclear"	Dispatch subagent to research, rephrase with additional context
CLARIFICATION	"what do you mean", "can you explain", "?"	Rephrase question with more context, examples, re-ask
SKIP	"skip", "not relevant", "doesn't apply"	Mark as out-of-scope, add to explicit_exclusions, continue
USER_ABORT	"stop", "cancel", "exit"	Save current state, exit cleanly with resume instructions

Apply this pattern to ALL discovery questions in Phase 1.5.

1.5.0 Disambiguation Session

PURPOSE: Resolve all ambiguities BEFORE generating discovery questions

For each ambiguity from Phase 1.3, present:

AMBIGUITY: [description from Phase 1.3]

CONTEXT FROM RESEARCH:
[Relevant research findings with evidence]

IMPACT ON DESIGN:
[Why this matters / what breaks if we guess wrong]

PLEASE CLARIFY:
A) [Specific interpretation 1]
B) [Specific interpretation 2]
C) [Specific interpretation 3]
D) Something else (please describe)

Your choice: ___

PROCESSING (ARH Pattern):

Response Type	Pattern	Action
DIRECT_ANSWER	A, B, C, D	Update disambiguation_results, continue
RESEARCH_REQUEST	"research this"	Dispatch subagent, regenerate ALL questions
UNKNOWN	"I don't know"	Dispatch subagent, rephrase with findings
CLARIFICATION	"what do you mean"	Rephrase with more context, re-ask
SKIP	"skip"	Mark as out-of-scope, add to explicit_exclusions
USER_ABORT	"stop"	Save state, exit cleanly

Example Flow:

Question: "Research found JWT (8 files) and OAuth (5 files). Which should we use?"
User: "What's the difference? I don't know which is better."

ARH Processing:
→ Detect: UNKNOWN type
→ Action: Dispatch research subagent
  "Compare JWT vs OAuth in our codebase. Return pros/cons."
→ Subagent returns comparison
→ Regenerate question with new context:
  "Research shows:
   - JWT: Stateless, used in API endpoints, mobile-friendly
   - OAuth: Third-party integration, complex setup

   For mobile API auth, which fits better?
   A) JWT (stateless, mobile-friendly)
   B) OAuth (third-party logins)
   C) Something else"
→ User: "A - JWT makes sense"
→ Update disambiguation_results

1.5.1 Generate Deep Discovery Questions

INPUT: Research findings + Disambiguation results OUTPUT: 7-category question set

GENERATION RULES:

Use research findings to make questions specific (not generic)
Reference concrete codebase patterns in questions
Include assumption checks in every category
Generate 3-5 questions per category

7 CATEGORIES:

1. Architecture & Approach

How should [feature] integrate with [discovered pattern]?
Should we follow [pattern A from file X] or [pattern B from file Y]?
ASSUMPTION CHECK: Does [discovered constraint] apply here?

2. Scope & Boundaries

Research shows [N] similar features. Should this match their scope?
Explicit exclusions: What should this NOT do?
MVP definition: What's the minimum for success?
ASSUMPTION CHECK: Are we building for [discovered use case]?

3. Integration & Constraints

Research found [integration points]. Which are relevant?
Interface verification: Should we match [discovered interface]?
ASSUMPTION CHECK: Must this work with [discovered dependency]?

4. Failure Modes & Edge Cases

Research shows [N] edge cases in similar code. Which apply?
What happens if [dependency] fails?
How should we handle [boundary condition]?

5. Success Criteria & Observability

Measurable thresholds: What numbers define success?
How will we know this works in production?
What metrics should we track?

6. Vocabulary & Definitions

Research uses terms [X, Y, Z]. What do they mean here?
Are [term A] and [term B] synonyms?
Build glossary incrementally

7. Assumption Audit

I assume [X] based on [research finding]. Correct?
Explicit validation of ALL research-based assumptions

Example Questions (Architecture):

Feature: "Add JWT authentication for mobile API"

After research found JWT in 8 files and OAuth in 5 files,
and user clarified JWT is preferred:

1. Research shows JWT implementation in src/api/auth.ts using jose library.
   Should we follow this pattern or use a different JWT library?
   A) Use jose (consistent with existing code)
   B) Use jsonwebtoken (more popular)
   C) Different library (specify)

2. Existing JWT implementations store tokens in Redis (src/cache/tokens.ts).
   Should we use the same storage approach?
   A) Yes - use existing Redis token cache
   B) No - use database storage
   C) No - use stateless approach (no storage)

1.5.2 Conduct Discovery Wizard (with ARH)

Present questions one category at a time (7 iterations):

## Discovery Wizard (Research-Informed)

Based on research findings and disambiguation, I have questions in 7 categories.

### Category 1/7: Architecture & Approach

[Present 3-5 questions]
[Wait for responses, process with ARH]

### Category 2/7: Scope & Boundaries
[Continue...]

Progress tracking: "[Category N/7]: X/Y questions answered"

1.5.3 Build Glossary

Process:

Extract domain terms from discovery answers (during wizard)
Build glossary incrementally
After wizard completes, show full glossary
Ask user ONCE about persistence

I've built a glossary with [N] terms:
[Show glossary preview]

Would you like to:
A) Keep it in this session only
B) Persist to project CLAUDE.md (all team members benefit)

IF B SELECTED - Glossary Persistence Protocol:

Location: Append to end of project CLAUDE.md file

Format:


---

## Feature Glossary: [Feature Name]

**Generated:** [ISO 8601 timestamp]
**Feature:** [feature_essence from design_context]

### Terms

**[term 1]**
- **Definition:** [definition]
- **Source:** [user | research | codebase]
- **Context:** [feature-specific | project-wide]
- **Aliases:** [alias1, alias2, ...]

**[term 2]**
[...]

---

Write Operation:

Read current CLAUDE.md content
Append formatted glossary (as above)
Write back to CLAUDE.md
Verify write succeeded

ERROR HANDLING:

If write fails (permission denied, read-only): Fallback to

~/.local/spellbook/docs/<project-encoded>/glossary-[feature-slug].md

Show location: "Glossary saved to: [path]"
Suggest: "Manually append to CLAUDE.md when ready"

COLLISION HANDLING:

Check for existing "## Feature Glossary: [Feature Name]" section
If same feature glossary exists: Skip, warn "Glossary for this feature already exists in CLAUDE.md"
If different feature glossary exists: Append as new section (multiple feature glossaries allowed)

1.5.4 Synthesize design_context

Build complete

DesignContext

object from all prior phases. (See data structure above.)

Validation:

No null values allowed (except optional fields)
No "TBD" or "unknown" strings
All arrays with content or explicit "N/A"

1.5.5 Completeness Checklist (11 Validation Functions)

// FUNCTION 1: Research quality validated
function research_quality_validated(): boolean {
  return quality_scores.research_quality === 100 || override_flag === true;
}

// FUNCTION 2: Ambiguities resolved
function ambiguities_resolved(): boolean {
  return categorized_ambiguities.every(amb =>
    disambiguation_results.hasOwnProperty(amb.description)
  );
}

// FUNCTION 3: Architecture chosen
function architecture_chosen(): boolean {
  return discovery_answers.architecture.chosen_approach !== null &&
         discovery_answers.architecture.rationale !== null;
}

// FUNCTION 4: Scope defined
function scope_defined(): boolean {
  return discovery_answers.scope.in_scope.length > 0 &&
         discovery_answers.scope.out_of_scope.length > 0;
}

// FUNCTION 5: MVP stated
function mvp_stated(): boolean {
  return mvp_definition !== null && mvp_definition.length > 10;
}

// FUNCTION 6: Integration verified
function integration_verified(): boolean {
  const points = discovery_answers.integration.integration_points;
  return points.length > 0 && points.every(p => p.validated === true);
}

// FUNCTION 7: Failure modes identified
function failure_modes_identified(): boolean {
  return discovery_answers.failure_modes.edge_cases.length > 0 ||
         discovery_answers.failure_modes.failure_scenarios.length > 0;
}

// FUNCTION 8: Success criteria measurable
function success_criteria_measurable(): boolean {
  const metrics = discovery_answers.success_criteria.metrics;
  return metrics.length > 0 && metrics.every(m => m.threshold !== null);
}

// FUNCTION 9: Glossary complete
function glossary_complete(): boolean {
  const uniqueTermsInAnswers = extractUniqueTerms(discovery_answers);
  return Object.keys(glossary).length >= uniqueTermsInAnswers.length ||
         user_said_no_glossary_needed === true;
}

// FUNCTION 10: Assumptions validated
function assumptions_validated(): boolean {
  const validated = discovery_answers.assumptions.validated;
  return validated.length > 0 && validated.every(a => a.confidence !== null);
}

// FUNCTION 11: No TBD items
function no_tbd_items(): boolean {
  const contextJSON = JSON.stringify(design_context);
  const forbiddenTerms = [/\bTBD\b/i, /\bto be determined\b/i, /\bunknown\b/i];
  const filtered = contextJSON.replace(/"confidence":\s*"[^"]*"/g, '');
  return !forbiddenTerms.some(regex => regex.test(filtered));
}

SCORE CALCULATION:

const checked_count = Object.values(validation_results).filter(v => v === true).length;
const completeness_score = (checked_count / 11) * 100;

DISPLAY FORMAT:

Completeness Checklist:

[✓/✗] All research questions answered with HIGH confidence
[✓/✗] All ambiguities disambiguated
[✓/✗] Architecture approach explicitly chosen and validated
[✓/✗] Scope boundaries defined with explicit exclusions
[✓/✗] MVP definition stated
[✓/✗] Integration points verified against codebase
[✓/✗] Failure modes and edge cases identified
[✓/✗] Success criteria defined with measurable thresholds
[✓/✗] Glossary complete for all domain terms
[✓/✗] All assumptions validated with user
[✓/✗] No "we'll figure it out later" items remain

Completeness Score: [X]% ([N]/11 items complete)

GATE BEHAVIOR:

IF completeness_score < 100:

Completeness Score: [X]% - Below threshold

OPTIONS:
A) Return to discovery wizard for missing items
B) Return to research for new questions
C) Proceed anyway (bypass gate, accept risk)

Your choice: ___

IF completeness_score == 100:

Proceed to Phase 1.5.6

1.5.6 Create Understanding Document

FILE PATH:

~/.local/spellbook/docs/<project-encoded>/understanding/understanding-[feature-slug]-[timestamp].md

Generate Understanding Document:

# Understanding Document: [Feature Name]

## Feature Essence
[1-2 sentence summary]

## Research Summary
- Patterns discovered: [...]
- Integration points: [...]
- Constraints identified: [...]

## Architectural Approach
[Chosen approach with rationale]
Alternatives considered: [...]

## Scope Definition
IN SCOPE:
- [...]

EXPLICITLY OUT OF SCOPE:
- [...]

MVP DEFINITION:
[Minimum viable implementation]

## Integration Plan
- Integrates with: [...]
- Follows patterns: [...]
- Interfaces: [...]

## Failure Modes & Edge Cases
- [...]

## Success Criteria
- Metric 1: [threshold]
- Metric 2: [threshold]

## Glossary
[Full glossary from Phase 1.5.3]

## Validated Assumptions
- [assumption]: [validation]

## Completeness Score
Research Quality: [X]%
Discovery Completeness: [X]%
Overall Confidence: [X]%

Present to user:

I've synthesized research and discovery into the Understanding Document above.

Please review and:
A) Approve (proceed to Devil's Advocate review)
B) Request changes (specify what to revise)
C) Return to discovery (need more information)

Your choice: ___

BLOCK design phase until user approves (A).

1.6 Devil's Advocate Review

<CRITICAL> The devils-advocate skill is a REQUIRED dependency for this workflow. Check availability before attempting invocation. </CRITICAL>

1.6.1 Check Devil's Advocate Availability

Verify skill exists in available skills list.

IF SKILL NOT AVAILABLE:

WARNING: devils-advocate skill not found in available skills.

The Devil's Advocate review is REQUIRED for quality assurance.

OPTIONS:
A) Install skill first (recommended)
   Run 'uv run install.py' from spellbook directory, then restart session

B) Skip review for this session (not recommended)
   Proceed without adversarial review - higher risk of missed issues

C) Manual review
   I'll present the Understanding Document for YOUR critique instead

Your choice: ___

Handle user choice:

A (Install): Exit with instructions: "Run 'uv run install.py' from spellbook directory, then restart this session"
B (Skip): Set
```
skip_devils_advocate = true
```
, log warning, proceed to Phase 2
C (Manual): Present Understanding Document, collect user's critique, add to
```
devils_advocate_critique
```
field, proceed

1.6.2 Invoke Devil's Advocate Skill

<RULE>Subagent MUST invoke devils-advocate skill using the Skill tool.</RULE>

Task (or subagent simulation):
  description: "Devil's Advocate Review"
  prompt: |
    First, invoke the devils-advocate skill using the Skill tool.
    Then follow its complete workflow.

    ## Context for the Skill

    Understanding Document:
    [Insert full Understanding Document from Phase 1.5.6]

Present critique to user with options:

## Devil's Advocate Critique

[Full critique output from skill]

---

Please review and choose next steps:
A) Address critical issues (return to discovery for specific gaps)
B) Document as known limitations (add to Understanding Document)
C) Revise scope to avoid risky areas
D) Proceed to design (accept identified risks)

Your choice: ___

Phase 2: Design

<CRITICAL> Phase behavior depends on escape hatch: - **No escape hatch:** Run full Phase 2 - **Design doc with "review first":** Skip 2.1, start at 2.2 - **Design doc with "treat as ready":** Skip entire Phase 2 - **Impl plan escape hatch:** Skip entire Phase 2 </CRITICAL>

2.1 Create Design Document

<RULE>Subagent MUST invoke brainstorming in SYNTHESIS MODE.</RULE>

Task (or subagent simulation):
  description: "Create design document"
  prompt: |
    First, invoke the brainstorming skill using the Skill tool.
    Then follow its complete workflow.

    IMPORTANT: This is SYNTHESIS MODE - all discovery is complete.
    DO NOT ask questions. Use the comprehensive context below.

    ## Autonomous Mode Context

    **Mode:** AUTONOMOUS - Proceed without asking questions
    **Protocol:** See patterns/autonomous-mode-protocol.md
    **Circuit breakers:** Only pause for security-critical or contradictory requirements

    ## Pre-Collected Discovery Context

    [Insert complete SESSION_CONTEXT.design_context]

    ## Task

    Using the brainstorming skill in synthesis mode:
    1. Skip "Understanding the idea" phase - context is complete
    2. Skip "Exploring approaches" questions - decisions are made
    3. Go directly to "Presenting the design"
    4. Do NOT ask "does this look right so far" - proceed through all sections
    5. Save to: ~/.local/spellbook/docs/<project-encoded>/plans/YYYY-MM-DD-[feature-slug]-design.md

2.2 Review Design Document

<RULE>Subagent MUST invoke reviewing-design-docs.</RULE>

Task (or subagent simulation):
  description: "Review design document"
  prompt: |
    First, invoke the reviewing-design-docs skill using the Skill tool.
    Then follow its complete workflow.

    ## Context for the Skill

    Design document location: ~/.local/spellbook/docs/<project-encoded>/plans/YYYY-MM-DD-[feature-slug]-design.md

    Return the complete findings report with remediation plan.

2.3 Approval Gate

Approval Gate Logic:

def handle_review_checkpoint(findings, mode):
    if mode == "autonomous":
        # Never pause - proceed automatically
        # CRITICAL: Always favor most complete/correct fixes
        if findings:
            dispatch_fix_subagent(
                findings,
                fix_strategy="most_complete",  # Not "quickest"
                treat_suggestions_as="mandatory",  # Not "optional"
                fix_depth="root_cause"  # Not "surface_symptom"
            )
        return "proceed"

    if mode == "interactive":
        # Always pause - wait for user
        if len(findings) > 0:
            present_findings_summary(findings)
            display("Type 'continue' when ready for me to fix these issues.")
            wait_for_user_input()
            dispatch_fix_subagent(findings)
        else:
            display("Review complete - no issues found.")
            display("Ready to proceed to next phase?")
            wait_for_user_acknowledgment()
        return "proceed"

    if mode == "mostly_autonomous":
        # Only pause for critical blockers
        critical_findings = [f for f in findings if f.severity == "critical"]
        if critical_findings:
            present_critical_blockers(critical_findings)
            wait_for_user_input()
        if findings:
            dispatch_fix_subagent(findings)
        return "proceed"

2.4 Fix Design Document

<RULE>Subagent MUST invoke executing-plans.</RULE>

<CRITICAL> In autonomous mode, ALWAYS favor most complete and correct solutions: - Treat suggestions as mandatory improvements - Fix root causes, not just symptoms - Ensure fixes maintain consistency </CRITICAL>

Task (or subagent simulation):
  description: "Fix design document"
  prompt: |
    First, invoke the executing-plans skill using the Skill tool.
    Then use its workflow to systematically fix the design document.

    ## Context for the Skill

    Review findings to address:
    [Paste complete findings report and remediation plan]

    Design document location: ~/.local/spellbook/docs/<project-encoded>/plans/YYYY-MM-DD-[feature-slug]-design.md

    ## Fix Quality Requirements

    - Address ALL items: critical, important, minor, AND suggestions
    - Choose fixes that produce highest quality results
    - Fix underlying issues, not just surface symptoms

Phase 3: Implementation Planning

<CRITICAL> Phase behavior depends on escape hatch: - **No escape hatch:** Run full Phase 3 - **Impl plan with "review first":** Skip 3.1, start at 3.2 - **Impl plan with "treat as ready":** Skip entire Phase 3 </CRITICAL>

3.1 Create Implementation Plan

<RULE>Subagent MUST invoke writing-plans.</RULE>

Task (or subagent simulation):
  description: "Create implementation plan"
  prompt: |
    First, invoke the writing-plans skill using the Skill tool.
    Then follow its complete workflow.

    ## Context for the Skill

    Design document: ~/.local/spellbook/docs/<project-encoded>/plans/YYYY-MM-DD-[feature-slug]-design.md
    Parallelization preference: [maximize/conservative/ask]

    Save to: ~/.local/spellbook/docs/<project-encoded>/plans/YYYY-MM-DD-[feature-slug]-impl.md

3.2 Review Implementation Plan

<RULE>Subagent MUST invoke reviewing-impl-plans.</RULE>

Task (or subagent simulation):
  description: "Review implementation plan"
  prompt: |
    First, invoke the reviewing-impl-plans skill using the Skill tool.
    Then follow its complete workflow.

    ## Context for the Skill

    Implementation plan: ~/.local/spellbook/docs/<project-encoded>/plans/YYYY-MM-DD-[feature-slug]-impl.md
    Parent design document: ~/.local/spellbook/docs/<project-encoded>/plans/YYYY-MM-DD-[feature-slug]-design.md

    Return complete findings report with remediation plan.

3.3 Approval Gate

Same logic as Phase 2.3.

3.4 Fix Implementation Plan

Same pattern as Phase 2.4 but for implementation plan.

3.4.5 Execution Mode Analysis

<CRITICAL> Analyze feature size and complexity to determine optimal execution strategy. </CRITICAL>

Token Estimation:

TOKENS_PER_KB = 350
BASE_OVERHEAD = 20000
TOKENS_PER_TASK_OUTPUT = 2000
TOKENS_PER_REVIEW = 800
TOKENS_PER_FACTCHECK = 500
TOKENS_PER_FILE = 400
CONTEXT_WINDOW = 200000

def estimate_session_tokens(design_context_kb, design_doc_kb, impl_plan_kb, num_tasks, num_files):
    design_phase = (design_context_kb + design_doc_kb + impl_plan_kb) * TOKENS_PER_KB
    per_task = TOKENS_PER_TASK_OUTPUT + TOKENS_PER_REVIEW + TOKENS_PER_FACTCHECK
    execution_phase = num_tasks * per_task
    file_context = num_files * TOKENS_PER_FILE
    return BASE_OVERHEAD + design_phase + execution_phase + file_context

Parse implementation plan:

```
num_tasks
```
: Count all
```
- [ ] Task N.M:
```
lines
```
num_files
```
: Count all unique files in "Files:" lines
```
num_parallel_tracks
```
: Count all
```
## Track N:
```
headers

Execution Mode Selection:

def recommend_execution_mode(estimated_tokens, num_tasks, num_parallel_tracks):
    usage_ratio = estimated_tokens / CONTEXT_WINDOW

    if num_tasks > 25 or usage_ratio > 0.80:
        return "swarmed", "Feature size exceeds safe single-session capacity"

    if usage_ratio > 0.65 or (num_tasks > 15 and num_parallel_tracks >= 3):
        return "swarmed", "Large feature with good parallelization potential"

    if num_tasks > 10 or usage_ratio > 0.40:
        return "delegated", "Moderate size, subagents can handle workload"

    return "direct", "Small feature, direct execution is efficient"

Modes:

swarmed: Generate work packets, spawn separate sessions, EXIT this session
delegated: Stay in session, delegate heavily to subagents
direct: Stay in session, minimal delegation

Routing:

If
```
swarmed
```
: Proceed to 3.5 and 3.6
If
```
delegated
```
or
```
direct
```
: Skip to Phase 4

3.5 Generate Work Packets (if swarmed)

<CRITICAL>Only runs when execution_mode is "swarmed".</CRITICAL>

Track Extraction:

def extract_tracks_from_impl_plan(impl_plan_content):
    tracks = []
    current_track = None

    for line in impl_plan_content.split('\n'):
        if line.startswith('## Track '):
            if current_track:
                tracks.append(current_track)
            parts = line[9:].split(':', 1)
            track_id = int(parts[0].strip())
            track_name = parts[1].strip().lower().replace(' ', '-')
            current_track = {
                "id": track_id,
                "name": track_name,
                "depends_on": [],
                "tasks": [],
                "files": []
            }
        elif current_track and line.strip().startswith('<!-- depends-on:'):
            deps_str = line.strip()[16:-4]
            for dep in deps_str.split(','):
                if dep.strip().startswith('Track '):
                    dep_id = int(dep.strip()[6:])
                    current_track["depends_on"].append(dep_id)
        elif current_track and line.strip().startswith('- [ ] Task '):
            current_track["tasks"].append(line.strip()[6:])
        elif current_track and line.strip().startswith('Files:'):
            files = [f.strip() for f in line.strip()[6:].split(',')]
            current_track["files"].extend(files)

    if current_track:
        tracks.append(current_track)
    return tracks

Create work packet directory:

~/.claude/work-packets/[feature-slug]/

Generate files:

```
manifest.json
```
: Track metadata, dependencies, status
```
README.md
```
: Execution instructions with quality gate checklist
```
track-{id}-{name}.md
```
: Work packet per track

Work Packet Template

<CRITICAL> Work packets MUST include mandatory quality gates. Packets without gates produce incomplete work that passes tests but fails in production. </CRITICAL>

Each

track-{id}-{name}.md

MUST follow this template:

# Work Packet: [Track Name]

**Feature:** [feature-name]
**Track:** [track-id]
**Dependencies:** [list or "none"]

## Context

[Design context, architectural constraints, interfaces]

## Tasks

[Task list from implementation plan]

## Quality Gates (MANDATORY)

After completing ALL tasks in this packet, you MUST run:

### Gate 1: Implementation Completion Verification

For each task, verify:
- [ ] All acceptance criteria traced to code
- [ ] All expected outputs exist with correct interfaces
- [ ] No dead code paths or unused implementations

### Gate 2: Code Review

Invoke `requesting-code-review` skill:
- Files: [list of files created/modified]
- Review criteria: code quality, error handling, type safety, security

Fix ALL critical and important issues before proceeding.

### Gate 3: Fact-Checking

Invoke `fact-checking` skill:
- Verify all docstrings match actual behavior
- Verify all comments are accurate
- Verify all type hints are correct
- Verify error messages are truthful

Fix ALL false claims before proceeding.

### Gate 4: Test Quality (Green Mirage Audit)

Invoke `audit-green-mirage` skill on test files:
- Verify tests have meaningful assertions (not just "passes")
- Verify tests cover error paths (not just happy path)
- Verify tests don't mock too much

Fix ALL green mirage issues before proceeding.

### Gate 5: Full Test Suite

Run `uv run pytest tests/` (or equivalent).
ALL tests must pass. No exceptions.

## Completion Checklist

Before marking this packet complete:

- [ ] All tasks implemented
- [ ] Gate 1: Implementation completion verified
- [ ] Gate 2: Code review passed (no critical/important issues)
- [ ] Gate 3: Fact-checking passed (no false claims)
- [ ] Gate 4: Green mirage audit passed
- [ ] Gate 5: Full test suite passes
- [ ] Changes committed with descriptive message

If ANY checkbox is unchecked, the packet is NOT complete.

README.md Template

The work packet

README.md

MUST include:

# Work Packets: [Feature Name]

## Execution Protocol

<CRITICAL>
Each packet includes MANDATORY quality gates. Do NOT skip them.
Completing tasks without running gates produces incomplete work.
</CRITICAL>

### For Each Packet:

1. Read the packet's Context section
2. Implement all Tasks using TDD
3. Run ALL Quality Gates (5 gates, in order)
4. Complete the Completion Checklist
5. Commit with descriptive message
6. Update manifest.json status to "complete"

### Quality Gate Summary

| Gate | Skill to Invoke | Pass Criteria |
|------|-----------------|---------------|
| Implementation Completion | (manual verification) | All criteria traced |
| Code Review | requesting-code-review | No critical/important issues |
| Fact-Checking | fact-checking | No false claims |
| Green Mirage Audit | audit-green-mirage | No mirage issues |
| Test Suite | (run tests) | All tests pass |

### After All Packets Complete

Run final integration verification across all packets.

3.6 Session Handoff (TERMINAL)

<CRITICAL> After handoff, this session TERMINATES. Orchestrator's job ends here. Workers take over execution. </CRITICAL>

spawn_claude_session

MCP tool available:

Would you like me to:
1. Auto-launch all [count] independent tracks now
2. Provide manual commands for you to run
3. Launch only specific tracks

Please choose: ___

Otherwise, provide manual commands:

# Create worktree
git worktree add [worktree_path] -b [branch_name]

# Start Claude session with work packet
cd [worktree_path]
claude --session-context [work_packet_path]

EXIT this session after handoff.

Phase 4: Implementation

<CRITICAL> This phase only executes if execution_mode is "delegated" or "direct". During Phase 4, delegate actual work to subagents. Main context is for ORCHESTRATION ONLY. </CRITICAL>

Phase 4 Delegation Rules

Main context handles:

Task sequencing and dependency management
Quality gate verification
User interaction and approvals
Synthesizing subagent results
Session state management

Subagents handle:

Writing code (invoke test-driven-development)
Running tests (Bash subagent)
Code review (invoke requesting-code-review)
Fact-checking (invoke fact-checking)
File exploration and research

<RULE> If you find yourself using Write, Edit, or Bash tools directly in main context during Phase 4, STOP. Delegate to a subagent instead. </RULE>

Why: Main context accumulates tokens rapidly. Subagents operate in isolated contexts, preserving main context for orchestration.

4.1 Setup Worktree(s)

If worktree == "single":

Task (or subagent simulation):
  description: "Create worktree"
  prompt: |
    First, invoke the using-git-worktrees skill using the Skill tool.
    Create an isolated workspace for this feature.

    ## Context for the Skill

    Feature name: [feature-slug]
    Purpose: Isolated implementation

    Return the worktree path when done.

If worktree == "per_parallel_track":

<CRITICAL> Before creating parallel worktrees, setup/skeleton work MUST be completed and committed. This ensures all worktrees start with shared interfaces. </CRITICAL>

Identify setup/skeleton tasks from impl plan
Execute setup tasks in main branch, commit
Create worktree per parallel group

If worktree == "none": Work in current directory.

4.2 Execute Implementation Plan

If worktree == "per_parallel_track":

Execute each parallel track in its own worktree:

For each worktree:
  if dependencies not completed: skip (process in next round)

  Task (run_in_background: true):
    description: "Execute tasks in [worktree.path]"
    prompt: |
      First, invoke the executing-plans skill using the Skill tool.
      Execute assigned tasks in this worktree.

      Tasks: [worktree.tasks]
      Working directory: [worktree.path]

      IMPORTANT: Work ONLY in this worktree.

      After each task:
      1. Run code review (invoke requesting-code-review)
      2. Run claim validation (invoke fact-checking)
      3. Commit changes

After all parallel tracks complete, proceed to 4.2.5.

If parallelization == "maximize" (single worktree):

Task:
  description: "Execute parallel implementation"
  prompt: |
    First, invoke the dispatching-parallel-agents skill using the Skill tool.
    Execute the implementation plan with parallel task groups.

    Implementation plan: [path]
    Group tasks by "Parallel Group" field.

If parallelization == "conservative":

Sequential execution via executing-plans skill.

4.2.5 Smart Merge (if per_parallel_track)

<RULE>Subagent MUST invoke merging-worktrees skill.</RULE>

Task:
  description: "Smart merge parallel worktrees"
  prompt: |
    First, invoke the merging-worktrees skill using the Skill tool.
    Merge all parallel worktrees.

    ## Context for the Skill

    Base branch: [branch with setup work]
    Worktrees to merge: [list]
    Interface contracts: [impl plan path]

    After successful merge:
    1. Delete all worktrees
    2. Single unified branch with all work
    3. All tests pass
    4. Interface contracts verified

4.3 Implementation Task Subagent Template

For each individual task:

Task:
  description: "Implement Task N: [name]"
  prompt: |
    First, invoke the test-driven-development skill using the Skill tool.
    Implement this task following TDD strictly.

    ## Context for the Skill

    Implementation plan: [path]
    Task number: N
    Working directory: [worktree or current]

    Commit when done.
    Report: files changed, test results, commit hash.

4.4 Implementation Completion Verification

<CRITICAL> Runs AFTER each task and BEFORE code review. Catches incomplete work early. </CRITICAL>

Task:
  description: "Verify Task N completeness"
  prompt: |
    You are an Implementation Completeness Auditor. Verify claimed work
    was actually done - not quality, just existence and completeness.

    ## Task Being Verified

    Task number: N
    Task description: [from plan]

    ## Verification Protocol

    For EACH item, trace through actual code. Do NOT trust file names.

    ### 1. Acceptance Criteria Verification
    For each criterion:
    1. State the criterion
    2. Identify where in code it should be
    3. Trace the execution path
    4. Verdict: COMPLETE | INCOMPLETE | PARTIAL

    ### 2. Expected Outputs Verification
    For each expected output:
    1. State the expected output
    2. Verify it exists
    3. Verify interface/signature
    4. Verdict: EXISTS | MISSING | WRONG_INTERFACE

    ### 3. Interface Contract Verification
    For each interface:
    1. State contract from plan
    2. Find actual implementation
    3. Compare signatures, types, behavior
    4. Verdict: MATCHES | DIFFERS | MISSING

    ### 4. Behavior Verification
    For key behaviors:
    1. State expected behavior
    2. Trace: can this behavior actually occur?
    3. Identify dead code paths
    4. Verdict: FUNCTIONAL | NON_FUNCTIONAL | PARTIAL

    ## Output Format

    ```
    TASK N COMPLETION AUDIT

    Overall: COMPLETE | INCOMPLETE | PARTIAL

    ACCEPTANCE CRITERIA:
    ✓ [criterion 1]: COMPLETE
    ✗ [criterion 2]: INCOMPLETE - [what's missing]

    EXPECTED OUTPUTS:
    ✓ src/foo.ts: EXISTS, interface matches
    ✗ src/bar.ts: MISSING

    INTERFACE CONTRACTS:
    ✓ FooService.doThing(): MATCHES
    ✗ BarService.process(): DIFFERS - missing param

    BEHAVIOR VERIFICATION:
    ✓ User can create widget: FUNCTIONAL
    ✗ Widget validates input: NON_FUNCTIONAL - validation never called

    BLOCKING ISSUES (must fix before proceeding):
    1. [issue]

    TOTAL: [N]/[M] items complete
    ```

Gate Behavior:

IF BLOCKING ISSUES found:

Return to task implementation
Fix incomplete items
Re-run verification
Loop until all COMPLETE

IF all COMPLETE:

Proceed to 4.5 (Code Review)

4.5 Code Review After Each Task

<RULE>Subagent MUST invoke requesting-code-review after EVERY task.</RULE>

Task:
  description: "Review Task N implementation"
  prompt: |
    First, invoke the requesting-code-review skill using the Skill tool.
    Review the implementation.

    ## Context for the Skill

    What was implemented: [from implementation report]
    Plan/requirements: Task N from [impl plan path]
    Base SHA: [commit before task]
    Head SHA: [commit after task]

    Return assessment with any issues.

If issues found:

Critical: Fix immediately
Important: Fix before next task
Minor: Note for later

4.5.1 Claim Validation After Each Task

<RULE>Subagent MUST invoke fact-checking after code review.</RULE>

Task:
  description: "Validate claims in Task N"
  prompt: |
    First, invoke the fact-checking skill using the Skill tool.
    Validate claims in the code just written.

    ## Context for the Skill

    Scope: Files created/modified in Task N only
    [List files]

    Focus on: docstrings, comments, test names, type hints, error messages.

    Return findings with any false claims to fix.

If false claims found: Fix immediately before next task.

4.6 Quality Gates After All Tasks

<CRITICAL>These gates are NOT optional. Run even if all tasks completed successfully.</CRITICAL>

4.6.1 Comprehensive Implementation Audit

<CRITICAL> Runs AFTER all tasks, BEFORE test suite. Verifies ENTIRE implementation plan against final codebase. Catches cross-task integration gaps and items that degraded. </CRITICAL>

Task:
  description: "Comprehensive implementation audit"
  prompt: |
    You are a Senior Implementation Auditor performing final verification.

    ## Inputs

    Implementation plan: [path]
    Design document: [path]

    ## Comprehensive Verification Protocol

    ### Phase 1: Plan Item Sweep

    For EVERY task in plan:
    1. List all acceptance criteria
    2. Trace through CURRENT codebase state
    3. Mark: COMPLETE | INCOMPLETE | DEGRADED

    DEGRADED means: passed per-task verification but no longer works

    ### Phase 2: Cross-Task Integration Verification

    For each integration point between tasks:
    1. Identify: Task A produces X, Task B consumes X
    2. Verify A's output exists with correct shape
    3. Verify B actually imports/calls A's output
    4. Verify connection works (types match, no dead imports)

    Common failures:
    - B imports from A but never calls it
    - Interface changed during B, A's callers not updated
    - Circular dependency introduced
    - Type mismatch producer/consumer

    ### Phase 3: Design Document Traceability

    For each requirement in design doc:
    1. Identify which task(s) should implement it
    2. Verify implementation exists
    3. Verify implementation matches design intent

    ### Phase 4: Feature Completeness

    Answer with evidence:
    1. Can user USE this feature end-to-end?
    2. Any dead ends (UI exists but handler missing)?
    3. Any orphaned pieces (code exists but nothing calls it)?
    4. Does happy path work?

    ## Output Format

    ```
    COMPREHENSIVE IMPLEMENTATION AUDIT

    Overall: COMPLETE | INCOMPLETE | PARTIAL

    ═══════════════════════════════════════
    PLAN ITEM SWEEP
    ═══════════════════════════════════════

    Task 1: [name]
    ✓ Criterion 1.1: COMPLETE
    ✗ Criterion 2.2: DEGRADED - broken by [commit]

    PLAN ITEMS: [N]/[M] complete ([X] degraded)

    ═══════════════════════════════════════
    CROSS-TASK INTEGRATION
    ═══════════════════════════════════════

    Task 1 → Task 2: ✓ Connected
    Task 2 → Task 3: ✗ DISCONNECTED - never calls

    INTEGRATIONS: [N]/[M] connected

    ═══════════════════════════════════════
    DESIGN TRACEABILITY
    ═══════════════════════════════════════

    Requirement: "Rate limiting"
    ◐ PARTIAL - exists but not applied to /login

    REQUIREMENTS: [N]/[M] implemented

    ═══════════════════════════════════════
    FEATURE COMPLETENESS
    ═══════════════════════════════════════

    End-to-end usable: YES | NO | PARTIAL
    Dead ends: [list]
    Orphaned code: [list]
    Happy path: WORKS | BROKEN at [step]

    ═══════════════════════════════════════
    BLOCKING ISSUES
    ═══════════════════════════════════════

    MUST FIX:
    1. [issue with location]
    ```

Gate Behavior:

IF BLOCKING ISSUES: Fix, re-run audit, loop until clean. IF clean: Proceed to 4.6.2.

4.6.2 Run Full Test Suite

pytest  # or npm test, cargo test, etc.

If tests fail:

Dispatch subagent to invoke systematic-debugging
Fix issues
Re-run until passing

4.6.3 Green Mirage Audit

<RULE>Subagent MUST invoke audit-green-mirage.</RULE>

Task:
  description: "Audit test quality"
  prompt: |
    First, invoke the audit-green-mirage skill using the Skill tool.
    Verify tests actually validate correctness.

    ## Context for the Skill

    Test files: [list of test files]
    Implementation files: [list of impl files]

    Focus on new code added by this feature.

If issues found: Fix tests, re-run until clean.

4.6.4 Comprehensive Claim Validation

<RULE>Subagent MUST invoke fact-checking for final comprehensive validation.</RULE>

Task:
  description: "Comprehensive claim validation"
  prompt: |
    First, invoke the fact-checking skill using the Skill tool.
    Perform comprehensive claim validation.

    ## Context for the Skill

    Scope: All files created/modified in this feature
    [Complete file list]

    Design document: [path]
    Implementation plan: [path]

    Cross-reference claims against design doc and impl plan.

If issues found: Fix, re-run until clean.

4.6.5 Pre-PR Claim Validation

<RULE>Before any PR creation, run final fact-checking pass.</RULE>

Task:
  description: "Pre-PR claim validation"
  prompt: |
    First, invoke the fact-checking skill using the Skill tool.
    Perform pre-PR validation.

    ## Context for the Skill

    Scope: Branch changes (all commits since merge-base with main)

    This is the absolute last line of defense.
    Nothing ships with false claims.

4.7 Finish Implementation

If post_impl == "offer_options":

Task:
  description: "Finish development branch"
  prompt: |
    First, invoke the finishing-a-development-branch skill using the Skill tool.
    Complete this development work.

    ## Context for the Skill

    Feature: [name]
    Branch: [current branch]
    All tests passing: yes
    All claims validated: yes

    Present options: merge, create PR, cleanup.

If post_impl == "auto_pr": Push branch, create PR with gh CLI, return URL.

If post_impl == "stop": Announce complete, summarize, list remaining TODOs.

Refactoring Mode

<RULE> Activate when: "refactor", "reorganize", "extract", "migrate", "split", "consolidate" appear in request. Refactoring is NOT greenfield. Behavior preservation is the primary constraint. </RULE>

Detection

if (request.match(/refactor|reorganize|extract|migrate|split|consolidate/i)) {
  SESSION_PREFERENCES.refactoring_mode = true;
}

Workflow Adjustments

Phase	Greenfield	Refactoring Mode
Phase 1	Understand what to build	Map existing behavior to preserve
Phase 1.5	Design discovery	Behavior inventory
Phase 2	Design new solution	Design transformation strategy
Phase 3	Plan implementation	Plan incremental migration
Phase 4	Build and test	Transform with behavior verification

Behavior Preservation Protocol

<CRITICAL> Every change must pass behavior verification before proceeding. No "I'll fix the tests later." Tests prove behavior preservation. </CRITICAL>

Before any change:

Identify existing behavior (tests, usage patterns, contracts)
Document behavior contracts (inputs → outputs)
Ensure test coverage for behaviors (add tests if missing)

During change:

Make smallest possible transformation
Run tests after each atomic change
Commit working state before next transformation

After change:

Verify all original behaviors preserved
Document any intentional behavior changes (with user approval)

Refactoring Patterns

Pattern	When	Key Constraint
Strangler Fig	Replacing system incrementally	Old and new coexist
Branch by Abstraction	Changing widely-used component	Introduce abstraction, swap impl
Parallel Change	Changing interfaces	Add new, migrate, remove old
Feature Toggles	Risky changes	Disable instantly if problems

Refactoring-Specific Quality Gates

Gate	Greenfield	Refactoring
Research	Understand requirements	Map ALL existing behaviors
Design	Solution design	Transformation strategy
Implementation	Feature works	Behavior preserved + improved
Testing	New tests pass	ALL existing tests pass unchanged

Refactoring Self-Check

[ ] Existing behavior fully inventoried
[ ] Test coverage sufficient before changes
[ ] Each transformation is atomic and verified
[ ] No behavior changes without explicit approval
[ ] Incremental commits at each working state
[ ] Original tests pass (not modified to pass)

<FORBIDDEN> - "Let's just rewrite it" without behavior inventory - Changing behavior while refactoring structure - Skipping test verification between transformations - Big-bang migrations without incremental checkpoints - Refactoring without existing test coverage (add tests first) - Combining refactoring with feature changes in same task </FORBIDDEN>

Skills Invoked

Phase	Skill	Purpose
1.2	analyzing-domains	If unfamiliar domain: Extract ubiquitous language, identify aggregates
1.6	devils-advocate	Challenge Understanding Document
2.1	brainstorming	Create design doc
2.1	designing-workflows	If feature has states/flows: Design state machine
2.2	reviewing-design-docs	Review design doc
2.4, 3.4	executing-plans	Fix findings
3.1	writing-plans	Create impl plan
3.2	reviewing-impl-plans	Review impl plan
3.5	assembling-context	If swarmed: Prepare context packages for work packets
4.1	using-git-worktrees	Create workspace(s)
4.2	dispatching-parallel-agents	Parallel execution
4.2	assembling-context	Prepare context for parallel subagents
4.2.5	merging-worktrees	Merge parallel worktrees
4.3	test-driven-development	TDD per task
4.5	requesting-code-review	Review per task
4.5.1, 4.6.4, 4.6.5	fact-checking	Claim validation
4.6.2	systematic-debugging	Debug test failures
4.6.3	audit-green-mirage	Test quality audit
4.7	finishing-a-development-branch	Complete workflow

Forge Integration (Optional)

When forge tools are available via MCP, they provide token-based workflow enforcement and roundtable validation. These tools are OPTIONAL but enhance workflow rigor.

Tool	Purpose
`forge_project_init`	Initialize feature decomposition with dependency graph
`forge_iteration_start`	Start/resume a feature iteration, get workflow token
`forge_iteration_advance`	Move to next stage after APPROVE verdict
`forge_iteration_return`	Return to earlier stage after ITERATE verdict
`forge_roundtable_convene`	Generate validation prompts with tarot archetypes
`forge_process_roundtable_response`	Parse LLM roundtable output for verdicts
`forge_select_skill`	Get recommended skill for current stage/context

Token System: Forge tools use tokens to enforce workflow order. Each stage transition requires a valid token from the previous operation, preventing stage skipping.

Roundtable Validation: The roundtable system uses tarot archetypes (Magician, Priestess, Hermit, Fool, Chariot, Justice, Lovers, Hierophant, Emperor, Queen) to validate stage completion from multiple perspectives.

<FORBIDDEN> ## Anti-Patterns

Skill Invocation

Embedding skill instructions in subagent prompts
Saying "use the X skill" without invoking via Skill tool
Duplicating skill content in orchestration

Phase 0

Skipping configuration wizard
Not detecting escape hatches in initial message
Asking preferences piecemeal instead of upfront

Phase 1

Only searching codebase, ignoring web and MCP
Not using user-provided links
Shallow research that misses patterns

Phase 1.5

Skipping informed discovery
Not using research findings to inform questions
Asking questions research already answered
Dispatching design without comprehensive design_context

Phase 2

Skipping design review
Proceeding without approval (in interactive mode)
Not fixing minor findings (in autonomous mode)

Phase 3

Skipping plan review
Not analyzing execution mode

Phase 4

Using Write/Edit/Bash directly in main context - delegate to subagents
Accumulating implementation details in main context
Skipping implementation completion verification
Skipping code review between tasks
Skipping claim validation between tasks
Not running comprehensive audit after all tasks
Not running audit-green-mirage
Committing without running tests
Trusting file names instead of tracing behavior

Parallel Worktrees

Creating worktrees WITHOUT completing setup/skeleton first
Creating worktrees WITHOUT committing setup work
Parallel subagents modifying shared code
Not honoring interface contracts
Skipping merging-worktrees
Not running tests after merge
Leaving worktrees after merge

Swarmed Execution (Work Packets)

Generating work packets WITHOUT quality gate checklist - packets must include 5 gates
Completing packet tasks without running quality gates - gates are MANDATORY, not optional
Skipping code review in packets - each packet needs requesting-code-review
Skipping fact-checking in packets - each packet needs fact-checking skill
Skipping green mirage audit in packets - each packet needs audit-green-mirage
Marking packet complete with unchecked gates - all 5 gates must pass
Assuming tests passing = quality - tests verify behavior, gates verify quality </FORBIDDEN>

<SELF_CHECK>

Before Completing This Skill

Skill Invocations

Every subagent prompt tells subagent to invoke skill via Skill tool
No subagent prompts duplicate skill instructions
Subagent prompts provide only CONTEXT for the skill

Phase 0

Detected any escape hatches in user's initial message
Clarified motivation (WHY)
Clarified feature essence (WHAT)
Collected ALL workflow preferences
Detected refactoring mode if applicable
Stored preferences for session use

Phase 1

Dispatched research subagent
Research covered codebase, web, MCP servers, user links
Research Quality Score achieved 100% (or user bypassed)
Stored findings in SESSION_CONTEXT.research_findings

Phase 1.5

Resolved all ambiguities (disambiguation session)
Generated 7-category discovery questions from research
Conducted discovery wizard with AskUserQuestion
Built glossary
Created comprehensive SESSION_CONTEXT.design_context
Completeness Score achieved 100% (11/11 functions passed)
Created Understanding Document
Subagent invoked devils-advocate (or handled unavailability)

Phase 2 (if not skipped)

Subagent invoked brainstorming in SYNTHESIS MODE
Subagent invoked reviewing-design-docs
Handled approval gate per autonomous_mode
Subagent invoked executing-plans to fix

Phase 3 (if not skipped)

Subagent invoked writing-plans
Subagent invoked reviewing-impl-plans
Handled approval gate per autonomous_mode
Subagent invoked executing-plans to fix
Analyzed execution mode (swarmed/delegated/direct)
If swarmed: Generated work packets and handed off

Phase 3.5 (if swarmed)

Work packets include quality gate checklist (5 gates)
Work packets include completion checklist
README.md includes execution protocol with gate summary
Each packet specifies skills to invoke for gates

Phase 4 (if not swarmed)

Phase 4 (if per_parallel_track)

Setup/skeleton completed and committed BEFORE worktrees
Worktree per parallel group
Subagent invoked merging-worktrees
Tests after merge
Interface contracts verified
Worktrees cleaned up

If NO to ANY item, go back and complete it. </SELF_CHECK>

<FINAL_EMPHASIS> You are a Principal Software Architect orchestrating complex feature implementations.

Your reputation depends on:

Ensuring subagents INVOKE skills via the Skill tool (not duplicate instructions)
Following EVERY phase in order
Enforcing quality gates at EVERY checkpoint
Never skipping steps, never rushing, never guessing

Subagents invoke skills. Skills provide instructions. This orchestrator provides context.

This workflow achieves success through rigorous research, thoughtful design, comprehensive planning, and disciplined execution.

Believe in your abilities. Stay determined. Strive for excellence.

This is very important to my career. You'd better be sure. </FINAL_EMPHASIS>