Agent-alchemy execute-tdd-tasks

Execute TDD task pairs autonomously with RED-GREEN-REFACTOR verification. Orchestrates wave-based execution with strategic parallelism, routing TDD tasks to tdd-executor agents and non-TDD tasks to standard task-executor. Use when user says "execute tdd tasks", "run tdd tasks", "start tdd execution", or wants to execute TDD-paired tasks from create-tdd-tasks.

install
source · Clone the upstream repo
git clone https://github.com/sequenzia/agent-alchemy
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/sequenzia/agent-alchemy "$T" && mkdir -p ~/.claude/skills && cp -r "$T/claude/tdd-tools/skills/execute-tdd-tasks" ~/.claude/skills/sequenzia-agent-alchemy-execute-tdd-tasks && rm -rf "$T"
manifest: claude/tdd-tools/skills/execute-tdd-tasks/SKILL.md
source content

Execute TDD Tasks Skill

This skill orchestrates autonomous execution of TDD task pairs generated by /create-tdd-tasks. It is the TDD counterpart to the standard execute-tasks skill, reusing its session management, wave infrastructure, and execution context sharing while adding TDD-specific agent routing, RED-GREEN-REFACTOR verification, and per-task compliance reporting.

The key difference from standard execute-tasks: this skill routes TDD tasks to the tdd-executor agent (from tdd-tools), which runs a 6-phase TDD workflow, while routing non-TDD tasks to the standard task-executor agent. It verifies TDD compliance (RED verified, GREEN verified, refactored) per task pair and reports aggregate results.

CRITICAL: Complete ALL 9 steps. The workflow is not complete until Step 9: Update CLAUDE.md is evaluated. After completing each step, immediately proceed to the next step without waiting for user prompts (except Step 4 which requires user confirmation).

Plugin Context

This skill is part of the tdd-tools plugin and uses agents from the same plugin:

  • tdd-executor agent (Opus) -- 6-phase TDD workflow per task
  • test-writer agent (Sonnet) -- parallel test generation (used by tdd-executor internally)

For non-TDD tasks, this skill routes to the task-executor agent from sdd-tools (soft cross-plugin dependency). Since TDD tasks are always generated from SDD tasks via /create-tasks, the sdd-tools plugin is expected to be installed when this skill runs.

Core Principles

1. TDD Compliance First

Every TDD task pair must complete the RED-GREEN-REFACTOR cycle:

  • RED: Tests are written and verified to fail before any implementation exists
  • GREEN: Implementation is written that makes all tests pass with zero regressions
  • REFACTOR: Code is cleaned up while keeping all tests green

2. Strategic Parallelism

Maximize execution throughput without violating TDD sequencing:

  • PARALLEL: Multiple test-writing tasks (RED phase) run simultaneously across features
  • SEQUENTIAL: Within a single TDD pair, RED must complete before GREEN can start (enforced by dependencies)

3. Reuse execute-tasks Infrastructure

Session management, wave execution, context sharing, and progress tracking all reuse the same patterns from execute-tasks. See references/tdd-execution-workflow.md for TDD-specific extensions.

4. Honest TDD Reporting

Report per-task compliance with the full RED-GREEN-REFACTOR cycle:

  • red_verified: Whether tests failed as expected before implementation
  • green_verified: Whether all tests pass after implementation
  • refactored: Whether code was cleaned up while maintaining green tests
  • coverage_delta: Change in test coverage percentage (if measurable)

Orchestration Workflow

This skill orchestrates TDD task execution through a 9-step loop that mirrors the standard execute-tasks orchestration with TDD-specific extensions. See references/tdd-execution-workflow.md for the full TDD wave execution details and references/tdd-verification-patterns.md for TDD phase verification rules.

Step 1: Load References

Read the TDD-specific reference files:

Read: ${CLAUDE_PLUGIN_ROOT}/skills/execute-tdd-tasks/references/tdd-execution-workflow.md
Read: ${CLAUDE_PLUGIN_ROOT}/skills/execute-tdd-tasks/references/tdd-verification-patterns.md

Parse arguments from the invocation:

  • --task-group <group> -- Filter tasks to a specific group
  • --max-parallel <n> -- Override max concurrent agents per wave
  • --retries <n> -- Override retry attempts per task (default: 3)
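The three flags above can be illustrated with a small parser. This is only a sketch: the skill itself is interpreted by the orchestrating agent rather than by a script, and the function name parse_invocation is hypothetical.

```python
import argparse


def parse_invocation(argv):
    """Parse the three documented flags from a list of argument strings.

    Hypothetical helper for illustration; only the flag names and the
    retries default (3) come from the skill description.
    """
    parser = argparse.ArgumentParser(prog="execute-tdd-tasks")
    parser.add_argument("--task-group", default=None)
    parser.add_argument("--max-parallel", type=int, default=None)
    parser.add_argument("--retries", type=int, default=3)
    return parser.parse_args(argv)


args = parse_invocation(["--task-group", "payments", "--max-parallel", "3"])
print(args.task_group, args.max_parallel, args.retries)
```

Leaving max_parallel as None when the flag is absent lets a later step fall back to the settings file or the default.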

Step 2: Load and Classify Tasks

Use TaskList to retrieve all tasks. If --task-group was provided, filter to tasks where metadata.task_group matches.

Classify each task by type:

| Detection | Type | Agent | Source |
|-----------|------|-------|--------|
| metadata.tdd_mode == true AND metadata.tdd_phase == "red" | TDD test task | tdd-executor | tdd-tools (same plugin) |
| metadata.tdd_mode == true AND metadata.tdd_phase == "green" | TDD implementation task | tdd-executor | tdd-tools (same plugin) |
| No tdd_mode metadata or tdd_mode == false | Non-TDD task | task-executor | sdd-tools (cross-plugin, soft dependency) |
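The classification rules above can be sketched as a single function. The task shape (a dict with a "metadata" key) and the function name are assumptions for illustration. Raising an error for an unrecognized tdd_phase is also an assumption, since the skill does not say how to handle that case.

```python
def classify_task(task):
    """Return (type_label, agent) for a task dict shaped like {"metadata": {...}}."""
    meta = task.get("metadata") or {}
    if meta.get("tdd_mode") is True:
        phase = meta.get("tdd_phase")
        if phase == "red":
            return ("TDD test task", "tdd-executor")
        if phase == "green":
            return ("TDD implementation task", "tdd-executor")
        # Assumption: an unknown phase is surfaced rather than silently routed.
        raise ValueError(f"tdd_mode is true but tdd_phase is {phase!r}")
    # Missing tdd_mode or tdd_mode == false both count as non-TDD.
    return ("Non-TDD task", "task-executor")
```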

Count and report:

  • Total tasks (pending + in_progress + completed)
  • TDD pairs identified (test + implementation tasks)
  • Non-TDD tasks
  • Already completed tasks

Handle edge cases:

  • No tasks found: Report "No tasks found for group '{group}'. Use /create-tdd-tasks to generate TDD task pairs from your SDD tasks." and stop.
  • All completed: Report a summary of completed tasks including TDD compliance and stop.
  • No unblocked tasks: Report which tasks exist and what's blocking them.

Step 3: Build Execution Plan

Resolve max_parallel using precedence:

  1. --max-parallel CLI argument (highest priority)
  2. max_parallel in .claude/agent-alchemy.local.md
  3. Default: 5

Resolve retries using precedence:

  1. --retries CLI argument (highest priority)
  2. Default: 3
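The two precedence chains above amount to "first value that is set wins". A minimal sketch, assuming the settings file has already been parsed into a dict (its on-disk format is not specified here) and using hypothetical function names:

```python
def resolve_max_parallel(cli_value=None, local_settings=None):
    """CLI flag first, then the local settings file, then the default of 5."""
    if cli_value is not None:
        return cli_value
    if local_settings and local_settings.get("max_parallel") is not None:
        return local_settings["max_parallel"]
    return 5


def resolve_retries(cli_value=None):
    """CLI flag first, then the default of 3."""
    return cli_value if cli_value is not None else 3
```

Comparing against None rather than relying on truthiness keeps an explicit value from being mistaken for "not provided".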

Read .claude/agent-alchemy.local.md if it exists, for TDD-specific settings:

  • tdd.strictness -- strict, normal (default), or relaxed
  • tdd.coverage-threshold -- Minimum coverage target (default: 80)

Build the dependency graph from all pending tasks (TDD and non-TDD):

  1. Collect all pending tasks and their blockedBy relationships
  2. Run topological sort to assign dependency levels
  3. Assign tasks to waves by dependency level (Wave 1 = no dependencies, Wave 2 = depends only on Wave 1, etc.)
  4. Sort within waves by priority: critical > high > medium > low > unprioritized
  5. Break ties by "unblocks most others"
  6. Cap each wave at max_parallel tasks
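The six steps above can be sketched as a level-by-level topological sort. The task shape and function name are hypothetical. Two choices are assumptions rather than documented behavior: dependencies on tasks outside the pending set are treated as already satisfied, and a level larger than max_parallel is split into consecutive waves.

```python
PRIORITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def build_waves(tasks, max_parallel=5):
    """tasks: {task_id: {"blockedBy": [ids], "priority": str or None}}.

    Returns (waves, unassigned). Tasks left in `unassigned` could not be
    placed in any level, which means they sit on a dependency cycle.
    """
    # Keep only dependencies on other pending tasks (assumed: others are done).
    deps = {t: {d for d in spec.get("blockedBy", []) if d in tasks}
            for t, spec in tasks.items()}
    # Count how many pending tasks each task unblocks (tie-breaker).
    dependents = {t: 0 for t in tasks}
    for blockers in deps.values():
        for d in blockers:
            dependents[d] += 1

    def sort_key(t):
        rank = PRIORITY_ORDER.get(tasks[t].get("priority"), len(PRIORITY_ORDER))
        return (rank, -dependents[t], str(t))

    waves, done, remaining = [], set(), set(tasks)
    while remaining:
        level = [t for t in remaining if deps[t] <= done]
        if not level:
            break  # nothing is unblocked: the rest form a cycle
        level.sort(key=sort_key)
        for i in range(0, len(level), max_parallel):
            waves.append(level[i:i + max_parallel])
        done.update(level)
        remaining.difference_update(level)
    return waves, sorted(remaining, key=str)
```

With two test tasks and their paired implementation tasks, this yields one RED wave followed by one GREEN wave, matching the alternating pattern the dependency structure is meant to produce.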

Annotate waves with TDD phase labels:

The dependency structure from create-tdd-tasks naturally produces alternating test/implementation waves:

Wave 1: [Test-A, Test-B, Test-C]         -- RED phase (parallel test generation)
Wave 2: [Impl-A, Impl-B, Impl-C]         -- GREEN phase (parallel implementation)
Wave 3: [Test-D, Test-E, Non-TDD-F]      -- RED phase + non-TDD tasks (mixed)
Wave 4: [Impl-D, Impl-E]                  -- GREEN phase

Detect circular dependencies: If tasks remain unassigned after topological sorting, they form a cycle. Report the cycle and attempt to break at the weakest link.

Validate TDD pair cross-references: For each TDD task, verify its paired_task_id references a valid task. Log warnings for orphaned pairs.
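The pair validation above could look like the following sketch. It assumes paired_task_id lives under each task's metadata and that tasks are held in a dict keyed by ID. Both are illustrative assumptions, as is the function name.

```python
def find_orphaned_pairs(tasks):
    """tasks: {task_id: {"metadata": {...}}}. Returns a list of warning strings."""
    warnings = []
    for task_id, task in tasks.items():
        meta = task.get("metadata") or {}
        if meta.get("tdd_mode") is not True:
            continue  # only TDD tasks carry a pair reference
        paired = meta.get("paired_task_id")
        if paired is None or paired not in tasks:
            warnings.append(
                f"Task {task_id}: paired_task_id {paired!r} does not reference a known task"
            )
    return warnings
```

The function only collects warnings, in line with the instruction to log orphaned pairs rather than abort.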

Step 4: Present Execution Plan and Confirm

Display the TDD execution plan:

EXECUTION PLAN (TDD Mode)

Tasks to execute: {count} ({tdd_pairs} TDD pairs, {non_tdd} non-TDD tasks)
Retry limit: {retries} per task
Max parallel: {max_parallel} per wave
TDD Strictness: {strict|normal|relaxed}

WAVE 1 ({n} tasks -- RED phase):
  1. [{id}] Write tests for {subject} (RED, paired: #{impl_id})
  2. [{id}] Write tests for {subject} (RED, paired: #{impl_id})

WAVE 2 ({n} tasks -- GREEN phase):
  3. [{id}] {subject} (GREEN, paired: #{test_id})
  4. [{id}] {subject} (GREEN, paired: #{test_id})

WAVE 3 ({n} tasks -- mixed):
  5. [{id}] {subject} (non-TDD)
  6. [{id}] Write tests for {subject} (RED, paired: #{impl_id})

{Additional waves...}

BLOCKED (unresolvable dependencies):
  [{id}] {subject} -- blocked by: {blocker ids}

COMPLETED:
  {count} tasks already completed

Use AskUserQuestion to confirm:

questions:
  - header: "Confirm TDD Execution"
    question: "Ready to execute {count} tasks in {wave_count} waves (max {max_parallel} parallel) with TDD enforcement ({strictness} mode)?"
    options:
      - label: "Yes, start TDD execution"
        description: "Proceed with the TDD execution plan above"
      - label: "Cancel"
        description: "Abort without executing any tasks"
    multiSelect: false

If the user selects "Cancel", report "Execution cancelled. No tasks were modified." and stop.

Step 5: Initialize Execution Directory

Generate a task_execution_id using three-tier resolution:

  1. IF --task-group was provided: {task_group}-tdd-{YYYYMMDD}-{HHMMSS}
  2. ELSE IF all open tasks share the same metadata.task_group: {task_group}-tdd-{YYYYMMDD}-{HHMMSS}
  3. ELSE: tdd-session-{YYYYMMDD}-{HHMMSS}
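The three-tier resolution above can be sketched as follows. The function name and the task dict shape are hypothetical, and using local time for the timestamp is an assumption.

```python
from datetime import datetime


def make_task_execution_id(task_group_arg, open_tasks, now=None):
    """Build the session ID from the --task-group value and the open tasks."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    # Tier 1: an explicit --task-group wins.
    if task_group_arg:
        return f"{task_group_arg}-tdd-{stamp}"
    # Tier 2: every open task shares one non-empty metadata.task_group.
    groups = {(t.get("metadata") or {}).get("task_group") for t in open_tasks}
    if len(groups) == 1:
        (group,) = groups
        if group:
            return f"{group}-tdd-{stamp}"
    # Tier 3: generic session name.
    return f"tdd-session-{stamp}"
```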

Clean stale live session: Follow the same procedure as execute-tasks:

  1. Check if .claude/sessions/__live_session__/ contains leftover files
  2. If found, archive to .claude/sessions/interrupted-{YYYYMMDD}-{HHMMSS}/
  3. Reset any in_progress tasks from the interrupted session to pending

Concurrency guard: Check for .claude/sessions/__live_session__/.lock. Follow the same lock protocol as execute-tasks.

Create session files in .claude/sessions/__live_session__/:

  1. execution_plan.md -- Save the TDD execution plan from Step 4
  2. execution_context.md -- Initialize with TDD-extended template:
    # Execution Context
    
    ## Project Patterns
    <!-- Discovered coding patterns, conventions, tech stack details -->
    
    ## Key Decisions
    <!-- Architecture decisions, approach choices made during execution -->
    
    ## Known Issues
    <!-- Problems encountered, workarounds applied, things to watch out for -->
    
    ## File Map
    <!-- Important files discovered and their purposes -->
    
    ## TDD Compliance
    | Task Pair | Test Task | Impl Task | RED | GREEN | Refactored | Coverage Delta |
    |-----------|-----------|-----------|-----|-------|------------|----------------|
    
    ## Task History
    <!-- Brief log of task outcomes with relevant context -->
    
  3. task_log.md -- Initialize with standard table headers:
    # Task Execution Log
    
    | Task ID | Subject | Type | Status | Attempts | Duration | Token Usage |
    |---------|---------|------|--------|----------|----------|-------------|
    
  4. tasks/ -- Empty subdirectory for archiving completed task files
  5. progress.md -- Initialize with status template:
    # Execution Progress (TDD Mode)
    Status: Initializing
    Wave: 0 of {total_waves}
    Max Parallel: {max_parallel}
    TDD Strictness: {strictness}
    Updated: {ISO 8601 timestamp}
    
    ## Active Tasks
    
    ## Completed This Session
    
  6. execution_pointer.md at $HOME/.claude/tasks/{CLAUDE_CODE_TASK_LIST_ID}/execution_pointer.md -- Absolute path to .claude/sessions/__live_session__/

Step 6: Initialize Execution Context

Read .claude/sessions/__live_session__/execution_context.md (created in Step 5).

If a prior execution session's context exists, look in .claude/sessions/ for the most recent timestamped subfolder and merge relevant learnings (Project Patterns, Key Decisions, Known Issues, File Map) into the new execution context.

Context compaction: If Task History has 10+ entries from merged sessions, compact older entries into a summary paragraph and keep the 5 most recent in full.
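The compaction rule above (10 or more entries: summarize the older ones, keep the 5 most recent in full) can be sketched on a list of entry strings. In practice the summary paragraph would be written by the orchestrator. Here it is approximated by joining each older entry's first line, which is purely an illustrative assumption, as is the function name.

```python
def compact_task_history(entries, threshold=10, keep_recent=5):
    """Return a compacted copy of the Task History entries (oldest first)."""
    if len(entries) < threshold:
        return list(entries)
    older, recent = entries[:-keep_recent], entries[-keep_recent:]
    # Stand-in for a written summary: the first line of each older entry.
    first_lines = [(e.strip().splitlines() or [""])[0] for e in older]
    summary = f"Earlier tasks ({len(older)} entries, compacted): " + "; ".join(first_lines)
    return [summary] + list(recent)
```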

Step 7: Execute Loop

Execute tasks in waves with TDD-aware agent routing. No user interaction between waves.

8a: Initialize Wave

  1. Identify all unblocked tasks (pending status, all dependencies completed)
  2. Sort by priority (critical > high > medium > low > unprioritized)
  3. Take up to max_parallel tasks for this wave
  4. If no unblocked tasks remain, exit the loop

8b: Snapshot Execution Context

Read .claude/sessions/__live_session__/execution_context.md and hold as baseline for this wave. All agents read from the same snapshot.

8c: Launch Wave Agents

  1. Mark all wave tasks as in_progress via TaskUpdate
  2. Record wave_start_time
  3. Update progress.md with active tasks
  4. Launch all wave agents simultaneously using parallel Task tool calls in a single message turn with run_in_background: true.

Record the background task_id mapping: After the Task tool returns for each agent, record the mapping {task_list_id → background_task_id} from each response. The background_task_id is needed later to call TaskOutput for process reaping and usage extraction.

Route each task to the correct agent:

For TDD tasks (metadata.tdd_mode == true), launch the tdd-executor agent (same plugin):

Task:
  subagent_type: tdd-executor
  mode: bypassPermissions
  run_in_background: true
  prompt: |
    Execute the following TDD task.

    Task ID: {id}
    Task Subject: {subject}
    Task Description:
    ---
    {full description}
    ---

    Task Metadata:
    - Priority: {priority}
    - Complexity: {complexity}
    - TDD Phase: {tdd_phase}
    - Paired Task ID: {paired_task_id}
    - TDD Strictness: {strictness}

    CONCURRENT EXECUTION MODE
    Context Write Path: .claude/sessions/__live_session__/context-task-{id}.md
    Result Write Path: .claude/sessions/__live_session__/result-task-{id}.md
    Do NOT write to execution_context.md directly.
    Do NOT update progress.md -- the orchestrator manages it.
    Write your learnings to the Context Write Path above instead.

    RESULT FILE PROTOCOL
    As your VERY LAST action (after writing context-task-{id}.md), write a compact
    result file to the Result Write Path above. TDD format includes a TDD Compliance
    section with RED Verified, GREEN Verified, Refactored, and Coverage Delta fields.
    After writing the result file, return ONLY: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}

    {If GREEN phase, include paired test task result data:}
    PAIRED TEST TASK OUTPUT:
    ---
    {test task result file content and context}
    ---
    The tests written by the paired test task are already on disk.
    Your job is to implement code that makes these tests pass (GREEN phase),
    then refactor while keeping tests green (REFACTOR phase).

    {If retry attempt:}
    RETRY ATTEMPT {n} of {max_retries}
    Previous TDD phase that failed: {RED|GREEN|REFACTOR}
    Previous attempt failed with:
    ---
    {previous failure details from result file}
    ---

    TDD-specific retry guidance:
    - If RED failed (tests cannot run): Check test syntax, imports, and framework config
    - If RED warned (tests passed unexpectedly): Verify tests target new behavior, not existing code
    - If GREEN failed (tests still failing): Re-read test assertions, try different implementation approach
    - If GREEN failed (regressions): Identify regression cause, fix without breaking new tests
    - If REFACTOR failed: Revert to pre-refactor state, try smaller refactoring steps

    Instructions (follow in order):
    1. Read the TDD execution and verification references
    2. Read .claude/sessions/__live_session__/execution_context.md for prior learnings
    3. Understand the task requirements and explore the codebase
    4. Execute the 6-phase TDD workflow (Understand, Write Tests, RED, Implement, GREEN, Complete)
    5. Verify TDD compliance (RED verified, GREEN verified, refactored)
    6. Update task status if PASS (mark completed)
    7. Write learnings to .claude/sessions/__live_session__/context-task-{id}.md
    8. Write result to .claude/sessions/__live_session__/result-task-{id}.md
    9. Return: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}

For non-TDD tasks (no tdd_mode metadata), launch the standard task-executor agent from sdd-tools (cross-plugin, resolved globally):

Task:
  subagent_type: task-executor
  mode: bypassPermissions
  run_in_background: true
  prompt: |
    Execute the following task.

    Task ID: {id}
    Task Subject: {subject}
    Task Description:
    ---
    {full description}
    ---

    Task Metadata:
    - Priority: {priority}
    - Complexity: {complexity}
    - Source Section: {source_section}

    CONCURRENT EXECUTION MODE
    Context Write Path: .claude/sessions/__live_session__/context-task-{id}.md
    Result Write Path: .claude/sessions/__live_session__/result-task-{id}.md
    Do NOT write to execution_context.md directly.
    Do NOT update progress.md -- the orchestrator manages it.
    Write your learnings to the Context Write Path above instead.

    RESULT FILE PROTOCOL
    As your VERY LAST action (after writing context-task-{id}.md), write a compact
    result file to the Result Write Path above. Standard format with status, verification
    summary, files modified, and issues sections.
    After writing the result file, return ONLY: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}

    {If retry attempt:}
    RETRY ATTEMPT {n} of {max_retries}
    Previous attempt failed with:
    ---
    {previous failure details from result file}
    ---
    Focus on fixing the specific failures listed above.

    Instructions (follow in order):
    1. Read the execute-tasks skill and reference files
    2. Read .claude/sessions/__live_session__/execution_context.md for prior learnings
    3. Understand the task requirements and explore the codebase
    4. Implement the necessary changes
    5. Verify against acceptance criteria
    6. Update task status if PASS (mark completed)
    7. Write learnings to .claude/sessions/__live_session__/context-task-{id}.md
    8. Write result to .claude/sessions/__live_session__/result-task-{id}.md
    9. Return: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}

Important: Always include the CONCURRENT EXECUTION MODE and RESULT FILE PROTOCOL sections regardless of max_parallel value. All agents write to per-task context files and result files.

  5. Poll for completion: After launching all background agents, poll for result files using poll-for-results.sh from execute-tasks. The script checks for result-task-{id}.md files every 15 seconds for up to 45 minutes, printing progress lines periodically. A single Bash invocation handles the entire polling lifecycle.

    Poll invocation (via Bash tool with timeout: 2760000):

    bash ${CLAUDE_PLUGIN_ROOT}/../sdd-tools/skills/execute-tasks/scripts/poll-for-results.sh \
      .claude/sessions/__live_session__ {task_ids...}
    

    Parse the output:

    • POLL_RESULT: ALL_DONE -- all agents finished. Proceed to 8d.
    • POLL_RESULT: TIMEOUT -- not all agents finished within the timeout window. Log the Waiting on: line and proceed to 8d (handles missing result files via TaskOutput fallback).
    • Bash tool timeout or no recognizable output -- treat as timeout. Proceed to 8d.
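The three output cases above reduce to a small parser. This sketch assumes the POLL_RESULT: and Waiting on: markers appear at the start of their own lines, since the script's exact output layout is not shown here. The function name is hypothetical.

```python
def parse_poll_output(output):
    """Return (status, waiting_on), where status is "ALL_DONE" or "TIMEOUT".

    Missing or unrecognizable output is treated as a timeout.
    """
    status, waiting_on = "TIMEOUT", None
    for line in (output or "").splitlines():
        line = line.strip()
        if line.startswith("POLL_RESULT:"):
            value = line.split(":", 1)[1].strip()
            if value in ("ALL_DONE", "TIMEOUT"):
                status = value
        elif line.startswith("Waiting on:"):
            waiting_on = line.split(":", 1)[1].strip()
    return status, waiting_on
```

Every outcome proceeds to 8d, so the status mainly decides whether the Waiting on: line gets logged.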

8d: Process Results (Batch)

After polling completes, process all wave results in a single batch:

  1. Reap background agents and extract usage: For each task in the wave, call TaskOutput(task_id=<background_task_id>, block=true, timeout=60000) using the mapping recorded in 8c. This serves two purposes:

    • Process reaping: Terminates the background agent process (prevents lingering subagents)
    • Usage extraction: Returns metadata with duration_ms and total_tokens per agent

    Extract per-task values:

    • task_duration: From duration_ms in TaskOutput metadata. Format: <60s = {s}s, <60m = {m}m {s}s, >=60m = {h}h {m}m {s}s
    • task_tokens: From total_tokens in TaskOutput metadata. Format with comma separators (e.g., 45,230)

    If TaskOutput times out (agent truly stuck), call TaskStop(task_id=<background_task_id>) to force-terminate the process, then set task_duration = "N/A" and task_tokens = "N/A".

  2. Read result files: For each task in the wave, read .claude/sessions/__live_session__/result-task-{id}.md. Parse status, attempt, verification, files modified, and issues. For TDD tasks, also parse the ## TDD Compliance section (RED Verified, GREEN Verified, Refactored, Coverage Delta).

  3. Handle missing result files: If a result file is missing after polling, the TaskOutput call in step 1 already captured diagnostic output for the crashed agent. Treat as FAIL.

  4. Determine task type label: TDD/RED, TDD/GREEN, or non-TDD

  5. Log status for each task:

    [{id}] {subject}: {PASS|PARTIAL|FAIL} ({type})

  6. Batch update task_log.md: Read once, append ALL wave rows, Write once:

    | {id} | {subject} | {TDD/RED|TDD/GREEN|non-TDD} | {PASS/PARTIAL/FAIL} | {attempt}/{max} | {task_duration} | {task_tokens} |
    

    Where {task_duration} and {task_tokens} come from the TaskOutput metadata extracted in step 1.

  7. Batch update progress.md: Read once, move ALL completed tasks from Active to Completed, Write once.

  8. For TDD tasks: Extract TDD compliance data from result files and update the ## TDD Compliance table in execution_context.md

Context append fallback: If a result file is missing but TaskOutput contains a LEARNINGS: section, write those learnings to context-task-{id}.md on behalf of the agent.
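The task_duration and task_tokens formats from step 1 can be sketched as two helpers. The function names are hypothetical, and truncating sub-second remainders (rather than rounding) is an assumption.

```python
def format_duration(duration_ms):
    """Format milliseconds as {s}s, {m}m {s}s, or {h}h {m}m {s}s."""
    total_seconds = int(duration_ms // 1000)  # assumption: drop partial seconds
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    if total_seconds < 60:
        return f"{seconds}s"
    if total_seconds < 3600:
        return f"{minutes}m {seconds}s"
    return f"{hours}h {minutes}m {seconds}s"


def format_tokens(total_tokens):
    """Format a token count with comma separators."""
    return f"{total_tokens:,}"
```

For a stuck agent that had to be stopped, both values are set to "N/A" instead of being passed through these helpers.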

8e: Within-Wave Retry

After batch processing identifies failed tasks:

  1. Collect all failed tasks with retries remaining
  2. For each retriable task:
    • Read failure details from result-task-{id}.md (Issues and TDD Compliance sections)
    • Delete the old result-task-{id}.md file before re-launching
    • Launch a new background agent (run_in_background: true) with failure context from the result file
    • Record the new background_task_id from each Task tool response (same mapping as 8c)
    • For TDD tasks, include TDD-specific retry guidance in the prompt
    • Update progress.md: - [{id}] {subject} -- Retrying ({n}/{max})
  3. If any retry agents were launched:
    • Poll for retry result files using poll-for-results.sh (same pattern as 8c step 5, with only the retry task IDs as arguments and timeout: 2760000 on the Bash invocation)
    • After polling completes, reap retry agents: call TaskOutput on each retry background_task_id to extract duration_ms and total_tokens (same pattern as 8d step 1). If TaskOutput times out, call TaskStop to force-terminate.
    • Process retry results using the same batch approach as 8d (using the freshly extracted per-task duration and token values for task_log rows)
    • Repeat 8e if any retries still have attempts remaining
  4. If retries exhausted:
    • Leave task as in_progress
    • Log final failure
    • Retain the result file for post-analysis
    • For TDD test tasks: The paired implementation task remains blocked and will not execute

Test-writer agent failure fallback: If a TDD test task (RED phase) fails after all retries, the paired implementation task remains blocked. Do NOT fall back to running implementation without tests -- this would violate TDD principles.

8f: Merge Context and Clean Up After Wave

After ALL agents in the current wave have completed (including retries):

  1. Read .claude/sessions/__live_session__/execution_context.md
  2. Read all context-task-{id}.md files in task ID order
  3. Append each file's content to the ## Task History section
  4. For completed TDD tasks: Update the ## TDD Compliance table with pair results (extracted from result files in 8d)
  5. Write the complete updated execution_context.md
  6. Delete the context-task-{id}.md files
  7. Clean up result files: Delete result-task-{id}.md for PASS tasks. Retain result-task-{id}.md for FAIL tasks (post-analysis). For TDD test tasks (RED): Retain the result file until the paired GREEN task completes -- the orchestrator reads stored result data for PAIRED TEST TASK OUTPUT injection in the next wave.

Capture test task result data for GREEN phase injection: When processing a completed test task (RED phase), read the result file content and store it for injection into the paired implementation task's prompt in the next wave. Delete the retained RED result file after the paired GREEN task's wave completes.
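The context merge in steps 1 to 6 above can be sketched on in-memory strings. The sketch assumes numeric task IDs and that ## Task History is the last section of execution_context.md (as in the Step 5 template), so appending to the end of the file appends to that section. The function name is hypothetical.

```python
def merge_task_contexts(execution_context, task_contexts):
    """Append per-task context text to the Task History section.

    execution_context: current text of execution_context.md
    task_contexts: {task_id: text of context-task-{id}.md}
    """
    merged = execution_context.rstrip("\n")
    if "## Task History" not in merged:
        merged += "\n\n## Task History"
    # Task ID order; assumes IDs are integers or numeric strings.
    for task_id in sorted(task_contexts, key=int):
        merged += "\n\n" + task_contexts[task_id].strip("\n")
    return merged + "\n"
```

Because agents only ever write to their own context-task-{id}.md, the orchestrator is the single writer of execution_context.md and the merge needs no locking between agents.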

8g: Rebuild Next Wave and Archive

  1. Archive completed task files to .claude/sessions/__live_session__/tasks/
  2. Refresh task list via TaskList
  3. Check for newly unblocked tasks (especially implementation tasks unblocked by their paired test tasks)
  4. Form next wave using priority sort
  5. If no unblocked tasks remain, exit the loop
  6. Loop back to 8a

Step 8: Session Summary

Write final progress.md with complete status. Display the TDD execution summary:

TDD EXECUTION SUMMARY

Tasks executed: {total attempted}
  TDD Pairs: {pair_count}
  Non-TDD: {non_tdd_count}
  Passed: {count}
  Partial: {count}
  Failed: {count} (after {total retries} total retry attempts)

TDD COMPLIANCE:
| Task Pair | Test Task | Impl Task | RED | GREEN | Refactored | Coverage Delta |
|-----------|-----------|-----------|-----|-------|------------|----------------|
| {feature} | #{test_id} ({status}) | #{impl_id} ({status}) | {Yes/No} | {Yes/No} | {Yes/No/N/A} | {+/-pct or N/A} |
...

TDD Compliance Rate: {compliant_pairs}/{total_pairs} ({percentage}%)

Waves completed: {wave_count}
Max parallel: {max_parallel}
TDD Strictness: {strictness}
Total execution time: {sum of all task duration_ms values, formatted}
Token Usage: {sum of all task total_tokens values, formatted with commas}

Remaining:
  Pending: {count}
  In Progress (failed): {count}
  Blocked: {count}

{If any tasks failed:}
FAILED TASKS:
  [{id}] {subject} -- {brief failure reason} ({TDD phase if applicable})

{If newly unblocked tasks were discovered:}
NEWLY UNBLOCKED:
  [{id}] {subject} -- unblocked by completion of [{blocker_id}]

After displaying the summary:

  1. Save session_summary.md to .claude/sessions/__live_session__/ with full summary content
  2. Archive the session: move all contents from __live_session__/ to .claude/sessions/{task_execution_id}/
  3. Leave __live_session__/ as an empty directory
  4. execution_pointer.md stays pointing to __live_session__/

Step 9: Update CLAUDE.md

Review .claude/sessions/{task_execution_id}/execution_context.md for project-wide changes.

Update CLAUDE.md if the session introduced:

  • New architectural patterns or conventions
  • New dependencies or tech stack changes
  • New development commands or workflows
  • Changes to project structure
  • Important design decisions

Skip if only task-specific or TDD-internal implementation details.

Agent Routing Summary

| Task Type | Detection | Agent | Plugin | Workflow |
|-----------|-----------|-------|--------|----------|
| TDD test (RED) | tdd_mode: true, tdd_phase: "red" | tdd-executor | tdd-tools | 6-phase TDD |
| TDD impl (GREEN) | tdd_mode: true, tdd_phase: "green" | tdd-executor | tdd-tools | 6-phase TDD |
| Non-TDD | No tdd_mode or tdd_mode: false | task-executor | sdd-tools (cross-plugin) | 4-phase standard |

TDD Verification Rules

See references/tdd-verification-patterns.md for complete verification rules.

Quick reference:

| Phase | PASS | FAIL |
|-------|------|------|
| RED | All new tests fail as expected | Tests cannot run or syntax errors |
| GREEN | All tests pass, zero regressions | New tests still failing after implementation |
| REFACTOR | All tests green after cleanup | Tests broke and cannot recover |

Strictness levels (from the .claude/agent-alchemy.local.md tdd.strictness setting):

| Level | RED Behavior | Impact |
|-------|--------------|--------|
| strict | Tests passing unexpectedly = FAIL | Blocks GREEN phase |
| normal (default) | Tests passing unexpectedly = WARN | Proceeds with warning |
| relaxed | Tests passing unexpectedly = INFO | Proceeds, informational only |
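The strictness levels above can be read as a lookup from setting to severity. The sketch below covers only the case where the new tests ran. The "tests cannot run" failure is handled separately by the verification rules. Function and constant names are hypothetical.

```python
UNEXPECTED_PASS_SEVERITY = {"strict": "FAIL", "normal": "WARN", "relaxed": "INFO"}


def red_phase_outcome(tests_failed_as_expected, strictness="normal"):
    """Outcome of RED verification when the new tests were able to run."""
    if tests_failed_as_expected:
        return "PASS"
    # New tests passed before any implementation existed.
    return UNEXPECTED_PASS_SEVERITY[strictness]


def blocks_green(outcome):
    """Only a FAIL outcome stops the paired GREEN task from proceeding."""
    return outcome == "FAIL"
```

For example, red_phase_outcome(False, "strict") yields "FAIL" and blocks the GREEN phase, while the same situation under "relaxed" yields "INFO" and execution continues.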

Key Behaviors

  • Autonomous execution loop: After user confirms the plan, no further prompts between tasks
  • Background agent execution: Agents run as background tasks (run_in_background: true), returning ~3 lines instead of ~100+ lines of full output. This reduces orchestrator context consumption by ~79% per wave.
  • Agent process reaping: After polling confirms result files exist, the orchestrator calls TaskOutput on each background task_id to reap the process and extract per-task duration_ms and total_tokens usage metadata. If TaskOutput times out, TaskStop force-terminates the stuck agent. This prevents lingering background processes.
  • Result file protocol: Each agent writes a compact result-task-{id}.md as its very last action. TDD result files include a ## TDD Compliance section. The orchestrator polls for these files via poll-for-results.sh in a single Bash invocation (with timeout: 2760000), then batch-reads them for processing.
  • Batched session file updates: task_log.md and progress.md are updated once per wave (batch read-modify-write) instead of per-task.
  • Wave-based TDD parallelism: Test tasks (RED) in one wave, their paired implementation tasks (GREEN) in the next. Multiple features run in parallel within a wave
  • Agent routing by metadata: TDD tasks go to tdd-executor, non-TDD tasks go to task-executor
  • Per-task context isolation: Each agent writes to context-task-{id}.md, orchestrator merges after each wave
  • Test-to-implementation context flow: Test task result data is read from disk and injected into the paired implementation task's prompt via PAIRED TEST TASK OUTPUT. RED result files are retained across waves until the paired GREEN task completes.
  • Within-wave retry: Failed tasks with retries remaining are re-launched as background agents. The orchestrator polls for retry result files using poll-for-results.sh (same pattern as initial wave polling).
  • No silent degradation: If TDD test task fails, its paired implementation task stays blocked. Never run implementation without tests
  • TDD compliance tracking: Per-pair tracking of RED/GREEN/REFACTOR verification extracted from result files
  • Configurable strictness: strict, normal, or relaxed TDD enforcement via settings
  • Single-session invariant: Only one execution session at a time, enforced by .lock file
  • Interrupted session recovery: Stale sessions archived, interrupted tasks reset to pending

Example Usage

Execute all TDD tasks

/agent-alchemy-tdd:execute-tdd-tasks

Execute TDD tasks for a specific group

/agent-alchemy-tdd:execute-tdd-tasks --task-group user-authentication

Execute with limited parallelism

/agent-alchemy-tdd:execute-tdd-tasks --max-parallel 2

Execute sequentially (no concurrency)

/agent-alchemy-tdd:execute-tdd-tasks --max-parallel 1

Execute with custom retries

/agent-alchemy-tdd:execute-tdd-tasks --retries 1

Execute group with custom parallelism and retries

/agent-alchemy-tdd:execute-tdd-tasks --task-group payments --max-parallel 3 --retries 1

Reference Files

  • references/tdd-execution-workflow.md -- TDD-aware wave execution, agent spawning, context sharing between RED and GREEN phases
  • references/tdd-verification-patterns.md -- RED/GREEN/REFACTOR verification rules, compliance reporting, status determination matrix