Agent-alchemy execute-tdd-tasks

Execute TDD task pairs autonomously with RED-GREEN-REFACTOR verification. Orchestrates wave-based execution with strategic parallelism, routing TDD tasks to tdd-executor agents and non-TDD tasks to standard task-executor. Use when user says "execute tdd tasks", "run tdd tasks", "start tdd execution", or wants to execute TDD-paired tasks from create-tdd-tasks.

install

source · Clone the upstream repo

git clone https://github.com/sequenzia/agent-alchemy

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/sequenzia/agent-alchemy "$T" && mkdir -p ~/.claude/skills && cp -r "$T/claude/tdd-tools/skills/execute-tdd-tasks" ~/.claude/skills/sequenzia-agent-alchemy-execute-tdd-tasks && rm -rf "$T"

manifest: claude/tdd-tools/skills/execute-tdd-tasks/SKILL.md

source content

Execute TDD Tasks Skill

This skill orchestrates autonomous execution of TDD task pairs generated by

/create-tdd-tasks

. It is the TDD counterpart to the standard

execute-tasks

skill, reusing its session management, wave infrastructure, and execution context sharing while adding TDD-specific agent routing, RED-GREEN-REFACTOR verification, and per-task compliance reporting.

The key difference from standard

execute-tasks

: this skill routes TDD tasks to the

tdd-executor

agent (from

tdd-tools

) which runs a 6-phase TDD workflow, while routing non-TDD tasks to the standard

task-executor

agent. It verifies TDD compliance (RED verified, GREEN verified, refactored) per task pair and reports aggregate results.

CRITICAL: Complete ALL 9 steps. The workflow is not complete until Step 9: Update CLAUDE.md is evaluated. After completing each step, immediately proceed to the next step without waiting for user prompts (except Step 4 which requires user confirmation).

Plugin Context

This skill is part of the

tdd-tools

plugin and uses agents from the same plugin:

tdd-executor agent (Opus) -- 6-phase TDD workflow per task
test-writer agent (Sonnet) -- parallel test generation (used by tdd-executor internally)

For non-TDD tasks, this skill routes to the

task-executor

agent from

sdd-tools

(soft cross-plugin dependency). Since TDD tasks are always generated from SDD tasks via

/create-tasks

, the

sdd-tools

plugin is expected to be installed when this skill runs.

Core Principles

1. TDD Compliance First

Every TDD task pair must complete the RED-GREEN-REFACTOR cycle:

RED: Tests are written and verified to fail before any implementation exists
GREEN: Implementation is written that makes all tests pass with zero regressions
REFACTOR: Code is cleaned up while keeping all tests green

2. Strategic Parallelism

Maximize execution throughput without violating TDD sequencing:

PARALLEL: Multiple test-writing tasks (RED phase) run simultaneously across features
SEQUENTIAL: Within a single TDD pair, RED must complete before GREEN can start (enforced by dependencies)

3. Reuse execute-tasks Infrastructure

Session management, wave execution, context sharing, and progress tracking all reuse the same patterns from

execute-tasks

. See

references/tdd-execution-workflow.md

for TDD-specific extensions.

4. Honest TDD Reporting

Report per-task compliance with the full RED-GREEN-REFACTOR cycle:

```
red_verified
```
: Whether tests failed as expected before implementation
```
green_verified
```
: Whether all tests pass after implementation
```
refactored
```
: Whether code was cleaned up while maintaining green tests
```
coverage_delta
```
: Change in test coverage percentage (if measurable)

Orchestration Workflow

This skill orchestrates TDD task execution through a 9-step loop that mirrors the standard

execute-tasks

orchestration with TDD-specific extensions. See

references/tdd-execution-workflow.md

for the full TDD wave execution details and

references/tdd-verification-patterns.md

for TDD phase verification rules.

Step 1: Load References

Read the TDD-specific reference files:

Read: ${CLAUDE_PLUGIN_ROOT}/skills/execute-tdd-tasks/references/tdd-execution-workflow.md
Read: ${CLAUDE_PLUGIN_ROOT}/skills/execute-tdd-tasks/references/tdd-verification-patterns.md

Parse arguments from the invocation:

```
--task-group <group>
```
-- Filter tasks to a specific group
```
--max-parallel <n>
```
-- Override max concurrent agents per wave
```
--retries <n>
```
-- Override retry attempts per task (default: 3)

Step 2: Load and Classify Tasks

Use

TaskList

to retrieve all tasks. If

--task-group

was provided, filter to tasks where

metadata.task_group

matches.

Classify each task by type:

Detection Type Agent Source

metadata.tdd_mode == true

AND

metadata.tdd_phase == "red"

TDD test task

tdd-executor

tdd-tools (same plugin)

metadata.tdd_mode == true

AND

metadata.tdd_phase == "green"

TDD implementation task

tdd-executor

tdd-tools (same plugin)

tdd_mode

metadata or

tdd_mode == false

Non-TDD task

task-executor

sdd-tools (cross-plugin, soft dependency)

Count and report:

Total tasks (pending + in_progress + completed)
TDD pairs identified (test + implementation tasks)
Non-TDD tasks
Already completed tasks

Handle edge cases:

No tasks found: Report "No tasks found for group '{group}'. Use
```
/create-tdd-tasks
```
to generate TDD task pairs from your SDD tasks." and stop.
All completed: Report a summary of completed tasks including TDD compliance and stop.
No unblocked tasks: Report which tasks exist and what's blocking them.

Step 3: Build Execution Plan

Resolve

max_parallel

using precedence:

```
--max-parallel
```
CLI argument (highest priority)

max_parallel

.claude/agent-alchemy.local.md

Default: 5

Resolve

retries

using precedence:

```
--retries
```
CLI argument (highest priority)
Default: 3

Read

.claude/agent-alchemy.local.md

if it exists, for TDD-specific settings:

```
tdd.strictness
```
--
```
strict
```
,
```
normal
```
(default), or
```
relaxed
```
```
tdd.coverage-threshold
```
-- Minimum coverage target (default: 80)

Build the dependency graph from all pending tasks (TDD and non-TDD):

Collect all pending tasks and their
```
blockedBy
```
relationships
Run topological sort to assign dependency levels
Assign tasks to waves by dependency level (Wave 1 = no dependencies, Wave 2 = depends only on Wave 1, etc.)
Sort within waves by priority: critical > high > medium > low > unprioritized
Break ties by "unblocks most others"
Cap each wave at
```
max_parallel
```
tasks

Annotate waves with TDD phase labels:

The dependency structure from

create-tdd-tasks

naturally produces alternating test/implementation waves:

Wave 1: [Test-A, Test-B, Test-C]         -- RED phase (parallel test generation)
Wave 2: [Impl-A, Impl-B, Impl-C]         -- GREEN phase (parallel implementation)
Wave 3: [Test-D, Test-E, Non-TDD-F]      -- RED phase + non-TDD tasks (mixed)
Wave 4: [Impl-D, Impl-E]                  -- GREEN phase

Detect circular dependencies: If tasks remain unassigned after topological sorting, they form a cycle. Report the cycle and attempt to break at the weakest link.

Validate TDD pair cross-references: For each TDD task, verify its

paired_task_id

references a valid task. Log warnings for orphaned pairs.

Step 4: Present Execution Plan and Confirm

Display the TDD execution plan:

EXECUTION PLAN (TDD Mode)

Tasks to execute: {count} ({tdd_pairs} TDD pairs, {non_tdd} non-TDD tasks)
Retry limit: {retries} per task
Max parallel: {max_parallel} per wave
TDD Strictness: {strict|normal|relaxed}

WAVE 1 ({n} tasks -- RED phase):
  1. [{id}] Write tests for {subject} (RED, paired: #{impl_id})
  2. [{id}] Write tests for {subject} (RED, paired: #{impl_id})

WAVE 2 ({n} tasks -- GREEN phase):
  3. [{id}] {subject} (GREEN, paired: #{test_id})
  4. [{id}] {subject} (GREEN, paired: #{test_id})

WAVE 3 ({n} tasks -- mixed):
  5. [{id}] {subject} (non-TDD)
  6. [{id}] Write tests for {subject} (RED, paired: #{impl_id})

{Additional waves...}

BLOCKED (unresolvable dependencies):
  [{id}] {subject} -- blocked by: {blocker ids}

COMPLETED:
  {count} tasks already completed

Use

AskUserQuestion

to confirm:

questions:
  - header: "Confirm TDD Execution"
    question: "Ready to execute {count} tasks in {wave_count} waves (max {max_parallel} parallel) with TDD enforcement ({strictness} mode)?"
    options:
      - label: "Yes, start TDD execution"
        description: "Proceed with the TDD execution plan above"
      - label: "Cancel"
        description: "Abort without executing any tasks"
    multiSelect: false

If the user selects "Cancel", report "Execution cancelled. No tasks were modified." and stop.

Step 5: Initialize Execution Directory

Generate a

task_execution_id

using three-tier resolution:

--task-group

was provided:

{task_group}-tdd-{YYYYMMDD}-{HHMMSS}

ELSE IF all open tasks share the same

metadata.task_group

{task_group}-tdd-{YYYYMMDD}-{HHMMSS}

ELSE:
```
tdd-session-{YYYYMMDD}-{HHMMSS}
```

Clean stale live session: Follow the same procedure as

execute-tasks

Check if
```
.claude/sessions/__live_session__/
```
contains leftover files

If found, archive to

.claude/sessions/interrupted-{YYYYMMDD}-{HHMMSS}/

Reset any
```
in_progress
```
tasks from the interrupted session to
```
pending
```

Concurrency guard: Check for

.claude/sessions/__live_session__/.lock

. Follow the same lock protocol as

execute-tasks

Create session files in

.claude/sessions/__live_session__/

execution_plan.md
-- Save the TDD execution plan from Step 5

execution_context.md
-- Initialize with TDD-extended template:

# Execution Context

## Project Patterns
<!-- Discovered coding patterns, conventions, tech stack details -->

## Key Decisions
<!-- Architecture decisions, approach choices made during execution -->

## Known Issues
<!-- Problems encountered, workarounds applied, things to watch out for -->

## File Map
<!-- Important files discovered and their purposes -->

## TDD Compliance
| Task Pair | Test Task | Impl Task | RED | GREEN | Refactored | Coverage Delta |
|-----------|-----------|-----------|-----|-------|------------|----------------|

## Task History
<!-- Brief log of task outcomes with relevant context -->

task_log.md
-- Initialize with standard table headers:

# Task Execution Log

| Task ID | Subject | Type | Status | Attempts | Duration | Token Usage |
|---------|---------|------|--------|----------|----------|-------------|

tasks/
-- Empty subdirectory for archiving completed task files

progress.md
-- Initialize with status template:

# Execution Progress (TDD Mode)
Status: Initializing
Wave: 0 of {total_waves}
Max Parallel: {max_parallel}
TDD Strictness: {strictness}
Updated: {ISO 8601 timestamp}

## Active Tasks

## Completed This Session

execution_pointer.md
at

$HOME/.claude/tasks/{CLAUDE_CODE_TASK_LIST_ID}/execution_pointer.md

-- Absolute path to

.claude/sessions/__live_session__/

Step 6: Initialize Execution Context

Read

.claude/sessions/__live_session__/execution_context.md

(created in Step 6).

If a prior execution session's context exists, look in

.claude/sessions/

for the most recent timestamped subfolder and merge relevant learnings (Project Patterns, Key Decisions, Known Issues, File Map) into the new execution context.

Context compaction: If Task History has 10+ entries from merged sessions, compact older entries into a summary paragraph and keep the 5 most recent in full.

Step 7: Execute Loop

Execute tasks in waves with TDD-aware agent routing. No user interaction between waves.

8a: Initialize Wave

Identify all unblocked tasks (pending status, all dependencies completed)
Sort by priority (critical > high > medium > low > unprioritized)
Take up to
```
max_parallel
```
tasks for this wave
If no unblocked tasks remain, exit the loop

8b: Snapshot Execution Context

Read

.claude/sessions/__live_session__/execution_context.md

and hold as baseline for this wave. All agents read from the same snapshot.

8c: Launch Wave Agents

Mark all wave tasks as
```
in_progress
```
via
```
TaskUpdate
```
Record
```
wave_start_time
```
Update
```
progress.md
```
with active tasks
Launch all wave agents simultaneously using parallel Task tool calls in a single message turn with
```
run_in_background: true
```
.

Record the background task_id mapping: After the Task tool returns for each agent, record the mapping

{task_list_id → background_task_id}

from each response. The

background_task_id

is needed later to call

TaskOutput

for process reaping and usage extraction.

Route each task to the correct agent:

For TDD tasks (

metadata.tdd_mode == true

), launch the

tdd-executor

agent (same plugin):

Task:
  subagent_type: tdd-executor
  mode: bypassPermissions
  run_in_background: true
  prompt: |
    Execute the following TDD task.

    Task ID: {id}
    Task Subject: {subject}
    Task Description:
    ---
    {full description}
    ---

    Task Metadata:
    - Priority: {priority}
    - Complexity: {complexity}
    - TDD Phase: {tdd_phase}
    - Paired Task ID: {paired_task_id}
    - TDD Strictness: {strictness}

    CONCURRENT EXECUTION MODE
    Context Write Path: .claude/sessions/__live_session__/context-task-{id}.md
    Result Write Path: .claude/sessions/__live_session__/result-task-{id}.md
    Do NOT write to execution_context.md directly.
    Do NOT update progress.md -- the orchestrator manages it.
    Write your learnings to the Context Write Path above instead.

    RESULT FILE PROTOCOL
    As your VERY LAST action (after writing context-task-{id}.md), write a compact
    result file to the Result Write Path above. TDD format includes a TDD Compliance
    section with RED Verified, GREEN Verified, Refactored, and Coverage Delta fields.
    After writing the result file, return ONLY: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}

    {If GREEN phase, include paired test task result data:}
    PAIRED TEST TASK OUTPUT:
    ---
    {test task result file content and context}
    ---
    The tests written by the paired test task are already on disk.
    Your job is to implement code that makes these tests pass (GREEN phase),
    then refactor while keeping tests green (REFACTOR phase).

    {If retry attempt:}
    RETRY ATTEMPT {n} of {max_retries}
    Previous TDD phase that failed: {RED|GREEN|REFACTOR}
    Previous attempt failed with:
    ---
    {previous failure details from result file}
    ---

    TDD-specific retry guidance:
    - If RED failed (tests cannot run): Check test syntax, imports, and framework config
    - If RED warned (tests passed unexpectedly): Verify tests target new behavior, not existing code
    - If GREEN failed (tests still failing): Re-read test assertions, try different implementation approach
    - If GREEN failed (regressions): Identify regression cause, fix without breaking new tests
    - If REFACTOR failed: Revert to pre-refactor state, try smaller refactoring steps

    Instructions (follow in order):
    1. Read the TDD execution and verification references
    2. Read .claude/sessions/__live_session__/execution_context.md for prior learnings
    3. Understand the task requirements and explore the codebase
    4. Execute the 6-phase TDD workflow (Understand, Write Tests, RED, Implement, GREEN, Complete)
    5. Verify TDD compliance (RED verified, GREEN verified, refactored)
    6. Update task status if PASS (mark completed)
    7. Write learnings to .claude/sessions/__live_session__/context-task-{id}.md
    8. Write result to .claude/sessions/__live_session__/result-task-{id}.md
    9. Return: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}

For non-TDD tasks (no

tdd_mode

metadata), launch the standard

task-executor

agent from

sdd-tools

(cross-plugin, resolved globally):

Task:
  subagent_type: task-executor
  mode: bypassPermissions
  run_in_background: true
  prompt: |
    Execute the following task.

    Task ID: {id}
    Task Subject: {subject}
    Task Description:
    ---
    {full description}
    ---

    Task Metadata:
    - Priority: {priority}
    - Complexity: {complexity}
    - Source Section: {source_section}

    CONCURRENT EXECUTION MODE
    Context Write Path: .claude/sessions/__live_session__/context-task-{id}.md
    Result Write Path: .claude/sessions/__live_session__/result-task-{id}.md
    Do NOT write to execution_context.md directly.
    Do NOT update progress.md -- the orchestrator manages it.
    Write your learnings to the Context Write Path above instead.

    RESULT FILE PROTOCOL
    As your VERY LAST action (after writing context-task-{id}.md), write a compact
    result file to the Result Write Path above. Standard format with status, verification
    summary, files modified, and issues sections.
    After writing the result file, return ONLY: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}

    {If retry attempt:}
    RETRY ATTEMPT {n} of {max_retries}
    Previous attempt failed with:
    ---
    {previous failure details from result file}
    ---
    Focus on fixing the specific failures listed above.

    Instructions (follow in order):
    1. Read the execute-tasks skill and reference files
    2. Read .claude/sessions/__live_session__/execution_context.md for prior learnings
    3. Understand the task requirements and explore the codebase
    4. Implement the necessary changes
    5. Verify against acceptance criteria
    6. Update task status if PASS (mark completed)
    7. Write learnings to .claude/sessions/__live_session__/context-task-{id}.md
    8. Write result to .claude/sessions/__live_session__/result-task-{id}.md
    9. Return: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}

Important: Always include the

CONCURRENT EXECUTION MODE

and

RESULT FILE PROTOCOL

sections regardless of

max_parallel

value. All agents write to per-task context files and result files.

Poll for completion: After launching all background agents, poll for result files using
```
poll-for-results.sh
```
from
```
execute-tasks
```
. The script checks for
```
result-task-{id}.md
```
files every 15 seconds for up to 45 minutes, printing progress lines periodically. A single Bash invocation handles the entire polling lifecycle.

Poll invocation (via Bash tool with
```
timeout: 2760000
```
):
```
bash ${CLAUDE_PLUGIN_ROOT}/../sdd-tools/skills/execute-tasks/scripts/poll-for-results.sh \
  .claude/sessions/__live_session__ {task_ids...}
```
Parse the output:
- ```
POLL_RESULT: ALL_DONE
```
  — all agents finished. Proceed to 8d.
- ```
POLL_RESULT: TIMEOUT
```
  — not all agents finished within the timeout window. Log the
```
Waiting on:
```
  line and proceed to 8d (handles missing result files via TaskOutput fallback).
- Bash tool timeout or no recognizable output — treat as timeout. Proceed to 8d.

8d: Process Results (Batch)

After polling completes, process all wave results in a single batch:

Reap background agents and extract usage: For each task in the wave, call
```
TaskOutput(task_id=<background_task_id>, block=true, timeout=60000)
```
using the mapping recorded in 8c. This serves two purposes:
- Process reaping: Terminates the background agent process (prevents lingering subagents)
- Usage extraction: Returns metadata with
```
duration_ms
```
  and
```
total_tokens
```
  per agent
Extract per-task values:
- ```
task_duration
```
  : From
```
duration_ms
```
  in TaskOutput metadata. Format: <60s =
```
{s}s
```
  , <60m =
```
{m}m {s}s
```
  , >=60m =
```
{h}h {m}m {s}s
```
- ```
task_tokens
```
  : From
```
total_tokens
```
  in TaskOutput metadata. Format with comma separators (e.g.,
```
45,230
```
  )
If
```
TaskOutput
```
times out (agent truly stuck), call
```
TaskStop(task_id=<background_task_id>)
```
to force-terminate the process, then set
```
task_duration = "N/A"
```
and
```
task_tokens = "N/A"
```
.
Read result files: For each task in the wave, read
```
.claude/sessions/__live_session__/result-task-{id}.md
```
. Parse status, attempt, verification, files modified, and issues. For TDD tasks, also parse the
```
## TDD Compliance
```
section (RED Verified, GREEN Verified, Refactored, Coverage Delta).
Handle missing result files: If a result file is missing after polling, the
```
TaskOutput
```
call in step 1 already captured diagnostic output for the crashed agent. Treat as FAIL.
Determine task type label:
```
TDD/RED
```
,
```
TDD/GREEN
```
, or
```
non-TDD
```

Log status for each task:

[{id}] {subject}: {PASS|PARTIAL|FAIL} ({type})

Batch update

task_log.md

: Read once, append ALL wave rows, Write once:

| {id} | {subject} | {TDD/RED|TDD/GREEN|non-TDD} | {PASS/PARTIAL/FAIL} | {attempt}/{max} | {task_duration} | {task_tokens} |

Where

{task_duration}

and

{task_tokens}

come from the TaskOutput metadata extracted in step 1.

Batch update
```
progress.md
```
: Read once, move ALL completed tasks from Active to Completed, Write once.
For TDD tasks: Extract TDD compliance data from result files and update the
```
## TDD Compliance
```
table in
```
execution_context.md
```

Context append fallback: If a result file is missing but

TaskOutput

contains a

LEARNINGS:

section, write those learnings to

context-task-{id}.md

on behalf of the agent.

8e: Within-Wave Retry

After batch processing identifies failed tasks:

Collect all failed tasks with retries remaining
For each retriable task:
- Read failure details from
```
result-task-{id}.md
```
  (Issues and TDD Compliance sections)
- Delete the old
```
result-task-{id}.md
```
  file before re-launching
- Launch a new background agent (
```
run_in_background: true
```
  ) with failure context from the result file
- Record the new
  background_task_id
  from each Task tool response (same mapping as 8c)
- For TDD tasks, include TDD-specific retry guidance in the prompt
- Update
```
progress.md
```
  :
```
- [{id}] {subject} -- Retrying ({n}/{max})
```
If any retry agents were launched:
- Poll for retry result files using
```
poll-for-results.sh
```
  (same pattern as 8c step 5, with only the retry task IDs as arguments and
```
timeout: 2760000
```
  on the Bash invocation)
- After polling completes, reap retry agents: call
```
TaskOutput
```
  on each retry
```
background_task_id
```
  to extract
```
duration_ms
```
  and
```
total_tokens
```
  (same pattern as 8d step 1). If
```
TaskOutput
```
  times out, call
```
TaskStop
```
  to force-terminate.
- Process retry results using the same batch approach as 8d (using the freshly extracted per-task duration and token values for task_log rows)
- Repeat 8e if any retries still have attempts remaining
If retries exhausted:
- Leave task as
```
in_progress
```
- Log final failure
- Retain the result file for post-analysis
- For TDD test tasks: The paired implementation task remains blocked and will not execute

Test-writer agent failure fallback: If a TDD test task (RED phase) fails after all retries, the paired implementation task remains blocked. Do NOT fall back to running implementation without tests -- this would violate TDD principles.

8f: Merge Context and Clean Up After Wave

After ALL agents in the current wave have completed (including retries):

Read

.claude/sessions/__live_session__/execution_context.md

Read all
```
context-task-{id}.md
```
files in task ID order
Append each file's content to the
```
## Task History
```
section
For completed TDD tasks: Update the
```
## TDD Compliance
```
table with pair results (extracted from result files in 8d)
Write the complete updated
```
execution_context.md
```
Delete the
```
context-task-{id}.md
```
files
Clean up result files: Delete
```
result-task-{id}.md
```
for PASS tasks. Retain
```
result-task-{id}.md
```
for FAIL tasks (post-analysis). For TDD test tasks (RED): Retain the result file until the paired GREEN task completes — the orchestrator reads stored result data for
```
PAIRED TEST TASK OUTPUT
```
injection in the next wave.

Capture test task result data for GREEN phase injection: When processing a completed test task (RED phase), read the result file content and store it for injection into the paired implementation task's prompt in the next wave. Delete the retained RED result file after the paired GREEN task's wave completes.

8g: Rebuild Next Wave and Archive

Archive completed task files to

.claude/sessions/__live_session__/tasks/

Refresh task list via
```
TaskList
```
Check for newly unblocked tasks (especially implementation tasks unblocked by their paired test tasks)
Form next wave using priority sort
If no unblocked tasks remain, exit the loop
Loop back to 8a

Step 8: Session Summary

Write final

progress.md

with complete status. Display the TDD execution summary:

TDD EXECUTION SUMMARY

Tasks executed: {total attempted}
  TDD Pairs: {pair_count}
  Non-TDD: {non_tdd_count}
  Passed: {count}
  Partial: {count}
  Failed: {count} (after {total retries} total retry attempts)

TDD COMPLIANCE:
| Task Pair | Test Task | Impl Task | RED | GREEN | Refactored | Coverage Delta |
|-----------|-----------|-----------|-----|-------|------------|----------------|
| {feature} | #{test_id} ({status}) | #{impl_id} ({status}) | {Yes/No} | {Yes/No} | {Yes/No/N/A} | {+/-pct or N/A} |
...

TDD Compliance Rate: {compliant_pairs}/{total_pairs} ({percentage}%)

Waves completed: {wave_count}
Max parallel: {max_parallel}
TDD Strictness: {strictness}
Total execution time: {sum of all task duration_ms values, formatted}
Token Usage: {sum of all task total_tokens values, formatted with commas}

Remaining:
  Pending: {count}
  In Progress (failed): {count}
  Blocked: {count}

{If any tasks failed:}
FAILED TASKS:
  [{id}] {subject} -- {brief failure reason} ({TDD phase if applicable})

{If newly unblocked tasks were discovered:}
NEWLY UNBLOCKED:
  [{id}] {subject} -- unblocked by completion of [{blocker_id}]

After displaying the summary:

Save

session_summary.md

.claude/sessions/__live_session__/

with full summary content

Archive the session: move all contents from

__live_session__/

.claude/sessions/{task_execution_id}/

Leave
```
__live_session__/
```
as an empty directory
```
execution_pointer.md
```
stays pointing to
```
__live_session__/
```

Step 9: Update CLAUDE.md

Review

.claude/sessions/{task_execution_id}/execution_context.md

for project-wide changes.

Update CLAUDE.md if the session introduced:

New architectural patterns or conventions
New dependencies or tech stack changes
New development commands or workflows
Changes to project structure
Important design decisions

Skip if only task-specific or TDD-internal implementation details.

Agent Routing Summary

Task Type Detection Agent Plugin Workflow

TDD test (RED)

tdd_mode: true

tdd_phase: "red"

tdd-executor

tdd-tools

6-phase TDD

TDD impl (GREEN)

tdd_mode: true

tdd_phase: "green"

tdd-executor

tdd-tools

6-phase TDD

Non-TDD

tdd_mode

tdd_mode: false

task-executor

sdd-tools (cross-plugin)

4-phase standard

TDD Verification Rules

See

references/tdd-verification-patterns.md

for complete verification rules.

Quick reference:

Phase	PASS	FAIL
RED	All new tests fail as expected	Tests cannot run or syntax errors
GREEN	All tests pass, zero regressions	New tests still failing after implementation
REFACTOR	All tests green after cleanup	Tests broke and cannot recover

Strictness levels (from

.claude/agent-alchemy.local.md

tdd.strictness

setting):

Level	RED Behavior	Impact
strict	Tests passing unexpectedly = FAIL	Blocks GREEN phase
normal (default)	Tests passing unexpectedly = WARN	Proceeds with warning
relaxed	Tests passing unexpectedly = INFO	Proceeds, informational only

Key Behaviors

Autonomous execution loop: After user confirms the plan, no further prompts between tasks
Background agent execution: Agents run as background tasks (
```
run_in_background: true
```
), returning ~3 lines instead of ~100+ lines of full output. This reduces orchestrator context consumption by ~79% per wave.
Agent process reaping: After polling confirms result files exist, the orchestrator calls
```
TaskOutput
```
on each background task_id to reap the process and extract per-task
```
duration_ms
```
and
```
total_tokens
```
usage metadata. If
```
TaskOutput
```
times out,
```
TaskStop
```
force-terminates the stuck agent. This prevents lingering background processes.
Result file protocol: Each agent writes a compact
```
result-task-{id}.md
```
as its very last action. TDD result files include a
```
## TDD Compliance
```
section. The orchestrator polls for these files via
```
poll-for-results.sh
```
in a single Bash invocation (with
```
timeout: 2760000
```
), then batch-reads them for processing.
Batched session file updates:
```
task_log.md
```
and
```
progress.md
```
are updated once per wave (batch read-modify-write) instead of per-task.
Wave-based TDD parallelism: Test tasks (RED) in one wave, their paired implementation tasks (GREEN) in the next. Multiple features run in parallel within a wave
Agent routing by metadata: TDD tasks go to
```
tdd-executor
```
, non-TDD tasks go to
```
task-executor
```
Per-task context isolation: Each agent writes to
```
context-task-{id}.md
```
, orchestrator merges after each wave
Test-to-implementation context flow: Test task result data is read from disk and injected into the paired implementation task's prompt via
```
PAIRED TEST TASK OUTPUT
```
. RED result files are retained across waves until the paired GREEN task completes.
Within-wave retry: Failed tasks with retries remaining are re-launched as background agents. The orchestrator polls for retry result files using
```
poll-for-results.sh
```
(same pattern as initial wave polling).
No silent degradation: If TDD test task fails, its paired implementation task stays blocked. Never run implementation without tests
TDD compliance tracking: Per-pair tracking of RED/GREEN/REFACTOR verification extracted from result files
Configurable strictness:
```
strict
```
,
```
normal
```
, or
```
relaxed
```
TDD enforcement via settings
Single-session invariant: Only one execution session at a time, enforced by
```
.lock
```
file
Interrupted session recovery: Stale sessions archived, interrupted tasks reset to
```
pending
```

Example Usage

Execute all TDD tasks

/agent-alchemy-tdd:execute-tdd-tasks

Execute TDD tasks for a specific group

/agent-alchemy-tdd:execute-tdd-tasks --task-group user-authentication

Execute with limited parallelism

/agent-alchemy-tdd:execute-tdd-tasks --max-parallel 2

Execute sequentially (no concurrency)

/agent-alchemy-tdd:execute-tdd-tasks --max-parallel 1

Execute with custom retries

/agent-alchemy-tdd:execute-tdd-tasks --retries 1

Execute group with custom parallelism and retries

/agent-alchemy-tdd:execute-tdd-tasks --task-group payments --max-parallel 3 --retries 1

Reference Files

```
references/tdd-execution-workflow.md
```
-- TDD-aware wave execution, agent spawning, context sharing between RED and GREEN phases
```
references/tdd-verification-patterns.md
```
-- RED/GREEN/REFACTOR verification rules, compliance reporting, status determination matrix