Agent-alchemy execute-tdd-tasks
Execute TDD task pairs autonomously with RED-GREEN-REFACTOR verification. Orchestrates wave-based execution with strategic parallelism, routing TDD tasks to tdd-executor agents and non-TDD tasks to standard task-executor. Use when user says "execute tdd tasks", "run tdd tasks", "start tdd execution", or wants to execute TDD-paired tasks from create-tdd-tasks.
git clone https://github.com/sequenzia/agent-alchemy
T=$(mktemp -d) && git clone --depth=1 https://github.com/sequenzia/agent-alchemy "$T" && mkdir -p ~/.claude/skills && cp -r "$T/claude/tdd-tools/skills/execute-tdd-tasks" ~/.claude/skills/sequenzia-agent-alchemy-execute-tdd-tasks && rm -rf "$T"
claude/tdd-tools/skills/execute-tdd-tasks/SKILL.md
Execute TDD Tasks Skill
This skill orchestrates autonomous execution of TDD task pairs generated by /create-tdd-tasks. It is the TDD counterpart to the standard execute-tasks skill, reusing its session management, wave infrastructure, and execution context sharing while adding TDD-specific agent routing, RED-GREEN-REFACTOR verification, and per-task compliance reporting.
The key difference from standard execute-tasks: this skill routes TDD tasks to the tdd-executor agent (from tdd-tools), which runs a 6-phase TDD workflow, while routing non-TDD tasks to the standard task-executor agent. It verifies TDD compliance (RED verified, GREEN verified, refactored) per task pair and reports aggregate results.
CRITICAL: Complete ALL 9 steps. The workflow is not complete until Step 9: Update CLAUDE.md is evaluated. After completing each step, immediately proceed to the next step without waiting for user prompts (except Step 4 which requires user confirmation).
Plugin Context
This skill is part of the tdd-tools plugin and uses agents from the same plugin:
- tdd-executor agent (Opus) -- 6-phase TDD workflow per task
- test-writer agent (Sonnet) -- parallel test generation (used by tdd-executor internally)
For non-TDD tasks, this skill routes to the task-executor agent from sdd-tools (soft cross-plugin dependency). Since TDD tasks are always generated from SDD tasks via /create-tasks, the sdd-tools plugin is expected to be installed when this skill runs.
Core Principles
1. TDD Compliance First
Every TDD task pair must complete the RED-GREEN-REFACTOR cycle:
- RED: Tests are written and verified to fail before any implementation exists
- GREEN: Implementation is written that makes all tests pass with zero regressions
- REFACTOR: Code is cleaned up while keeping all tests green
2. Strategic Parallelism
Maximize execution throughput without violating TDD sequencing:
- PARALLEL: Multiple test-writing tasks (RED phase) run simultaneously across features
- SEQUENTIAL: Within a single TDD pair, RED must complete before GREEN can start (enforced by dependencies)
3. Reuse execute-tasks Infrastructure
Session management, wave execution, context sharing, and progress tracking all reuse the same patterns from execute-tasks. See references/tdd-execution-workflow.md for TDD-specific extensions.
4. Honest TDD Reporting
Report per-task compliance with the full RED-GREEN-REFACTOR cycle:
- red_verified: Whether tests failed as expected before implementation
- green_verified: Whether all tests pass after implementation
- refactored: Whether code was cleaned up while maintaining green tests
- coverage_delta: Change in test coverage percentage (if measurable)
Orchestration Workflow
This skill orchestrates TDD task execution through a 9-step loop that mirrors the standard execute-tasks orchestration with TDD-specific extensions. See references/tdd-execution-workflow.md for the full TDD wave execution details and references/tdd-verification-patterns.md for TDD phase verification rules.
Step 1: Load References
Read the TDD-specific reference files:
    Read: ${CLAUDE_PLUGIN_ROOT}/skills/execute-tdd-tasks/references/tdd-execution-workflow.md
    Read: ${CLAUDE_PLUGIN_ROOT}/skills/execute-tdd-tasks/references/tdd-verification-patterns.md
Parse arguments from the invocation:
- --task-group <group> -- Filter tasks to a specific group
- --max-parallel <n> -- Override max concurrent agents per wave
- --retries <n> -- Override retry attempts per task (default: 3)
Step 2: Load and Classify Tasks
Use TaskList to retrieve all tasks. If --task-group was provided, filter to tasks where metadata.task_group matches.
Classify each task by type:
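| Detection | Type | Agent | Source |
|---|---|---|---|
| tdd_mode is true AND tdd_phase is RED | TDD test task | tdd-executor | tdd-tools (same plugin) |
| tdd_mode is true AND tdd_phase is GREEN | TDD implementation task | tdd-executor | tdd-tools (same plugin) |
| No tdd_mode metadata or tdd_mode is false | Non-TDD task | task-executor | sdd-tools (cross-plugin, soft dependency) |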
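The classification rule can be sketched as a small function. This is an illustrative sketch only: the function name `classify_task` is hypothetical, and the literal phase values `"red"` and `"green"` are an assumption about how `tdd_phase` is stored, not something the skill text confirms.

```python
from typing import Any, Mapping, Optional, Tuple


def classify_task(metadata: Optional[Mapping[str, Any]]) -> Tuple[str, str]:
    """Return (type_label, agent) for one task's metadata.

    Assumption: tdd_phase holds "red" or "green" (case-insensitive).
    """
    meta = metadata or {}
    # Anything without tdd_mode == true is routed to the standard executor.
    if meta.get("tdd_mode") is not True:
        return ("non-TDD", "task-executor")
    phase = str(meta.get("tdd_phase", "")).lower()
    if phase == "red":
        return ("TDD/RED", "tdd-executor")
    if phase == "green":
        return ("TDD/GREEN", "tdd-executor")
    # tdd_mode is set but the phase is unrecognized: still a TDD task,
    # flagged so the orchestrator can log a warning.
    return ("TDD/UNKNOWN", "tdd-executor")
```

Routing on `tdd_mode` first keeps the non-TDD path independent of whatever phase vocabulary the task generator uses.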
Count and report:
- Total tasks (pending + in_progress + completed)
- TDD pairs identified (test + implementation tasks)
- Non-TDD tasks
- Already completed tasks
Handle edge cases:
- No tasks found: Report "No tasks found for group '{group}'. Use /create-tdd-tasks to generate TDD task pairs from your SDD tasks." and stop.
- All completed: Report a summary of completed tasks including TDD compliance and stop.
- No unblocked tasks: Report which tasks exist and what's blocking them.
Step 3: Build Execution Plan
Resolve max_parallel using precedence:
- --max-parallel CLI argument (highest priority)
- max_parallel in .claude/agent-alchemy.local.md
- Default: 5
Resolve retries using precedence:
- --retries CLI argument (highest priority)
- Default: 3
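As a minimal sketch of the precedence rule, the first value that is actually set wins. The helper name `resolve_setting` is hypothetical, and parsing of the CLI arguments and the settings file is assumed to happen elsewhere.

```python
from typing import Optional


def resolve_setting(cli_value: Optional[int],
                    file_value: Optional[int],
                    default: int) -> int:
    """Pick the first value that is set: CLI, then settings file, then default."""
    for candidate in (cli_value, file_value):
        if candidate is not None:
            return candidate
    return default


# max_parallel: no CLI flag, settings file says 3 -> 3
max_parallel = resolve_setting(None, 3, 5)
# retries: the text lists no settings-file tier, so only CLI and default apply
retries = resolve_setting(1, None, 3)
```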
Read .claude/agent-alchemy.local.md if it exists, for TDD-specific settings:
- tdd.strictness -- strict, normal (default), or relaxed
- tdd.coverage-threshold -- Minimum coverage target (default: 80)
Build the dependency graph from all pending tasks (TDD and non-TDD):
- Collect all pending tasks and their blockedBy relationships
- Run topological sort to assign dependency levels
- Assign tasks to waves by dependency level (Wave 1 = no dependencies, Wave 2 = depends only on Wave 1, etc.)
- Sort within waves by priority: critical > high > medium > low > unprioritized
- Break ties by "unblocks most others"
- Cap each wave at max_parallel tasks
Annotate waves with TDD phase labels:
The dependency structure from create-tdd-tasks naturally produces alternating test/implementation waves:

    Wave 1: [Test-A, Test-B, Test-C]     -- RED phase (parallel test generation)
    Wave 2: [Impl-A, Impl-B, Impl-C]     -- GREEN phase (parallel implementation)
    Wave 3: [Test-D, Test-E, Non-TDD-F]  -- RED phase + non-TDD tasks (mixed)
    Wave 4: [Impl-D, Impl-E]             -- GREEN phase
Detect circular dependencies: If tasks remain unassigned after topological sorting, they form a cycle. Report the cycle and attempt to break at the weakest link.
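The wave assignment and cycle check described above can be sketched with a Kahn-style topological sort. This is a sketch under assumptions: the `blocked_by` and `priority` dictionary keys and the function name `build_waves` are hypothetical, and tasks that exceed the `max_parallel` cap are assumed to spill into the next wave, which the skill text does not spell out.

```python
from collections import defaultdict

PRIORITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def build_waves(tasks, max_parallel=5):
    """tasks: {task_id: {"blocked_by": [ids], "priority": str or None}}.

    Returns (waves, cyclic_ids). Dependencies on ids that are not in
    `tasks` (for example already-completed tasks) count as satisfied.
    """
    remaining = {tid: {d for d in t.get("blocked_by", []) if d in tasks}
                 for tid, t in tasks.items()}
    dependents = defaultdict(set)
    for tid, deps in remaining.items():
        for dep in deps:
            dependents[dep].add(tid)

    def sort_key(tid):
        # Priority first, then "unblocks most others", then id for stability.
        rank = PRIORITY_RANK.get(tasks[tid].get("priority"), 4)
        return (rank, -len(dependents[tid]), tid)

    waves, assigned = [], set()
    ready = [tid for tid, deps in remaining.items() if not deps]
    while ready:
        ready.sort(key=sort_key)
        wave, overflow = ready[:max_parallel], ready[max_parallel:]
        waves.append(wave)
        assigned.update(wave)
        newly_ready = []
        for tid in wave:
            for child in dependents[tid]:
                remaining[child].discard(tid)
                if not remaining[child]:
                    newly_ready.append(child)
        ready = overflow + newly_ready
    # Anything never assigned sits on (or behind) a dependency cycle.
    return waves, sorted(set(tasks) - assigned)
```

Returning the unassigned ids rather than raising lets the orchestrator report the cycle and decide where to break it.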
Validate TDD pair cross-references: For each TDD task, verify its paired_task_id references a valid task. Log warnings for orphaned pairs.
Step 4: Present Execution Plan and Confirm
Display the TDD execution plan:
    EXECUTION PLAN (TDD Mode)

    Tasks to execute: {count} ({tdd_pairs} TDD pairs, {non_tdd} non-TDD tasks)
    Retry limit: {retries} per task
    Max parallel: {max_parallel} per wave
    TDD Strictness: {strict|normal|relaxed}

    WAVE 1 ({n} tasks -- RED phase):
      1. [{id}] Write tests for {subject} (RED, paired: #{impl_id})
      2. [{id}] Write tests for {subject} (RED, paired: #{impl_id})

    WAVE 2 ({n} tasks -- GREEN phase):
      3. [{id}] {subject} (GREEN, paired: #{test_id})
      4. [{id}] {subject} (GREEN, paired: #{test_id})

    WAVE 3 ({n} tasks -- mixed):
      5. [{id}] {subject} (non-TDD)
      6. [{id}] Write tests for {subject} (RED, paired: #{impl_id})

    {Additional waves...}

    BLOCKED (unresolvable dependencies):
      [{id}] {subject} -- blocked by: {blocker ids}

    COMPLETED:
      {count} tasks already completed
Use AskUserQuestion to confirm:

    questions:
      - header: "Confirm TDD Execution"
        question: "Ready to execute {count} tasks in {wave_count} waves (max {max_parallel} parallel) with TDD enforcement ({strictness} mode)?"
        options:
          - label: "Yes, start TDD execution"
            description: "Proceed with the TDD execution plan above"
          - label: "Cancel"
            description: "Abort without executing any tasks"
        multiSelect: false
If the user selects "Cancel", report "Execution cancelled. No tasks were modified." and stop.
Step 5: Initialize Execution Directory
Generate a task_execution_id using three-tier resolution:
- IF --task-group was provided: {task_group}-tdd-{YYYYMMDD}-{HHMMSS}
- ELSE IF all open tasks share the same metadata.task_group: {task_group}-tdd-{YYYYMMDD}-{HHMMSS}
- ELSE: tdd-session-{YYYYMMDD}-{HHMMSS}
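The three-tier resolution can be sketched as follows. The function name `make_task_execution_id` and its parameters are hypothetical, and local time is assumed for the timestamp because the skill text does not name a timezone.

```python
from datetime import datetime
from typing import Iterable, Optional


def make_task_execution_id(cli_group: Optional[str],
                           open_task_groups: Iterable[Optional[str]],
                           now: Optional[datetime] = None) -> str:
    """Build the session id from the --task-group value and open tasks' groups."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    # Tier 1: an explicit --task-group always wins.
    if cli_group:
        return f"{cli_group}-tdd-{stamp}"
    # Tier 2: every open task carries the same non-empty group.
    groups = set(open_task_groups)
    if len(groups) == 1:
        (only_group,) = groups
        if only_group:
            return f"{only_group}-tdd-{stamp}"
    # Tier 3: generic session id.
    return f"tdd-session-{stamp}"
```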
Clean stale live session: Follow the same procedure as execute-tasks:
- Check if .claude/sessions/__live_session__/ contains leftover files
- If found, archive to .claude/sessions/interrupted-{YYYYMMDD}-{HHMMSS}/
- Reset any in_progress tasks from the interrupted session to pending
Concurrency guard: Check for .claude/sessions/__live_session__/.lock. Follow the same lock protocol as execute-tasks.
Create session files in .claude/sessions/__live_session__/:
- execution_plan.md -- Save the TDD execution plan from Step 4
- execution_context.md -- Initialize with TDD-extended template:

      # Execution Context

      ## Project Patterns
      <!-- Discovered coding patterns, conventions, tech stack details -->

      ## Key Decisions
      <!-- Architecture decisions, approach choices made during execution -->

      ## Known Issues
      <!-- Problems encountered, workarounds applied, things to watch out for -->

      ## File Map
      <!-- Important files discovered and their purposes -->

      ## TDD Compliance
      | Task Pair | Test Task | Impl Task | RED | GREEN | Refactored | Coverage Delta |
      |-----------|-----------|-----------|-----|-------|------------|----------------|

      ## Task History
      <!-- Brief log of task outcomes with relevant context -->
- task_log.md -- Initialize with standard table headers:

      # Task Execution Log

      | Task ID | Subject | Type | Status | Attempts | Duration | Token Usage |
      |---------|---------|------|--------|----------|----------|-------------|

- tasks/ -- Empty subdirectory for archiving completed task files
- progress.md -- Initialize with status template:

      # Execution Progress (TDD Mode)
      Status: Initializing
      Wave: 0 of {total_waves}
      Max Parallel: {max_parallel}
      TDD Strictness: {strictness}
      Updated: {ISO 8601 timestamp}

      ## Active Tasks

      ## Completed This Session
- execution_pointer.md at $HOME/.claude/tasks/{CLAUDE_CODE_TASK_LIST_ID}/execution_pointer.md -- Absolute path to .claude/sessions/__live_session__/
Step 6: Initialize Execution Context
Read .claude/sessions/__live_session__/execution_context.md (created in Step 5).
If a prior execution session's context exists, look in .claude/sessions/ for the most recent timestamped subfolder and merge relevant learnings (Project Patterns, Key Decisions, Known Issues, File Map) into the new execution context.
Context compaction: If Task History has 10+ entries from merged sessions, compact older entries into a summary paragraph and keep the 5 most recent in full.
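The compaction threshold can be illustrated mechanically. This is only a stand-in: in the skill the summary paragraph is written by the orchestrator, whereas this sketch simply joins the first line of each older entry. The function name `compact_task_history` is hypothetical.

```python
def compact_task_history(entries, keep_recent=5, threshold=10):
    """Collapse older entries into one summary once the history reaches the threshold.

    entries: list of Task History entries (strings), oldest first.
    """
    if len(entries) < threshold:
        return list(entries)
    older, recent = entries[:-keep_recent], entries[-keep_recent:]
    # Placeholder summary: first line of each older entry, joined into a paragraph.
    first_lines = [e.strip().splitlines()[0] for e in older if e.strip()]
    summary = f"Summary of {len(older)} earlier entries: " + "; ".join(first_lines)
    return [summary] + list(recent)
```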
Step 7: Execute Loop
Execute tasks in waves with TDD-aware agent routing. No user interaction between waves.
8a: Initialize Wave
- Identify all unblocked tasks (pending status, all dependencies completed)
- Sort by priority (critical > high > medium > low > unprioritized)
- Take up to max_parallel tasks for this wave
- If no unblocked tasks remain, exit the loop
8b: Snapshot Execution Context
Read .claude/sessions/__live_session__/execution_context.md and hold as baseline for this wave. All agents read from the same snapshot.
8c: Launch Wave Agents
1. Mark all wave tasks as in_progress via TaskUpdate
2. Record wave_start_time
3. Update progress.md with active tasks
4. Launch all wave agents simultaneously using parallel Task tool calls in a single message turn with run_in_background: true.
Record the background task_id mapping: After the Task tool returns for each agent, record the mapping {task_list_id → background_task_id} from each response. The background_task_id is needed later to call TaskOutput for process reaping and usage extraction.
Route each task to the correct agent:
For TDD tasks (metadata.tdd_mode == true), launch the tdd-executor agent (same plugin):

    Task:
      subagent_type: tdd-executor
      mode: bypassPermissions
      run_in_background: true
      prompt: |
        Execute the following TDD task.

        Task ID: {id}
        Task Subject: {subject}
        Task Description:
        ---
        {full description}
        ---

        Task Metadata:
        - Priority: {priority}
        - Complexity: {complexity}
        - TDD Phase: {tdd_phase}
        - Paired Task ID: {paired_task_id}
        - TDD Strictness: {strictness}

        CONCURRENT EXECUTION MODE
        Context Write Path: .claude/sessions/__live_session__/context-task-{id}.md
        Result Write Path: .claude/sessions/__live_session__/result-task-{id}.md
        Do NOT write to execution_context.md directly.
        Do NOT update progress.md -- the orchestrator manages it.
        Write your learnings to the Context Write Path above instead.

        RESULT FILE PROTOCOL
        As your VERY LAST action (after writing context-task-{id}.md), write a compact
        result file to the Result Write Path above. TDD format includes a TDD Compliance
        section with RED Verified, GREEN Verified, Refactored, and Coverage Delta fields.
        After writing the result file, return ONLY:
        DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}

        {If GREEN phase, include paired test task result data:}
        PAIRED TEST TASK OUTPUT:
        ---
        {test task result file content and context}
        ---
        The tests written by the paired test task are already on disk.
        Your job is to implement code that makes these tests pass (GREEN phase),
        then refactor while keeping tests green (REFACTOR phase).

        {If retry attempt:}
        RETRY ATTEMPT {n} of {max_retries}
        Previous TDD phase that failed: {RED|GREEN|REFACTOR}
        Previous attempt failed with:
        ---
        {previous failure details from result file}
        ---
        TDD-specific retry guidance:
        - If RED failed (tests cannot run): Check test syntax, imports, and framework config
        - If RED warned (tests passed unexpectedly): Verify tests target new behavior, not existing code
        - If GREEN failed (tests still failing): Re-read test assertions, try different implementation approach
        - If GREEN failed (regressions): Identify regression cause, fix without breaking new tests
        - If REFACTOR failed: Revert to pre-refactor state, try smaller refactoring steps

        Instructions (follow in order):
        1. Read the TDD execution and verification references
        2. Read .claude/sessions/__live_session__/execution_context.md for prior learnings
        3. Understand the task requirements and explore the codebase
        4. Execute the 6-phase TDD workflow (Understand, Write Tests, RED, Implement, GREEN, Complete)
        5. Verify TDD compliance (RED verified, GREEN verified, refactored)
        6. Update task status if PASS (mark completed)
        7. Write learnings to .claude/sessions/__live_session__/context-task-{id}.md
        8. Write result to .claude/sessions/__live_session__/result-task-{id}.md
        9. Return: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}
For non-TDD tasks (no tdd_mode metadata), launch the standard task-executor agent from sdd-tools (cross-plugin, resolved globally):

    Task:
      subagent_type: task-executor
      mode: bypassPermissions
      run_in_background: true
      prompt: |
        Execute the following task.

        Task ID: {id}
        Task Subject: {subject}
        Task Description:
        ---
        {full description}
        ---

        Task Metadata:
        - Priority: {priority}
        - Complexity: {complexity}
        - Source Section: {source_section}

        CONCURRENT EXECUTION MODE
        Context Write Path: .claude/sessions/__live_session__/context-task-{id}.md
        Result Write Path: .claude/sessions/__live_session__/result-task-{id}.md
        Do NOT write to execution_context.md directly.
        Do NOT update progress.md -- the orchestrator manages it.
        Write your learnings to the Context Write Path above instead.

        RESULT FILE PROTOCOL
        As your VERY LAST action (after writing context-task-{id}.md), write a compact
        result file to the Result Write Path above. Standard format with status,
        verification summary, files modified, and issues sections.
        After writing the result file, return ONLY:
        DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}

        {If retry attempt:}
        RETRY ATTEMPT {n} of {max_retries}
        Previous attempt failed with:
        ---
        {previous failure details from result file}
        ---
        Focus on fixing the specific failures listed above.

        Instructions (follow in order):
        1. Read the execute-tasks skill and reference files
        2. Read .claude/sessions/__live_session__/execution_context.md for prior learnings
        3. Understand the task requirements and explore the codebase
        4. Implement the necessary changes
        5. Verify against acceptance criteria
        6. Update task status if PASS (mark completed)
        7. Write learnings to .claude/sessions/__live_session__/context-task-{id}.md
        8. Write result to .claude/sessions/__live_session__/result-task-{id}.md
        9. Return: DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}
Important: Always include the CONCURRENT EXECUTION MODE and RESULT FILE PROTOCOL sections regardless of max_parallel value. All agents write to per-task context files and result files.
5. Poll for completion: After launching all background agents, poll for result files using poll-for-results.sh from execute-tasks. The script checks for result-task-{id}.md files every 15 seconds for up to 45 minutes, printing progress lines periodically. A single Bash invocation handles the entire polling lifecycle.

   Poll invocation (via Bash tool with timeout: 2760000):

       bash ${CLAUDE_PLUGIN_ROOT}/../sdd-tools/skills/execute-tasks/scripts/poll-for-results.sh \
         .claude/sessions/__live_session__ {task_ids...}

   Parse the output:
   - POLL_RESULT: ALL_DONE -- all agents finished. Proceed to 8d.
   - POLL_RESULT: TIMEOUT -- not all agents finished within the timeout window. Log the Waiting on: line and proceed to 8d (handles missing result files via TaskOutput fallback).
   - Bash tool timeout or no recognizable output -- treat as timeout. Proceed to 8d.
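To make the polling contract concrete, here is a Python sketch of logic equivalent to what the text describes. It is not the actual poll-for-results.sh script, whose internals are not shown here. The function name `poll_for_results` and its injectable `sleep` and `clock` parameters are hypothetical.

```python
import time
from pathlib import Path


def poll_for_results(session_dir, task_ids, interval_s=15.0,
                     timeout_s=45 * 60, sleep=time.sleep, clock=time.monotonic):
    """Wait until every result-task-{id}.md exists or the timeout elapses.

    Returns (status_line, waiting_ids), where status_line mirrors the
    POLL_RESULT lines the orchestrator parses.
    """
    session = Path(session_dir)
    deadline = clock() + timeout_s
    while True:
        waiting = [tid for tid in task_ids
                   if not (session / f"result-task-{tid}.md").exists()]
        if not waiting:
            return "POLL_RESULT: ALL_DONE", []
        if clock() >= deadline:
            return "POLL_RESULT: TIMEOUT", waiting
        sleep(interval_s)
```

The Bash timeout of 2760000 ms (46 minutes) leaves a one-minute margin over the 45-minute polling window, so the script can report TIMEOUT itself before the tool call is cut off.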
8d: Process Results (Batch)
After polling completes, process all wave results in a single batch:
1. Reap background agents and extract usage: For each task in the wave, call TaskOutput(task_id=<background_task_id>, block=true, timeout=60000) using the mapping recorded in 8c. This serves two purposes:
   - Process reaping: Terminates the background agent process (prevents lingering subagents)
   - Usage extraction: Returns metadata with duration_ms and total_tokens per agent

   Extract per-task values:
   - task_duration: From duration_ms in TaskOutput metadata. Format: <60s = {s}s, <60m = {m}m {s}s, >=60m = {h}h {m}m {s}s
   - task_tokens: From total_tokens in TaskOutput metadata. Format with comma separators (e.g., 45,230)

   If TaskOutput times out (agent truly stuck), call TaskStop(task_id=<background_task_id>) to force-terminate the process, then set task_duration = "N/A" and task_tokens = "N/A".
2. Read result files: For each task in the wave, read .claude/sessions/__live_session__/result-task-{id}.md. Parse status, attempt, verification, files modified, and issues. For TDD tasks, also parse the ## TDD Compliance section (RED Verified, GREEN Verified, Refactored, Coverage Delta).
3. Handle missing result files: If a result file is missing after polling, the TaskOutput call in step 1 already captured diagnostic output for the crashed agent. Treat as FAIL.
4. Determine task type label: TDD/RED, TDD/GREEN, or non-TDD
5. Log status for each task: [{id}] {subject}: {PASS|PARTIAL|FAIL} ({type})
6. Batch update task_log.md: Read once, append ALL wave rows, Write once:

       | {id} | {subject} | {TDD/RED|TDD/GREEN|non-TDD} | {PASS/PARTIAL/FAIL} | {attempt}/{max} | {task_duration} | {task_tokens} |

   Where {task_duration} and {task_tokens} come from the TaskOutput metadata extracted in step 1.
7. Batch update progress.md: Read once, move ALL completed tasks from Active to Completed, Write once.
8. For TDD tasks: Extract TDD compliance data from result files and update the ## TDD Compliance table in execution_context.md
Context append fallback: If a result file is missing but TaskOutput contains a LEARNINGS: section, write those learnings to context-task-{id}.md on behalf of the agent.
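The duration and token formats from the reaping step can be sketched as two small helpers. The names `format_duration` and `format_tokens` are hypothetical, and truncating to whole seconds (rather than rounding) is an assumption.

```python
def format_duration(duration_ms):
    """Format milliseconds as {s}s, {m}m {s}s, or {h}h {m}m {s}s; None -> N/A."""
    if duration_ms is None:
        return "N/A"
    total_s = int(duration_ms // 1000)  # assumption: truncate, do not round
    if total_s < 60:
        return f"{total_s}s"
    minutes, seconds = divmod(total_s, 60)
    if minutes < 60:
        return f"{minutes}m {seconds}s"
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes}m {seconds}s"


def format_tokens(total_tokens):
    """Format a token count with comma separators; None -> N/A."""
    return "N/A" if total_tokens is None else f"{total_tokens:,}"
```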
8e: Within-Wave Retry
After batch processing identifies failed tasks:
- Collect all failed tasks with retries remaining
- For each retriable task:
  - Read failure details from result-task-{id}.md (Issues and TDD Compliance sections)
  - Delete the old result-task-{id}.md file before re-launching
  - Launch a new background agent (run_in_background: true) with failure context from the result file
  - Record the new background_task_id from each Task tool response (same mapping as 8c)
  - For TDD tasks, include TDD-specific retry guidance in the prompt
  - Update progress.md: - [{id}] {subject} -- Retrying ({n}/{max})
- If any retry agents were launched:
  - Poll for retry result files using poll-for-results.sh (same pattern as 8c step 5, with only the retry task IDs as arguments and timeout: 2760000 on the Bash invocation)
  - After polling completes, reap retry agents: call TaskOutput on each retry background_task_id to extract duration_ms and total_tokens (same pattern as 8d step 1). If TaskOutput times out, call TaskStop to force-terminate.
  - Process retry results using the same batch approach as 8d (using the freshly extracted per-task duration and token values for task_log rows)
  - Repeat 8e if any retries still have attempts remaining
- If retries exhausted:
  - Leave task as in_progress
  - Log final failure
  - Retain the result file for post-analysis
  - For TDD test tasks: The paired implementation task remains blocked and will not execute
Test-writer agent failure fallback: If a TDD test task (RED phase) fails after all retries, the paired implementation task remains blocked. Do NOT fall back to running implementation without tests -- this would violate TDD principles.
8f: Merge Context and Clean Up After Wave
After ALL agents in the current wave have completed (including retries):
- Read .claude/sessions/__live_session__/execution_context.md
- Read all context-task-{id}.md files in task ID order
- Append each file's content to the ## Task History section
- For completed TDD tasks: Update the ## TDD Compliance table with pair results (extracted from result files in 8d)
- Write the complete updated execution_context.md
- Delete the context-task-{id}.md files
- Clean up result files: Delete result-task-{id}.md for PASS tasks. Retain result-task-{id}.md for FAIL tasks (post-analysis). For TDD test tasks (RED): Retain the result file until the paired GREEN task completes -- the orchestrator reads stored result data for PAIRED TEST TASK OUTPUT injection in the next wave.
Capture test task result data for GREEN phase injection: When processing a completed test task (RED phase), read the result file content and store it for injection into the paired implementation task's prompt in the next wave. Delete the retained RED result file after the paired GREEN task's wave completes.
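The per-wave context merge can be sketched as below. It relies on an assumption worth stating: ## Task History is the last section of execution_context.md (as in the Step 5 template), so appending at the end of the file lands in that section. The function name `merge_wave_context` is hypothetical, and the TDD Compliance table update is left out.

```python
import re
from pathlib import Path

_ID_PATTERN = re.compile(r"context-task-(.+)\.md$")


def _task_sort_key(path):
    """Order numeric task ids numerically, then any non-numeric ids by name."""
    task_id = _ID_PATTERN.search(path.name).group(1)
    return (0, int(task_id), "") if task_id.isdigit() else (1, 0, task_id)


def merge_wave_context(session_dir):
    """Append each context-task-{id}.md to execution_context.md, then delete it.

    Returns the merged file names in the order they were appended.
    """
    session = Path(session_dir)
    context_path = session / "execution_context.md"
    files = sorted(session.glob("context-task-*.md"), key=_task_sort_key)
    merged = context_path.read_text(encoding="utf-8").rstrip("\n")
    for path in files:
        merged += "\n\n" + path.read_text(encoding="utf-8").strip()
    # Single write per wave, matching the batched read-modify-write approach.
    context_path.write_text(merged + "\n", encoding="utf-8")
    for path in files:
        path.unlink()
    return [path.name for path in files]
```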
8g: Rebuild Next Wave and Archive
- Archive completed task files to .claude/sessions/__live_session__/tasks/
- Refresh task list via TaskList
- Check for newly unblocked tasks (especially implementation tasks unblocked by their paired test tasks)
- Form next wave using priority sort
- If no unblocked tasks remain, exit the loop
- Loop back to 8a
Step 8: Session Summary
Write final progress.md with complete status. Display the TDD execution summary:

    TDD EXECUTION SUMMARY

    Tasks executed: {total attempted}
      TDD Pairs: {pair_count}
      Non-TDD: {non_tdd_count}

    Passed: {count}
    Partial: {count}
    Failed: {count} (after {total retries} total retry attempts)

    TDD COMPLIANCE:
    | Task Pair | Test Task | Impl Task | RED | GREEN | Refactored | Coverage Delta |
    |-----------|-----------|-----------|-----|-------|------------|----------------|
    | {feature} | #{test_id} ({status}) | #{impl_id} ({status}) | {Yes/No} | {Yes/No} | {Yes/No/N/A} | {+/-pct or N/A} |
    ...

    TDD Compliance Rate: {compliant_pairs}/{total_pairs} ({percentage}%)

    Waves completed: {wave_count}
    Max parallel: {max_parallel}
    TDD Strictness: {strictness}
    Total execution time: {sum of all task duration_ms values, formatted}
    Token Usage: {sum of all task total_tokens values, formatted with commas}

    Remaining:
      Pending: {count}
      In Progress (failed): {count}
      Blocked: {count}

    {If any tasks failed:}
    FAILED TASKS:
      [{id}] {subject} -- {brief failure reason} ({TDD phase if applicable})

    {If newly unblocked tasks were discovered:}
    NEWLY UNBLOCKED:
      [{id}] {subject} -- unblocked by completion of [{blocker_id}]
After displaying the summary:
- Save session_summary.md to .claude/sessions/__live_session__/ with full summary content
- Archive the session: move all contents from __live_session__/ to .claude/sessions/{task_execution_id}/
- Leave __live_session__/ as an empty directory
- execution_pointer.md stays pointing to __live_session__/
Step 9: Update CLAUDE.md
Review .claude/sessions/{task_execution_id}/execution_context.md for project-wide changes.
Update CLAUDE.md if the session introduced:
- New architectural patterns or conventions
- New dependencies or tech stack changes
- New development commands or workflows
- Changes to project structure
- Important design decisions
Skip if only task-specific or TDD-internal implementation details.
Agent Routing Summary
| Task Type | Detection | Agent | Plugin | Workflow |
|---|---|---|---|---|
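| TDD test (RED) | tdd_mode is true, tdd_phase is RED | tdd-executor | tdd-tools | 6-phase TDD |
| TDD impl (GREEN) | tdd_mode is true, tdd_phase is GREEN | tdd-executor | tdd-tools | 6-phase TDD |
| Non-TDD | No tdd_mode or tdd_mode is false | task-executor | sdd-tools (cross-plugin) | 4-phase standard |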
TDD Verification Rules
See references/tdd-verification-patterns.md for complete verification rules.
Quick reference:
| Phase | PASS | FAIL |
|---|---|---|
| RED | All new tests fail as expected | Tests cannot run or syntax errors |
| GREEN | All tests pass, zero regressions | New tests still failing after implementation |
| REFACTOR | All tests green after cleanup | Tests broke and cannot recover |
Strictness levels (from the tdd.strictness setting in .claude/agent-alchemy.local.md):
| Level | RED Behavior | Impact |
|---|---|---|
| strict | Tests passing unexpectedly = FAIL | Blocks GREEN phase |
| normal (default) | Tests passing unexpectedly = WARN | Proceeds with warning |
| relaxed | Tests passing unexpectedly = INFO | Proceeds, informational only |
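The two quick-reference tables can be combined into a small decision function for the RED phase. This is a sketch of the quick reference only, not of the full rules in references/tdd-verification-patterns.md. The names `evaluate_red` and `blocks_green` are hypothetical, and treating "some new tests pass" the same as "tests passing unexpectedly" is an assumption.

```python
# Outcome when new tests pass before any implementation exists, per strictness level.
RED_UNEXPECTED_PASS = {"strict": "FAIL", "normal": "WARN", "relaxed": "INFO"}


def evaluate_red(tests_ran: bool, all_new_tests_failed: bool,
                 strictness: str = "normal") -> str:
    """Return PASS, FAIL, WARN, or INFO for a RED-phase check."""
    if not tests_ran:
        # Tests cannot run or have syntax errors: always a failure.
        return "FAIL"
    if all_new_tests_failed:
        # Expected RED result: every new test fails before implementation.
        return "PASS"
    return RED_UNEXPECTED_PASS[strictness]


def blocks_green(red_outcome: str) -> bool:
    """Only a FAIL keeps the paired implementation task blocked."""
    return red_outcome == "FAIL"
```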
Key Behaviors
- Autonomous execution loop: After user confirms the plan, no further prompts between tasks
- Background agent execution: Agents run as background tasks (run_in_background: true), returning ~3 lines instead of ~100+ lines of full output. This reduces orchestrator context consumption by ~79% per wave.
- Agent process reaping: After polling confirms result files exist, the orchestrator calls TaskOutput on each background task_id to reap the process and extract per-task duration_ms and total_tokens usage metadata. If TaskOutput times out, TaskStop force-terminates the stuck agent. This prevents lingering background processes.
- Result file protocol: Each agent writes a compact result-task-{id}.md as its very last action. TDD result files include a ## TDD Compliance section. The orchestrator polls for these files via poll-for-results.sh in a single Bash invocation (with timeout: 2760000), then batch-reads them for processing.
- Batched session file updates: task_log.md and progress.md are updated once per wave (batch read-modify-write) instead of per-task.
- Wave-based TDD parallelism: Test tasks (RED) in one wave, their paired implementation tasks (GREEN) in the next. Multiple features run in parallel within a wave
- Agent routing by metadata: TDD tasks go to tdd-executor, non-TDD tasks go to task-executor
- Per-task context isolation: Each agent writes to context-task-{id}.md, orchestrator merges after each wave
- Test-to-implementation context flow: Test task result data is read from disk and injected into the paired implementation task's prompt via PAIRED TEST TASK OUTPUT. RED result files are retained across waves until the paired GREEN task completes.
- Within-wave retry: Failed tasks with retries remaining are re-launched as background agents. The orchestrator polls for retry result files using poll-for-results.sh (same pattern as initial wave polling).
- No silent degradation: If TDD test task fails, its paired implementation task stays blocked. Never run implementation without tests
- TDD compliance tracking: Per-pair tracking of RED/GREEN/REFACTOR verification extracted from result files
- Configurable strictness: strict, normal, or relaxed TDD enforcement via settings
- Single-session invariant: Only one execution session at a time, enforced by .lock file
- Interrupted session recovery: Stale sessions archived, interrupted tasks reset to pending
Example Usage
Execute all TDD tasks
/agent-alchemy-tdd:execute-tdd-tasks
Execute TDD tasks for a specific group
/agent-alchemy-tdd:execute-tdd-tasks --task-group user-authentication
Execute with limited parallelism
/agent-alchemy-tdd:execute-tdd-tasks --max-parallel 2
Execute sequentially (no concurrency)
/agent-alchemy-tdd:execute-tdd-tasks --max-parallel 1
Execute with custom retries
/agent-alchemy-tdd:execute-tdd-tasks --retries 1
Execute group with custom parallelism and retries
/agent-alchemy-tdd:execute-tdd-tasks --task-group payments --max-parallel 3 --retries 1
Reference Files
- references/tdd-execution-workflow.md -- TDD-aware wave execution, agent spawning, context sharing between RED and GREEN phases
- references/tdd-verification-patterns.md -- RED/GREEN/REFACTOR verification rules, compliance reporting, status determination matrix