Agent-alchemy execute-tasks
Execute pending Claude Code Tasks in dependency order with wave-based concurrent execution and adaptive verification. Supports task group filtering and configurable parallelism. Use when user says "execute tasks", "run tasks", "start execution", "work on tasks", or wants to execute generated tasks autonomously.
git clone https://github.com/sequenzia/agent-alchemy
T=$(mktemp -d) && git clone --depth=1 https://github.com/sequenzia/agent-alchemy "$T" && mkdir -p ~/.claude/skills && cp -r "$T/claude/sdd-tools/skills/execute-tasks" ~/.claude/skills/sequenzia-agent-alchemy-execute-tasks && rm -rf "$T"
claude/sdd-tools/skills/execute-tasks/SKILL.mdExecute Tasks Skill
This skill orchestrates autonomous task execution for Claude Code Tasks. It builds a dependency-aware execution plan, launches a dedicated agent for each task through a 4-phase workflow (Understand, Implement, Verify, Complete), handles retries with failure context, and shares learnings across tasks through a shared execution context file.
Core Principles
1. Understand Before Implementing
Never write code without first understanding:
- What the task requires (acceptance criteria or inferred requirements)
- What code already exists (read before modify)
- What conventions the project follows (CLAUDE.md, existing patterns)
- What earlier tasks discovered (shared execution context)
2. Follow Existing Patterns
Match the codebase's established patterns:
- Coding style, naming conventions, file organization
- Error handling approach, logging patterns
- Test framework, test structure, assertion style
- Import ordering, module organization
3. Verify Against Criteria
Do not assume implementation is correct. Verify by:
- Walking through each acceptance criterion for spec-generated tasks
- Running tests and linters for all tasks
- Confirming the core change works as intended
- Checking for regressions in existing functionality
4. Report Honestly
Produce accurate verification results:
- PASS only when all Functional criteria pass and tests pass
- PARTIAL when non-critical criteria fail but core works
- FAIL when Functional criteria or tests fail
- Never mark a task complete if verification fails
Orchestration Workflow
This skill orchestrates task execution through a 10-step loop. See
references/orchestration.md for the full detailed procedure.
Step 1: Load Task List
Retrieve all tasks via
TaskList. If a --task-group argument was provided, filter tasks to only those with matching metadata.task_group. If a specific task-id argument was provided, validate it exists.
Step 2: Validate State
Handle edge cases: empty list, all completed, specific task blocked, no unblocked tasks, circular dependencies.
Step 3: Build Execution Plan
Resolve
max_parallel setting using precedence: --max-parallel CLI arg > .claude/agent-alchemy.local.md setting > default 5. Build a dependency graph from pending tasks. Assign tasks to waves using topological levels: Wave 1 = no dependencies, Wave 2 = depends on Wave 1 tasks, etc. Sort within waves by priority (critical > high > medium > low > unprioritized), break ties by "unblocks most others." Cap each wave at max_parallel tasks.
Step 4: Check Settings
Read
.claude/agent-alchemy.local.md if it exists for execution preferences, including max_parallel setting. CLI --max-parallel argument takes precedence over the settings file value.
Step 5: Initialize Execution Directory
Generate a
task_execution_id using three-tier resolution: (1) if --task-group provided → {task_group}-{YYYYMMDD}-{HHMMSS}, (2) else if all open tasks share the same metadata.task_group → {task_group}-{YYYYMMDD}-{HHMMSS}, (3) else → exec-session-{YYYYMMDD}-{HHMMSS}. Clean any stale __live_session__/ files by archiving them to .claude/sessions/interrupted-{YYYYMMDD}-{HHMMSS}/, resetting any in_progress tasks from the interrupted session back to pending. Check for and enforce the concurrency guard via .lock file. Create .claude/sessions/__live_session__/ directory containing:
— saved execution plan from Step 5execution_plan.md
— initialized with standard templateexecution_context.md
— initialized with table headers (Task ID, Subject, Status, Attempts, Duration, Token Usage)task_log.md
— subdirectory for archiving completed task filestasks/
— initialized with status template for real-time progress trackingprogress.md
atexecution_pointer.md
— created immediately with absolute path to~/.claude/tasks/{CLAUDE_CODE_TASK_LIST_ID}/.claude/sessions/__live_session__/
Step 6: Present Execution Plan and Confirm
Display the plan showing tasks to execute, blocked tasks, and completed count. Also display the details of step 5 which includes the session directory path (
.claude/sessions/__live_session__/) and files created, including the execution pointer file location.
Then ask the user to confirm before proceeding with execution. If the user cancels, stop without modifying any tasks.
Step 7: Initialize Execution Context
Read
.claude/sessions/__live_session__/execution_context.md (created in Step 5). If a prior execution context exists, look in .claude/sessions/ for the most recent timestamped subfolder and merge relevant learnings into the new one.
Step 8: Execute Loop
Execute tasks in waves. For each wave: snapshot
execution_context.md, mark wave tasks in_progress, update progress.md with all active tasks, launch up to max_parallel background agents simultaneously via parallel Task tool calls with run_in_background: true, recording the {task_list_id → background_task_id} mapping from each Task tool response. Each agent writes to context-task-{id}.md and a compact result-task-{id}.md (completion signal). The orchestrator polls for result files via the poll-for-results.sh script in a single Bash invocation (with timeout: 2760000), then batch-reads results to process outcomes — avoiding full agent output in context. After polling, the orchestrator calls TaskOutput on each background task_id to reap the process and extract per-task duration_ms and total_tokens usage metadata for the task log. If TaskOutput times out (agent stuck), TaskStop is called to force-terminate the agent. Failed tasks with retries remaining are re-launched as background agents (same TaskOutput reaping after retry polling). After all wave agents complete: merge per-task context files into execution_context.md, clean up result and context files, archive completed task JSONs, refresh TaskList for newly unblocked tasks, form next wave, repeat.
Step 9: Session Summary
Display execution results with pass/fail counts, total execution time (sum of per-task
duration_ms), failed task list, newly unblocked tasks, and total token usage (sum of per-task total_tokens captured via TaskOutput). Save session_summary.md to .claude/sessions/__live_session__/. Archive the session by moving all contents from __live_session__/ to .claude/sessions/{task_execution_id}/, leaving __live_session__/ as an empty directory. execution_pointer.md stays pointing to __live_session__/.
Step 10: Update CLAUDE.md
Review execution context for project-wide changes (new patterns, dependencies, commands, structure changes, design decisions). Make targeted edits to CLAUDE.md if meaningful changes occurred. Skip if only task-specific or internal implementation details.
Task Classification
Determine whether a task is spec-generated or general to select the right verification approach.
Detection Algorithm
- Check description format: Look for
with categorized criteria (**Acceptance Criteria:**
,_Functional:_
, etc.)_Edge Cases:_ - Check metadata: Look for
fieldmetadata.spec_path - Check source reference: Look for
in the descriptionSource: {path} Section {number}
If any check matches -> spec-generated task (use criterion-based verification)
If none match -> General task (use inferred verification)
4-Phase Workflow
Each task is executed by the
agent-alchemy-sdd:task-executor agent through these phases:
Phase 1: Understand
Load context and understand scope before writing code.
- Read the execute-tasks skill and references
- Read
for learnings from prior tasks.claude/sessions/__live_session__/execution_context.md - Load task details via
TaskGet - Classify the task (spec-generated vs general)
- Parse acceptance criteria or infer requirements from description
- Explore affected files via Glob/Grep
- Read
for project conventionsCLAUDE.md
Phase 2: Implement
Execute the code changes following project patterns.
- Read all target files before modifying them
- Follow the project's implementation order (data -> service -> interface -> tests)
- Match existing coding patterns and conventions
- Write tests if specified in testing requirements
- Run mid-implementation checks (linter, existing tests) to catch issues early
Phase 3: Verify
Verify implementation against task requirements using the adaptive approach.
- spec-generated tasks: Walk each acceptance criteria category (Functional, Edge Cases, Error Handling, Performance), check testing requirements, run tests
- General tasks: Infer "done" from description, run existing tests, run linter, verify core change is implemented
Phase 4: Complete
Report results and share learnings.
- Determine status (PASS/PARTIAL/FAIL) based on verification results
- If PASS: mark task as
viacompletedTaskUpdate - If PARTIAL or FAIL: leave as
for the orchestrator to decide on retryin_progress - Write learnings to
(files discovered, patterns learned, issues encountered).claude/sessions/__live_session__/context-task-{id}.md - Write compact result to
(status, verification summary, files modified, issues) — this is the last file written and signals completion to the orchestrator.claude/sessions/__live_session__/result-task-{id}.md - Return minimal status line:
DONE: [{id}] {subject} - {PASS|PARTIAL|FAIL}
Adaptive Verification Overview
Verification adapts based on task type:
| Task Type | Verification Method | Pass Threshold |
|---|---|---|
| spec-generated | Criterion-by-criterion evaluation | All Functional + Tests must pass |
| General | Inferred checklist from description | Core change works + Tests pass |
Critical rule: All Functional criteria must pass for a PASS result. Edge Cases, Error Handling, and Performance failures result in PARTIAL but do not block completion.
Shared Execution Context
Tasks within an execution session share learnings through
.claude/sessions/__live_session__/execution_context.md:
- Write-based updates: All orchestrator writes to session artifacts (
,execution_context.md
,task_log.md
) use Write (full file replacement) via a read-modify-write pattern, never Edit. This ensures atomic, reliable updates regardless of file size.progress.md - Snapshot before wave: The orchestrator snapshots
before launching each wave — all agents in a wave read the same baselineexecution_context.md - Per-task writes: Each agent writes to
instead of the shared file, regardless ofcontext-task-{id}.md
setting. This eliminates write contention and avoids fragile Edit operations on shared files.max_parallel - Result file protocol: Each agent writes a compact
(~18 lines) as its very last action. The orchestrator polls for these files and reads them instead of consuming full agent output, reducing context consumption by ~79% per wave.result-task-{id}.md - Merge after wave: After all agents in a wave complete, the orchestrator appends all
content to thecontext-task-{id}.md
section of## Task History
and deletes the per-task files. Result files for PASS tasks are also deleted; FAIL result files are retained for post-analysis.execution_context.md - Sections: Project Patterns, Key Decisions, Known Issues, File Map, Task History
This enables later tasks to benefit from earlier discoveries and retry attempts to learn from previous failures.
Key Behaviors
- Autonomous execution loop: After the user confirms the execution plan, no further prompts occur between tasks. The loop runs without interruption once started.
- Background agent execution: Agents run as background tasks (
), returning ~3 lines (task_id + output_file) instead of ~100+ lines of full output. This reduces orchestrator context consumption by ~79% per wave.run_in_background: true - Agent process reaping: After polling confirms result files exist, the orchestrator calls
on each background task_id to reap the process and extract per-taskTaskOutput
andduration_ms
usage metadata. Iftotal_tokens
times out,TaskOutput
force-terminates the stuck agent. This prevents lingering background processes.TaskStop - Result file protocol: Each agent writes a compact
(~18 lines) as its very last action. The orchestrator polls for these files viaresult-task-{id}.md
in a single Bash invocation (withpoll-for-results.sh
), then batch-reads them for processing. The result file doubles as a completion signal.timeout: 2760000 - Batched session file updates: Instead of per-task read-modify-write on
andtask_log.md
, all updates are batched into a single read-modify-write cycle per file per wave.progress.md - Wave-based parallelism: Tasks at the same dependency level run simultaneously, up to
concurrent agents per wave. Tasks in later waves wait until their dependencies in earlier waves complete.max_parallel - One agent per task, multiple per wave: Each task gets a fresh agent invocation with isolated context, but multiple agents run concurrently within a wave.
- Per-task context isolation: Each agent writes to
regardless ofcontext-task-{id}.md
setting. The orchestrator merges these after each wave. This eliminates write contention and fragile Edit operations on shared files.max_parallel - Within-wave retry: Failed tasks with retries remaining are re-launched as background agents. The orchestrator polls for retry result files using
(same pattern as initial wave polling).poll-for-results.sh - Configurable parallelism: Default 5 concurrent tasks, configurable via
argument or--max-parallel
settings. Set to 1 for sequential execution..claude/agent-alchemy.local.md - Configurable retries: Default 3 attempts per task, configurable via
argument.retries - Retry with context: Each retry includes failure details from the previous attempt's result file so the agent can try a different approach.
- Dynamic unblocking: After each wave completes, the dependency graph is refreshed and newly unblocked tasks are added to the next wave.
- Honest failure handling: After retries exhausted, tasks stay
(not completed), and execution continues.in_progress - Circular dependency detection: If all remaining tasks are blocked by each other, break at the weakest link (task with fewest blockers) and log a warning.
- Shared context: Agents read the snapshot of
and write learnings to per-task context files. The orchestrator appends per-task content to the Task History section between waves.execution_context.md - Resilient context sharing: If a task-executor fails to write its context or result file, the orchestrator falls back to
to capture diagnostic output.TaskOutput - Single-session invariant: Only one execution session can run at a time per project. A
file in.lock
prevents concurrent invocations.__live_session__/ - Interrupted session recovery: Stale sessions are detected and archived; tasks left
are automatically reset toin_progress
.pending
Example Usage
Execute all unblocked tasks
/agent-alchemy-sdd:execute-tasks
Execute a specific task
/agent-alchemy-sdd:execute-tasks 5
Execute with custom retry limit
/agent-alchemy-sdd:execute-tasks --retries 1
Execute specific task with retries
/agent-alchemy-sdd:execute-tasks 5 --retries 5
Execute tasks for a specific group
/agent-alchemy-sdd:execute-tasks --task-group user-authentication
Execute specific group with custom retries
/agent-alchemy-sdd:execute-tasks --task-group payments --retries 1
Execute with limited parallelism
/agent-alchemy-sdd:execute-tasks --max-parallel 2
Execute sequentially (no concurrency)
/agent-alchemy-sdd:execute-tasks --max-parallel 1
Execute group with custom parallelism and retries
/agent-alchemy-sdd:execute-tasks --task-group payments --max-parallel 3 --retries 1
Reference Files
- 10-step orchestration loop with execution plan, retry handling, and session summaryreferences/orchestration.md
- Detailed phase-by-phase procedures for the 4-phase workflowreferences/execution-workflow.md
- Task classification, criterion verification, pass/fail rules, and failure reporting formatreferences/verification-patterns.md