Learn-skills.dev orchestrate
Coordinate parallel agent teams to execute multi-task implementation plans. Use when running phase tasks from the task plan, parallelizing independent implementation work, or executing custom plan files. Supports interactive (in-session TeamCreate agents) and headless (claude -p fire-and-forget processes) execution modes with task ledger tracking, heartbeat monitoring, budget control, and wave-based quality gates.
git clone https://github.com/NeverSight/learn-skills.dev
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/acedergren/agentic-tools/orchestrate" ~/.claude/skills/neversight-learn-skills-dev-orchestrate && rm -rf "$T"
data/skills-md/acedergren/agentic-tools/orchestrate/SKILL.md
Orchestrate
Coordinate a team of parallel agents to execute a phase from the task plan. Manages task assignment, heartbeat monitoring, verification, scope enforcement, and wave transition quality gates.
Supports two execution modes:
- Interactive (default): In-session agents via TeamCreate/Task/SendMessage. Good for complex tasks needing inter-agent coordination.
- Headless (--headless): Independent claude -p processes. Good for parallelizable tasks with clear scope boundaries, with ~54% less coordination overhead.
Resource Loading
MANDATORY at Step 2: Read agent-roles.md to determine role assignments and model selection.
MANDATORY at Step 3H.2a (headless only): Read prompt-templates/{role}.md for each task's role to build the system prompt.
MANDATORY at wave transitions: Read wave-template.md for the pre/during/post checklist.
Reference only: headless-runner.md — consult for claude -p flag details, output JSON format, or error classification.
Do NOT load prompt templates in interactive mode: agents invoke /quality-commit and /tdd directly.
Critical Anti-Patterns
NEVER let parallel agents run git add && git commit without flock. Git's index file is shared by every process working in the same worktree, and concurrent writes silently mix staged files between commits. Symptom: commit A contains files from task B. Recovery requires interactive rebase. This is why every agent system prompt includes flock /tmp/orchestrate/{session-id}/git.lock.
NEVER skip the file overlap check (3H.2b) — two agents editing the same file produces merge conflicts that neither agent can resolve because they have no knowledge of each other's changes. The orchestrator must detect overlap at plan time and serialize conflicting tasks.
NEVER trust the is_error field alone for failure detection. Budget exhaustion sets is_error: false with subtype: "error_max_budget_usd". Always check subtype.startsWith("error_") instead. This was confirmed via live testing of claude -p --output-format json.
NEVER reuse PID files across waves. Process IDs are recycled by the OS. Always clear /tmp/orchestrate/{session-id}/task-*.pid between waves; otherwise a stale PID could match an unrelated live process, and the monitor loop would wait indefinitely on a task whose own process already exited.
NEVER spawn headless agents without --no-session-persistence. Without this flag, each claude -p process writes a session file to ~/.claude/. With 22 parallel tasks, this creates 22 orphaned session files that consume disk and pollute session history.
NEVER use --dangerously-skip-permissions without --allowedTools. The skip-permissions flag alone gives agents unrestricted tool access including TeamCreate, SendMessage, and Task (which could spawn recursive agents). Always pair with --allowedTools "Bash Edit Write Read Glob Grep" to restrict to safe tools.
NEVER let a headless agent's commit go unverified. Even if the process exits with subtype: "success", the agent may have committed to the wrong branch, touched out-of-scope files, or produced a commit that breaks the build. Always run the full verification chain (3H.2e): commit exists → scope check → verify command.
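The failure-detection rule above can be sketched as a small check on the parsed output. This is a minimal illustration, not the orchestrator's actual code: the field names is_error and subtype come from the text above, while the function name classify_result and the choice to treat a missing subtype as a non-error are assumptions.

```python
import json


def classify_result(raw_json: str) -> str:
    """Return "failed" or "success" for one headless result document.

    Hypothetical helper: only the is_error and subtype fields are taken
    from the skill text; everything else is illustrative.
    """
    result = json.loads(raw_json)
    subtype = str(result.get("subtype", ""))
    # Budget exhaustion reports is_error: false, so the subtype prefix
    # is checked as well as the is_error flag.
    if subtype.startswith("error_") or result.get("is_error") is True:
        return "failed"
    return "success"
```

Checking both conditions keeps the budget-exhaustion case from slipping through while still honoring an explicit is_error: true.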
Steps
1. Parse Arguments
Extract the orchestration target from $ARGUMENTS:
- Phase ID (e.g., A, B, D): Load tasks from .claude/reference/phase-10-task-plan.md for that phase
- Plan file path (e.g., docs/plans/my-plan.md): Parse tasks from the given file
- Inline task list (e.g., "task1; task2; task3"): Create tasks from semicolon-separated descriptions
Flags:
- --headless: Spawn independent claude -p processes instead of in-session agents
- --interactive: Explicit flag for in-session TeamCreate-based mode (default if neither flag given)
- --dry-run: Parse and display tasks without spawning agents (works with both modes)
- --wave N: Start from wave N (skip earlier waves, assumes they're complete)
- --max-agents N: Cap agent count (default: 5)
- --budget-per-task N: Override per-task budget cap in USD (default: role-based from agent-roles.md)
- --timeout-multiplier N: Scale timeout thresholds (default: 2x estimated duration)
- --no-qa: Skip spawning a dedicated QA watcher (not recommended, interactive mode only)
- --verbose: Print all agent messages to the user (noisy but useful for debugging)
If no arguments, ask the user what to orchestrate.
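A minimal sketch of how the three target forms might be told apart. The rules here are assumptions made for illustration: a phase ID is taken to be a single uppercase letter, a plan file is anything ending in .md, and everything else is split on semicolons. The names Target and parse_target are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Target:
    kind: str                      # "phase", "plan_file", or "inline"
    value: Union[str, List[str]]   # phase ID, file path, or task descriptions


def parse_target(arg: str) -> Target:
    """Classify the orchestration target (illustrative rules only)."""
    text = arg.strip()
    # Assumption: phase IDs are one uppercase letter such as A, B, or D.
    if re.fullmatch(r"[A-Z]", text):
        return Target("phase", text)
    # Assumption: plan files are Markdown files.
    if text.endswith(".md"):
        return Target("plan_file", text)
    # Otherwise treat the argument as semicolon-separated task descriptions.
    tasks = [part.strip() for part in text.split(";") if part.strip()]
    return Target("inline", tasks)
```

For example, parse_target("task1; task2; task3") yields an inline target with three task descriptions.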
2. Initialize Task Ledger
For each task in the plan, use TaskCreate with:
- subject: Task title (e.g., "A-1.01 Update patch/minor runtime deps")
- description: Full task spec including files to modify, verification command, and agent instructions
- activeForm: Present-continuous description (e.g., "Updating runtime dependencies")
- metadata:

    {
      "role": "backend-impl",
      "agent_type": "sonnet",
      "phase": "A",
      "wave": "1",
      "task_id": "A-1.01",
      "estimated_duration": "20m",
      "verify_command": "pnpm install && pnpm build",
      "files": "package.json (root, api, frontend)",
      "status_detail": "pending"
    }
Assign roles using the rules in agent-roles.md. The role field determines which prompt template to use and which model to select.
Set up blockedBy dependencies from the task plan's Depends column using TaskUpdate.
Print a summary:
    Phase A: Dependency Updates + Fastify Hardening
    Mode: headless (claude -p)
    22 tasks (6 haiku, 16 sonnet) across 3 waves
      Wave 1: 6 tasks (all haiku, all parallel)
      Wave 2: 12 tasks (1 haiku, 11 sonnet)
      Wave 3: 4 tasks (2 haiku, 2 sonnet)
    Estimated: ~13 hours agent work
    Budget: $78 (22 tasks × avg $3.55)
3. Mode Router
Branch based on the execution mode flag:
- --headless → Go to Step 3H: Headless Execution
- --interactive (or default) → Go to Step 3I: Interactive Execution
Step 3H: Headless Execution
3H.1 — Setup
Create session directory:
    SESSION_ID=$(date +%s)-$(head -c 4 /dev/urandom | xxd -p)
    mkdir -p "/tmp/orchestrate/$SESSION_ID"
    touch "/tmp/orchestrate/$SESSION_ID/git.lock"
Record session metadata: phase, start time, budget cap, max agents.
3H.2 — Wave Execution
For each wave (starting from --wave N or wave 1):
a. Generate Prompts
For each task in the wave:
- Determine the task's role from metadata (e.g., backend-impl)
- Read the role-specific system prompt from prompt-templates/{role}.md
- Build the task prompt by replacing template variables:
  - {{TASK_DESCRIPTION}} → task title + full description from ledger
  - {{TASK_FILES}} → files list from metadata
  - {{VERIFY_COMMAND}} → verification command from metadata
  - {{COMPLETED_CONTEXT}} → commit hashes + changed file summaries from completed tasks
  - {{GIT_LOCK_PATH}} → /tmp/orchestrate/{session-id}/git.lock
- If context from completed tasks is large, pre-read relevant code snippets and include them (truncate to keep total prompt under 20K tokens)
- Save prompt to /tmp/orchestrate/{session-id}/task-{id}.prompt
If --dry-run: Print all generated prompts and exit.
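The variable substitution described above could be done with a single regular-expression pass. This is a sketch under assumptions: the {{NAME}} placeholder syntax is taken from the list above, while render_prompt and the decision to raise on an unknown variable are illustrative choices.

```python
import re
from typing import Dict


def render_prompt(template: str, values: Dict[str, str]) -> str:
    """Replace {{NAME}} placeholders with values (hypothetical helper)."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            # Fail loudly rather than ship a prompt with a raw placeholder.
            raise KeyError(f"no value for template variable {name}")
        return values[name]

    # Single pass: text inserted from values is not expanded again.
    return re.sub(r"\{\{([A-Z_]+)\}\}", substitute, template)
```

Because the substitution is single-pass, a task description that happens to contain {{...}} text is inserted verbatim rather than being expanded a second time.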
b. File Overlap Check
For each pair of concurrent tasks in the wave:
    If task_A.files ∩ task_B.files ≠ ∅:
        Serialize: task_B.blockedBy += task_A
        Log: "Serializing {task_B} after {task_A} due to file overlap: {overlapping files}"
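The pairwise check above can be sketched with set intersection. This assumes each task's files have already been normalized into a set of paths (the metadata example stores them as a free-form string, so that parsing step is not shown), and find_overlaps is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def find_overlaps(task_files: Dict[str, Set[str]]) -> List[Tuple[str, str, Set[str]]]:
    """Return (earlier_task, later_task, shared_files) for each conflict.

    The later task of each pair is the one to serialize behind the earlier.
    """
    conflicts = []
    # Sorting task IDs makes the choice of which task waits deterministic.
    for first, second in combinations(sorted(task_files), 2):
        shared = task_files[first] & task_files[second]
        if shared:
            conflicts.append((first, second, shared))
    return conflicts
```

Each returned tuple maps directly onto one "Serializing {task_B} after {task_A}" log line.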
c. Spawn Processes
For each unblocked task (up to --max-agents concurrent):

    claude -p \
      --model {role.model} \
      --system-prompt "$(cat /tmp/orchestrate/$SESSION_ID/task-$TASK_ID.prompt)" \
      --allowedTools "Bash Edit Write Read Glob Grep" \
      --dangerously-skip-permissions \
      --max-budget-usd {budget} \
      --output-format json \
      --no-session-persistence \
      "{task description}" \
      > /tmp/orchestrate/$SESSION_ID/task-$TASK_ID.json 2>&1 &
    echo $! > /tmp/orchestrate/$SESSION_ID/task-$TASK_ID.pid
    date -u +%Y-%m-%dT%H:%M:%SZ > /tmp/orchestrate/$SESSION_ID/task-$TASK_ID.start
    echo "running" > /tmp/orchestrate/$SESSION_ID/task-$TASK_ID.status
Update task ledger: TaskUpdate → status: in_progress.
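If the orchestrator were driven from a script rather than a shell, the same invocation could be assembled as an argument list. This sketch only builds the list and does not launch anything; the flags are copied from the command above, and build_headless_command is a hypothetical helper.

```python
from typing import List

# Tool allowlist copied from the spawn command above.
ALLOWED_TOOLS = "Bash Edit Write Read Glob Grep"


def build_headless_command(model: str, system_prompt: str,
                           budget_usd: float, task_description: str) -> List[str]:
    """Assemble the argv for one headless task (illustrative only)."""
    return [
        "claude", "-p",
        "--model", model,
        "--system-prompt", system_prompt,
        "--allowedTools", ALLOWED_TOOLS,
        "--dangerously-skip-permissions",
        "--max-budget-usd", f"{budget_usd:.2f}",
        "--output-format", "json",
        "--no-session-persistence",
        task_description,
    ]
```

Passing a list like this to subprocess.Popen avoids shell quoting problems when the prompt or task description contains quotes or newlines.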
d. Monitor Loop
Poll every 10 seconds until all wave tasks complete:
- Check PIDs: kill -0 $PID 2>/dev/null for each running task
- If process exited:
  - Read output JSON from task-{id}.json
  - Parse is_error, subtype, total_cost_usd, num_turns, result
  - Update task status file
- Timeout check: If elapsed time > (estimated_duration × timeout_multiplier):
  - kill $PID (SIGTERM)
  - Wait 10s, then kill -9 $PID if still running
  - Mark as timed_out
- Budget check: If total_cost_usd > task budget:
  - Already enforced by --max-budget-usd, but log the event
- Status report (every 30 seconds):

    +----------------------------------------------------+
    | Phase A - Wave 1 (Headless)         [3/6 complete] |
    +----------------------------------------------------+
    | A-1.01  backend-1   DONE  $0.42  (12 turns, 45s)   |
    | A-1.02  backend-2   DONE  $0.38  (10 turns, 39s)   |
    | A-1.03  qa-1        DONE  $0.15  (8 turns, 22s)    |
    | A-1.04  frontend-1  RUN   $0.21  (6 turns, 31s)    |
    | A-1.05  mastra-1    RUN   $0.18  (5 turns, 28s)    |
    | A-1.06  docs-1      RUN   $0.08  (3 turns, 15s)    |
    +----------------------------------------------------+
    | Budget: $1.42 / $21.30 session                     |
    +----------------------------------------------------+
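The liveness and timeout checks in the monitor loop can be sketched as two small functions. This assumes a POSIX system, where signal 0 probes a process without delivering a signal; process_alive and timed_out are hypothetical names, and the default multiplier of 2 mirrors the --timeout-multiplier default.

```python
import os


def process_alive(pid: int) -> bool:
    """POSIX-only equivalent of `kill -0 $PID`."""
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is sent
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # process exists but belongs to another user
    return True


def timed_out(elapsed_seconds: float, estimated_seconds: float,
              timeout_multiplier: float = 2.0) -> bool:
    """True once elapsed time exceeds estimated duration × multiplier."""
    return elapsed_seconds > estimated_seconds * timeout_multiplier
```

With a 20m estimate and the default multiplier, a task is flagged only after it has run for more than 40 minutes.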
e. Verify Completed Tasks
For each task whose process has exited:
1. Check for error: If is_error is true or subtype starts with error_ → enter failure escalation (3H.3)
2. Check new commit: git log --oneline --since="{start_time}" -- {task.files}
   - If no commit found → failure escalation with "no commit produced"
3. Scope check: git diff --name-only HEAD~1, then verify only task files were touched
   - If out-of-scope files modified → git revert HEAD --no-edit, then failure escalation
4. Run verify command: Execute task.verify_command
   - If fails → failure escalation with verify output
5. On success: TaskUpdate → status: completed, add commit hash to metadata
   - Reclaim semaphore slot
   - If more unblocked tasks in wave → spawn next process (back to c.)
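The scope check reduces to a set difference between the files a commit touched and the files the task was allowed to touch. A minimal sketch, assuming both lists are already plain repository-relative paths; out_of_scope is a hypothetical helper.

```python
from typing import Iterable, List


def out_of_scope(changed_files: Iterable[str], task_files: Iterable[str]) -> List[str]:
    """Return changed paths that are not listed in the task's files."""
    allowed = set(task_files)
    return sorted(path for path in changed_files if path not in allowed)
```

An empty result means the commit stayed in scope; any returned path is grounds for the revert-and-escalate branch.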
f. Wave Quality Gate
When all tasks in the wave are complete:
pnpm build && npx vitest run && pnpm lint
Read the full wave checklist from wave-template.md.
- Gate passes → Move to next wave (back to 3H.2)
- Gate fails → Create a fix task, spawn a sonnet agent to fix it, re-run gate
3H.3 — Failure Escalation
Three-tier escalation for failed tasks:
Tier 1 — Retry with context (up to 2 retries):
- Append error output and the agent's result text to the original prompt
- Add prefix: "Previous attempt failed with the following error. Fix the issue and try again."
- Re-spawn with same model and budget
Tier 2 — Model escalation (after 2 retries fail):
- Escalate model: haiku → sonnet, sonnet → opus
- Escalate budget: original × 1.5
- Add prefix: "This task failed with a weaker model. You are a stronger model brought in to resolve it. Here is the full error history: ..."
- Re-spawn with escalated model
Tier 3 — User intervention (if model escalation also fails):
- Print failure details: task description, error output, retry history
- Offer choices:
- Skip: Mark task as skipped, continue with remaining tasks (may break downstream)
- Manual fix: Pause orchestration, let user fix manually, then resume
- Abort: Stop the entire orchestration
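The three tiers can be sketched as a function from failure count to the next attempt's configuration. Two points are assumptions not stated above: failures counts every failed attempt including the first, and a task already running on opus goes straight to user intervention because no stronger model is listed. next_attempt is a hypothetical name.

```python
from typing import Optional, Tuple

MAX_RETRIES = 2                                   # Tier 1 retries
NEXT_MODEL = {"haiku": "sonnet", "sonnet": "opus"}  # Tier 2 escalation path
BUDGET_FACTOR = 1.5                               # Tier 2 budget increase


def next_attempt(failures: int, model: str,
                 budget_usd: float) -> Optional[Tuple[str, float]]:
    """Return (model, budget) for the next attempt, or None for Tier 3.

    `model` and `budget_usd` are the task's original settings.
    """
    if failures <= MAX_RETRIES:
        # Tier 1: same model and budget, with error context added to the prompt.
        return (model, budget_usd)
    if failures == MAX_RETRIES + 1 and model in NEXT_MODEL:
        # Tier 2: stronger model and a larger budget.
        return (NEXT_MODEL[model], budget_usd * BUDGET_FACTOR)
    # Tier 3: hand the decision to the user (skip, manual fix, or abort).
    return None
```

Under these assumptions a haiku task gets at most four attempts in total: the original, two retries, and one sonnet attempt.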
3H.4 — Git Safety for Parallel Agents
flock-based locking: All agent system prompts include flock instructions. The orchestrator creates the lock file at setup (3H.1).
File overlap detection: Handled at prompt generation time (3H.2b). Overlapping tasks are serialized.
Post-hoc scope verification: After each task (3H.2e step 3). Out-of-scope commits are reverted.
Git worktrees (for high-overlap phases): If >50% of wave tasks share files, fall back to worktree isolation:
    git worktree add /tmp/orchestrate/$SESSION_ID/worktree-$TASK_ID -b temp/$TASK_ID
    # Agent works in the worktree directory
    # Orchestrator merges back: git merge temp/$TASK_ID
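For a script-driven agent, the flock-based locking described in this section could look like the following. It is a sketch, assuming a POSIX system: fcntl.flock takes an exclusive advisory lock on the lock file, and the git add / git commit pair would run inside the with block. git_lock is a hypothetical helper, not part of the skill.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def git_lock(lock_path: str):
    """Hold an exclusive advisory lock on lock_path for the block's duration."""
    # Append mode creates the file if needed and never truncates it.
    with open(lock_path, "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)   # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

Usage would be along the lines of `with git_lock(path): run_git_commit()`, where run_git_commit stands in for whatever stages and commits the task's files. The lock is advisory, so it only protects against other processes that also take it.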
Step 3I: Interactive Execution
3I.1 — Create Team
Use TeamCreate with a descriptive name derived from the phase:
- Phase A → team name phase-A-foundation
- Phase B → team name phase-B-package-split
- Custom plans → team name from $ARGUMENTS or prompt user
3I.2 — Spawn Specialist Agents
Determine agent count from max parallelism in the current wave (capped by --max-agents).
Assign roles using agent-roles.md rules based on task file paths and metadata.
Spawn agents via the Task tool with:
- team_name: The team name from above
- subagent_type: general-purpose (all agents need full tool access)
- model: From the task's role definition in agent-roles.md
- name: Role-based naming: {role}-{N} (e.g., backend-1, qa-1, security-1)
Always spawn one qa-1 agent for continuous QA watching (per the QA Watcher Protocol) unless --no-qa is set.
Agent spawn prompt template:
    You are {name}, a {role} specialist on team "{team_name}".

    Your role: Execute assigned tasks from the task ledger. For each task:
    1. Acknowledge receipt immediately via SendMessage to the team lead
    2. Read full task details with TaskGet
    3. Implement the task following your role's domain knowledge
    4. Run quality gates: lint → typecheck → test (per workspace)
    5. Stage specific files and commit with conventional message format
    6. Report completion with commit hash via SendMessage
    7. Check TaskList for your next assignment

    QA protocol: After every Edit/Write, notify qa-1 with changed file paths.
    Scope: ONLY work on your assigned task. If you discover related work, report it and do not expand scope.
    Git: Stage ONLY specific files. Never git add -A or git add .
3I.3 — Assign First Wave
Read TaskList to find unblocked, unassigned tasks matching the current wave.
For each idle agent:
- Find a matching task (match role → agent specialization)
- Use TaskUpdate to set owner and status: in_progress
- Update metadata: { "assigned_at": "<ISO timestamp>", "status_detail": "assigned" }
- Send task details via SendMessage including:
  - Task ID and title
  - Files to modify
  - Verification command
  - Any dependencies or context from completed tasks
3I.4 — Monitor Loop
Run until all tasks in all waves are complete.
On Agent Message Received
Acknowledgment → Update metadata: { "status_detail": "in_progress", "last_heartbeat": "<ISO timestamp>" }
Completion claim → Verify before marking done:
- Check commit exists: Ask agent for commit hash, verify with git log --oneline -1 <hash>
- Run the task's verify_command from metadata
- If verified:
  - TaskUpdate → status: completed
  - Assign next unblocked task from the wave (or next wave if current is done)
- If not verified:
  - Send specific feedback about what failed
  - Keep task in_progress
Issue report → Assess severity:
- Blocking: Create a fix task, assign to available agent
- Non-blocking: Log and continue
- Scope creep: Redirect agent back to assigned task
Progressive Stall Escalation
Instead of a flat 90s heartbeat timeout, escalate progressively:
| Timer | Action |
|---|---|
| 60s | Ping: "Status check — what are you working on?" |
| 120s | Warning: "No response in 2min. Will reassign in 60s." |
| 180s | Reassign: Mark stalled, spawn replacement agent with task context |
When reassigning:
- Update metadata: { "status_detail": "stalled", "reassigned_from": "<agent-name>" }
- Send the stalled agent a shutdown request
- Spawn replacement with the same role, include: "Previous agent stalled. Pick up where they left off."
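The escalation table maps seconds of silence to an action, which can be sketched as a threshold lookup. The thresholds come from the table above; the action labels and the name stall_action are illustrative.

```python
def stall_action(silent_seconds: float) -> str:
    """Map time since the last heartbeat to an escalation step."""
    if silent_seconds >= 180:
        return "reassign"   # mark stalled, spawn a replacement agent
    if silent_seconds >= 120:
        return "warn"       # warn that reassignment happens in 60s
    if silent_seconds >= 60:
        return "ping"       # ask the agent what it is working on
    return "ok"
```

Checking the largest threshold first means an agent that has been silent for three minutes is reassigned rather than pinged again.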
Scope Enforcement
If an agent reports working on files NOT listed in their task's files metadata:
- Send a stop message: "You're modifying files outside your task scope. Please revert and focus on: {task files}"
- If repeated: Reassign the task to a different agent
Status Report
Print every 3 minutes (or when the user asks):
    +------------------------------------------+
    | Phase A - Wave 2 Progress                |
    +------------------------------------------+
    | Completed: 6/12   | In Progress: 3       |
    | Stalled: 0        | Pending: 3           |
    +------------------------------------------+
    | backend-1: A-2.04 Valkey cache     [##-] |
    | backend-2: A-2.10 OracleStore      [#--] |
    | qa-1:      A-2.11 knip CI          [###] |
    | qa-2:      watching (last: PASS)         |
    +------------------------------------------+
3I.5 — Improved Shutdown Protocol
When a phase or the entire orchestration completes:
- Send shutdown_request to all agents via SendMessage
- Wait 30s for responses
- Re-send shutdown_request to non-responders
- Wait 15s
- TeamDelete to force cleanup of any remaining agents
4. Wave Transition Gate
When all tasks in a wave are complete (applies to both modes):
- Run full quality gate: pnpm build && npx vitest run && pnpm lint
- If the phase's wave has a specific Gate command (from the task plan), run that too
- Gate passes → Move to next wave, assign new tasks
- Gate fails → Diagnose the failure, create a fix task, assign to an available agent (or spawn a headless fix agent)
Read the wave checklist from wave-template.md for the full pre/during/post checklist.
5. Phase Completion
When all waves are complete:
- Run the phase's final verification (from the task plan's last wave Gate)
- Run /health-check --quick for a comprehensive quality check
- Print final summary:

    Phase A Complete
    Mode: headless
    Tasks: 22/22 completed (2 retried, 0 skipped)
    Duration: 2h 15m
    Cost: $34.20 / $78.00 budget
    Agents: 5 concurrent max
    Commits: 22
    Issues: 1 scope violation (reverted + retried), 1 timeout (escalated to sonnet)
    Quality: All gates passed

- Interactive mode: Shut down all agents via the shutdown protocol (3I.5)
- Headless mode: Clean up session directory (the :? guard aborts if SESSION_ID is unset, so the parent directory is never removed by accident):

    rm -rf "/tmp/orchestrate/${SESSION_ID:?}"
6. Cross-Phase Handoff
If more phases are queued (following the Phase Dependency DAG):
- Check which phases are now unblocked (e.g., after A completes → B, D, F are unblocked)
- For parallel phases, set up git worktrees per the Git Worktree Parallelization Strategy:

    git worktree add ../portal-phase-{X} phase-10/{X}-{name}
    cd ../portal-phase-{X} && pnpm install

- Start a new orchestration cycle for each unblocked phase
- Report to user which phases are starting in parallel
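Finding the newly unblocked phases is a lookup over the Phase Dependency DAG (A→B→C, A→D, A→F, B→E, listed under Referenced Protocols). A minimal sketch, with the DAG hard-coded for illustration; in practice it would be read from the task plan header, and unblocked_phases is a hypothetical helper.

```python
from typing import Dict, List, Set

# Each phase maps to the phases it depends on (from A→B→C, A→D, A→F, B→E).
PHASE_DEPS: Dict[str, Set[str]] = {
    "A": set(),
    "B": {"A"},
    "C": {"B"},
    "D": {"A"},
    "E": {"B"},
    "F": {"A"},
}


def unblocked_phases(deps: Dict[str, Set[str]], completed: Set[str]) -> List[str]:
    """Return phases not yet completed whose dependencies are all complete."""
    return sorted(
        phase
        for phase, requires in deps.items()
        if phase not in completed and requires <= completed
    )
```

After A completes this returns B, D, and F, matching the example in the first step above. A real orchestrator would also need to exclude phases that are already running.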
Arguments
See Step 1: Parse Arguments for the full flag reference. Summary:
$ARGUMENTS accepts a Phase ID, plan file path, or inline task list, plus optional flags --headless, --interactive, --dry-run, --wave N, --max-agents N, --budget-per-task N, --timeout-multiplier N, --no-qa, --verbose.
Integration Points
Role Registry
- agent-roles.md: Maps tasks to specialist roles (model, budget, prompt template)
Prompt Templates
- prompt-templates/backend-impl.md: Fastify 5 routes, plugins, services
- prompt-templates/frontend-impl.md: SvelteKit pages, components, stores
- prompt-templates/mastra-impl.md: Mastra agents, RAG, tools, workflows
- prompt-templates/security-reviewer.md: OWASP + Oracle security review
- prompt-templates/qa-lead.md: TDD, test writing, QA watching
- prompt-templates/doc-sync.md: Documentation sync
Headless Runner Reference
- headless-runner.md: claude -p flags, output format, concurrency model, error classification
Referenced Skills
- /quality-commit: Agents use this for each task's commit step (interactive mode)
- /tdd: Agents use this for implementation tasks needing test coverage (interactive mode)
- /health-check: Run at phase completion for comprehensive validation
Referenced Protocols
- QA Watcher Protocol: .claude/reference/phase-10-task-plan.md section "Continuous QA Watcher Protocol"
- Git Worktree Strategy: .claude/reference/phase-10-task-plan.md section "Git Worktree Parallelization Strategy"
- Phase Dependency DAG: A→B→C, A→D, A→F, B→E (from task plan header)
Claude Code Native Tools Used
Interactive mode:
- TeamCreate/TeamDelete: Team lifecycle
- TaskCreate/TaskUpdate/TaskList/TaskGet: Ledger operations
- SendMessage: Agent communication (DM, broadcast, shutdown)
- Task tool: Spawning agents with team_name parameter
Headless mode:
- TaskCreate/TaskUpdate/TaskList/TaskGet: Ledger operations (orchestrator only)
- Bash: Spawning and monitoring claude -p processes
- Read: Parsing output JSON files
Examples
- /orchestrate A: Run Phase A interactively (default mode)
- /orchestrate A --headless: Run Phase A with headless claude -p processes
- /orchestrate A --headless --dry-run: Preview generated prompts without spawning
- /orchestrate A --headless --budget-per-task 3: Cap each task at $3
- /orchestrate A --wave 2: Resume Phase A from Wave 2
- /orchestrate A --interactive --max-agents 3: Interactive with 3 agents max
- /orchestrate docs/plans/custom-plan.md --headless: Headless from custom plan
- /orchestrate "add auth middleware; write auth tests; update docs": Inline tasks
Error Recovery
- Agent crashes (interactive): Detect via progressive stall escalation, reassign task
- Process crashes (headless): Detect via PID check, retry with error context
- Quality gate fails: Create fix task, assign/spawn fix agent, re-run gate
- All agents stalled (interactive): Report to user, suggest manual intervention or restart
- Budget exceeded (headless): Pause and ask user before continuing
- Git conflicts: If worktree merge fails, pause and ask user for resolution strategy
- Scope violation (headless): Revert commit, retry with reinforced scope constraint