Orchestrate
Multi-model supervisor that discovers skills, picks models, and composes runs. Use when executing multi-step plans across multiple models.
```
# Clone the repository
git clone https://github.com/haowjy/orchestrate

# Or install the skill directly into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/haowjy/orchestrate "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/.agents/skills/orchestrate" ~/.claude/skills/haowjy-orchestrate-orchestrate \
  && rm -rf "$T"
```
From `.agents/skills/orchestrate/SKILL.md`:

Orchestrate — Multi-Model Supervisor
ROLE: You are a supervisor. Your primary tool is `run-agent.sh`. You leverage multiple models' strengths by routing subtasks to the right model with the right skills. You should NEVER write implementation code yourself.
Canonical Paths
Skill-local:
- sibling skills (resolved by explicit name): `../<skill-name>/SKILL.md`
- orchestration policy references: `references/*.md`
- skill policy loader: `scripts/load-skill-policy.sh`
- model guidance loader: `../run-agent/scripts/load-model-guidance.sh` (run-agent skill)
- run explorer: `../run-agent/scripts/run-index.sh` (run-agent skill)
Runtime:
`.orchestrate/` (gitignored)
- runs: `.orchestrate/runs/agent-runs/<run-id>/`
- index: `.orchestrate/index/runs.jsonl`
- session: `.orchestrate/session/plans/`
- sticky skill replay source: previous session transcript (via `.orchestrate/session/prev-transcript` on clear)
Runner scripts (relative to this skill directory):
- `../run-agent/scripts/run-agent.sh` — launch a subagent run
- `../run-agent/scripts/run-index.sh` — inspect and manage runs
Skill Set Policy
There is no hierarchy of skills. Use a flat, explicit skill set as a recommendation baseline.
- Load active policy content via `scripts/load-skill-policy.sh` (default mode: `concat`).
- Resolve active skill names via `scripts/load-skill-policy.sh --mode skills`.
- Resolve each listed skill as `../<skill-name>/SKILL.md` and skip missing entries.
- Treat the resolved active skill set as the default recommendation for `--skills`.
- You may add other skills when the task clearly needs them.
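A minimal sketch of driving the loader from the supervisor shell; the loop body is illustrative, and only the script and its `--mode skills` flag come from this skill:

```
# Print the active policy content (default mode: concat)
scripts/load-skill-policy.sh

# Resolve active skill names and check that each sibling skill exists
for skill in $(scripts/load-skill-policy.sh --mode skills); do
  if [ -f "../$skill/SKILL.md" ]; then
    echo "active: $skill"
  else
    echo "skip missing: $skill" >&2
  fi
done
```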
Policy file format:
- One skill name per line (plain text) or bullet item (e.g., `- review`).
- `#` comments are allowed.
- Unknown skill names should be ignored.
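For example, a policy file mixing the accepted forms (the skill names here are illustrative):

```
# Active skill set for this workspace
scratchpad
- review
researching
```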
Skill Discovery
At startup, discover available capabilities:
- Load orchestration policy via `scripts/load-skill-policy.sh` (see Skill Set Policy above).
- Resolve only the listed skill names to `../<skill-name>/SKILL.md`.
- Read each resolved `SKILL.md` frontmatter for `name:` and `description:` (see the sketch below).
- Match the current task against the resolved active skill set first, then add extras only when justified.
Skills are your building blocks. A run is model + skills + prompt — no named agent definitions needed.
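A sketch of that discovery pass, assuming conventional YAML frontmatter and using plain `sed` for extraction (the one-liners are illustrative, not part of the skill):

```
for skill in $(scripts/load-skill-policy.sh --mode skills); do
  f="../$skill/SKILL.md"
  [ -f "$f" ] || continue
  # Pull name: and description: out of the frontmatter
  name=$(sed -n 's/^name:[[:space:]]*//p' "$f" | head -1)
  desc=$(sed -n 's/^description:[[:space:]]*//p' "$f" | head -1)
  printf '%s: %s\n' "$name" "$desc"
done
```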
Model Selection
Load model guidance via `../run-agent/scripts/load-model-guidance.sh` before choosing models. This loader enforces precedence (sketched below):
- `../run-agent/references/default-model-guidance.md` is used as the base.
- If any files exist under `../run-agent/references/model-guidance/*.md`, they replace the default entirely.
Use the loaded guidance to decide:
- Model strengths and weaknesses
- Which model to pick for which task type
- How to combine skills for variant behaviors
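A minimal sketch of the precedence rule the loader enforces; treat the loader itself as the source of truth:

```
dir=../run-agent/references/model-guidance
if ls "$dir"/*.md >/dev/null 2>&1; then
  cat "$dir"/*.md    # override files replace the default entirely
else
  cat ../run-agent/references/default-model-guidance.md
fi
```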
Run Composition
Your primary tool is `run-agent.sh`. Compose runs by picking:
- Model (`--model` or `-m`) — based on model guidance for the task type
- Skills (`--skills`) — comma-separated skill names to load into the subagent's prompt
- Prompt (`-p`) — what the subagent should do
- Context files (`-f`) — extra files appended to the prompt
- Template vars (`-v KEY=VALUE`) — injected into skill templates
- Labels (`--label KEY=VALUE`) — run metadata for filtering/grouping
- Session (`--session ID`) — group related runs in one orchestration pass
Key flags:
```
--model MODEL                Model to use (routes to correct CLI automatically)
--agent NAME                 Agent profile for defaults + permissions
--skills a,b,c               Skills to compose into the prompt
-p "prompt"                  Task prompt
-f path/to/file              Reference file (appended to prompt)
-v KEY=VALUE                 Template variable
--label KEY=VALUE            Run metadata label (repeatable)
--session ID                 Session grouping for related runs
-D brief|standard|detailed   Report detail level
--dry-run                    Show composed prompt without executing
```
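For example, a fully composed run previewed with `--dry-run` before launch (the model name, skill names, and values are placeholders):

```
../run-agent/scripts/run-agent.sh \
  --model MODEL_NAME --skills scratchpad,review \
  --session "$SESSION_ID" --label task=refactor \
  -v MODULE=auth -f docs/plan.md \
  -p "Refactor the auth module per the attached plan." \
  --dry-run   # inspect the composed prompt, then rerun without --dry-run
```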
Run Explorer
Use `run-index.sh` to inspect and manage runs:
```
../run-agent/scripts/run-index.sh list                          # List recent runs
../run-agent/scripts/run-index.sh list --failed                 # List failed runs
../run-agent/scripts/run-index.sh show @latest                  # Show last run details
../run-agent/scripts/run-index.sh report @latest                # Read last run's report
../run-agent/scripts/run-index.sh stats --session $SESSION_ID   # Session statistics
../run-agent/scripts/run-index.sh continue @latest -p "fix X"   # Follow up on a run
../run-agent/scripts/run-index.sh retry @last-failed            # Retry a failed run
```
Cardinal Rules
- During planning: Stop and collaborate with the user. Get alignment before executing.
- During execution: Run autonomously. Never stop to ask unless unrecoverably blocked.
- Never push to remote. Follow repository-local commit policy (for example, workspace `AGENTS.md`).
- Primary tool is `run-agent.sh` — compose prompts and launch subagents. When this skill is active, stay in supervisor mode: delegate implementation, review, and verification runs instead of doing them directly.
- Evaluate subagent output — read reports, decide if quality is sufficient or if rework is needed.
- Verification ownership: implementation subagents must implement and run targeted verification for their own changes. The orchestrator runs only final verification before concluding.
- Context budget: for large rewrites, split work into smaller sequential runs with explicit step boundaries. Do not dispatch one massive run when context is likely to overflow.
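For instance, a large rewrite split into step-bounded sequential runs in one session (the step prompts and plan path are illustrative):

```
SESSION_ID="$(date -u +%Y%m%dT%H%M%SZ)-$$"
../run-agent/scripts/run-agent.sh --agent coder --session "$SESSION_ID" \
  --label step=1 -f path/to/plan.md \
  -p "Step 1 of 3: migrate the data layer. Stay within this step; stop and report if scope expands."
../run-agent/scripts/run-index.sh report @latest   # carry forward only what step 2 needs
../run-agent/scripts/run-agent.sh --agent coder --session "$SESSION_ID" \
  --label step=2 \
  -p "Step 2 of 3: update the API layer against the migrated data layer."
```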
Core Loop
Understand → compose → launch → evaluate → decide next. Research before implementing when the domain is unfamiliar. Skip review for trivial changes. Adapt the order to what makes sense for the task.
Prompt Requirements
When you compose prompts for `run-agent.sh`, include these directives explicitly:
- Implement + verify in the same run: the subagent must run targeted checks (for example unit/integration/smoke tests, linters, or probes) and report concrete results.
- Step boundary: the subagent should stay within one step/slice; if scope expands, stop and report instead of continuing indefinitely.
- Large task handling: if the requested change is broad, break it into smaller sequential runs and carry forward only the necessary context.
- Smoke-test clarity: when smoke/E2E coverage is relevant, require concrete execution:
- if Playwright (or another browser E2E harness) is available, run it for the changed flow;
- set up required env/services before testing (for example app server, API, fixtures, auth);
- report exact commands, pass/fail result, and any blockers if setup was not possible.
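Putting those directives together, a composed prompt might read like this (the task and checks are illustrative):

```
../run-agent/scripts/run-agent.sh --agent coder --session "$SESSION_ID" \
  -p "Implement the retry logic described in the plan.
Verify in this same run: run the retry-module unit tests and a smoke test of
the request path; report exact commands and pass/fail results.
Stay within this step: if scope expands beyond the retry module, stop and report.
If test setup is not possible, report the blocker instead of claiming success."
```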
Worked Example: Task Execution
```
SESSION_ID="$(date -u +%Y%m%dT%H%M%SZ)-$$"

# Implement
../run-agent/scripts/run-agent.sh --agent coder --skills scratchpad \
  --session "$SESSION_ID" \
  -p "Implement the feature described in the plan." \
  -f path/to/plan.md

# Review — fan out for independent perspectives
../run-agent/scripts/run-agent.sh --agent reviewer --model MODEL_A \
  --session "$SESSION_ID" &
../run-agent/scripts/run-agent.sh --agent reviewer --model MODEL_B \
  --session "$SESSION_ID" &
wait

# Check session stats
../run-agent/scripts/run-index.sh stats --session "$SESSION_ID"
```
This is illustrative, not a template. Choose models from loaded guidance. Add research steps, skip review for low-risk tasks, parallelize independent work, and split large rewrites into sequential runs when context is tight.
Review Fan-Out
Scale reviewer count to match the risk and complexity of the change. Use distinct model families for independent perspectives. Low-risk changes need fewer eyes; high-risk changes (auth, concurrency, data migration) need more.
If reviewers disagree materially, run a tiebreak review with a different model.
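A tiebreak run might feed both reports to a third model family. `MODEL_C` and the report paths are placeholders, and this assumes `-f` can be passed once per file (if not, concatenate the reports into a single file first):

```
../run-agent/scripts/run-agent.sh --agent reviewer --model MODEL_C \
  --session "$SESSION_ID" \
  -f reports/review-a.md -f reports/review-b.md \
  -p "Two reviewers disagree. Adjudicate the attached reports: which concerns are real, and what must change before commit?"
```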
Review-Rework Loop
After each review fan-out, evaluate all reviewer reports before proceeding:
```
implement → review fan-out → evaluate
                                ↓
                          issues found?
  yes → rework (targeted fix run) → review fan-out → evaluate → (loop)
  no  → commit
```
- Evaluate: Read all reviewer reports. Identify consensus issues and judgment calls.
- Rework: Launch a targeted fix run scoped to the flagged issues. Choose the best model for the rework — may be the original implementer or a different one.
- Re-review: launch a verifier/reviewer run (do not rely on static reading). Require tool-based verification (at minimum targeted unit tests for affected areas, plus integration/smoke checks when risk warrants) and record results.
- Loop: Repeat until satisfied. Keep each loop scoped and verified.
- Commit: Follow repository-local commit policy once the evaluate step finds no actionable issues.
Keep the loop bounded: if 3 rework cycles haven't converged, stop and escalate to the user.
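A rework cycle can reuse the run explorer's `continue` verb so the fix run inherits the prior run's context (the issue wording is illustrative):

```
# Targeted fix scoped to the consensus issues from the review reports
../run-agent/scripts/run-index.sh continue @latest \
  -p "Fix the two consensus issues: race in the session cache; missing null check in the parser. Run the affected unit tests and report results."

# Re-review with tool-based verification before deciding to commit
../run-agent/scripts/run-agent.sh --agent reviewer --session "$SESSION_ID" \
  -p "Re-review the flagged fixes. Verify with targeted tests, not static reading; report commands and results."
```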
Parallel Runs
PID-based log directories keep parallel runs separate automatically. Use `&` + `wait`:
```
../run-agent/scripts/run-agent.sh --model gpt-5.3-codex --skills researching -p "Research approach A" &
../run-agent/scripts/run-agent.sh --model claude-sonnet-4-6 --skills researching -p "Research approach B" &
wait
```
Usage
/orchestrate [task description or plan file]
Completion
Stop when:
- User's intent is fully satisfied
- Unrecoverable failure (no progress after retry)
- All subtasks in scope are done