Claude-skill-registry end-to-end-orchestrator
Complete development workflow orchestrator coordinating all multi-ai skills (research → planning → implementation → testing → verification) with quality gates, failure recovery, and state management. Single-command complete workflows from objective to production-ready code. Use when implementing complete features requiring full pipeline, coordinating multiple skills automatically, or executing production-grade development cycles end-to-end.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/end-to-end-orchestrator" ~/.claude/skills/majiayu000-claude-skill-registry-end-to-end-orchestrator && rm -rf "$T"
skills/data/end-to-end-orchestrator/SKILL.md

End-to-End Orchestrator
Overview
end-to-end-orchestrator provides single-command complete development workflows, coordinating all 5 multi-ai skills from research through production deployment.
Purpose: Transform "I want feature X" into production-ready code through automated skill coordination
Pattern: Workflow-based (5-stage pipeline with quality gates)
Key Innovation: Automatic orchestration of research → planning → implementation → testing → verification with failure recovery and quality gates
The Complete Pipeline:
```
Input: Feature description
  ↓
1. Research (multi-ai-research) [optional]
  ↓ [Quality Gate: Research complete]
2. Planning (multi-ai-planning)
  ↓ [Quality Gate: Plan ≥90/100]
3. Implementation (multi-ai-implementation)
  ↓ [Quality Gate: Tests pass, coverage ≥80%]
4. Testing (multi-ai-testing)
  ↓ [Quality Gate: Coverage ≥95%, verified]
5. Verification (multi-ai-verification)
  ↓ [Quality Gate: Score ≥90/100, all layers pass]
Output: Production-ready code
```
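In code, the orchestration loop amounts to running each stage, persisting its result, and enforcing its gate before moving on. Below is a minimal sketch using stub stage runners and the gate thresholds from the diagram above; the function and field names are illustrative, not the skill's actual API:

```js
// Minimal orchestrator-loop sketch. The stage runners below are stand-ins
// for the real skill invocations; only the thresholds come from the diagram.
const fs = require('fs');
fs.mkdirSync('.multi-ai-context', { recursive: true });

// Hypothetical stage runner: the real skill invokes another multi-ai skill;
// here we just return a result shape the gates can check.
const runStage = (name) => async () =>
  ({ name, score: 95, coverage: 96, testsPassed: true, allLayersPass: true });

const stages = [
  { name: 'research',       run: runStage('research'),       gate: r => r.score >= 95, optional: true },
  { name: 'planning',       run: runStage('planning'),       gate: r => r.score >= 90 },
  { name: 'implementation', run: runStage('implementation'), gate: r => r.testsPassed && r.coverage >= 80 },
  { name: 'testing',        run: runStage('testing'),        gate: r => r.score >= 90 && r.coverage >= 95 },
  { name: 'verification',   run: runStage('verification'),   gate: r => r.score >= 90 && r.allLayersPass },
];

(async () => {
  for (const stage of stages) {
    const result = await stage.run();
    // Persist state so later stages (and failure recovery) can read it
    fs.writeFileSync(`.multi-ai-context/${stage.name}-status.json`, JSON.stringify(result, null, 2));
    if (!stage.gate(result)) {
      if (stage.optional) continue;                            // research may be skipped
      throw new Error(`Quality gate failed: ${stage.name}`);   // triggers failure recovery
    }
  }
  console.log('✅ All gates passed — production ready');
})();
```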
When to Use
Use end-to-end-orchestrator when:
- Implementing complete features (not quick fixes)
- Want automated workflow (not manual skill chaining)
- Production-quality required (all gates must pass)
- Time optimization important (parallel where possible)
- Need failure recovery (automatic retry/rollback)
When NOT to Use:
- Quick fixes (<30 minutes)
- Exploratory work (uncertain requirements)
- Manual control preferred (step through each phase)
Prerequisites
Required
- All 5 multi-ai skills installed:
- multi-ai-research
- multi-ai-planning
- multi-ai-implementation
- multi-ai-testing
- multi-ai-verification
Optional
- agent-memory-system (for learning from past work)
- hooks-manager (for automation)
- Gemini CLI, Codex CLI (for tri-AI research)
Complete Workflow
Stage 1: Research (Optional)
Purpose: Ground implementation in proven patterns
Process:
1. Determine if Research Needed:

```js
// Check if the objective is familiar from past work (agent-memory-system)
const similarWork = await recallMemory({ type: 'episodic', query: objective });

let needsResearch;
if (similarWork.length === 0) {
  // Unfamiliar domain → research needed
  needsResearch = true;
} else {
  // Familiar → can skip research, use past learnings
  needsResearch = false;
}
```
2. Execute Research (if needed):

```
Use multi-ai-research for "[domain] implementation patterns and best practices"
```

What It Provides:
- Claude research: Official docs, codebase patterns
- Gemini research: Web best practices, latest trends
- Codex research: GitHub patterns, code examples
- Quality: ≥95/100 with 100% citations
3. Quality Gate: Research Complete:

✅ Research findings documented
✅ Patterns identified (minimum 2)
✅ Best practices extracted (minimum 3)
✅ Quality score ≥95/100

If Fail: Research incomplete → retry research OR proceed without (user decides)
Outputs:
- Research findings (.analysis/ANALYSIS_FINAL.md)
- Patterns and best practices
- Implementation recommendations
Time: 30-60 minutes (can skip if familiar domain)
Next: Proceed to Stage 2
Stage 2: Planning
Purpose: Create agent-executable plan with quality ≥90/100
Process:
1. Load Research Context (if research done):

```js
let context = "";
if (researchDone) {
  context = await readFile('.analysis/ANALYSIS_FINAL.md');
}
```
2. Invoke Planning:

```
Use multi-ai-planning to create plan for [objective]
${context ? `Research findings available in: .analysis/ANALYSIS_FINAL.md` : ''}
Create comprehensive plan following 6-step workflow.
```

What It Does:
- Analyzes objective
- Hierarchical decomposition (8-15 tasks)
- Maps dependencies, identifies parallelizable tasks
- Plans verification for all tasks
- Scores quality (0-100)
3. Quality Gate: Plan Approved:

✅ Plan created
✅ Quality score ≥90/100
✅ All tasks have verification
✅ Dependencies mapped
✅ No circular dependencies (see the cycle-check sketch at the end of this stage)

If Fail (score <90):
- Review gap analysis
- Apply recommended fixes
- Re-verify
- Retry up to 2 times
- If still <90: Escalate to human review
4. Save Plan to Shared State:

```bash
# Save for next stage
cp plans/[plan-id]/plan.json .multi-ai-context/plan.json
```
Outputs:
- plan.json (machine-readable)
- PLAN.md (human-readable)
- COORDINATION.md (execution guide)
- Quality ≥90/100
Time: 1.5-3 hours
Next: Proceed to Stage 3
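The "no circular dependencies" check in the planning gate can be automated with a simple graph traversal. Below is a minimal sketch, assuming each task in plan.json carries `id` and `depends_on` fields; those field names are an assumption, not confirmed by the plan schema:

```js
// Detect dependency cycles in plan.json via depth-first search.
// Assumes tasks look like { id, depends_on: [ids...] } — field names are illustrative.
const fs = require('fs');
const plan = JSON.parse(fs.readFileSync('.multi-ai-context/plan.json', 'utf8'));

function findCycle(tasks) {
  const byId = new Map(tasks.map(t => [t.id, t]));
  const state = new Map(); // unvisited → 'visiting' → 'done'

  function visit(id, path) {
    if (state.get(id) === 'done') return null;
    if (state.get(id) === 'visiting') return [...path, id]; // back edge → cycle
    state.set(id, 'visiting');
    for (const dep of byId.get(id)?.depends_on ?? []) {
      const cycle = visit(dep, [...path, id]);
      if (cycle) return cycle;
    }
    state.set(id, 'done');
    return null;
  }

  for (const t of tasks) {
    const cycle = visit(t.id, []);
    if (cycle) return cycle;
  }
  return null;
}

const cycle = findCycle(plan.tasks);
if (cycle) throw new Error(`Circular dependency detected: ${cycle.join(' → ')}`);
console.log('✅ No circular dependencies');
```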
Stage 3: Implementation
Purpose: Execute plan with TDD, produce working code
Process:
1. Load Plan:

```js
const plan = JSON.parse(readFile('.multi-ai-context/plan.json'));
console.log(`📋 Loaded plan: ${plan.objective}`);
console.log(`   Tasks: ${plan.tasks.length}`);
console.log(`   Estimated: ${plan.metadata.estimated_total_hours} hours`);
```
2. Invoke Implementation:

```
Use multi-ai-implementation following plan in .multi-ai-context/plan.json

Execute all 6 steps:
1. Explore & gather context
2. Plan architecture (plan already created, refine as needed)
3. Implement incrementally with TDD
4. Coordinate multi-agent (if parallel tasks)
5. Integration & E2E testing
6. Quality verification before commit

Success criteria from plan.
```

What It Does:
- Explores codebase (progressive disclosure)
- Implements incrementally (<200 lines per commit)
- Test-driven development (tests first)
- Multi-agent coordination for parallel tasks
- Continuous testing during implementation
- Doom loop prevention (max 3 retries)
3. Quality Gate: Implementation Complete:

✅ All plan tasks implemented
✅ All tests passing
✅ Coverage ≥80% (gate), ideally ≥95%
✅ No regressions
✅ Doom loop avoided (< max retries)

If Fail:
- Identify failing task
- Retry with different approach
- If 3 failures: Escalate to human
- Save state for recovery (see the checkpoint sketch at the end of this stage)
4. Save Implementation State:

```bash
# Save for next stage
echo '{
  "status": "implemented",
  "files_changed": [...],
  "tests_run": 95,
  "tests_passed": 95,
  "coverage": 87,
  "commits": ["abc123", "def456"]
}' > .multi-ai-context/implementation-status.json
```
Outputs:
- Working code
- Tests passing
- Coverage ≥80%
- Commits created
Time: 3-10 hours (varies by complexity)
Next: Proceed to Stage 4
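One way to provide the "save state for recovery" step (and the checkpoint-NNN names the escalation example uses later) is to tag the working tree before each risky retry. A minimal sketch using git, with illustrative helper names:

```js
// Checkpoint helper sketch: snapshot the working tree before retries so
// failure recovery can roll back. Tag names follow the checkpoint-NNN style
// used in the escalation example; the helper API itself is illustrative.
const { execSync } = require('child_process');

function createCheckpoint(n) {
  const tag = `checkpoint-${String(n).padStart(3, '0')}`;
  execSync('git add -A');
  execSync(`git commit -m "chore: ${tag} (pre-retry snapshot)" --allow-empty`);
  execSync(`git tag ${tag}`);
  return tag;
}

function rollbackTo(tag) {
  // Discard the failed attempt and restore files from the last known-good tag
  execSync(`git checkout ${tag} -- .`);
}
```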
Stage 4: Testing (Independent Verification)
Purpose: Verify tests are comprehensive and prevent gaming
Process:
1. Load Implementation Context:

```js
const implStatus = JSON.parse(
  readFile('.multi-ai-context/implementation-status.json')
);
console.log(`🧪 Testing implementation:`);
console.log(`   Files changed: ${implStatus.files_changed.length}`);
console.log(`   Current coverage: ${implStatus.coverage}%`);
```
2. Invoke Independent Testing:

```
Use multi-ai-testing independent verification workflow

Verify:
- Tests in: tests/
- Code in: src/
- Specifications in: .multi-ai-context/plan.json

Workflows to execute:
1. Test quality verification (independent agent)
2. Coverage validation (≥95% target)
3. Edge case discovery (AI-powered)
4. Multi-agent ensemble scoring (if critical feature)

Score test quality (0-100).
```

What It Does:
- Independent verification (separate agent from impl)
- Checks tests match specifications (not just what code does)
- Generates additional edge case tests
- Multi-agent ensemble for quality scoring
- Prevents overfitting
3. Quality Gate: Testing Verified:

✅ Test quality score ≥90/100
✅ Coverage ≥95% (target achieved)
✅ Independent verification passed
✅ No test gaming detected
✅ Edge cases covered

If Fail:
- Review test quality issues
- Generate additional tests
- Re-verify
- Max 2 retries, then escalate
4. Save Testing State:

```bash
echo '{
  "status": "tested",
  "test_quality_score": 92,
  "coverage": 96,
  "tests_total": 112,
  "edge_cases": 23,
  "gaming_detected": false
}' > .multi-ai-context/testing-status.json
```
Outputs:
- Test quality ≥90/100
- Coverage ≥95%
- Independent verification passed
Time: 1-3 hours
Next: Proceed to Stage 5
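The Stage 4 gate can be evaluated mechanically from the state file written in step 4. A minimal sketch using the fields from the testing-status.json example above:

```js
// Evaluate the Stage 4 quality gate from the saved testing state.
// Field names match the testing-status.json example above.
const fs = require('fs');
const t = JSON.parse(fs.readFileSync('.multi-ai-context/testing-status.json', 'utf8'));

const failures = [];
if (t.test_quality_score < 90) failures.push(`test quality ${t.test_quality_score} < 90`);
if (t.coverage < 95)           failures.push(`coverage ${t.coverage}% < 95%`);
if (t.gaming_detected)         failures.push('test gaming detected');

if (failures.length > 0) {
  // Gate failed → retry (max 2) or escalate with the collected reasons
  console.error(`❌ Testing gate failed: ${failures.join('; ')}`);
  process.exit(1);
}
console.log('✅ Testing gate passed — proceed to Stage 5');
```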
Stage 5: Verification (Multi-Layer QA)
Purpose: Final quality assurance before production
Process:
1. Load All Context:

```js
const plan = JSON.parse(readFile('.multi-ai-context/plan.json'));
const implStatus = JSON.parse(readFile('.multi-ai-context/implementation-status.json'));
const testStatus = JSON.parse(readFile('.multi-ai-context/testing-status.json'));
console.log(`🔍 Final verification:`);
console.log(`   Objective: ${plan.objective}`);
console.log(`   Implementation: ${implStatus.status}`);
console.log(`   Testing: ${testStatus.coverage}% coverage`);
```
2. Invoke Multi-Layer Verification:

```
Use multi-ai-verification for complete quality check

Verify:
- Code: src/
- Tests: tests/
- Plan: .multi-ai-context/plan.json

Execute all 5 layers:
1. Rules-based (linting, types, schema, SAST)
2. Functional (tests, coverage, examples)
3. Visual (if UI: screenshots, a11y)
4. Integration (E2E, API compatibility)
5. Quality scoring (LLM-as-judge, 0-100)

All 5 quality gates must pass.
```

What It Does:
- Runs all 5 verification layers
- Each layer is independent
- LLM-as-judge for holistic assessment
- Agent-as-a-Judge can execute tools to verify claims
- Multi-agent ensemble for critical features
3. Quality Gate: Production Ready:

✅ Layer 1 (Rules): PASS
✅ Layer 2 (Functional): PASS, coverage 96%
✅ Layer 3 (Visual): PASS or SKIPPED
✅ Layer 4 (Integration): PASS
✅ Layer 5 (Quality): 92/100 ≥90
✅ ALL GATES PASSED → PRODUCTION APPROVED

If Fail:
- Review gap analysis from failed layer
- Apply recommended fixes
- Re-verify from failed layer (not all 5)
- Max 2 retries per layer
- If still failing: Escalate to human
4. Generate Final Report:

```markdown
# Feature Implementation Complete

**Objective**: [from plan]

## Pipeline Execution Summary

### Stage 1: Research
- Status: ✅ Complete
- Quality: 97/100
- Time: 52 minutes

### Stage 2: Planning
- Status: ✅ Complete
- Quality: 94/100
- Tasks: 23
- Time: 1.8 hours

### Stage 3: Implementation
- Status: ✅ Complete
- Files changed: 15
- Lines added: 847
- Commits: 12
- Time: 6.2 hours

### Stage 4: Testing
- Status: ✅ Complete
- Test quality: 92/100
- Coverage: 96%
- Tests: 112
- Time: 1.5 hours

### Stage 5: Verification
- Status: ✅ Complete
- Quality score: 92/100
- All layers: PASS
- Time: 1.2 hours

## Final Metrics
- **Total Time**: 11.3 hours
- **Quality**: 92/100
- **Coverage**: 96%
- **Status**: ✅ PRODUCTION READY

## Commits
- abc123: feat: Add database schema
- def456: feat: Implement OAuth integration
- [... 10 more ...]

## Next Steps
- Create PR for team review
- Deploy to staging
- Production release
```
5. Save to Memory (if agent-memory-system available):

```js
await storeMemory({
  type: 'episodic',
  event: {
    description: `Complete implementation: ${objective}`,
    outcomes: {
      total_time: 11.3,
      quality_score: 92,
      test_coverage: 96,
      stages_completed: 5
    },
    learnings: extractedDuringPipeline
  }
});
```
Outputs:
- Production-ready code
- Comprehensive final report
- Commits created
- PR ready (if requested)
- Memory saved for future learning
Time: 30-90 minutes
Result: ✅ PRODUCTION READY
Failure Recovery
Failure Handling at Each Stage
Stage Fails → Recovery Strategy:
Research Fails:
- Retry with different sources
- Skip research (use memory if available)
- Escalate to human if critical gap
Planning Fails (score <90):
- Review gap analysis
- Apply fixes automatically if possible
- Retry planning (max 2 attempts)
- Escalate if still <90
Implementation Fails:
- Identify failing task
- Automatic rollback to last checkpoint
- Retry with alternative approach
- Doom loop prevention (max 3 retries)
- Escalate with full error context
Testing Fails (coverage <80% or quality <90):
- Generate additional tests for gaps
- Retry verification
- Max 2 retries
- Escalate with coverage report
Verification Fails (score <90 or layer fails):
- Apply auto-fixes for Layer 1-2 issues
- Manual fixes needed for Layer 3-5
- Re-verify from failed layer (not all 5)
- Max 2 retries per layer
- Escalate with quality report
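All five strategies share the same shape: run the stage, check its gate, apply fixes, retry up to a cap, then escalate. A minimal sketch of that wrapper, with illustrative names (`runStage` and `onFail` are stand-ins for the real skill invocations; the retry caps come from the strategies above):

```js
// Generic retry-then-escalate wrapper sketch around a stage's gate check.
async function withRecovery(stageName, runStage, { maxRetries, onFail }) {
  const history = [];
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const result = await runStage(attempt);
    if (result.gatePassed) return result;
    history.push({ attempt, error: result.error });
    await onFail(result); // e.g. apply gap-analysis fixes, roll back to checkpoint
  }
  // Doom loop: the same stage failed maxRetries times → human review
  throw new EscalationError(stageName, history);
}

class EscalationError extends Error {
  constructor(stage, history) {
    super(`ESCALATION REQUIRED: ${stage} failed ${history.length} times`);
    this.stage = stage;
    this.history = history; // feeds the escalation report format below
  }
}
```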
Escalation Protocol
When to Escalate to Human:
- Any stage fails 3 times (doom loop)
- Planning quality <80 after 2 retries
- Implementation doom loop detected
- Verification score <80 after 2 retries
- Budget exceeded (if cost tracking enabled)
- Circular dependency detected
- Irrecoverable error (file system, permissions)
Escalation Format:
```markdown
# ⚠️ ESCALATION REQUIRED

**Stage**: Implementation (Stage 3)
**Failure**: Doom loop detected (3 failed attempts)

## Context
- Objective: Implement user authentication
- Failing Task: 2.2.2 Token generation
- Error: Tests fail with "undefined userId" repeatedly

## Attempts Made
1. Attempt 1: Added userId to payload → Same error
2. Attempt 2: Changed payload structure → Same error
3. Attempt 3: Different JWT library → Same error

## Root Cause Analysis
- Tests expect `user.id` but implementation uses `user.userId`
- Mismatch in data model between test and implementation
- Auto-fix failed 3 times

## Recommended Actions
1. Review test specifications vs. implementation
2. Align data model (user.id vs. user.userId)
3. Manual intervention required

## State Saved
- Checkpoint: checkpoint-003 (before attempts)
- Rollback available: `git checkout checkpoint-003`
- Continue after fix: Resume from Task 2.2.2
```
Parallel Execution Optimization
Identifying Parallel Opportunities
From Plan:
```js
const plan = JSON.parse(readFile('.multi-ai-context/plan.json'));

// Plan identifies parallel groups
const parallelGroups = plan.parallel_groups;
// Example:
//   Group 1: Tasks 2.1, 2.2, 2.3 (independent)
//   Can execute in parallel
```
Executing Parallel Tasks
Pattern:
```js
// Stage 3: Implementation with parallel tasks
const parallelGroup = plan.parallel_groups.find(g => g.group_id === 'pg2');

// Spawn parallel implementation agents
// (spawnAgent is a stand-in for the sub-agent spawning call)
const results = await Promise.all(
  parallelGroup.tasks.map(taskId => {
    const task = plan.tasks.find(t => t.id === taskId);
    return spawnAgent({
      description: `Implement ${task.description}`,
      prompt: `Implement task ${task.id}: ${task.description}

Specifications from plan:
${JSON.stringify(task, null, 2)}

Success criteria:
${task.verification.success_criteria.join('\n')}

Write implementation and tests. Report completion status.`
    });
  })
);

// Verify all parallel tasks completed
const allSucceeded = results.every(r => r.status === 'complete');
if (allSucceeded) {
  // Proceed to integration
} else {
  // Handle failures
}
```
Time Savings: 20-40% faster than sequential execution
State Management
Cross-Skill State Sharing
Shared Context Directory:
.multi-ai-context/
Standard Files:
```
.multi-ai-context/
├── research-findings.json       # From multi-ai-research
├── plan.json                    # From multi-ai-planning
├── implementation-status.json   # From multi-ai-implementation
├── testing-status.json          # From multi-ai-testing
├── verification-report.json     # From multi-ai-verification
├── pipeline-state.json          # Orchestrator state
└── failure-history.json         # For doom loop detection
```
Benefits:
- Skills don't duplicate work
- Later stages read earlier outputs
- Failure recovery knows full state
- Memory can be saved from shared state
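A thin helper around the shared directory keeps this convention consistent across skills. In the sketch below, the file names match the standard files listed above, while the helper API itself is illustrative:

```js
// Tiny shared-state helper sketch for the .multi-ai-context/ convention.
const fs = require('fs');
const path = require('path');

const CONTEXT_DIR = '.multi-ai-context';

function saveState(file, data) {
  fs.mkdirSync(CONTEXT_DIR, { recursive: true });
  fs.writeFileSync(path.join(CONTEXT_DIR, file), JSON.stringify(data, null, 2));
}

function loadState(file) {
  const p = path.join(CONTEXT_DIR, file);
  return fs.existsSync(p) ? JSON.parse(fs.readFileSync(p, 'utf8')) : null;
}

// e.g. Stage 4 reads what Stage 3 wrote, so no work is duplicated:
const implStatus = loadState('implementation-status.json');
```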
Progress Tracking
Real-Time Progress:
{ "pipeline_id": "pipeline_20250126_1200", "objective": "Implement user authentication", "started_at": "2025-01-26T12:00:00Z", "current_stage": 3, "stages": [ { "stage": 1, "name": "Research", "status": "complete", "duration_minutes": 52, "quality": 97 }, { "stage": 2, "name": "Planning", "status": "complete", "duration_minutes": 108, "quality": 94 }, { "stage": 3, "name": "Implementation", "status": "in_progress", "started_at": "2025-01-26T13:48:00Z", "tasks_total": 23, "tasks_complete": 15, "tasks_remaining": 8, "percent_complete": 65 }, { "stage": 4, "name": "Testing", "status": "pending" }, { "stage": 5, "name": "Verification", "status": "pending" } ], "estimated_completion": "2025-01-26T20:00:00Z", "quality_target": 90, "current_quality_estimate": 92 }
Query Progress:
```bash
# Check current status
cat .multi-ai-context/pipeline-state.json | jq '.current_stage, .stages[2].percent_complete'
# Output: 3, then 65 → Stage 3, 65% complete
```
Workflow Modes
Standard Mode (Full Pipeline)
All 5 Stages:
Research → Planning → Implementation → Testing → Verification
Time: 8-20 hours
Quality: Maximum (all gates, ≥90)
Use For: Production features, complex implementations
Fast Mode (Skip Research)
4 Stages (familiar domains):
Planning → Implementation → Testing → Verification
Time: 6-15 hours
Quality: High (all gates except research)
Use For: Familiar domains, time-sensitive features
Quick Mode (Essential Gates Only)
Implementation + Basic Verification:
Planning → Implementation → Testing (basic) → Verification (Layers 1-2 only)
Time: 3-8 hours
Quality: Good (essential gates only)
Use For: Internal tools, prototypes
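The three modes differ only in which stages run and which gates are enforced, so mode selection can be a small lookup table. A hypothetical sketch (the config shape is assumed, not part of the skill's documented interface):

```js
// Hypothetical workflow-mode table: each mode selects stages and the gates enforced.
// Stage numbers follow the 5-stage pipeline; the shape is illustrative.
const MODES = {
  standard: { stages: [1, 2, 3, 4, 5], gates: 'all' },              // production features
  fast:     { stages: [2, 3, 4, 5],    gates: 'all-but-research' }, // familiar domains
  quick:    { stages: [2, 3, 4, 5],    gates: 'layers-1-2-only' },  // internal tools
};

function stagesFor(mode) {
  const m = MODES[mode];
  if (!m) throw new Error(`Unknown mode: ${mode} (expected standard|fast|quick)`);
  return m;
}

console.log(stagesFor('fast')); // → { stages: [2, 3, 4, 5], gates: 'all-but-research' }
```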
Best Practices
1. Always Run Planning Stage
Even for "simple" features - planning quality ≥90 prevents issues
2. Use Memory to Skip Research
If similar work done before, recall patterns instead of researching
3. Monitor Progress
Check .multi-ai-context/pipeline-state.json to track progress
4. Trust the Quality Gates
If gate fails, there's a real issue - don't skip fixes
5. Save State Frequently
Each stage completion saves state (enables recovery)
6. Review Final Report
Complete understanding of what was built and quality achieved
Integration Points
With All 5 Multi-AI Skills
Coordinates:
- multi-ai-research (Stage 1)
- multi-ai-planning (Stage 2)
- multi-ai-implementation (Stage 3)
- multi-ai-testing (Stage 4)
- multi-ai-verification (Stage 5)
Provides:
- Automatic skill invocation
- Quality gate enforcement
- Failure recovery
- State management
- Progress tracking
- Final reporting
With agent-memory-system
Before Pipeline:
- Recall similar past work
- Load learned patterns
- Skip research if memory sufficient
After Pipeline:
- Save complete episode to memory
- Extract learnings
- Update procedural patterns
- Improve estimation accuracy
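Putting the before/after halves together, the pipeline can be wrapped with the `recallMemory`/`storeMemory` calls already shown in Stages 1 and 5. A minimal sketch; the wrapper itself is illustrative:

```js
// Sketch of wrapping the pipeline with agent-memory-system
// (recallMemory/storeMemory as used in Stages 1 and 5).
async function runWithMemory(objective, runPipeline) {
  // Before: recall similar past work to decide whether research can be skipped
  const priorEpisodes = await recallMemory({ type: 'episodic', query: objective });
  const skipResearch = priorEpisodes.length > 0;

  const outcome = await runPipeline(objective, { skipResearch });

  // After: save the complete episode so future runs estimate and skip better
  await storeMemory({
    type: 'episodic',
    event: {
      description: `Complete implementation: ${objective}`,
      outcomes: outcome.metrics,
      learnings: outcome.learnings
    }
  });
  return outcome;
}
```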
With hooks-manager
Session Hooks:
- SessionStart: Load pipeline state
- SessionEnd: Save pipeline progress
- PostToolUse: Track stage completions
Notification Hooks:
- Send telemetry on stage completions
- Alert on gate failures
- Track quality scores
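For reference, hook wiring of this kind might look roughly like the following in a Claude Code settings file; treat the commands and structure as a hypothetical sketch rather than hooks-manager's actual configuration:

```json
{
  "hooks": {
    "SessionStart": [
      { "hooks": [{ "type": "command", "command": "cat .multi-ai-context/pipeline-state.json" }] }
    ],
    "SessionEnd": [
      { "hooks": [{ "type": "command", "command": "cp .multi-ai-context/pipeline-state.json .multi-ai-context/pipeline-state.bak.json" }] }
    ],
    "PostToolUse": [
      { "matcher": "Write|Edit", "hooks": [{ "type": "command", "command": "echo 'stage activity' >> .multi-ai-context/activity.log" }] }
    ]
  }
}
```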
Quick Reference
The 5-Stage Pipeline
| Stage | Skill | Time | Quality Gate | Output |
|---|---|---|---|---|
| 1 | multi-ai-research | 30-60m | ≥95/100 | Research findings |
| 2 | multi-ai-planning | 1.5-3h | ≥90/100 | Executable plan |
| 3 | multi-ai-implementation | 3-10h | Tests pass, ≥80% cov | Working code |
| 4 | multi-ai-testing | 1-3h | ≥95% cov, quality ≥90 | Verified tests |
| 5 | multi-ai-verification | 1-3h | ≥90/100, all layers | Production ready |
Total: 8-20 hours → Production-ready feature
Workflow Modes
| Mode | Stages | Time | Quality | Use For |
|---|---|---|---|---|
| Standard | All 5 | 8-20h | Maximum | Production features |
| Fast | 2-5 (skip research) | 6-15h | High | Familiar domains |
| Quick | 2-5 (basic gates) | 3-8h | Good | Internal tools |
Quality Gates
- Research: ≥95/100, patterns identified
- Planning: ≥90/100, all tasks verifiable
- Implementation: Tests pass, coverage ≥80%
- Testing: Quality ≥90/100, coverage ≥95%
- Verification: ≥90/100, all 5 layers pass
end-to-end-orchestrator provides complete automation from feature description to production-ready code, coordinating all 5 multi-ai skills with quality gates, failure recovery, and state management - delivering enterprise-grade development workflows in a single command.
For examples, see examples/. For failure recovery, see Failure Recovery section.