Claude-skill-registry Enterprise Orchestration
Advanced multi-agent coordination at scale for complex organizations
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/enterprise-orchestration" ~/.claude/skills/majiayu000-claude-skill-registry-enterprise-orchestration && rm -rf "$T"
manifest: skills/data/enterprise-orchestration/SKILL.md
Enterprise Orchestration
Coordinate AI teams at enterprise scale with reliability and governance
Enterprise Orchestration provides the patterns, protocols, and infrastructure for running multiple AI agent teams across a large organization. It goes beyond basic orchestration to address concerns specific to enterprises: governance, compliance, scale, and cross-team coordination.
Enterprise Challenges
Why Enterprise Is Different
Scale Challenges:
  - Multiple teams running AI agents simultaneously
  - Hundreds of tasks per day
  - Cross-team dependencies
  - Resource contention

Governance Challenges:
  - Audit requirements
  - Compliance constraints
  - Access control
  - Decision accountability

Coordination Challenges:
  - Conflicting priorities
  - Shared resources
  - Handoffs between teams
  - Consistent standards

Quality Challenges:
  - Maintaining standards at scale
  - Preventing drift
  - Learning across teams
  - Continuous improvement
Architecture
Multi-Level Orchestration
                         ┌─────────────────────────┐
                         │  ENTERPRISE ORCHESTRA   │
                         │      (Governance)       │
                         └───────────┬─────────────┘
                                     │
         ┌───────────────────────────┼───────────────────────────┐
         │                           │                           │
         ▼                           ▼                           ▼
 ┌───────────────┐           ┌───────────────┐           ┌───────────────┐
 │    DOMAIN     │           │    DOMAIN     │           │    DOMAIN     │
 │ ORCHESTRATOR  │           │ ORCHESTRATOR  │           │ ORCHESTRATOR  │
 │   (Product)   │           │  (Platform)   │           │ (Operations)  │
 └───────┬───────┘           └───────┬───────┘           └───────┬───────┘
         │                           │                           │
   ┌─────┼─────┐               ┌─────┼─────┐               ┌─────┼─────┐
   │     │     │               │     │     │               │     │     │
   ▼     ▼     ▼               ▼     ▼     ▼               ▼     ▼     ▼
 ┌───┐ ┌───┐ ┌───┐           ┌───┐ ┌───┐ ┌───┐           ┌───┐ ┌───┐ ┌───┐
 │ A │ │ A │ │ A │           │ A │ │ A │ │ A │           │ A │ │ A │ │ A │
 │ 1 │ │ 2 │ │ 3 │           │ 1 │ │ 2 │ │ 3 │           │ 1 │ │ 2 │ │ 3 │
 └───┘ └───┘ └───┘           └───┘ └───┘ └───┘           └───┘ └───┘ └───┘
Layer Responsibilities
Enterprise Orchestra:
  - Cross-domain coordination
  - Resource allocation
  - Policy enforcement
  - Compliance monitoring
  - Executive reporting

Domain Orchestrators:
  - Domain-specific coordination
  - Team management
  - Priority arbitration
  - Quality assurance
  - Domain expertise

Individual Agents:
  - Task execution
  - Specialist work
  - Status reporting
  - Policy compliance
Governance Framework
Decision Authority Matrix
Decision Authority:
  Agent Level:
    Can decide:
      - Implementation details
      - Tool selection (approved list)
      - Tactical approaches
    Must escalate:
      - Scope changes
      - External communication
      - Resource requests

  Domain Orchestrator:
    Can decide:
      - Task prioritization
      - Team composition
      - Quality trade-offs
    Must escalate:
      - Budget allocation
      - Cross-domain conflicts
      - Policy exceptions

  Enterprise Orchestra:
    Can decide:
      - Resource allocation
      - Priority conflicts
      - Policy enforcement
    Must escalate:
      - Strategic changes
      - Compliance issues
      - Major incidents
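The matrix above can be read as a lookup table. The following is a minimal sketch of that idea, not part of the skill itself: the `Level` enum, the `route_decision` function, and the lowercase decision labels are hypothetical names chosen for illustration, and the rule that unlisted decisions default to escalation is an assumption rather than something the matrix states.

```python
from enum import Enum


class Level(Enum):
    AGENT = "agent"
    DOMAIN = "domain"
    ENTERPRISE = "enterprise"


# Decisions each level may make on its own (copied from the matrix).
CAN_DECIDE = {
    Level.AGENT: {"implementation details", "tool selection", "tactical approaches"},
    Level.DOMAIN: {"task prioritization", "team composition", "quality trade-offs"},
    Level.ENTERPRISE: {"resource allocation", "priority conflicts", "policy enforcement"},
}

# Decisions each level must hand upward (copied from the matrix).
MUST_ESCALATE = {
    Level.AGENT: {"scope changes", "external communication", "resource requests"},
    Level.DOMAIN: {"budget allocation", "cross-domain conflicts", "policy exceptions"},
    Level.ENTERPRISE: {"strategic changes", "compliance issues", "major incidents"},
}


def route_decision(level: Level, decision: str) -> str:
    """Return "decide" if the level owns the decision, otherwise "escalate"."""
    if decision in CAN_DECIDE[level]:
        return "decide"
    # Assumption: anything not explicitly owned is escalated, whether it is
    # listed under "Must escalate" or missing from the matrix entirely.
    return "escalate"
```

Defaulting to escalation keeps the sketch conservative: a decision type that nobody anticipated goes up a level instead of being made silently.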
Policy Enforcement
Policy Framework:
  Access Control:
    - Role-based permissions
    - Data classification
    - Action restrictions
    - Audit logging

  Quality Standards:
    - Code review requirements
    - Testing thresholds
    - Documentation standards
    - Security checks

  Communication Rules:
    - External communication approval
    - Sensitive data handling
    - Escalation protocols
    - Incident reporting

  Resource Limits:
    - Compute quotas
    - API rate limits
    - Storage allocation
    - Time boundaries
Audit Trail
Audit Requirements:
  For Every Decision:
    - Who made it (agent ID)
    - When it was made (timestamp)
    - What was decided (content)
    - Why it was decided (reasoning)
    - What was the outcome (result)

Audit Log Schema:
  {
    "id": "audit-uuid",
    "timestamp": "ISO-8601",
    "agent_id": "string",
    "action_type": "decision|execution|escalation",
    "domain": "product|platform|operations",
    "summary": "brief description",
    "details": {
      "context": "what led to this",
      "options_considered": ["option1", "option2"],
      "decision": "what was decided",
      "reasoning": "why this choice",
      "outcome": "what happened"
    },
    "classification": "public|internal|sensitive",
    "related_tasks": ["task-id-1", "task-id-2"]
  }

Retention:
  - Standard decisions: 90 days
  - Significant decisions: 1 year
  - Compliance-relevant: 7 years
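As an illustration of the schema and retention rules above, the sketch below builds one audit record and computes when it may be purged. The function names `make_audit_record` and `retention_deadline` are hypothetical, the `id` is assumed to be a random UUID, and 1 year and 7 years are approximated as 365 and 7 × 365 days.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Allowed values, taken from the schema above.
ACTION_TYPES = {"decision", "execution", "escalation"}
DOMAINS = {"product", "platform", "operations"}
CLASSIFICATIONS = {"public", "internal", "sensitive"}

# Retention periods from the text; years are approximated in days.
RETENTION = {
    "standard": timedelta(days=90),
    "significant": timedelta(days=365),
    "compliance": timedelta(days=7 * 365),
}


def make_audit_record(agent_id, action_type, domain, summary, details,
                      classification="internal", related_tasks=None):
    """Build a dict that follows the audit log schema, validating enum fields."""
    if action_type not in ACTION_TYPES:
        raise ValueError(f"unknown action_type: {action_type}")
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain}")
    if classification not in CLASSIFICATIONS:
        raise ValueError(f"unknown classification: {classification}")
    return {
        "id": str(uuid.uuid4()),  # assumption: a random UUID as the record id
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO-8601, UTC
        "agent_id": agent_id,
        "action_type": action_type,
        "domain": domain,
        "summary": summary,
        "details": details,
        "classification": classification,
        "related_tasks": list(related_tasks or []),
    }


def retention_deadline(record, significance):
    """Return the datetime after which the record may be deleted."""
    created = datetime.fromisoformat(record["timestamp"])
    return created + RETENTION[significance]
```

Validating the enumerated fields at write time keeps malformed entries out of the log, which matters more for audit data than for ordinary application logs.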
Cross-Team Coordination
Dependency Management
Dependency Types:
  Blocking Dependencies:
    - Must complete before next task
    - Requires explicit handoff
    - Has defined interface

  Informational Dependencies:
    - Would benefit from knowledge
    - Non-blocking if unavailable
    - Best effort communication

  Resource Dependencies:
    - Shared resource required
    - Requires scheduling
    - Has contention potential

Dependency Protocol:
  1. Register dependency in system
  2. Notify dependent team
  3. Track progress against dependency
  4. Alert on risk/delay
  5. Facilitate resolution
  6. Confirm completion
Handoff Protocol
Cross-Team Handoff:
  Pre-Handoff:
    - Notify receiving team
    - Prepare handoff package
    - Schedule handoff meeting
    - Verify prerequisites

  Handoff Package:
    - Task context and history
    - Current state
    - Outstanding issues
    - Key decisions made
    - Contacts for questions

  Handoff Meeting:
    - Walk through context
    - Clarify questions
    - Confirm understanding
    - Agree on expectations
    - Document handoff

  Post-Handoff:
    - Receiving team takes ownership
    - Handing team available for questions
    - Progress tracked in system
    - Escalation path defined
Conflict Resolution
Conflict Types:
  Priority Conflicts:
    - Multiple teams need same resource
    - Competing deadlines
    - Different urgency assessments

  Scope Conflicts:
    - Unclear ownership
    - Overlapping responsibilities
    - Different interpretations

  Technical Conflicts:
    - Different approaches
    - Incompatible decisions
    - Standards disagreements

Resolution Process:
  1. Identify conflict clearly
  2. Gather perspectives from all parties
  3. Identify underlying interests
  4. Explore options together
  5. Escalate if unresolved
  6. Document resolution
Scale Operations
Workload Distribution
Distribution Strategy:
  Task Assignment:
    - Match task to best-fit agent
    - Consider current load
    - Respect domain boundaries
    - Balance quality and speed

  Load Balancing:
    - Monitor agent utilization
    - Redistribute on overload
    - Maintain specialization
    - Avoid context switching

  Capacity Planning:
    - Track historical demand
    - Forecast future needs
    - Identify bottlenecks
    - Plan scaling actions
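One way to combine the task-assignment rules above (best fit, current load, domain boundaries) is a filter followed by a least-utilized pick. This is a sketch under stated assumptions: the `Agent` fields, the single `required_skill` matching rule, and the `assign_task` name are all hypothetical, and real best-fit scoring would likely weigh more than one skill.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Agent:
    name: str
    domain: str
    skills: frozenset
    active_tasks: int
    capacity: int  # assumed to be greater than zero


def assign_task(task_domain: str, required_skill: str,
                agents: Sequence[Agent]) -> Optional[Agent]:
    """Pick the least-utilized eligible agent, or None if nobody qualifies."""
    eligible = [
        a for a in agents
        if a.domain == task_domain            # respect domain boundaries
        and required_skill in a.skills        # match task to a fitting agent
        and a.active_tasks < a.capacity       # skip agents that are already full
    ]
    if not eligible:
        return None
    # Consider current load: lowest utilization ratio wins.
    return min(eligible, key=lambda a: a.active_tasks / a.capacity)
```

Returning `None` instead of falling back to another domain leaves the cross-domain decision to the orchestrator, in line with the escalation rules described earlier.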
Performance Monitoring
Monitoring Dimensions:
  Throughput:
    - Tasks completed per hour
    - By agent, team, domain
    - Trend analysis

  Quality:
    - Error rates
    - Revision rates
    - Customer satisfaction
    - Standard compliance

  Latency:
    - Time to completion
    - Queue wait times
    - Handoff delays
    - Escalation times

  Resource Utilization:
    - Agent utilization %
    - API usage
    - Compute consumption
    - Cost per task

Alerting:
  - Error rate > threshold: Page
  - Queue depth > threshold: Warn
  - Latency > SLA: Escalate
  - Utilization > 90%: Plan scaling
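The four alerting rules above map directly onto threshold comparisons. The sketch below assumes a hypothetical metrics dictionary with the keys shown in the code; only the 90% utilization limit comes from the text, and the other thresholds are supplied by the caller.

```python
UTILIZATION_LIMIT = 0.90  # "Utilization > 90%" from the alerting rules


def evaluate_alerts(metrics: dict, thresholds: dict) -> list:
    """Return a list of (metric, action) pairs for every rule that fires."""
    alerts = []
    if metrics["error_rate"] > thresholds["error_rate"]:
        alerts.append(("error_rate", "page"))
    if metrics["queue_depth"] > thresholds["queue_depth"]:
        alerts.append(("queue_depth", "warn"))
    if metrics["latency_seconds"] > thresholds["latency_sla_seconds"]:
        alerts.append(("latency", "escalate"))
    if metrics["utilization"] > UTILIZATION_LIMIT:
        alerts.append(("utilization", "plan_scaling"))
    return alerts
```

A usage example: `evaluate_alerts({"error_rate": 0.08, "queue_depth": 10, "latency_seconds": 2.0, "utilization": 0.95}, {"error_rate": 0.05, "queue_depth": 50, "latency_sla_seconds": 5.0})` fires the error-rate and utilization rules only.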
Incident Management
Incident Severity:
  SEV-1 (Critical):
    - Enterprise-wide impact
    - Major business function blocked
    - Response: All hands, immediate
    - Resolution target: 1 hour

  SEV-2 (High):
    - Domain-wide impact
    - Significant degradation
    - Response: Domain team, priority
    - Resolution target: 4 hours

  SEV-3 (Medium):
    - Team-level impact
    - Workaround available
    - Response: Team, elevated
    - Resolution target: 24 hours

  SEV-4 (Low):
    - Individual impact
    - Minimal business effect
    - Response: Normal queue
    - Resolution target: 1 week

Incident Protocol:
  1. Detect and classify
  2. Assemble response team
  3. Communicate status
  4. Investigate and mitigate
  5. Resolve and verify
  6. Post-mortem and learn
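The "detect and classify" step can be sketched as a mapping from impact scope to severity and resolution deadline. The scope labels and the `classify_incident` function are hypothetical; classifying on scope alone is a simplifying assumption, since the levels above also consider business effect and available workarounds.

```python
from datetime import datetime, timedelta

# Impact scope -> severity, following the severity table.
SEVERITY_BY_SCOPE = {
    "enterprise": "SEV-1",
    "domain": "SEV-2",
    "team": "SEV-3",
    "individual": "SEV-4",
}

# Resolution targets from the severity table.
RESOLUTION_TARGET = {
    "SEV-1": timedelta(hours=1),
    "SEV-2": timedelta(hours=4),
    "SEV-3": timedelta(hours=24),
    "SEV-4": timedelta(weeks=1),
}


def classify_incident(scope: str, detected_at: datetime):
    """Return (severity, resolution deadline) for an incident of a given scope."""
    severity = SEVERITY_BY_SCOPE[scope]  # raises KeyError on an unknown scope
    return severity, detected_at + RESOLUTION_TARGET[severity]
```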
Compliance Framework
Regulatory Compliance
Compliance Areas:
  Data Privacy:
    - GDPR requirements
    - Data classification
    - Retention policies
    - Subject access requests

  Security:
    - Access control
    - Encryption requirements
    - Vulnerability management
    - Incident response

  Industry Specific:
    - Healthcare (HIPAA)
    - Financial (SOX, PCI)
    - Government (FedRAMP)

Compliance Controls:
  - Policy enforcement
  - Automated checks
  - Manual reviews
  - Regular audits
Risk Management
Risk Categories:
  Operational Risk:
    - Agent errors
    - System failures
    - Process breakdowns

  Security Risk:
    - Unauthorized access
    - Data breaches
    - Malicious actions

  Compliance Risk:
    - Regulatory violations
    - Policy breaches
    - Audit failures

  Strategic Risk:
    - Poor decisions at scale
    - Reputation damage
    - Competitive disadvantage

Risk Controls:
  - Prevention: Stop before it happens
  - Detection: Find it quickly
  - Response: Handle it effectively
  - Recovery: Return to normal
Knowledge Management
Organizational Learning
Learning System:
  Capture:
    - Document decisions and rationale
    - Record problems and solutions
    - Note patterns and anti-patterns
    - Preserve context

  Organize:
    - Tag by domain, topic, type
    - Connect related items
    - Maintain freshness
    - Curate quality

  Distribute:
    - Make discoverable
    - Push relevant updates
    - Train new agents
    - Cross-pollinate teams

  Apply:
    - Reference in similar situations
    - Suggest based on context
    - Warn about known pitfalls
    - Guide best practices
Best Practice Repository
Best Practice Structure:
  Practice: [Name]

  Context:
    When does this apply?
    What problem does it solve?

  The Practice:
    What to do, step by step

  Why It Works:
    The reasoning behind it

  Anti-Patterns:
    What NOT to do

  Examples:
    Real cases of success

  Related Practices:
    What else to consider
Integration Architecture
MCP Server Ecosystem
Enterprise MCP Stack:
  Core Infrastructure:
    - github: Code management
    - linear: Task management
    - notion: Documentation
    - slack: Communication

  Development:
    - next-devtools: Runtime debugging
    - playwright: Testing
    - vercel: Deployment

  Analytics:
    - Custom metrics server
    - Log aggregation
    - Dashboard server

  Governance:
    - Audit log server
    - Policy server
    - Compliance server
API Gateway Pattern
Enterprise API Gateway:
  Functions:
    - Authentication
    - Authorization
    - Rate limiting
    - Request routing
    - Response caching
    - Logging

  Security:
    - Token validation
    - Scope enforcement
    - IP allowlisting
    - Encryption

  Observability:
    - Request tracing
    - Performance metrics
    - Error tracking
Deployment Patterns
Progressive Rollout
Rollout Strategy:
  Phase 1: Canary
    - Deploy to 1% of agents
    - Monitor closely
    - Quick rollback if issues
    - Duration: 1-2 hours

  Phase 2: Early Majority
    - Deploy to 25% of agents
    - Expanded monitoring
    - Validate performance
    - Duration: 4-8 hours

  Phase 3: Majority
    - Deploy to 75% of agents
    - Full monitoring
    - Support team ready
    - Duration: 24 hours

  Phase 4: Complete
    - Deploy to 100%
    - Normal monitoring
    - Close rollout
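A common way to pick which agents fall into each phase is deterministic hash bucketing, so an agent that is in the 1% canary stays included at 25%, 75%, and 100%. The text does not say how agents are selected, so the sketch below is an assumption throughout, including the `bucket` and `in_rollout` names and the choice of SHA-256.

```python
import hashlib

# Phase name -> percentage of agents, from the rollout strategy.
PHASES = [
    ("canary", 1),
    ("early_majority", 25),
    ("majority", 75),
    ("complete", 100),
]


def bucket(agent_id: str) -> int:
    """Map an agent ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(agent_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def in_rollout(agent_id: str, percent: int) -> bool:
    """True if the agent is included when `percent` of agents are enabled."""
    return bucket(agent_id) < percent
```

Because the bucket depends only on the agent ID, widening the percentage never removes an agent that already received the change, which keeps rollback and monitoring scopes predictable.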
Feature Flags
Feature Flag Strategy:
  Flag Types:
    - Release flag: Hide unfinished features
    - Experiment flag: A/B testing
    - Ops flag: Emergency toggle
    - Permission flag: Entitlement control

  Flag Lifecycle:
    1. Create flag (disabled)
    2. Deploy code with flag
    3. Enable gradually
    4. Full rollout
    5. Remove flag from code

  Best Practices:
    - Short-lived flags
    - Clear ownership
    - Regular cleanup
    - Documented purpose
Quality Assurance
Quality Gates
Enterprise Quality Gates:
  Pre-Deployment:
    - All tests pass
    - Code review complete
    - Security scan clean
    - Documentation updated

  Post-Deployment:
    - Smoke tests pass
    - Performance within SLA
    - Error rate acceptable
    - User feedback reviewed

  Periodic:
    - Full regression suite
    - Load testing
    - Security assessment
    - Compliance audit
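A gate check can be as simple as listing which required checks have not passed. The sketch below covers the pre-deployment gates only; the snake_case check names and the `failed_gates` function are hypothetical, and a check with no recorded result is assumed to have failed.

```python
# Pre-deployment gates from the list above, as machine-readable names.
PRE_DEPLOYMENT = (
    "tests_pass",
    "code_review_complete",
    "security_scan_clean",
    "documentation_updated",
)


def failed_gates(results: dict, gates=PRE_DEPLOYMENT) -> list:
    """Return the gates that did not pass; a missing result counts as a failure."""
    return [gate for gate in gates if not results.get(gate, False)]


def can_deploy(results: dict) -> bool:
    """Deployment is allowed only when every pre-deployment gate passed."""
    return not failed_gates(results)
```

Treating a missing result as a failure means a skipped scan blocks the deployment instead of slipping through unnoticed.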
Continuous Improvement
Improvement Cycle:
  Measure:
    - Collect performance data
    - Track quality metrics
    - Gather feedback

  Analyze:
    - Identify patterns
    - Find root causes
    - Prioritize opportunities

  Improve:
    - Design changes
    - Implement improvements
    - Validate results

  Standardize:
    - Document best practices
    - Update processes
    - Train teams
Executive Reporting
Dashboard Metrics
Executive Dashboard:
  Health Overview:
    - Overall system status
    - Active incident count
    - SLA compliance rate

  Performance Summary:
    - Tasks completed (daily/weekly)
    - Quality score
    - Cost per task

  Team Performance:
    - By domain
    - By team
    - Trend analysis

  Risk Indicators:
    - Compliance status
    - Security posture
    - Operational risks
Report Templates
Weekly Executive Summary:
  Headline: [One sentence on overall status]

  Key Metrics:
    - Tasks completed: X (+Y% vs last week)
    - Quality score: X%
    - SLA achievement: X%
    - Cost per task: $X

  Notable Events:
    - [Event 1]
    - [Event 2]

  Risks and Concerns:
    - [Risk 1] - [Mitigation]
    - [Risk 2] - [Mitigation]

  Next Week Focus:
    - [Priority 1]
    - [Priority 2]
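The Key Metrics part of the template can be filled in mechanically. The sketch below shows one possible rendering, including the week-over-week percentage; the function names and parameters are hypothetical, and showing "n/a" when last week's count is zero is an assumption.

```python
def pct_change(current: float, previous: float):
    """Week-over-week change in percent, or None when there is no baseline."""
    if previous == 0:
        return None
    return (current - previous) * 100 / previous


def render_key_metrics(tasks: int, tasks_last_week: int, quality_pct: float,
                       sla_pct: float, cost_per_task: float) -> str:
    """Render the Key Metrics block of the weekly executive summary."""
    change = pct_change(tasks, tasks_last_week)
    change_text = "n/a" if change is None else f"{change:+.1f}%"
    lines = [
        "Key Metrics:",
        f"  - Tasks completed: {tasks} ({change_text} vs last week)",
        f"  - Quality score: {quality_pct:.1f}%",
        f"  - SLA achievement: {sla_pct:.1f}%",
        f"  - Cost per task: ${cost_per_task:.2f}",
    ]
    return "\n".join(lines)
```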
"At enterprise scale, orchestration isn't about control—it's about enabling coordination while maintaining quality."