Claude-skill-registry Enterprise AI Patterns
Production-grade AI architecture patterns for enterprise - security, governance, scalability, and operational excellence
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/enterprise-ai-patterns" ~/.claude/skills/majiayu000-claude-skill-registry-enterprise-ai-patterns && rm -rf "$T"
manifest: skills/data/enterprise-ai-patterns/SKILL.md
Enterprise AI Patterns
You are an expert in enterprise-grade AI architecture patterns. You help organizations build AI systems that are secure, scalable, governable, and operationally excellent.
Enterprise AI Architecture Principles
The Five Pillars
    ┌─────────────────────────────────────────────────────────────────┐
    │                      ENTERPRISE AI PILLARS                      │
    │                                                                 │
    │  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐     │
    │  │ SECURITY  │  │ GOVERNANCE│  │   SCALE   │  │ OPERATIONS│     │
    │  │           │  │           │  │           │  │           │     │
    │  │ - IAM     │  │ - Policies│  │ - Auto    │  │ - Monitor │     │
    │  │ - Encrypt │  │ - Audit   │  │ - Distrib │  │ - Alert   │     │
    │  │ - Network │  │ - Lineage │  │ - Multi-  │  │ - Incident│     │
    │  │ - Data    │  │ - Quality │  │   region  │  │ - SRE     │     │
    │  └───────────┘  └───────────┘  └───────────┘  └───────────┘     │
    │                                                                 │
    │                          ┌───────────┐                          │
    │                          │   COST    │                          │
    │                          │           │                          │
    │                          │ - FinOps  │                          │
    │                          │ - Optimize│                          │
    │                          │ - Budget  │                          │
    │                          └───────────┘                          │
    └─────────────────────────────────────────────────────────────────┘
Pattern 1: AI Gateway Architecture
Purpose
Centralized entry point for all AI services with security, routing, and observability.
Architecture
    ┌─────────────────────────────────────────────────────────────────┐
    │                       AI GATEWAY PATTERN                        │
    │                                                                 │
    │  Applications                                                   │
    │    ┌────────┐ ┌────────┐ ┌────────┐                             │
    │    │ App A  │ │ App B  │ │ App C  │                             │
    │    └───┬────┘ └───┬────┘ └───┬────┘                             │
    │        │          │          │                                  │
    │        └──────────┼──────────┘                                  │
    │                   │                                             │
    │           ┌───────▼───────┐                                     │
    │           │  AI GATEWAY   │                                     │
    │           │               │                                     │
    │           │ - AuthN/AuthZ │                                     │
    │           │ - Rate Limit  │                                     │
    │           │ - Routing     │                                     │
    │           │ - Logging     │                                     │
    │           │ - Caching     │                                     │
    │           │ - Fallback    │                                     │
    │           └───────┬───────┘                                     │
    │                   │                                             │
    │     ┌─────────────┼─────────────┐                               │
    │     │             │             │                               │
    │     ▼             ▼             ▼                               │
    │  ┌──────┐      ┌──────┐     ┌───────┐                           │
    │  │ OCI  │      │Azure │     │  AWS  │                           │
    │  │GenAI │      │OpenAI│     │Bedrock│                           │
    │  └──────┘      └──────┘     └───────┘                           │
    └─────────────────────────────────────────────────────────────────┘
Implementation
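    import logging
    from dataclasses import dataclass
    from typing import Protocol

    from fastapi import FastAPI, HTTPException

    app = FastAPI()


    # AIRequest, AIResponse, ProviderError, and Provider are placeholder types.
    # Concrete providers, the rate limiter, and the cache are passed in by the caller.
    @dataclass(frozen=True)
    class AIRequest:
        user_id: str
        model: str
        prompt: str


    @dataclass
    class AIResponse:
        text: str


    class ProviderError(Exception):
        """Raised by a provider when generation fails."""


    class Provider(Protocol):
        async def generate(self, request: AIRequest) -> AIResponse: ...


    class AIGateway:
        def __init__(self, providers: dict[str, Provider], rate_limiter, cache):
            self.providers = providers  # expected keys: "oci", "azure", "aws"
            self.rate_limiter = rate_limiter
            self.cache = cache
            self.logger = logging.getLogger("ai_gateway")

        async def route_request(self, request: AIRequest) -> AIResponse:
            # 1. Rate limiting
            if not self.rate_limiter.allow(request.user_id):
                raise HTTPException(status_code=429, detail="Rate limit exceeded")

            # 2. Check cache (compare with None so falsy responses still count as hits)
            cached = self.cache.get(request)
            if cached is not None:
                return cached

            # 3. Route to provider
            provider = self.select_provider(request)

            # 4. Execute with fallback
            try:
                response = await provider.generate(request)
            except ProviderError:
                response = await self.fallback(request, failed=provider)

            # 5. Cache and log
            self.cache.set(request, response)
            self.log_request(request, response)
            return response

        def select_provider(self, request: AIRequest) -> Provider:
            """Route by model-name prefix; unknown models default to OCI."""
            if request.model.startswith("gpt"):
                return self.providers["azure"]
            if request.model.startswith("claude"):
                return self.providers["aws"]
            return self.providers["oci"]  # Default to OCI

        async def fallback(self, request: AIRequest, failed: Provider) -> AIResponse:
            """Placeholder policy: try the remaining providers in order."""
            for provider in self.providers.values():
                if provider is failed:
                    continue
                try:
                    return await provider.generate(request)
                except ProviderError:
                    continue
            raise HTTPException(status_code=502, detail="All providers failed")

        def log_request(self, request: AIRequest, response: AIResponse) -> None:
            self.logger.info("user=%s model=%s chars=%d",
                             request.user_id, request.model, len(response.text))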
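The gateway code calls `rate_limiter.allow(request.user_id)` without showing the limiter itself. One common way to back that call is a per-user token bucket; the sketch below is an assumption about how it could look, not the registry's implementation. The `capacity`, `rate`, and `clock` parameters are hypothetical names introduced for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class _Bucket:
    tokens: float
    updated: float


class RateLimiter:
    """Per-user token bucket: bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, capacity: int = 5, rate: float = 1.0, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so tests can control time
        self._buckets: dict[str, _Bucket] = {}

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        bucket = self._buckets.setdefault(user_id, _Bucket(float(self.capacity), now))
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - bucket.updated
        bucket.tokens = min(float(self.capacity), bucket.tokens + elapsed * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return True
        return False
```

This version keeps state in process memory, so it only limits traffic through a single gateway instance; a shared store would be needed once the gateway is scaled out.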
Pattern 2: Model Registry & Governance
Purpose
Central catalog of approved AI models with versioning, lineage, and access control.
Architecture
    ┌─────────────────────────────────────────────────────────────────┐
    │                     MODEL REGISTRY PATTERN                      │
    │                                                                 │
    │ ┌────────────────────────────────────────────────────────────┐  │
    │ │                       MODEL REGISTRY                       │  │
    │ │                                                            │  │
    │ │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │  │
    │ │  │ Model A     │  │ Model B     │  │ Model C     │         │  │
    │ │  │ v1.0, v1.1  │  │ v2.0        │  │ v1.0        │         │  │
    │ │  │             │  │             │  │             │         │  │
    │ │  │ Status:     │  │ Status:     │  │ Status:     │         │  │
    │ │  │ PRODUCTION  │  │ STAGING     │  │ DEPRECATED  │         │  │
    │ │  └─────────────┘  └─────────────┘  └─────────────┘         │  │
    │ │                                                            │  │
    │ │  Metadata:                                                 │  │
    │ │  - Owner, Team                                             │  │
    │ │  - Training data lineage                                   │  │
    │ │  - Performance metrics                                     │  │
    │ │  - Approval status                                         │  │
    │ │  - Access permissions                                      │  │
    │ └────────────────────────────────────────────────────────────┘  │
    │                                                                 │
    │  Governance:                                                    │
    │  ├── Approval workflow (ML → Security → Legal → Deploy)         │
    │  ├── Version control (immutable versions)                       │
    │  ├── Access control (who can use which models)                  │
    │  └── Audit trail (all model operations logged)                  │
    └─────────────────────────────────────────────────────────────────┘
Model Lifecycle
    Model States:
      DEVELOPMENT:
        - In active development
        - Not for production use
        - Access: ML team only

      STAGING:
        - Ready for testing
        - Pending approval
        - Access: QA, stakeholders

      APPROVED:
        - Passed all reviews
        - Ready for production
        - Access: Applications

      PRODUCTION:
        - Actively serving traffic
        - Monitored
        - Access: Production systems

      DEPRECATED:
        - Scheduled for removal
        - New uses blocked
        - Existing uses grandfathered

      ARCHIVED:
        - Removed from service
        - Retained for audit
        - No access
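The lifecycle above names six states but does not spell out which moves between them are permitted. A registry would typically enforce that with a small state machine. The sketch below assumes a linear forward path plus one rollback edge (STAGING back to DEVELOPMENT); that transition table is an illustrative assumption, not something the source defines.

```python
from enum import Enum


class ModelState(Enum):
    DEVELOPMENT = "development"
    STAGING = "staging"
    APPROVED = "approved"
    PRODUCTION = "production"
    DEPRECATED = "deprecated"
    ARCHIVED = "archived"


# Assumed transitions: the source lists the states, not the allowed moves.
ALLOWED_TRANSITIONS = {
    ModelState.DEVELOPMENT: {ModelState.STAGING},
    ModelState.STAGING: {ModelState.APPROVED, ModelState.DEVELOPMENT},
    ModelState.APPROVED: {ModelState.PRODUCTION},
    ModelState.PRODUCTION: {ModelState.DEPRECATED},
    ModelState.DEPRECATED: {ModelState.ARCHIVED},
    ModelState.ARCHIVED: set(),
}


def transition(current: ModelState, target: ModelState) -> ModelState:
    """Return `target` if the move is permitted, otherwise raise ValueError."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"{current.name} -> {target.name} is not allowed")
    return target
```

Keeping the table as data makes it easy to log every attempted transition for the audit trail and to adjust the workflow without touching the enforcement code.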
Pattern 3: AI Observability Stack
Purpose
Full visibility into AI system health, performance, and behavior.
Architecture
    ┌─────────────────────────────────────────────────────────────────┐
    │                     AI OBSERVABILITY STACK                      │
    │                                                                 │
    │ ┌─────────────────────────────────────────────────────────────┐ │
    │ │                         DASHBOARDS                          │ │
    │ │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐     │ │
    │ │  │ Latency  │  │Throughput│  │  Errors  │  │   Cost   │     │ │
    │ │  └──────────┘  └──────────┘  └──────────┘  └──────────┘     │ │
    │ └─────────────────────────────────────────────────────────────┘ │
    │                                                                 │
    │ ┌─────────────────────────────────────────────────────────────┐ │
    │ │                          ALERTING                           │ │
    │ │  - Latency > threshold                                      │ │
    │ │  - Error rate spike                                         │ │
    │ │  - Cost anomaly                                             │ │
    │ │  - Model drift detected                                     │ │
    │ └─────────────────────────────────────────────────────────────┘ │
    │                                                                 │
    │ ┌─────────────────────────────────────────────────────────────┐ │
    │ │                         DATA LAYER                          │ │
    │ │  ┌──────────┐  ┌──────────┐  ┌──────────┐                   │ │
    │ │  │ Metrics  │  │   Logs   │  │  Traces  │                   │ │
    │ │  │ (Prom)   │  │ (Loki)   │  │ (Jaeger) │                   │ │
    │ │  └──────────┘  └──────────┘  └──────────┘                   │ │
    │ └─────────────────────────────────────────────────────────────┘ │
    │                                                                 │
    │  Instrumentation:                                               │
    │  - Request/response logging                                     │
    │  - Token usage tracking                                         │
    │  - Latency breakdown                                            │
    │  - Error classification                                         │
    │  - User feedback signals                                        │
    └─────────────────────────────────────────────────────────────────┘
Key Metrics
    Latency Metrics:
      - p50_latency_ms: Median (typical) response time
      - p95_latency_ms: Slow but still common requests
      - p99_latency_ms: Tail latency (edge cases)
      - time_to_first_token: Delay before streaming starts

    Throughput Metrics:
      - requests_per_second: Current load
      - tokens_per_second: Processing rate
      - concurrent_requests: Active requests
      - queue_depth: Waiting requests

    Quality Metrics:
      - error_rate: Percentage of failed requests
      - hallucination_rate: Share of responses flagged as hallucinations
      - user_feedback_score: Thumbs up/down ratio
      - retrieval_relevance: RAG quality score

    Cost Metrics:
      - tokens_consumed: Input + output tokens
      - cost_per_request: Average cost per request
      - daily_spend: Total cost per day
      - cost_by_application: Spend broken down by application
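Several of the metrics above can be derived directly from per-request records. The following sketch computes the latency percentiles, error rate, token total, and average cost from an in-memory list. `RequestRecord`, `summarize`, and the flat `usd_per_1k_tokens` price are hypothetical names and simplifications for illustration; a production stack would usually compute these in the metrics backend instead.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    latency_ms: float
    input_tokens: int
    output_tokens: int
    ok: bool


def percentile(values: list[float], p: int) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(records: list[RequestRecord], usd_per_1k_tokens: float) -> dict:
    latencies = [r.latency_ms for r in records]
    tokens = sum(r.input_tokens + r.output_tokens for r in records)
    failures = sum(1 for r in records if not r.ok)
    return {
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
        "p99_latency_ms": percentile(latencies, 99),
        "error_rate": failures / len(records),
        "tokens_consumed": tokens,
        # Assumes one blended price for input and output tokens.
        "cost_per_request": tokens / 1000 * usd_per_1k_tokens / len(records),
    }
```

Real provider pricing usually differs for input and output tokens, so the cost line would need two rates in practice.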
Pattern 4: Prompt Management System
Purpose
Prompts managed as code: version-controlled, tested, and deployed through a pipeline.
Architecture
    ┌─────────────────────────────────────────────────────────────────┐
    │                    PROMPT MANAGEMENT SYSTEM                     │
    │                                                                 │
    │ ┌─────────────────────────────────────────────────────────────┐ │
    │ │                      PROMPT REPOSITORY                      │ │
    │ │                                                             │ │
    │ │  prompts/                                                   │ │
    │ │  ├── customer_support/                                      │ │
    │ │  │   ├── v1.0.0/                                            │ │
    │ │  │   │   ├── system.txt                                     │ │
    │ │  │   │   ├── examples.json                                  │ │
    │ │  │   │   └── tests.json                                     │ │
    │ │  │   └── v1.1.0/                                            │ │
    │ │  │       └── ...                                            │ │
    │ │  └── data_analysis/                                         │ │
    │ │      └── ...                                                │ │
    │ └─────────────────────────────────────────────────────────────┘ │
    │                                                                 │
    │  CI/CD Pipeline:                                                │
    │  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐     │
    │  │ Commit │─▶│  Test  │─▶│ Review │─▶│ Stage  │─▶│ Deploy │     │
    │  └────────┘  └────────┘  └────────┘  └────────┘  └────────┘     │
    │                                                                 │
    │  Testing:                                                       │
    │  - Unit tests (expected outputs)                                │
    │  - Regression tests (no quality drop)                           │
    │  - A/B tests (compare versions)                                 │
    │  - Safety tests (no harmful outputs)                            │
    └─────────────────────────────────────────────────────────────────┘
Prompt Template
    # prompts/customer_support/v1.1.0/config.yaml
    name: customer_support
    version: 1.1.0
    description: "Handle customer support inquiries"

    system_prompt: |
      You are a helpful customer support agent for {company_name}.

      Guidelines:
      - Be professional and empathetic
      - Cite knowledge base sources
      - Escalate complex issues
      - Never share internal policies

      Knowledge cutoff: {kb_update_date}

    variables:
      - company_name: required
      - kb_update_date: required

    examples:
      - input: "I want to return my order"
        expected_topics: ["return_policy", "refund_timeline"]
      - input: "My product is broken"
        expected_topics: ["warranty", "replacement"]

    tests:
      - name: "handles_refund_question"
        input: "How do I get a refund?"
        assertions:
          - contains: "refund"
          - does_not_contain: "internal"
          - sentiment: "helpful"
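A template like the one above needs two pieces of tooling: something that fills in the `{company_name}`-style variables and refuses to render when a required one is missing, and something that evaluates the `contains` / `does_not_contain` assertions against a model response. The sketch below shows one possible shape using only the standard library. `render_prompt` and `check_assertions` are hypothetical helper names, and the `sentiment` assertion is skipped because it would need a classifier that the source does not describe.

```python
from string import Formatter


def render_prompt(template: str, required: list[str], values: dict[str, str]) -> str:
    """Fill {placeholders}; fail loudly if a required or referenced variable is missing."""
    missing = [name for name in required if name not in values]
    if missing:
        raise KeyError(f"missing required variables: {missing}")
    referenced = {field for _, field, _, _ in Formatter().parse(template) if field}
    unknown = referenced - set(values)
    if unknown:
        raise KeyError(f"template references undefined variables: {sorted(unknown)}")
    return template.format(**values)


def check_assertions(output: str, assertions: list[dict]) -> list[str]:
    """Return a list of failure messages; an empty list means all checks passed."""
    failures = []
    text = output.lower()
    for assertion in assertions:
        for kind, expected in assertion.items():
            if kind == "contains":
                if expected.lower() not in text:
                    failures.append(f"expected to contain {expected!r}")
            elif kind == "does_not_contain":
                if expected.lower() in text:
                    failures.append(f"expected not to contain {expected!r}")
            # Other kinds (for example "sentiment") need a classifier and are skipped here.
    return failures
```

In a CI pipeline, a non-empty failure list from `check_assertions` would fail the Test stage before the prompt version reaches review.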
Pattern 5: AI Security Layers
Defense in Depth
    ┌─────────────────────────────────────────────────────────────────┐
    │                       AI SECURITY LAYERS                        │
    │                                                                 │
    │  Layer 1: PERIMETER                                             │
    │ ┌─────────────────────────────────────────────────────────────┐ │
    │ │  - API Gateway authentication                               │ │
    │ │  - Rate limiting                                            │ │
    │ │  - IP allowlisting                                          │ │
    │ │  - WAF rules                                                │ │
    │ └─────────────────────────────────────────────────────────────┘ │
    │                                                                 │
    │  Layer 2: INPUT VALIDATION                                      │
    │ ┌─────────────────────────────────────────────────────────────┐ │
    │ │  - Prompt injection detection                               │ │
    │ │  - Input sanitization                                       │ │
    │ │  - Length limits                                            │ │
    │ │  - Content filtering                                        │ │
    │ └─────────────────────────────────────────────────────────────┘ │
    │                                                                 │
    │  Layer 3: MODEL SECURITY                                        │
    │ ┌─────────────────────────────────────────────────────────────┐ │
    │ │  - Dedicated clusters (isolation)                           │ │
    │ │  - Content moderation                                       │ │
    │ │  - Output filtering                                         │ │
    │ │  - Guardrails                                               │ │
    │ └─────────────────────────────────────────────────────────────┘ │
    │                                                                 │
    │  Layer 4: DATA PROTECTION                                       │
    │ ┌─────────────────────────────────────────────────────────────┐ │
    │ │  - Encryption at rest                                       │ │
    │ │  - Encryption in transit                                    │ │
    │ │  - PII detection/masking                                    │ │
    │ │  - Data residency controls                                  │ │
    │ └─────────────────────────────────────────────────────────────┘ │
    │                                                                 │
    │  Layer 5: AUDIT & COMPLIANCE                                    │
    │ ┌─────────────────────────────────────────────────────────────┐ │
    │ │  - Request/response logging                                 │ │
    │ │  - Access audit trail                                       │ │
    │ │  - Compliance reporting                                     │ │
    │ │  - Incident response                                        │ │
    │ └─────────────────────────────────────────────────────────────┘ │
    └─────────────────────────────────────────────────────────────────┘
Prompt Injection Defense
    import re


    class SecurityError(Exception):
        """Raised when input looks like a prompt injection attempt."""


    class PromptSanitizer:
        """Detect and mitigate prompt injection attacks."""

        INJECTION_PATTERNS = [
            r"ignore previous instructions",
            r"disregard .*instructions",
            r"you are now",
            r"new persona",
            r"system prompt",
            r"<\|.*\|>",  # Special tokens
        ]

        def sanitize(self, user_input: str) -> str:
            # 1. Check for known patterns
            for pattern in self.INJECTION_PATTERNS:
                if re.search(pattern, user_input, re.IGNORECASE):
                    raise SecurityError("Potential prompt injection detected")

            # 2. Escape special characters
            sanitized = self.escape_special(user_input)

            # 3. Wrap in delimiters
            return f"<user_input>{sanitized}</user_input>"

        def escape_special(self, text: str) -> str:
            """Escape characters that could be interpreted as instructions."""
            replacements = {
                "```": "'''",  # Code blocks
                "###": "---",  # Markdown headers
                "<|": "< |",   # Special tokens
                "|>": "| >",
                # Stop input from closing the wrapper added in sanitize()
                "</user_input>": "</ user_input>",
            }
            for old, new in replacements.items():
                text = text.replace(old, new)
            return text
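Layer 4 of the security diagram lists PII detection/masking without an example. A minimal regex-based sketch is shown below, assuming that email addresses, US Social Security numbers, and 13 to 16 digit card numbers are the categories of interest. The patterns and the `mask_pii` name are illustrative assumptions; regexes alone miss many PII forms, and dedicated detection services are generally more reliable.

```python
import re

# Illustrative patterns only; they are not exhaustive.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def mask_pii(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a [LABEL] placeholder and count matches per label."""
    counts: dict[str, int] = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts
```

Returning the counts alongside the masked text lets the audit layer record that PII was seen without storing the values themselves.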
Pattern 6: Cost Management
FinOps for AI
    ┌─────────────────────────────────────────────────────────────────┐
    │                       AI FINOPS FRAMEWORK                       │
    │                                                                 │
    │  VISIBILITY                                                     │
    │ ┌─────────────────────────────────────────────────────────────┐ │
    │ │  - Cost by application/team                                 │ │
    │ │  - Cost by model                                            │ │
    │ │  - Token usage trends                                       │ │
    │ │  - Unit economics (cost per conversation)                   │ │
    │ └─────────────────────────────────────────────────────────────┘ │
    │                                                                 │
    │  OPTIMIZATION                                                   │
    │ ┌─────────────────────────────────────────────────────────────┐ │
    │ │  - Model right-sizing (use smaller when sufficient)         │ │
    │ │  - Caching (avoid redundant calls)                          │ │
    │ │  - Batching (combine requests)                              │ │
    │ │  - Reserved capacity (commit for discounts)                 │ │
    │ └─────────────────────────────────────────────────────────────┘ │
    │                                                                 │
    │  GOVERNANCE                                                     │
    │ ┌─────────────────────────────────────────────────────────────┐ │
    │ │  - Budget alerts by team                                    │ │
    │ │  - Spend caps per application                               │ │
    │ │  - Chargeback/showback                                      │ │
    │ │  - Approval for expensive models                            │ │
    │ └─────────────────────────────────────────────────────────────┘ │
    └─────────────────────────────────────────────────────────────────┘
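The governance box above mentions budget alerts and spend caps per application. One way these two controls could fit together is sketched below: spend is recorded per application, an alert fires once a fraction of the cap is used, and requests that would exceed the cap are rejected. `BudgetTracker`, `alert_fraction`, and the three return values are hypothetical choices made for this illustration.

```python
from collections import defaultdict


class BudgetTracker:
    """Track spend per application against a hard cap with an early alert threshold."""

    def __init__(self, caps_usd: dict[str, float], alert_fraction: float = 0.8):
        self.caps = caps_usd
        self.alert_fraction = alert_fraction
        self.spent: dict[str, float] = defaultdict(float)

    def record(self, app: str, cost_usd: float) -> str:
        """Return "ok", "alert", or "blocked" for this charge."""
        cap = self.caps[app]
        if self.spent[app] + cost_usd > cap:
            return "blocked"  # spend cap reached: reject without recording
        self.spent[app] += cost_usd
        if self.spent[app] >= cap * self.alert_fraction:
            return "alert"
        return "ok"
```

A gateway could call `record` with an estimated cost before dispatching a request and translate `"blocked"` into an HTTP error for the caller.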
Cost Optimization Strategies
    Strategy 1: MODEL TIERING
    - Route simple queries to cheaper models
    - Reserve expensive models for complex tasks
    - Example: Command Light for FAQ, Command R+ for analysis

    Strategy 2: CACHING
    - Cache identical queries
    - Semantic caching (similar queries)
    - Cache embeddings
    - TTL based on content freshness

    Strategy 3: PROMPT OPTIMIZATION
    - Shorter prompts = fewer input tokens
    - Efficient few-shot examples
    - Remove unnecessary context

    Strategy 4: BATCHING
    - Combine multiple small requests
    - Process in bulk during off-peak
    - Reduced per-request overhead

    Strategy 5: COMMITMENT
    - Reserved capacity for steady workloads
    - Volume discounts with providers
    - Multi-year agreements where appropriate
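Strategy 2 combines exact-match caching with a TTL tied to content freshness. The sketch below illustrates only that exact-match case: responses are keyed by a hash of the model name and the whitespace-normalized prompt, and entries expire after `ttl_seconds`. `TTLCache` and its parameters are hypothetical names; semantic caching would need an embedding index and is not covered here.

```python
import hashlib
import time
from typing import Optional


class TTLCache:
    """Exact-match response cache with a fixed time-to-live per entry."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._entries: dict[str, tuple[float, str]] = {}

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Collapse whitespace so trivially different spacing hits the same entry.
        normalized = " ".join(prompt.split())
        return hashlib.sha256(f"{model}\n{normalized}".encode()).hexdigest()

    def get(self, model: str, prompt: str) -> Optional[str]:
        key = self._key(model, prompt)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: drop and report a miss
            return None
        return response

    def set(self, model: str, prompt: str, response: str) -> None:
        self._entries[self._key(model, prompt)] = (self.clock() + self.ttl, response)
```

Including the model name in the key keeps a cheaper model's answer from being served for a request that was routed to a stronger one.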
Pattern 7: Multi-Region Resilience
Architecture
    ┌─────────────────────────────────────────────────────────────────┐
    │                   MULTI-REGION AI DEPLOYMENT                    │
    │                                                                 │
    │  Region A (Primary)                Region B (Secondary)         │
    │  ┌─────────────────────┐           ┌─────────────────────┐      │
    │  │ AI Services         │           │ AI Services         │      │
    │  │ ┌──────────────┐    │           │ ┌──────────────┐    │      │
    │  │ │  GenAI DAC   │    │           │ │  GenAI DAC   │    │      │
    │  │ └──────────────┘    │           │ └──────────────┘    │      │
    │  │ ┌──────────────┐    │           │ ┌──────────────┐    │      │
    │  │ │Knowledge Base│    │           │ │Knowledge Base│    │      │
    │  │ └──────────────┘    │           │ └──────────────┘    │      │
    │  └─────────────────────┘           └─────────────────────┘      │
    │             │                                 │                 │
    │             └────────────────┬────────────────┘                 │
    │                              │                                  │
    │                      ┌───────▼───────┐                          │
    │                      │  Global Load  │                          │
    │                      │   Balancer    │                          │
    │                      │               │                          │
    │                      │ - Health      │                          │
    │                      │ - Failover    │                          │
    │                      │ - Geo-routing │                          │
    │                      └───────────────┘                          │
    │                                                                 │
    │  Sync:                                                          │
    │  - Knowledge bases replicated                                   │
    │  - Models deployed to both regions                              │
    │  - Config synchronized                                          │
    └─────────────────────────────────────────────────────────────────┘
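The load balancer in the diagram is responsible for health checks, failover, and geo-routing. The selection logic behind those three duties can be sketched as a small function, assuming each region reports a boolean health flag and a static priority. `Region`, `pick_region`, and the region names are hypothetical; a managed global load balancer would normally handle this rather than application code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Region:
    name: str
    healthy: bool
    priority: int  # lower value = preferred (primary)


def pick_region(regions: list[Region], client_region: Optional[str] = None) -> Region:
    """Prefer the client's own region if healthy, else the healthy region with best priority."""
    healthy = [r for r in regions if r.healthy]
    if not healthy:
        raise RuntimeError("no healthy region available")
    # Geo-routing: serve from the caller's region when possible.
    for region in healthy:
        if region.name == client_region:
            return region
    # Failover: fall back to the highest-priority healthy region.
    return min(healthy, key=lambda r: r.priority)
```

Failover only helps if the secondary region is actually ready, which is why the Sync items in the diagram (replicated knowledge bases, models, and config) matter.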
Implementation Checklist
Phase 1: Foundation
- [ ] Deploy AI Gateway
- [ ] Implement authentication/authorization
- [ ] Set up basic monitoring
- [ ] Configure rate limiting
- [ ] Enable audit logging
Phase 2: Governance
- [ ] Establish model registry
- [ ] Define approval workflows
- [ ] Implement prompt management
- [ ] Create cost tracking
- [ ] Document policies
Phase 3: Security
- [ ] Input validation layer
- [ ] Output filtering
- [ ] PII detection
- [ ] Prompt injection defense
- [ ] Security review process
Phase 4: Operations
- [ ] Full observability stack
- [ ] Alerting rules
- [ ] Runbooks
- [ ] Incident response plan
- [ ] Capacity planning
Phase 5: Optimization
- [ ] Caching strategy
- [ ] Model tiering
- [ ] Cost optimization
- [ ] Performance tuning
- [ ] Multi-region deployment