# Claude-skills model-deployment
Deploy ML models with FastAPI, Docker, and Kubernetes. Use for serving predictions, containerization, monitoring, and drift detection, or when encountering latency issues, health check failures, or version conflicts.
```bash
# Clone the full repository
git clone https://github.com/secondsky/claude-skills

# Or install just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/secondsky/claude-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/model-deployment/skills/model-deployment" ~/.claude/skills/secondsky-claude-skills-model-deployment && rm -rf "$T"
```
`plugins/model-deployment/skills/model-deployment/SKILL.md`

# ML Model Deployment
Deploy trained models to production with proper serving and monitoring.
## Deployment Options
| Method | Use Case | Latency |
|---|---|---|
| REST API | Web services | Medium |
| Batch | Large-scale processing | N/A |
| Streaming | Real-time | Low |
| Edge | On-device | Very low |
## FastAPI Model Server
```python
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI()
model = joblib.load('model.pkl')

class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: float
    probability: float

@app.get('/health')
def health():
    return {'status': 'healthy'}

@app.post('/predict', response_model=PredictionResponse)
def predict(request: PredictionRequest):
    features = np.array(request.features).reshape(1, -1)
    prediction = model.predict(features)[0]
    probability = model.predict_proba(features)[0].max()
    return PredictionResponse(prediction=prediction, probability=probability)
```
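Before containerizing, you can smoke-test both endpoints with FastAPI's `TestClient`. A minimal sketch, assuming the server above is saved as `app.py` next to `model.pkl` and the model takes four features (the feature values are placeholders):

```python
# test_app.py -- smoke test for the server above (assumes app.py beside it)
from fastapi.testclient import TestClient

from app import app

client = TestClient(app)

# Liveness endpoint should report healthy
assert client.get('/health').json() == {'status': 'healthy'}

# Prediction endpoint should return a prediction and probability
resp = client.post('/predict', json={'features': [5.1, 3.5, 1.4, 0.2]})
assert resp.status_code == 200
print(resp.json())  # e.g. {'prediction': 0.0, 'probability': 0.97}
```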
## Docker Deployment
```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY model.pkl .
COPY app.py .

EXPOSE 8000

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```
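The Dockerfile copies a `requirements.txt` that is never shown in this skill; a minimal sketch, assuming a scikit-learn model served by the FastAPI app above:

```text
fastapi
uvicorn[standard]
scikit-learn
joblib
numpy
```

In practice, pin exact versions (e.g. `scikit-learn==1.4.2`, a hypothetical pin): a model pickled under one scikit-learn version may fail to load, or behave differently, under another.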
## Model Monitoring
```python
from datetime import datetime

class ModelMonitor:
    def __init__(self):
        self.predictions = []
        self.latencies = []

    def log_prediction(self, input_data, prediction, latency):
        self.predictions.append({
            'input': input_data,
            'prediction': prediction,
            'latency': latency,
            'timestamp': datetime.now()
        })

    def detect_drift(self, reference_distribution):
        # Compare current predictions to reference
        pass
```
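`detect_drift` is left as a stub above. One common approach, and the one used in `references/model-monitoring-drift.md`, is a two-sample Kolmogorov-Smirnov test between recent predictions and a reference distribution. A minimal sketch with scipy, where the 0.1 threshold on the KS statistic and the 30-sample minimum are illustrative defaults, not prescribed values:

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(recent_predictions, reference_distribution, threshold=0.1):
    """Return True if recent predictions appear to have drifted.

    Two-sample Kolmogorov-Smirnov test; the threshold and the
    30-sample minimum are illustrative defaults -- tune per model.
    """
    if len(recent_predictions) < 30:  # too few samples for a stable test
        return False
    statistic, p_value = ks_2samp(recent_predictions, reference_distribution)
    return statistic > threshold

# Example: a shifted distribution should register as drift
reference = np.random.normal(0.0, 1.0, 5000)
drifted = np.random.normal(0.8, 1.0, 1000)
print(detect_drift(drifted, reference))  # True (with overwhelming probability)
```

Inside the class, `recent_predictions` would come from the last N entries of `self.predictions`, and the `should_retrain()` method used in the drift example later in this skill could simply delegate to this check.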
## Deployment Checklist
- Model validated on test set
- API endpoints documented
- Health check endpoint
- Authentication configured
- Logging and monitoring setup
- Model versioning in place
- Rollback procedure documented
## Quick Start: Deploy Model in 6 Steps
```bash
# 1. Save trained model (in Python)
#    import joblib
#    joblib.dump(model, 'model.pkl')

# 2. Create FastAPI app (see references/fastapi-production-server.md)
#    app.py with /predict and /health endpoints

# 3. Create Dockerfile
cat > Dockerfile << 'EOF'
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py model.pkl ./
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
EOF

# 4. Build and test locally
docker build -t model-api:v1.0.0 .
docker run -p 8000:8000 model-api:v1.0.0

# 5. Push to registry
docker tag model-api:v1.0.0 registry.example.com/model-api:v1.0.0
docker push registry.example.com/model-api:v1.0.0

# 6. Deploy to Kubernetes
kubectl apply -f deployment.yaml
kubectl rollout status deployment/model-api
```
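Step 6 applies a `deployment.yaml` that is not shown here. A minimal sketch of what it might contain, assuming the image tag and port from the steps above (the health-probe and resource-limit snippets from the sections below would slot into the container spec):

```yaml
# deployment.yaml -- illustrative minimal manifest, not the skill's canonical file
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-api
  template:
    metadata:
      labels:
        app: model-api
    spec:
      containers:
        - name: model-api
          image: registry.example.com/model-api:v1.0.0
          ports:
            - containerPort: 8000
```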
## Known Issues Prevention
### 1. No Health Checks = Downtime
Problem: Load balancer sends traffic to unhealthy pods, causing 503 errors.
Solution: Implement both liveness and readiness probes:
```python
# app.py
from fastapi import HTTPException

@app.get("/health")  # Liveness: Is the service alive?
async def health():
    return {"status": "healthy"}

@app.get("/ready")  # Readiness: Can it handle traffic?
async def ready():
    try:
        _ = model_store.model  # Verify model loaded (model_store is defined elsewhere in the app)
        return {"status": "ready"}
    except Exception:
        raise HTTPException(503, "Not ready")
```
```yaml
# deployment.yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 30
readinessProbe:
  httpGet:
    path: /ready
    port: 8000
  initialDelaySeconds: 5
```
### 2. Model Not Found Errors in Container
Problem: `FileNotFoundError: model.pkl` when the container starts.
Solution: Verify model file is copied in Dockerfile and path matches:
```dockerfile
# ❌ Wrong: model copied to one directory while code expects another
COPY model.pkl /app/models/   # but code expects /app/model.pkl

# ✅ Correct: consistent path, configurable via environment variable
COPY model.pkl /models/model.pkl
ENV MODEL_PATH=/models/model.pkl

# In Python: model_path = os.getenv("MODEL_PATH", "/models/model.pkl")
```
### 3. Unhandled Input Validation = 500 Errors
Problem: Invalid inputs crash API with unhandled exceptions.
Solution: Use Pydantic for automatic validation:
```python
from typing import List

import numpy as np
from pydantic import BaseModel, Field, validator

# Pydantic v1 style; v2 uses field_validator and min_length/max_length
class PredictionRequest(BaseModel):
    features: List[float] = Field(..., min_items=1, max_items=100)

    @validator('features')
    def validate_finite(cls, v):
        if not all(np.isfinite(val) for val in v):
            raise ValueError("All features must be finite")
        return v

# FastAPI auto-validates and returns 422 for invalid requests
@app.post("/predict")
async def predict(request: PredictionRequest):
    # Request is guaranteed valid here
    pass
```
### 4. No Drift Monitoring = Silent Degradation
Problem: Model performance degrades over time, and no one notices until users complain.
Solution: Implement drift detection (see references/model-monitoring-drift.md):
```python
monitor = ModelMonitor(reference_data=training_data, drift_threshold=0.1)

@app.post("/predict")
async def predict(request: PredictionRequest):
    prediction = model.predict(features)
    monitor.log_prediction(features, prediction, latency)

    # Alert if drift is detected
    if monitor.should_retrain():
        alert_manager.send_alert("Model drift detected - retrain recommended")

    return prediction
```
### 5. Missing Resource Limits = OOM Kills
Problem: The pod is killed by the Kubernetes OOMKiller and the service goes down.
Solution: Set memory/CPU limits and requests:
```yaml
resources:
  requests:
    memory: "512Mi"   # Guaranteed
    cpu: "500m"
  limits:
    memory: "1Gi"     # Maximum allowed
    cpu: "1000m"

# Monitor actual usage: kubectl top pods
```
### 6. No Rollback Plan = Stuck on Bad Deploy
Problem: New model version has bugs, no way to revert quickly.
Solution: Tag images with versions, keep previous deployment:
```bash
# Deploy with a version tag
kubectl set image deployment/model-api model-api=registry/model-api:v1.2.0

# If issues arise, roll back to the previous version
kubectl rollout undo deployment/model-api

# Or pin a specific version
kubectl set image deployment/model-api model-api=registry/model-api:v1.1.0
```
### 7. Synchronous Prediction = Slow Batch Processing
Problem: Processing 10,000 predictions one-by-one takes hours.
Solution: Implement batch endpoint:
@app.post("/predict/batch") async def predict_batch(request: BatchPredictionRequest): # Process all at once (vectorized) features = np.array(request.instances) predictions = model.predict(features) # Much faster! return {"predictions": predictions.tolist()}
### 8. No CI/CD Validation = Deploy Bad Models
Problem: Deploying a model that fails basic tests breaks production.
Solution: Validate in CI pipeline (see references/cicd-ml-models.md):
```yaml
# .github/workflows/deploy.yml
- name: Validate model performance
  run: |
    python scripts/validate_model.py \
      --model model.pkl \
      --test-data test.csv \
      --min-accuracy 0.85  # Fail if below threshold
```
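The workflow step invokes a `scripts/validate_model.py` that is not part of this file. A minimal sketch of what such a gate might look like, assuming a scikit-learn model and a CSV whose last column is the label (the script name and flags simply mirror the workflow above):

```python
#!/usr/bin/env python
"""Hypothetical CI gate: exit non-zero if model accuracy is below threshold."""
import argparse
import sys

import joblib
import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--model", required=True)
parser.add_argument("--test-data", required=True)
parser.add_argument("--min-accuracy", type=float, default=0.85)
args = parser.parse_args()

model = joblib.load(args.model)
df = pd.read_csv(args.test_data)
X, y = df.iloc[:, :-1], df.iloc[:, -1]  # assumes label is the last column

accuracy = model.score(X, y)
print(f"accuracy={accuracy:.4f} (threshold {args.min_accuracy})")
sys.exit(0 if accuracy >= args.min_accuracy else 1)
```

A non-zero exit code is what makes the CI step fail, so the deploy job never runs for an underperforming model.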
## Best Practices
- Version everything: Models (semantic versioning), Docker images, deployments
- Monitor continuously: Latency, error rate, drift, resource usage
- Test before deploy: Unit tests, integration tests, performance benchmarks
- Deploy gradually: Canary (10%), then full rollout
- Plan for rollback: Keep previous version, document procedure
- Log predictions: Enable debugging and drift detection
- Set resource limits: Prevent OOM kills and resource contention
- Use health checks: Enable proper load balancing
## When to Load References
Load reference files for detailed implementations:
- **FastAPI Production Server**: Load `references/fastapi-production-server.md` for a complete production-ready FastAPI implementation with error handling, validation (Pydantic models), logging, health/readiness probes, batch predictions, model versioning, middleware, exception handlers, and performance optimizations (caching, async)
- **Model Monitoring & Drift**: Load `references/model-monitoring-drift.md` for a ModelMonitor implementation with KS-test drift detection, Jensen-Shannon divergence, Prometheus metrics integration, alert configuration (Slack, email), a continuous monitoring service, and dashboard endpoints
- **Containerization & Deployment**: Load `references/containerization-deployment.md` for multi-stage Dockerfiles, model versioning in containers, Docker Compose setup, A/B testing with Nginx, Kubernetes deployments (rolling update, blue-green, canary), GitHub Actions CI/CD, and deployment checklists
- **CI/CD for ML Models**: Load `references/cicd-ml-models.md` for a complete GitHub Actions pipeline with model validation, data validation, automated testing, security scanning, performance benchmarks, automated rollback, and deployment strategies