ml-deployment-helper (claude-skill-registry-data)

Install

Clone the upstream repo:

```bash
git clone https://github.com/majiayu000/claude-skill-registry-data
```

Install into ~/.claude/skills/ (Claude Code):

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry-data "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/ml-deployment-helper" ~/.claude/skills/majiayu000-claude-skill-registry-data-ml-deployment-helper && rm -rf "$T"
```

Manifest: data/ml-deployment-helper/SKILL.md
ML Deployment Helper
Overview
Bridges the gap between trained models and production systems. Generates deployment artifacts, APIs, monitoring, and A/B testing infrastructure following MLOps best practices.
Deployment Checklist
Before deploying any model, this skill ensures the following (a sketch of automating a few of these checks appears after the list):
- ✅ Model versioned and tracked
- ✅ Dependencies documented (requirements.txt/Dockerfile)
- ✅ API endpoint created
- ✅ Input validation implemented
- ✅ Monitoring configured
- ✅ A/B testing ready
- ✅ Rollback plan documented
- ✅ Performance benchmarked
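As a rough illustration, a few of these items can be verified programmatically before a deploy is allowed to proceed. This is a hypothetical sketch, not code generated by the skill; the file paths and health URL are assumptions:

```python
# Hypothetical pre-deploy gate covering a few checklist items.
# Paths and the health endpoint are assumptions, not part of the skill.
import os
import urllib.request

def predeploy_gate(model_path: str, health_url: str) -> list[str]:
    """Return the list of failed checklist items (empty list means go)."""
    failures = []
    if not os.path.exists(model_path):
        failures.append("model artifact missing (versioning/tracking)")
    if not (os.path.exists("requirements.txt") or os.path.exists("Dockerfile")):
        failures.append("dependencies not documented")
    try:
        with urllib.request.urlopen(health_url, timeout=3) as resp:
            if resp.status != 200:
                failures.append("health endpoint unhealthy")
    except OSError:
        failures.append("API endpoint unreachable")
    return failures

if __name__ == "__main__":
    print(predeploy_gate("models/model-v3.pkl", "http://localhost:8000/health"))
```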
Deployment Patterns
Pattern 1: REST API (FastAPI)
```python
from specweave import create_model_api

# Generates production-ready API
api = create_model_api(
    model_path="models/model-v3.pkl",
    increment="0042",
    framework="fastapi"
)

# Creates:
# - api/
#   ├── main.py (FastAPI app)
#   ├── models.py (Pydantic schemas)
#   ├── predict.py (prediction logic)
#   ├── Dockerfile
#   ├── requirements.txt
#   └── tests/
```
Generated main.py:
```python
from datetime import datetime

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib

app = FastAPI(title="Recommendation Model API", version="0042-v3")
model = joblib.load("model-v3.pkl")

class PredictionRequest(BaseModel):
    user_id: int
    context: dict

@app.post("/predict")
async def predict(request: PredictionRequest):
    try:
        prediction = model.predict([request.dict()])
        return {
            "recommendations": prediction.tolist(),
            "model_version": "0042-v3",
            "timestamp": datetime.now(),
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
async def health():
    return {"status": "healthy", "model_loaded": model is not None}
```
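Once the server is running, a client call might look like this (a hypothetical usage example; the URL, payload values, and use of the requests library are assumptions):

```python
# Illustrative client call; the payload matches PredictionRequest above.
import requests

resp = requests.post(
    "http://localhost:8000/predict",
    json={"user_id": 42, "context": {"page": "home"}},
    timeout=5,
)
resp.raise_for_status()
print(resp.json())  # {"recommendations": [...], "model_version": "0042-v3", ...}
```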
Pattern 2: Batch Prediction
```python
from specweave import create_batch_predictor

# For offline scoring
batch_predictor = create_batch_predictor(
    model_path="models/model-v3.pkl",
    increment="0042",
    input_path="s3://bucket/data/",
    output_path="s3://bucket/predictions/"
)

# Creates:
# - batch/
#   ├── predictor.py
#   ├── scheduler.yaml (Airflow/Kubernetes CronJob)
#   └── monitoring.py
```
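The core of a batch predictor typically reduces to a read-score-write loop. A minimal sketch, assuming Parquet input readable by pandas and a joblib-serialized model (this is not the skill's generated predictor.py):

```python
# Minimal batch-scoring sketch (hypothetical). Local paths are used here
# for simplicity; real runs would point at the S3 locations above.
import joblib
import pandas as pd

def score_batch(model_path: str, input_path: str, output_path: str) -> None:
    model = joblib.load(model_path)
    df = pd.read_parquet(input_path)       # load features for scoring
    df["prediction"] = model.predict(df)   # score every row
    df.to_parquet(output_path)             # write results back

score_batch("models/model-v3.pkl", "data/batch.parquet", "predictions/batch.parquet")
```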
Pattern 3: Real-Time Streaming
```python
from specweave import create_streaming_predictor

# For Kafka/Kinesis streams
streaming = create_streaming_predictor(
    model_path="models/model-v3.pkl",
    increment="0042",
    input_topic="user-events",
    output_topic="predictions"
)

# Creates:
# - streaming/
#   ├── consumer.py
#   ├── predictor.py
#   ├── producer.py
#   └── docker-compose.yaml
```
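Conceptually, the streaming pieces wire a consumer, the model, and a producer together. A sketch using kafka-python (the library choice, broker address, and JSON payload shape are assumptions; the skill may generate different code):

```python
# Hypothetical streaming loop with kafka-python; broker address and
# message format are assumptions.
import json
import joblib
from kafka import KafkaConsumer, KafkaProducer

model = joblib.load("models/model-v3.pkl")
consumer = KafkaConsumer(
    "user-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda m: json.dumps(m).encode("utf-8"),
)

for event in consumer:  # one prediction per incoming event
    prediction = model.predict([event.value["features"]]).tolist()
    producer.send("predictions", {"user_id": event.value["user_id"],
                                  "prediction": prediction})
```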
Containerization
```python
from specweave import containerize_model

# Generates optimized Dockerfile
dockerfile = containerize_model(
    model_path="models/model-v3.pkl",
    framework="sklearn",
    python_version="3.10",
    increment="0042"
)
```
Generated Dockerfile:
```dockerfile
FROM python:3.10-slim

WORKDIR /app

# curl is needed for the health check below (not included in the slim image)
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Copy model and dependencies
COPY models/model-v3.pkl /app/model.pkl
COPY requirements.txt /app/

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY api/ /app/api/

# Health check
HEALTHCHECK --interval=30s --timeout=3s \
    CMD curl -f http://localhost:8000/health || exit 1

# Run API
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
Monitoring Setup
```python
from specweave import setup_model_monitoring

# Configures monitoring for production
monitoring = setup_model_monitoring(
    model_name="recommendation-model",
    increment="0042",
    metrics=[
        "prediction_latency",
        "throughput",
        "error_rate",
        "prediction_distribution",
        "feature_drift"
    ]
)

# Creates:
# - monitoring/
#   ├── prometheus.yaml
#   ├── grafana-dashboard.json
#   ├── alerts.yaml
#   └── drift-detector.py
```
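One common way to detect feature drift is to compare the live distribution of a feature against a reference sample from training. A sketch using a two-sample Kolmogorov-Smirnov test (the use of scipy and the 0.05 threshold are assumptions, not the generated drift-detector.py):

```python
# Hypothetical drift check: flag a feature when its live distribution
# differs significantly from the training-time reference sample.
from scipy.stats import ks_2samp

def feature_drifted(reference, live, alpha: float = 0.05) -> bool:
    """Two-sample KS test: True if the live distribution likely drifted."""
    result = ks_2samp(reference, live)
    return result.pvalue < alpha  # low p-value: distributions likely differ

# Toy usage: a shifted live sample should trigger the alert
reference = [0.1, 0.2, 0.3, 0.4, 0.5] * 20
live = [0.6, 0.7, 0.8, 0.9, 1.0] * 20
if feature_drifted(reference, live):
    print("ALERT: feature drift detected")
```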
A/B Testing Infrastructure
```python
from specweave import create_ab_test

# Sets up A/B test framework
ab_test = create_ab_test(
    control_model="model-v2.pkl",
    treatment_model="model-v3.pkl",
    traffic_split=0.1,  # 10% to new model
    success_metric="click_through_rate",
    increment="0042"
)

# Creates:
# - ab-test/
#   ├── router.py (traffic splitting)
#   ├── metrics.py (success tracking)
#   ├── statistical-tests.py (significance testing)
#   └── dashboard.py (real-time monitoring)
```
A/B Test Router:
```python
import hashlib

def route_prediction(user_id, features, control_model, treatment_model):
    """Route to control or treatment based on a stable hash of user_id."""
    # Consistent hashing: the same user always gets the same model.
    # (Python's built-in hash() is salted per process, so use hashlib.)
    user_bucket = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16) % 100
    if user_bucket < 10:  # 10% to treatment
        return treatment_model.predict(features), "treatment"
    else:
        return control_model.predict(features), "control"
```
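To declare a winner, the success metric (here, click-through rate) is compared between arms with a significance test. A sketch using a two-proportion z-test from statsmodels (the library choice and the numbers are illustrative, not the generated statistical-tests.py):

```python
# Hypothetical significance check on click-through rate between arms.
from statsmodels.stats.proportion import proportions_ztest

clicks = [480, 560]           # control, treatment clicks
impressions = [10000, 10000]  # control, treatment impressions

stat, p_value = proportions_ztest(count=clicks, nobs=impressions)
if p_value < 0.05:
    print(f"Significant difference (p={p_value:.4f}); consider ramping up")
else:
    print(f"Not significant yet (p={p_value:.4f}); keep collecting data")
```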
Model Versioning
```python
from specweave import ModelVersion

# Register model version
version = ModelVersion.register(
    model_path="models/model-v3.pkl",
    increment="0042",
    metadata={
        "accuracy": 0.87,
        "training_date": "2024-01-15",
        "data_version": "v2024-01",
        "framework": "xgboost==1.7.0"
    }
)

# Easy rollback
if production_metrics["error_rate"] > threshold:
    ModelVersion.rollback(to_version="0042-v2")
```
Load Testing
```python
from specweave import load_test_model

# Benchmark model performance
results = load_test_model(
    api_url="http://localhost:8000/predict",
    requests_per_second=[10, 50, 100, 500, 1000],
    duration_seconds=60,
    increment="0042"
)
```
Output:
```text
Load Test Results:
==================

| RPS  | Latency P50 | Latency P95 | Latency P99 | Error Rate |
|------|-------------|-------------|-------------|------------|
| 10   | 35ms        | 45ms        | 50ms        | 0.00%      |
| 50   | 38ms        | 52ms        | 65ms        | 0.00%      |
| 100  | 45ms        | 70ms        | 95ms        | 0.02%      |
| 500  | 120ms       | 250ms       | 400ms       | 1.20%      |
| 1000 | 350ms       | 800ms       | 1200ms      | 8.50%      |

Recommendation: Deploy with max 100 RPS per instance
Target: <100ms P95 latency (achieved at 100 RPS)
```
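For intuition, the latency percentiles in a report like this can be produced with very little code. A hypothetical minimal load tester (the URL, payload, and concurrency level are assumptions; the skill's load_test_model presumably does far more, including error-rate tracking):

```python
# Hypothetical minimal load tester: fire concurrent requests and report
# latency percentiles. Any failed request raises and aborts the run.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/predict"
PAYLOAD = {"user_id": 42, "context": {}}

def one_request(_):
    start = time.perf_counter()
    requests.post(URL, json=PAYLOAD, timeout=10).raise_for_status()
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=50) as pool:
    latencies = sorted(pool.map(one_request, range(1000)))

cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"P50={p50*1000:.0f}ms  P95={p95*1000:.0f}ms  P99={p99*1000:.0f}ms")
```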
Deployment Commands
```bash
# Generate deployment artifacts
/ml:deploy-prepare 0042

# Create API
/ml:create-api --increment 0042 --framework fastapi

# Setup monitoring
/ml:setup-monitoring 0042

# Create A/B test
/ml:create-ab-test --control v2 --treatment v3 --split 0.1

# Load test
/ml:load-test 0042 --rps 100 --duration 60s

# Deploy to production
/ml:deploy 0042 --environment production
```
Deployment Increment
The skill creates a deployment increment:
```text
.specweave/increments/0043-deploy-recommendation-model/
├── spec.md (deployment requirements)
├── plan.md (deployment strategy)
├── tasks.md
│   ├── [ ] Containerize model
│   ├── [ ] Create API
│   ├── [ ] Setup monitoring
│   ├── [ ] Configure A/B test
│   ├── [ ] Load test
│   ├── [ ] Deploy to staging
│   ├── [ ] Validate staging
│   └── [ ] Deploy to production
├── api/ (FastAPI app)
├── monitoring/ (Grafana dashboards)
├── ab-test/ (A/B testing logic)
└── load-tests/ (Performance benchmarks)
```
Best Practices
- Always load test before production
- Start with 1-5% traffic in A/B test
- Monitor model drift in production
- Version everything (model, data, code)
- Document rollback plan before deploying
- Set up alerts for anomalies
- Gradual rollout (canary deployment); a ramp sketch follows this list
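A gradual rollout can be driven by a simple ramp schedule that only advances while the treatment arm stays healthy. A hypothetical sketch (the schedule steps, the metric, and the threshold are assumptions):

```python
# Hypothetical canary ramp: advance the treatment traffic share step by
# step, dropping to 0% (full rollback) if the error rate degrades.
RAMP = [0.01, 0.05, 0.10, 0.25, 0.50, 1.00]

def next_split(current: float, error_rate: float, max_error: float = 0.01) -> float:
    if error_rate > max_error:
        return 0.0  # unhealthy: route all traffic back to control
    higher = [step for step in RAMP if step > current]
    return higher[0] if higher else current  # hold once fully rolled out

print(next_split(0.05, error_rate=0.002))  # healthy   -> 0.1
print(next_split(0.25, error_rate=0.030))  # unhealthy -> 0.0
```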
Integration with SpecWeave
```bash
# After training model (increment 0042)
/sw:inc "0043-deploy-recommendation-model"

# Generates deployment increment with all artifacts
/sw:do

# Deploy to production when ready
/ml:deploy 0043 --environment production
```
Model deployment is not the end—it's the beginning of the MLOps lifecycle.