Claude-skill-registry-data ml-api-endpoint
ML API expert. Use for model serving, inference endpoints, FastAPI, and ML deployment.
Install

Source · Clone the upstream repo:

```shell
git clone https://github.com/majiayu000/claude-skill-registry-data
```

Claude Code · Install into ~/.claude/skills/:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry-data "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/ml-api-endpoint" ~/.claude/skills/majiayu000-claude-skill-registry-data-ml-api-endpoint && rm -rf "$T"
```
Manifest: data/ml-api-endpoint/SKILL.md
ML API Endpoint Expert
Expert in designing and deploying machine learning API endpoints.
Core Principles
API Design
- Stateless Design: Each request contains all necessary information
- Consistent Response Format: Standardize success/error structures
- Versioning Strategy: Plan for model updates
- Input Validation: Rigorous validation before inference
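The "Consistent Response Format" principle can be sketched as a pair of envelope helpers that every endpoint returns through, so clients always parse the same shape. The helper names here are illustrative, not part of the skill itself:

```python
def success_response(data: dict) -> dict:
    """Wrap successful results in a standard envelope."""
    return {"success": True, "data": data}

def error_response(error_type: str, detail: str) -> dict:
    """Wrap failures in the same envelope shape, with a machine-readable type."""
    return {"success": False, "error": {"type": error_type, "detail": detail}}
```

Because success and error share one top-level schema, callers can branch on `"success"` without inspecting HTTP status codes alone.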
FastAPI Implementation
Basic ML Endpoint
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, validator  # pydantic v1-style validators
import joblib
import numpy as np
import uuid

app = FastAPI(title="ML Model API", version="1.0.0")

model = None

@app.on_event("startup")
async def load_model():
    global model
    model = joblib.load("model.pkl")

class PredictionInput(BaseModel):
    features: list[float]

    @validator('features')
    def validate_features(cls, v):
        if len(v) != 10:
            raise ValueError('Expected 10 features')
        return v

class PredictionResponse(BaseModel):
    prediction: float
    confidence: float | None = None
    model_version: str
    request_id: str

def generate_request_id() -> str:
    # The original snippet called this without defining it; a UUID suffices.
    return uuid.uuid4().hex

@app.post("/predict", response_model=PredictionResponse)
async def predict(input_data: PredictionInput):
    if model is None:
        raise HTTPException(status_code=503, detail="Model not loaded")
    features = np.array([input_data.features])
    prediction = model.predict(features)[0]
    return PredictionResponse(
        prediction=float(prediction),
        model_version="v1",
        request_id=generate_request_id(),
    )
```
Batch Prediction
```python
class BatchInput(BaseModel):
    instances: list[list[float]]

    @validator('instances')
    def validate_batch_size(cls, v):
        if len(v) > 100:
            raise ValueError('Batch size cannot exceed 100')
        return v

@app.post("/predict/batch")
async def batch_predict(input_data: BatchInput):
    features = np.array(input_data.instances)
    predictions = model.predict(features)
    return {
        "predictions": predictions.tolist(),
        "count": len(predictions),
    }
```
Performance Optimization
Model Caching
```python
import hashlib
import time

class ModelCache:
    def __init__(self, ttl_seconds=300):
        self.cache = {}
        self.ttl = ttl_seconds

    def get(self, features):
        key = hashlib.md5(str(features).encode()).hexdigest()
        if key in self.cache:
            result, timestamp = self.cache[key]
            if time.time() - timestamp < self.ttl:
                return result
        return None

    def set(self, features, prediction):
        key = hashlib.md5(str(features).encode()).hexdigest()
        self.cache[key] = (prediction, time.time())
```
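A minimal sketch of how such a TTL cache slots into the request path: check the cache before calling the model, and populate it on a miss. The cache class below mirrors the one above (restated so the sketch is self-contained); `cached_predict` and `predict_fn` are illustrative names, not part of the skill:

```python
import hashlib
import time

class ModelCache:
    """TTL cache keyed on a hash of the input features."""
    def __init__(self, ttl_seconds=300):
        self.cache = {}
        self.ttl = ttl_seconds

    def _key(self, features) -> str:
        return hashlib.md5(str(features).encode()).hexdigest()

    def get(self, features):
        entry = self.cache.get(self._key(features))
        if entry is not None:
            result, timestamp = entry
            if time.time() - timestamp < self.ttl:
                return result
        return None

    def set(self, features, prediction):
        self.cache[self._key(features)] = (prediction, time.time())

cache = ModelCache(ttl_seconds=300)

def cached_predict(features, predict_fn):
    """Return a cached prediction if fresh; otherwise run the model and cache it."""
    hit = cache.get(features)
    if hit is not None:
        return hit
    result = predict_fn(features)
    cache.set(features, result)
    return result
```

Note this only pays off for deterministic models, since a cached result must stay valid for identical inputs.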
Health Checks
```python
# Placeholder metric values; in practice these are updated by request middleware.
request_counter = 0
avg_latency = 0.0
error_rate = 0.0

@app.get("/health")
async def health_check():
    return {
        "status": "healthy",
        "model_loaded": model is not None,
    }

@app.get("/metrics")
async def get_metrics():
    return {
        "requests_total": request_counter,
        "prediction_latency_avg": avg_latency,
        "error_rate": error_rate,
    }
```
Docker Deployment
```dockerfile
# 3.10+ is required because the endpoint code uses `float | None` annotations.
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
```
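Building and running the image might look like the following; the image name `ml-api` and the resource limits are illustrative, not mandated by the skill:

```shell
docker build -t ml-api .
# Cap memory and CPU so a runaway model can't starve the host.
docker run --rm -p 8000:8000 --memory=1g --cpus=1 ml-api
```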
Best Practices
- Use async/await for I/O operations
- Validate data types, ranges, and business rules
- Cache predictions for deterministic models
- Handle model failures with fallback responses
- Log predictions, latencies, and errors
- Support multiple model versions
- Set memory and CPU limits
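The "handle model failures with fallback responses" and logging practices above can be sketched together as a small wrapper around inference. The function and field names here are illustrative assumptions, not part of the skill:

```python
import logging

logger = logging.getLogger("ml_api")

def predict_with_fallback(features, predict_fn, fallback_value=0.0):
    """Run inference; on failure, log the error and return a labeled fallback.

    The `fallback` flag lets clients distinguish a real prediction from a
    degraded default instead of silently trusting the value.
    """
    try:
        return {"prediction": predict_fn(features), "fallback": False}
    except Exception:
        logger.exception("model inference failed; returning fallback")
        return {"prediction": fallback_value, "fallback": True}
```

A reasonable fallback might be the training-set mean or the last known-good prediction, depending on the model.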