software_development_department · mlops-engineer
Provides MLOps patterns for ML CI/CD pipelines, model registries, monitoring, and data drift detection. Use when setting up ML infrastructure or when the user mentions MLOps, model deployment, ML pipeline, or model monitoring.
install
source · Clone the upstream repo
git clone https://github.com/tranhieutt/software_development_department
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/tranhieutt/software_development_department "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/mlops-engineer" ~/.claude/skills/tranhieutt-software-development-department-mlops-engineer && rm -rf "$T"
manifest:
.claude/skills/mlops-engineer/SKILL.md
MLOps Engineer
Tool selection matrix
| Need | Tool | When to use |
|---|---|---|
| Experiment tracking | MLflow | Open-source, self-hosted |
| Experiment tracking | W&B | Cloud, rich visualization |
| Pipeline orchestration | Kubeflow | Kubernetes-native |
| Pipeline orchestration | Prefect | Python-first, dynamic |
| Data version control | DVC | Git-based datasets & models |
| Feature store | Feast | Open-source, online+offline |
| Model serving | KServe | K8s serverless inference |
| Model serving | SageMaker Endpoints | AWS managed |
| Monitoring / drift | Evidently | Open-source, alerting |
| CI/CD for ML | GitHub Actions + DVC | Lightweight |
MLflow: experiment tracking + model registry
```python
import mlflow
import mlflow.sklearn

mlflow.set_tracking_uri("http://mlflow-server:5000")
mlflow.set_experiment("model-training")

with mlflow.start_run():
    # Log params
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 5)

    # Train
    model = train(X_train, y_train)
    metrics = evaluate(model, X_test, y_test)

    # Log metrics
    mlflow.log_metric("accuracy", metrics["accuracy"])
    mlflow.log_metric("f1", metrics["f1"])

    # Log model + register
    mlflow.sklearn.log_model(
        model,
        "model",
        registered_model_name="fraud-detector",
    )

# Promote to production via API
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="fraud-detector", version=3, stage="Production"
)
```
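The `train` and `evaluate` helpers above are not defined in the skill. A minimal sketch, assuming a scikit-learn classifier and the `accuracy`/`f1` keys logged above:

```python
# Hypothetical helpers matching the names used above; RandomForest is
# illustrative, any scikit-learn estimator works.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

def train(X_train, y_train):
    model = RandomForestClassifier(n_estimators=100, max_depth=5)
    model.fit(X_train, y_train)
    return model

def evaluate(model, X_test, y_test) -> dict:
    preds = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, preds),
        "f1": f1_score(y_test, preds),
    }
```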
GitHub Actions: ML CI/CD pipeline
```yaml
name: ML Pipeline
on:
  push:
    paths: ["data/**", "src/**", "params.yaml"]

jobs:
  train-and-validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: iterative/setup-dvc@v1
      - name: Pull data
        run: dvc pull
      - name: Run training pipeline
        run: dvc repro
      - name: Validate model metrics
        run: |
          python scripts/check_metrics.py \
            --min-accuracy 0.92 \
            --min-f1 0.88
      - name: Register model if metrics pass
        if: github.ref == 'refs/heads/main'
        run: python scripts/register_model.py
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_URI }}
```
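`scripts/check_metrics.py` itself is not included in the skill. A minimal sketch matching the flags used in the workflow, assuming it reads the `metrics/eval.json` file produced by the DVC `evaluate` stage below and that the JSON contains `accuracy` and `f1` keys:

```python
# scripts/check_metrics.py (sketch): fail the CI job if metrics regress.
# The metrics path and key names are assumptions, not part of the skill.
import argparse
import json
import sys

parser = argparse.ArgumentParser()
parser.add_argument("--min-accuracy", type=float, required=True)
parser.add_argument("--min-f1", type=float, required=True)
parser.add_argument("--metrics-file", default="metrics/eval.json")
args = parser.parse_args()

with open(args.metrics_file) as f:
    metrics = json.load(f)

failures = []
if metrics["accuracy"] < args.min_accuracy:
    failures.append(f"accuracy {metrics['accuracy']:.4f} < {args.min_accuracy}")
if metrics["f1"] < args.min_f1:
    failures.append(f"f1 {metrics['f1']:.4f} < {args.min_f1}")

if failures:
    print("Metric gate failed: " + "; ".join(failures))
    sys.exit(1)  # non-zero exit fails the GitHub Actions step
print("All metric gates passed")
```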
Model serving: FastAPI + model registry
```python
import os

import mlflow.pyfunc
import pandas as pd
from fastapi import FastAPI

app = FastAPI()

MODEL_NAME = os.environ["MODEL_NAME"]
MODEL_STAGE = os.environ.get("MODEL_STAGE", "Production")

# Load once on startup (cold start cost paid once)
model = mlflow.pyfunc.load_model(f"models:/{MODEL_NAME}/{MODEL_STAGE}")

@app.post("/predict")
async def predict(features: dict):
    df = pd.DataFrame([features])
    predictions = model.predict(df)
    return {"predictions": predictions.tolist()}

@app.get("/health")
async def health():
    return {"status": "healthy", "model": MODEL_NAME, "stage": MODEL_STAGE}
```
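Calling the endpoint is then a plain JSON POST; the feature names below are placeholders, not part of the skill:

```python
import requests

# Hypothetical feature payload; substitute your model's actual feature names.
resp = requests.post(
    "http://localhost:8000/predict",
    json={"amount": 129.99, "merchant_category": 5, "hour_of_day": 23},
)
resp.raise_for_status()
print(resp.json())  # {"predictions": [...]}
```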
Data drift monitoring (Evidently)
```python
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

def check_drift(reference_data: pd.DataFrame, production_data: pd.DataFrame) -> dict:
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference_data, current_data=production_data)
    result = report.as_dict()

    drift_detected = result["metrics"][0]["result"]["dataset_drift"]
    drifted_features = [
        f for f, v in result["metrics"][0]["result"]["drift_by_columns"].items()
        if v["drift_detected"]
    ]
    return {"drift_detected": drift_detected, "drifted_features": drifted_features}

# Trigger retraining if drift detected
if check_drift(ref, prod)["drift_detected"]:
    trigger_retraining_pipeline()
```
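`trigger_retraining_pipeline` is left abstract. One way to implement it, assuming the GitHub Actions workflow above also has a `workflow_dispatch` trigger; the repo slug, workflow filename, and token are placeholders:

```python
import os

import requests

def trigger_retraining_pipeline() -> None:
    """Kick off retraining via the GitHub Actions workflow-dispatch API."""
    repo = "your-org/your-repo"  # placeholder
    resp = requests.post(
        f"https://api.github.com/repos/{repo}/actions/workflows/ml-pipeline.yml/dispatches",
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        json={"ref": "main"},
    )
    resp.raise_for_status()
```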
Critical rules (non-obvious)
- Separate training and serving environments — training deps (torch, cuda) bloat serving images by 10x; use multi-stage Dockerfiles or separate images
- Pin all dependencies — ML stack changes break reproducibility; pin Python + all packages; freeze with `pip freeze`, not just a hand-maintained `requirements.txt`
- Log everything before filtering — never decide what metrics to log during training; log all, filter in dashboards
- Separate model config from code — `params.yaml` (DVC) or `config.yaml` for hyperparameters; never hardcode in training scripts
- Shadow mode before cutover — run the new model version in parallel on shadow traffic and compare outputs before switching production (see the sketch after this list)
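A minimal shadow-mode sketch built on the registry loading shown earlier; the `Staging` stage for the challenger and the logging sink are assumptions:

```python
import logging

import mlflow.pyfunc
import pandas as pd

logger = logging.getLogger("shadow")

# Production model serves traffic; the challenger only sees shadow copies.
prod_model = mlflow.pyfunc.load_model("models:/fraud-detector/Production")
shadow_model = mlflow.pyfunc.load_model("models:/fraud-detector/Staging")

def predict_with_shadow(features: dict) -> list:
    df = pd.DataFrame([features])
    prod_pred = prod_model.predict(df)
    try:
        shadow_pred = shadow_model.predict(df)
        if (prod_pred != shadow_pred).any():
            # Record disagreements for offline comparison; never alter the response.
            logger.info("shadow mismatch: prod=%s shadow=%s", prod_pred, shadow_pred)
    except Exception:
        logger.exception("shadow model failed")  # shadow errors must not break prod
    return prod_pred.tolist()
```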
DVC pipeline (dvc.yaml)
```yaml
stages:
  preprocess:
    cmd: python src/preprocess.py
    deps: [src/preprocess.py, data/raw/]
    outs: [data/processed/]
    params: [preprocess]
  train:
    cmd: python src/train.py
    deps: [src/train.py, data/processed/]
    outs: [models/model.pkl]
    params: [train]
    metrics: [metrics/train.json]
  evaluate:
    cmd: python src/evaluate.py
    deps: [src/evaluate.py, models/model.pkl, data/processed/]
    metrics: [metrics/eval.json]
```
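The `train` stage reads its hyperparameters from `params.yaml` rather than hardcoding them (per the critical rule above). A sketch of the matching `src/train.py` scaffolding; the parameter names and processed-data layout are assumptions:

```python
# src/train.py (sketch): read the `train` section of params.yaml instead of
# hardcoding hyperparameters, then write outputs where dvc.yaml expects them.
import json
import pickle

import pandas as pd
import yaml
from sklearn.ensemble import RandomForestClassifier

with open("params.yaml") as f:
    params = yaml.safe_load(f)["train"]

data = pd.read_parquet("data/processed/train.parquet")  # assumed layout
X, y = data.drop(columns=["label"]), data["label"]

model = RandomForestClassifier(
    n_estimators=params["n_estimators"],
    max_depth=params["max_depth"],
)
model.fit(X, y)

with open("models/model.pkl", "wb") as f:
    pickle.dump(model, f)

with open("metrics/train.json", "w") as f:
    json.dump({"train_accuracy": model.score(X, y)}, f)
```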