Software_development_department mlops-engineer

Provides MLOps patterns for ML CI/CD pipelines, model registries, monitoring, and data drift detection. Use when setting up ML infrastructure or when the user mentions MLOps, model deployment, ML pipeline, or model monitoring.

install
source · Clone the upstream repo
git clone https://github.com/tranhieutt/software_development_department
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/tranhieutt/software_development_department "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/mlops-engineer" ~/.claude/skills/tranhieutt-software-development-department-mlops-engineer && rm -rf "$T"
manifest: .claude/skills/mlops-engineer/SKILL.md
source content

MLOps Engineer

Tool selection matrix

Need                   | Tool                 | When to use
-----------------------|----------------------|-----------------------------
Experiment tracking    | MLflow               | Open-source, self-hosted
Experiment tracking    | W&B                  | Cloud, rich visualization
Pipeline orchestration | Kubeflow             | Kubernetes-native
Pipeline orchestration | Prefect              | Python-first, dynamic
Data version control   | DVC                  | Git-based datasets & models
Feature store          | Feast                | Open-source, online + offline
Model serving          | KServe               | K8s serverless inference
Model serving          | SageMaker Endpoints  | AWS managed
Monitoring / drift     | Evidently            | Open-source, alerting
CI/CD for ML           | GitHub Actions + DVC | Lightweight

MLflow: experiment tracking + model registry

import mlflow
import mlflow.sklearn

mlflow.set_tracking_uri("http://mlflow-server:5000")
mlflow.set_experiment("model-training")

with mlflow.start_run():
    # Log params
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 5)

    # Train
    model = train(X_train, y_train)
    metrics = evaluate(model, X_test, y_test)

    # Log metrics
    mlflow.log_metric("accuracy", metrics["accuracy"])
    mlflow.log_metric("f1", metrics["f1"])

    # Log model + register
    mlflow.sklearn.log_model(
        model, "model",
        registered_model_name="fraud-detector",
    )

# Promote to production via API
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="fraud-detector", version=3, stage="Production"
)
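
Note: recent MLflow releases (2.9+) deprecate registry stages in favor of model aliases. A minimal alias-based sketch of the same promotion; "champion" is an arbitrary alias name chosen here for illustration:

# Alias-based promotion (MLflow 2.9+, where stages are deprecated).
client = mlflow.MlflowClient()
client.set_registered_model_alias(
    name="fraud-detector", alias="champion", version="3"
)

# Serving code can then resolve the alias instead of a stage:
model = mlflow.pyfunc.load_model("models:/fraud-detector@champion")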

GitHub Actions: ML CI/CD pipeline

name: ML Pipeline
on:
  push:
    paths: ["data/**", "src/**", "params.yaml"]

jobs:
  train-and-validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"  # match the training environment
      - uses: iterative/setup-dvc@v1

      - name: Install dependencies
        run: pip install -r requirements.txt  # pinned project requirements

      - name: Pull data
        run: dvc pull  # requires DVC remote credentials configured for CI

      - name: Run training pipeline
        run: dvc repro

      - name: Validate model metrics
        run: |
          python scripts/check_metrics.py \
            --min-accuracy 0.92 \
            --min-f1 0.88

      - name: Register model if metrics pass
        if: github.ref == 'refs/heads/main'
        run: python scripts/register_model.py
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_URI }}
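
The workflow references scripts/check_metrics.py, which is not part of this skill; a minimal sketch of such a metric gate, assuming the evaluate stage writes accuracy and f1 to metrics/eval.json:

# scripts/check_metrics.py (hypothetical) — fails the CI job when metrics
# fall below the given thresholds. Assumes metrics/eval.json contains
# {"accuracy": ..., "f1": ...}.
import argparse
import json
import sys

def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--min-accuracy", type=float, required=True)
    parser.add_argument("--min-f1", type=float, required=True)
    parser.add_argument("--metrics-file", default="metrics/eval.json")
    args = parser.parse_args()

    with open(args.metrics_file) as f:
        metrics = json.load(f)

    failures = []
    if metrics["accuracy"] < args.min_accuracy:
        failures.append(f"accuracy {metrics['accuracy']:.4f} < {args.min_accuracy}")
    if metrics["f1"] < args.min_f1:
        failures.append(f"f1 {metrics['f1']:.4f} < {args.min_f1}")

    if failures:
        print("Metric gate failed: " + "; ".join(failures))
        return 1
    print("Metric gate passed")
    return 0

if __name__ == "__main__":
    sys.exit(main())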

Model serving: FastAPI + model registry

from fastapi import FastAPI
import mlflow.pyfunc
import os
import pandas as pd

app = FastAPI()
MODEL_NAME = os.environ["MODEL_NAME"]
MODEL_STAGE = os.environ.get("MODEL_STAGE", "Production")

# Load once on startup (cold start cost paid once)
model = mlflow.pyfunc.load_model(f"models:/{MODEL_NAME}/{MODEL_STAGE}")

@app.post("/predict")
async def predict(features: dict):
    df = pd.DataFrame([features])
    predictions = model.predict(df)
    return {"predictions": predictions.tolist()}

@app.get("/health")
async def health():
    return {"status": "healthy", "model": MODEL_NAME, "stage": MODEL_STAGE}

Data drift monitoring (Evidently)

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
import pandas as pd

def check_drift(reference_data: pd.DataFrame, production_data: pd.DataFrame) -> dict:
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference_data, current_data=production_data)
    result = report.as_dict()

    # DataDriftPreset expands to DatasetDriftMetric (index 0) and
    # DataDriftTable (index 1) in the legacy Evidently Report API.
    drift_detected = result["metrics"][0]["result"]["dataset_drift"]
    drifted_features = [
        f for f, v in result["metrics"][1]["result"]["drift_by_columns"].items()
        if v["drift_detected"]
    ]
    return {"drift_detected": drift_detected, "drifted_features": drifted_features}

# Trigger retraining if drift is detected; ref/prod are the reference and recent
# production DataFrames, and trigger_retraining_pipeline() is project-specific.
if check_drift(ref, prod)["drift_detected"]:
    trigger_retraining_pipeline()
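
The same report can also be exported for human review (legacy Evidently Report API):

# Save an interactive drift report for manual inspection.
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=ref, current_data=prod)
report.save_html("drift_report.html")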

Critical rules (non-obvious)

  • Separate training and serving environments — training deps (torch, cuda) bloat serving images by 10x; use multi-stage Dockerfiles or separate images
  • Pin all dependencies — ML stack changes break reproducibility; pin the Python version and every package, and commit a pip freeze lockfile rather than a loose, hand-edited requirements.txt
  • Log everything before filtering — never decide what metrics to log during training; log all, filter in dashboards
  • Separate model config from code — keep hyperparameters in params.yaml (DVC) or config.yaml; never hardcode them in training scripts
  • Shadow mode before cutover — run the new model version in parallel on shadow traffic and compare outputs before switching production; see the sketch after this list
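
A minimal shadow-mode sketch, assuming two versions of the fraud-detector model from the MLflow section; the model URIs, stage names, and logging setup are illustrative:

# Serve the production model, run the candidate in parallel, and log both
# outputs for offline comparison. Shadow failures never affect the response.
import logging

import mlflow.pyfunc
import pandas as pd

log = logging.getLogger("shadow")

prod_model = mlflow.pyfunc.load_model("models:/fraud-detector/Production")
shadow_model = mlflow.pyfunc.load_model("models:/fraud-detector/Staging")

def predict_with_shadow(features: dict) -> list:
    df = pd.DataFrame([features])
    prod_pred = prod_model.predict(df)
    try:
        shadow_pred = shadow_model.predict(df)
        log.info("shadow_compare prod=%s shadow=%s",
                 prod_pred.tolist(), shadow_pred.tolist())
    except Exception:
        log.exception("shadow model failed")
    return prod_pred.tolist()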

DVC pipeline (dvc.yaml)

stages:
  preprocess:
    cmd: python src/preprocess.py
    deps: [src/preprocess.py, data/raw/]
    outs: [data/processed/]
    params: [preprocess]  # the "preprocess" section of params.yaml

  train:
    cmd: python src/train.py
    deps: [src/train.py, data/processed/]
    outs: [models/model.pkl]
    params: [train]  # the "train" section of params.yaml
    metrics: [metrics/train.json]

  evaluate:
    cmd: python src/evaluate.py
    deps: [src/evaluate.py, models/model.pkl, data/processed/]
    metrics: [metrics/eval.json]
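
Per the config rule above, src/train.py reads its hyperparameters from params.yaml instead of hardcoding them; a minimal loading sketch, assuming a train: section with n_estimators and max_depth keys:

# src/train.py (excerpt) — hypothetical params.yaml layout:
#   train:
#     n_estimators: 100
#     max_depth: 5
import yaml

with open("params.yaml") as f:
    params = yaml.safe_load(f)["train"]

n_estimators = params["n_estimators"]
max_depth = params["max_depth"]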