Claude-skill-registry ai-mlops

Production MLOps and ML/LLM/agent security skill for deploying and operating ML systems in production (registry + CI/CD, serving, monitoring/drift, evaluation loops, incident response/runbooks, and governance), including GenAI security (prompt injection, jailbreaks, RAG security, privacy, and supply chain).

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/ai-mlops" ~/.claude/skills/majiayu000-claude-skill-registry-ai-mlops && rm -rf "$T"
manifest: skills/data/ai-mlops/SKILL.md
source content

MLOps & ML Security - Complete Reference (Jan 2026)

Production ML lifecycle with modern security practices.

This skill covers:

  • Production: Data ingestion, deployment, drift detection, monitoring, incident response
  • Security: Prompt injection, jailbreak defense, RAG security, output filtering
  • Governance: Privacy protection, supply chain security, safety evaluation

The production workflow spans three stages:

  1. Data ingestion (dlt): Load data from APIs and databases to warehouses
  2. Model deployment: Batch jobs, real-time APIs, hybrid systems, event-driven automation
  3. Operations: Real-time monitoring, drift detection, automated retraining, incident response

Modern Best Practices (Jan 2026)

This skill is execution-focused:

  • Data ingestion patterns (REST APIs, database replication, incremental loading)
  • Deployment patterns (batch, online, hybrid, streaming, event-driven)
  • Automated monitoring with real-time drift detection
  • Automated retraining pipelines (monitor → detect → trigger → validate → deploy)
  • Incident handling with validated rollback and postmortems
  • Links to copy-paste templates in assets/
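The retraining loop named above (monitor → detect → trigger → validate → deploy) can be sketched in plain Python. Everything here is an illustrative placeholder: the mean-shift check stands in for a real drift test (PSI/KS), and `train`, `validate`, and `deploy` stand in for real pipeline steps.

```python
# Minimal retraining-loop sketch: monitor -> detect -> trigger -> validate -> deploy.
# All function bodies are illustrative placeholders, not a real pipeline.

def detect_drift(reference: list, current: list, threshold: float = 0.2) -> bool:
    """Flag drift when the mean shifts by more than `threshold` (stand-in for PSI/KS)."""
    ref_mean = sum(reference) / len(reference)
    cur_mean = sum(current) / len(current)
    return abs(cur_mean - ref_mean) > threshold

def retrain_if_needed(reference, current, train, validate, deploy) -> str:
    if not detect_drift(reference, current):
        return "no-op"
    candidate = train()            # trigger: fit a new candidate model
    if not validate(candidate):    # validate: evaluation gate before promotion
        return "rejected"
    deploy(candidate)              # deploy: promote the validated candidate
    return "deployed"
```

The key property is that deployment is unreachable without passing the validation gate, which is the invariant the real pipelines in this skill enforce.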

Quick Reference

| Task | Tool/Framework | Command | When to Use |
|------|----------------|---------|-------------|
| Data Ingestion | dlt (data load tool) | dlt pipeline run, dlt init | Loading from APIs, databases to warehouses |
| Batch Deployment | Airflow, Dagster, Prefect | airflow dags trigger, dagster job launch | Scheduled predictions on large datasets |
| API Deployment | FastAPI, Flask, TorchServe | uvicorn app:app, torchserve --start | Real-time inference (<500ms latency) |
| LLM Serving | vLLM, TGI, BentoML | vllm serve model, bentoml serve | High-throughput LLM inference |
| Model Registry | MLflow, W&B, ZenML | mlflow.register_model(), zenml model register | Versioning and promoting models |
| Drift Detection | Statistical tests + monitors | PSI/KS, embedding drift, prediction drift | Detect data/process changes and trigger review |
| Monitoring | Prometheus, Grafana | prometheus.yml, Grafana dashboards | Metrics, alerts, SLO tracking |
| AgentOps | AgentOps, Langfuse, LangSmith | agentops.init(), trace visualization | AI agent observability, session replay |
| Incident Response | Runbooks, PagerDuty | Documented playbooks, alert routing | Handling failures and degradation |

Use This Skill When

Use this skill when the user asks for deployment, operations, monitoring, incident handling, or governance for ML/LLM/agent systems, e.g.:

  • "How do I deploy this model to prod?"
  • "Design a batch + online scoring architecture."
  • "Add monitoring and drift detection to our model."
  • "Write an incident runbook for this ML service."
  • "Package this LLM/RAG pipeline as an API."
  • "Plan our retraining and promotion workflow."
  • "Load data from Stripe API to Snowflake."
  • "Set up incremental database replication with dlt."
  • "Build an ELT pipeline for warehouse loading."

If the user is asking only about EDA, modelling, or theory, prefer:

  • ai-ml-data-science (EDA, features, modelling, SQL transformation with SQLMesh)
  • ai-llm (prompting, fine-tuning, eval)
  • ai-rag (retrieval pipeline design)
  • ai-llm-inference (compression, spec decode, serving internals)

If the user is asking about SQL transformation (after data is loaded), prefer:

  • ai-ml-data-science (SQLMesh templates for staging, intermediate, marts layers)

Decision Tree: Choosing Deployment Strategy

User needs to deploy: [ML System]
    ├─ Data Ingestion?
    │   ├─ From REST APIs? → dlt REST API templates
    │   ├─ From databases? → dlt database sources (PostgreSQL, MySQL, MongoDB)
    │   └─ Incremental loading? → dlt incremental patterns (timestamp, ID-based)
    │
    ├─ Model Serving?
    │   ├─ Latency <500ms? → FastAPI real-time API
    │   ├─ Batch predictions? → Airflow/Dagster batch pipeline
    │   └─ Mix of both? → Hybrid (batch features + online scoring)
    │
    ├─ Monitoring & Ops?
    │   ├─ Drift detection? → Evidently + automated retraining triggers
    │   ├─ Performance tracking? → Prometheus + Grafana dashboards
    │   └─ Incident response? → Runbooks + PagerDuty alerts
    │
    └─ LLM/RAG Production?
        ├─ Cost optimization? → Caching, prompt templates, token budgets
        └─ Safety? → Security sections of this skill (prompt injection, RAG security, output filtering)

Core Concepts (Vendor-Agnostic)

  • Lifecycle loop: train → validate → deploy → monitor → respond → retrain/retire.
  • Risk controls: access control, data minimization, logging, and change management (NIST AI RMF: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf).
  • Observability planes: system metrics (latency/errors), data metrics (freshness/drift), quality metrics (model performance).
  • Incident readiness: detection, containment, rollback, and root-cause analysis.

Do / Avoid

Do

  • Do gate deployments with repeatable checks: evaluation pass, load test, security review, rollback plan.
  • Do version everything: code, data, features, model artifact, prompt templates, configuration.
  • Do define SLOs and budgets (latency/cost/error rate) before optimizing.

Avoid

  • Avoid manual “clickops” deployments without audit trail.
  • Avoid silent upgrades; require eval + canary for model/prompt changes.
  • Avoid drift dashboards without actions; every alert needs an owner and runbook.
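The "eval + canary" requirement from the Avoid list can be expressed as a small promotion gate: a candidate model or prompt is promoted only if its canary metrics stay within an allowed regression budget against the production baseline. The metric names and budgets below are illustrative, not recommended values.

```python
# Canary promotion gate sketch: promote only within a regression budget.
# Metric names and thresholds are illustrative placeholders.

def canary_gate(baseline: dict, candidate: dict,
                max_error_regression: float = 0.01,
                max_latency_regression_ms: float = 50.0) -> bool:
    """True if the candidate's canary metrics are within budget of the baseline."""
    error_ok = candidate["error_rate"] <= baseline["error_rate"] + max_error_regression
    latency_ok = (candidate["p95_latency_ms"]
                  <= baseline["p95_latency_ms"] + max_latency_regression_ms)
    return error_ok and latency_ok

prod = {"error_rate": 0.020, "p95_latency_ms": 180.0}
good = {"error_rate": 0.022, "p95_latency_ms": 190.0}   # within budget
bad  = {"error_rate": 0.080, "p95_latency_ms": 400.0}   # regressed: block promotion
```

Making the gate a pure function of recorded metrics also gives every promotion decision an audit trail, addressing the "clickops" point above.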

Core Patterns Overview

This skill provides production-ready patterns and guides organized into comprehensive references:

Data & Infrastructure Patterns

Pattern 0: Data Contracts, Ingestion & Lineage → See Data Ingestion Patterns

  • Data contracts with SLAs and versioning
  • Ingestion modes (CDC, batch, streaming)
  • Lineage tracking and schema evolution
  • Replay and backfill procedures
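A data contract from Pattern 0 can be enforced as a pre-load validation step. The sketch below checks rows against a declared field→type schema; the contract fields are hypothetical examples.

```python
# Data contract check sketch: validate incoming rows against a declared schema
# (field name -> expected type) before loading. Contract fields are illustrative.

CONTRACT = {"order_id": int, "amount": float, "currency": str}

def violates_contract(row: dict) -> list:
    """Return a list of human-readable violations; empty means the row conforms."""
    problems = []
    for field, expected_type in CONTRACT.items():
        if field not in row:
            problems.append(f"missing field: {field}")
        elif not isinstance(row[field], expected_type):
            problems.append(f"wrong type for {field}: {type(row[field]).__name__}")
    return problems
```

Real contracts also carry SLAs and versioning (as the bullets above note); the type check is just the smallest enforceable slice.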

Pattern 1: Choose Deployment Mode → See Deployment Patterns

  • Decision table (batch, online, hybrid, streaming)
  • When to use each mode
  • Deployment mode selection checklist

Pattern 2: Standard Deployment Lifecycle → See Deployment Lifecycle

  • Pre-deploy, deploy, observe, operate, evolve phases
  • Environment promotion (dev → staging → prod)
  • Gradual rollout strategies (canary, blue-green)

Pattern 3: Packaging & Model Registry → See Model Registry Patterns

  • Model registry structure and metadata
  • Packaging strategies (Docker, ONNX, MLflow)
  • Promotion flows (experimental → production)
  • Versioning and governance
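The promotion flow in Pattern 3 boils down to a per-version stage that may only advance along an allowed path. This is a stdlib sketch of that invariant; registry tools such as MLflow and ZenML provide it natively, and the model/stage names here are illustrative.

```python
# Model-registry promotion sketch: each (name, version) carries a stage, and
# promotion only moves experimental -> staging -> production. Names illustrative.

ALLOWED_NEXT = {"experimental": "staging", "staging": "production"}

class Registry:
    def __init__(self):
        self.models = {}  # (name, version) -> stage

    def register(self, name: str, version: int) -> None:
        self.models[(name, version)] = "experimental"

    def promote(self, name: str, version: int) -> str:
        stage = self.models[(name, version)]
        if stage not in ALLOWED_NEXT:
            raise ValueError(f"cannot promote from stage {stage!r}")
        self.models[(name, version)] = ALLOWED_NEXT[stage]
        return ALLOWED_NEXT[stage]
```

Encoding the path as data (`ALLOWED_NEXT`) keeps governance rules reviewable and makes skipping staging impossible by construction.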

Serving Patterns

Pattern 4: Batch Scoring Pipeline → See Deployment Patterns

  • Orchestration with Airflow/Dagster
  • Idempotent scoring jobs
  • Validation and backfill procedures

Pattern 5: Real-Time API Scoring → See API Design Patterns

  • Service design (HTTP/JSON, gRPC)
  • Input/output schemas
  • Rate limiting, timeouts, circuit breakers
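The circuit-breaker item above can be sketched in a few lines: after N consecutive backend failures the breaker opens and fails fast, then allows a trial call once a cooldown elapses. Thresholds are illustrative.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker sketch: open after N consecutive failures,
    fail fast while open, and allow a trial call after a cooldown (half-open)."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_s:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: permit one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success closes the circuit
        return result
```

In a real serving stack this usually lives in the client or gateway layer, wrapped around the downstream model or feature-store call together with a timeout.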

Pattern 6: Hybrid & Feature Store Integration → See Feature Store Patterns

  • Batch vs online features
  • Feature store architecture
  • Training-serving consistency
  • Point-in-time correctness
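Point-in-time correctness means each training label only sees feature values observed at or before its own timestamp, never after. A minimal sketch (integer timestamps and values are illustrative):

```python
# Point-in-time feature join sketch: for each (entity, label_ts) pair, take the
# latest feature value with feature_ts <= label_ts, never a later one
# (this is what prevents training-serving leakage).

def point_in_time_join(labels, feature_log):
    """labels: [(entity, label_ts)]; feature_log: [(entity, feature_ts, value)]."""
    joined = []
    for entity, label_ts in labels:
        eligible = [(ts, value) for (e, ts, value) in feature_log
                    if e == entity and ts <= label_ts]
        latest = max(eligible)[1] if eligible else None
        joined.append((entity, label_ts, latest))
    return joined

features = [("u1", 1, "a"), ("u1", 5, "b"), ("u1", 9, "c")]
labels = [("u1", 6), ("u1", 0)]
# label at ts=6 must see "b" (ts=5), not "c" (ts=9); label at ts=0 sees nothing
```

Feature stores implement the same rule efficiently at scale; the linear scan here is only for clarity.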

Operations Patterns

Pattern 7: Monitoring & Alerting → See Monitoring Best Practices

  • Data, performance, and technical metrics
  • SLO definition and tracking
  • Dashboard design and alerting strategies
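SLO tracking ultimately reduces to error-budget arithmetic: an availability target implies a fixed number of allowed failures per window. A small sketch (the SLO target and request counts are illustrative):

```python
# Error-budget sketch: a 99.9% SLO over 1,000,000 requests allows 1,000 failures.

def error_budget_remaining(slo: float, total: int, failed: int) -> float:
    """Fraction of the error budget left: 1.0 = untouched, 0.0 or below = exhausted."""
    allowed_failures = (1.0 - slo) * total
    if allowed_failures <= 0:
        return 0.0 if failed else 1.0
    return 1.0 - failed / allowed_failures
```

Alerting on budget burn rate (how fast this fraction drops) rather than on raw error counts is what keeps dashboards actionable.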

Pattern 8: Drift Detection & Automated Retraining → See Drift Detection Guide

  • Automated retraining triggers
  • Event-driven retraining pipelines
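The PSI drift test mentioned in the Quick Reference is small enough to write out. Given baseline and current bin proportions for a feature, PSI sums `(actual - expected) * ln(actual / expected)` per bin; by common convention, under 0.1 is stable, 0.1–0.25 moderate shift, above 0.25 major shift. Bin values below are illustrative.

```python
import math

def psi(expected: list, actual: list, eps: float = 1e-4) -> float:
    """Population Stability Index over pre-binned proportions.
    Common convention: < 0.1 stable, 0.1-0.25 moderate, > 0.25 major shift."""
    score = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # guard against empty bins
        score += (a - e) * math.log(a / e)
    return score

baseline = [0.25, 0.25, 0.25, 0.25]  # training-time bin proportions
shifted  = [0.10, 0.20, 0.30, 0.40]  # production bin proportions
```

A retraining trigger then becomes a threshold check on this score per monitored feature, wired into the event-driven pipeline above.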

Pattern 9: Incidents & Runbooks → See Incident Response Playbooks

  • Common failure modes
  • Detection, diagnosis, resolution
  • Post-mortem procedures

Pattern 10: LLM / RAG in Production → See LLM & RAG Production Patterns

  • Prompt and configuration management
  • Safety and compliance (PII, jailbreaks)
  • Cost optimization (token budgets, caching)
  • Monitoring and fallbacks
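Two of the cost controls above (response caching and token budgets) compose naturally into one wrapper. This is a stdlib sketch: the cache is exact-match on the prompt, token counts are caller-supplied estimates, and `llm_call` is a placeholder for your provider client.

```python
import hashlib

class BudgetedCache:
    """Sketch of an exact-match response cache plus a running token budget.
    `llm_call` and `est_tokens` are placeholders for a real client and tokenizer."""

    def __init__(self, token_budget: int):
        self.token_budget = token_budget
        self.used = 0
        self.cache = {}

    def complete(self, prompt: str, llm_call, est_tokens: int) -> str:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.cache:            # cache hit: zero marginal cost
            return self.cache[key]
        if self.used + est_tokens > self.token_budget:
            raise RuntimeError("token budget exhausted")
        response = llm_call(prompt)
        self.used += est_tokens
        self.cache[key] = response
        return response
```

Production variants typically add TTLs, semantic (embedding-based) cache keys, and per-tenant budgets, but the accounting shape stays the same.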

Pattern 11: Cross-Region, Residency & Rollback → See Multi-Region Patterns

  • Multi-region deployment architectures
  • Data residency and tenant isolation
  • Disaster recovery and failover
  • Regional rollback procedures

Pattern 12: Online Evaluation & Feedback Loops → See Online Evaluation Patterns

  • Feedback signal collection (implicit, explicit)
  • Shadow and canary deployments
  • A/B testing with statistical significance
  • Human-in-the-loop labeling
  • Automated retraining cadence
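For the A/B testing bullet, statistical significance on a conversion-style metric is usually a two-proportion z-test; |z| > 1.96 corresponds to p < 0.05 two-sided. The counts below are illustrative.

```python
import math

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Z statistic for the difference in conversion rates between variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Illustrative experiment: 2.0% vs 2.6% conversion over 10k users each.
z = two_proportion_z(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
```

The same gate works for shadow/canary comparisons, provided the sample size was fixed in advance (peeking at a running test inflates false positives).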

Pattern 13: AgentOps (AI Agent Operations) → See AgentOps Patterns

  • Session tracing and replay for AI agents
  • Cost and latency tracking across agent runs
  • Multi-agent visualization and debugging
  • Tool invocation monitoring
  • Integration with CrewAI, LangGraph, OpenAI Agents SDK
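The core of tool-invocation monitoring is a per-call record of latency and success. Platforms like AgentOps and Langfuse capture this through their SDKs; the decorator below sketches the same idea with an in-memory trace sink (all names are illustrative).

```python
import functools
import time

TRACE = []  # in-memory stand-in for an observability backend's trace sink

def traced(tool_name: str):
    """Record latency and success for each invocation of a wrapped agent tool."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                result = fn(*args, **kwargs)
                TRACE.append({"tool": tool_name, "ok": True,
                              "latency_s": time.monotonic() - start})
                return result
            except Exception:
                TRACE.append({"tool": tool_name, "ok": False,
                              "latency_s": time.monotonic() - start})
                raise
        return wrapper
    return decorator

@traced("search")
def search(query: str) -> str:
    return f"results for {query}"
```

Summing `latency_s` (and, in practice, token counts) over a session's trace gives the per-run cost tracking listed above.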

Pattern 14: Edge MLOps & TinyML → See Edge MLOps Patterns

  • Device-aware CI/CD pipelines
  • OTA model updates with rollback
  • Federated learning operations
  • Edge drift detection
  • Intermittent connectivity handling

Resources (Detailed Guides)

For comprehensive operational guides, see:

Serving & APIs:

  • Deployment Patterns - Batch, online, hybrid, streaming deployment strategies and architectures
  • API Design Patterns - ML/LLM/RAG API patterns, input/output schemas, reliability patterns, versioning


Templates

Use these as copy-paste starting points for production artifacts:

Data Ingestion (dlt)

For loading data into warehouses and pipelines:

Use dlt when:

  • Loading data from APIs (Stripe, HubSpot, Shopify, custom APIs)
  • Replicating databases to warehouses
  • Building ELT pipelines with incremental loading
  • Managing data ingestion with Python

For SQL transformation (after ingestion), use the ai-ml-data-science skill (SQLMesh templates for staging/intermediate/marts layers).
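The timestamp-based incremental loading mentioned above reduces to a watermark: fetch only rows newer than the last stored cursor value, then advance it. dlt automates this state handling (e.g. via dlt.sources.incremental); the stdlib sketch below shows the pattern itself.

```python
# Timestamp-based incremental extraction sketch: only rows newer than the
# stored watermark are loaded, and the watermark advances after each run.
# dlt automates this bookkeeping; shown in plain Python for clarity.

def incremental_extract(rows, state):
    """rows: [(updated_at, record)]; state: {'watermark': last seen updated_at}."""
    watermark = state.get("watermark", 0)
    new_rows = [r for r in rows if r[0] > watermark]
    if new_rows:
        state["watermark"] = max(r[0] for r in new_rows)
    return new_rows

source = [(1, "a"), (2, "b"), (3, "c")]
state = {}
first = incremental_extract(source, state)   # loads all three rows
second = incremental_extract(source, state)  # nothing new -> empty list
```

ID-based incremental loading is the same pattern with a monotonically increasing key in place of the timestamp.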

Deployment & Packaging

Monitoring & Operations


External Resources

See data/sources.json for curated references on:

  • Serving frameworks (FastAPI, Flask, gRPC, TorchServe, KServe, Ray Serve)
  • Orchestration (Airflow, Dagster, Prefect)
  • Model registries and MLOps (MLflow, W&B, Vertex AI, SageMaker)
  • Monitoring and observability (Prometheus, Grafana, OpenTelemetry, Evidently)
  • Feature stores (Feast, Tecton, Vertex, Databricks)
  • Streaming & messaging (Kafka, Pulsar, Kinesis)
  • LLMOps & RAG infra (vector DBs, LLM gateways, safety tools)

Data Lake & Lakehouse

For comprehensive data lake/lakehouse patterns (beyond dlt ingestion), see data-lake-platform:

  • Table formats: Apache Iceberg, Delta Lake, Apache Hudi
  • Query engines: ClickHouse, DuckDB, Apache Doris, StarRocks
  • Alternative ingestion: Airbyte (GUI-based connectors)
  • Transformation: dbt (alternative to SQLMesh)
  • Streaming: Apache Kafka patterns
  • Orchestration: Dagster, Airflow

This skill focuses on ML-specific deployment, monitoring, and security. Use data-lake-platform for general-purpose data infrastructure.

Recency Protocol (Tooling Recommendations)

When users ask recommendation questions about MLOps tooling, verify recency before answering.

Trigger Conditions

  • "What's the best MLOps platform for [use case]?"
  • "What should I use for [deployment/monitoring/drift detection]?"
  • "What's the latest in MLOps?"
  • "Current best practices for [model registry/feature store/observability]?"
  • "Is [MLflow/Kubeflow/Vertex AI] still relevant in 2026?"
  • "[MLOps tool A] vs [MLOps tool B]?"
  • "Best way to deploy [LLM/ML model] to production?"
  • "What feature store should I use?"

Minimal Recency Check

  1. Start from
    data/sources.json
    and prefer sources with
    add_as_web_search: true
    .
  2. If web search or browsing is available, confirm at least: (a) the tool’s latest release/docs date, (b) active maintenance signals, (c) a recent comparison/alternatives post.
  3. If live search is not available, state that you are relying on static knowledge +
    data/sources.json
    , and recommend validation steps (POC + evals + rollout plan).

What to Report

After searching, provide:

  • Current landscape: What MLOps tools/platforms are popular NOW
  • Emerging trends: New approaches gaining traction (LLMOps, GenAI ops)
  • Deprecated/declining: Tools or approaches losing relevance
  • Recommendation: Based on fresh data, not just static knowledge

Related Skills

For adjacent topics, reference these skills:

  • ai-ml-data-science - EDA, feature engineering, modelling, evaluation, SQLMesh transformations
  • ai-llm - Prompting, fine-tuning, evaluation for LLMs
  • ai-agents - Agentic workflows, multi-agent systems, LLMOps
  • ai-rag - RAG pipeline design, chunking, retrieval, evaluation
  • ai-llm-inference - Model serving optimization, quantization, batching
  • ai-prompt-engineering - Prompt design patterns and best practices
  • data-lake-platform - Data lake/lakehouse infrastructure (ClickHouse, Iceberg, Kafka)

Use this skill to turn trained models into reliable services, not to derive the model itself.