Awesome-omni-skills ml-pipeline-workflow

ML Pipeline Workflow workflow skill. Use this skill when the user needs complete end-to-end MLOps pipeline orchestration, from data preparation through model deployment, and the operator should preserve the upstream workflow, copied support files, and provenance before merging or handing off.

install

source · Clone the upstream repo

```bash
git clone https://github.com/diegosouzapw/awesome-omni-skills
```

Claude Code · Install into ~/.claude/skills/

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/ml-pipeline-workflow" ~/.claude/skills/diegosouzapw-awesome-omni-skills-ml-pipeline-workflow && rm -rf "$T"
```

manifest: skills/ml-pipeline-workflow/SKILL.md

ML Pipeline Workflow

Overview

This public intake copy packages `plugins/antigravity-awesome-skills-claude/skills/ml-pipeline-workflow` from https://github.com/sickn33/antigravity-awesome-skills into the native Omni Skills editorial shape without hiding its origin.

Use it when the operator needs the upstream workflow, support files, and repository context to stay intact while the public validator and private enhancer continue their normal downstream flow.

This intake keeps the copied upstream files intact and uses `metadata.json` plus `ORIGIN.md` as the provenance anchor for review.

ML Pipeline Workflow: complete end-to-end MLOps pipeline orchestration from data preparation through model deployment.

Imported source sections that did not map cleanly to the public headings are still preserved below or in the support files. Notable imported sections: What This Skill Provides, Integration Points, Progressive Disclosure, Common Patterns, Limitations.

When to Use This Skill

Use this section as the trigger filter. It should make the activation boundary explicit before the operator loads files, runs commands, or opens a pull request.

Use this skill when:

Building new ML pipelines from scratch
Designing workflow orchestration for ML systems
Implementing data → model → deployment automation
Setting up reproducible training workflows

Do not use this skill when:

The task is unrelated to ML pipeline workflows
You need a different domain or tool outside this scope

Operating Table

Situation	Start here	Why it matters
First-time use	`metadata.json`	Confirms repository, branch, commit, and imported path before touching the copied workflow
Provenance review	`ORIGIN.md`	Gives reviewers a plain-language audit trail for the imported source
Workflow execution	`SKILL.md`	Starts with the smallest copied file that materially changes execution
Supporting context	`SKILL.md`	Adds the next most relevant copied source file without loading the entire package
Handoff decision	`## Related Skills`	Helps the operator switch to a stronger native skill when the task drifts

Workflow

This workflow is intentionally editorial and operational at the same time. It keeps the imported source useful to the operator while still satisfying the public intake standards that feed the downstream enhancer flow.

Clarify goals, constraints, and required inputs.
Apply relevant best practices and validate outcomes.
Provide actionable steps and verification.
If detailed examples are required, open resources/implementation-playbook.md.

Imported Workflow Notes

Imported: Instructions

Clarify goals, constraints, and required inputs.
Apply relevant best practices and validate outcomes.
Provide actionable steps and verification.
If detailed examples are required, open `resources/implementation-playbook.md`.

Imported: Next Steps

After setting up your pipeline:

Explore hyperparameter-tuning skill for optimization
Learn experiment-tracking-setup for MLflow/W&B
Review model-deployment-patterns for serving strategies
Implement monitoring with observability tools

Imported: Overview

This skill provides comprehensive guidance for building production ML pipelines that handle the full lifecycle: data ingestion → preparation → training → validation → deployment → monitoring.

Imported: What This Skill Provides

Core Capabilities

Pipeline Architecture
- End-to-end workflow design
- DAG orchestration patterns (Airflow, Dagster, Kubeflow)
- Component dependencies and data flow
- Error handling and retry strategies
Data Preparation
- Data validation and quality checks
- Feature engineering pipelines
- Data versioning and lineage
- Train/validation/test splitting strategies
Model Training
- Training job orchestration
- Hyperparameter management
- Experiment tracking integration
- Distributed training patterns
Model Validation
- Validation frameworks and metrics
- A/B testing infrastructure
- Performance regression detection
- Model comparison workflows
Deployment Automation
- Model serving patterns
- Canary deployments
- Blue-green deployment strategies
- Rollback mechanisms
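The "Error handling and retry strategies" capability above usually comes down to retrying a failed stage with growing delays before giving up and alerting. The sketch below shows that logic in plain Python; `run_with_retry`, `max_attempts`, and `base_delay` are hypothetical names, and orchestrators such as Airflow typically provide their own task-level retry settings, so treat this as an illustration of the idea rather than something to wire in as-is.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(task: Callable[[], T], max_attempts: int = 3,
                   base_delay: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `task`; on an exception, wait and retry with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                # Out of retries: re-raise so the orchestrator can mark the
                # stage as failed and alerting can fire.
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Passing `sleep` in as a parameter keeps the function testable without real waiting. In production it is usually safer to catch only known transient errors (network timeouts, throttling) instead of every `Exception`.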

Reference Documentation

See the `references/` directory for detailed guides:

`data-preparation.md` - Data cleaning, validation, and feature engineering
`model-training.md` - Training workflows and best practices
`model-validation.md` - Validation strategies and metrics
`model-deployment.md` - Deployment patterns and serving architectures

Assets and Templates

The `assets/` directory contains:

`pipeline-dag.yaml.template` - DAG template for workflow orchestration
`training-config.yaml` - Training configuration template
`validation-checklist.md` - Pre-deployment validation checklist

Examples

Example 1: Ask for the upstream workflow directly

Use @ml-pipeline-workflow to handle <task>. Start from the copied upstream workflow, load only the files that change the outcome, and keep provenance visible in the answer.

Explanation: This is the safest starting point when the operator needs the imported workflow, but not the entire repository.

Example 2: Ask for a provenance-grounded review

Review @ml-pipeline-workflow against metadata.json and ORIGIN.md, then explain which copied upstream files you would load first and why.

Explanation: Use this before review or troubleshooting when you need a precise, auditable explanation of origin and file selection.

Example 3: Narrow the copied support files before execution

Use @ml-pipeline-workflow for <task>. Load only the copied references, examples, or scripts that change the outcome, and name the files explicitly before proceeding.

Explanation: This keeps the skill aligned with progressive disclosure instead of loading the whole copied package by default.

Example 4: Build a reviewer packet

Review @ml-pipeline-workflow using the copied upstream files plus provenance, then summarize any gaps before merge.

Explanation: This is useful when the PR is waiting for human review and you want a repeatable audit packet.

Imported Usage Notes

Imported: Usage Patterns

Basic Pipeline Setup

```python
# 1. Define pipeline stages
stages = [
    "data_ingestion",
    "data_validation",
    "feature_engineering",
    "model_training",
    "model_validation",
    "model_deployment",
]

# 2. Configure dependencies
# See assets/pipeline-dag.yaml.template for a full example
```
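Step 2 ("Configure dependencies") can be illustrated with the standard library's `graphlib.TopologicalSorter` (Python 3.9+), which turns a stage-to-prerequisites map into a valid run order and rejects cycles. The `dependencies` dict and `execution_order` helper below are hypothetical; a real orchestrator reads this graph from its own DAG definition.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map for the six stages above:
# each stage lists the stages that must finish before it starts.
dependencies = {
    "data_ingestion": [],
    "data_validation": ["data_ingestion"],
    "feature_engineering": ["data_validation"],
    "model_training": ["feature_engineering"],
    "model_validation": ["model_training"],
    "model_deployment": ["model_validation"],
}


def execution_order(deps: dict[str, list[str]]) -> list[str]:
    """Return the stages in an order that satisfies every dependency.

    Raises graphlib.CycleError (a ValueError subclass) if the dependencies form a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(execution_order(dependencies))
```

For a strictly linear pipeline the order is fully determined; once stages can run in parallel, several orders are valid and an orchestrator may schedule independent stages concurrently.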

Production Workflow

Data Preparation Phase
- Ingest raw data from sources
- Run data quality checks
- Apply feature transformations
- Version processed datasets
Training Phase
- Load versioned training data
- Execute training jobs
- Track experiments and metrics
- Save trained models
Validation Phase
- Run validation test suite
- Compare against baseline
- Generate performance reports
- Approve for deployment
Deployment Phase
- Package model artifacts
- Deploy to serving infrastructure
- Configure monitoring
- Validate production traffic
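The Validation Phase steps "Compare against baseline" and "Approve for deployment" amount to a gate function. The sketch below assumes two hypothetical metrics (`accuracy` and `p95_latency_ms`) and illustrative thresholds; substitute whatever your validation suite actually reports and whatever regression budget your team agrees on.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    # Hypothetical metrics; substitute whatever your validation suite reports.
    accuracy: float
    p95_latency_ms: float


def approve_for_deployment(candidate: EvalResult, baseline: EvalResult,
                           min_gain: float = 0.0,
                           max_latency_ratio: float = 1.1) -> bool:
    """Approve the candidate only if its accuracy is at least the baseline's
    plus `min_gain` and its p95 latency stays within `max_latency_ratio`
    times the baseline's."""
    accuracy_ok = candidate.accuracy >= baseline.accuracy + min_gain
    latency_ok = candidate.p95_latency_ms <= baseline.p95_latency_ms * max_latency_ratio
    return accuracy_ok and latency_ok
```

Keeping the gate as a pure function makes it easy to unit-test and to log the exact inputs behind every approval decision.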

Best Practices

Treat the generated public skill as a reviewable packaging layer around the upstream repository. The goal is to keep provenance explicit and load only the copied source material that materially improves execution.


Imported Operating Notes

Imported: Best Practices

Pipeline Design

Modularity: Each stage should be independently testable
Idempotency: Re-running stages should be safe
Observability: Log metrics at every stage
Versioning: Track data, code, and model versions
Failure Handling: Implement retry logic and alerting
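One common way to make a stage idempotent is to key its output on a hash of the stage name and its inputs, and skip the work if that output already exists. This is a minimal sketch under that assumption; `run_idempotent` and the JSON-on-disk output format are hypothetical, and real pipelines usually key on dataset versions and code versions as well.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable


def run_idempotent(stage_name: str, params: dict[str, Any], out_dir: Path,
                   compute: Callable[[dict[str, Any]], dict[str, Any]]) -> dict[str, Any]:
    """Run `compute` once per unique (stage, params); reuse the stored result on re-runs."""
    key = hashlib.sha256(
        json.dumps({"stage": stage_name, "params": params}, sort_keys=True).encode()
    ).hexdigest()[:16]
    out_path = out_dir / f"{stage_name}-{key}.json"
    if out_path.exists():
        # Same stage and same inputs already completed: return the stored output.
        return json.loads(out_path.read_text())
    result = compute(params)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file, then rename it into place, so a crash
    # mid-write does not leave a half-written output that looks complete.
    tmp_path = out_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps(result))
    tmp_path.replace(out_path)
    return result
```

Re-running with identical parameters returns the stored result without calling `compute`; changing any parameter produces a new key and a fresh run.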

Data Management

Use data validation libraries (Great Expectations, TFX)
Version datasets with DVC or similar tools
Document feature engineering transformations
Maintain data lineage tracking
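Libraries such as Great Expectations and TFX express data checks declaratively; the plain-Python sketch below only illustrates the kind of checks they run (required columns, types, value ranges) and does not use either library's API. `validate_rows` and its arguments are hypothetical.

```python
from typing import Any


def validate_rows(rows: list[dict[str, Any]], required: dict[str, type],
                  ranges: dict[str, tuple[float, float]]) -> list[str]:
    """Return human-readable problems; an empty list means the batch passed."""
    problems = []
    for i, row in enumerate(rows):
        # Every required column must be present and of the expected type.
        for col, expected_type in required.items():
            value = row.get(col)
            if value is None:
                problems.append(f"row {i}: missing {col}")
            elif not isinstance(value, expected_type):
                problems.append(
                    f"row {i}: {col} is {type(value).__name__}, expected {expected_type.__name__}"
                )
        # Numeric columns with a declared range must fall inside it.
        for col, (lo, hi) in ranges.items():
            value = row.get(col)
            if isinstance(value, (int, float)) and not lo <= value <= hi:
                problems.append(f"row {i}: {col}={value} outside [{lo}, {hi}]")
    return problems
```

A stage would typically fail (or quarantine the batch) when the returned list is non-empty, and write the problems to the run log for lineage.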

Model Operations

Separate training and serving infrastructure
Use model registries (MLflow, Weights & Biases)
Implement gradual rollouts for new models
Monitor model performance drift
Maintain rollback capabilities
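The source does not name a drift metric. One widely used option for "Monitor model performance drift" at the input level is the Population Stability Index (PSI), which compares how a feature is distributed in training data versus live traffic. The sketch below is a swapped-in illustration using equal-width bins; the function name, bin count, and epsilon are assumptions.

```python
import math


def population_stability_index(expected: list[float], actual: list[float],
                               bins: int = 10, eps: float = 1e-6) -> float:
    """PSI between a reference sample (e.g. training data) and a live sample.

    Uses equal-width bins over the reference range; live values outside that
    range are clipped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference feature

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps avoids log(0) for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

A frequently quoted rule of thumb treats PSI above roughly 0.25 as a significant shift, but thresholds are best tuned per feature against your own history.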

Deployment Strategies

Start with shadow deployments
Use canary releases for validation
Implement A/B testing infrastructure
Set up automated rollback triggers
Monitor latency and throughput
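Canary releases and automated rollback triggers both need a small amount of logic: a stable way to split traffic, and a rule for when to back out. The sketch below assumes hash-based routing on a request or user id and a simple error-rate comparison; `routes_to_canary`, `should_rollback`, and the default tolerance are hypothetical. Serving platforms usually implement traffic splitting for you, so this only shows the idea.

```python
import hashlib


def routes_to_canary(request_id: str, canary_percent: float) -> bool:
    """Deterministically send about `canary_percent`% of traffic to the canary.

    Hashing the id (instead of random sampling) keeps the same caller on the
    same model version across requests.
    """
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < canary_percent * 100


def should_rollback(canary_error_rate: float, baseline_error_rate: float,
                    tolerance: float = 0.01) -> bool:
    """Trigger rollback if the canary's error rate exceeds the baseline's by more than `tolerance`."""
    return canary_error_rate > baseline_error_rate + tolerance
```

Gradual rollout then means raising `canary_percent` in steps (for example 1, 5, 25, 100) while `should_rollback` stays false at each step.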

Troubleshooting

Problem: The operator skipped the imported context and answered too generically

Symptoms: The result ignores the upstream workflow in `plugins/antigravity-awesome-skills-claude/skills/ml-pipeline-workflow`, fails to mention provenance, or does not use any copied source files at all. Solution: Re-open `metadata.json`, `ORIGIN.md`, and the most relevant copied upstream files. Load only the files that materially change the answer, then restate the provenance before continuing.

Problem: The imported workflow feels incomplete during review

Symptoms: Reviewers can see the generated `SKILL.md`, but they cannot quickly tell which references, examples, or scripts matter for the current task. Solution: Point at the exact copied references, examples, scripts, or assets that justify the path you took. If the gap is still real, record it in the PR instead of hiding it.

Problem: The task drifted into a different specialization

Symptoms: The imported skill starts in the right place, but the work turns into debugging, architecture, design, security, or release orchestration that a native skill handles better. Solution: Use the related skills section to hand off deliberately. Keep the imported provenance visible so the next skill inherits the right context instead of starting blind.

Imported Troubleshooting Notes

Imported: Troubleshooting

Common Issues

Pipeline failures: Check dependencies and data availability
Training instability: Review hyperparameters and data quality
Deployment issues: Validate model artifacts and serving config
Performance degradation: Monitor data drift and model metrics

Debugging Steps

Check pipeline logs for each stage
Validate input/output data at boundaries
Test components in isolation
Review experiment tracking metrics
Inspect model artifacts and metadata

Related Skills

`@linear-claude-skill` - Use when the work is better handled by that native specialization after this imported skill establishes context.
`@linkedin-automation` - Use when the work is better handled by that native specialization after this imported skill establishes context.
`@linkedin-cli` - Use when the work is better handled by that native specialization after this imported skill establishes context.
`@linkedin-profile-optimizer` - Use when the work is better handled by that native specialization after this imported skill establishes context.

Additional Resources

Use this support matrix and the linked files below as the operator packet for this imported skill. They should reflect real copied source material, not generic scaffolding.

Resource family	What it gives the reviewer	Example path
`references`	copied reference notes, guides, or background material from upstream	`references/n/a`
`examples`	worked examples or reusable prompts copied from upstream	`examples/n/a`
`scripts`	upstream helper scripts that change execution or validation	`scripts/n/a`
`agents`	routing or delegation notes that are genuinely part of the imported package	`agents/n/a`
`assets`	supporting assets or schemas copied from the source package	`assets/n/a`

Imported Reference Notes

Imported: Integration Points

Orchestration Tools

Apache Airflow: DAG-based workflow orchestration
Dagster: Asset-based pipeline orchestration
Kubeflow Pipelines: Kubernetes-native ML workflows
Prefect: Modern dataflow automation

Experiment Tracking

MLflow for experiment tracking and model registry
Weights & Biases for visualization and collaboration
TensorBoard for training metrics

Deployment Platforms

AWS SageMaker for managed ML infrastructure
Google Vertex AI for GCP deployments
Azure ML for Azure cloud
Kubernetes + KServe for cloud-agnostic serving

Imported: Progressive Disclosure

Start with the basics and gradually add complexity:

Level 1: Simple linear pipeline (data → train → deploy)
Level 2: Add validation and monitoring stages
Level 3: Implement hyperparameter tuning
Level 4: Add A/B testing and gradual rollouts
Level 5: Multi-model pipelines with ensemble strategies

Imported: Common Patterns

Batch Training Pipeline

```yaml
# See assets/pipeline-dag.yaml.template
stages:
  - name: data_preparation
    dependencies: []
  - name: model_training
    dependencies: [data_preparation]
  - name: model_evaluation
    dependencies: [model_training]
  - name: model_deployment
    dependencies: [model_evaluation]
```
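Before handing a stage list like this to an orchestrator, it can help to catch simple configuration mistakes. The sketch assumes the YAML has already been loaded (for example with PyYAML's `yaml.safe_load`) and that the list under the `stages` key is passed in; `validate_stages` is a hypothetical helper that checks only for duplicate names and dependencies on undeclared stages.

```python
def validate_stages(stages: list[dict]) -> list[str]:
    """Check a parsed stage list for duplicate names and unknown dependencies."""
    errors = []
    seen = set()
    for stage in stages:
        if stage["name"] in seen:
            errors.append(f"duplicate stage: {stage['name']}")
        seen.add(stage["name"])
    for stage in stages:
        for dep in stage.get("dependencies", []):
            if dep not in seen:
                errors.append(f"{stage['name']}: unknown dependency {dep}")
    return errors


# Python equivalent of the YAML above after loading.
stages = [
    {"name": "data_preparation", "dependencies": []},
    {"name": "model_training", "dependencies": ["data_preparation"]},
    {"name": "model_evaluation", "dependencies": ["model_training"]},
    {"name": "model_deployment", "dependencies": ["model_evaluation"]},
]
```

Cycle detection is a separate check; a topological sort (for example with the standard library's `graphlib`) will reject cyclic dependencies.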

Real-time Feature Pipeline

Stream processing for real-time features, combined with batch training. See `references/data-preparation.md`.
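A typical real-time feature is an aggregate over a recent time window, such as a user's average transaction amount over the last few minutes. The sketch below is a hypothetical in-memory `RollingMean` that assumes events arrive in timestamp order; stream processors handle late and out-of-order events, state storage, and scaling, which this deliberately ignores.

```python
from collections import deque


class RollingMean:
    """Mean of values whose timestamps fall in (t - window_seconds, t],
    where t is the timestamp of the most recent event."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp: float, value: float) -> float:
        """Record an event and return the mean over the current window.
        Assumes events arrive in non-decreasing timestamp order."""
        self.events.append((timestamp, value))
        self.total += value
        # Evict events that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window:
            _, old = self.events.popleft()
            self.total -= old
        return self.total / len(self.events)
```

For training/serving consistency, the batch pipeline should compute the same feature with the same window definition, so offline training data matches what the model sees online.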

Continuous Training

Automated retraining, either on a schedule or triggered by data drift detection. See `references/model-training.md`.
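The two triggers named above (schedule and drift) can be combined into one decision function that a scheduler calls periodically. This is a minimal sketch; `should_retrain`, the seven-day `max_age`, and the 0.25 drift threshold are illustrative assumptions, and `drift_score` stands for whatever drift metric your monitoring produces.

```python
from datetime import datetime, timedelta


def should_retrain(last_trained: datetime, now: datetime, drift_score: float,
                   max_age: timedelta = timedelta(days=7),
                   drift_threshold: float = 0.25) -> tuple[bool, str]:
    """Decide whether to start a continuous-training run, and report why."""
    if drift_score > drift_threshold:
        return True, f"drift {drift_score:.2f} > {drift_threshold}"
    if now - last_trained >= max_age:
        return True, "scheduled retrain (model older than max_age)"
    return False, "no trigger"
```

Returning the reason alongside the decision makes it easy to record in experiment tracking why each retraining run happened.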

Imported: Limitations

Use this skill only when the task clearly matches the scope described above.
Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.