Babysitter dvc-dataset-versioning
Dataset versioning skill using DVC for tracking data changes, managing data pipelines, and ensuring reproducibility.
install
source · Clone the upstream repo
git clone https://github.com/a5c-ai/babysitter
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/a5c-ai/babysitter "$T" && mkdir -p ~/.claude/skills && cp -r "$T/library/specializations/data-science-ml/skills/dvc-dataset-versioning" ~/.claude/skills/a5c-ai-babysitter-dvc-dataset-versioning && rm -rf "$T"
manifest: library/specializations/data-science-ml/skills/dvc-dataset-versioning/SKILL.md
dvc-dataset-versioning
Overview
Dataset versioning skill using DVC (Data Version Control) for tracking data changes, managing data pipelines, and ensuring reproducibility in ML workflows.
Capabilities
- Dataset version tracking
- Data pipeline definition and execution
- Remote storage management (S3, GCS, Azure, etc.)
- Reproducibility enforcement
- Data lineage tracking
- Experiment comparison with data versions
- Cache management for large datasets
Target Processes
- Data Collection and Validation Pipeline
- ML Model Retraining Pipeline
- Feature Store Implementation
Tools and Libraries
- DVC
- Git
- Remote storage SDKs (boto3, google-cloud-storage, etc.)
Input Schema
{
  "type": "object",
  "required": ["action"],
  "properties": {
    "action": {
      "type": "string",
      "enum": ["init", "add", "push", "pull", "diff", "checkout", "run", "repro"],
      "description": "DVC action to perform"
    },
    "paths": {
      "type": "array",
      "items": { "type": "string" },
      "description": "File or directory paths to track"
    },
    "remote": {
      "type": "string",
      "description": "Remote storage name"
    },
    "revision": {
      "type": "string",
      "description": "Git revision for checkout/diff"
    },
    "pipeline": {
      "type": "object",
      "description": "Pipeline stage definition for run action"
    }
  }
}
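To make the input schema more concrete, the sketch below shows one way an input object could be translated into command-line invocations. The helper name `toDvcCommands` and the mapping itself are assumptions for illustration, not the skill's actual implementation. It uses only widely documented DVC commands (`dvc init`, `dvc add`, `dvc push`/`dvc pull` with `-r`, `dvc diff`, `dvc checkout`, `dvc repro`). The `run` action is deliberately left out because the schema does not specify the fields of the `pipeline` object.

```javascript
// Hypothetical helper: turn a skill input object into a list of CLI
// argument arrays. The mapping is an illustrative assumption.
const ACTIONS = ['init', 'add', 'push', 'pull', 'diff', 'checkout', 'run', 'repro'];

function toDvcCommands(input) {
  if (!input || !ACTIONS.includes(input.action)) {
    throw new Error(`unsupported action: ${input && input.action}`);
  }
  const paths = input.paths || [];
  // `-r <name>` selects a named remote for push and pull.
  const remoteArgs = input.remote ? ['-r', input.remote] : [];

  switch (input.action) {
    case 'init':
      return [['dvc', 'init']];
    case 'add':
      if (paths.length === 0) {
        throw new Error('add requires at least one path');
      }
      return [['dvc', 'add', ...paths]];
    case 'push':
    case 'pull':
      return [['dvc', input.action, ...remoteArgs, ...paths]];
    case 'diff':
      return [['dvc', 'diff', ...(input.revision ? [input.revision] : [])]];
    case 'checkout':
      // `dvc checkout` syncs the workspace with the .dvc files of the
      // current Git revision, so Git is switched first when a revision
      // is given.
      return [
        ...(input.revision ? [['git', 'checkout', input.revision]] : []),
        ['dvc', 'checkout', ...paths],
      ];
    case 'repro':
      return [['dvc', 'repro']];
    default:
      // 'run': the schema leaves the `pipeline` fields unspecified,
      // so this sketch does not guess at them.
      throw new Error(`action not covered by this sketch: ${input.action}`);
  }
}

console.log(toDvcCommands({ action: 'pull', remote: 's3-bucket' }));
```

Returning argument arrays rather than a single shell string keeps paths with spaces intact and avoids quoting problems if the commands are later passed to a process spawner.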
Output Schema
{
  "type": "object",
  "required": ["status", "action"],
  "properties": {
    "status": {
      "type": "string",
      "enum": ["success", "error"]
    },
    "action": { "type": "string" },
    "trackedFiles": {
      "type": "array",
      "items": { "type": "string" }
    },
    "changes": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "path": { "type": "string" },
          "status": { "type": "string" },
          "hash": { "type": "string" }
        }
      }
    },
    "remote": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "url": { "type": "string" },
        "syncStatus": { "type": "string" }
      }
    }
  }
}
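As a rough illustration of the output schema, the following sketch pairs an example result with a small hand-written check of the required fields. The function `checkOutput` is hypothetical, and every value in `exampleResult` (the bucket URL, the hash placeholder, and the `status` and `syncStatus` strings) is made up for the example, since the schema only constrains their types.

```javascript
// Hypothetical checker covering the top-level fields of the output schema.
function checkOutput(result) {
  if (typeof result !== 'object' || result === null) {
    return ['result must be an object'];
  }
  const errors = [];
  if (!['success', 'error'].includes(result.status)) {
    errors.push('status must be "success" or "error"');
  }
  if (typeof result.action !== 'string') {
    errors.push('action must be a string');
  }
  if (result.trackedFiles !== undefined) {
    const ok = Array.isArray(result.trackedFiles) &&
      result.trackedFiles.every((f) => typeof f === 'string');
    if (!ok) errors.push('trackedFiles must be an array of strings');
  }
  if (result.changes !== undefined && !Array.isArray(result.changes)) {
    errors.push('changes must be an array');
  }
  return errors;
}

// Illustrative values only; none of them come from a real run.
const exampleResult = {
  status: 'success',
  action: 'add',
  trackedFiles: ['data/train.csv', 'data/test.csv'],
  changes: [
    { path: 'data/train.csv', status: 'added', hash: '<hash placeholder>' },
  ],
  remote: {
    name: 's3-bucket',
    url: 's3://example-bucket/dvc-store',
    syncStatus: 'pending',
  },
};

console.log(checkOutput(exampleResult).length === 0 ? 'valid' : 'invalid');
```

A full JSON Schema validator would cover the nested `changes` and `remote` objects as well; the manual check here only keeps the example free of dependencies.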
Usage Example
{
  kind: 'skill',
  title: 'Version training dataset',
  skill: {
    name: 'dvc-dataset-versioning',
    context: {
      action: 'add',
      paths: ['data/train.csv', 'data/test.csv'],
      remote: 's3-bucket'
    }
  }
}
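The same shape can be reused for the other actions. The sketch below assumes a small hypothetical helper, `dvcTask`, that fills in the fixed fields from the usage example and takes the title and context as arguments. The titles and the `HEAD~1` revision are illustrative values.

```javascript
// Hypothetical helper; the field names are copied from the usage example.
function dvcTask(title, context) {
  return {
    kind: 'skill',
    title,
    skill: { name: 'dvc-dataset-versioning', context },
  };
}

// Upload tracked data to the named remote.
const pushTask = dvcTask('Upload training dataset', {
  action: 'push',
  remote: 's3-bucket',
});

// Compare the workspace data against an earlier Git revision.
const diffTask = dvcTask('Compare data with previous commit', {
  action: 'diff',
  revision: 'HEAD~1',
});

console.log(JSON.stringify([pushTask, diffTask], null, 2));
```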