Vibe-Skills splitting-datasets
install
source · Clone the upstream repo
git clone https://github.com/foryourhealth111-pixel/Vibe-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/foryourhealth111-pixel/Vibe-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/bundled/skills/splitting-datasets" ~/.claude/skills/foryourhealth111-pixel-vibe-skills-splitting-datasets && rm -rf "$T"
manifest:
bundled/skills/splitting-datasets/SKILL.md
source content
Dataset Splitter
Positioning
Treat this skill as a narrow helper for partition strategy.
When to Use
Use this skill when you need to:
- Prepare a dataset for machine learning model training.
- Create training, validation, and testing sets.
- Partition data to evaluate model performance.
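As a rough illustration of the kind of partition this skill helps plan, here is a minimal sketch of a seeded, stratified train/validation/test split using only the Python standard library. The function name `stratified_split`, the 70/15/15 ratios, and the seed value are assumptions made for this example; they are not part of the skill itself.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.7, 0.15, 0.15), seed=42):
    """Return (train, val, test) lists of row indices, stratified by label.

    Hypothetical helper for illustration; ratios and seed are assumptions.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    rng = random.Random(seed)  # local RNG so the split is reproducible

    # Bucket row indices by class label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, val, test = [], [], []
    # Iterate labels in sorted order so the result does not depend on
    # the order in which labels first appear.
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * ratios[0])
        n_val = round(len(indices) * ratios[1])
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])  # remainder goes to test
    return train, val, test


# Imbalanced toy labels: 20 positives, 80 negatives.
labels = ["pos"] * 20 + ["neg"] * 80
train, val, test = stratified_split(labels)
```

Each class is shuffled and cut separately, so every partition keeps roughly the same class balance as the full dataset. In practice a library routine may be preferable; the point here is only to make the ratio, seed, and stratification choices explicit.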
Not For / Boundaries
- Full preprocessing-pipeline ownership: use preprocessing-data-with-automated-pipelines
- Leakage audits and prediction-time checks: use ml-data-leakage-guard
- Model training and tuning after the split: use training-machine-learning-models
Typical Outputs
- Partition strategy with ratios, random seeds, and stratification rules
- Notes on temporal or grouped split constraints
- Handoff guidance for leakage review and downstream training
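The temporal and grouped constraints mentioned above could look like the following sketch, again using only the standard library. The helper names `group_split` and `temporal_split`, the 20% hold-out fraction, and the toy data are all hypothetical and chosen for illustration.

```python
import random


def group_split(groups, test_fraction=0.2, seed=0):
    """Assign whole groups to train or test so no group spans both.

    Hypothetical helper; `groups` holds one group id per row.
    """
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Hold out at least one group.
    n_test = max(1, round(len(unique) * test_fraction))
    test_groups = set(unique[:n_test])
    train = [i for i, g in enumerate(groups) if g not in test_groups]
    test = [i for i, g in enumerate(groups) if g in test_groups]
    return train, test


def temporal_split(timestamps, test_fraction=0.2):
    """Hold out the most recent rows; nothing is shuffled."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    cut = len(order) - round(len(order) * test_fraction)
    return order[:cut], order[cut:]


# Toy data: ten rows belonging to five users.
groups = ["u1", "u1", "u2", "u3", "u3", "u4", "u5", "u5", "u5", "u2"]
g_train, g_test = group_split(groups)

# Toy data: ten rows with out-of-order timestamps.
timestamps = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]
t_train, t_test = temporal_split(timestamps)
```

A grouped split keeps all rows from one entity (for example, one user) on the same side of the boundary, and a temporal split ensures every training row precedes every test row. Both are common ways to reduce the leakage risks that the handoff to a leakage review is meant to catch.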
Related Skills
- preprocessing-data-with-automated-pipelines for the broader preprocessing sequence
- ml-data-leakage-guard to verify the split does not leak future or test information