Vibe-Skills splitting-datasets

install
source · Clone the upstream repo
git clone https://github.com/foryourhealth111-pixel/Vibe-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/foryourhealth111-pixel/Vibe-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/bundled/skills/splitting-datasets" ~/.claude/skills/foryourhealth111-pixel-vibe-skills-splitting-datasets && rm -rf "$T"
manifest: bundled/skills/splitting-datasets/SKILL.md
source content

Dataset Splitter

Positioning

Treat this skill as a narrow helper for choosing how to partition a dataset.

When to Use

Use this skill to:

  • Prepare a dataset for machine learning model training.
  • Create training, validation, and test sets.
  • Partition data to evaluate model performance.
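
As an illustration of the train/validation/test partition described above, here is a minimal sketch in standard-library Python. The function name `split_indices`, the default 80/10/10 ratios, and the seed value are assumptions made for this example; they are not taken from the skill itself.

```python
import random

def split_indices(n, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle indices 0..n-1 with a fixed seed and cut them into train/val/test.

    Hypothetical helper: the name, default ratios, and seed are illustrative only.
    """
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three values that sum to 1")
    indices = list(range(n))
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test

train, val, test = split_indices(100)
print(len(train), len(val), len(test))
```

Splitting indices rather than rows keeps the sketch independent of how the data is stored; the same index lists can then select rows from a list, array, or data frame.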

Not For / Boundaries

  • Full preprocessing-pipeline ownership: use
    preprocessing-data-with-automated-pipelines
  • Leakage audits and prediction-time checks: use
    ml-data-leakage-guard
  • Model training and tuning after the split: use
    training-machine-learning-models

Typical Outputs

  • Partition strategy with ratios, random seeds, and stratification rules
  • Notes on temporal or grouped split constraints
  • Handoff guidance for leakage review and downstream training
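
The stratification, grouped, and temporal constraints mentioned above could be sketched as follows, again in standard-library Python. All three function names, the 0.2 test ratio, and the seed are hypothetical choices for illustration; a real project might use a library splitter instead.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_ratio=0.2, seed=0):
    """Return (train_idx, test_idx) with each label's share kept roughly equal.

    Assumes labels are hashable and mutually sortable.
    """
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for y in sorted(by_label):  # sorted so the result does not depend on input order
        idx = by_label[y]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_ratio)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test

def grouped_split(groups, test_ratio=0.2, seed=0):
    """Return (train_idx, test_idx) so that no group lands on both sides."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    n_test = round(len(unique) * test_ratio)
    test_groups = set(unique[:n_test])
    train = [i for i, g in enumerate(groups) if g not in test_groups]
    test = [i for i, g in enumerate(groups) if g in test_groups]
    return train, test

def temporal_split(timestamps, cutoff):
    """Return (train_idx, test_idx): rows before the cutoff train, the rest test."""
    train = [i for i, t in enumerate(timestamps) if t < cutoff]
    test = [i for i, t in enumerate(timestamps) if t >= cutoff]
    return train, test
```

The grouped variant applies the ratio to groups rather than rows, so the row-level test share can drift from `test_ratio` when group sizes are uneven. The temporal variant uses no randomness at all, which is what keeps later observations out of the training side.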

Related Skills

  • preprocessing-data-with-automated-pipelines
    for the broader preprocessing sequence
  • ml-data-leakage-guard
    to verify the split does not leak future or test information