AutoResearchClaw experimental-design
Best practices for designing reproducible ML experiments. Use when planning ablations, baselines, or controlled experiments.
install
source · Clone the upstream repo
git clone https://github.com/aiming-lab/AutoResearchClaw
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aiming-lab/AutoResearchClaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/researchclaw/skills/builtin/experiment/experimental-design" ~/.claude/skills/aiming-lab-autoresearchclaw-experimental-design && rm -rf "$T"
manifest:
researchclaw/skills/builtin/experiment/experimental-design/SKILL.md
source content
Experimental Design Best Practices
- ALWAYS include meaningful baselines (not just a random-chance baseline):
  - At least one classical method baseline
  - At least one recent SOTA method baseline
  - A simple-but-strong baseline (e.g., linear probe, k-NN)
- Use MULTIPLE random seeds (minimum 3, ideally 5)
- Report mean +/- std across seeds
- Design ablations that isolate EACH key component:
  - Remove one component at a time
  - Each ablation must be meaningfully different from the baseline
- Control variables: change only ONE thing per comparison
- Use standard splits (train/val/test) — never test on training data
- Report wall-clock time and memory usage alongside accuracy
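
The seed and ablation rules above can be sketched in a few lines of Python. This is a minimal illustration, not AutoResearchClaw's own implementation: the component names in `FULL_CONFIG`, the helper functions, and `toy_experiment` (a stand-in that returns a made-up score instead of training anything) are all hypothetical. It shows one variant per removed component, every variant run on the same five seeds, and results summarized as mean +/- std.

```python
import random
import statistics
from typing import Callable, Dict, List

# Hypothetical full configuration; the component names are illustrative only.
FULL_CONFIG: Dict[str, bool] = {
    "use_augmentation": True,
    "use_pretraining": True,
    "use_aux_loss": True,
}
SEEDS: List[int] = [0, 1, 2, 3, 4]  # five seeds, the "ideally 5" case


def one_at_a_time_ablations(full: Dict[str, bool]) -> Dict[str, Dict[str, bool]]:
    """Build the full variant plus one variant per component with only it disabled."""
    variants = {"full": dict(full)}
    for name in full:
        variant = dict(full)
        variant[name] = False  # change exactly ONE thing per comparison
        variants[f"no_{name}"] = variant
    return variants


def summarize(scores: List[float]) -> str:
    """Format scores as mean +/- sample standard deviation across seeds."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return f"{mean:.3f} +/- {std:.3f}"


def run_all(
    run_experiment: Callable[[Dict[str, bool], int], float],
    full: Dict[str, bool] = FULL_CONFIG,
    seeds: List[int] = SEEDS,
) -> Dict[str, List[float]]:
    """Run every variant on every seed so all comparisons share the same seeds."""
    return {
        label: [run_experiment(config, seed) for seed in seeds]
        for label, config in one_at_a_time_ablations(full).items()
    }


def toy_experiment(config: Dict[str, bool], seed: int) -> float:
    """Placeholder for a real training run; the returned score is NOT real data."""
    rng = random.Random(seed)  # seed the run so it is reproducible
    return 0.5 + 0.1 * sum(config.values()) + rng.uniform(-0.01, 0.01)


if __name__ == "__main__":
    for label, scores in run_all(toy_experiment).items():
        print(f"{label}: {summarize(scores)}")
```

In a real project, `toy_experiment` would be replaced by the actual train-and-evaluate routine, which should also seed every source of randomness it uses (data shuffling, weight initialization, and so on). Baselines would get the same treatment: same seeds, same splits, same reporting.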