AutoResearchClaw experimental-design

Best practices for designing reproducible ML experiments. Use when planning ablations, baselines, or controlled experiments.

install
source · Clone the upstream repo
git clone https://github.com/aiming-lab/AutoResearchClaw
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aiming-lab/AutoResearchClaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/researchclaw/skills/builtin/experiment/experimental-design" ~/.claude/skills/aiming-lab-autoresearchclaw-experimental-design && rm -rf "$T"
manifest: researchclaw/skills/builtin/experiment/experimental-design/SKILL.md
source content

Experimental Design Best Practices

  1. ALWAYS include meaningful baselines (not just random):
    • At least one classical method baseline
    • At least one recent SOTA method baseline
    • A simple-but-strong baseline (e.g., linear probe, k-NN)
  2. Use MULTIPLE random seeds (minimum 3, ideally 5)
  3. Report mean ± standard deviation across seeds
  4. Design ablations that isolate EACH key component:
    • Remove one component at a time
    • Each ablation must be meaningfully different from the baseline (a variant that behaves identically isolates nothing)
  5. Control variables: change only ONE thing per comparison
  6. Use standard splits (train/val/test); never test on training data
  7. Report wall-clock time and memory usage alongside accuracy
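
The seed, ablation, and reporting items above can be sketched as a small harness. This is a minimal illustration, not part of the upstream skill: Config, run_experiment, make_ablations, and evaluate are hypothetical names, the three component switches are placeholders, and run_experiment simulates an accuracy instead of training anything. It assumes Python 3.7+ and uses only the standard library.

```python
import random
import statistics
import time
import tracemalloc
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Config:
    # Hypothetical component switches; replace with the parts of your method.
    use_augmentation: bool = True
    use_pretraining: bool = True
    use_regularizer: bool = True


def run_experiment(config, seed):
    """Stand-in for a real train/evaluate run; returns a simulated test accuracy."""
    rng = random.Random(seed)  # local RNG, so each seed reproduces exactly
    score = 0.60
    score += 0.05 if config.use_augmentation else 0.0
    score += 0.10 if config.use_pretraining else 0.0
    score += 0.02 if config.use_regularizer else 0.0
    return score + rng.gauss(0.0, 0.01)  # seed-dependent noise


def make_ablations(full):
    """Return the full config plus one variant per component, each removing exactly one."""
    variants = {"full": full}
    for f in fields(full):
        label = "no_" + f.name[len("use_"):]
        variants[label] = replace(full, **{f.name: False})
    return variants


def evaluate(config, seeds):
    """Run every seed and report mean, std, wall-clock time, and peak memory."""
    tracemalloc.start()  # tracks Python heap allocations only, not GPU memory
    start = time.perf_counter()
    scores = [run_experiment(config, seed) for seed in seeds]
    seconds = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),  # sample standard deviation (n - 1)
        "n_seeds": len(scores),
        "seconds": seconds,
        "peak_mib": peak_bytes / 2**20,
    }


SEEDS = (0, 1, 2, 3, 4)  # five seeds, the "ideal" count from the list above
results = {name: evaluate(cfg, SEEDS) for name, cfg in make_ablations(Config()).items()}

for name, r in results.items():
    print(
        f"{name:16s} acc {r['mean']:.3f} ± {r['std']:.3f} "
        f"(n={r['n_seeds']})  {r['seconds']:.4f}s  {r['peak_mib']:.3f} MiB"
    )
```

Every variant runs on the same seeds, so the only difference between the full row and an ablation row is the single removed component. In a real project, run_experiment would train and evaluate a model, and GPU memory would need a framework-specific measurement in place of tracemalloc.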