AutoResearchClaw data-loading

Optimize the data-loading pipeline to prevent GPU starvation. Use when setting up a DataLoader or data preprocessing.

install
source · Clone the upstream repo
git clone https://github.com/aiming-lab/AutoResearchClaw
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aiming-lab/AutoResearchClaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/researchclaw/skills/builtin/tooling/data-loading" ~/.claude/skills/aiming-lab-autoresearchclaw-data-loading && rm -rf "$T"
manifest: researchclaw/skills/builtin/tooling/data-loading/SKILL.md
source content

Efficient Data Loading Best Practices

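  1. Use num_workers = min(8, os.cpu_count() or 1) for DataLoader (os.cpu_count() can return None)
  2. Enable pin_memory=True when batches are copied to a GPU
  3. Use persistent_workers=True (requires num_workers > 0) to avoid re-spawning worker processes every epoch
  4. Pre-compute and cache deterministic transformations when possible; keep random augmentations uncached
  5. For image data: use torchvision.transforms.v2, which the torchvision docs describe as faster than the v1 transforms
  6. For large datasets: consider memory-mapped files or WebDataset
  7. Profile with torch.utils.bottleneck (run as python -m torch.utils.bottleneck script.py) to check whether data loading is the bottleneck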
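
As a rough illustration of items 1 to 4, the sketch below builds the DataLoader keyword arguments and wraps a dataset with a simple in-memory cache. The names dataloader_kwargs and CachedDataset are hypothetical helpers introduced here for illustration; they are not part of PyTorch or of AutoResearchClaw. The code uses only the standard library, so it runs without torch installed.

```python
import os


def dataloader_kwargs(use_gpu, max_workers=8):
    """Build DataLoader keyword arguments following items 1-3 above.

    Hypothetical helper for illustration; not a PyTorch API.
    """
    # os.cpu_count() may return None when the count cannot be determined.
    cpu_count = os.cpu_count() or 1
    num_workers = min(max_workers, cpu_count)
    return {
        "num_workers": num_workers,
        # Pinned host memory only helps when batches are copied to a GPU.
        "pin_memory": bool(use_gpu),
        # DataLoader rejects persistent_workers=True when num_workers == 0.
        "persistent_workers": num_workers > 0,
    }


class CachedDataset:
    """Map-style wrapper that caches a deterministic transform per index (item 4).

    Hypothetical class for illustration. Do not use it for random
    augmentations, since every epoch would see the same cached result.
    """

    def __init__(self, samples, transform):
        self.samples = samples
        self.transform = transform
        self._cache = {}

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        # Compute the transform once per index, then reuse the stored result.
        if index not in self._cache:
            self._cache[index] = self.transform(self.samples[index])
        return self._cache[index]
```

With torch installed, the pieces would typically be combined as DataLoader(CachedDataset(raw_samples, preprocess), batch_size=64, shuffle=True, **dataloader_kwargs(torch.cuda.is_available())), where raw_samples and preprocess are placeholders for your own data and transform. Note that with num_workers > 0 each worker process holds its own copy of the cache, so the cache only pays off across epochs when persistent_workers=True keeps the workers alive.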