AutoResearchClaw data-loading
Optimize the data-loading pipeline to prevent GPU starvation. Use when setting up a DataLoader or data preprocessing.
install
source · Clone the upstream repo
git clone https://github.com/aiming-lab/AutoResearchClaw
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aiming-lab/AutoResearchClaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/researchclaw/skills/builtin/tooling/data-loading" ~/.claude/skills/aiming-lab-autoresearchclaw-data-loading && rm -rf "$T"
manifest:
researchclaw/skills/builtin/tooling/data-loading/SKILL.md
source content
Efficient Data Loading Best Practice
- Use num_workers = min(8, os.cpu_count() or 1) for DataLoader (os.cpu_count() can return None)
- Enable pin_memory=True when using GPU
- Use persistent_workers=True (requires num_workers > 0) to avoid re-spawning worker processes every epoch
- Pre-compute and cache transformations when possible
- For image data: use torchvision.transforms.v2 (generally faster than the legacy torchvision.transforms API)
- For large datasets: consider memory-mapped files or WebDataset
- Profile with torch.utils.bottleneck (run as python -m torch.utils.bottleneck your_script.py) to find I/O bottlenecks
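
The DataLoader settings in the list above can be collected in one place. The following is a minimal sketch, not part of the upstream skill: `loader_kwargs` is a hypothetical helper name, and the cap of 8 workers is taken from the first bullet. It uses only the standard library so it runs without PyTorch installed.

```python
import os
from typing import Any, Dict, Optional


def loader_kwargs(use_gpu: bool, cpu_count: Optional[int] = None) -> Dict[str, Any]:
    """Build keyword arguments for torch.utils.data.DataLoader.

    Hypothetical helper for illustration; `cpu_count` exists only so the
    worker count can be overridden (for example in tests).
    """
    # os.cpu_count() may return None on some platforms, so fall back to 1.
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    num_workers = min(8, cpus)
    return {
        "num_workers": num_workers,
        # Pinned host memory only helps when batches are copied to a GPU.
        "pin_memory": use_gpu,
        # DataLoader rejects persistent_workers=True when num_workers == 0.
        "persistent_workers": num_workers > 0,
    }


if __name__ == "__main__":
    print(loader_kwargs(use_gpu=True))
```

With PyTorch available, the result could be passed along as DataLoader(dataset, batch_size=64, shuffle=True, **loader_kwargs(torch.cuda.is_available())), where dataset and the batch size are placeholders.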