Auto-deep-researcher-24x7 gpu-monitor
Check GPU status, running experiments, and available resources
install
source · Clone the upstream repo
git clone https://github.com/Xiangyue-Zhang/auto-deep-researcher-24x7
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/Xiangyue-Zhang/auto-deep-researcher-24x7 "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/gpu-monitor" ~/.claude/skills/xiangyue-zhang-auto-deep-researcher-24x7-gpu-monitor && rm -rf "$T"
manifest: skills/gpu-monitor/SKILL.md
/gpu-monitor
Quick GPU status check for experiment management.
Usage
```
/gpu-monitor
/gpu-monitor --server user@remote-host
```
Behavior
- Run `nvidia-smi` to get the current GPU status
- Display a clean summary table:
- GPU ID, Name, Memory (used/total), Utilization %, Temperature
- Running processes on each GPU
- Identify which GPUs are free (< 1GB memory used)
- Identify which GPUs are running experiments (check for python/torchrun processes)
- If `--server` is provided, SSH to the remote server first
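The query-and-classify steps above can be sketched in Python. This is a minimal illustration, not the skill's actual implementation: `parse_gpu_csv` and its 1 GB free threshold are hypothetical names chosen here, while the `nvidia-smi --query-gpu`/`--format=csv,noheader,nounits` flags are standard nvidia-smi options.

```python
import subprocess

def query_gpus():
    """Run nvidia-smi and return per-GPU stats (sketch; assumes nvidia-smi on PATH)."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,name,memory.used,memory.total,"
         "utilization.gpu,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)

def parse_gpu_csv(csv_text, free_mb=1024):
    """Parse nvidia-smi CSV lines; flag GPUs using < free_mb MiB as free."""
    gpus = []
    for line in csv_text.strip().splitlines():
        idx, name, used, total, util, temp = [f.strip() for f in line.split(",")]
        gpus.append({
            "id": int(idx), "name": name,
            "mem_used": int(used), "mem_total": int(total),
            "util": int(util), "temp": int(temp),
            "free": int(used) < free_mb,  # "< 1GB memory used" rule from above
        })
    return gpus
```

Classifying a GPU as "running an experiment" would additionally inspect per-GPU processes (e.g. via `nvidia-smi --query-compute-apps=...`) for python/torchrun, which is omitted here.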
Output Format
```
GPU Status
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GPU  Name        Memory         Util  Temp
0    L20X 144GB  45123/147456   98%   72°C  ← training (PID 12345)
1    L20X 144GB    234/147456    0%   35°C  ← FREE
2    L20X 144GB  43210/147456   95%   70°C  ← training (PID 12346)
3    L20X 144GB   1024/147456   12%   40°C  ← keeper

Free GPUs: [1]
Training: GPU 0 (PID 12345), GPU 2 (PID 12346)
```
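A table in this format could be rendered from parsed per-GPU stats roughly as follows. This is a sketch under assumed field names (`id`, `name`, `mem_used`, `mem_total`, `util`, `temp`, `note`); the skill's real formatting code may differ.

```python
def format_gpu_table(gpus):
    """Render a fixed-width summary table from per-GPU dicts (illustrative).

    Each dict is assumed to carry: id, name, mem_used, mem_total, util,
    temp, and an optional "note" such as "FREE" or "training (PID 12345)".
    """
    lines = ["GPU Status", "━" * 42,
             f"{'GPU':<5}{'Name':<12}{'Memory':<16}{'Util':<6}Temp"]
    for g in gpus:
        mem = f"{g['mem_used']}/{g['mem_total']}"
        util = f"{g['util']}%"
        temp = f"{g['temp']}°C"
        note = f"  ← {g['note']}" if g.get("note") else ""
        lines.append(
            f"{g['id']:<5}{g['name']:<12}{mem:<16}{util:<6}{temp}{note}"
        )
    free = [g["id"] for g in gpus if g.get("note") == "FREE"]
    lines.append(f"\nFree GPUs: {free}")
    return "\n".join(lines)
```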