Auto-deep-researcher-24x7 gpu-monitor

Name: gpu-monitor
Author: Xiangyue-Zhang

Check GPU status, running experiments, and available resources

install

source · Clone the upstream repo

git clone https://github.com/Xiangyue-Zhang/auto-deep-researcher-24x7

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/Xiangyue-Zhang/auto-deep-researcher-24x7 "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/gpu-monitor" ~/.claude/skills/xiangyue-zhang-auto-deep-researcher-24x7-gpu-monitor && rm -rf "$T"

manifest: skills/gpu-monitor/SKILL.md

source content

/gpu-monitor

Quick GPU status check for experiment management.

Usage

/gpu-monitor
/gpu-monitor --server user@remote-host
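
As a rough illustration of how the two invocation forms could differ, the sketch below wraps a command in `ssh` when a server is supplied and leaves it untouched otherwise. The helper name `build_command` is hypothetical and not part of the skill; the skill itself only states that it connects over SSH first.

```python
import shlex


def build_command(argv, server=None):
    """Return the argv to execute: as-is locally, wrapped in ssh when a server is given.

    Hypothetical helper for illustration only.
    """
    if server is None:
        return list(argv)
    # ssh joins its trailing arguments into one remote shell command line,
    # so each piece is quoted to survive the remote shell intact.
    remote = " ".join(shlex.quote(part) for part in argv)
    return ["ssh", server, remote]


print(build_command(["nvidia-smi"]))
# ['nvidia-smi']
print(build_command(["nvidia-smi"], server="user@remote-host"))
# ['ssh', 'user@remote-host', 'nvidia-smi']
```

The returned list can be passed to `subprocess.run(..., capture_output=True, text=True)` to collect the output in either case.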

Behavior

Run `nvidia-smi` to get the current GPU status
Display a clean summary table:
- GPU ID, Name, Memory (used/total), Utilization %, Temperature
- Running processes on each GPU
Identify which GPUs are free (less than 1 GB of memory in use)
Identify which GPUs are running experiments (check for python/torchrun processes)
If `--server` is provided, SSH to the remote server first and run `nvidia-smi` there
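
The steps above can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the skill's actual implementation: it assumes the GPU data comes from `nvidia-smi --query-gpu=index,name,memory.used,memory.total,utilization.gpu,temperature.gpu --format=csv,noheader,nounits`, reads "1 GB" as 1024 MiB, checks the free threshold before looking at processes, and uses a made-up `busy` label for GPUs that are neither free nor training. The names `Gpu`, `parse_gpus`, and `classify` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass

FREE_THRESHOLD_MIB = 1024                 # assumption: "1 GB" read as 1024 MiB
TRAINING_NAMES = ("python", "torchrun")   # process names treated as experiments


@dataclass
class Gpu:
    index: int
    name: str
    mem_used: int    # MiB
    mem_total: int   # MiB
    util: int        # percent
    temp: int        # degrees Celsius


def parse_gpus(csv_text):
    """Parse CSV rows in the assumed query order into Gpu records.

    Non-numeric placeholders that some devices may report are not handled here.
    """
    gpus = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not row:
            continue
        idx, name, used, total, util, temp = (field.strip() for field in row)
        gpus.append(Gpu(int(idx), name, int(used), int(total), int(util), int(temp)))
    return gpus


def classify(gpu, processes):
    """Label one GPU given its (pid, process_name) pairs."""
    if gpu.mem_used < FREE_THRESHOLD_MIB:
        return "FREE"
    for pid, process_name in processes:
        # Compare on the executable's base name so /usr/bin/python3 matches too.
        base = process_name.rsplit("/", 1)[-1]
        if base.startswith(TRAINING_NAMES):
            return f"training (PID {pid})"
    return "busy"  # hypothetical label, not defined by the skill


sample = "0, L20X 144GB, 45123, 147456, 98, 72\n1, L20X 144GB, 234, 147456, 0, 35\n"
gpus = parse_gpus(sample)
print(classify(gpus[0], [(12345, "/usr/bin/python3")]))  # training (PID 12345)
print(classify(gpus[1], []))                             # FREE
```

Mapping processes to GPUs is left out of the sketch; one option is `nvidia-smi --query-compute-apps=gpu_uuid,pid,process_name --format=csv,noheader`, joined to the GPU list by UUID.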

Output Format

GPU Status
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 GPU  Name          Memory         Util  Temp
  0   L20X 144GB    45123/147456   98%   72°C  ← training (PID 12345)
  1   L20X 144GB      234/147456    0%   35°C  ← FREE
  2   L20X 144GB    43210/147456   95%   70°C  ← training (PID 12346)
  3   L20X 144GB     1024/147456   12%   40°C  ← keeper

Free GPUs: [1]
Training: GPU 0 (PID 12345), GPU 2 (PID 12346)
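
A layout like the one above could be produced with plain string formatting. The sketch below is one possible rendering, assuming each GPU has already been reduced to a tuple of values plus a status label; the function name `render`, the tuple layout, and the `none` fallback for an empty training list are all assumptions made for illustration.

```python
def render(rows):
    """Render the status table and summary lines.

    rows: iterable of (index, name, used_mib, total_mib, util, temp, status, pid).
    status is a free-form label; "FREE" and "training" feed the summary lines.
    """
    lines = [
        "GPU Status",
        "━" * 42,  # rule width chosen to roughly match the table
        " GPU  Name          Memory         Util  Temp",
    ]
    free, training = [], []
    for index, name, used, total, util, temp, status, pid in rows:
        note = f"training (PID {pid})" if status == "training" else status
        mem = f"{used}/{total}"
        lines.append(
            f"  {index}   {name:<10}    {mem:>12}   {util:>2}%   {temp}°C  ← {note}"
        )
        if status == "FREE":
            free.append(str(index))
        elif status == "training":
            training.append(f"GPU {index} (PID {pid})")
    lines.append("")
    lines.append(f"Free GPUs: [{', '.join(free)}]")
    lines.append("Training: " + (", ".join(training) if training else "none"))
    return "\n".join(lines)


print(render([
    (0, "L20X 144GB", 45123, 147456, 98, 72, "training", 12345),
    (1, "L20X 144GB", 234, 147456, 0, 35, "FREE", None),
]))
```

The column widths are fixed to fit the sample values; longer GPU names would need a wider name column.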