Auto-claude-code-research-in-sleep run-experiment

Deploy and run ML experiments on local or remote GPU servers. Use when the user says "run experiment", "deploy to server", "跑实验" (Chinese for "run experiment"), or needs to launch training jobs.

install

source · Clone the upstream repo

git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/skills-codex/run-experiment" ~/.claude/skills/wanshuiyin-auto-claude-code-research-in-sleep-run-experiment-11dea4 && rm -rf "$T"

manifest: skills/skills-codex/run-experiment/SKILL.md


Run Experiment

Deploy and run ML experiment: $ARGUMENTS

Workflow

Step 1: Detect Environment

Read the project's `AGENTS.md` to determine the experiment environment:

- Local GPU: look for local CUDA/MPS setup info
- Remote server: look for the SSH alias, conda env, and code directory

If no server info is found in `AGENTS.md`, ask the user.
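For the local case, the CUDA/MPS lookup can be sketched in Python. This is a minimal sketch, not part of the skill: the helper name `detect_local_backend` is hypothetical, and it assumes PyTorch may or may not be installed, falling back to checking whether `nvidia-smi` is on the PATH.

```python
import importlib.util
import shutil


def detect_local_backend() -> str:
    """Return "cuda", "mps", or "cpu" for this machine (hypothetical helper)."""
    # Prefer PyTorch's own view of the hardware when it is installed.
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            return "cuda"
        # Guard with getattr: torch.backends.mps is missing in old PyTorch releases.
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "mps"
        return "cpu"
    # Without PyTorch, the presence of the NVIDIA driver CLI is the best hint.
    if shutil.which("nvidia-smi") is not None:
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print("Local backend:", detect_local_backend())
```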

Step 2: Pre-flight Check

Check GPU availability on the target machine:

Remote:

ssh <server> nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader

Local:

nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader
# or for Mac MPS:
python -c "import torch; print('MPS available:', torch.backends.mps.is_available())"

Treat a GPU as free when its `memory.used` is below 500 MiB.
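A minimal sketch of applying the 500 MiB rule to the `nvidia-smi` CSV output above. The function name `free_gpus` and the sample text are illustrative assumptions, not real output from a server; in practice the string would come from something like `subprocess.run([...], capture_output=True, text=True).stdout`.

```python
def free_gpus(smi_csv: str, threshold_mib: int = 500) -> list[int]:
    """Return the indices of GPUs whose memory.used is below threshold_mib."""
    free = []
    for line in smi_csv.strip().splitlines():
        # Each line looks like "0, 312 MiB, 81920 MiB" (index, used, total).
        index, used, _total = (field.strip() for field in line.split(","))
        used_mib = int(used.split()[0])  # drop the " MiB" unit suffix if present
        if used_mib < threshold_mib:
            free.append(int(index))
    return free


# Illustrative sample only, not captured from a real machine.
sample = """0, 312 MiB, 81920 MiB
1, 40213 MiB, 81920 MiB
2, 0 MiB, 81920 MiB"""
print(free_gpus(sample))  # [0, 2]
```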

Step 3: Sync Code (Remote Only)

Check the project's `AGENTS.md` for a `code_sync` setting. If it is not specified, default to `rsync`.

Option A: rsync (default)

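Only sync necessary files, NOT data, checkpoints, or large files:

rsync -avz --include='*/' --include='*.py' --exclude='*' --prune-empty-dirs <local_src>/ <server>:<remote_dst>/

The `--include='*/'` rule lets rsync descend into subdirectories; without it, `--exclude='*'` also excludes every directory and only top-level `.py` files are copied. `--prune-empty-dirs` skips directories that end up containing no matching files.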
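rsync acts on the first filter rule that matches each path, so directories need an explicit include before a catch-all `--exclude='*'`, or rsync never descends into them. A minimal Python sketch that builds such an argv list; `rsync_code_cmd` is a hypothetical helper, and adding `*.yaml` in the example is an assumption about what a project might also need.

```python
import shlex


def rsync_code_cmd(local_src: str, server: str, remote_dst: str,
                   patterns: tuple[str, ...] = ("*.py",)) -> list[str]:
    """Build an rsync argv that copies only files matching `patterns`."""
    cmd = ["rsync", "-avz", "--prune-empty-dirs"]
    # First matching rule wins: include all directories so rsync recurses...
    cmd.append("--include=*/")
    # ...then the file patterns we want to keep...
    cmd += [f"--include={p}" for p in patterns]
    # ...and finally exclude everything else (data, checkpoints, logs).
    cmd.append("--exclude=*")
    # Trailing slashes: copy the *contents* of local_src into remote_dst.
    cmd += [f"{local_src.rstrip('/')}/", f"{server}:{remote_dst.rstrip('/')}/"]
    return cmd


cmd = rsync_code_cmd("./src", "my-gpu-server", "/home/user/experiments", ("*.py", "*.yaml"))
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids local shell globbing of the `*` patterns.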

Option B: git (when `code_sync: git` is set in `AGENTS.md`)

Push local changes to remote repo, then pull on the server:

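# 1. Push from local (commit only if something is staged, so the push still runs on a clean tree)
git add -A && (git diff --cached --quiet || git commit -m "sync: experiment deployment") && git push

# 2. Pull on server
ssh <server> "cd <remote_dst> && git pull"

Benefits: version-tracked, multi-server sync with one push, no rsync include/exclude rules needed. Because `git add -A` stages everything in the working tree, make sure `.gitignore` excludes data and checkpoints.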

Step 3.5: W&B Integration (when `wandb: true` in `AGENTS.md`)

Skip this step entirely if `wandb` is not set or is `false` in `AGENTS.md`.

Before deploying, ensure the experiment scripts have W&B logging:

Check whether wandb is already in the script: look for `import wandb` or `wandb.init`. If either is present, skip to Step 4.
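The "already uses wandb?" check can be sketched as a regex search over the script's source. This is a heuristic, assumed rather than taken from the skill: `has_wandb` is a hypothetical name, and a commented-out `import wandb` line would still count as a match.

```python
import re

# Matches "import wandb", "import wandb as wb", "from wandb import ...",
# or any call to wandb.init(...).
_WANDB_PATTERN = re.compile(
    r"^\s*(?:import\s+wandb\b|from\s+wandb\b)|\bwandb\.init\s*\(",
    re.MULTILINE,
)


def has_wandb(source: str) -> bool:
    """Return True if the training script already appears to use W&B."""
    return _WANDB_PATTERN.search(source) is not None


print(has_wandb("import torch\nimport wandb\n"))
```

Usage: `has_wandb(Path("train.py").read_text())`; when it returns `True`, skip straight to deployment.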

If not present, add W&B logging to the training script:

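import wandb

# WANDB_PROJECT / WANDB_ENTITY come from AGENTS.md (entity=None uses your default),
# EXP_NAME is the script name + timestamp, hyperparams is a dict of the run's settings.
wandb.init(project=WANDB_PROJECT, entity=WANDB_ENTITY, name=EXP_NAME, config=hyperparams)

# Inside training loop:
wandb.log({"train/loss": loss, "train/lr": lr, "step": step})

# After eval:
wandb.log({"eval/loss": eval_loss, "eval/ppl": ppl, "eval/accuracy": acc})

# At end:
wandb.finish()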

Metrics to log (add whichever apply to the experiment):
- `train/loss`: training loss per step
- `train/lr`: learning rate
- `eval/loss`, `eval/ppl`, `eval/accuracy`: eval metrics per epoch
- `gpu/memory_used`: peak GPU memory (via `torch.cuda.max_memory_allocated()`, which reports bytes)
- `speed/samples_per_sec`: throughput
- Any custom metrics the experiment already computes
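The throughput and memory entries above can be computed with a small helper like the one below, a sketch under stated assumptions: `throughput_and_memory` is a hypothetical name, reporting memory in MiB is a choice the skill does not specify, and in a real loop `peak_mem_bytes` would be `torch.cuda.max_memory_allocated()` with `elapsed_s` measured via `time.perf_counter()`.

```python
def throughput_and_memory(num_samples: int, elapsed_s: float,
                          peak_mem_bytes: int) -> dict[str, float]:
    """Build the speed/ and gpu/ metrics as a dict ready for wandb.log()."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return {
        "speed/samples_per_sec": num_samples / elapsed_s,
        # torch.cuda.max_memory_allocated() returns bytes; convert to MiB (assumed unit).
        "gpu/memory_used": peak_mem_bytes / 2**20,
    }


# Illustrative numbers only: 4096 samples in 8 s with a 3 GiB peak allocation.
print(throughput_and_memory(4096, 8.0, 3 * 2**30))
# {'speed/samples_per_sec': 512.0, 'gpu/memory_used': 3072.0}
```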

Verify wandb login on the target machine:

ssh <server> "wandb status"  # should show logged in
# If not logged in:
ssh <server> "wandb login <WANDB_API_KEY>"

The W&B project name (and optional entity) come from `AGENTS.md` (see example below); the API key comes from `wandb login` or the `WANDB_API_KEY` environment variable. The experiment name is auto-generated from the script name + timestamp.
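The skill only says "script name + timestamp", so the exact format here is an assumption. A minimal sketch with a hypothetical `make_exp_name` helper that uses the script's file stem and a sortable `YYYYMMDD-HHMMSS` stamp:

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_exp_name(script_path: str, now: Optional[datetime] = None) -> str:
    """Build a run name from the script's file stem plus a sortable timestamp."""
    now = now or datetime.now()
    # The format has no spaces or dots, so the same string can also be reused
    # as a screen session name and a log file name (a convenience, not a rule).
    return f"{Path(script_path).stem}_{now:%Y%m%d-%H%M%S}"


print(make_exp_name("scripts/train.py", datetime(2024, 5, 1, 13, 45, 2)))  # train_20240501-134502
```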

Step 4: Deploy

Remote (via SSH + screen)

For each experiment, create a dedicated screen session with GPU binding:

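ssh <server> "screen -dmS <exp_name> bash -c '\
  eval \"\$(<conda_path>/conda shell.bash hook)\" && \
  conda activate <env> && \
  cd <remote_dst> && \
  CUDA_VISIBLE_DEVICES=<gpu_id> python -u <script> <args> 2>&1 | tee <log_file>'"

The `cd <remote_dst>` runs the script from the synced code directory instead of the SSH login directory, and `python -u` disables output buffering so `<log_file>` updates while the job runs.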
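Hand-escaping nested quotes is error-prone, so the same remote command can be assembled with `shlex.quote`. This is a sketch, not the skill's own code: `screen_launch_cmd` and all example values are hypothetical, and it assumes the remote login shell is POSIX-compatible. It returns an argv for `subprocess.run`, so no local shell is involved.

```python
import shlex


def screen_launch_cmd(server: str, exp_name: str, conda_path: str, env: str,
                      remote_dst: str, gpu_id: int, script: str,
                      args: list[str], log_file: str) -> list[str]:
    """Build an ssh argv that starts one detached screen session per experiment."""
    run = " ".join(shlex.quote(a) for a in ["python", "-u", script, *args])
    # Command that bash -c will execute on the server.
    inner = (
        f'eval "$({shlex.quote(conda_path + "/conda")} shell.bash hook)" && '
        f"conda activate {shlex.quote(env)} && "
        f"cd {shlex.quote(remote_dst)} && "
        f"CUDA_VISIBLE_DEVICES={gpu_id} {run} 2>&1 | tee {shlex.quote(log_file)}"
    )
    # The remote shell parses this string once; quoting `inner` keeps it one argument.
    remote = f"screen -dmS {shlex.quote(exp_name)} bash -c {shlex.quote(inner)}"
    return ["ssh", server, remote]


cmd = screen_launch_cmd("my-gpu-server", "train_20240501-134502", "/opt/conda/bin",
                        "research", "/home/user/experiments", 1, "train.py",
                        ["--lr", "3e-4"], "logs/train.log")
print(cmd[2])
```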

Local

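# Linux with CUDA (-u disables output buffering so the log updates live)
CUDA_VISIBLE_DEVICES=<gpu_id> python -u <script> <args> 2>&1 | tee <log_file>

# Mac with MPS (PyTorch does not pick MPS automatically; the script must move the model and tensors to torch.device("mps"))
python -u <script> <args> 2>&1 | tee <log_file>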

For local long-running jobs, use `run_in_background: true` to keep the conversation responsive.
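Outside the agent, the same local background launch can be sketched with `subprocess.Popen`. This is a minimal sketch, not the skill's mechanism: `launch_local` is a hypothetical helper, it runs the current interpreter (`sys.executable`, so start it from the activated conda env), and unlike `tee` it writes only to the log file without echoing to the terminal.

```python
import os
import subprocess
import sys
from typing import Optional


def launch_local(script_args: list[str], log_path: str,
                 gpu_id: Optional[int] = None) -> subprocess.Popen:
    """Start `python -u <script_args...>` in the background, logging to log_path."""
    env = os.environ.copy()
    if gpu_id is not None:
        # Same effect as prefixing the command with CUDA_VISIBLE_DEVICES=<gpu_id>.
        env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    log = open(log_path, "w")
    # stderr is merged into stdout, mirroring `2>&1`.
    proc = subprocess.Popen([sys.executable, "-u", *script_args],
                            stdout=log, stderr=subprocess.STDOUT, env=env)
    log.close()  # the child process keeps its own copy of the file descriptor
    return proc
```

Usage: `proc = launch_local(["train.py", "--lr", "3e-4"], "train.log", gpu_id=0)`; `proc.poll() is None` means the job is still running.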

Step 5: Verify Launch

Remote:

ssh <server> "screen -ls"

Local: check that the process is still running (e.g. with `ps`) and, on CUDA machines, that it is listed in `nvidia-smi` on the assigned GPU.
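To confirm each launched experiment from the `screen -ls` output, a small parser is enough. A sketch, assuming GNU screen's usual `<pid>.<name> ... (Detached)` line format; some versions insert a date column, which the regex tolerates. `screen_sessions` and the sample text are illustrative, not captured output.

```python
import re

# Session lines look like "\t2477.train_a\t(Detached)", optionally with a date column.
_SESSION_RE = re.compile(r"^\s*(\d+)\.(\S+)\s.*\((Attached|Detached)\)", re.MULTILINE)


def screen_sessions(screen_ls_output: str) -> dict[str, str]:
    """Map session name -> "Attached"/"Detached" from `screen -ls` text."""
    return {name: state for _pid, name, state in _SESSION_RE.findall(screen_ls_output)}


# Illustrative sample in the typical GNU screen format.
sample_ls = (
    "There are screens on:\n"
    "\t2477.train_a\t(Detached)\n"
    "\t2391.train_b\t(05/01/2024 01:45:02 PM)\t(Attached)\n"
    "2 Sockets in /run/screen/S-user.\n"
)
print(screen_sessions(sample_ls))
```

A launched experiment whose name is missing from the result most likely exited immediately, so check its log file.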

Step 6: Feishu Notification (if configured)

After deployment is verified, check `~/.codex/feishu.json`:

- If the config is absent or its mode is `"off"`, skip this step entirely (no-op).
- Otherwise, send an `experiment_done` notification: which experiments launched, on which GPUs, and the estimated time.
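The skip-or-send decision can be sketched as below. Only the file path and the `"off"` mode come from the skill; the helper names, the `name`/`gpu`/`eta` keys, and the message layout are hypothetical, and the actual Feishu send is left out because the skill does not specify that API.

```python
import json
from pathlib import Path
from typing import Optional


def load_feishu_config(path: Path = Path.home() / ".codex" / "feishu.json") -> Optional[dict]:
    """Return the Feishu config, or None when the notification should be skipped."""
    if not path.is_file():
        return None  # config absent: no-op
    config = json.loads(path.read_text())
    if config.get("mode") == "off":
        return None  # explicitly disabled: no-op
    return config


def launch_summary(launches: list[dict]) -> str:
    """Format the message body: experiment, GPU, estimated time (hypothetical keys)."""
    lines = [f"{x['name']} on GPU {x['gpu']} (est. {x['eta']})" for x in launches]
    return "Launched:\n" + "\n".join(lines)


print(launch_summary([{"name": "train_a", "gpu": 0, "eta": "2h"}]))
```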

Key Rules

- ALWAYS check GPU availability first; never blindly assign GPUs
- Each experiment gets its own screen session + GPU (remote) or background process (local)
- Use `tee` to save logs for later inspection
- Run deployment commands with `run_in_background: true` to keep the conversation responsive
- Report back: which GPU, which screen/process, what command, estimated time
- If there are multiple experiments, launch them in parallel on different GPUs
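The "one GPU per experiment, never blindly" rules can be expressed as a small assignment step that runs after the free-GPU check. A sketch with a hypothetical `assign_gpus` helper; refusing to oversubscribe (rather than queueing) and requiring unique names, since they double as screen session names, are design assumptions.

```python
def assign_gpus(experiments: list[str], free_gpu_ids: list[int]) -> dict[str, int]:
    """Pair each experiment with its own free GPU; refuse to share or oversubscribe."""
    if len(set(experiments)) != len(experiments):
        raise ValueError("experiment names must be unique")
    if len(experiments) > len(free_gpu_ids):
        raise RuntimeError(
            f"{len(experiments)} experiments but only {len(free_gpu_ids)} free GPUs"
        )
    # zip stops at the shorter list; the check above guarantees every experiment gets a GPU.
    return dict(zip(experiments, free_gpu_ids))


print(assign_gpus(["lr_3e-4", "lr_1e-3"], [0, 2, 3]))  # {'lr_3e-4': 0, 'lr_1e-3': 2}
```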

AGENTS.md Example

Users should add their server info to their project's `AGENTS.md`:

## Remote Server
- SSH: `ssh my-gpu-server`
- GPU: 4x A100 (80GB each)
- Conda: `eval "$(/opt/conda/bin/conda shell.bash hook)" && conda activate research`
- Code dir: `/home/user/experiments/`
- code_sync: rsync          # default. Or set to "git" for git push/pull workflow
- wandb: false              # set to "true" to auto-add W&B logging to experiment scripts
- wandb_project: my-project # W&B project name (required if wandb: true)
- wandb_entity: my-team     # W&B team/user (optional, uses default if omitted)

## Local Environment
- Mac MPS / Linux CUDA
- Conda env: `ml` (Python 3.10 + PyTorch)
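`AGENTS.md` is free-form markdown, so reading it programmatically is a heuristic. A minimal Python sketch, assuming the `- key: value  # comment` bullet layout of the example above; `parse_agents_section` is a hypothetical helper, and a `#` inside a value would be mistaken for the start of a comment.

```python
import re

# Matches bullets such as "- code_sync: rsync   # default ..." and captures key and value;
# a trailing "# ..." comment is dropped.
_BULLET_RE = re.compile(r"^-\s*([\w ]+?):\s*(.*?)\s*(?:#.*)?$")


def parse_agents_section(markdown: str, heading: str = "## Remote Server") -> dict[str, str]:
    """Return the "key: value" bullets under one "## ..." heading of AGENTS.md."""
    settings: dict[str, str] = {}
    in_section = False
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            in_section = line == heading
            continue
        match = _BULLET_RE.match(line) if in_section else None
        if match:
            key, value = match.groups()
            # Keys are case-folded ("SSH" -> "ssh"); inline-code backticks are removed.
            settings[key.strip().lower()] = value.replace("`", "")
    return settings
```

Usage: `settings = parse_agents_section(Path("AGENTS.md").read_text())`, then `settings.get("wandb", "false").lower() == "true"` decides whether Step 3.5 runs.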

W&B setup: Run `wandb login` on your server once (or set the `WANDB_API_KEY` env var). The skill reads the project/entity from `AGENTS.md` and adds `wandb.init()` + `wandb.log()` to your training scripts automatically. Dashboard: https://wandb.ai/<entity>/<project>.