Auto-claude-code-research-in-sleep monitor-experiment

Monitor running experiments, check progress, collect results. Use when the user says "check results", "is it done", "monitor", or wants experiment output.

Install

Source · Clone the upstream repo
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/skills-codex/monitor-experiment" ~/.claude/skills/wanshuiyin-auto-claude-code-research-in-sleep-monitor-experiment-02e14c && rm -rf "$T"
manifest: skills/skills-codex/monitor-experiment/SKILL.md
Source content

Monitor Experiment Results

Monitor: $ARGUMENTS

Workflow

Step 1: Check What's Running

ssh <server> "screen -ls"
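
screen -ls prints a header line, one tab-indented line per session in the form <pid>.<name> (Detached), and a trailing summary. A minimal sketch for pulling out just the session names (the socket naming is screen's default, but verify on your host; <server> is the same placeholder as above):

```bash
# Extract session names from `screen -ls`.
# Session lines are tab-indented; $1 looks like "12345.exp_baseline".
ssh <server> screen -ls \
  | awk '/^\t/ { sub(/^[0-9]+\./, "", $1); print $1 }'
```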

Step 2: Collect Output from Each Screen

For each screen session, capture the last N lines:

ssh <server> "screen -S <name> -X hardcopy /tmp/screen_<name>.txt && tail -50 /tmp/screen_<name>.txt"

If hardcopy fails, fall back to log files or output captured via tee.
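
Reusing the name extraction from Step 1, a sketch that captures the tail of every session in one pass ('<server>' stays a placeholder; adjust the tail length as needed):

```bash
# Capture the last 50 lines of every screen session on the server.
SERVER='<server>'   # replace with the real host
for name in $(ssh "$SERVER" screen -ls \
                | awk '/^\t/ { sub(/^[0-9]+\./, "", $1); print $1 }'); do
  echo "=== $name ==="
  ssh "$SERVER" "screen -S '$name' -X hardcopy /tmp/screen_$name.txt && tail -50 /tmp/screen_$name.txt"
done
```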

Step 3: Check for JSON Result Files

ssh <server> "ls -lt <results_dir>/*.json 2>/dev/null | head -20"

If JSON results exist, fetch and parse them:

ssh <server> "cat <results_dir>/<latest>.json"
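
If the result files expose a flat metrics object, jq can pull the headline numbers directly. A sketch, with the caveat that the field names (metric, step) are assumptions about the schema and jq must be available locally:

```bash
# Fetch the newest result file and extract assumed fields.
latest=$(ssh <server> 'ls -t <results_dir>/*.json 2>/dev/null | head -1')
[ -n "$latest" ] && ssh <server> "cat $latest" \
  | jq '{metric: .metric, step: .step}'
```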

Step 4: Summarize Results

Present results in a comparison table:

| Experiment | Metric | Delta vs Baseline | Status |
|-----------|--------|-------------------|--------|
| Baseline  | X.XX   | —                 | done   |
| Method A  | X.XX   | +Y.Y              | done   |

Step 5: Interpret

  • Compare against known baselines
  • Flag unexpected results (negative delta, NaN, divergence); a quick check sketch follows this list
  • Suggest next steps based on findings
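
As referenced in the list above, a minimal mechanical check, assuming results have been fetched into a local results/ directory and each JSON exposes a numeric metric field (both are assumptions about your setup):

```bash
# Flag NaNs and below-baseline metrics in fetched result files.
BASELINE=0.0   # set to the matching baseline's number
for f in results/*.json; do
  if grep -q 'NaN' "$f"; then
    echo "WARN $f: contains NaN (divergence?)"; continue
  fi
  m=$(jq -r '.metric // empty' "$f")
  [ -z "$m" ] && { echo "WARN $f: no metric field"; continue; }
  # awk does the float comparison; exit status 0 means "below baseline"
  if awk -v m="$m" -v b="$BASELINE" 'BEGIN { exit !(m < b) }'; then
    echo "WARN $f: metric $m is below baseline $BASELINE"
  fi
done
```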

Step 6: Feishu Notification (if configured)

After results are collected, check ~/.codex/feishu.json:

  • Send an experiment_done notification: results summary table, delta vs baseline (see the sketch below)
  • If the config is absent or mode is "off", skip entirely (no-op)
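
A hedged sketch of the notification step. The config schema (webhook_url and mode fields) is an assumption, since the skill does not spell it out; the POST body follows Feishu's custom-bot text message format:

```bash
# Send a Feishu notification if ~/.codex/feishu.json is present and enabled.
CFG=~/.codex/feishu.json
[ -f "$CFG" ] || exit 0                               # no config: no-op
[ "$(jq -r '.mode // "off"' "$CFG")" = "off" ] && exit 0
SUMMARY='experiment_done: Method A +Y.Y vs baseline'  # build from the table
curl -s -X POST "$(jq -r '.webhook_url' "$CFG")" \
  -H 'Content-Type: application/json' \
  -d "$(jq -n --arg t "$SUMMARY" '{msg_type: "text", content: {text: $t}}')"
```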

Key Rules

  • Always show raw numbers before interpretation
  • Compare against the correct baseline (same config)
  • Note if experiments are still running (check progress bars, iteration counts); see the sketch after this list
  • If results look wrong, check training logs for errors before concluding
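
For the still-running check above, one low-tech sketch: compare a session's last progress counter across two captures taken a short interval apart. The N/M pattern assumes tqdm-style progress logs, and <server>/<name> are placeholders:

```bash
# Grab the last "done/total" counter visible in a session's scrollback.
snap() {
  ssh <server> "screen -S $1 -X hardcopy /tmp/prog.txt \
    && grep -Eo '[0-9]+/[0-9]+' /tmp/prog.txt | tail -1"
}
before=$(snap <name>); sleep 30; after=$(snap <name>)
if [ "$before" = "$after" ]; then
  echo "no progress in 30s (finished or stalled?)"
else
  echo "still running: $after"
fi
```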