Auto-claude-code-research-in-sleep monitor-experiment
Monitor running experiments, check progress, collect results. Use when user says "check results", "is it done", "monitor", or wants experiment output.
install
source · Clone the upstream repo
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/skills-codex/monitor-experiment" ~/.claude/skills/wanshuiyin-auto-claude-code-research-in-sleep-monitor-experiment-02e14c && rm -rf "$T"
manifest:
skills/skills-codex/monitor-experiment/SKILL.md
source content
Monitor Experiment Results
Monitor: $ARGUMENTS
Workflow
Step 1: Check What's Running
ssh <server> "screen -ls"
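To act on each session in the later steps, the session names can be pulled out of the listing. This is a minimal sketch, assuming the usual `screen -ls` layout in which every session line is indented and starts with `<pid>.<name>`; the helper name `list_sessions` is hypothetical.

```shell
# Hypothetical helper: read `screen -ls` output on stdin and print one
# session name per line (the part after "<pid>.").
list_sessions() {
  awk '/^[ \t]+[0-9]+\./ { sub(/^[0-9]+\./, "", $1); print $1 }'
}

# Usage (with <server> as the same placeholder used in this step):
#   ssh <server> "screen -ls" | list_sessions
```

Header and footer lines of the listing are skipped because they are not indented.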
Step 2: Collect Output from Each Screen
For each screen session, capture the last N lines:
ssh <server> "screen -S <name> -X hardcopy /tmp/screen_<name>.txt && tail -50 /tmp/screen_<name>.txt"
If hardcopy fails, fall back to the experiment's log files or to any output captured with tee.
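When several sessions are running, the capture command above can be generated per session rather than typed by hand. The sketch below only builds the command string; `capture_cmd` is a hypothetical helper, and the session names in the usage line are made up.

```shell
# Hypothetical helper: print the remote command that dumps one screen
# session to /tmp and shows its last N lines (default 50).
capture_cmd() {
  name=$1
  lines=${2:-50}
  printf 'screen -S %s -X hardcopy /tmp/screen_%s.txt && tail -n %s /tmp/screen_%s.txt\n' \
    "$name" "$name" "$lines" "$name"
}

# Usage (exp_a and exp_b are made-up session names):
#   for s in exp_a exp_b; do ssh <server> "$(capture_cmd "$s")"; done
```

Keeping the command construction separate from the ssh call makes it easy to print and inspect the command before running it.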
Step 3: Check for JSON Result Files
ssh <server> "ls -lt <results_dir>/*.json 2>/dev/null | head -20"
If JSON results exist, fetch and parse them:
ssh <server> "cat <results_dir>/<latest>.json"
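The `<latest>` file can also be picked automatically by modification time. This is a sketch, assuming it runs on the machine that holds the results directory (or is adapted into the ssh command); `latest_json` is a hypothetical helper name.

```shell
# Hypothetical helper: print the most recently modified *.json file in a
# directory. Prints nothing and returns non-zero if there is none.
latest_json() {
  dir=$1
  latest=$(ls -t "$dir"/*.json 2>/dev/null | head -n 1)
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Usage (run where the results live):
#   cat "$(latest_json <results_dir>)"
```

Sorting by modification time mirrors the `ls -lt` listing used above, so the file chosen here is the first entry of that listing.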
Step 4: Summarize Results
Present results in a comparison table:
| Experiment | Metric | Delta vs Baseline | Status |
|-----------|--------|-------------------|--------|
| Baseline | X.XX | — | done |
| Method A | X.XX | +Y.Y | done |
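A table of this shape can be produced mechanically once the metrics are extracted. The sketch below assumes the input is plain text with one `name value status` triple per line and the baseline on the first line; that input format and the helper name `results_table` are assumptions, not part of the skill.

```shell
# Hypothetical helper: read "name value status" lines on stdin (baseline
# first) and print a comparison table with each delta vs the baseline.
results_table() {
  awk '
    BEGIN {
      print "| Experiment | Metric | Delta vs Baseline | Status |"
      print "|------------|--------|-------------------|--------|"
    }
    NR == 1 { base = $2; printf "| %s | %.2f | - | %s |\n", $1, $2, $3; next }
            { printf "| %s | %.2f | %+.2f | %s |\n", $1, $2, $2 - base, $3 }
  '
}

# Usage (made-up numbers):
#   printf 'Baseline 70.00 done\nMethodA 72.50 done\n' | results_table
```

The `%+.2f` format keeps the sign on every delta, so regressions stand out in the table.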
Step 5: Interpret
- Compare against known baselines
- Flag unexpected results (negative delta, NaN, divergence)
- Suggest next steps based on findings
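The flagging rule above can be sketched as a small classifier. It assumes a higher metric is better and covers only non-finite values and negative deltas; spotting divergence still means reading the training curve. `flag_result` is a hypothetical helper name.

```shell
# Hypothetical helper: classify one metric against its baseline.
# Assumes higher is better. Prints "non-finite", "regression", or "ok".
flag_result() {
  value=$1
  baseline=$2
  case $value in
    [Nn][Aa][Nn]|*[Ii][Nn][Ff]*) echo "non-finite"; return 0 ;;
  esac
  awk -v v="$value" -v b="$baseline" \
    'BEGIN { if (v - b < 0) print "regression"; else print "ok" }'
}

# Usage (made-up numbers):
#   flag_result 68.5 70.0
```

For metrics where lower is better, such as loss, the comparison would need to be flipped.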
Step 6: Feishu Notification (if configured)
After results are collected, check ~/.codex/feishu.json:
- Send experiment_done notification: results summary table, delta vs baseline
- If config absent or mode "off": skip entirely (no-op)
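The skip condition can be checked before any notification is attempted. The sketch below assumes the config stores the mode as a top-level `"mode"` string field and uses a crude grep instead of a real JSON parser; `feishu_enabled` is a hypothetical helper, and the notification call itself is left out because its API is not described here.

```shell
# Hypothetical helper: succeed (exit 0) only when the config file exists
# and its mode is not "off". Assumes a top-level "mode" string field.
feishu_enabled() {
  config=${1:-$HOME/.codex/feishu.json}
  [ -f "$config" ] || return 1
  if grep -Eq '"mode"[[:space:]]*:[[:space:]]*"off"' "$config"; then
    return 1
  fi
  return 0
}

# Usage:
#   if feishu_enabled; then echo "send experiment_done notification"; fi
```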
Key Rules
- Always show raw numbers before interpretation
- Compare against the correct baseline (same config)
- Note if experiments are still running (check progress bars, iteration counts)
- If results look wrong, check training logs for errors before concluding
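The last rule can be given a quick first pass by searching the training log for common failure markers. The patterns below are assumptions about what a typical log contains, not a list taken from the skill, and `scan_log` is a hypothetical helper name.

```shell
# Hypothetical helper: print log lines (with line numbers) that match common
# failure markers. Exit status 1 means nothing suspicious was found.
scan_log() {
  grep -nEi 'traceback|error|out of memory|loss.*nan' "$1"
}

# Usage (<log_file> is a placeholder):
#   scan_log <log_file> || echo "no obvious errors in the log"
```

A clean scan means only that none of these markers appear; it does not prove the run is healthy.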