Auto-claude-code-research-in-sleep / auto-review-loop-llm

Autonomous research review loop using any OpenAI-compatible LLM API. Configure via the llm-chat MCP server or environment variables. Trigger with "auto review loop llm" or "llm review".

Clone the full repository:

```shell
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep
```

Or install just this skill:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/auto-review-loop-llm" ~/.claude/skills/wanshuiyin-auto-claude-code-research-in-sleep-auto-review-loop-llm && rm -rf "$T"
```
`skills/auto-review-loop-llm/SKILL.md`

# Auto Review Loop (Generic LLM): Autonomous Research Improvement
Autonomously iterate: review → implement fixes → re-review, until the external reviewer gives a positive assessment or MAX_ROUNDS is reached.
Context: $ARGUMENTS
## Constants

- MAX_ROUNDS = 4
- POSITIVE_THRESHOLD: score >= 6/10, or verdict contains "accept", "sufficient", "ready for submission"
- REVIEW_DOC: `review-stage/AUTO_REVIEW.md` (cumulative log; fall back to `./AUTO_REVIEW.md` for legacy projects)
## LLM Configuration

This skill uses any OpenAI-compatible API for external review via the `llm-chat` MCP server.
### Configuration via MCP Server (Recommended)

Add to `~/.claude/settings.json`:

```json
{
  "mcpServers": {
    "llm-chat": {
      "command": "/usr/bin/python3",
      "args": ["/Users/yourname/.claude/mcp-servers/llm-chat/server.py"],
      "env": {
        "LLM_API_KEY": "your-api-key",
        "LLM_BASE_URL": "https://api.deepseek.com/v1",
        "LLM_MODEL": "deepseek-chat"
      }
    }
  }
}
```
### Supported Providers

| Provider | LLM_BASE_URL | LLM_MODEL |
|---|---|---|
| OpenAI | `https://api.openai.com/v1` | `gpt-4o`, `gpt-4o-mini` |
| DeepSeek | `https://api.deepseek.com/v1` | `deepseek-chat`, `deepseek-reasoner` |
| MiniMax | `https://api.minimax.chat/v1` | `abab6.5s-chat` |
| Kimi (Moonshot) | `https://api.moonshot.cn/v1` | `moonshot-v1-8k`, `moonshot-v1-32k` |
| ZhiPu (GLM) | `https://open.bigmodel.cn/api/paas/v4` | `glm-4`, `glm-4-flash` |
| SiliconFlow | `https://api.siliconflow.cn/v1` | `deepseek-ai/DeepSeek-V3` |
| Alibaba Cloud Bailian (阿里云百炼) | `https://dashscope.aliyuncs.com/compatible-mode/v1` | `qwen-plus` |
| 01.AI (零一万物) | `https://api.lingyiwanwu.com/v1` | `yi-lightning` |
## API Call Method

### Primary: MCP Tool

```yaml
mcp__llm-chat__chat:
  prompt: |
    [Review prompt content]
  model: "deepseek-chat"
  system: "You are a senior ML reviewer..."
```
### Fallback: curl

```shell
# Note: the JSON body is double-quoted (with escaped inner quotes) so that
# ${LLM_MODEL} expands; a single-quoted body would send the literal string.
curl -s "${LLM_BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${LLM_API_KEY}" \
  -d "{
    \"model\": \"${LLM_MODEL}\",
    \"messages\": [
      {\"role\": \"system\", \"content\": \"You are a senior ML reviewer...\"},
      {\"role\": \"user\", \"content\": \"[review prompt]\"}
    ],
    \"max_tokens\": 4096
  }"
```
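The fallback response is standard OpenAI-compatible JSON, so the reviewer's text lives at `choices[0].message.content`. A minimal sketch for pulling it out of the curl output, assuming a local `python3` (the helper name `extract_content` is illustrative, not part of the skill):

```shell
# Illustrative helper: read an OpenAI-compatible chat response on stdin
# and print the assistant message text.
extract_content() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}

# Usage sketch: review=$(curl -s ... | extract_content)
echo '{"choices":[{"message":{"content":"Score: 7/10. Almost ready."}}]}' | extract_content
```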
## State Persistence (Compact Recovery)

Persist state to `review-stage/REVIEW_STATE.json` after each round:

```json
{
  "round": 2,
  "status": "in_progress",
  "last_score": 5.0,
  "last_verdict": "not ready",
  "pending_experiments": [],
  "timestamp": "2026-03-15T10:00:00"
}
```

Write this file at the end of every Phase E (after documenting the round). On completion, set `"status": "completed"`.
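The Phase E write could be sketched as below. This is a hypothetical helper (`write_state` is not part of the skill); field names mirror REVIEW_STATE.json above, and the temp-file-then-rename keeps the state file intact if the process dies mid-write:

```shell
# Hypothetical sketch of the per-round state write.
write_state() {  # usage: write_state ROUND STATUS SCORE VERDICT
  mkdir -p review-stage
  cat > review-stage/REVIEW_STATE.json.tmp <<EOF
{
  "round": $1,
  "status": "$2",
  "last_score": $3,
  "last_verdict": "$4",
  "pending_experiments": [],
  "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%S)"
}
EOF
  mv review-stage/REVIEW_STATE.json.tmp review-stage/REVIEW_STATE.json
}

write_state 2 in_progress 5.0 "not ready"
```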
## Workflow

### Initialization

- Check `review-stage/REVIEW_STATE.json` for recovery (fall back to `./REVIEW_STATE.json` if not found; legacy path)
- Read project context and prior reviews
- Initialize round counter
### Loop (up to MAX_ROUNDS)

#### Phase A: Review

If MCP available:

```yaml
mcp__llm-chat__chat:
  system: "You are a senior ML reviewer (NeurIPS/ICML level)."
  prompt: |
    [Round N/MAX_ROUNDS of autonomous review loop]
    [Full research context: claims, methods, results, known weaknesses]
    [Changes since last round, if any]

    1. Score this work 1-10 for a top venue
    2. List remaining critical weaknesses (ranked by severity)
    3. For each weakness, specify the MINIMUM fix
    4. State clearly: is this READY for submission? Yes/No/Almost

    Be brutally honest. If the work is ready, say so clearly.
```
If MCP NOT available:

```shell
# Double-quoted body (escaped inner quotes) so ${LLM_MODEL} expands.
curl -s "${LLM_BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${LLM_API_KEY}" \
  -d "{
    \"model\": \"${LLM_MODEL}\",
    \"messages\": [
      {\"role\": \"system\", \"content\": \"You are a senior ML reviewer (NeurIPS/ICML level).\"},
      {\"role\": \"user\", \"content\": \"[Full review prompt]\"}
    ],
    \"max_tokens\": 4096
  }"
```
#### Phase B: Parse Assessment

CRITICAL: Save the FULL raw response verbatim. Then extract:

- Score (numeric 1-10)
- Verdict ("ready" / "almost" / "not ready")
- Action items (ranked list of fixes)

STOP: if score >= 6 AND the verdict is "ready" or "almost" (take care not to match the "ready" inside "not ready")
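The stop condition can be sketched as a small shell check. `should_stop` is a hypothetical helper, not part of the skill; note that naive substring matching would treat "not ready" as positive, so that case is excluded first:

```shell
# Hypothetical sketch of the STOP check: positive when score >= 6 AND the
# verdict mentions "ready" or "almost", with "not ready" excluded explicitly.
should_stop() {  # usage: should_stop SCORE VERDICT  (exit 0 means stop)
  score_ok=$(awk -v s="$1" 'BEGIN { if (s + 0 >= 6) print "yes"; else print "no" }')
  v=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  case "$v" in
    *"not ready"*)    verdict_ok=no ;;
    *ready*|*almost*) verdict_ok=yes ;;
    *)                verdict_ok=no ;;
  esac
  [ "$score_ok" = yes ] && [ "$verdict_ok" = yes ]
}

should_stop 6.5 "Almost ready" && echo "stop loop" || echo "continue to next round"
```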
#### Phase C: Implement Fixes

Priority: metric additions > reframing > new experiments

#### Phase D: Wait for Results

Monitor remote experiments

#### Phase E: Document Round
Append to `review-stage/AUTO_REVIEW.md`:

```markdown
## Round N (timestamp)

### Assessment (Summary)
- Score: X/10
- Verdict: [ready/almost/not ready]
- Key criticisms: [bullet list]

### Reviewer Raw Response
<details>
<summary>Click to expand full reviewer response</summary>

[Paste the COMPLETE raw response here, verbatim and unedited.]

</details>

### Actions Taken
- [what was implemented/changed]

### Results
- [experiment outcomes, if any]

### Status
- [continuing to round N+1 / stopping]
```
Write `review-stage/REVIEW_STATE.json` with current state.
### Termination

- Set `review-stage/REVIEW_STATE.json` status to "completed"
- Write final summary
## Key Rules

- **Large file handling**: If the Write tool fails due to file size, immediately retry using Bash (`cat << 'EOF' > file`) to write in chunks. Do NOT ask the user for permission; just do it silently.
- **Anti-hallucination citations**: When adding references, NEVER fabricate BibTeX. Use the DBLP → CrossRef → `[VERIFY]` chain. Do NOT generate BibTeX from memory.
- Be honest about weaknesses
- Implement fixes BEFORE re-reviewing
- Document everything
- Include previous context in round 2+ prompts
- Prefer MCP tool over curl when available
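The large-file fallback named above can be sketched as follows (the round content here is placeholder example data in the shape of the Phase E template):

```shell
# Sketch of the large-file fallback: write the first chunk with a quoted
# heredoc ('EOF' disables variable expansion), then append chunks with >>.
mkdir -p review-stage
cat << 'EOF' > review-stage/AUTO_REVIEW.md
## Round 1 (2026-03-15)
### Assessment (Summary)
- Score: 5/10
EOF
cat << 'EOF' >> review-stage/AUTO_REVIEW.md
### Actions Taken
- Added missing baselines
EOF
```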
## Prompt Template for Round 2+

```yaml
mcp__llm-chat__chat:
  system: "You are a senior ML reviewer (NeurIPS/ICML level)."
  prompt: |
    [Round N/MAX_ROUNDS of autonomous review loop]

    ## Previous Review Summary (Round N-1)
    - Previous Score: X/10
    - Previous Verdict: [ready/almost/not ready]
    - Previous Key Weaknesses: [list]

    ## Changes Since Last Review
    1. [Action 1]: [result]
    2. [Action 2]: [result]

    ## Updated Results
    [paste updated metrics/tables]

    Please re-score and re-assess:
    1. Score this work 1-10 for a top venue
    2. List remaining critical weaknesses (ranked by severity)
    3. For each weakness, specify the MINIMUM fix
    4. State clearly: is this READY for submission? Yes/No/Almost

    Be brutally honest. If the work is ready, say so clearly.
```
## Output Protocols

Follow these shared protocols for all output files:

- **Output Versioning Protocol**: write the timestamped file first, then copy to the fixed name
- **Output Manifest Protocol**: log every output to MANIFEST.md
- **Output Language Protocol**: respect the project's language setting
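The versioning and manifest steps could look like the sketch below. The protocol files themselves live elsewhere in the repository, so the filenames here (`FINAL_SUMMARY*.md`) and the manifest line format are assumptions for illustration only:

```shell
# Hypothetical sketch: timestamped file first, then the fixed name,
# then a MANIFEST.md entry recording the output.
ts=$(date +%Y%m%d_%H%M%S)
mkdir -p review-stage
printf '# Final Summary\n' > "review-stage/FINAL_SUMMARY_${ts}.md"
cp "review-stage/FINAL_SUMMARY_${ts}.md" review-stage/FINAL_SUMMARY.md
printf -- '- %s: FINAL_SUMMARY_%s.md\n' "$(date -u +%FT%TZ)" "$ts" >> review-stage/MANIFEST.md
```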