```bash
# Clone the full Daily 2026 Skills collection
git clone https://github.com/Aradotso/trending-skills

# Or install just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/Aradotso/trending-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/aris-autonomous-research" ~/.claude/skills/aradotso-trending-skills-aris-autonomous-research && rm -rf "$T"
```
`skills/aris-autonomous-research/SKILL.md`:

---
name: aris-autonomous-research
description: ARIS (Auto-Research-In-Sleep) — Markdown-only autonomous ML research workflows using cross-model review loops, idea discovery, experiment automation, and paper writing with Claude Code or any LLM agent.
triggers:
  - run autonomous research pipeline
  - set up ARIS research workflow
  - use claude code for ML research
  - automate paper writing with AI
  - cross-model research review loop
  - run experiment automation with ARIS
  - install ARIS skills for claude code
  - generate research ideas while sleeping
---

# ARIS — Autonomous Research In Sleep

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

ARIS is a **zero-dependency, Markdown-only** autonomous ML research system. Each "skill" is a plain `SKILL.md` file that any LLM agent can read and execute. The system orchestrates **cross-model collaboration**: one model executes (Claude Code / Codex) while another critiques (GPT-5.4 / Gemini / GLM / MiniMax), breaking self-review blind spots without any framework or lock-in.

Core capabilities:

- 🔬 **Idea discovery** from a research direction or existing paper
- 🧪 **Experiment automation** with GPU-ready code generation and W&B tracking
- 📝 **Paper writing** (LaTeX, Beamer slides, A0 poster)
- 🔁 **Cross-model review loops** with score progression
- 📬 **Rebuttal drafting** with safety gates (no fabrication, no overpromise, full coverage)

---

## Installation

### 1. Clone the repository

```bash
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cd Auto-claude-code-research-in-sleep
```
### 2. Install skills into Claude Code
Copy the skills directory to your project, or symlink it:
```bash
# Option A: copy skills to your project
cp -r skills/ /your/project/.claude/skills/

# Option B: symlink (keeps skills up to date)
ln -s /path/to/Auto-claude-code-research-in-sleep/skills /your/project/.claude/skills
```
Claude Code auto-discovers SKILL.md files in `.claude/skills/**`. No registration step needed.
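To confirm the files landed where Claude Code looks, a quick check (assuming Option A's target path from above):

```bash
ls /your/project/.claude/skills/
```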
### 3. Configure the MCP reviewer (cross-model review)
ARIS uses the `llm-chat` MCP server so the executor model can call a second model for review. Install it:
```bash
cd mcp-servers/llm-chat
pip install -r requirements.txt
# or: uv pip install -r requirements.txt
```
Add to your `claude_desktop_config.json` (or Claude Code MCP config):
{ "mcpServers": { "llm-chat": { "command": "python", "args": ["/path/to/Auto-claude-code-research-in-sleep/mcp-servers/llm-chat/server.py"], "env": { "OPENAI_API_KEY": "$OPENAI_API_KEY", "LLM_MODEL": "gpt-4o" } } } }
For alternative reviewers (Kimi, GLM, MiniMax, DeepSeek), set `LLM_MODEL` to the model name and `LLM_BASE_URL` to the provider's OpenAI-compatible endpoint. No Claude or OpenAI API required.
### 4. (Optional) Codex MCP for OpenAI Codex as executor
{ "mcpServers": { "codex": { "command": "npx", "args": ["@openai/codex-mcp"], "env": { "OPENAI_API_KEY": "$OPENAI_API_KEY" } } } }
## Environment Variables
| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | For GPT reviewer | OpenAI API key |
| `ANTHROPIC_API_KEY` | For Claude executor | Anthropic API key |
| `LLM_BASE_URL` | Alternative reviewer | OpenAI-compatible base URL |
| `LLM_MODEL` | Alternative reviewer | Model name at that endpoint |
| `WANDB_API_KEY` | Experiment tracking | Weights & Biases key |
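A minimal shell setup covering the table might look like this sketch; the values are placeholders, which variables you need depends on your executor/reviewer choice, and the `LLM_*` values reuse the DeepSeek example shown later:

```bash
export OPENAI_API_KEY="sk-..."           # GPT reviewer
export ANTHROPIC_API_KEY="sk-ant-..."    # Claude executor
export WANDB_API_KEY="..."               # experiment tracking (optional)

# only needed for an alternative reviewer:
export LLM_BASE_URL="https://api.deepseek.com/v1"
export LLM_MODEL="deepseek-chat"
```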
Workflows & Commands
### Full pipeline (idea → paper)
/research-pipeline "factorized gap in discrete diffusion LMs"
With a reference paper and base codebase:
/research-pipeline "improve positional encoding in transformers" — ref paper: https://arxiv.org/abs/2104.09864, base repo: https://github.com/facebookresearch/fairseq
Parameters:
| Flag | Default | Effect |
|---|---|---|
| `ref paper` | — | ARIS reads this paper, finds weaknesses, targets them |
| `base repo` | — | Clone and use this repo as experiment base |
| `compact` | false | Generate lean summary files (good for short-context models) |
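The flags compose; a hypothetical invocation reusing the example paper and repo from above with `compact` enabled:

```
/research-pipeline "improve positional encoding in transformers" — ref paper: https://arxiv.org/abs/2104.09864, base repo: https://github.com/facebookresearch/fairseq, compact: true
```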
### Workflow 1 — Idea Discovery
```
/idea-discovery "sparse attention in long-context LLMs"
```
What it does:
- Searches recent arXiv papers in the direction
- Identifies open gaps and contradiction clusters
- Generates 3–5 novel ideas with novelty scores
- Runs `/research-refine` to sharpen the best idea into a problem statement (standalone usage shown below)
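`/research-refine` can also be run on its own (see the utility skills table below); a hypothetical standalone call on the generated proposal file, with the argument form assumed:

```
/research-refine "idea_proposal.md"
```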
### Workflow 1.5 — Experiment Bridge
```
/experiment-bridge "idea_proposal.md" — base repo: https://github.com/huggingface/diffusers
```
What it does:
- Reads the sharpened idea from Workflow 1
- Generates GPU-ready experiment code
- Runs GPT cross-model code review before deployment (`code review: true` by default)
- Executes training loop with W&B logging
- Saves results to `experiment_results/`
Example generated experiment scaffold:
```python
# experiment_results/run_001/train.py (auto-generated by /experiment-bridge)
import os

import torch
import wandb
from torch.utils.data import DataLoader

wandb.init(
    project=os.environ.get("WANDB_PROJECT", "aris-experiment"),
    config={
        "method": "factorized_discrete_diffusion",
        "lr": 3e-4,
        "epochs": 50,
        "batch_size": 32,
    },
)
config = wandb.config  # wandb stores the dict passed above here

# model and dataloader are constructed earlier in the generated scaffold
for epoch in range(config.epochs):
    for batch in dataloader:
        loss = model(batch)
        wandb.log({"loss": loss.item(), "epoch": epoch})
```
### Workflow 2 — Literature Review
```
/literature-review "discrete diffusion language models"
```
Anti-hallucination: ARIS verifies every citation via DBLP → CrossRef → marks unverified entries as `[VERIFY]`. It never fabricates BibTeX.
### Workflow 3 — Paper Writing
```
/paper-write "experiment_results/ + idea_proposal.md" — venue: NeurIPS
```
Supported venue templates: CVPR, NeurIPS, ICML, ICLR, ACL, AAAI, ACM MM.
### Workflow 4 — Rebuttal
```
/rebuttal "paper/ + reviews/" — venue: ICML, character limit: 5000
```
Parameters:
| Parameter | Default | Description |
|---|---|---|
| `venue` | | Target conference |
| `character limit` | Required | Hard character limit |
| `quick mode` | false | Stop after strategy (Phase 0–3), no draft |
| | false | Auto-run experiments when reviewers ask for new evidence |
| | 1 | GPT-5.4 stress-test passes on draft |
| | 3 | Per-reviewer follow-up round limit |
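A hypothetical invocation combining the documented parameters, stopping at strategy:

```
/rebuttal "paper/ + reviews/" — venue: NeurIPS, character limit: 5000, quick mode: true
```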
Three safety gates — rebuttal will NOT finalize if any fails:
- 🔒 No fabrication — every claim maps to paper/review/confirmed result
- 🔒 No overpromise — every promise is user-approved
- 🔒 Full coverage — every reviewer concern is tracked
Outputs:
- `PASTE_READY.txt` — exact character count, ready to paste to venue portal
- `REBUTTAL_DRAFT_rich.md` — extended version for manual editing
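Portals enforce limits strictly, so it is worth re-counting before pasting; a quick check, assuming the file sits in your working directory:

```bash
wc -m PASTE_READY.txt   # must not exceed the venue's character limit
```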
Presentation & Poster
```bash
/paper-slides "paper/"   # Beamer PDF + PPTX + speaker notes + Q&A prep
/paper-poster "paper/"   # A0/A1 poster PDF + PPTX + SVG (venue colors)
```
## Standalone Utility Skills
| Skill | Command | What it does |
|---|---|---|
| `training-check` | `/training-check` | Diagnose training instability, NaN, slow convergence |
| `result-to-claim` | `/result-to-claim` | Convert raw numbers into paper-ready claims |
| `ablation-planner` | `/ablation-planner` | Design minimal ablation study for a method |
| `experiment-plan` | `/experiment-plan` | Claim-driven experiment roadmap |
| `research-refine` | `/research-refine` | Sharpen vague idea into problem-anchored proposal |
| `formula-derivation` | `/formula-derivation` | Develop and verify research formulas |
| | | Generate figures (Gemini-assisted) |
| | | Draft grant proposal from research idea |
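Assuming the commands mirror the skill directory names (as the workflow commands above do), a hypothetical standalone call with a made-up log path:

```
/training-check "runs/run_001/train.log"
```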
## Alternative Model Combinations
ARIS requires only an OpenAI-compatible endpoint for the reviewer. Set environment variables:
```bash
# Kimi as reviewer
export LLM_BASE_URL="https://api.moonshot.cn/v1"
export LLM_MODEL="moonshot-v1-128k"
export LLM_API_KEY=$MOONSHOT_API_KEY

# DeepSeek as reviewer
export LLM_BASE_URL="https://api.deepseek.com/v1"
export LLM_MODEL="deepseek-chat"
export LLM_API_KEY=$DEEPSEEK_API_KEY

# MiniMax as reviewer
export LLM_BASE_URL="https://api.minimax.chat/v1"
export LLM_MODEL="abab6.5s-chat"
export LLM_API_KEY=$MINIMAX_API_KEY
```
Then in `mcp-servers/llm-chat/server.py` the `LLM_BASE_URL` env var overrides the OpenAI default. No code changes needed.
## Using with Codex CLI (no Claude)
ARIS ships a parallel `skills-codex/` directory with the same workflows adapted for OpenAI Codex CLI:
```bash
# Install Codex CLI
npm install -g @openai/codex

# Run a workflow
codex --skill skills/skills-codex/research-pipeline/SKILL.md \
  "improve contrastive learning in vision transformers"
```
## Using with Cursor
- Open Cursor settings → Rules → paste the content of `docs/CURSOR_ADAPTATION.md`
- Copy `skills/` to `.cursorrules-skills/` in your project
- In chat: `@research-pipeline "your research direction"`
## Using with Trae (ByteDance IDE)
See `docs/TRAE_ARIS_RUNBOOK_EN.md` for full setup. Trae supports SKILL.md natively via its plugin system.
## Input Templates
Pre-filled templates for every workflow live in `templates/`:
```
templates/
  research-pipeline.md    # Full pipeline input
  idea-discovery.md
  experiment-bridge.md
  literature-review.md
  paper-write.md
  rebuttal.md
  paper-slides.md
  paper-poster.md
```
Use a template:
```
/research-pipeline — template: templates/research-pipeline.md
```
## Project Structure
```
Auto-claude-code-research-in-sleep/
├── skills/
│   ├── research-pipeline/SKILL.md    # Main orchestration workflow
│   ├── idea-discovery/SKILL.md       # Workflow 1
│   ├── experiment-bridge/SKILL.md    # Workflow 1.5
│   ├── literature-review/SKILL.md    # Workflow 2
│   ├── paper-write/SKILL.md          # Workflow 3
│   ├── rebuttal/SKILL.md             # Workflow 4
│   ├── paper-slides/SKILL.md
│   ├── paper-poster/SKILL.md
│   ├── training-check/SKILL.md
│   ├── result-to-claim/SKILL.md
│   ├── ablation-planner/SKILL.md
│   ├── experiment-plan/SKILL.md
│   ├── research-refine/SKILL.md
│   ├── formula-derivation/SKILL.md
│   └── skills-codex/                 # Codex CLI variants
├── mcp-servers/
│   └── llm-chat/                     # OpenAI-compatible reviewer MCP
├── templates/                        # Input templates per workflow
├── docs/
│   ├── CURSOR_ADAPTATION.md
│   ├── TRAE_ARIS_RUNBOOK_EN.md
│   ├── ANTIGRAVITY_ADAPTATION.md
│   ├── MODELSCOPE_GUIDE.md           # Free tier setup
│   ├── MiniMax-GLM-Configuration.md
│   └── CODEX_GEMINI_REVIEW_GUIDE.md
└── README.md
```
## Common Patterns
### Pattern 1: Start from an arXiv paper you want to beat
/research-pipeline "improve method" — ref paper: https://arxiv.org/abs/2406.04329, base repo: https://github.com/org/repo
ARIS reads the paper → identifies weaknesses → clones repo → generates targeted ideas → runs experiments → writes paper.
### Pattern 2: Resume interrupted session
Add `compact: true` to any workflow. ARIS writes a lean `SESSION_SUMMARY.md`. On resume:
```
/research-pipeline — resume: SESSION_SUMMARY.md
```
### Pattern 3: Jump into the middle of a pipeline
Already have results? Jump to paper writing:
/paper-write "my_results/ + my_idea.md" — venue: NeurIPS
Already have a paper? Jump to rebuttal:
/rebuttal "paper/ + reviews/" — venue: ICML, character limit: 5000
### Pattern 4: Free tier via ModelScope
export LLM_BASE_URL="https://api-inference.modelscope.cn/v1" export LLM_MODEL="Qwen/Qwen2.5-72B-Instruct" export LLM_API_KEY=$MODELSCOPE_API_KEY
See `docs/MODELSCOPE_GUIDE.md` for zero-cost setup.
## Troubleshooting
### Skills not discovered by Claude Code
Ensure SKILL.md files are under `.claude/skills/` relative to your project root. Claude Code scans this path at startup.
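A quick way to verify what will be discovered (run from the project root; the path is the one Claude Code scans):

```bash
find .claude/skills -name "SKILL.md"
```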
### MCP reviewer not connecting
```bash
# Test the llm-chat server directly
cd mcp-servers/llm-chat
python server.py --test
# Should print: {"status": "ok", "model": "gpt-4o"}
```
W&B logging not working in experiment-bridge
```bash
wandb login   # uses WANDB_API_KEY env var, or prompts for manual entry
```
### Citation hallucination in literature-review
All unverified citations are tagged `[VERIFY]` in the output. Search DBLP manually for flagged entries before including them in your paper. Never remove the `[VERIFY]` tag without confirming.
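To collect the flagged entries in one pass, something like the following works; the output directory name is hypothetical, so point it at wherever your review landed:

```bash
grep -rn "\[VERIFY\]" literature_review/
```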
### Rebuttal exceeds character limit
ARIS tracks character count per section. If a draft exceeds the limit, it automatically trims supporting evidence (keeps claims, removes elaboration). You can also pass `quick mode: true` to get the strategy without the draft, then write targeted sections manually.
### Cross-model review loop not running (self-review fallback)
If the `llm-chat` MCP is unreachable, ARIS falls back to single-model review with a warning in the output. Check the MCP server logs:
```bash
tail -f ~/.claude/mcp-logs/llm-chat.log
```
### Session context overflow
Use `compact: true` on any workflow invocation to produce a compressed `SESSION_SUMMARY.md` that fits in a fresh context window.
## Extending ARIS
Every skill is a plain Markdown file. To create a custom skill:
```markdown
# my-custom-skill

## Trigger
When the user says "run my custom analysis"...

## Steps
1. Read input files
2. Call `mcp__llm-chat__chat` with the review prompt
3. Write output to `custom_output/`

## Output
- `custom_output/analysis.md`
- `custom_output/score.json`
```
Save it as `.claude/skills/my-custom-skill/SKILL.md` and Claude Code will discover it automatically.