PaperOrchestra paper-orchestra
Orchestrate the full PaperOrchestra (Song et al., 2026, arXiv:2604.05018) five-agent pipeline to turn unstructured research materials (idea, experimental log, LaTeX template, conference guidelines, optional figures) into a submission-ready LaTeX manuscript and compiled PDF. TRIGGER when the user asks to "write a paper from my experiments", "turn this idea and these results into a paper", "generate a conference submission", "run paper-orchestra on X", or otherwise wants the end-to-end paper-writing pipeline. Coordinates the outline-agent, plotting-agent, literature-review-agent, section-writing-agent, and content-refinement-agent skills.
```
git clone https://github.com/Ar9av/PaperOrchestra
```

Or install directly into your Claude skills directory:

```
T=$(mktemp -d) && git clone --depth=1 https://github.com/Ar9av/PaperOrchestra "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/skills/paper-orchestra" ~/.claude/skills/ar9av-paperorchestra-paper-orchestra \
  && rm -rf "$T"
```
`skills/paper-orchestra/SKILL.md`: paper-orchestra (Orchestrator)
Top-level driver for the PaperOrchestra pipeline. Read this document and follow the steps below. The detailed prompts and rules live in each sub-skill's
SKILL.md and references/ directories — you (the host agent) will load them
as you go.
Source paper: Song et al., PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing, arXiv:2604.05018, 2026. https://arxiv.org/pdf/2604.05018
What this skill produces
A complete submission package P = (`paper.tex`, `paper.pdf`) written into `workspace/final/`, plus a full audit trail under `workspace/` (outline, figures, refs, drafts, refinement worklog, provenance snapshot).
Inputs (the (I, E, T, G, F) tuple from the paper)
The workspace MUST contain:
| File | Symbol | Required | Description |
|---|---|---|---|
| `inputs/idea.md` | I | yes | Idea Summary (Sparse or Dense variant) |
| `inputs/experimental_log.md` | E | yes | Experimental Log: setup, raw numeric data, qualitative observations |
| `inputs/template.tex` | T | yes | LaTeX template for the target conference (with commands) |
| `inputs/conference_guidelines.md` | G | yes | Formatting rules, page limit, mandatory sections |
| `inputs/figures/` | F | no | Optional pre-existing figures. If empty, the plotting agent generates everything. |
`scripts/init_workspace.py` will scaffold this layout. `scripts/validate_inputs.py` will check it before the pipeline runs.
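The required-files check is mechanical enough to sketch. The snippet below is a minimal illustration of the rule the inputs table describes, not the actual contents of `validate_inputs.py`; the function name and return shape are assumptions.

```python
from pathlib import Path

# Required (I, E, T, G) inputs; inputs/figures/ (F) is optional.
REQUIRED = [
    "idea.md",
    "experimental_log.md",
    "template.tex",
    "conference_guidelines.md",
]

def missing_inputs(workspace: str) -> list[str]:
    """Return the required input files that are absent or empty."""
    inputs = Path(workspace) / "inputs"
    return [
        name for name in REQUIRED
        if not (inputs / name).is_file() or (inputs / name).stat().st_size == 0
    ]
```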
Pipeline (read `references/pipeline.md` for the full diagram)

```
Step 1: Outline         ──▶ outline.json                        (1 LLM call)
Step 2: Plotting    ─┐  ──▶ figures/*.png + captions.json       (~20–30 calls)
Step 3: Lit Review  ─┘  ──▶ intro_relwork.tex + refs.bib        (~20–30 calls)
Step 4: Section Writing ──▶ drafts/paper.tex                    (1 LLM call)
Step 5: Content Refine  ──▶ final/paper.tex + final/paper.pdf   (~5–7 calls, ~3 iters)
```
Step 2 and Step 3 are independent and MUST run in parallel when your host supports parallel sub-agents. If not, run Step 3 first (it has the longer wall time due to Semantic Scholar rate limits) and Step 2 second.
Critical pre-instruction (read once, apply always)
Before any LLM call that writes paper content (outline, intro/related work, section writing, refinement), you MUST prepend the Anti-Leakage Prompt at
references/anti-leakage-prompt.md to your system prompt. This is verbatim
from Appendix D.4 of the paper and prevents pre-training-data leakage. The
paper applies it uniformly across all baselines for fair comparison; we apply
it for fidelity and to keep generated papers grounded in the user's inputs.
Step-by-step execution
0. Scaffold, check for missing inputs, and validate
```
python skills/paper-orchestra/scripts/init_workspace.py --out workspace/
python skills/paper-orchestra/scripts/validate_inputs.py --workspace workspace/
```
Before failing on missing inputs, check whether aggregation can supply them:
| Inputs state | Action |
|---|---|
| `idea.md` and `experimental_log.md` both present and non-empty | Continue to Step 1. |
| Either is missing/empty, and the user mentioned a directory | Load the aggregation skill and run it with that directory as the source, then re-validate. |
| Either is missing/empty, no directory mentioned | Ask the user: "Your workspace is missing `idea.md` / `experimental_log.md`. Do you have a folder with research notes or agent history I can aggregate from? If so, tell me the path — or drop the files manually into `workspace/inputs/`." |
If validation still fails after aggregation (e.g.
template.tex or conference_guidelines.md are missing), stop and tell the user exactly which files remain outstanding.
Also probe the TeX installation (once per workspace, result cached):
```
python skills/paper-orchestra/scripts/check_tex_packages.py \
  --out workspace/tex_profile.json
```
The Section Writing Agent reads `tex_profile.json` to decide which LaTeX patterns to use (e.g., `Figure~\ref{}` vs `\cref{}`, whether to include `\usepackage{microtype}`, etc.). This eliminates compile-time package failures that previously required iterative manual edits.
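For reference, a TeX-availability probe can be built on `kpsewhich`, the standard TeX file-lookup tool: it prints the resolved path of a `.sty` file and exits 0 when the package is installed, and exits non-zero otherwise. This is a hypothetical sketch of such a probe, not the real `check_tex_packages.py`; the profile shape is an assumption.

```python
import json
import shutil
import subprocess

def probe_tex_packages(packages, out_path):
    """Record which .sty files the local TeX installation can resolve.

    Each entry maps package name -> True/False, or None when no TeX
    toolchain (kpsewhich) is on PATH and availability is unknown.
    """
    profile = {}
    have_kpsewhich = shutil.which("kpsewhich") is not None
    for pkg in packages:
        if not have_kpsewhich:
            profile[pkg] = None
            continue
        r = subprocess.run(["kpsewhich", f"{pkg}.sty"], capture_output=True)
        profile[pkg] = r.returncode == 0
    with open(out_path, "w") as f:
        json.dump(profile, f, indent=2)
    return profile
```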
1. Outline (Step 1 — 1 LLM call)
Load `skills/outline-agent/SKILL.md` and follow it. Output: `workspace/outline.json`. Validate with:

```
python skills/outline-agent/scripts/validate_outline.py workspace/outline.json
```

Halt the pipeline if validation fails — every downstream agent depends on the schema.
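Schema validation here can be as simple as checking for the keys the orchestrator consumes downstream (`plotting_plan`, `intro_related_work_plan`). A minimal sketch; the real schema in `validate_outline.py` is certainly richer, and the function below is illustrative only.

```python
import json

# Keys the orchestrator itself reads in Steps 2 and 3; the real
# outline schema contains more than these.
ORCHESTRATOR_KEYS = ["plotting_plan", "intro_related_work_plan"]

def outline_schema_errors(path: str) -> list[str]:
    """Return human-readable schema errors for outline.json; empty list if OK."""
    try:
        with open(path) as f:
            outline = json.load(f)
    except (OSError, json.JSONDecodeError) as e:
        return [f"cannot read outline: {e}"]
    if not isinstance(outline, dict):
        return ["outline.json must be a JSON object"]
    return [f"missing key: {k}" for k in ORCHESTRATOR_KEYS if k not in outline]
```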
2 ∥ 3. Plotting and Literature Review (in parallel)
Parse `outline.json`. Extract:

- `outline.plotting_plan` → drives Step 2
- `outline.intro_related_work_plan` → drives Step 3
If your host supports parallel sub-agents (Claude Code's Agent tool with multiple concurrent calls; Cursor's parallel agents; Antigravity's worker pool), spawn two concurrent sub-tasks:
- Sub-task A: load `skills/plotting-agent/SKILL.md`, execute the plotting plan, and produce `workspace/figures/<figure_id>.png` for every entry, plus `workspace/figures/captions.json`.
- Sub-task B: load `skills/literature-review-agent/SKILL.md`, execute the research strategy, and produce `workspace/drafts/intro_relwork.tex` and `workspace/refs.bib`.
If your host does not support parallel sub-agents, run Sub-task B first (it has slower wall-clock due to Semantic Scholar QPS limits) then Sub-task A. The artifacts are independent, so order doesn't affect correctness.
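If you are driving the pipeline from a plain script rather than a sub-agent host, the independence of the two sub-tasks maps directly onto ordinary concurrency primitives. A toy sketch; the worker functions are stand-ins for the two sub-skills, not real APIs.

```python
from concurrent.futures import ThreadPoolExecutor

def run_plotting():
    # Stand-in for Sub-task A: figures/*.png + captions.json
    return "figures done"

def run_literature_review():
    # Stand-in for Sub-task B: intro_relwork.tex + refs.bib
    return "lit review done"

# The artifacts are independent, so ordering never affects correctness;
# parallelism only shortens wall-clock time. Submit B first since it is
# the slower task (Semantic Scholar rate limits).
with ThreadPoolExecutor(max_workers=2) as pool:
    fut_b = pool.submit(run_literature_review)
    fut_a = pool.submit(run_plotting)
    results = (fut_a.result(), fut_b.result())
```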
4. Section Writing (Step 4 — ONE single multimodal LLM call)
Load `skills/section-writing-agent/SKILL.md` and follow it. This is one single call in the paper (App. B: "Section Writing Agent (1 call)") — do not split it per section. The agent receives:

- `outline.json`
- `idea.md`, `experimental_log.md`
- `intro_relwork.tex` (already filled from Step 3 — preserve verbatim)
- `refs.bib` (the citation map)
- `conference_guidelines.md`
- the actual figure image files from `workspace/figures/` (multimodal input)

Output: `workspace/drafts/paper.tex` (a complete LaTeX document).
Then run the deterministic gates:
```
python skills/section-writing-agent/scripts/orphan_cite_gate.py workspace/drafts/paper.tex workspace/refs.bib
python skills/section-writing-agent/scripts/latex_sanity.py workspace/drafts/paper.tex
python skills/paper-orchestra/scripts/anti_leakage_check.py workspace/drafts/paper.tex
```
If any gate fails, the host agent must fix the issue (re-prompting the writing step with the gate's error report) before proceeding.
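The first gate is straightforward to sketch: collect every `\cite…{}` key in the draft and subtract the keys defined in `refs.bib`. A minimal approximation of the orphan-cite check; the real script likely handles more edge cases (e.g., unused bib entries, exotic citation macros).

```python
import re

def orphan_citations(tex_source: str, bib_source: str) -> set[str]:
    """Return cite keys used in the draft that have no entry in the .bib file."""
    cited = set()
    # Matches \cite, \citep, \citet, etc., with optional [..] arguments.
    for args in re.findall(r"\\cite[a-zA-Z]*(?:\[[^\]]*\])*\{([^}]*)\}", tex_source):
        cited.update(key.strip() for key in args.split(",") if key.strip())
    # Matches entry heads like "@article{key," in the .bib source.
    defined = set(re.findall(r"@\w+\s*\{\s*([^,\s]+)\s*,", bib_source))
    return cited - defined
```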
5. Content Refinement (Step 5 — ~3 iterations, ~5-7 calls)
Load `skills/content-refinement-agent/SKILL.md` and follow it. The skill implements the loop with strict halt rules from `halt-rules.md`. Maintain `workspace/refinement/worklog.json` and snapshot each iteration into `workspace/refinement/iter<N>/`.
Halt conditions (any one triggers the loop to stop and accept the current best snapshot):
- Iteration count reaches the cap (default 3, see `halt-rules.md`).
- Overall score from the simulated reviewer decreases vs the previous iteration → revert to previous snapshot, halt.
- Overall score ties but at least one sub-axis decreases while none gain compensatingly (negative net sub-axis change) → revert, halt.
- Reviewer issues no new actionable weaknesses.
The accepted snapshot is copied to `workspace/final/paper.tex`.
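The halt conditions reduce to a small decision function. A sketch, assuming reviewer scores arrive as an overall number plus per-axis numbers; that score shape is an assumption for illustration, not taken from the skill.

```python
def should_halt(iteration, cap, prev, cur, new_weaknesses):
    """Evaluate the halt rules on simulated-reviewer scores.

    prev/cur look like {"overall": 6.0, "axes": {"clarity": 3, ...}}.
    Returns (halt, revert): revert means restore the previous snapshot.
    """
    if cur["overall"] < prev["overall"]:
        return True, True            # overall score regressed: revert, halt
    if cur["overall"] == prev["overall"]:
        deltas = [cur["axes"][a] - prev["axes"][a] for a in cur["axes"]]
        if any(d < 0 for d in deltas) and sum(deltas) < 0:
            return True, True        # tie with net sub-axis loss: revert, halt
    if iteration >= cap or not new_weaknesses:
        return True, False           # cap reached or nothing actionable left
    return False, False              # keep iterating
```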
6. Compile and finalize
```
cd workspace/final && latexmk -pdf paper.tex
```
Then write `workspace/provenance.json` capturing input file hashes, outline hash, refs hash, figure hashes, and final tex/pdf hashes (helper: `scripts/snapshot.py` in the orchestrator scripts dir if you want a one-shot; otherwise the host agent computes hashes inline).
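Computing the snapshot inline is only a few lines. A sketch of the hashing step; the function name and output shape here are assumptions, and the real `snapshot.py` may record more metadata.

```python
import hashlib
import json
from pathlib import Path

def snapshot_provenance(paths, out_path):
    """Write a {path: sha256-hex} map for the given artifact files."""
    prov = {}
    for p in paths:
        p = Path(p)
        prov[str(p)] = hashlib.sha256(p.read_bytes()).hexdigest()
    Path(out_path).write_text(json.dumps(prov, indent=2, sort_keys=True))
    return prov
```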
Report to the user: the path to
workspace/final/paper.pdf, a brief summary of
which sections were drafted, citation count, refinement iterations completed,
and any gates that failed mid-pipeline.
Workspace layout
See `references/io-contract.md`. Summary:

```
workspace/
├── inputs/                  # user-provided
│   ├── idea.md
│   ├── experimental_log.md
│   ├── template.tex
│   ├── conference_guidelines.md
│   └── figures/             # optional pre-existing figures
├── outline.json             # Step 1 output
├── figures/                 # Step 2 output
│   ├── <figure_id>.png
│   └── captions.json
├── refs.bib                 # Step 3 output
├── drafts/                  # Step 3 + Step 4 output
│   ├── intro_relwork.tex
│   └── paper.tex
├── refinement/              # Step 5 working dir
│   ├── worklog.json
│   ├── iter1/
│   ├── iter2/
│   └── iter3/
├── final/                   # accepted snapshot + compiled PDF
│   ├── paper.tex
│   └── paper.pdf
└── provenance.json          # input/output hashes for reproducibility
```
Cost budget (from paper App. B)
Total: ~60–70 LLM calls per paper, ~40 minutes wall-time on the paper's setup. Budget breakdown:
| Step | Calls |
|---|---|
| Outline | 1 |
| Plotting | ~20–30 |
| Literature Review | ~20–30 |
| Section Writing | 1 |
| Content Refinement | ~5–7 |
Host integration
See
references/host-integration.md for per-host invocation details (Claude
Code, Cursor, Antigravity, Cline, Aider, OpenCode).
Resources
- `references/pipeline.md` — full step-by-step flow + parallelism rules + halt rules
- `references/io-contract.md` — workspace layout, input file schemas
- `references/anti-leakage-prompt.md` — verbatim from App. D.4, prepend to every writing call
- `references/paper-summary.md` — 1-page distillation of arXiv:2604.05018
- `references/host-integration.md` — per-host invocation guide
- `scripts/init_workspace.py` — scaffold workspace dir tree
- `scripts/validate_inputs.py` — verify (I, E, T, G) before running
- `scripts/anti_leakage_check.py` — grep draft for leaked author names/emails/affils