PaperOrchestra agent-research-aggregator

Pre-pipeline aggregator that scans AI agent cache directories (.claude, .cursor, .antigravity, .openclaw) or any user-specified directory for experimentation logs, extracts insights and numeric results, and formats them as PaperOrchestra-ready inputs (idea.md + experimental_log.md). TRIGGER when the user says "aggregate my agent logs for paper writing", "extract experiments from my coding agent history", "prepare PaperOrchestra inputs from my cache", "turn my agent logs into a paper", mentions a folder or directory they want to use as the basis for a paper, or wants to run PaperOrchestra but only has scattered agent experiment histories rather than structured inputs. Run this BEFORE paper-orchestra. Also called automatically by paper-orchestra when workspace/inputs/idea.md or workspace/inputs/experimental_log.md are missing.

Install

Source · Clone the upstream repo:

git clone https://github.com/Ar9av/PaperOrchestra

Claude Code · Install into ~/.claude/skills/:

T=$(mktemp -d) && git clone --depth=1 https://github.com/Ar9av/PaperOrchestra "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/agent-research-aggregator" ~/.claude/skills/ar9av-paperorchestra-agent-research-aggregator && rm -rf "$T"

Manifest: skills/agent-research-aggregator/SKILL.md

agent-research-aggregator


Should I run? (decision gate)

Before starting Phase 1, check whether aggregation is actually needed:

| Situation | Action |
|---|---|
| workspace/inputs/idea.md and workspace/inputs/experimental_log.md both exist and are non-empty | Skip this skill entirely. Proceed directly to paper-orchestra. |
| Either file is missing or empty, and the user provided a directory path | Run this skill with that directory as --search-roots. |
| Either file is missing or empty, and no directory was provided | Scan cwd and ~ by default; show the discovery summary to the user before continuing. |
| The inputs exist but look thin (e.g. idea.md has < 5 lines, no numeric data in experimental_log.md) | Ask the user whether to supplement with aggregation or proceed as-is. |

The skill is intentionally a pre-pass — it is cheap to skip and should only run when the structured inputs don't already exist.
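
A minimal sketch of the gate in Python, assuming the default workspace layout and the thin-input heuristics from the table above:

from pathlib import Path

def needs_aggregation(workspace: str = "./workspace") -> str | None:
    """Return a reason string if aggregation should run, else None."""
    inputs = Path(workspace) / "inputs"
    idea, log = inputs / "idea.md", inputs / "experimental_log.md"
    for f in (idea, log):
        if not f.exists() or f.stat().st_size == 0:
            return f"{f.name} is missing or empty"
    # "Thin" heuristics from the decision table; in this case ask the
    # user whether to supplement rather than aggregating unconditionally.
    if len(idea.read_text().splitlines()) < 5:
        return "idea.md has < 5 lines (ask user whether to supplement)"
    if not any(ch.isdigit() for ch in log.read_text()):
        return "experimental_log.md has no numeric data (ask user)"
    return None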


A pre-processing skill for PaperOrchestra (arXiv:2604.05018). Reads scattered experimentation artifacts from AI coding-agent cache directories and synthesizes them into the structured (I, E) input pair the PaperOrchestra pipeline expects.

[.claude/]  [.cursor/]  [.antigravity/]  [.openclaw/]
      │            │              │               │
      └────────────┴──────────────┴───────────────┘
                          │
                    Phase 1: Discovery
                  (discover_logs.py)
                          │
                    discovered_logs.json
                          │
                    Phase 2: Extraction
                  (LLM call per log batch)
                          │
                    raw_experiments.json
                          │
                    Phase 3: Synthesis
                  (LLM call — consolidate)
                          │
                    synthesis.json
                          │
                    Phase 4: Formatting
                  (format_po_inputs.py)
                          │
             ┌────────────┴────────────┐
      workspace/inputs/         workspace/ara/
        idea.md                   aggregation_report.md
        experimental_log.md       discovered_logs.json
                                  raw_experiments.json
                                  synthesis.json

The output drops directly into workspace/inputs/ so the user can immediately run paper-orchestra on the same workspace.


Inputs

| Parameter | Required | Default | Description |
|---|---|---|---|
| --search-roots | no | cwd, ~ | Comma-separated directories to scan for agent caches |
| --agents | no | all | Comma-separated subset: claude,cursor,antigravity,openclaw |
| --workspace | no | ./workspace | PaperOrchestra workspace root |
| --depth | no | 4 | Max directory scan depth (prevents runaway scans on large home dirs) |
| --since | no | none | Only include logs modified after this date (ISO 8601: 2025-01-01) |

The user specifies these when invoking the skill, or you may ask them for --search-roots if the current directory has no detectable agent caches.
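
To make --depth and --since concrete, here is one plausible filtering loop. This is an illustration of the semantics described in the table, not the actual discover_logs.py implementation:

import os
from datetime import datetime, timezone
from pathlib import Path

def iter_candidate_files(root: str, depth: int = 4, since: str | None = None):
    """Yield files at most `depth` levels below root, modified after `since` (ISO 8601)."""
    cutoff = datetime.fromisoformat(since).replace(tzinfo=timezone.utc) if since else None
    root_path = Path(root).expanduser().resolve()
    for dirpath, dirnames, filenames in os.walk(root_path):
        # Prune the walk in place once the depth cap is reached.
        if len(Path(dirpath).relative_to(root_path).parts) >= depth:
            dirnames.clear()
        for name in filenames:
            p = Path(dirpath) / name
            mtime = datetime.fromtimestamp(p.stat().st_mtime, tz=timezone.utc)
            if cutoff is None or mtime > cutoff:
                yield p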


Phase 1 — Discovery (deterministic)

Run the discovery script to catalog every relevant log file:

python skills/agent-research-aggregator/scripts/discover_logs.py \
    --search-roots <roots> \
    --agents <agents> \
    --depth <depth> \
    --since <since> \
    --out workspace/ara/discovered_logs.json

The script exits with code 2 when no --project filter is set (this is expected on the first run). It prints a "Projects found" list to stdout — show it to the user immediately.

If no logs are found at all: stop and ask the user to specify --search-roots or point you at a directory that contains agent cache folders.
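
If you drive the script programmatically rather than from the shell, the exit-code contract above is the whole interface. A minimal sketch (the wrapper itself is illustrative; it assumes the script expands ~ in comma-separated roots):

import subprocess

# Exit-code contract from the text above: 2 = projects found, selection
# required (expected on the first run); 0 = success.
result = subprocess.run(
    [
        "python", "skills/agent-research-aggregator/scripts/discover_logs.py",
        "--search-roots", ".,~",   # assumes the script expands ~ itself
        "--out", "workspace/ara/discovered_logs.json",
    ],
    capture_output=True, text=True,
)
print(result.stdout)  # the "Projects found" list; show it to the user
if result.returncode == 2:
    pass  # expected: proceed to Phase 1.5 project selection
elif result.returncode != 0:
    raise RuntimeError(f"discovery failed:\n{result.stderr}")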


Phase 1.5 — Project Selection (mandatory)

A paper can only be written from a single project. You must ask the user which project to use before any LLM processing begins.

  1. Display the numbered project list from the discovery summary, e.g.:
    Projects found:
      [1] /home/alice/projects/my-rl-experiment  (42 files)
      [2] /home/alice/projects/llm-eval-suite    (17 files)
      [3] /home/alice/projects/old-demo          (3 files)
    
  2. Ask: "Which project should this paper be based on? Please choose a number or paste the project path."
  3. Do not proceed to Phase 2 until the user has answered.
  4. Re-run discovery with the chosen project to filter the manifest:
python skills/agent-research-aggregator/scripts/discover_logs.py \
    --search-roots <roots> \
    --agents <agents> \
    --depth <depth> \
    --since <since> \
    --project "<chosen project path>" \
    --out workspace/ara/discovered_logs.json

This overwrites discovered_logs.json so only the selected project's files remain. The script exits 0 on success.

If the discovery finds only one project: skip the question and inform the user: "Only one project found: <path>. Using it for the paper." — then re-run with --project automatically.

If the discovery summary shows irrelevant files after filtering: ask the user whether to include or exclude them before continuing to Phase 2. Err on the side of inclusion — the extraction prompt is conservative.


Phase 2 — Extraction (LLM-assisted)

Process discovered logs in batches (group by agent type; keep batches under ~50 KB of raw text to stay within context limits; see the batching sketch after the list below):

For each batch:

  1. Read the log files in the batch (the script's --list output tells you which file paths to read).
  2. Apply the extraction prompt from references/extraction-prompt.md as your system message.
  3. Pass the raw log text as the user message.
  4. Collect the structured JSON the LLM returns (see schema in the prompt).
  5. Append to workspace/ara/raw_experiments.json.
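
As referenced above, a minimal Python sketch of forming the batches. The manifest field names (files, agent, path) are assumptions; the actual discovered_logs.json schema is not specified in this document.

import json
from collections import defaultdict
from pathlib import Path

MAX_BATCH_BYTES = 50_000  # ~50 KB of raw text per LLM call

# ASSUMPTION: manifest entries look like {"agent": ..., "path": ...} under a
# top-level "files" key; adjust to the real discovered_logs.json schema.
manifest = json.loads(Path("workspace/ara/discovered_logs.json").read_text())

by_agent = defaultdict(list)
for entry in manifest["files"]:
    by_agent[entry["agent"]].append(entry["path"])

batches = []
for agent, paths in by_agent.items():
    current, size = [], 0
    for path in paths:
        n = Path(path).stat().st_size
        if current and size + n > MAX_BATCH_BYTES:
            batches.append((agent, current))
            current, size = [], 0
        current.append(path)
        size += n
    if current:
        batches.append((agent, current))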

After all batches:

python skills/agent-research-aggregator/scripts/extract_experiments.py \
    --discovered workspace/ara/discovered_logs.json \
    --out workspace/ara/raw_experiments.json \
    --validate-only

Run this in --validate-only mode to check the combined JSON is well-formed and meets the minimum schema (experiments array non-empty, each entry has hypothesis or method or results). Fix any malformed entries before Phase 3.
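
That minimum schema is easy to restate in code; a hedged equivalent of the check (the real --validate-only mode may enforce more than this):

import json
from pathlib import Path

# Restates the documented minimum schema; extract_experiments.py
# --validate-only may check more than this.
data = json.loads(Path("workspace/ara/raw_experiments.json").read_text())
experiments = data.get("experiments", [])
assert experiments, "'experiments' array must be non-empty"
for i, exp in enumerate(experiments):
    if not any(k in exp for k in ("hypothesis", "method", "results")):
        raise ValueError(f"experiments[{i}] lacks hypothesis, method, and results")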


Phase 3 — Synthesis (LLM-assisted)

Consolidate possibly-redundant experiment records from multiple agent caches into a single coherent research narrative. This is ONE LLM call.

System message: Use references/synthesis-prompt.md verbatim.

User message:

<raw_experiments>
{contents of workspace/ara/raw_experiments.json}
</raw_experiments>

The LLM must return a synthesis.json with keys:

  • research_question — the overarching question being investigated
  • hypothesis — the core proposed solution / claim
  • method_summary — how the approach works (concise, no data leakage)
  • key_contributions — 2–5 bullet strings
  • experimental_setup — datasets, metrics, baselines, implementation notes
  • results_tables — array of {title, headers[], rows[]} markdown-table objects
  • qualitative_observations — free-form text blocks (what worked, what didn't, failure modes, ablation insights)
  • iteration_history — ordered list of {iteration_id, change_description, outcome} entries if multiple iterations are detected
  • open_questions — questions that remain unanswered in the logs

Save to workspace/ara/synthesis.json.
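
For orientation, a skeleton of the expected shape, written from the key list above. Every value is a placeholder, never real data, and list-vs-string choices are assumptions where the key list is ambiguous:

import json

# Placeholder skeleton only; all values are illustrative.
synthesis = {
    "research_question": "Does X improve Y under constraint Z?",
    "hypothesis": "Technique X improves metric Y because ...",
    "method_summary": "We apply X by ... (no result numbers here).",
    "key_contributions": ["Contribution 1", "Contribution 2"],
    "experimental_setup": "Datasets: ...; metrics: ...; baselines: ...",
    "results_tables": [
        {"title": "Main results", "headers": ["Method", "Metric"], "rows": [["X", "0.00"]]}
    ],
    "qualitative_observations": ["What worked: ...", "Failure mode: ..."],
    "iteration_history": [
        {"iteration_id": 1, "change_description": "...", "outcome": "..."}
    ],
    "open_questions": ["..."],
}
with open("workspace/ara/synthesis.json", "w") as f:
    json.dump(synthesis, f, indent=2)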

Note: By this point, the user has already selected a single project in Phase 1.5. The synthesis should represent one coherent research thread. If the LLM still surfaces multiple disconnected research questions, flag this as a data quality warning in the audit report (Phase 5) but do not re-ask for project selection — that decision was made earlier.


Phase 4 — Formatting (deterministic)

Convert

synthesis.json
into PaperOrchestra input files:

python skills/agent-research-aggregator/scripts/format_po_inputs.py \
    --synthesis workspace/ara/synthesis.json \
    --out workspace/inputs/

This generates two files:

workspace/inputs/idea.md (Sparse variant)

Follows the PaperOrchestra Sparse Idea format (arXiv:2604.05018, §3.1):

# [Synthesized Research Title]

## Problem
<2–4 sentence problem statement derived from research_question>

## Hypothesis
<hypothesis from synthesis>

## Method
<method_summary from synthesis>

## Key Contributions
<key_contributions as bullet list>

## Open Questions
<open_questions, if any>

workspace/inputs/experimental_log.md

Follows the PaperOrchestra Experimental Log format (App. D.3):

## 1. Experimental Setup
<experimental_setup from synthesis, formatted as prose + sub-bullets>

## 2. Raw Numeric Data
<results_tables converted to GitHub-Flavored Markdown tables>

## 3. Qualitative Observations
<qualitative_observations from synthesis>

### Iteration History
<iteration_history as an ordered narrative, if present>
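
The table conversion in section 2 is mechanical. A sketch of one plausible rendering of the {title, headers[], rows[]} objects into GitHub-Flavored Markdown (format_po_inputs.py's actual output may differ in detail):

def to_gfm(table: dict) -> str:
    """Render one {title, headers[], rows[]} object as a GFM table."""
    lines = [f"### {table['title']}", ""]
    lines.append("| " + " | ".join(table["headers"]) + " |")
    lines.append("|" + "---|" * len(table["headers"]))
    for row in table["rows"]:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)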

After running the script, review both files with the user:

  1. Read workspace/inputs/idea.md aloud and ask: "Does this accurately capture your research question and method?"
  2. Read the table headers from workspace/inputs/experimental_log.md and ask: "Are these the correct metrics and baselines?"

Revise based on feedback before proceeding to PaperOrchestra.


Phase 5 — Audit Report (deterministic)

python skills/agent-research-aggregator/scripts/format_po_inputs.py \
    --synthesis workspace/ara/synthesis.json \
    --out workspace/inputs/ \
    --report workspace/ara/aggregation_report.md

The --report flag makes the script also write aggregation_report.md, which contains:

  • Number of agent caches scanned, files read, batches processed
  • Per-agent breakdown (files found per agent type)
  • Experiment records extracted (count, date range)
  • Iterations detected (count, convergence direction)
  • Data quality warnings (gaps, low-confidence extractions, conflicting numbers)
  • Files written and their sizes

Show the report to the user. If the data quality section lists warnings, discuss them before running paper-orchestra — garbage in, garbage out.


Handoff to PaperOrchestra

Once the user has confirmed

idea.md
and
experimental_log.md
, the workspace is ready for the paper-orchestra pipeline. You still need:

| File | Status | Action |
|---|---|---|
| workspace/inputs/idea.md | ✓ generated | user review recommended |
| workspace/inputs/experimental_log.md | ✓ generated | user review recommended |
| workspace/inputs/template.tex | MISSING | ask user to provide their conference LaTeX template |
| workspace/inputs/conference_guidelines.md | MISSING | ask user to provide (page limit, deadline, formatting rules) |

Tell the user exactly which two files are still needed, then offer to run paper-orchestra once they supply them.


Error handling

| Situation | Action |
|---|---|
| Cache directory does not exist | Skip silently; note in report |
| File is binary or non-text | Skip; note in report |
| File > 200 KB | Truncate at 200 KB; note in report with path |
| LLM extraction returns malformed JSON | Re-prompt once with the parse error appended; if still malformed, log the batch as status: failed and continue |
| Synthesis returns > 1 research_question | Log as data quality warning in audit report; do not re-ask for project (was selected in Phase 1.5) |
| results_tables is empty after synthesis | Warn the user — PaperOrchestra's section-writing agent needs numeric data |
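
The binary-skip and truncation rows are mechanical enough to sketch. A hedged reader (the NUL-byte test for "binary or non-text" is an assumed heuristic, not taken from this document):

from pathlib import Path

MAX_BYTES = 200_000  # "200 KB" read decimally here; the script may use 200 * 1024

def read_log_safely(path: Path) -> str | None:
    """Return truncated text, or None to skip binary files.

    The NUL-byte test for "binary or non-text" is an assumed heuristic.
    """
    raw = path.read_bytes()[:MAX_BYTES]
    if b"\x00" in raw:
        return None  # skip; note in report
    return raw.decode("utf-8", errors="replace")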

Hard rules (never violate)

  1. Never write to agent cache directories. This skill is read-only on .claude/, .cursor/, .antigravity/, and .openclaw/.
  2. Never include personal information (emails, names, credentials, API keys) in generated idea.md or experimental_log.md. The extraction prompt instructs the LLM to strip PII; double-check before handoff (a spot-check sketch follows this list).
  3. Never fabricate results. If a metric appears in only one log with low confidence, mark it [UNVERIFIED] in the table rather than silently including it.
  4. Never proceed past Phase 1 without user confirmation of the discovered file list if the scan found > 50 files.
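
As flagged in rule 2, a quick spot-check before handoff. These patterns are illustrative only and nowhere near an exhaustive PII scanner:

import re
from pathlib import Path

# Illustrative spot-check for rule 2; not an exhaustive PII scanner.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "token-like string": re.compile(r"(?:sk-|ghp_|xox[bp]-)[A-Za-z0-9_-]{16,}"),
}

for name in ("idea.md", "experimental_log.md"):
    text = (Path("workspace/inputs") / name).read_text()
    for label, pattern in PATTERNS.items():
        for match in pattern.findall(text):
            print(f"{name}: possible {label}: {match!r}")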

Quick reference

# Phase 1: discover all projects (exits with code 2 — project selection required)
python skills/agent-research-aggregator/scripts/discover_logs.py \
    --search-roots .,~ --out workspace/ara/discovered_logs.json

# Phase 1.5: re-run with chosen project (exits 0)
python skills/agent-research-aggregator/scripts/discover_logs.py \
    --search-roots .,~ \
    --project "/home/user/projects/my-chosen-project" \
    --out workspace/ara/discovered_logs.json

# ... (Phase 2: LLM extraction calls, see above) ...

python skills/agent-research-aggregator/scripts/extract_experiments.py \
    --discovered workspace/ara/discovered_logs.json \
    --out workspace/ara/raw_experiments.json --validate-only

# ... (Phase 3: LLM synthesis call, see above) ...

python skills/agent-research-aggregator/scripts/format_po_inputs.py \
    --synthesis workspace/ara/synthesis.json \
    --out workspace/inputs/ \
    --report workspace/ara/aggregation_report.md