Aiwg corpus-export

Package corpus subsets as distribution archives. Select papers by cluster, topic, REF range, or custom filter; bundle PDFs, analysis docs, citation sidecars, web sources, and BibTeX into a tar.gz with manifest.

install
source · Clone the upstream repo
git clone https://github.com/jmagly/aiwg
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.agents/skills/corpus-export" ~/.claude/skills/jmagly-aiwg-corpus-export && rm -rf "$T"
manifest: .agents/skills/corpus-export/SKILL.md

Corpus Export

Package corpus subsets as distribution archives. Selects papers by cluster, topic, REF range, or custom filter and bundles all artifacts (PDF, analysis doc, citation sidecar, web source, BibTeX) into a portable archive with manifest.

Triggers

  • "export the corpus"
  • "package papers for distribution"
  • "create a distribution archive"
  • "export agentic canon"
  • "corpus export"
  • /corpus-export

Parameters

Selection (one required)

--cluster <name>

Select all papers in a named cluster (from /research-gap-detect).

/corpus-export --cluster "Agentic Canon"

--refs <range>

Explicit REF range or list. Supports ranges, multi-ranges, and individual IDs.

/corpus-export --refs REF-016:REF-024,REF-121
/corpus-export --refs REF-016,REF-018,REF-024
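
The range syntax above can be expanded roughly as follows. This is a minimal sketch, not the skill's actual parser: `parse_refs` is a hypothetical helper, and it assumes that ranges are inclusive, that IDs are zero-padded to three digits as in the examples, and that duplicates are dropped.

```python
import re

REF_PATTERN = re.compile(r"^REF-(\d+)$")


def _ref_number(token: str) -> int:
    """Return the numeric part of a REF identifier, e.g. 'REF-016' -> 16."""
    match = REF_PATTERN.match(token.strip())
    if match is None:
        raise ValueError(f"not a REF identifier: {token!r}")
    return int(match.group(1))


def parse_refs(expr: str) -> list[str]:
    """Expand an expression such as 'REF-016:REF-024,REF-121' into REF IDs."""
    resolved: list[str] = []
    for part in expr.split(","):
        if ":" in part:
            start_text, end_text = part.split(":", 1)
            start, end = _ref_number(start_text), _ref_number(end_text)
            if start > end:
                raise ValueError(f"descending range: {part.strip()}")
            numbers = list(range(start, end + 1))  # assumed inclusive
        else:
            numbers = [_ref_number(part)]
        for number in numbers:
            ref = f"REF-{number:03d}"
            if ref not in resolved:  # keep first occurrence only
                resolved.append(ref)
    return resolved
```

Under these assumptions, `parse_refs("REF-016:REF-024,REF-121")` yields ten identifiers: the nine in the range plus REF-121.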

--topic <name>

Select all papers tagged with a specific topic.

/corpus-export --topic "GUI Agents"

--filter <expr>

Custom filter expression (frontmatter field comparisons).

/corpus-export --filter "year>=2023 AND incoming>=10"
/corpus-export --filter "grade=High AND tag:reproducibility"
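
One way to read these expressions is as a conjunction of clauses checked against a paper's frontmatter. The sketch below is an illustration under stated assumptions, not the skill's evaluator: `matches` is a hypothetical helper, only `AND` is handled, `tag:<name>` is assumed to test membership in a `tags` list, and the field names `year`, `incoming`, and `grade` are taken from the examples above.

```python
import operator
import re

# Longer operators come first in the pattern so ">=" is not read as ">".
COMPARATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}
CLAUSE = re.compile(r"^(\w+)\s*(>=|<=|>|<|=)\s*(.+)$")


def matches(expr: str, frontmatter: dict) -> bool:
    """Return True when every AND-joined clause holds for the frontmatter."""
    for clause in re.split(r"\s+AND\s+", expr.strip()):
        if clause.startswith("tag:"):
            if clause[len("tag:"):] not in frontmatter.get("tags", []):
                return False
            continue
        parsed = CLAUSE.match(clause)
        if parsed is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        field, op, raw = parsed.groups()
        if field not in frontmatter:
            return False  # assumption: a missing field never matches
        actual = frontmatter[field]
        # Compare numerically when the frontmatter value is an integer.
        expected = int(raw) if isinstance(actual, int) else raw.strip()
        if not COMPARATORS[op](actual, expected):
            return False
    return True
```

For a paper with `{"year": 2023, "grade": "High", "incoming": 12, "tags": ["reproducibility"]}`, both example expressions above would match under these rules.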

Options

--output <path> (optional)

Output archive path. Default: .aiwg/research/exports/corpus-<selector>-<date>.tar.gz.
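
The default name can be derived from the selector and the export date. The sketch below assumes the selector is lowercased and non-alphanumeric runs become hyphens, which is consistent with the "Agentic Canon" archive name shown later in this document; `default_output_path` is a hypothetical helper, not part of the skill.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def default_output_path(selector: str, on: date, fmt: str = "tar.gz") -> str:
    """Build corpus-<selector>-<date>.<fmt> under the default exports directory."""
    # Assumed slug rule: lowercase, collapse anything else into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", selector.lower()).strip("-")
    name = f"corpus-{slug}-{on.isoformat()}.{fmt}"
    return str(PurePosixPath(".aiwg/research/exports") / name)
```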

--format tar.gz|zip (optional)

Archive format. Default: tar.gz.

--include (optional, repeatable)

Artifact types to include. Default: pdf,analysis,citations,bibtex.

Available: pdf, text, web, analysis, citations, bibtex, metadata, provenance.

--dry-run (optional)

List what would be included without creating the archive.

Execution Flow

Phase 1: Selection

Resolve the selection criteria to a list of REF-XXX identifiers:

  • --cluster: look up the cluster in the citation-network index and return its member REFs
  • --refs: parse the range expression
  • --topic: scan findings frontmatter for matching tags
  • --filter: evaluate the expression against frontmatter

Report resolved selection:

Selection: "Agentic Canon" cluster
Papers: 17 (REF-001, REF-016, REF-018, REF-024, ...)

Phase 2: Artifact Gathering

For each selected REF, gather the configured artifact types from canonical locations:

REF-016:
  ✓ PDF: sources/pdfs/full/REF-016-autogen.pdf (2.4 MB)
  ✓ Analysis: findings/REF-016-autogen.md (287 lines)
  ✓ Citations: documentation/citations/REF-016.md (43 outgoing, 12 incoming)
  ✓ BibTeX: citations/bibtex/REF-016.bib
  ✗ Web: no web source (PDF primary)
  ✓ Metadata: sources/metadata/REF-016.yaml

Flag missing artifacts:

REF-299:
  ✗ PDF: MISSING (acquisition failed)
  ✓ Analysis: findings/REF-299-stub.md (22 lines — STUB)
  ...
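
The gathering step amounts to a lookup per artifact type, with a missing file recorded rather than treated as an error. In the sketch below the path patterns are copied from the sample output above and should be read as assumptions about the corpus layout. Locations for web, text, and provenance artifacts are not shown in this document, so they are left out. `gather` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional

# Glob patterns per artifact type, taken from the sample paths above.
ARTIFACT_PATTERNS = {
    "pdf": "sources/pdfs/full/{ref}-*.pdf",
    "analysis": "findings/{ref}-*.md",
    "citations": "documentation/citations/{ref}.md",
    "bibtex": "citations/bibtex/{ref}.bib",
    "metadata": "sources/metadata/{ref}.yaml",
}


def gather(root: Path, ref: str, include: list[str]) -> dict[str, Optional[Path]]:
    """Map each requested artifact type to its file, or None when missing."""
    found: dict[str, Optional[Path]] = {}
    for kind in include:
        pattern = ARTIFACT_PATTERNS[kind].format(ref=ref)
        hits = sorted(root.glob(pattern))
        found[kind] = hits[0] if hits else None
    return found
```

Entries that come back as `None` are the ones a caller would report as missing.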

Phase 3: Manifest Generation

Write a MANIFEST.md to the archive root describing the export:

# Corpus Export Manifest

**Date**: 2026-04-13
**Selector**: --cluster "Agentic Canon"
**Papers**: 17
**Total size**: 48.3 MB

## Contents

| REF | Title | Year | GRADE | PDF | Analysis | Citations |
|-----|-------|------|-------|-----|----------|-----------|
| REF-016 | AutoGen | 2023 | High | ✓ | 287 lines | 43/12 |
| REF-018 | Multi-Agent Debate | 2024 | High | ✓ | 312 lines | 28/17 |
...

## Missing Artifacts

- REF-299: PDF missing (acquisition failed)
- REF-312: Analysis doc is a skeleton (<40 lines)

## Provenance

Generated by `corpus-export` v1.0 from corpus at:
- Fixity manifest: .aiwg/research/fixity-manifest.json (checksum: abc123...)
- Citation graph: indices/citation-network.md (generated 2026-04-13T10:00Z)
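
The Contents table in the manifest can be produced from per-paper records. This is a minimal sketch: `manifest_table` and the dictionary keys are hypothetical, and the Citations column is assumed to be outgoing/incoming, matching the "43 outgoing, 12 incoming" count shown for REF-016 during artifact gathering.

```python
def manifest_table(rows: list[dict]) -> str:
    """Render the Contents table of MANIFEST.md from per-paper records."""
    lines = [
        "| REF | Title | Year | GRADE | PDF | Analysis | Citations |",
        "|-----|-------|------|-------|-----|----------|-----------|",
    ]
    for row in rows:
        pdf_mark = "✓" if row["has_pdf"] else "✗"
        lines.append(
            f"| {row['ref']} | {row['title']} | {row['year']} | {row['grade']} "
            f"| {pdf_mark} | {row['analysis_lines']} lines "
            f"| {row['outgoing']}/{row['incoming']} |"
        )
    return "\n".join(lines)
```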

Phase 4: Archive Creation

Create the archive with structure:

corpus-agentic-canon-2026-04-13.tar.gz
├── MANIFEST.md
├── pdfs/
│   ├── REF-016-autogen.pdf
│   ├── REF-018-multi-agent-debate.pdf
│   └── ...
├── findings/
│   ├── REF-016-autogen.md
│   └── ...
├── citations/
│   ├── REF-016.md
│   └── ...
├── bibtex/
│   ├── REF-016.bib
│   └── all.bib                    # concatenated bibliography
└── README.md                      # extraction + usage instructions
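
For the tar.gz format, a layout like the one above can be written with Python's standard `tarfile` module. The sketch assumes the caller has already mapped each archive-internal name (for example `pdfs/REF-016-autogen.pdf`) to a source file. `write_archive` is a hypothetical helper, not the skill's implementation.

```python
import tarfile
from pathlib import Path


def write_archive(archive: Path, files: dict[str, Path]) -> None:
    """Write a gzip-compressed tar whose member names are the dict keys."""
    archive.parent.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "w:gz") as tar:
        # Sorted order keeps the member listing stable between runs.
        for arcname in sorted(files):
            tar.add(files[arcname], arcname=arcname)
```

Passing `arcname` is what flattens the corpus paths into the `pdfs/`, `findings/`, `citations/`, and `bibtex/` folders of the archive.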

Phase 5: Report

Corpus Export Complete
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Selector: --cluster "Agentic Canon"
Papers selected: 17
Artifacts bundled: 68 files
Missing artifacts: 2 (reported in MANIFEST.md)

Archive: .aiwg/research/exports/corpus-agentic-canon-2026-04-13.tar.gz
Size: 48.3 MB
SHA-256: abc123def456...

Contents:
  17 PDFs (45.1 MB)
  17 analysis docs (1.2 MB)
  17 citation sidecars (0.8 MB)
  17 BibTeX entries + all.bib (50 KB)
  1 MANIFEST.md (4 KB)
  1 README.md (2 KB)
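
Recipients can check the SHA-256 value from the report against the archive they received. A minimal sketch using the standard `hashlib` module follows; `sha256_of` is a hypothetical helper, and the file is read in chunks so large archives do not need to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```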

Archive Use Cases

Research sharing

Share a cluster with collaborators without sharing the entire corpus.

/corpus-export --cluster "Agentic Canon"

Snapshot for publication

Package the corpus state referenced by a paper for reproducibility.

/corpus-export --refs REF-016:REF-024 --include pdf,analysis,citations,provenance

Topic digest

Export everything on a specific topic for a focused review.

/corpus-export --topic "Evaluation" --filter "year>=2023"

Quality subset

Export only high-quality sources.

/corpus-export --filter "grade=High"

Integration Points

| Component | Relationship |
|-----------|--------------|
| research-gap-detect | Provides --cluster names |
| corpus-index-build | Provides topic and metadata for selection |
| research-quality-audit | Flags missing/skeleton artifacts in manifest |
| research-cite | Generates BibTeX entries bundled in export |
| Media curator /acquire | Source of PDF files packaged into export |

Examples

# Export a named cluster
/corpus-export --cluster "Agentic Canon"

# Export a REF range
/corpus-export --refs REF-016:REF-024,REF-121

# Export by topic
/corpus-export --topic "GUI Agents"

# Filter: recent high-grade papers with many citations
/corpus-export --filter "year>=2023 AND grade=High AND incoming>=10"

# Preview without creating archive
/corpus-export --cluster "Agentic Canon" --dry-run

# Minimal export (analysis docs and citation sidecars only)
/corpus-export --topic "Reproducibility" --include analysis,citations

# Custom output path
/corpus-export --refs REF-001:REF-100 --output /tmp/first-100.tar.gz

References

  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-gap-detect/SKILL.md — Provides cluster names
  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/corpus-index-build/SKILL.md — Provides topic/metadata indices
  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-quality-audit/SKILL.md — Flags missing artifacts
  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-cite/SKILL.md — Generates BibTeX
  • @$AIWG_ROOT/docs/integrations/media-curator-to-research-handoff.md — Source acquisition contract