Aiwg corpus-snapshot
Generate a comprehensive corpus snapshot report from a template, computing all metrics (dimensions, topology, degree distribution, delta from previous) and assisting with the analysis sections (clusters, chains, gaps).
```bash
# Clone the full repository
git clone https://github.com/jmagly/aiwg

# Or install only this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agentic/code/frameworks/research-complete/skills/corpus-snapshot" ~/.claude/skills/jmagly-aiwg-corpus-snapshot-c70131 && rm -rf "$T"
```
agentic/code/frameworks/research-complete/skills/corpus-snapshot/SKILL.md
Corpus Snapshot
Generate a point-in-time snapshot of the research corpus with computed metrics and analysis. Reads a snapshot template, fills [COMPUTE] sections with data, assists with [ANALYZE] sections, and writes the completed report.
Triggers
- "take a corpus snapshot"
- "generate corpus report"
- "snapshot the research"
- "corpus snapshot"
/corpus-snapshot
Parameters
- `--compute-only` (optional): Only compute data sections; skip analysis sections. Faster, fully automated.
- `--delta-only` (optional): Only compute the delta from the previous snapshot. Useful for tracking session progress.
- `--template <path>` (optional): Custom template path. Default: `.aiwg/reports/corpus-snapshot-template.md`.
- `--format` (optional): Output format: `full` (default for the report file), `summary` (terminal), or `json` (programmatic).
Prerequisites
Before generating a snapshot, the following should be current:
| Prerequisite | Command | Gates on |
|---|---|---|
| Citation edges complete | | Topology metrics |
| Indices up to date | | Group counts, hub analysis |
| Stub rate < 10% | | Snapshot validity |
If prerequisites are stale, the snapshot will include warnings.
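The stale-prerequisite warnings could be produced along these lines. This is a minimal Python sketch, not the skill's actual logic: the function name `prerequisite_warnings`, its boolean inputs, and the warning wording are hypothetical; only the 10% stub-rate threshold comes from the prerequisites table.

```python
# Threshold from the prerequisites table ("Stub rate < 10%").
STUB_RATE_THRESHOLD = 0.10


def prerequisite_warnings(total_papers, stub_count, citation_backfill_done, indices_current):
    """Return warnings for stale prerequisites (hypothetical helper, sketch only)."""
    warnings = []
    if not citation_backfill_done:
        warnings.append("Citation edges incomplete: topology metrics may be understated")
    if not indices_current:
        warnings.append("Indices stale: group counts and hub analysis may be outdated")
    stub_rate = stub_count / total_papers if total_papers else 0.0
    if not stub_rate < STUB_RATE_THRESHOLD:
        warnings.append(
            f"Stub rate {stub_rate:.0%} is not below {STUB_RATE_THRESHOLD:.0%}: snapshot validity reduced"
        )
    return warnings
```

The snapshot still gets written when warnings are present; they are meant to be surfaced in the report rather than to abort it.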
Execution Flow
Phase 1: Collect Raw Metrics
Scan the corpus and compute:
Dimensions:
- Total papers (node count)
- Total citation edges (edge count)
- Topics (unique tag count)
- Authors (unique author count)
- Year range (oldest → newest)
- Source types distribution
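As an illustration, the dimension counts reduce to set and counter operations over paper metadata. The sketch assumes each paper is a dict with hypothetical `tags`, `authors`, `year`, and `source_type` keys and that edges are `(citing, cited)` pairs; the real corpus format may differ.

```python
from collections import Counter


def dimensions(papers, edges):
    """Compute the Dimensions table from paper dicts and (citing, cited) edges."""
    tags = {t for p in papers for t in p.get("tags", [])}
    authors = {a for p in papers for a in p.get("authors", [])}
    years = [p["year"] for p in papers if p.get("year") is not None]
    return {
        "papers": len(papers),        # node count
        "edges": len(edges),          # citation edge count
        "topics": len(tags),          # unique tags
        "authors": len(authors),      # unique authors
        "year_range": (min(years), max(years)) if years else None,
        "source_types": dict(Counter(p.get("source_type", "unknown") for p in papers)),
    }
```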
Topology (from citation-network index):
- Graph density: edges / (nodes * (nodes-1))
- Average degree (mean edges per node)
- Max hub (node with most connections)
- Connected components count
- Isolated nodes (degree 0)
- Diameter estimate (longest shortest path in largest component)
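A sketch of the topology metrics, assuming the citation graph is available as a node list plus `(citing, cited)` pairs (the `topology` and `_bfs` names are hypothetical). Density uses the directed formula listed above. "Average degree" is read here as mean total degree, 2·edges/nodes; if the skill means out-degree, use edges/nodes instead. Components are weakly connected (edge direction ignored), and the diameter uses a double-sweep BFS, which is a cheap lower-bound estimate rather than an exact all-pairs computation.

```python
from collections import defaultdict, deque


def _bfs(adj, start):
    """Breadth-first search; returns {node: hop distance from start}."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def topology(nodes, edges):
    """Topology metrics for a citation graph given (citing, cited) edges."""
    n, m = len(nodes), len(edges)
    adj = defaultdict(set)          # undirected view for components and diameter
    degree = {v: 0 for v in nodes}  # total degree = in-degree + out-degree
    for citing, cited in edges:
        adj[citing].add(cited)
        adj[cited].add(citing)
        degree[citing] += 1
        degree[cited] += 1

    # Weakly connected components via repeated BFS.
    seen, components = set(), []
    for v in nodes:
        if v not in seen:
            comp = set(_bfs(adj, v))
            seen |= comp
            components.append(comp)

    # Double sweep: BFS from any node, then BFS again from the farthest node found.
    diameter = 0
    if components:
        largest = max(components, key=len)
        d1 = _bfs(adj, next(iter(largest)))
        far = max(d1, key=d1.get)
        diameter = max(_bfs(adj, far).values())

    hub = max(degree, key=degree.get) if degree else None
    return {
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
        "avg_degree": 2 * m / n if n else 0.0,
        "max_hub": (hub, degree[hub]) if hub is not None else None,
        "components": len(components),
        "isolated": [v for v in nodes if degree[v] == 0],
        "diameter_estimate": diameter,
    }
```

For a corpus of a few hundred papers an exact diameter (BFS from every node) is also affordable; the double sweep just keeps the cost linear.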
Degree Distribution:
- Histogram: number of nodes with degree 0, 1-2, 3-5, 6-10, 11-20, and 21+
- Power law fit (if applicable)
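The histogram is a fixed binning of node degrees; the sketch uses 21+ for the last bin so it does not overlap the 11-20 bin at degree 20. For the optional power-law fit it swaps in one common estimator, the discrete maximum-likelihood approximation of Clauset, Shalizi and Newman (2009): alpha ≈ 1 + n / Σ ln(k / (k_min − 0.5)). The function names and the default `k_min=1` are assumptions, and a fitted exponent alone does not show the distribution actually follows a power law.

```python
import math

# (low, high, label); the last bin is open-ended.
BINS = [(0, 0, "0"), (1, 2, "1-2"), (3, 5, "3-5"), (6, 10, "6-10"),
        (11, 20, "11-20"), (21, math.inf, "21+")]


def degree_histogram(degrees):
    """Count how many nodes fall into each degree bin."""
    counts = {label: 0 for _, _, label in BINS}
    for d in degrees:
        for low, high, label in BINS:
            if low <= d <= high:
                counts[label] += 1
                break
    return counts


def power_law_alpha(degrees, k_min=1):
    """Approximate discrete MLE of the power-law exponent for degrees >= k_min."""
    tail = [d for d in degrees if d >= k_min]
    if len(tail) < 2:
        return None  # too few points to estimate anything
    return 1 + len(tail) / sum(math.log(d / (k_min - 0.5)) for d in tail)
```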
Quality Distribution:
- GRADE breakdown: High / Moderate / Low / Very Low
- Doc depth: Full / Adequate / Stub / Skeleton (from quality-audit)
- Source availability: PDF present / Full text extracted / Missing
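GRADE levels and doc depth are both categorical breakdowns, so one helper can produce counts and percentages for either. This is a sketch; the `distribution` name and the level lists are taken from the bullets above, while the input format (a flat list of labels) is an assumption.

```python
from collections import Counter

GRADE_LEVELS = ["High", "Moderate", "Low", "Very Low"]
DEPTH_LEVELS = ["Full", "Adequate", "Stub", "Skeleton"]


def distribution(values, levels):
    """Return {level: (count, rounded percent)} in the order given by `levels`."""
    counts = Counter(values)
    total = len(values)  # labels outside `levels` still count toward the total
    return {
        level: (counts[level], round(100 * counts[level] / total) if total else 0)
        for level in levels
    }
```

Because each percentage is rounded independently, the row may not sum to exactly 100%.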
Phase 2: Compute Delta (if a previous snapshot exists)
Compare current metrics against the most recent snapshot:
```
Delta from previous snapshot (2026-04-10):
  Papers:      +12 (360 → 372)
  Edges:       +87 (1,160 → 1,247)
  Density:     +0.001 (0.008 → 0.009)
  New topics:  +2 (gui-agents, code-generation)
  Stubs fixed: 23 (88 → 65)
  New hubs:    REF-364 (entered top 10)
```
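Finding the previous snapshot and diffing it can be sketched as follows. This assumes snapshots follow the `corpus-snapshot-YYYY-MM-DD.md` naming used in Phase 4 (ISO dates sort correctly as strings) and that metrics are stored as dicts with hypothetical `papers`, `edges`, `density`, `stubs`, `topics`, and `hubs` keys, where `hubs` is ordered by degree.

```python
import re

SNAPSHOT_RE = re.compile(r"corpus-snapshot-(\d{4}-\d{2}-\d{2})\.md$")


def latest_snapshot(filenames):
    """Pick the most recent dated snapshot file, or None if there is none."""
    dated = []
    for name in filenames:
        match = SNAPSHOT_RE.search(name)
        if match:  # skips corpus-snapshot-template.md
            dated.append((match.group(1), name))
    return max(dated)[1] if dated else None


def compute_delta(prev, curr, top_n=10):
    """Compare two metric dicts and report (previous, current, change) per metric."""
    delta = {key: (prev[key], curr[key], curr[key] - prev[key])
             for key in ("papers", "edges", "density", "stubs")}
    delta["new_topics"] = sorted(set(curr["topics"]) - set(prev["topics"]))
    prev_top = set(prev["hubs"][:top_n])
    delta["new_hubs"] = [h for h in curr["hubs"][:top_n] if h not in prev_top]
    return delta
```

A negative `stubs` change corresponds to "stubs fixed"; density differences are floats and should be rounded for display.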
Phase 3: Fill Template Sections
Read the snapshot template and fill sections:
[COMPUTE] sections (fully automated):
- Dimensions table
- Topology metrics
- Degree distribution histogram
- GRADE distribution
- Delta table
[ANALYZE] sections (agent-assisted):
- Cluster narrative: describe the main clusters and their themes
- Chain analysis: identify citation chains (A→B→C→D) and their significance
- Gap narrative: summarize disconnected areas and bridge opportunities
- Trend analysis: what's growing, what's stagnant
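Chain analysis is agent-assisted, but candidate chains can be computed before the narrative is written. This sketch finds the single longest chain using a topological order plus dynamic programming. It assumes `(citing, cited)` edges and an acyclic graph; papers usually cite older work, but preprint cross-citations can create cycles, so the hypothetical `longest_chain` helper raises in that case.

```python
from collections import defaultdict, deque


def longest_chain(edges):
    """Return the longest citing -> cited chain in a DAG of (citing, cited) edges."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for citing, cited in edges:
        succ[citing].append(cited)
        indeg[cited] += 1
        nodes.update((citing, cited))

    # Kahn's algorithm: topological order, starting from papers nobody cites.
    queue = deque(sorted(v for v in nodes if indeg[v] == 0))
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    if len(order) != len(nodes):
        raise ValueError("citation graph has a cycle; longest chain is undefined")

    # best[v] = longest chain starting at v, filled in reverse topological order.
    best = {}
    for v in reversed(order):
        tail = max((best[w] for w in succ[v]), key=len, default=[])
        best[v] = [v] + tail
    return max(best.values(), key=len, default=[])
```

The agent would still decide which chains are significant; the longest one is only a starting point.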
Phase 4: Write Report
Write the completed snapshot to:
.aiwg/reports/corpus-snapshot-YYYY-MM-DD.md
With frontmatter:
```yaml
---
type: corpus-snapshot
date: 2026-04-13
papers: 372
edges: 1247
density: 0.009
components: 9
stub_rate: 0.17
previous: corpus-snapshot-2026-04-10.md
---
```
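The output path and frontmatter above are simple enough to render without a YAML library. A minimal sketch, assuming a hypothetical `metrics` dict that already holds the computed values; `render_frontmatter` and `report_path` are illustrative names, not part of the skill.

```python
from datetime import date


def report_path(snapshot_date):
    """Build the dated report path used in Phase 4."""
    return f".aiwg/reports/corpus-snapshot-{snapshot_date.isoformat()}.md"


def render_frontmatter(metrics, snapshot_date, previous=None):
    """Render the YAML frontmatter block as plain text."""
    lines = ["---", "type: corpus-snapshot", f"date: {snapshot_date.isoformat()}"]
    for key in ("papers", "edges", "density", "components", "stub_rate"):
        lines.append(f"{key}: {metrics[key]}")
    if previous:  # omitted for the very first snapshot
        lines.append(f"previous: {previous}")
    lines.append("---")
    return "\n".join(lines) + "\n"
```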
Phase 5: Report Summary
```
Corpus Snapshot Generated
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Papers: 372 (+12) | Edges: 1,247 (+87)
Density: 0.009 | Components: 9
Hub: REF-016 (34) | Isolated: 3
GRADE: 33% High, 24% Mod, 26% Low, 16% VLow
Stubs: 65 (17%) | Full text: 54%

Delta highlights:
  +12 papers inducted
  +87 citation edges (backfill)
  -23 stubs (expanded)
  +2 new topics

Written to: .aiwg/reports/corpus-snapshot-2026-04-13.md
```
Template Format
The default template uses markers for computed vs analyzed sections:
```markdown
# Corpus Snapshot — [DATE]

## Dimensions
[COMPUTE: dimensions-table]

## Topology
[COMPUTE: topology-metrics]

## Degree Distribution
[COMPUTE: degree-histogram]

## Quality Distribution
[COMPUTE: grade-distribution]
[COMPUTE: depth-distribution]

## Delta
[COMPUTE: delta-from-previous]

## Cluster Analysis
[ANALYZE: describe main clusters, their themes, and notable papers]

## Citation Chains
[ANALYZE: identify significant citation chains and their meaning]

## Gaps and Opportunities
[ANALYZE: summarize disconnected areas and bridge opportunities]

## Recommendations
[ANALYZE: what should be inducted next, what needs expansion]
```
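Filling the template amounts to substituting each `[COMPUTE: name]` marker with rendered output. A minimal sketch with Python's `re.sub` and a replacement function; the `renderers` mapping (marker name to a zero-argument function returning Markdown) is a hypothetical design. How `--compute-only` treats `[ANALYZE: ...]` markers is also an assumption here: they are replaced with a short note instead of being deleted with their headings.

```python
import re

COMPUTE_RE = re.compile(r"\[COMPUTE:\s*([\w-]+)\]")
ANALYZE_RE = re.compile(r"\[ANALYZE:\s*([^\]]*)\]")


def fill_template(template, renderers, compute_only=False):
    """Replace [COMPUTE: name] markers using renderers[name]()."""
    def compute(match):
        render = renderers.get(match.group(1))
        # Leave unknown markers visible so a missing renderer is easy to spot.
        return render() if render else match.group(0)

    filled = COMPUTE_RE.sub(compute, template)
    if compute_only:
        filled = ANALYZE_RE.sub("_Analysis skipped (--compute-only)._", filled)
    return filled
```

Without `compute_only`, the `[ANALYZE: ...]` markers survive unchanged so the agent can find and fill them in a second pass.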
Integration Points
| Component | Relationship |
|---|---|
| | Reads index metrics (topology, hubs, components) |
| | Reads depth distribution; gates if stub rate > 10% |
| | Must run before snapshot for accurate topology |
| | Cluster data feeds into gap narrative |
| | Snapshot is the detailed version of the health score |
Examples
```
# Full snapshot with analysis
/corpus-snapshot

# Just data, no analysis sections
/corpus-snapshot --compute-only

# Delta from previous snapshot only
/corpus-snapshot --delta-only

# Custom template
/corpus-snapshot --template .aiwg/reports/custom-template.md

# JSON metrics for dashboards
/corpus-snapshot --format json
```
References
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/corpus-index-build/SKILL.md — Index metrics source
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-quality-audit/SKILL.md — Depth distribution source
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/citation-backfill/SKILL.md — Prerequisite for topology
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-gap-detect/SKILL.md — Cluster data for narrative
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-status/SKILL.md — Health scoring complement