Aiwg corpus-snapshot

Generate a comprehensive corpus snapshot report from template, computing all metrics (dimensions, topology, degree distribution, delta from previous) and assisting with analysis sections (clusters, chains, gaps).

install
source · Clone the upstream repo
git clone https://github.com/jmagly/aiwg
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agentic/code/frameworks/research-complete/skills/corpus-snapshot" ~/.claude/skills/jmagly-aiwg-corpus-snapshot-c70131 && rm -rf "$T"
manifest: agentic/code/frameworks/research-complete/skills/corpus-snapshot/SKILL.md
source content

Corpus Snapshot

Generate a point-in-time snapshot of the research corpus with computed metrics and analysis. Reads a snapshot template, fills

[COMPUTE]
sections with data, assists with
[ANALYZE]
sections, and writes the completed report.

Triggers

  • "take a corpus snapshot"
  • "generate corpus report"
  • "snapshot the research"
  • "corpus snapshot"
  • /corpus-snapshot

Parameters

--compute-only
(optional)

Only compute data sections — skip analysis sections. Faster, fully automated.

--delta-only
(optional)

Only compute the delta from the previous snapshot. Useful for tracking session progress.

--template <path>
(optional)

Custom template path. Default:

.aiwg/reports/corpus-snapshot-template.md
.

--format
(optional)

Output format:

full
(default for the report file),
summary
(terminal),
json
(programmatic).

Prerequisites

Before generating a snapshot, the following should be current:

PrerequisiteCommandGates on
Citation edges complete
/citation-backfill
Topology metrics
Indices up to date
/corpus-index-build
Group counts, hub analysis
Stub rate < 10%
/research-quality-audit
Snapshot validity

If prerequisites are stale, the snapshot will include warnings.

Execution Flow

Phase 1: Collect Raw Metrics

Scan the corpus and compute:

Dimensions:

  • Total papers (node count)
  • Total citation edges (edge count)
  • Topics (unique tag count)
  • Authors (unique author count)
  • Year range (oldest → newest)
  • Source types distribution

Topology (from citation-network index):

  • Graph density: edges / (nodes * (nodes-1))
  • Average degree (mean edges per node)
  • Max hub (node with most connections)
  • Connected components count
  • Isolated nodes (degree 0)
  • Diameter estimate (longest shortest path in largest component)

Degree Distribution:

  • Histogram: how many nodes have degree 0, 1-2, 3-5, 6-10, 11-20, 20+
  • Power law fit (if applicable)

Quality Distribution:

  • GRADE breakdown: High / Moderate / Low / Very Low
  • Doc depth: Full / Adequate / Stub / Skeleton (from quality-audit)
  • Source availability: PDF present / Full text extracted / Missing

Phase 2: Compute Delta (if previous snapshot exists)

Compare current metrics against the most recent snapshot:

Delta from previous snapshot (2026-04-10):
  Papers:     +12 (360 → 372)
  Edges:      +87 (1,160 → 1,247)
  Density:    +0.001 (0.008 → 0.009)
  New topics:  +2 (gui-agents, code-generation)
  Stubs fixed: 23 (88 → 65)
  New hubs:    REF-364 (entered top 10)

Phase 3: Fill Template Sections

Read the snapshot template and fill sections:

[COMPUTE]
sections — fully automated:

  • Dimensions table
  • Topology metrics
  • Degree distribution histogram
  • GRADE distribution
  • Delta table

[ANALYZE]
sections — agent-assisted:

  • Cluster narrative: describe the main clusters and their themes
  • Chain analysis: identify citation chains (A→B→C→D) and their significance
  • Gap narrative: summarize disconnected areas and bridge opportunities
  • Trend analysis: what's growing, what's stagnant

Phase 4: Write Report

Write the completed snapshot to:

.aiwg/reports/corpus-snapshot-YYYY-MM-DD.md

With frontmatter:

---
type: corpus-snapshot
date: 2026-04-13
papers: 372
edges: 1247
density: 0.009
components: 9
stub_rate: 0.17
previous: corpus-snapshot-2026-04-10.md
---

Phase 5: Report Summary

Corpus Snapshot Generated
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Papers: 372 (+12)  |  Edges: 1,247 (+87)
Density: 0.009     |  Components: 9
Hub: REF-016 (34)  |  Isolated: 3
GRADE: 33% High, 24% Mod, 26% Low, 16% VLow
Stubs: 65 (17%)    |  Full text: 54%

Delta highlights:
  +12 papers inducted
  +87 citation edges (backfill)
  -23 stubs (expanded)
  +2 new topics

Written to: .aiwg/reports/corpus-snapshot-2026-04-13.md

Template Format

The default template uses markers for computed vs analyzed sections:

# Corpus Snapshot — [DATE]

## Dimensions
[COMPUTE: dimensions-table]

## Topology
[COMPUTE: topology-metrics]

## Degree Distribution
[COMPUTE: degree-histogram]

## Quality Distribution
[COMPUTE: grade-distribution]
[COMPUTE: depth-distribution]

## Delta
[COMPUTE: delta-from-previous]

## Cluster Analysis
[ANALYZE: describe main clusters, their themes, and notable papers]

## Citation Chains
[ANALYZE: identify significant citation chains and their meaning]

## Gaps and Opportunities
[ANALYZE: summarize disconnected areas and bridge opportunities]

## Recommendations
[ANALYZE: what should be inducted next, what needs expansion]

Integration Points

ComponentRelationship
corpus-index-build
Reads index metrics (topology, hubs, components)
research-quality-audit
Reads depth distribution; gates if stub rate > 10%
citation-backfill
Must run before snapshot for accurate topology
research-gap-detect
Cluster data feeds into gap narrative
research-status
Snapshot is the detailed version of the health score

Examples

# Full snapshot with analysis
/corpus-snapshot

# Just data, no analysis sections
/corpus-snapshot --compute-only

# Delta from previous snapshot only
/corpus-snapshot --delta-only

# Custom template
/corpus-snapshot --template .aiwg/reports/custom-template.md

# JSON metrics for dashboards
/corpus-snapshot --format json

References

  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/corpus-index-build/SKILL.md — Index metrics source
  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-quality-audit/SKILL.md — Depth distribution source
  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/citation-backfill/SKILL.md — Prerequisite for topology
  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-gap-detect/SKILL.md — Cluster data for narrative
  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-status/SKILL.md — Health scoring complement