Aiwg corpus-index-build
Build graph indices (by-topic, by-year, authors, citation-network) from corpus state using definitions in .aiwg/config.yaml. Replaces manual 3-agent dispatch with a single command.
git clone https://github.com/jmagly/aiwg
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agentic/code/frameworks/research-complete/skills/corpus-index-build" ~/.claude/skills/jmagly-aiwg-corpus-index-build-5e0bd8 && rm -rf "$T"
agentic/code/frameworks/research-complete/skills/corpus-index-build/SKILL.md
Corpus Index Build
Build research graph indices from corpus state. Reads graph definitions from
.aiwg/config.yaml and generates by-topic, by-year, authors, and citation-network indices from the current findings and citation data.
Triggers
- "build the research indices"
- "rebuild corpus graphs"
- "update the topic index"
- "index build"
/corpus-index-build
Parameters
- --graph <name> (optional): Build a single named graph. Must match a key in the config.yaml graphs section.
- --all (optional): Build all graphs defined in config, including those not in defaultBuild. Default behavior builds only defaultBuild graphs.
- --force (optional): Rebuild from scratch, ignoring cached state. Default: incremental (only rebuild if source data changed).
- --format (optional): Output format: full (default), summary, or json.
Configuration
Graphs are defined in .aiwg/config.yaml:

    graphs:
      by-topic:
        type: cluster
        source: findings
        groupBy: tags
        output: indices/by-topic.md
        defaultBuild: true
      by-year:
        type: timeline
        source: findings
        groupBy: year
        output: indices/by-year.md
        defaultBuild: true
      authors:
        type: entity
        source: findings
        groupBy: authors
        output: indices/authors.md
        defaultBuild: true
      citation-network:
        type: graph
        source: citations
        edges: [outgoing, incoming]
        output: indices/citation-network.md
        defaultBuild: false  # expensive, build on demand
      by-methodology:
        type: cluster
        source: findings
        groupBy: methodology
        output: indices/by-methodology.md
        defaultBuild: false

Execution Flow
Phase 1: Load Configuration
- Read graph definitions from .aiwg/config.yaml
- Determine which graphs to build:
  - No flags: build all graphs with defaultBuild: true
  - --graph <name>: build only the named graph
  - --all: build every defined graph
- Check for staleness (skip up-to-date graphs unless --force)
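
The selection rules above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the function name `select_graphs`, its parameters, and the choice to let `--graph` take precedence over `--all` are assumptions, and `GRAPHS` is a trimmed stand-in for the parsed `graphs` section of .aiwg/config.yaml.

```python
from typing import Optional


def select_graphs(graphs: dict, graph: Optional[str] = None, build_all: bool = False) -> list:
    """Return the names of the graphs to build, in config order.

    Hypothetical helper: `graph` mirrors --graph <name>, `build_all` mirrors --all.
    """
    if graph is not None:
        # --graph must match a key in the config's graphs section.
        if graph not in graphs:
            raise KeyError(f"unknown graph: {graph}")
        return [graph]
    if build_all:
        return list(graphs)
    # No flags: only graphs marked defaultBuild: true.
    return [name for name, spec in graphs.items() if spec.get("defaultBuild", False)]


# Trimmed stand-in for the parsed config; only the field used here is kept.
GRAPHS = {
    "by-topic": {"defaultBuild": True},
    "by-year": {"defaultBuild": True},
    "authors": {"defaultBuild": True},
    "citation-network": {"defaultBuild": False},
    "by-methodology": {"defaultBuild": False},
}

print(select_graphs(GRAPHS))
print(select_graphs(GRAPHS, graph="citation-network"))
```
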
Phase 2: Collect Source Data
For each graph, collect the required data:
Cluster graphs (by-topic, by-methodology):
- Scan all findings/REF-*.md frontmatter
- Extract the groupBy field values (tags, methodology)
- Build Map<group, Set<REF-XXX>>
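
As a rough sketch of the cluster grouping step, assuming the frontmatter has already been parsed into a dict keyed by REF id (the parsing itself is out of scope here). The function name `build_cluster_index` and the sample tag and methodology values are made up for illustration.

```python
from collections import defaultdict


def build_cluster_index(findings: dict, group_by: str) -> dict:
    """Group REF ids by the values of one frontmatter field.

    Accepts either a single string or a list of strings per finding;
    findings without the field are skipped (an assumption).
    """
    groups = defaultdict(set)
    for ref_id, meta in findings.items():
        value = meta.get(group_by)
        if value is None:
            continue
        values = [value] if isinstance(value, str) else value
        for v in values:
            groups[v].add(ref_id)
    return dict(groups)


# Illustrative frontmatter only; the tag assignments are invented.
FINDINGS = {
    "REF-001": {"tags": ["agentic-workflows", "multi-agent-systems"], "methodology": "survey"},
    "REF-016": {"tags": ["agentic-workflows"]},
}

by_topic = build_cluster_index(FINDINGS, "tags")
by_methodology = build_cluster_index(FINDINGS, "methodology")
```
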
Timeline graphs (by-year):
- Extract year from each finding's frontmatter
- Build Map<year, Set<REF-XXX>> sorted chronologically
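
A minimal sketch of the timeline step, again assuming pre-parsed frontmatter. `build_timeline_index` is a hypothetical name, and skipping findings without a year is an assumption rather than documented behavior.

```python
def build_timeline_index(findings: dict) -> dict:
    """Map each year to the set of REF ids published that year, oldest first."""
    by_year = {}
    for ref_id, meta in findings.items():
        year = meta.get("year")
        if year is None:
            continue  # assumption: findings without a year are left out
        by_year.setdefault(int(year), set()).add(ref_id)
    # Python dicts preserve insertion order, so sorting the items once
    # yields a chronologically ordered mapping.
    return dict(sorted(by_year.items()))


# Illustrative data; REF-099 is an invented entry with no year.
timeline = build_timeline_index({
    "REF-001": {"year": 2024},
    "REF-016": {"year": 2023},
    "REF-099": {},
})
```
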
Entity graphs (authors):
- Extract authors field from each finding
- Normalize author names (Last, First → canonical form)
- Build Map<author, Set<REF-XXX>>
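
The source does not say what the canonical form is, so the sketch below assumes "First Last" with collapsed whitespace. Both function names and the author names in the sample data are hypothetical.

```python
def normalize_author(name: str) -> str:
    """Turn 'Last, First' into 'First Last' and collapse extra whitespace.

    Assumption: 'First Last' is the canonical form; names without a comma
    are only whitespace-normalized.
    """
    name = " ".join(name.split())
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        return f"{first} {last}" if first else last
    return name


def build_author_index(findings: dict) -> dict:
    """Map each canonical author name to the set of REF ids they appear on."""
    index = {}
    for ref_id, meta in findings.items():
        for raw in meta.get("authors", []):
            index.setdefault(normalize_author(raw), set()).add(ref_id)
    return index


# Invented author names; the two spellings should merge into one entry.
authors = build_author_index({
    "REF-001": {"authors": ["Doe, Jane", "Roe, Richard"]},
    "REF-016": {"authors": ["Jane  Doe"]},
})
```
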
Citation graphs (citation-network):
- Read outgoing and incoming citation data (from citation-backfill output)
- Build adjacency list: Map<REF-XXX, {outgoing: Set, incoming: Set}>
- Compute: degree distribution, hubs, isolated nodes
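
One way to picture the adjacency list and the derived metrics is sketched below. It assumes the citation data arrives as a mapping from each REF id to the REF ids it cites, and derives the incoming side from that; the real citation-backfill output format is not described in the source. Function names and the sample edges are hypothetical.

```python
from collections import Counter


def build_citation_graph(outgoing: dict) -> dict:
    """Build {ref: {"outgoing": set, "incoming": set}} from outgoing citations."""
    graph = {}

    def node(ref):
        return graph.setdefault(ref, {"outgoing": set(), "incoming": set()})

    for src, targets in outgoing.items():
        node(src)  # make sure nodes with no citations still appear
        for dst in targets:
            node(src)["outgoing"].add(dst)
            node(dst)["incoming"].add(src)
    return graph


def summarize(graph: dict, top_n: int = 10) -> dict:
    """Compute total degree per node, the degree distribution, hubs, and isolated nodes."""
    degree = {ref: len(e["outgoing"]) + len(e["incoming"]) for ref, e in graph.items()}
    hubs = sorted(degree, key=lambda r: (-degree[r], r))[:top_n]
    isolated = sorted(r for r, d in degree.items() if d == 0)
    return {
        "degree": degree,
        "distribution": dict(Counter(degree.values())),
        "hubs": hubs,
        "isolated": isolated,
    }


# Invented edges for illustration.
graph = build_citation_graph({
    "REF-001": ["REF-016"],
    "REF-002": ["REF-016"],
    "REF-003": [],
})
summary = summarize(graph)
```

Ties between hubs are broken by REF id here so the ordering is deterministic; that tie-break is a design choice of this sketch.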
Phase 3: Generate Index Files
For each graph, write the index markdown to the configured
output path:
Cluster index format (by-topic example):

    # By Topic Index

    Generated: 2026-04-13T12:00:00Z
    Sources: 372 findings

    ## agentic-workflows (47 papers)

    | REF | Title | Year | GRADE |
    |-----|-------|------|-------|
    | REF-001 | Multi-Agent Orchestration | 2024 | High |
    | REF-016 | AutoGen Framework | 2023 | High |
    ...

    ## multi-agent-systems (31 papers)
    ...

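
A small rendering sketch for the cluster index layout follows. The frontmatter keys `title`, `year`, and `grade`, the alphabetical ordering of groups, and the function name are all assumptions; only the column headings and heading pattern are taken from the sample format.

```python
def render_cluster_index(title: str, groups: dict, findings: dict, generated: str) -> str:
    """Render a cluster index as markdown, one table per group."""
    lines = [
        f"# {title}",
        f"Generated: {generated}",
        f"Sources: {len(findings)} findings",
        "",
    ]
    for group in sorted(groups):  # assumption: groups listed alphabetically
        refs = sorted(groups[group])
        lines.append(f"## {group} ({len(refs)} papers)")
        lines.append("| REF | Title | Year | GRADE |")
        lines.append("|-----|-------|------|-------|")
        for ref in refs:
            meta = findings[ref]
            lines.append(f"| {ref} | {meta['title']} | {meta['year']} | {meta['grade']} |")
        lines.append("")
    return "\n".join(lines)


# Titles, years, and grades come from the sample rows; the grouping is illustrative.
FINDINGS = {
    "REF-001": {"title": "Multi-Agent Orchestration", "year": 2024, "grade": "High"},
    "REF-016": {"title": "AutoGen Framework", "year": 2023, "grade": "High"},
}
GROUPS = {"agentic-workflows": {"REF-001", "REF-016"}}

markdown = render_cluster_index("By Topic Index", GROUPS, FINDINGS, "2026-04-13T12:00:00Z")
print(markdown)
```
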
Citation network format:

    # Citation Network

    Nodes: 372 | Edges: 1,247 | Density: 0.009
    Avg degree: 6.7 | Max hub: REF-016 (34 edges)

    ## Top 10 Hubs
    | REF | Title | In | Out | Total |
    ...

    ## Isolated Nodes (0 edges)
    | REF | Title | Reason |
    ...

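
The source does not state how density and average degree are defined. The sample header is, however, consistent with the usual directed-graph density E / (N × (N − 1)) and a total-degree average of 2E / N, as the short check below shows. The function name is hypothetical.

```python
def network_stats(nodes: int, edges: int) -> dict:
    """Header statistics for a directed citation graph.

    Assumed definitions: density = E / (N * (N - 1)); average degree = 2E / N,
    counting each edge once for its source and once for its target.
    """
    density = edges / (nodes * (nodes - 1)) if nodes > 1 else 0.0
    avg_degree = 2 * edges / nodes if nodes else 0.0
    return {"density": round(density, 3), "avg_degree": round(avg_degree, 1)}


stats = network_stats(nodes=372, edges=1247)
print(stats)  # {'density': 0.009, 'avg_degree': 6.7}
```
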
Phase 4: Report

    Corpus Index Build
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    Graphs built: 3 / 3
      by-topic: 47 groups, 372 papers → indices/by-topic.md
      by-year: 8 years, 372 papers → indices/by-year.md
      authors: 412 authors, 372 papers → indices/authors.md

    Skipped (not in defaultBuild):
      citation-network: use --graph citation-network to build
      by-methodology: use --graph by-methodology to build

Staleness Detection
Each index file stores a
Generated: timestamp and a source checksum. On incremental builds:
- Compute checksum of all source frontmatter
- Compare against stored checksum in the index file
- Skip if identical (report "up to date")
- Rebuild if different
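
The comparison above could look roughly like the sketch below. The hash algorithm (SHA-256), the JSON serialization of the frontmatter, and the `Checksum:` line used to store the value in the index file are all assumptions; the source only says that a checksum is stored alongside the Generated: timestamp.

```python
import hashlib
import json
import re


def source_checksum(frontmatter_by_ref: dict) -> str:
    """Hash all source frontmatter in a key-order-independent way."""
    payload = json.dumps(frontmatter_by_ref, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def stored_checksum(index_text: str):
    """Read the checksum from a hypothetical 'Checksum: <hex>' line, if present."""
    match = re.search(r"^Checksum: ([0-9a-f]{64})$", index_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def is_stale(frontmatter_by_ref: dict, index_text: str, force: bool = False) -> bool:
    """True when the index must be rebuilt: forced, missing checksum, or changed sources."""
    if force:
        return True
    return stored_checksum(index_text) != source_checksum(frontmatter_by_ref)


# Illustrative frontmatter and index header.
frontmatter = {"REF-001": {"tags": ["agentic-workflows"], "year": 2024}}
index_text = (
    "# By Topic Index\n"
    "Generated: 2026-04-13T12:00:00Z\n"
    f"Checksum: {source_checksum(frontmatter)}\n"
)
print("up to date" if not is_stale(frontmatter, index_text) else "rebuild")
```

Sorting keys before hashing keeps the checksum stable when frontmatter fields are merely reordered, which avoids spurious rebuilds.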
Integration Points
| Component | Relationship |
|---|---|
| citation-backfill | Must run before citation-network graph build |
| research-gap-detect | Consumes citation-network graph for cluster analysis (#815) |
| corpus-snapshot | Reads index metrics for snapshot reports (#814) |
| aiwg index build | The existing CLI command; this skill extends it for research-specific graphs |
| | Reports index staleness as a health metric |
Examples

    # Build default graphs (by-topic, by-year, authors)
    /corpus-index-build

    # Build a specific graph
    /corpus-index-build --graph citation-network

    # Build everything including optional graphs
    /corpus-index-build --all

    # Force full rebuild
    /corpus-index-build --force

    # JSON output for programmatic use
    /corpus-index-build --format json

References
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/citation-backfill/SKILL.md — Prerequisite for citation-network graph
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-gap-detect/SKILL.md — Consumes citation-network
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/corpus-snapshot/SKILL.md — Reads index metrics
- @$AIWG_ROOT/src/artifacts/cli.ts — Existing aiwg index build infrastructure