Aiwg research-gap-detect

Build the mutual citation graph, find connected components, identify isolated clusters, and optionally search for bridge candidates and file gap issues. Automates the manual cluster analysis workflow.

install
source · Clone the upstream repo
git clone https://github.com/jmagly/aiwg
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agentic/code/frameworks/research-complete/skills/research-gap-detect" ~/.claude/skills/jmagly-aiwg-research-gap-detect-ef7dc4 && rm -rf "$T"
manifest: agentic/code/frameworks/research-complete/skills/research-gap-detect/SKILL.md

Research Gap Detect

Analyze the research corpus citation graph to find disconnected clusters, isolated papers, and gap opportunities. Optionally searches for bridge paper candidates and files gap issues.

Triggers

  • "find research gaps"
  • "detect clusters"
  • "cluster analysis"
  • "find isolated papers"
  • "bridge candidate search"
  • /research-gap-detect

Parameters

--clusters-only
(optional)

Only run cluster detection — skip bridge search and issue filing.

--file-issues
(optional)

Auto-file gap issues for each disconnected cluster pair.

--search-bridges
(optional)

Search external databases for papers that could bridge disconnected clusters.

--min-cluster-size N
(optional)

Minimum papers in a cluster to report. Default: 2.

--format
(optional)

Output format: full (default), summary, or json.

Execution Flow

Phase 1: Build Citation Graph

  1. Read the citation-network index (from /corpus-index-build --graph citation-network)
    • If stale or missing: run /corpus-index-build --graph citation-network first
  2. Build an adjacency list from outgoing + incoming edges
  3. Treat as undirected for cluster detection (A cites B ≡ A connected to B)
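
The steps in Phase 1 can be sketched in Python as follows. This is a minimal sketch, not the skill's actual implementation: the on-disk format of the citation-network index is not described here, so the function assumes the index has already been loaded into a list of paper IDs and a list of (citing, cited) pairs. The name `build_undirected_adjacency` is hypothetical.

```python
def build_undirected_adjacency(nodes, citations):
    """Build an undirected adjacency map from directed citation pairs.

    nodes:     iterable of paper IDs (assumed input shape), so that papers
               without any citation edge still appear in the graph
    citations: iterable of (citing_id, cited_id) pairs (assumed input shape)
    """
    adjacency = {node: set() for node in nodes}
    for citing, cited in citations:
        # Make sure both endpoints exist even if missing from `nodes`.
        adjacency.setdefault(citing, set())
        adjacency.setdefault(cited, set())
        if citing != cited:  # ignore self-citations
            # "A cites B" and "B is cited by A" collapse into one undirected edge.
            adjacency[citing].add(cited)
            adjacency[cited].add(citing)
    return adjacency


graph = build_undirected_adjacency(
    nodes=["REF-001", "REF-016", "REF-024", "REF-299"],
    citations=[("REF-001", "REF-016"), ("REF-024", "REF-016")],
)
```

Storing neighbors in sets merges an outgoing and an incoming edge between the same two papers into a single connection, which is the undirected view that cluster detection needs.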

Phase 2: Connected Components (BFS)

Run BFS/connected-components on the undirected citation graph:

  1. Initialize: all nodes unvisited
  2. For each unvisited node: BFS to find its connected component
  3. Collect components sorted by size (largest first)

Output:

Connected Components: 9

Cluster 1: "Agentic Workflows" (124 papers)
  Hub: REF-016 (34 connections)
  Topics: agentic-workflows, multi-agent, orchestration
  Sample: REF-001, REF-016, REF-024, REF-121 ...

Cluster 2: "GUI Agents" (31 papers)
  Hub: REF-198 (12 connections)
  Topics: gui-agents, web-agents, screen-understanding
  Sample: REF-198, REF-201, REF-215 ...

...

Cluster 9: "Isolated" (3 papers)
  No hub (all degree 0)
  REF-299, REF-312, REF-350
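
The BFS pass in Phase 2 can be sketched as below. It is an illustrative sketch under stated assumptions: the graph is a dict mapping each paper ID to a set of neighbor IDs, and the helper names `connected_components` and `hub_of` are hypothetical. Picking the hub as the highest-degree node in a component is an assumption based on the "Hub: ... (N connections)" lines in the sample output.

```python
from collections import deque


def connected_components(adjacency):
    """Return connected components of an undirected graph, largest first."""
    visited = set()
    components = []
    for start in adjacency:
        if start in visited:
            continue
        # BFS from every node that no earlier search has reached.
        visited.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    components.sort(key=len, reverse=True)
    return components


def hub_of(component, adjacency):
    """Return (paper_id, degree) for the best-connected paper, or None."""
    hub = max(component, key=lambda node: len(adjacency[node]))
    degree = len(adjacency[hub])
    return (hub, degree) if degree > 0 else None


adjacency = {
    "REF-001": {"REF-016"},
    "REF-016": {"REF-001", "REF-024"},
    "REF-024": {"REF-016"},
    "REF-198": {"REF-201"},
    "REF-201": {"REF-198"},
    "REF-299": set(),
}
components = connected_components(adjacency)
min_cluster_size = 2  # mirrors the documented --min-cluster-size default
reportable = [c for c in components if len(c) >= min_cluster_size]
```

Each node is enqueued at most once, so the pass runs in time linear in the number of nodes plus edges. Components below the minimum size are the single-paper leftovers that a report could group under a label such as "Isolated".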

Phase 3: Gap Analysis

For each pair of clusters, assess the gap:

  1. Topic overlap — do the clusters share any tags?
  2. Temporal overlap — do they cover the same years?
  3. Author overlap — do any authors appear in both clusters?
  4. Bridgeability — could a single paper connect them?

Prioritize gaps by:

  • Size product — larger clusters disconnected = higher priority
  • Topic proximity — clusters with related but not identical topics
  • Recency — newer clusters may simply be missing recent cross-citations

Output:

Gap Analysis: 12 cluster pairs

Priority 1: "Agentic Workflows" ↔ "GUI Agents"
  Gap: 124 × 31 = 3,844 (size product)
  Topic overlap: agent, llm (2 shared tags)
  Bridge opportunity: HIGH
  Suggested search: "LLM agent GUI interaction orchestration"

Priority 2: "Evaluation" ↔ "Reproducibility"
  Gap: 45 × 28 = 1,260
  Topic overlap: evaluation, benchmark (2 shared tags)
  Bridge opportunity: MEDIUM
  Suggested search: "reproducible LLM evaluation benchmarks"
...
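
One way to turn the Phase 3 criteria into a ranking is sketched below. It covers only the size product and shared-tag signals; recency and author overlap are left out for brevity. The sort order (size product first, number of shared tags as a tie-breaker) is an assumption, since the exact weighting is not specified here, and the cluster records and the `rank_gaps` name are hypothetical.

```python
from itertools import combinations


def rank_gaps(clusters):
    """Score every cluster pair and return the gaps, highest priority first.

    clusters: dict mapping cluster name -> {"size": int, "tags": set of str}
              (assumed input shape)
    """
    gaps = []
    for (name_a, a), (name_b, b) in combinations(clusters.items(), 2):
        gaps.append({
            "pair": (name_a, name_b),
            "size_product": a["size"] * b["size"],
            "shared_tags": sorted(a["tags"] & b["tags"]),
        })
    # Assumed ordering: larger size product wins, shared-tag count breaks ties.
    gaps.sort(key=lambda g: (g["size_product"], len(g["shared_tags"])), reverse=True)
    return gaps


clusters = {
    "Agentic Workflows": {"size": 124, "tags": {"agent", "llm", "orchestration"}},
    "GUI Agents": {"size": 31, "tags": {"agent", "llm", "gui-agents"}},
    "Isolated": {"size": 3, "tags": set()},
}
gaps = rank_gaps(clusters)
top = gaps[0]
```

With these illustrative inputs the top-ranked pair is "Agentic Workflows" and "GUI Agents", with a size product of 124 × 31 = 3,844 and the shared tags agent and llm.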

Phase 4: Bridge Search (if --search-bridges)

For each high-priority gap:

  1. Generate search queries from cluster topic overlap
  2. Search external databases (Semantic Scholar, arXiv, Google Scholar)
  3. Filter candidates by:
    • Cites papers from BOTH clusters
    • Published in overlapping time range
    • High citation count (likely to be connecting work)
  4. Rank candidates by bridge potential

Output:

Bridge Candidates Found: 8

For gap "Agentic Workflows" ↔ "GUI Agents":
  1. "WebAgent: World-Centric Web Navigation" (2024)
     Cites: REF-016 (Cluster 1), REF-198 (Cluster 2)
     Citations: 87
     Bridge potential: HIGH

  2. "Agent-E: Vision-Language Planning for Web Tasks" (2024)
     Cites: REF-024 (Cluster 1), REF-201 (Cluster 2)
     Citations: 45
     Bridge potential: MEDIUM
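
The filtering and ranking steps of Phase 4 can be sketched as below. The sketch assumes the external search has already returned candidate records and makes no calls to any database API. The record fields (`title`, `year`, `citation_count`, `cites`) and the `find_bridges` name are hypothetical, and ranking purely by citation count is a simplification of "bridge potential".

```python
def find_bridges(candidates, cluster_a, cluster_b, year_range):
    """Keep candidates that cite both clusters and fall inside year_range.

    candidates: list of dicts with "title", "year", "citation_count" and
                "cites" (corpus IDs the paper cites); assumed record shape
    cluster_a, cluster_b: sets of corpus paper IDs
    year_range: (first_year, last_year), inclusive
    """
    first_year, last_year = year_range
    bridges = []
    for paper in candidates:
        if not first_year <= paper["year"] <= last_year:
            continue
        cited = set(paper["cites"])
        hits_a = cited & cluster_a
        hits_b = cited & cluster_b
        # A bridge must connect to at least one paper on each side of the gap.
        if hits_a and hits_b:
            bridges.append({**paper, "hits_a": sorted(hits_a), "hits_b": sorted(hits_b)})
    bridges.sort(key=lambda p: p["citation_count"], reverse=True)
    return bridges


cluster_a = {"REF-001", "REF-016", "REF-024"}
cluster_b = {"REF-198", "REF-201", "REF-215"}
candidates = [  # made-up records for illustration only
    {"title": "Candidate A", "year": 2024, "citation_count": 45, "cites": ["REF-024", "REF-201"]},
    {"title": "Candidate B", "year": 2024, "citation_count": 87, "cites": ["REF-016", "REF-198"]},
    {"title": "Candidate C", "year": 2023, "citation_count": 300, "cites": ["REF-016"]},
    {"title": "Candidate D", "year": 2015, "citation_count": 500, "cites": ["REF-001", "REF-215"]},
]
bridges = find_bridges(candidates, cluster_a, cluster_b, year_range=(2022, 2025))
```

Candidate C is dropped because it cites only one cluster, and Candidate D because it falls outside the year range, even though both have more citations than the papers that remain.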

Phase 5: File Issues (if --file-issues)

For each gap with bridge candidates, file a research induction issue:

## Research Gap: [Cluster A] ↔ [Cluster B]

**Gap Size**: [N × M papers disconnected]
**Bridge Candidates**: [list]
**Suggested Action**: Induct [top candidate] to connect clusters

### Bridge Papers to Induct
- [ ] "WebAgent: World-Centric Web Navigation" — arxiv:2401.XXXXX
- [ ] "Agent-E: Vision-Language Planning" — arxiv:2403.XXXXX
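
Filling in the template above is plain string formatting, as this sketch shows. It only renders the issue body; how the issue is actually filed (which tracker, which API) is not covered here. The function name `render_gap_issue` and the identifier strings in the example are placeholders.

```python
DASH = "\u2014"  # the dash character used as a separator in the template above


def render_gap_issue(cluster_a, size_a, cluster_b, size_b, candidates):
    """Render the issue body for one gap.

    candidates: non-empty list of (title, identifier) tuples, best first
                (assumed input shape)
    """
    if not candidates:
        raise ValueError("issues are only filed for gaps with bridge candidates")
    titles = ", ".join(title for title, _ in candidates)
    top_title = candidates[0][0]
    lines = [
        f"## Research Gap: {cluster_a} ↔ {cluster_b}",
        "",
        f"**Gap Size**: {size_a} × {size_b} papers disconnected",
        f"**Bridge Candidates**: {titles}",
        f"**Suggested Action**: Induct {top_title} to connect clusters",
        "",
        "### Bridge Papers to Induct",
    ]
    # One unchecked task-list item per candidate paper.
    for title, identifier in candidates:
        lines.append(f'- [ ] "{title}" {DASH} {identifier}')
    return "\n".join(lines)


body = render_gap_issue(
    "Agentic Workflows", 124, "GUI Agents", 31,
    [("Candidate B", "id-placeholder-1"), ("Candidate A", "id-placeholder-2")],
)
```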

Phase 6: Report

Research Gap Detection
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Graph: 372 nodes, 1,247 edges
Connected components: 9
Largest cluster: 124 papers ("Agentic Workflows")
Isolated papers: 3

Gap analysis: 12 cluster pairs
  HIGH priority: 4 (bridge candidates available)
  MEDIUM priority: 5
  LOW priority: 3

Bridge candidates found: 8 papers
Issues filed: 4
Papers recommended for induction: 8

Distinction from research-gap

| Tool | Approach | Output |
|---|---|---|
| research-gap | Intellectual: topic coverage, missing areas, GRADE gaps | Gap report with search queries |
| research-gap-detect | Structural: citation graph topology, disconnected components | Cluster map, bridge candidates, filed issues |

research-gap answers "what topics are we missing?", while research-gap-detect answers "which existing papers don't cite each other but should?"

Examples

# Full analysis with bridge search
/research-gap-detect --search-bridges

# Just show clusters
/research-gap-detect --clusters-only

# Detect and auto-file issues
/research-gap-detect --file-issues

# Combined: search + file
/research-gap-detect --search-bridges --file-issues

# JSON for visualization
/research-gap-detect --format json

References

  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/corpus-index-build/SKILL.md — Builds the citation-network graph
  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/citation-backfill/SKILL.md — Prerequisite: complete bidirectional edges
  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-gap/SKILL.md — Complementary intellectual gap analysis
  • @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/induct-research/SKILL.md — Inducts bridge candidates