Medical-research-skills citation-network
Build and visualize a citation network from a source/target CSV to identify key papers, communities, and emerging hotspots; use when you have citation pairs and need fast literature review or trend analysis.
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Evidence Insight/citation-network" ~/.claude/skills/aipoch-medical-research-skills-citation-network && rm -rf "$T"
manifest: `scientific-skills/Evidence Insight/citation-network/SKILL.md`
When to Use
- You have a citation relationship table (who cites whom) and want to quickly turn it into a directed network for analysis.
- You are conducting a literature review and need to identify influential papers (high in-degree / centrality) and core clusters.
- You want to detect community structures (research subfields) and compare them across time or datasets.
- You need an interactive, shareable visualization (HTML) or a Gephi-importable graph file (GEXF).
- You are positioning a new project and want evidence of research hotspots and bridging papers between communities.
Key Features
- Builds a directed citation graph from a minimal CSV containing `source` and `target` columns.
- De-duplicates nodes by identifier (DOI recommended; otherwise unique titles).
- Exports:
  - `citation_network.gexf` for Gephi and other graph tools
  - `network_metrics.json` for basic network statistics
  - `citation_network.html` for interactive browser viewing (auto-generated by the build script)
- Run-directory workflow to keep each execution reproducible and isolated under `outputs/runs/<timestamp>/`.
- Optional input encoding control to avoid garbled characters (e.g., UTF-8 / UTF-8-SIG).
Dependencies
- Python 3.10+
- pandas >= 2.0
- networkx >= 3.0
- (Optional, for HTML visualization) pyvis >= 0.3
Example Usage
1) Initialize a run directory
python scripts/init_run.py
This creates a new run folder:

```
outputs/runs/<timestamp>/
├── config.json
├── data/
└── outputs/
```
2) Prepare the citation CSV (minimal)
Create `citations.csv` and place it at `outputs/runs/<timestamp>/data/citations.csv`.
Minimal CSV format:
```
source,target
Paper A,Paper B
Paper A,Paper C
```
Recommended DOI-based identifiers:
```
source,target
10.1234/abcd.1,10.1234/abcd.2
10.1234/abcd.1,10.1234/abcd.3
```
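Conceptually, the build step turns this two-column CSV into a directed graph. A minimal sketch of that transformation with pandas and networkx (not the actual `build_citation_network.py`, which also handles config and run directories):

```python
import io

import pandas as pd
import networkx as nx

# The minimal citations.csv content from the example above.
csv_text = "source,target\nPaper A,Paper B\nPaper A,Paper C\n"

# Read only the two required columns; any extra columns would be ignored.
df = pd.read_csv(io.StringIO(csv_text), usecols=["source", "target"])

# Each row becomes one directed edge: citing paper -> cited paper.
# Identical identifiers collapse into the same node automatically.
G = nx.from_pandas_edgelist(df, source="source", target="target",
                            create_using=nx.DiGraph)

print(G.number_of_nodes(), G.number_of_edges())  # 3 2
```

Because node identity is just string equality, "Paper A" and "paper a" would become two different nodes, which is why DOIs (or strictly consistent titles) are recommended.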
3) Confirm configuration
Open `outputs/runs/<timestamp>/config.json`.
Ensure the configured input filename and column names match your CSV (at minimum `source` and `target`). If you see garbled characters, set an explicit encoding (e.g., `utf-8` or `utf-8-sig`) via an `input_encoding` field, if the config supports it.
4) Build the citation network
python scripts/build_citation_network.py
The build script also generates the HTML automatically; you do not need to run `scripts/export_gexf_html.py` manually.
5) Inspect outputs
Expected outputs under the same run directory:
- `citation_network.gexf` (import into Gephi)
- `network_metrics.json` (node/edge counts, density, etc.)
- `citation_network.html` (open in a browser)
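The GEXF file is not Gephi-only: networkx can read it back for further analysis. A sketch of round-tripping the graph and recomputing basic statistics (the metric keys shown are illustrative, not necessarily the exact schema of `network_metrics.json`):

```python
import json
import tempfile
from pathlib import Path

import networkx as nx

# A small stand-in for the built citation graph.
G = nx.DiGraph([("10.1234/abcd.1", "10.1234/abcd.2"),
                ("10.1234/abcd.1", "10.1234/abcd.3")])

# Write GEXF to a scratch location, as the build script writes into the
# run directory's outputs/.
path = Path(tempfile.mkdtemp()) / "citation_network.gexf"
nx.write_gexf(G, path)

# Gephi and networkx both read the same file back.
H = nx.read_gexf(path)
metrics = {"nodes": H.number_of_nodes(),
           "edges": H.number_of_edges(),
           "density": nx.density(H)}
print(json.dumps(metrics, indent=2))
```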
Implementation Details
Data Model
- Nodes: papers, identified by the value in `source`/`target` (DOI preferred; otherwise a unique, consistent title string).
- Edges: directed citations `source -> target`.
Input Requirements and Constraints
- The network builder reads only the `source` and `target` columns.
- Additional columns (e.g., author/year/venue) are ignored by the current scripts.
- If you need metadata, maintain a separate table for downstream joining/annotation (not consumed by the builder), for example:
```
id,title,authors,year,doi
10.1234/abcd.1,Paper A,"Zhang, Wei; Li, Ming",2021,10.1234/abcd.1
10.1234/abcd.2,Paper B,"Wang, Fang",2019,10.1234/abcd.2
```
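Joining that side table onto per-node results is a plain pandas merge on the shared identifier. A sketch, with hypothetical in-degree values standing in for numbers you would compute from the built graph:

```python
import io

import pandas as pd

meta_csv = (
    "id,title,authors,year,doi\n"
    '10.1234/abcd.1,Paper A,"Zhang, Wei; Li, Ming",2021,10.1234/abcd.1\n'
    '10.1234/abcd.2,Paper B,"Wang, Fang",2019,10.1234/abcd.2\n'
)
meta = pd.read_csv(io.StringIO(meta_csv))

# Hypothetical per-node result keyed by the same identifier, e.g.
# citation counts (in-degree) computed from the citation graph.
in_degree = pd.DataFrame({"id": ["10.1234/abcd.1", "10.1234/abcd.2"],
                          "in_degree": [0, 1]})

# Left join keeps every paper from the metadata table, even uncited ones.
annotated = meta.merge(in_degree, on="id", how="left")
print(annotated[["title", "year", "in_degree"]])
```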
Run Directory Standard
- Always run `python scripts/init_run.py` before an execution to create a new run directory.
- All inputs, configs, and outputs must remain inside `outputs/runs/<timestamp>/`.
- By default, scripts operate on the latest run directory under `outputs/runs/`.
Metrics and Analysis (Conceptual)
- Basic network statistics are exported to `network_metrics.json` (e.g., node/edge counts, density).
- Typical downstream analyses include:
  - centrality (degree, betweenness)
  - community detection (e.g., Louvain), if enabled/implemented in the pipeline
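These analyses map directly onto networkx calls, so they can be run on the built graph even if the pipeline does not compute them itself. A sketch on a toy graph (the pipeline's own implementation, if present, may differ):

```python
import networkx as nx

# Toy citation graph: two clusters of citing papers, both funneling
# through bridge paper "H".
edges = [("A", "C"), ("B", "C"), ("D", "C"),   # cluster 1 cites C
         ("E", "F"), ("G", "F"),               # cluster 2 cites F
         ("C", "H"), ("F", "H")]               # C and F cite H
G = nx.DiGraph(edges)

# Influence: frequently cited papers have high in-degree; betweenness
# flags papers that sit on many shortest paths between others.
in_deg = dict(G.in_degree())
btw = nx.betweenness_centrality(G)

# Community detection (Louvain) is usually run on the undirected
# projection of the citation graph.
communities = nx.community.louvain_communities(G.to_undirected(), seed=42)

print(max(in_deg, key=in_deg.get))  # 'C' is the most-cited paper
print(communities)
```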
Common Failure Modes
- Garbled characters: ensure the CSV is UTF-8/UTF-8-SIG; set `input_encoding` in `config.json` if available.
- Duplicate nodes: identical identifiers are treated as the same node; prefer DOIs or enforce unique titles.
- Empty or missing output: verify the CSV header names match the configured `source`/`target` columns.
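The header-mismatch and duplicate cases can be caught before building with a small pre-flight check along these lines (a sketch; `validate_citations` is not part of the shipped scripts, which report their own errors):

```python
import io

import pandas as pd

def validate_citations(df: pd.DataFrame,
                       required=("source", "target")) -> list[str]:
    """Return a list of problems found in a citations table."""
    problems = []
    missing = [c for c in required if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems  # the checks below need those columns
    if df[list(required)].isna().any().any():
        problems.append("empty source/target cells")
    dup = df.duplicated(subset=list(required)).sum()
    if dup:
        problems.append(f"{dup} duplicate citation pairs")
    return problems

# Headers capitalized differently from the config: both required
# columns are reported missing.
df = pd.read_csv(io.StringIO("Source,Target\nPaper A,Paper B\n"))
print(validate_citations(df))
```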
Related References
- Data cleaning checklist: `references/data-cleaning-checklist.md`
- Network metrics notes: `references/network-metrics-notes.md`
- Additional documentation: `references/README.md`