medical-research-skills · citation-network

Build and visualize a citation network from a source/target CSV to identify key papers, communities, and emerging hotspots; use when you have citation pairs and need fast literature review or trend analysis.

Install

Source · Clone the upstream repo

git clone https://github.com/aipoch/medical-research-skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Evidence Insight/citation-network" ~/.claude/skills/aipoch-medical-research-skills-citation-network && rm -rf "$T"

Manifest: scientific-skills/Evidence Insight/citation-network/SKILL.md

Source content

Source: https://github.com/aipoch/medical-research-skills

When to Use

  • You have a citation relationship table (who cites whom) and want to quickly turn it into a directed network for analysis.
  • You are conducting a literature review and need to identify influential papers (high in-degree / centrality) and core clusters.
  • You want to detect community structures (research subfields) and compare them across time or datasets.
  • You need an interactive, shareable visualization (HTML) or a Gephi-importable graph file (GEXF).
  • You are positioning a new project and want evidence of research hotspots and bridging papers between communities.

Key Features

  • Builds a directed citation graph from a minimal CSV containing source and target columns.
  • De-duplicates nodes by identifier (DOI recommended; otherwise unique titles).
  • Exports:
    • citation_network.gexf for Gephi and other graph tools
    • network_metrics.json for basic network statistics
    • citation_network.html for interactive browser viewing (auto-generated by the build script)
  • Run-directory workflow to keep each execution reproducible and isolated under outputs/runs/<timestamp>/.
  • Optional input encoding control to avoid garbled characters (e.g., UTF-8 / UTF-8-SIG).

Dependencies

  • Python 3.10+
  • pandas >= 2.0
  • networkx >= 3.0
  • (Optional, for HTML visualization) pyvis >= 0.3

Example Usage

1) Initialize a run directory

python scripts/init_run.py

This creates a new run folder:

outputs/runs/<timestamp>/
  config.json
  data/
  outputs/

2) Prepare the citation CSV (minimal)

Create citations.csv and place it into:

outputs/runs/<timestamp>/data/citations.csv

Minimal CSV format:

source,target
Paper A,Paper B
Paper A,Paper C

Recommended DOI-based identifiers:

source,target
10.1234/abcd.1,10.1234/abcd.2
10.1234/abcd.1,10.1234/abcd.3
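Before building, a quick pandas sanity check on the CSV can catch missing columns and duplicate citation pairs early. A minimal sketch (the in-memory CSV text and the checks themselves are illustrative, not part of the build script):

```python
import io
import pandas as pd

# Illustrative CSV content; in practice use the file path
# outputs/runs/<timestamp>/data/citations.csv instead of a StringIO buffer.
csv_text = "source,target\nPaper A,Paper B\nPaper A,Paper C\nPaper A,Paper B\n"
df = pd.read_csv(io.StringIO(csv_text))

# Verify the two required columns exist.
missing = {"source", "target"} - set(df.columns)
assert not missing, f"missing columns: {missing}"

# Drop exact duplicate citation pairs so each edge is counted once.
df = df.drop_duplicates(subset=["source", "target"])
print(len(df))  # 2 unique citation pairs
```

The same drop_duplicates call works unchanged for DOI-based identifiers.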

3) Confirm configuration

Open:

outputs/runs/<timestamp>/config.json

Ensure the configured input filename and column names match your CSV (at minimum source and target). If you see garbled characters, set an explicit encoding (e.g., utf-8 or utf-8-sig) via an input_encoding field, if supported by the config.

4) Build the citation network

python scripts/build_citation_network.py

The build script also generates the HTML automatically; you do not need to run scripts/export_gexf_html.py manually.

5) Inspect outputs

Expected outputs under the same run directory:

  • citation_network.gexf (import into Gephi)
  • network_metrics.json (node/edge counts, density, etc.)
  • citation_network.html (open in a browser)
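The GEXF file round-trips cleanly through networkx, which is a quick way to verify the export before handing it to Gephi. A small sketch with a hand-built graph standing in for the real output:

```python
import os
import tempfile
import networkx as nx

# Tiny stand-in for the built citation graph (DOI-style node identifiers).
G = nx.DiGraph()
G.add_edges_from([("10.1234/abcd.1", "10.1234/abcd.2"),
                  ("10.1234/abcd.1", "10.1234/abcd.3")])

# Write and re-read GEXF, mirroring what citation_network.gexf would contain.
path = os.path.join(tempfile.mkdtemp(), "citation_network.gexf")
nx.write_gexf(G, path)
H = nx.read_gexf(path)

print(H.number_of_nodes(), H.number_of_edges())  # 3 2
```

If the reloaded node and edge counts match network_metrics.json, the export is intact.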

Implementation Details

Data Model

  • Nodes: papers, identified by the values in the source/target columns (DOI preferred; otherwise a unique, consistent title string).
  • Edges: directed citations, source -> target.
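This data model maps directly onto a networkx DiGraph: each CSV row becomes one directed edge. A minimal sketch (the DataFrame here stands in for the parsed citations.csv):

```python
import pandas as pd
import networkx as nx

# Each row is one citation: the source paper cites the target paper.
df = pd.DataFrame({"source": ["Paper A", "Paper A"],
                   "target": ["Paper B", "Paper C"]})

# Build the directed graph; identical identifiers collapse into one node.
G = nx.from_pandas_edgelist(df, source="source", target="target",
                            create_using=nx.DiGraph)

print(G.out_degree("Paper A"), G.in_degree("Paper B"))  # 2 1
```

Out-degree counts references made; in-degree counts citations received, the usual influence proxy.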

Input Requirements and Constraints

  • The network builder reads only the source and target columns.
  • Additional columns (e.g., author/year/venue) are ignored by the current scripts.
  • If you need metadata, maintain a separate table for downstream joining/annotation (not consumed by the builder), for example:
id,title,authors,year,doi
10.1234/abcd.1,Paper A,"Zhang, Wei; Li, Ming",2021,10.1234/abcd.1
10.1234/abcd.2,Paper B,"Wang, Fang",2019,10.1234/abcd.2

Run Directory Standard

  • Always run python scripts/init_run.py before an execution to create a new run directory.
  • All inputs, configs, and outputs must remain inside outputs/runs/<timestamp>/.
  • By default, scripts operate on the latest run directory under outputs/runs/.
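"Latest run directory" selection can be done by lexicographic sort, assuming the timestamp folder names sort chronologically (the folder names below are hypothetical; the real scripts may use a different timestamp format):

```python
import tempfile
from pathlib import Path

# Simulate outputs/runs/ with a few timestamped run directories.
runs = Path(tempfile.mkdtemp()) / "outputs" / "runs"
for ts in ["20240101-120000", "20240301-093000", "20240215-180000"]:
    (runs / ts).mkdir(parents=True)

# Lexicographic max == most recent, for sortable timestamp names.
latest = max(p.name for p in runs.iterdir() if p.is_dir())
print(latest)  # 20240301-093000
```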

Metrics and Analysis (Conceptual)

  • Basic network statistics are exported to network_metrics.json (e.g., node/edge counts, density).
  • Typical downstream analyses include:
    • centrality (degree, betweenness)
    • community detection (e.g., Louvain), if enabled/implemented in the pipeline
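The centrality analyses above can be run directly on the exported graph with networkx. A sketch on a toy graph where paper C is both heavily cited and a bridge:

```python
import networkx as nx

# A cites C, B cites C, C cites D: C is cited twice and lies on the
# only paths from A/B to D.
G = nx.DiGraph([("A", "C"), ("B", "C"), ("C", "D")])

in_deg = dict(G.in_degree())        # in-degree: citations received
bet = nx.betweenness_centrality(G)  # bridging papers between communities

print(in_deg["C"])              # 2
print(max(bet, key=bet.get))    # C
```

In a real run, load the graph with nx.read_gexf("citation_network.gexf") and sort these dictionaries to rank influential and bridging papers.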

Common Failure Modes

  • Garbled characters: ensure the CSV is UTF-8 / UTF-8-SIG; set input_encoding in config.json if available.
  • Duplicate nodes: identical identifiers are treated as the same node; prefer DOIs or enforce unique titles.
  • Empty or missing output: verify the CSV header names match the configured source/target columns.

Related References

  • Data cleaning checklist: references/data-cleaning-checklist.md
  • Network metrics notes: references/network-metrics-notes.md
  • Additional documentation: references/README.md