medical-research-skills · citation-network

Build and visualize a citation network from a source/target CSV to identify key papers, communities, and emerging hotspots; use when you have citation pairs and need fast literature review or trend analysis.

Install

Source · Clone the upstream repo

git clone https://github.com/aipoch/medical-research-skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Evidence Insight/citation-network" ~/.claude/skills/aipoch-medical-research-skills-citation-network && rm -rf "$T"

Manifest: scientific-skills/Evidence Insight/citation-network/SKILL.md

Source content

Source: https://github.com/aipoch/medical-research-skills

When to Use

  • You have a citation relationship table (who cites whom) and want to quickly turn it into a directed network for analysis.
  • You are conducting a literature review and need to identify influential papers (high in-degree / centrality) and core clusters.
  • You want to detect community structures (research subfields) and compare them across time or datasets.
  • You need an interactive, shareable visualization (HTML) or a Gephi-importable graph file (GEXF).
  • You are positioning a new project and want evidence of research hotspots and bridging papers between communities.

Key Features

  • Builds a directed citation graph from a minimal CSV containing source and target columns.
  • De-duplicates nodes by identifier (DOI recommended; otherwise unique titles).
  • Exports:
    • citation_network.gexf for Gephi and other graph tools
    • network_metrics.json for basic network statistics
    • citation_network.html for interactive browser viewing (auto-generated by the build script)
  • Run-directory workflow to keep each execution reproducible and isolated under outputs/runs/<timestamp>/.
  • Optional input encoding control to avoid garbled characters (e.g., UTF-8 / UTF-8-SIG).

Dependencies

  • Python 3.10+
  • pandas >= 2.0
  • networkx >= 3.0
  • (Optional, for HTML visualization) pyvis >= 0.3

Example Usage

1) Initialize a run directory

python scripts/init_run.py

This creates a new run folder:

outputs/runs/<timestamp>/
  config.json
  data/
  outputs/

2) Prepare the citation CSV (minimal)

Create citations.csv and place it into:

outputs/runs/<timestamp>/data/citations.csv

Minimal CSV format:

source,target
Paper A,Paper B
Paper A,Paper C

Recommended DOI-based identifiers:

source,target
10.1234/abcd.1,10.1234/abcd.2
10.1234/abcd.1,10.1234/abcd.3
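Before building, a quick pandas sanity check on the CSV can catch missing columns and duplicate citation pairs early. A minimal sketch (the in-memory CSV text and the checks themselves are illustrative, not part of the build script):

```python
import io
import pandas as pd

# Illustrative CSV content; in practice use the file path
# outputs/runs/<timestamp>/data/citations.csv instead of a StringIO buffer.
csv_text = "source,target\nPaper A,Paper B\nPaper A,Paper C\nPaper A,Paper B\n"
df = pd.read_csv(io.StringIO(csv_text))

# Verify the two required columns exist.
missing = {"source", "target"} - set(df.columns)
assert not missing, f"missing columns: {missing}"

# Drop exact duplicate citation pairs so each edge is counted once.
df = df.drop_duplicates(subset=["source", "target"])
print(len(df))  # 2 unique citation pairs
```

The same drop_duplicates call works unchanged for DOI-based identifiers.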

3) Confirm configuration

Open:

outputs/runs/<timestamp>/config.json

Ensure the configured input filename and column names match your CSV (at minimum source and target). If you see garbled characters, set an explicit encoding (e.g., utf-8 or utf-8-sig) via an input_encoding field, if supported by the config.

4) Build the citation network

python scripts/build_citation_network.py

The build script also generates the HTML automatically; you do not need to run scripts/export_gexf_html.py manually.

5) Inspect outputs

Expected outputs under the same run directory:

  • citation_network.gexf (import into Gephi)
  • network_metrics.json (node/edge counts, density, etc.)
  • citation_network.html (open in a browser)
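The GEXF file round-trips cleanly through networkx, which is a quick way to verify the export before handing it to Gephi. A small sketch with a hand-built graph standing in for the real output:

```python
import os
import tempfile
import networkx as nx

# Tiny stand-in for the built citation graph (DOI-style node identifiers).
G = nx.DiGraph()
G.add_edges_from([("10.1234/abcd.1", "10.1234/abcd.2"),
                  ("10.1234/abcd.1", "10.1234/abcd.3")])

# Write and re-read GEXF, mirroring what citation_network.gexf would contain.
path = os.path.join(tempfile.mkdtemp(), "citation_network.gexf")
nx.write_gexf(G, path)
H = nx.read_gexf(path)

print(H.number_of_nodes(), H.number_of_edges())  # 3 2
```

If the reloaded node and edge counts match network_metrics.json, the export is intact.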

Implementation Details

Data Model

  • Nodes: papers, identified by the values in the source/target columns (DOI preferred; otherwise a unique, consistent title string).
  • Edges: directed citations, source -> target.
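This data model maps directly onto a networkx DiGraph: each CSV row becomes one directed edge. A minimal sketch (the DataFrame here stands in for the parsed citations.csv):

```python
import pandas as pd
import networkx as nx

# Each row is one citation: the source paper cites the target paper.
df = pd.DataFrame({"source": ["Paper A", "Paper A"],
                   "target": ["Paper B", "Paper C"]})

# Build the directed graph; identical identifiers collapse into one node.
G = nx.from_pandas_edgelist(df, source="source", target="target",
                            create_using=nx.DiGraph)

print(G.out_degree("Paper A"), G.in_degree("Paper B"))  # 2 1
```

Out-degree counts references made; in-degree counts citations received, the usual influence proxy.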

Input Requirements and Constraints

  • The network builder reads only the source and target columns.
  • Additional columns (e.g., author/year/venue) are ignored by the current scripts.
  • If you need metadata, maintain a separate table for downstream joining/annotation (not consumed by the builder), for example:
id,title,authors,year,doi
10.1234/abcd.1,Paper A,"Zhang, Wei; Li, Ming",2021,10.1234/abcd.1
10.1234/abcd.2,Paper B,"Wang, Fang",2019,10.1234/abcd.2

Run Directory Standard

  • Always run python scripts/init_run.py before an execution to create a new run directory.
  • All inputs, configs, and outputs must remain inside outputs/runs/<timestamp>/.
  • By default, scripts operate on the latest run directory under outputs/runs/.
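"Latest run directory" selection can be done by lexicographic sort, assuming the timestamp folder names sort chronologically (the folder names below are hypothetical; the real scripts may use a different timestamp format):

```python
import tempfile
from pathlib import Path

# Simulate outputs/runs/ with a few timestamped run directories.
runs = Path(tempfile.mkdtemp()) / "outputs" / "runs"
for ts in ["20240101-120000", "20240301-093000", "20240215-180000"]:
    (runs / ts).mkdir(parents=True)

# Lexicographic max == most recent, for sortable timestamp names.
latest = max(p.name for p in runs.iterdir() if p.is_dir())
print(latest)  # 20240301-093000
```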

Metrics and Analysis (Conceptual)

  • Basic network statistics are exported to network_metrics.json (e.g., node/edge counts, density).
  • Typical downstream analyses include:
    • centrality (degree, betweenness)
    • community detection (e.g., Louvain), if enabled/implemented in the pipeline
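The centrality analyses above can be run directly on the exported graph with networkx. A sketch on a toy graph where paper C is both heavily cited and a bridge:

```python
import networkx as nx

# A cites C, B cites C, C cites D: C is cited twice and lies on the
# only paths from A/B to D.
G = nx.DiGraph([("A", "C"), ("B", "C"), ("C", "D")])

in_deg = dict(G.in_degree())        # in-degree: citations received
bet = nx.betweenness_centrality(G)  # bridging papers between communities

print(in_deg["C"])              # 2
print(max(bet, key=bet.get))    # C
```

In a real run, load the graph with nx.read_gexf("citation_network.gexf") and sort these dictionaries to rank influential and bridging papers.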

Common Failure Modes

  • Garbled characters: ensure the CSV is UTF-8 / UTF-8-SIG; set input_encoding in config.json if available.
  • Duplicate nodes: identical identifiers are treated as the same node; prefer DOIs or enforce unique titles.
  • Empty or missing output: verify the CSV header names match the configured source/target columns.

Related References

  • Data cleaning checklist: references/data-cleaning-checklist.md
  • Network metrics notes: references/network-metrics-notes.md
  • Additional documentation: references/README.md