Skillshub primekg

PrimeKG Knowledge Graph Skill

install
source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/K-Dense-AI/claude-scientific-skills/primekg" ~/.claude/skills/comeonoliver-skillshub-primekg && rm -rf "$T"
manifest: skills/K-Dense-AI/claude-scientific-skills/primekg/SKILL.md
source content

PrimeKG Knowledge Graph Skill

Overview

PrimeKG is a precision medicine knowledge graph that integrates over 20 primary databases and high-quality scientific literature into a single resource. It contains over 100,000 nodes and 4 million edges across 29 relationship types, including drug-target, disease-gene, and phenotype-disease associations.

Key capabilities:

  • Search for nodes (genes, proteins, drugs, diseases, phenotypes)
  • Retrieve direct neighbors (associated entities and clinical evidence)
  • Analyze local disease context (related genes, drugs, phenotypes)
  • Identify drug-disease paths (potential repurposing opportunities)

Data access: Programmatic access via

query_primekg.py
. Data is stored at
C:\Users\eamon\Documents\Data\PrimeKG\kg.csv
.

When to Use This Skill

This skill should be used when:

  • Knowledge-based drug discovery: Identifying targets and mechanisms for diseases.
  • Drug repurposing: Finding existing drugs that might have evidence for new indications.
  • Phenotype analysis: Understanding how symptoms/phenotypes relate to diseases and genes.
  • Multiscale biology: Bridging the gap between molecular targets (genes) and clinical outcomes (diseases).
  • Network pharmacology: Investigating the broader network effects of drug-target interactions.

Core Workflow

1. Search for Entities

Find identifiers for genes, drugs, or diseases.

from scripts.query_primekg import search_nodes

# Search for Alzheimer's disease nodes
results = search_nodes("Alzheimer", node_type="disease")
# Returns: [{"id": "EFO_0000249", "type": "disease", "name": "Alzheimer's disease", ...}]

2. Get Neighbors (Direct Associations)

Retrieve all connected nodes and relationship types.

from scripts.query_primekg import get_neighbors

# Get all neighbors of a specific disease ID
neighbors = get_neighbors("EFO_0000249")
# Returns: List of neighbors like {"neighbor_name": "APOE", "relation": "disease_gene", ...}

3. Analyze Disease Context

A high-level function to summarize associations for a disease.

from scripts.query_primekg import get_disease_context

# Comprehensive summary for a disease
context = get_disease_context("Alzheimer's disease")
# Access: context['associated_genes'], context['associated_drugs'], context['phenotypes']

Relationship Types in PrimeKG

The graph contains several key relationship types including:

  • protein_protein
    : Physical PPIs
  • drug_protein
    : Drug target/mechanism associations
  • disease_gene
    : Genetic associations
  • drug_disease
    : Indications and contraindications
  • disease_phenotype
    : Clinical signs and symptoms
  • gwas
    : Genome-wide association studies evidence

Best Practices

  1. Use specific IDs: When using
    get_neighbors
    , ensure you have the correct ID from
    search_nodes
    .
  2. Context first: Use
    get_disease_context
    for a broad overview before diving into specific genes or drugs.
  3. Filter relationships: Use the
    relation_type
    filter in
    get_neighbors
    to focus on specific evidence (e.g., only
    drug_protein
    ).
  4. Multiscale integration: Combine with
    OpenTargets
    for deeper genetic evidence or
    Semantic Scholar
    for the latest literature context.

Resources

Scripts

  • scripts/query_primekg.py
    : Core functions for searching and querying the knowledge graph.

Data Path

  • Data:
    /mnt/c/Users/eamon/Documents/Data/PrimeKG/kg.csv
  • Total nodes: ~129,000
  • Total edges: ~4,000,000
  • Database: CSV-based, optimized for pandas querying.