AlterLab-Academic-Skills alterlab-primekg
Query the Precision Medicine Knowledge Graph (PrimeKG) for multiscale biological data including genes, drugs, diseases, phenotypes, and more. Part of the AlterLab Academic Skills suite.
git clone https://github.com/AlterLab-IEU/AlterLab-Academic-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/AlterLab-IEU/AlterLab-Academic-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/cheminformatics/alterlab-primekg" ~/.claude/skills/alterlab-ieu-alterlab-academic-skills-alterlab-primekg && rm -rf "$T"
skills/cheminformatics/alterlab-primekg/SKILL.md
PrimeKG Knowledge Graph Skill
Overview
PrimeKG is a precision medicine knowledge graph that integrates over 20 primary databases and high-quality scientific literature into a single resource. It contains roughly 129,000 nodes and about 4 million edges across 29 relationship types, including drug-target, disease-gene, and phenotype-disease associations.
Key capabilities:
- Search for nodes (genes, proteins, drugs, diseases, phenotypes)
- Retrieve direct neighbors (associated entities and clinical evidence)
- Analyze local disease context (related genes, drugs, phenotypes)
- Identify drug-disease paths (potential repurposing opportunities)
Data access: Programmatic access via query_primekg.py. Data is stored at <path-to-primekg>/kg.csv.
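If you want to inspect the edge file without going through query_primekg.py, a minimal sketch using only the standard library might look like the following. The column names (relation, x_id, x_type, x_name, y_id, y_type, y_name) are an assumption about the CSV header, and the two sample rows use placeholder IDs, so check both against your own copy of kg.csv.

```python
import csv
import io

# Assumed column names and placeholder IDs; verify against the header of your kg.csv.
SAMPLE_CSV = """relation,x_id,x_type,x_name,y_id,y_type,y_name
disease_gene,D1,disease,Alzheimer's disease,G1,gene/protein,APOE
drug_protein,DR1,drug,Donepezil,G2,gene/protein,ACHE
"""


def load_edges(handle):
    """Read edge rows from an open CSV file handle into a list of dicts."""
    return list(csv.DictReader(handle))


# For the real file, pass something like:
#   open("<path-to-primekg>/kg.csv", newline="", encoding="utf-8")
edges = load_edges(io.StringIO(SAMPLE_CSV))
print(len(edges), edges[0]["relation"])
```

Loading every row into a list of dicts is fine for a small sample like this one, but at around 4 million edges it is likely to be memory-hungry. Iterating over the reader row by row, or using pandas, is probably the better choice for the full file.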
When to Use This Skill
Use this skill for:
- Knowledge-based drug discovery: Identifying targets and mechanisms for diseases.
- Drug repurposing: Finding existing drugs that might have evidence for new indications.
- Phenotype analysis: Understanding how symptoms/phenotypes relate to diseases and genes.
- Multiscale biology: Bridging the gap between molecular targets (genes) and clinical outcomes (diseases).
- Network pharmacology: Investigating the broader network effects of drug-target interactions.
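For the drug repurposing use case, one common pattern is a two-hop path: a drug targets a protein, and that protein is genetically associated with a disease. The sketch below illustrates the idea on toy data only. The function name drug_disease_paths, the tuple layout, and all IDs are hypothetical and are not part of query_primekg.py. It also assumes edges are stored as drug to protein and disease to gene, which may not match how your copy of the graph orients them.

```python
# Toy edges as (source_id, relation, target_id); all IDs are made-up placeholders.
TOY_EDGES = [
    ("drug:A", "drug_protein", "gene:G1"),
    ("drug:A", "drug_protein", "gene:G2"),
    ("disease:X", "disease_gene", "gene:G1"),
    ("disease:X", "disease_gene", "gene:G3"),
]


def drug_disease_paths(edges, drug_id, disease_id):
    """Return (drug, gene, disease) triples where the gene links both ends."""
    drug_targets = set()
    disease_genes = set()
    for source, relation, target in edges:
        if relation == "drug_protein" and source == drug_id:
            drug_targets.add(target)
        elif relation == "disease_gene" and source == disease_id:
            disease_genes.add(target)
    # A shared gene/protein node forms a drug -> gene -> disease path.
    shared = sorted(drug_targets & disease_genes)
    return [(drug_id, gene, disease_id) for gene in shared]


paths = drug_disease_paths(TOY_EDGES, "drug:A", "disease:X")
print(paths)
```

A shared target is only a hypothesis generator. It says nothing about the direction of effect, so each path still needs to be checked against the underlying evidence.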
Core Workflow
1. Search for Entities
Find identifiers for genes, drugs, or diseases.
from scripts.query_primekg import search_nodes

# Search for Alzheimer's disease nodes
results = search_nodes("Alzheimer", node_type="disease")
# Returns: [{"id": "EFO_0000249", "type": "disease", "name": "Alzheimer's disease", ...}]
2. Get Neighbors (Direct Associations)
Retrieve all connected nodes and relationship types.
from scripts.query_primekg import get_neighbors

# Get all neighbors of a specific disease ID
neighbors = get_neighbors("EFO_0000249")
# Returns: List of neighbors like {"neighbor_name": "APOE", "relation": "disease_gene", ...}
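To make the idea of a neighbor lookup concrete, here is a minimal sketch of how one could be done over a plain list of edges, with an optional relation filter. It is not the implementation of get_neighbors. The function name neighbors_of, the tuple layout, and the IDs are hypothetical, and the sketch assumes edges should be treated as undirected.

```python
# Toy edges as (source_id, relation, target_id); IDs are placeholders, not PrimeKG IDs.
TOY_EDGES = [
    ("disease:AD", "disease_gene", "gene:APOE"),
    ("drug:donepezil", "drug_disease", "disease:AD"),
    ("drug:donepezil", "drug_protein", "gene:ACHE"),
]


def neighbors_of(edges, node_id, relation_type=None):
    """Return (neighbor_id, relation) pairs for every edge touching node_id."""
    found = []
    for source, relation, target in edges:
        # Skip edges that do not match the requested relation, if one was given.
        if relation_type is not None and relation != relation_type:
            continue
        if source == node_id:
            found.append((target, relation))
        elif target == node_id:
            found.append((source, relation))
    return found


all_neighbors = neighbors_of(TOY_EDGES, "disease:AD")
gene_neighbors = neighbors_of(TOY_EDGES, "disease:AD", relation_type="disease_gene")
print(all_neighbors)
print(gene_neighbors)
```

Checking both the source and the target position matters whenever the edge list stores each relationship in only one direction.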
3. Analyze Disease Context
A high-level function to summarize associations for a disease.
from scripts.query_primekg import get_disease_context

# Comprehensive summary for a disease
context = get_disease_context("Alzheimer's disease")
# Access: context['associated_genes'], context['associated_drugs'], context['phenotypes']
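A disease context summary of this shape can be thought of as the disease's neighbors grouped by relation. The sketch below shows that grouping on toy data. The three dictionary keys come from the example above, but the function name disease_context, the relation-to-key mapping, and the IDs are assumptions for illustration, not the behavior of get_disease_context.

```python
# Toy edges as (source_id, relation, target_id); all IDs are made-up placeholders.
TOY_EDGES = [
    ("disease:X", "disease_gene", "gene:G1"),
    ("drug:A", "drug_disease", "disease:X"),
    ("disease:X", "disease_phenotype", "phenotype:P1"),
    ("drug:A", "drug_protein", "gene:G1"),
]

# Assumed mapping from relation name to summary key.
RELATION_TO_KEY = {
    "disease_gene": "associated_genes",
    "drug_disease": "associated_drugs",
    "disease_phenotype": "phenotypes",
}


def disease_context(edges, disease_id):
    """Group the direct neighbors of disease_id by relation into a summary dict."""
    context = {key: [] for key in RELATION_TO_KEY.values()}
    for source, relation, target in edges:
        key = RELATION_TO_KEY.get(relation)
        if key is None:
            continue  # relation is not part of the summary
        if source == disease_id:
            context[key].append(target)
        elif target == disease_id:
            context[key].append(source)
    return context


summary = disease_context(TOY_EDGES, "disease:X")
print(summary)
```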
Relationship Types in PrimeKG
The graph contains several key relationship types including:
- protein_protein: Physical PPIs
- drug_protein: Drug target/mechanism associations
- disease_gene: Genetic associations
- drug_disease: Indications and contraindications
- disease_phenotype: Clinical signs and symptoms
- gwas: Genome-wide association studies evidence
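Before filtering on a relation name, it can help to confirm which labels your copy of the graph actually uses and how common each one is. The following is a minimal sketch that tallies relation labels with collections.Counter. The input list is toy data, and the helper name relation_counts is hypothetical.

```python
from collections import Counter

# Toy relation labels, standing in for the relation column of the edge file.
TOY_RELATIONS = [
    "protein_protein",
    "drug_protein",
    "protein_protein",
    "disease_gene",
]


def relation_counts(relations):
    """Count how many edges carry each relation label."""
    return Counter(relations)


counts = relation_counts(TOY_RELATIONS)
for relation, n in counts.most_common():
    print(relation, n)
```

A label that never appears simply has a count of zero, which makes it easy to spot a filter that would silently return nothing.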
Best Practices
- Use specific IDs: When using get_neighbors, ensure you have the correct ID from search_nodes.
- Context first: Use get_disease_context for a broad overview before diving into specific genes or drugs.
- Filter relationships: Use the relation_type filter in get_neighbors to focus on specific evidence (e.g., only drug_protein).
- Multiscale integration: Combine with OpenTargets for deeper genetic evidence or Semantic Scholar for the latest literature context.
Resources
Scripts
- scripts/query_primekg.py: Core functions for searching and querying the knowledge graph.
Data Path
- Data: <path-to-primekg>/kg.csv
- Total nodes: ~129,000
- Total edges: ~4,000,000
- Database: CSV-based, optimized for pandas querying.