Medical-research-skills kegg-database
Direct access to KEGG via the REST API for academic-only pathway/gene/compound/drug queries; use when you need precise HTTP-level control or targeted KEGG ID mapping.
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Evidence Insight/kegg-database" ~/.claude/skills/aipoch-medical-research-skills-kegg-database && rm -rf "$T"
manifest:
scientific-skills/Evidence Insight/kegg-database/SKILL.mdsource content
When to Use
- You need to fetch KEGG pathway, gene, compound, enzyme, disease, or drug records directly from the KEGG REST API.
- You want to perform gene ↔ pathway mapping (e.g., building inputs for pathway enrichment or reporting).
- You need cross-references between KEGG databases (e.g., pathway → genes, gene → KO, pathway → compounds).
- You must convert identifiers between KEGG and external databases (e.g., KEGG gene → NCBI Gene ID / UniProt; KEGG compound → PubChem).
- You need drug–drug interaction (DDI) lookups for KEGG drug IDs.
Note: KEGG REST access is intended for academic use. Non-academic/commercial use may require a separate KEGG license.
Key Features
- Full coverage of core KEGG REST operations via Python helpers:
(database metadata)kegg_info
(catalog listing)kegg_list
(keyword/property search)kegg_find
(entry retrieval; sequences/structures/images)kegg_get
(ID conversion)kegg_conv
(cross-database linking)kegg_link
(drug–drug interactions)kegg_ddi
- Supports common KEGG identifiers and formats:
- Pathways:
,map00010hsa00010 - Genes:
hsa:10458 - Compounds:
cpd:C00002 - Drugs:
dr:D00001 - Enzymes:
ec:1.1.1.1 - KO:
ko:K00001
- Pathways:
- Output format options for
:kegg_get
,aaseq
,ntseq
,mol
,kcf
,image
,kgml
(some formats are single-entry only).json
Dependencies
- Python
>=3.9 requests >=2.31.0
Example Usage
""" End-to-end example: 1) Find a human gene by keyword 2) Link the gene to pathways 3) Retrieve one pathway entry 4) Convert the gene ID to UniProt """ from scripts.kegg_api import kegg_find, kegg_link, kegg_get, kegg_conv # 1) Search for a gene keyword in KEGG GENES hits = kegg_find("genes", "p53") print("FIND results (first lines):") print("\n".join(hits.splitlines()[:5]), "\n") # Choose a known KEGG gene ID for TP53 (human) gene_id = "hsa:7157" # 2) Link gene -> pathways pathway_links = kegg_link("pathway", gene_id) print("LINK gene -> pathways (first lines):") print("\n".join(pathway_links.splitlines()[:5]), "\n") # Parse the first pathway ID from the link output # Typical line format: path:hsaXXXXX<TAB>hsa:7157 first_line = next((ln for ln in pathway_links.splitlines() if ln.strip()), None) if not first_line: raise RuntimeError("No pathways returned for the gene ID.") path_id = first_line.split("\t")[0].replace("path:", "") print("Selected pathway:", path_id, "\n") # 3) Retrieve the pathway entry (flat text) pathway_entry = kegg_get(path_id) print("GET pathway entry (first 30 lines):") print("\n".join(pathway_entry.splitlines()[:30]), "\n") # 4) Convert KEGG gene ID -> UniProt uniprot_map = kegg_conv("uniprot", gene_id) print("CONV KEGG -> UniProt:") print(uniprot_map)
Implementation Details
API-to-function mapping
This skill wraps KEGG REST endpoints into Python functions (see
scripts/kegg_api.py):
-
kegg_info(database_or_org)
Retrieves database or organism metadata (release info, counts, etc.). -
kegg_list(database, organism=None)
Lists entries in a database; optionally scoped to an organism (e.g.,
).("pathway", "hsa")
Also supports listing explicit IDs (batch-style) when passed as a single string. -
kegg_find(database, query, option=None)
Searches by keyword or by chemical properties. Common
values:option
(exact match)formula
(range likeexact_mass
)300-310
(range)mol_weight
-
kegg_get(entry_ids, option=None)
Retrieves full entries or specific formats:- Sequences:
,aaseqntseq - Structures:
,molkcf - Pathway assets:
(PNG),image
(XML),kgml
(Pathway JSON)json
Batching rules:
- Most operations allow up to 10 entries per request.
,image
, andkgml
typically allow only 1 entry per request.json
- Sequences:
-
kegg_conv(target_db, source)
Converts IDs between KEGG and external databases (e.g.,
,uniprot
,ncbi-geneid
,pubchem
).chebi
Output is tab-delimited pairs:
.source_id<TAB>target_id -
kegg_link(target_db, source)
Cross-references entries across KEGG databases (e.g., gene → pathway, pathway → compound, gene → KO). -
kegg_ddi(drug_ids)
Returns known drug–drug interactions for one or more KEGG drug IDs (up to typical batch limits).
Practical constraints and error handling
- Entry limits: Prefer chunking lists into batches of ≤10 IDs; enforce single-entry calls for
.image/kgml/json - HTTP status codes: Treat non-200 responses as failures; common issues include:
(bad request / malformed parameters)400
(unknown database or entry ID)404
- Rate behavior: KEGG does not publish strict rate limits; avoid high-frequency polling and add backoff/retry for robustness.
Reference documentation
For detailed endpoint syntax, database lists, and species codes, consult:
references/kegg_reference.md