SciAgent-Skills kegg-database
Direct REST API access to KEGG (academic use only). Query pathways, genes, compounds, enzymes, diseases, drugs. Seven operations: info, list, find, get, conv, link, ddi. ID conversion (NCBI, UniProt, PubChem). For Python workflows with multiple databases, prefer bioservices.
git clone https://github.com/jaechang-hits/SciAgent-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/genomics-bioinformatics/kegg-database" ~/.claude/skills/jaechang-hits-sciagent-skills-kegg-database && rm -rf "$T"
KEGG Database: Biological Pathway & Molecular Network Queries
Overview
KEGG (Kyoto Encyclopedia of Genes and Genomes) is a comprehensive bioinformatics resource for biological pathway analysis, molecular interaction networks, and cross-database ID conversion. Access is through a REST API with no authentication, and every operation is a plain HTTP GET request. The list, find, conv, link, and ddi operations return tab-delimited text. The get operation returns flat-file entries or the requested format (FASTA, MOL, PNG, KGML, JSON).
When to Use
- Mapping genes to biological pathways (e.g., "which pathways involve TP53?")
- Retrieving metabolic pathway details, gene lists, or compound structures
- Converting identifiers between KEGG, NCBI Gene, UniProt, and PubChem
- Checking drug-drug interactions from KEGG's pharmacological database
- Building pathway enrichment context (all genes per pathway for an organism)
- Cross-referencing compounds, reactions, enzymes, and pathways
- For Python-native multi-database queries (KEGG + UniProt + Ensembl in one script), prefer `bioservices` instead
- For pathway visualization, use KEGG Mapper (https://www.kegg.jp/kegg/mapper/) directly
Prerequisites
pip install requests
API constraints:
- Academic use only; commercial use requires a separate KEGG license
- Max 10 entries per `/get`, `/list`, `/conv`, `/link`, and `/ddi` call (image/kgml/json: 1 entry only)
- No explicit rate limit, but add `time.sleep(0.5)` between batch requests to avoid server-side throttling
- Base URL: `https://rest.kegg.jp/`
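You can enforce the per-call limits before sending anything. The sketch below assumes the 10-entry cap and the one-entry rule for image/kgml/json exactly as listed above. `build_get_urls` is a hypothetical helper, not part of the KEGG API or any library, and it only builds URLs.

```python
BASE = "https://rest.kegg.jp"
# Hypothetical helper; limits assumed from the API constraints above
SINGLE_ENTRY_OPTIONS = {"image", "kgml", "json"}
MAX_ENTRIES = 10

def build_get_urls(entry_ids, option=None):
    """Split entry IDs into /get URLs that respect the per-call entry limits."""
    size = 1 if option in SINGLE_ENTRY_OPTIONS else MAX_ENTRIES
    suffix = f"/{option}" if option else ""
    urls = []
    for start in range(0, len(entry_ids), size):
        batch = entry_ids[start:start + size]
        urls.append(f"{BASE}/get/{'+'.join(batch)}{suffix}")
    return urls

ids = [f"hsa:{n}" for n in range(1, 24)]  # 23 placeholder gene IDs
for url in build_get_urls(ids):
    print(url)
```

Fetch the returned URLs one at a time, with `time.sleep(0.5)` between them. Because URL building is separate from fetching, you can test the batching logic offline.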
Quick Start
```python
import requests

BASE = "https://rest.kegg.jp"

def kegg_get(operation, *args):
    """Generic KEGG REST API caller: GET {BASE}/{operation}/{arg1}/{arg2}/..."""
    url = f"{BASE}/{operation}/{'/'.join(args)}"
    resp = requests.get(url, timeout=60)
    resp.raise_for_status()
    return resp.text

# Find pathways linked to human gene TP53 (KEGG gene ID hsa:7157)
pathways = kegg_get("link", "pathway", "hsa:7157")
print(pathways[:200])
# hsa:7157    path:hsa04010
# hsa:7157    path:hsa04110
# ...

# Get pathway details (hsa04110 = Cell cycle)
detail = kegg_get("get", "hsa04110")
print(detail[:300])
```
Core API
1. Database Information — kegg_info
Retrieve metadata and statistics about KEGG databases.
```python
import requests

BASE = "https://rest.kegg.jp"

# Database-level info: database name, release number/date, and entry counts
info = requests.get(f"{BASE}/info/pathway", timeout=60).text
print(info[:200])

# Organism-level info (hsa = Homo sapiens)
hsa_info = requests.get(f"{BASE}/info/hsa", timeout=60).text
print(hsa_info[:200])
```
Common databases:
kegg, pathway, module, brite, genes, genome, compound, glycan, reaction, enzyme, disease, drug
2. Listing Entries — kegg_list
List entry identifiers and names from any KEGG database.
```python
import requests

BASE = "https://rest.kegg.jp"

# All human pathways (one "ID<TAB>name" line per pathway)
hsa_pathways = requests.get(f"{BASE}/list/pathway/hsa", timeout=60).text
for line in hsa_pathways.strip().split("\n")[:5]:
    pathway_id, name = line.split("\t", 1)
    print(f"{pathway_id}: {name}")
# Prints lines like "<pathway ID>: Glycolysis / Gluconeogenesis - Homo sapiens (human)"

# Specific entries (max 10, joined with +)
genes = requests.get(f"{BASE}/list/hsa:10458+hsa:10459", timeout=60).text
print(genes)
```
Common organism codes:
hsa (human), mmu (mouse), dme (fruit fly), sce (yeast), eco (E. coli)
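`list` output has one `ID<TAB>name` line per entry, so it loads naturally into a dictionary. The sketch below uses a hypothetical helper, `parse_list_output`. It strips an optional `path:` prefix so IDs compare equal whether or not the server includes it. The sample text is hand-written in the documented layout rather than captured from the live API. `str.removeprefix` needs Python 3.9+.

```python
def parse_list_output(text):
    """Turn KEGG list/find output (ID<TAB>description per line) into a dict."""
    entries = {}
    for line in text.strip().splitlines():
        if not line:
            continue
        entry_id, _, description = line.partition("\t")
        # Normalize an optional "path:" prefix so lookups use bare IDs like hsa00010
        entries[entry_id.removeprefix("path:")] = description
    return entries

# Hand-written sample in the ID<TAB>name layout (not captured from the live API)
sample = (
    "hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\n"
    "path:hsa00020\tCitrate cycle (TCA cycle) - Homo sapiens (human)\n"
)
pathways = parse_list_output(sample)
print(pathways["hsa00020"])  # Citrate cycle (TCA cycle) - Homo sapiens (human)
```

To use it against the live service, pass `requests.get("https://rest.kegg.jp/list/pathway/hsa", timeout=60).text` instead of the sample.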
3. Keyword Search — kegg_find
Search databases by keywords or molecular properties.
```python
import requests
import time

BASE = "https://rest.kegg.jp"

# Keyword search in genes
results = requests.get(f"{BASE}/find/genes/p53", timeout=60).text
print(f"Found {len(results.splitlines())} entries")
time.sleep(0.5)

# Chemical formula search (partial match; atom order is ignored)
compounds = requests.get(f"{BASE}/find/compound/C7H10N4O2/formula", timeout=60).text
print(compounds[:200])
time.sleep(0.5)

# Exact mass range search (use /mol_weight for molecular weight)
drugs = requests.get(f"{BASE}/find/drug/300-310/exact_mass", timeout=60).text
print(drugs[:200])
```
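Search options: append `/formula` (partial match, atom order ignored), `/exact_mass`, or `/mol_weight` to compound/drug queries. The mass and weight options take either a single value (matched after rounding) or a range such as `300-310`.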
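The option rules above are easy to get wrong when you assemble URLs by hand. Below is a sketch of a URL builder. `find_url` is a hypothetical name. The check that property searches apply only to `compound` and `drug` is an assumption taken from the sentence above.

```python
from urllib.parse import quote

BASE = "https://rest.kegg.jp"
PROPERTY_OPTIONS = {"formula", "exact_mass", "mol_weight"}

def find_url(database, query, option=None):
    """Build a KEGG /find URL (hypothetical helper; rules assumed from the text above)."""
    if option is not None:
        if option not in PROPERTY_OPTIONS:
            raise ValueError(f"unknown find option: {option!r}")
        if database not in {"compound", "drug"}:
            raise ValueError("formula/exact_mass/mol_weight apply to compound or drug only")
    # Whitespace-separated keywords are joined with '+', as in find/genes/BRCA1+homo+sapiens
    terms = "+".join(quote(term) for term in query.split())
    url = f"{BASE}/find/{database}/{terms}"
    return f"{url}/{option}" if option else url

print(find_url("compound", "C7H10N4O2", "formula"))
print(find_url("genes", "BRCA1 homo sapiens"))
```

Keywords pass through `urllib.parse.quote`, so characters outside the URL-safe set are percent-encoded rather than sent raw.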
4. Entry Retrieval — kegg_get
Retrieve complete database entries or specific data formats.
```python
import requests
import time

BASE = "https://rest.kegg.jp"

# Full pathway entry (text flat-file format)
pathway = requests.get(f"{BASE}/get/hsa00010", timeout=60).text
print(pathway[:500])
time.sleep(0.5)

# Multiple entries (max 10, joined with +)
genes = requests.get(f"{BASE}/get/hsa:10458+hsa:10459", timeout=60).text
time.sleep(0.5)

# Protein sequence (FASTA)
fasta = requests.get(f"{BASE}/get/hsa:10458/aaseq", timeout=60).text
print(fasta[:200])
time.sleep(0.5)

# Compound structure (MOL format); C00002 = ATP
mol = requests.get(f"{BASE}/get/cpd:C00002/mol", timeout=60).text
time.sleep(0.5)

# Pathway image (PNG, single entry only)
img_resp = requests.get(f"{BASE}/get/hsa05130/image", timeout=60)
img_resp.raise_for_status()  # avoid writing an error page into pathway.png
with open("pathway.png", "wb") as f:
    f.write(img_resp.content)
print(f"Saved pathway image: {len(img_resp.content)} bytes")
```
Output formats:
aaseq (protein FASTA), ntseq (nucleotide FASTA), mol (MOL), kcf (KCF), image (PNG), kgml (XML), json (pathway JSON). Image/KGML/JSON accept one entry only.
5. ID Conversion — kegg_conv
Convert identifiers between KEGG and external databases.
```python
import requests
import time

BASE = "https://rest.kegg.jp"

# KEGG gene → NCBI Gene ID (specific gene)
ncbi = requests.get(f"{BASE}/conv/ncbi-geneid/hsa:10458", timeout=60).text
print(ncbi.strip())
# hsa:10458    ncbi-geneid:10458
time.sleep(0.5)

# KEGG gene → UniProt
uniprot = requests.get(f"{BASE}/conv/uniprot/hsa:10458", timeout=60).text
print(uniprot.strip())
time.sleep(0.5)

# Bulk conversion: all human genes → NCBI Gene IDs
all_conv = requests.get(f"{BASE}/conv/ncbi-geneid/hsa", timeout=60).text
lines = all_conv.strip().split("\n")
print(f"Total conversions: {len(lines)}")
time.sleep(0.5)

# Reverse: NCBI Gene ID 7157 (TP53) → KEGG
reverse = requests.get(f"{BASE}/conv/hsa/ncbi-geneid:7157", timeout=60).text
print(reverse.strip())
```
Supported external databases:
ncbi-geneid, ncbi-proteinid, uniprot (for genes); pubchem, chebi (for compounds and drugs)
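Conversion output is a list of pairs. One KEGG gene can map to several external IDs (for example, more than one UniProt accession), so a dict of lists is safer than a plain dict. The sketch below uses a hypothetical helper. The sample lines are hand-written, and the `up:` accessions in them are placeholders, not real mappings. The same function also works on two-column `link` output.

```python
from collections import defaultdict

def parse_conv_output(text):
    """Map each source ID to all target IDs from two-column conv/link output."""
    mapping = defaultdict(list)
    for line in text.strip().splitlines():
        if not line:
            continue
        source, target = line.split("\t", 1)
        mapping[source].append(target)
    return dict(mapping)

sample = (
    "hsa:7157\tup:ACC1\n"   # placeholder accessions, not real mappings
    "hsa:7157\tup:ACC2\n"
    "hsa:10458\tup:ACC3\n"
)
print(parse_conv_output(sample))
# {'hsa:7157': ['up:ACC1', 'up:ACC2'], 'hsa:10458': ['up:ACC3']}
```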
6. Cross-Referencing — kegg_link
Find related entries within and between KEGG databases.
```python
import requests
import time

BASE = "https://rest.kegg.jp"

# Genes in the human glycolysis pathway
genes = requests.get(f"{BASE}/link/genes/hsa00010", timeout=60).text
gene_list = [line.split("\t")[1] for line in genes.strip().split("\n") if line]
print(f"Glycolysis genes: {len(gene_list)}")
time.sleep(0.5)

# Pathways containing a specific gene (hsa:7157 = TP53)
pathways = requests.get(f"{BASE}/link/pathway/hsa:7157", timeout=60).text
print(pathways[:300])
time.sleep(0.5)

# Compounds in a pathway
compounds = requests.get(f"{BASE}/link/compound/hsa00010", timeout=60).text
print(f"Compounds in glycolysis: {len(compounds.splitlines())}")
time.sleep(0.5)

# Map genes to KO (orthology) groups
ko = requests.get(f"{BASE}/link/ko/hsa:10458", timeout=60).text
print(ko.strip())
```
Common links: genes ↔ pathway, pathway ↔ compound, pathway ↔ enzyme, genes ↔ ko (orthology)
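`link` accepts a whole organism code, so you can build pathway gene sets from one bulk request instead of one request per pathway. The sketch below assumes the `gene<TAB>path:pathway` layout returned by `link/pathway/hsa`. `pathway_gene_sets` is a hypothetical helper, and the sample lines are hand-written.

```python
from collections import defaultdict

def pathway_gene_sets(link_text):
    """Invert gene<TAB>pathway lines into {pathway_id: set_of_gene_ids}."""
    gene_sets = defaultdict(set)
    for line in link_text.strip().splitlines():
        if not line:
            continue
        gene, pathway = line.split("\t")
        gene_sets[pathway.removeprefix("path:")].add(gene)
    return dict(gene_sets)

# Hand-written sample lines in the link/pathway/<org> layout
sample = "hsa:7157\tpath:hsa04110\nhsa:7157\tpath:hsa04115\nhsa:1017\tpath:hsa04110\n"
sets = pathway_gene_sets(sample)
print(sorted(sets["hsa04110"]))  # ['hsa:1017', 'hsa:7157']
```

With the live API, the input would be `requests.get("https://rest.kegg.jp/link/pathway/hsa", timeout=60).text`.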
7. Drug-Drug Interactions — kegg_ddi
Check pharmacological interactions between drugs.
```python
import requests
import time

BASE = "https://rest.kegg.jp"

# Single drug: all known interactions (splitlines() gives 0 for an empty response)
interactions = requests.get(f"{BASE}/ddi/D00001", timeout=60).text
print(f"Interactions: {len(interactions.splitlines())}")
time.sleep(0.5)

# Pairwise check (max 10 drugs, joined with +)
pair = requests.get(f"{BASE}/ddi/D00001+D00002+D00003", timeout=60).text
print(pair[:300])
```
Key Concepts
Identifier Formats
| Type | Format | Example |
|---|---|---|
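| Reference pathway | `map#####` | `map00010` (Glycolysis, generic) |
| Organism pathway | `<org>#####` | `hsa00010` (Glycolysis, human) |
| Gene | `<org>:<gene ID>` | `hsa:7157` (TP53) |
| Compound | `cpd:C#####` | `cpd:C00002` (ATP) |
| Drug | `dr:D#####` | `dr:D00001` |
| Enzyme | `ec:#.#.#.#` | `ec:2.7.1.1` (hexokinase) |
| KO (orthology) | `ko:K#####` | `ko:K00844` (hexokinase) |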
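The formats in the table are regular enough to sanity-check IDs before sending a request. The sketch below is rough: its regular expressions approximate the table above and are not an official KEGG grammar. `classify_kegg_id` is a hypothetical helper.

```python
import re

# Patterns approximated from the identifier table; not an official KEGG grammar.
ID_PATTERNS = [
    ("reference pathway", re.compile(r"^(path:)?map\d{5}$")),
    ("organism pathway", re.compile(r"^(path:)?[a-z]{3,4}\d{5}$")),
    ("compound", re.compile(r"^(cpd:)?C\d{5}$")),
    ("drug", re.compile(r"^(dr:)?D\d{5}$")),
    ("KO", re.compile(r"^(ko:)?K\d{5}$")),
    ("enzyme", re.compile(r"^ec:\d+\.\d+\.\d+\.\d+$")),
    ("gene", re.compile(r"^[a-z]{3,4}:\S+$")),  # checked last: cpd:/dr:-style IDs match earlier
]

def classify_kegg_id(kegg_id):
    """Return a rough ID type label, or 'unknown' if no pattern matches."""
    for label, pattern in ID_PATTERNS:
        if pattern.match(kegg_id):
            return label
    return "unknown"

for example in ["map00010", "hsa00010", "hsa:7157", "cpd:C00002", "ec:2.7.1.1"]:
    print(example, "->", classify_kegg_id(example))
```

Order matters. `map00010` would also match the organism-pathway pattern, so the reference-pathway check runs first. Likewise, `cpd:C00002` would also match the generic gene pattern.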
Pathway Categories
KEGG organizes pathways into seven major categories. Map number ranges are approximate, and several categories share the map04xxx range.
- Metabolism: `map00xxx` and `map01xxx` (Glycolysis `map00010`, TCA cycle, amino acid metabolism)
- Genetic Information Processing: `map03xxx` (Ribosome, Spliceosome, DNA repair)
- Environmental Information Processing: `map02xxx` and `map04xxx` (ABC transporters `map02010`, MAPK signaling `map04010`)
- Cellular Processes: `map04xxx` (Autophagy, Apoptosis, Cell cycle `map04110`)
- Organismal Systems: `map04xxx` (Immune, Endocrine, Nervous)
- Human Diseases: `map05xxx` (Cancer, Neurodegenerative, Infectious)
- Drug Development: chronological and target-based classifications
Common Workflows
Workflow: Gene to Pathway Mapping
Find all pathways associated with a gene of interest.
```python
import requests
import time

BASE = "https://rest.kegg.jp"

# Step 1: Find gene by keyword
results = requests.get(f"{BASE}/find/genes/BRCA1+homo+sapiens", timeout=60).text
print("Gene search results:")
for line in results.strip().split("\n")[:5]:
    print(f"  {line}")
time.sleep(0.5)

# Step 2: Get pathways linked to BRCA1 (hsa:672)
pathways = requests.get(f"{BASE}/link/pathway/hsa:672", timeout=60).text
pathway_ids = [line.split("\t")[1].replace("path:", "")
               for line in pathways.strip().split("\n") if line]
print(f"\nBRCA1 is in {len(pathway_ids)} pathways:")
time.sleep(0.5)

# Step 3: Get pathway names from each entry's NAME field
for pid in pathway_ids[:5]:
    info = requests.get(f"{BASE}/get/{pid}", timeout=60).text
    for line in info.split("\n"):
        if line.startswith("NAME"):
            print(f"  {pid}: {line[len('NAME'):].strip()}")
            break
    time.sleep(0.5)
```
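Step 3 above sends one `get` request per pathway. An alternative fetches all human pathway names once with `list/pathway/hsa` and joins them locally, so the whole lookup needs two requests. The sketch below uses a hypothetical helper, `name_linked_pathways`. Both inputs are normalized for an optional `path:` prefix. The sample strings are hand-written stand-ins for live responses.

```python
def name_linked_pathways(link_text, list_text):
    """Join link/pathway/<gene> output with list/pathway/<org> names."""
    names = {}
    for line in list_text.strip().splitlines():
        pid, _, name = line.partition("\t")
        names[pid.removeprefix("path:")] = name
    result = []
    for line in link_text.strip().splitlines():
        if not line:
            continue
        _, pathway = line.split("\t")
        pid = pathway.removeprefix("path:")
        result.append((pid, names.get(pid, "<name not listed>")))
    return result

# Hand-written stand-ins for link/pathway/hsa:672 and list/pathway/hsa responses
link_sample = "hsa:672\tpath:hsa03440\nhsa:672\tpath:hsa05224\n"
list_sample = (
    "hsa03440\tHomologous recombination - Homo sapiens (human)\n"
    "hsa05224\tBreast cancer - Homo sapiens (human)\n"
)
for pid, name in name_linked_pathways(link_sample, list_sample):
    print(pid, name)
```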
Workflow: Pathway Enrichment Context
Build a gene-set collection for all pathways of an organism.
```python
import requests
import time

BASE = "https://rest.kegg.jp"

# Step 1: List all human pathways
pathways_text = requests.get(f"{BASE}/list/pathway/hsa", timeout=60).text
pathways = {}
for line in pathways_text.strip().split("\n"):
    pid, name = line.split("\t", 1)
    pathways[pid.replace("path:", "")] = name
print(f"Total human pathways: {len(pathways)}")
time.sleep(0.5)

# Step 2: Get genes for each pathway (first 3 only, for demo)
gene_sets = {}
for pid in list(pathways)[:3]:
    genes_text = requests.get(f"{BASE}/link/genes/{pid}", timeout=60).text
    gene_ids = [line.split("\t")[1] for line in genes_text.strip().split("\n") if line]
    gene_sets[pid] = gene_ids
    print(f"  {pid}: {len(gene_ids)} genes")
    time.sleep(0.5)

# Step 3: Convert to NCBI Gene IDs for enrichment tools
# (use /conv/ncbi-geneid/hsa once for bulk conversion)
```
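A common next step with these gene sets is an over-representation test. The sketch below uses the one-sided hypergeometric tail, computed exactly with `math.comb` (Python 3.8+). It assumes the gene universe is whatever set of annotated genes you pass in. It applies no multiple-testing correction, and it runs on toy gene IDs rather than KEGG data.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n successes, N draws)."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total

def enrichment(query_genes, gene_sets, universe):
    """Return {set_id: (overlap, one-sided p-value)} for each gene set."""
    query = set(query_genes) & universe
    results = {}
    for set_id, members in gene_sets.items():
        members = set(members) & universe
        overlap = len(query & members)
        results[set_id] = (overlap, hypergeom_sf(overlap, len(universe), len(members), len(query)))
    return results

universe = {f"g{i}" for i in range(20)}  # toy universe of 20 genes
toy_sets = {"setA": ["g0", "g1", "g2", "g3"], "setB": ["g10", "g11", "g12", "g13"]}
for set_id, (overlap, p) in enrichment(["g0", "g1", "g2", "g5"], toy_sets, universe).items():
    print(set_id, overlap, f"{p:.4g}")
```

If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, M, n, N)` gives the same tail probability. When testing many pathways, adjust the p-values (for example with Benjamini-Hochberg) before interpreting them.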
Workflow: Compound-Pathway-Reaction Analysis
Trace a compound through metabolic reactions and pathways.
```python
import requests
import time

BASE = "https://rest.kegg.jp"

# Step 1: Search for compound
results = requests.get(f"{BASE}/find/compound/glucose", timeout=60).text
print("Compound search:")
for line in results.strip().split("\n")[:3]:
    print(f"  {line}")
time.sleep(0.5)

# Step 2: Find reactions involving D-glucose (C00031)
reactions = requests.get(f"{BASE}/link/reaction/cpd:C00031", timeout=60).text
rxn_ids = [line.split("\t")[1] for line in reactions.strip().split("\n") if line]
print(f"\nReactions involving glucose: {len(rxn_ids)}")
time.sleep(0.5)

# Step 3: Find pathways for a specific reaction
pathways = requests.get(f"{BASE}/link/pathway/rn:R00299", timeout=60).text
print("\nPathways for R00299:")
print(pathways[:300])
time.sleep(0.5)

# Step 4: Get reference pathway detail (map00010 = Glycolysis / Gluconeogenesis)
detail = requests.get(f"{BASE}/get/map00010", timeout=60).text
print("\nGlycolysis pathway detail (first 500 chars):")
print(detail[:500])
```
Workflow: Cross-Database ID Integration
Map KEGG identifiers to UniProt, NCBI, and PubChem for multi-database workflows.
```python
import requests
import time

BASE = "https://rest.kegg.jp"

# Step 1: Convert gene to multiple external IDs
gene = "hsa:7157"  # TP53
uniprot = requests.get(f"{BASE}/conv/uniprot/{gene}", timeout=60).text.strip()
print(f"UniProt: {uniprot}")
time.sleep(0.5)
ncbi = requests.get(f"{BASE}/conv/ncbi-geneid/{gene}", timeout=60).text.strip()
print(f"NCBI Gene: {ncbi}")
time.sleep(0.5)

# Step 2: Get protein sequence from KEGG
fasta = requests.get(f"{BASE}/get/{gene}/aaseq", timeout=60).text
print(f"\nProtein sequence (first 200 chars):\n{fasta[:200]}")
time.sleep(0.5)

# Step 3: Convert compounds to PubChem IDs (C00002 = ATP)
# KEGG links compounds to PubChem Substance IDs (SIDs), not Compound IDs (CIDs)
cpd_conv = requests.get(f"{BASE}/conv/pubchem/cpd:C00002", timeout=60).text.strip()
print(f"\nATP PubChem: {cpd_conv}")
```
Key Parameters
| Parameter | Function/Endpoint | Default | Options | Effect |
|---|---|---|---|---|
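| organism code | `kegg_list`, `kegg_link`, `kegg_conv` | None | 3-4 letter code | Filter by organism (e.g., `hsa`, `mmu`) |
| search option | `kegg_find` | None | `formula`, `exact_mass`, `mol_weight` | Search mode for compounds/drugs |
| output option | `kegg_get` | text | `aaseq`, `ntseq`, `mol`, `kcf`, `image`, `kgml`, `json` | Output format |
| `+` separator | `kegg_get`, `kegg_list`, `kegg_conv`, `kegg_link`, `kegg_ddi` | None | Max 10 entries | Batch query (join IDs with `+`) |
| target database | `kegg_conv` | None | `ncbi-geneid`, `ncbi-proteinid`, `uniprot`, `pubchem`, `chebi` | External database for ID conversion |
| target database | `kegg_link` | None | `pathway`, `genes`, `compound`, `enzyme`, `ko` | Related KEGG database |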
Best Practices
- Add delays between batch requests: there is no explicit rate limit, but `time.sleep(0.5)` between requests prevents throttling and is courteous to the shared academic resource.
- Anti-pattern, fetching all entries without filtering: use `kegg_list` to enumerate IDs first, then `kegg_get` for specific entries. Avoid downloading entire databases when you need a subset.
- Parse tab-delimited output consistently: tabular responses (`list`, `find`, `conv`, `link`, `ddi`) use `\t` as the field separator and `\n` as the record separator. Always call `.strip()` before splitting so a trailing newline does not produce an empty record.
- Respect the 10-entry batch limit: `kegg_get`, `kegg_list`, `kegg_conv`, `kegg_link`, and `kegg_ddi` accept at most 10 entries (joined with `+`). Image/KGML/JSON formats accept only 1.
- Use organism-specific pathway IDs: `hsa00010` (human glycolysis) returns organism-specific gene mappings, while `map00010` (reference) returns generic entries. Prefer organism-specific IDs when analyzing a known organism.
- Cache frequently used conversions: full-organism conversions (`/conv/ncbi-geneid/hsa`) return large results. Cache them locally rather than repeating the request.
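Here is a minimal on-disk cache for those large conversion results. `cached_fetch` is a hypothetical helper, and the cache directory name is an arbitrary choice. Entries never expire, so delete the directory after a KEGG release. The network call is passed in as `fetch`, which lets you exercise the cache logic without a connection.

```python
import hashlib
from pathlib import Path

def cached_fetch(url, fetch, cache_dir="kegg_cache"):
    """Return response text for url, reading a local file cache when possible."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Hash the URL so any KEGG path becomes a safe file name
    path = cache / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".txt")
    if path.exists():
        return path.read_text(encoding="utf-8")
    text = fetch(url)  # only reached on a cache miss
    path.write_text(text, encoding="utf-8")
    return text
```

In practice, the call would be `cached_fetch("https://rest.kegg.jp/conv/ncbi-geneid/hsa", lambda u: requests.get(u, timeout=60).text)`.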
Common Recipes
Recipe: Parse KEGG Flat-File Entry
```python
import requests

def parse_kegg_entry(text):
    """Parse a KEGG flat-file entry into a dictionary of field -> value."""
    entry = {}
    current_key = None
    for line in text.split("\n"):
        if line.startswith("///"):  # end-of-entry marker
            break
        if not line.strip():
            continue
        if line[:12].strip():  # field name occupies the first 12 columns
            current_key = line[:12].strip()
            # note: a repeated field (e.g. REFERENCE) overwrites the earlier one
            entry[current_key] = line[12:].strip()
        elif current_key:  # continuation line of the previous field
            entry[current_key] += "\n" + line[12:].strip()
    return entry

pathway = requests.get("https://rest.kegg.jp/get/hsa00010", timeout=60).text
parsed = parse_kegg_entry(pathway)
print(f"Name: {parsed.get('NAME', 'N/A')}")
print(f"Description: {parsed.get('DESCRIPTION', 'N/A')[:200]}")
```
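The GENE field of a pathway entry holds one gene per line: the organism's KEGG gene ID (for human, the NCBI Gene ID), then the symbol, description, and KO/EC tags. The sketch below turns that field into `{gene_id: symbol}`. `parse_gene_field` is hypothetical, and the two sample lines are hand-written in that layout. A line without a semicolon keeps its whole remainder as the "symbol".

```python
def parse_gene_field(gene_field):
    """Split a pathway GENE field into {gene_id: symbol}."""
    genes = {}
    for line in gene_field.splitlines():
        parts = line.split(None, 1)  # ["3098", "HK1; hexokinase 1 [KO:K00844] [EC:2.7.1.1]"]
        if len(parts) < 2:
            continue
        gene_id, rest = parts
        genes[gene_id] = rest.split(";", 1)[0].strip()
    return genes

# Hand-written sample in the layout of a parsed GENE field
gene_field = (
    "3098  HK1; hexokinase 1 [KO:K00844] [EC:2.7.1.1]\n"
    "3099  HK2; hexokinase 2 [KO:K00844] [EC:2.7.1.1]"
)
print(parse_gene_field(gene_field))  # {'3098': 'HK1', '3099': 'HK2'}
```

To apply it to the entry parsed above, call `parse_gene_field(parsed.get("GENE", ""))`.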
Recipe: Organism Comparison
```python
import requests
import time

BASE = "https://rest.kegg.jp"
organisms = {"hsa": "Human", "mmu": "Mouse", "sce": "Yeast"}
pathway = "00010"  # Glycolysis / Gluconeogenesis

for org, name in organisms.items():
    genes = requests.get(f"{BASE}/link/genes/{org}{pathway}", timeout=60).text
    count = len([line for line in genes.strip().split("\n") if line])
    print(f"{name} ({org}): {count} genes in Glycolysis")
    time.sleep(0.5)
# Example output (counts change between KEGG releases):
# Human (hsa): 68 genes in Glycolysis
# Mouse (mmu): 67 genes in Glycolysis
# Yeast (sce): 31 genes in Glycolysis
```
Recipe: Build Gene-to-Pathway Mapping Table
```python
import requests

BASE = "https://rest.kegg.jp"

# Get all human gene-pathway links in one request
links = requests.get(f"{BASE}/link/pathway/hsa", timeout=60).text
gene_pathways = {}
for line in links.strip().split("\n"):
    if not line:
        continue
    gene, pathway = line.split("\t")
    gene_pathways.setdefault(gene, []).append(pathway.replace("path:", ""))
print(f"Genes with pathway annotations: {len(gene_pathways)}")

# Show top genes by pathway count
top = sorted(gene_pathways.items(), key=lambda x: -len(x[1]))[:5]
for gene, paths in top:
    print(f"  {gene}: {len(paths)} pathways")
```
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| HTTP 404 Not Found | Entry or database doesn't exist | Verify ID format and organism code; use `kegg_find` or `kegg_list` to check valid IDs |
| HTTP 400 Bad Request | Malformed API URL | Check URL path: `https://rest.kegg.jp/<operation>/<argument>/...`; no query params |
| Empty response | Search term too specific or no matches | Broaden keywords; try partial matches; check organism code |
| Image/KGML returns error | Batch query with image/kgml/json format | These formats accept one entry only; remove `+` joins |
| Requests fail during long batch loops | Server-side rate limiting | Add `time.sleep(0.5)` between requests; reduce batch frequency |
| Wrong gene IDs returned | Using reference pathway (`map00010`) instead of organism-specific | Use the organism prefix (`hsa00010`, not `map00010`) for gene links |
| ID conversion returns empty | External DB doesn't cover that entry | Not all KEGG entries have UniProt/NCBI mappings; check the entry's DBLINKS field with `kegg_get` first |
| Response encoding issues | Non-ASCII characters in compound names | Use `resp.text` (requests auto-detects the encoding) or `resp.content.decode("utf-8")` |
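For the rate-limiting row above, retrying with exponential backoff is one option. The sketch below treats 400 and 404 as permanent (bad syntax or unknown ID) and any other error status or connection error as transient. The HTTP call is passed in as `get`, normally `requests.get`, whose connection errors subclass `OSError`. `get_with_retry` is a hypothetical helper.

```python
import time

NO_RETRY = {400, 404}  # assumed permanent: malformed URL or unknown entry

def get_with_retry(url, get, retries=3, backoff=1.0, sleep=time.sleep):
    """Fetch url with get(url, timeout=60), retrying transient failures."""
    for attempt in range(retries + 1):
        try:
            resp = get(url, timeout=60)
        except OSError:  # requests' ConnectionError/Timeout subclass OSError
            resp = None
        if resp is not None:
            if resp.status_code < 400:
                return resp.text
            if resp.status_code in NO_RETRY:
                raise ValueError(f"KEGG returned HTTP {resp.status_code} for {url}")
        if attempt < retries:
            sleep(backoff * 2 ** attempt)  # 1 s, 2 s, 4 s with the defaults
    raise RuntimeError(f"giving up on {url} after {retries + 1} attempts")
```

A typical call is `text = get_with_retry("https://rest.kegg.jp/get/hsa00010", requests.get)`. The `sleep` parameter exists so tests can record waits instead of actually sleeping.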
Related Skills
- gget-genomic-databases: unified Python interface to Ensembl, NCBI, UniProt; use for gene-level queries when KEGG pathway context isn't needed
- biopython-molecular-biology: BioPython's `Bio.KEGG` module provides an alternative Python API for KEGG parsing
- pubchem-compound-search: for compound property lookups beyond KEGG's structural data; use `kegg_conv('pubchem', ...)` to bridge IDs
References
- KEGG REST API documentation — official API specification
- KEGG website — pathway browser, KEGG Mapper, BlastKOALA
- KEGG organism codes — full list of 3-4 letter organism codes
- Kanehisa, M. et al. (2023) "KEGG for taxonomy-based analysis of pathways and genomes" Nucleic Acids Research 51:D483-D489