Medical-research-skills hmdb-database
Access the Human Metabolome Database (HMDB) to search metabolites by name/structure/ID and extract chemical/biological/clinical fields when you need metabolomics research data or automated HMDB XML mining.
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Evidence Insight/hmdb-database" ~/.claude/skills/aipoch-medical-research-skills-hmdb-database && rm -rf "$T"
manifest:
scientific-skills/Evidence Insight/hmdb-database/SKILL.mdsource content
When to Use
- You need to look up a metabolite by common name (e.g., “Caffeine”) and retrieve its HMDB entry data.
- You have an HMDB ID (e.g.,
) and want to extract standardized chemical/biological/clinical fields for downstream analysis.HMDB0000001 - You want to build a local, scriptable pipeline to mine the HMDB XML dump instead of manually browsing the website.
- You need to map HMDB identifiers to external resources (e.g., KEGG, PubChem, ChEBI) for integration tasks.
- You are preparing metabolomics datasets and need pathway/enzyme/transporter annotations from HMDB entries.
Key Features
- Search metabolites by:
- Text name
- HMDB identifier (e.g.,
)HMDB0000001 - Structure-related query (as supported by the parser/search implementation)
- Parse the HMDB XML dataset and extract:
- Chemical data (formula, molecular weight, InChI/SMILES where available)
- Biological data (pathways, enzymes, transporters)
- Clinical data (disease associations, biofluid concentrations)
- Optional structuring of extracted results for analysis workflows (e.g., tabular outputs).
- Supports integration workflows by exposing identifiers suitable for cross-database mapping.
Dependencies
- Python
>=3.9 - Standard library:
(built-in)xml.etree.ElementTree
- Optional:
pandas >= 1.5
Example Usage
1) Download HMDB XML
Download the HMDB metabolite XML dataset from:
Assume you saved it as:
data/hmdb_metabolites.xml
2) Search and Extract Fields (Runnable Example)
from scripts.hmdb_parser import HMDBParser def main(): # Path to the HMDB XML dump downloaded from hmdb.ca/downloads xml_path = "data/hmdb_metabolites.xml" parser = HMDBParser(xml_path) # Search by metabolite name (text query) results = parser.search("Caffeine") # Print basic information from the first match (structure depends on implementation) if not results: print("No results found.") return first = results[0] print("Top match:") print(first) if __name__ == "__main__": main()
3) Field Reference
For a curated list of extractable fields and how they map to HMDB XML elements, see:
references/hmdb_data_fields.md
Implementation Details
-
Data acquisition
- Primary workflow uses the official HMDB downloadable XML dataset (recommended for bulk parsing).
- Single-entry lookups can be done via the HMDB website, but this skill is designed around XML parsing.
-
Parsing approach
- The parser reads the HMDB XML and traverses metabolite entries using
.xml.etree.ElementTree - Extracted fields should follow the definitions documented in
.references/hmdb_data_fields.md
- The parser reads the HMDB XML and traverses metabolite entries using
-
Search behavior
- Name/ID search typically matches against key textual identifiers (e.g., common name, synonyms, HMDB accession).
- Structure-based search is dependent on what structural fields are indexed/exposed by
(e.g., SMILES/InChI).HMDBParser
-
Integration / cross-references
- HMDB entries often include cross-references to external databases (e.g., KEGG, PubChem, ChEBI).
- A common workflow is to extract these identifiers and build mapping tables for downstream joins.
-
Spectral analysis (conceptual)
- HMDB contains NMR/MS references for some metabolites; this skill can be extended to link parsed entries to spectral metadata.
- Actual spectral matching/identification is not guaranteed unless implemented in the codebase.