LLMs-Universal-Life-Science-and-Clinical-Skills- metabolomics-pathway-enrichment

install

source · Clone the upstream repo

```bash
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
```

Claude Code · Install into ~/.claude/skills/

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Metabolomics/metabolomics-pathway-enrichment" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-metabolomics-pathw && rm -rf "$T"
```

manifest: Skills/Metabolomics/metabolomics-pathway-enrichment/SKILL.md

🗺️ Metabolomics Pathway Analysis

Map metabolites to biological pathways and perform enrichment, topology, and network analysis.

Core Capabilities

KEGG pathway enrichment: Over-representation analysis (ORA) using MetaboAnalystR
Quantitative enrichment analysis (QEA): For concentration tables (samples x metabolites, with group labels), scored with globaltest
Topology-based analysis: Considers pathway structure (betweenness, degree)
pathview visualization: KEGG pathway maps with metabolite data overlay
Network-based analysis: Metabolite-pathway bipartite networks
MSEA: Metabolite Set Enrichment using SMPDB or HMDB sets

CLI Reference

```bash
python omicsclaw.py run met-pathway --demo
python omicsclaw.py run met-pathway --input <metabolites.csv> --output <dir>
```

Algorithm / Methodology

KEGG Pathway Enrichment (MetaboAnalystR)

```r
library(MetaboAnalystR)

mSet <- InitDataObjects('conc', 'pathora', FALSE)
mSet <- SetOrganism(mSet, 'hsa')  # Human

# Load metabolite list (HMDB IDs or compound names)
metabolites <- c('HMDB0000001', 'HMDB0000005', 'HMDB0000010')

mSet <- Setup.MapData(mSet, metabolites)
mSet <- CrossReferencing(mSet, 'hmdb')  # Or 'name', 'kegg', 'pubchem'

# Pathway analysis
mSet <- SetKEGG.PathLib(mSet, 'hsa', 'current')
mSet <- SetMetabolomeFilter(mSet, FALSE)
mSet <- CalculateOraScore(mSet, 'rbc', 'hyperg')

pathway_results <- mSet$analSet$ora.mat
```
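The 'hyperg' option is a one-sided hypergeometric test: how likely is it to see at least this many hits in a pathway if the significant metabolites were drawn at random from the background? A standalone Python sketch of that test (not MetaboAnalystR's code) follows. The function names are hypothetical, the column names only mirror those of the ora.mat table, and the background set (`universe`) must be chosen explicitly here, whereas MetaboAnalystR sets it internally from the pathway library and metabolome filter.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background, K in pathway, n drawn)."""
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible outcomes contribute nothing to the sum.
    hi = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, hi + 1))
    return tail / comb(N, n)


def ora(query, pathways, universe):
    """One-sided over-representation test for each pathway.

    query:    compound IDs of the significant metabolites
    pathways: dict mapping pathway name -> iterable of compound IDs
    universe: set of compound IDs used as the background
    """
    hits_all = set(query) & universe
    N, n = len(universe), len(hits_all)
    rows = []
    for name, members in pathways.items():
        members = set(members) & universe
        K = len(members)
        k = len(hits_all & members)
        rows.append({
            "pathway": name,
            "Total": K,
            "Expected": n * K / N,
            "Hits": k,
            "Raw p": hypergeom_upper_tail(k, N, K, n),
        })
    # Most significant pathways first
    return sorted(rows, key=lambda r: r["Raw p"])
```

With a 10-compound background and a 3-compound query that falls entirely inside one 3-member pathway, that pathway gets p = 1/C(10, 3) = 1/120, while a pathway with no hits gets p = 1.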

Quantitative Enrichment Analysis (QEA)

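```r
# QEA needs a concentration table (samples x compounds) with a class-label column,
# not a fold-change vector. 'concentrations.csv' is a placeholder path.
mSet <- InitDataObjects('conc', 'pathqea', FALSE)
mSet <- SetOrganism(mSet, 'hsa')
mSet <- Read.TextData(mSet, 'concentrations.csv', 'rowu', 'disc')  # samples in rows, discrete labels
mSet <- CrossReferencing(mSet, 'name')
mSet <- CreateMappingResultTable(mSet)
mSet <- SanityCheckData(mSet)
mSet <- ReplaceMin(mSet)
mSet <- PreparePrenormData(mSet)
mSet <- Normalization(mSet, 'NULL', 'NULL', 'NULL', ratio = FALSE)
mSet <- SetKEGG.PathLib(mSet, 'hsa', 'current')
mSet <- SetMetabolomeFilter(mSet, FALSE)
mSet <- CalculateQeaScore(mSet, 'rbc', 'gt')  # 'gt' = globaltest

qea_results <- mSet$analSet$qea.mat
```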

Topology-Based Analysis

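```r
# Topology is part of MetaboAnalystR's pathway ORA: the first argument of
# CalculateOraScore selects the node-importance measure behind the 'Impact'
# column ('rbc' = relative betweenness centrality, 'dgr' = out-degree centrality).
mSet <- InitDataObjects('conc', 'pathora', FALSE)
mSet <- SetOrganism(mSet, 'hsa')
mSet <- Setup.MapData(mSet, metabolites)
mSet <- CrossReferencing(mSet, 'hmdb')
mSet <- SetKEGG.PathLib(mSet, 'hsa', 'current')
mSet <- SetMetabolomeFilter(mSet, FALSE)
mSet <- CalculateOraScore(mSet, 'dgr', 'hyperg')  # ORA p-values + degree-based impact

topo_results <- mSet$analSet$ora.mat  # topology score is in the 'Impact' column
```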
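Pathway impact rewards hits that sit at central positions in the pathway graph rather than at its periphery. A minimal Python sketch of a betweenness-based impact, assuming the pathway is supplied as a directed adjacency dict without duplicate edges (helper names are hypothetical): it computes Brandes' betweenness centrality and reports the fraction of the pathway's total betweenness carried by the matched compounds, so the score lies between 0 and 1. This illustrates the idea behind the 'rbc' option; MetaboAnalystR's own graph construction and normalization may differ.

```python
from collections import deque


def betweenness(graph):
    """Brandes' betweenness centrality for an unweighted, directed graph.

    graph: dict mapping each node to a list of its successor nodes.
    Returns raw (unnormalized) betweenness scores.
    """
    nodes = set(graph) | {w for succ in graph.values() for w in succ}
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        stack = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)   # number of shortest paths from s
        dist = dict.fromkeys(nodes, -1)   # BFS distance from s (-1 = unseen)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


def pathway_impact(graph, hits):
    """Share of the pathway's total betweenness carried by matched compounds (0..1)."""
    bc = betweenness(graph)
    total = sum(bc.values())
    if total == 0:
        return 0.0
    return sum(bc[v] for v in set(hits) if v in bc) / total
```

For a linear pathway A -> B -> C -> D, only the two internal nodes lie on shortest paths, so a hit on B alone yields an impact of 0.5 and a hit on an end node yields 0.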

Pathview Visualization

```r
library(pathview)

metabolite_data <- c('C00031' = 1.5, 'C00186' = 2.3, 'C00022' = 0.7)

pathview(cpd.data = metabolite_data,
         pathway.id = '00010',  # Glycolysis
         species = 'hsa',
         cpd.idtype = 'kegg',
         out.suffix = 'glycolysis_mapped')
# Output: hsa00010.glycolysis_mapped.png
```

KEGG Mapper (Direct API)

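```r
library(KEGGREST)

pathway_info <- keggGet('hsa00010')  # Glycolysis / Gluconeogenesis

kegg_ids <- c('C00031', 'C00186', 'C00022')

# keggLink() accepts a vector of entries and returns a named character vector:
# names are the query compounds, values are reference pathways (path:mapNNNNN)
links <- keggLink('pathway', paste0('cpd:', kegg_ids))
all_pathways <- split(unname(links), names(links))  # one vector per compound
```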
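Outside R, the same lookup can go through KEGG's REST `link` operation, which returns tab-separated `cpd:Cxxxxx<TAB>path:mapxxxxx` lines. The Python sketch below (function names are hypothetical) fetches that text and inverts it into pathway -> compound lists that an ORA can consume; parsing is kept separate from the network call so it can be tested offline. Check KEGG's API documentation for per-request entry limits and usage terms before sending long lists.

```python
import urllib.request
from collections import defaultdict

KEGG_LINK = "https://rest.kegg.jp/link/pathway/"


def fetch_links(compound_ids):
    """Ask KEGG's REST 'link' operation which pathways contain these compounds."""
    query = "+".join(f"cpd:{cid}" for cid in compound_ids)
    with urllib.request.urlopen(KEGG_LINK + query, timeout=30) as resp:
        return resp.read().decode("utf-8")


def compounds_by_pathway(link_text):
    """Invert 'cpd:Cxxxxx<TAB>path:mapxxxxx' lines into pathway -> sorted compounds."""
    table = defaultdict(set)
    for line in link_text.splitlines():
        if not line.strip():
            continue
        cpd, path = line.split("\t")
        table[path.removeprefix("path:")].add(cpd.removeprefix("cpd:"))
    return {pw: sorted(cpds) for pw, cpds in table.items()}
```

Typical use is `compounds_by_pathway(fetch_links(["C00031", "C00186", "C00022"]))`; the result uses reference (`map`) pathway IDs, which you would translate to organism-specific IDs such as `hsa00010` if needed.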

Network-Based Analysis

```r
library(igraph)

# pathway_hits: named list, pathway name -> character vector of matched metabolites.
# Assemble it from your enrichment output; the values below are illustrative.
build_network <- function(pathway_hits) {
    edges <- data.frame(
        from = unlist(pathway_hits, use.names = FALSE),
        to = rep(names(pathway_hits), lengths(pathway_hits)),
        stringsAsFactors = FALSE
    )
    g <- graph_from_data_frame(edges, directed = FALSE)
    V(g)$type <- ifelse(V(g)$name %in% edges$from, 'metabolite', 'pathway')
    g
}

pathway_hits <- list(
    'Glycolysis / Gluconeogenesis' = c('Glucose', 'Pyruvate', 'Lactate'),
    'Pyruvate metabolism' = c('Pyruvate', 'Lactate')
)
network <- build_network(pathway_hits)
plot(network, vertex.size = ifelse(V(network)$type == 'pathway', 15, 5))
```
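In the bipartite network, metabolites with degree greater than one are shared by several enriched pathways, which often explains why overlapping pathways come up together. A small Python sketch (hypothetical helper names, plain dicts instead of a graph library) that lists the metabolite-pathway edges and picks out those shared metabolites from a pathway -> hits mapping:

```python
from collections import defaultdict


def bipartite_edges(pathway_hits):
    """(metabolite, pathway) edges from a dict pathway -> hit metabolites."""
    return [(met, pw) for pw, mets in pathway_hits.items() for met in sorted(set(mets))]


def shared_metabolites(pathway_hits):
    """Metabolites linked to more than one pathway, with the pathways they connect."""
    links = defaultdict(list)
    for met, pw in bipartite_edges(pathway_hits):
        links[met].append(pw)
    return {met: sorted(pws) for met, pws in links.items() if len(pws) > 1}
```

Reporting these hub metabolites alongside the enrichment table makes it easier to judge whether two significant pathways reflect independent signals or the same few compounds.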

Metabolite Set Enrichment (MSEA)

```r
mSet <- InitDataObjects('conc', 'msetora', FALSE)
mSet <- Setup.MapData(mSet, metabolites)
mSet <- CrossReferencing(mSet, 'hmdb')
mSet <- CreateMappingResultTable(mSet)
mSet <- SetMetabolomeFilter(mSet, FALSE)
mSet <- SetCurrentMsetLib(mSet, 'smpdb_pathway', 2)  # SMPDB pathway-based metabolite sets
mSet <- CalculateHyperScore(mSet)
msea_results <- mSet$analSet$ora.mat
```

Export Results

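```r
# Expects the pathway-analysis table (ora.mat from CalculateOraScore).
# MSEA tables have no 'Impact' column, so adjust the column list for them.
export_pathways <- function(results, output_file) {
    results_df <- as.data.frame(results)
    results_df$pathway <- rownames(results)
    results_df <- results_df[, c('pathway', 'Total', 'Expected', 'Hits',
                                   'Raw p', 'Holm adjust', 'FDR', 'Impact')]
    results_df <- results_df[order(results_df$FDR), ]
    write.csv(results_df, output_file, row.names = FALSE)
    return(results_df)
}

export_pathways(pathway_results, 'pathways.csv')
```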

Parameters

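| Parameter | Default | Description |
|---|---|---|
| `--method` | `ora` | Analysis method: `ora`, `qea`, `topology`, `msea`, or `mummichog` |
| `--species` | `hsa` | KEGG organism code |
| `--id-type` | `hmdb` | Input identifier type: `hmdb`, `kegg`, `name`, or `pubchem` |
| `--fdr-cutoff` | `0.05` | FDR significance threshold |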

Why This Exists

Without it: A list of 50 significant metabolites gives only loose biological intuition and no systematic test of which pathways are affected.
With it: Tests for functional perturbation by mapping features onto curated KEGG/Reactome pathway definitions.
Why OmicsClaw: Runs enrichment quickly against locally cached pathway libraries and reports topology metrics (degree, betweenness).

Workflow

Map: Convert compound IDs (HMDB, PubChem, names) to the identifiers of the target pathway database (e.g., KEGG compound IDs).
Execute: Run hypergeometric tests and compute pathway impact scores.
Assess: Adjust p-values for multiple testing (Holm, FDR).
Generate: Draw metabolite-pathway network graphs.
Report: Tabulate the key enriched pathways.
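The two multiple-testing columns in the exported pathway table ('Holm adjust' and 'FDR') correspond to the Holm step-down procedure and, assuming the FDR column is Benjamini-Hochberg as with R's `p.adjust(method = "fdr")`, the BH step-up procedure. A standard-library Python sketch of both, written to follow the same rules as `p.adjust`; it is an illustration, not code taken from MetaboAnalystR:

```python
def holm(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p
    adj = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        # Multiply the r-th smallest p by (m - r + 1), cap at 1, keep monotone
        running = max(running, min(1.0, (m - rank) * pvals[i]))
        adj[i] = running
    return adj


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending p
    adj = [0.0] * m
    running = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank in ascending order
        # p * m / rank, made monotone by a running minimum from the largest p down
        running = min(running, pvals[i] * m / rank)
        adj[i] = running
    return adj
```

For p = [0.01, 0.04, 0.03, 0.005], Holm gives [0.03, 0.06, 0.06, 0.02] and BH gives [0.02, 0.04, 0.04, 0.02]; BH is less conservative, which is why it is the usual filter for `--fdr-cutoff`.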

Example Queries

"Perform KEGG pathway analysis on these significant metabolites"
"Run mummichog enrichment directly on m/z features"

Output Structure

```
output_directory/
├── report.md
├── result.json
├── pathways.csv
├── figures/
│   ├── pathway_overview.png
│   ├── pathway_map.png
│   └── metabolite_network.png
├── tables/
│   ├── pathway_enrichment.csv
│   └── topology_scores.csv
└── reproducibility/
    ├── commands.sh
    ├── environment.yml
    └── checksums.sha256
```
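The `reproducibility/checksums.sha256` file lets a reader verify the outputs later. A minimal Python sketch of how such a file could be written, assuming the `sha256sum` text format (hex digest, two spaces, relative path) so that `sha256sum -c reproducibility/checksums.sha256` can check it when run from the output directory; the function name is hypothetical and OmicsClaw's own implementation may differ.

```python
import hashlib
from pathlib import Path


def write_checksums(output_dir, checksum_file="reproducibility/checksums.sha256"):
    """Hash every file under output_dir (except the checksum file itself)."""
    root = Path(output_dir)
    target = root / checksum_file
    lines = []
    # Sorted traversal keeps the file stable across runs
    for path in sorted(p for p in root.rglob("*") if p.is_file() and p != target):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(root).as_posix()}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(lines) + "\n")
    return lines
```

Run it as the last step, after all figures and tables are written, so the checksums cover the final outputs.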

Safety

Local-first: Matches against local databases where possible; any calls to external APIs (such as KEGG) are made explicitly.
Disclaimer: Reports must follow the OmicsClaw report structure and include its standard disclaimers.
Audit trail: All parameters and workflow steps are logged.

Integration with Orchestrator

Trigger conditions:

Invoked automatically when the user's request matches this skill's metadata and intent.

Chaining partners:

`met-diff`: Upstream source of significant feature hits

Version Compatibility

Reference examples tested with: MetaboAnalystR 4.0+, ReactomePA 1.46+

Dependencies

Required: numpy, pandas
Optional: MetaboAnalystR (R), pathview (R), KEGGREST (R), igraph (R), ReactomePA (R)

Citations

MetaboAnalyst — Pang et al., Nucleic Acids Research 2021
pathview — Luo & Brouwer, Bioinformatics 2013
mummichog — Li et al., PLoS Computational Biology 2013
FELLA — Picart-Armada et al., PLoS Computational Biology 2018

Related Skills

`met-annotate`: Identify metabolites first
`met-diff`: Get significant metabolites for enrichment
`xcms-preprocess`: Feature extraction upstream