install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Genomics/Metagenomics/bioSkills/functional-profiling" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-functional-profili && rm -rf "$T"
manifest:
Skills/Genomics/Metagenomics/bioSkills/functional-profiling/SKILL.mdsource content
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-metagenomics-functional-profiling description: Profile functional potential of metagenomes using HUMAnN3 and similar tools. Use when obtaining pathway abundances, gene family counts, or functional annotations from metagenomic data. tool_type: cli primary_tool: humann measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
Functional Profiling
Profile the functional potential of metagenomic samples using HUMAnN3 to get pathway and gene family abundances.
HUMAnN3 Workflow
Installation
# Install via conda (recommended) conda create -n humann -c bioconda humann conda activate humann # Download databases humann_databases --download chocophlan full /path/to/databases humann_databases --download uniref uniref90_diamond /path/to/databases # Update config with database paths humann_config --update database_folders nucleotide /path/to/databases/chocophlan humann_config --update database_folders protein /path/to/databases/uniref
Basic Usage
# Run HUMAnN3 on a single sample humann --input sample.fastq.gz --output sample_humann # With MetaPhlAn taxonomic profile (faster) humann --input sample.fastq.gz \ --taxonomic-profile sample_metaphlan.txt \ --output sample_humann # Paired-end reads (concatenate first) cat sample_R1.fq.gz sample_R2.fq.gz > sample_concat.fq.gz humann --input sample_concat.fq.gz --output sample_humann
Output Files
sample_humann/ ├── sample_genefamilies.tsv # Gene family abundances (UniRef90) ├── sample_pathabundance.tsv # MetaCyc pathway abundances ├── sample_pathcoverage.tsv # Pathway coverage (0-1) └── sample_humann_temp/ # Intermediate files
Output Format
Gene Families
# Gene Family sample_Abundance-RPKs UniRef90_A0A000|g__Bacteroides.s__Bacteroides_vulgatus 123.45 UniRef90_A0A001|unclassified 67.89 UNMAPPED 1000.0
Pathway Abundance
# Pathway sample_Abundance PWY-5100: pyruvate fermentation 456.78 PWY-5100|g__Bacteroides.s__Bacteroides_vulgatus 234.56 PWY-5100|unclassified 222.22
Batch Processing
# Process multiple samples for fq in *.fastq.gz; do sample=$(basename $fq .fastq.gz) humann --input $fq --output ${sample}_humann --threads 8 done # Join tables across samples humann_join_tables -i . -o merged_genefamilies.tsv --file_name genefamilies humann_join_tables -i . -o merged_pathabundance.tsv --file_name pathabundance
Normalization
# Normalize to relative abundance humann_renorm_table -i merged_genefamilies.tsv \ -o genefamilies_relab.tsv \ -u relab # Normalize to copies per million (CPM) humann_renorm_table -i merged_pathabundance.tsv \ -o pathabundance_cpm.tsv \ -u cpm
Regroup Gene Families
# Regroup to different functional categories # EC numbers humann_regroup_table -i genefamilies.tsv \ -g uniref90_level4ec \ -o genefamilies_ec.tsv # KEGG Orthologs humann_regroup_table -i genefamilies.tsv \ -g uniref90_ko \ -o genefamilies_ko.tsv # GO terms humann_regroup_table -i genefamilies.tsv \ -g uniref90_go \ -o genefamilies_go.tsv # Pfam domains humann_regroup_table -i genefamilies.tsv \ -g uniref90_pfam \ -o genefamilies_pfam.tsv
Stratification
Split by Organism
# Unstratify (remove organism info, sum across species) humann_split_stratified_table -i merged_pathabundance.tsv \ -o . # Creates: merged_pathabundance_unstratified.tsv # merged_pathabundance_stratified.tsv
Species Contributions
import pandas as pd df = pd.read_csv('merged_pathabundance.tsv', sep='\t', index_col=0) unstratified = df[~df.index.str.contains('\\|')] stratified = df[df.index.str.contains('\\|')] def get_species_contrib(pathway, df): '''Get species contributions to a pathway''' mask = df.index.str.startswith(pathway + '|') return df[mask] contrib = get_species_contrib('PWY-5100', stratified)
Quality Control
# Check unmapped and unintegrated humann_barplot -i merged_pathabundance.tsv \ -o pathabundance_barplot.png \ --focal-feature UNMAPPED
Key QC Metrics
| Metric | Good | Concerning |
|---|---|---|
| UNMAPPED (gene families) | <30% | >50% |
| UNINTEGRATED (pathways) | <40% | >60% |
| Pathway coverage | >0.5 | <0.3 |
Differential Analysis
LEfSe Format
# Format for LEfSe humann_join_tables -i . -o merged.tsv --file_name pathabundance humann_renorm_table -i merged.tsv -o merged_relab.tsv -u relab
Python Analysis
import pandas as pd from scipy import stats df = pd.read_csv('pathabundance_cpm.tsv', sep='\t', index_col=0) metadata = pd.read_csv('metadata.tsv', sep='\t', index_col=0) group1 = metadata[metadata['condition'] == 'healthy'].index group2 = metadata[metadata['condition'] == 'disease'].index results = [] for pathway in df.index: if '|' not in pathway and pathway != 'UNMAPPED': vals1 = df.loc[pathway, group1] vals2 = df.loc[pathway, group2] stat, pval = stats.mannwhitneyu(vals1, vals2) fc = vals2.mean() / (vals1.mean() + 1e-10) results.append({'pathway': pathway, 'pvalue': pval, 'fold_change': fc}) results_df = pd.DataFrame(results) results_df['padj'] = stats.false_discovery_control(results_df['pvalue'])
Visualization
import matplotlib.pyplot as plt import seaborn as sns df = pd.read_csv('pathabundance_relab.tsv', sep='\t', index_col=0) df = df[~df.index.str.contains('\\|')] df = df.drop(['UNMAPPED', 'UNINTEGRATED'], errors='ignore') top = df.mean(axis=1).nlargest(20).index plt.figure(figsize=(12, 8)) sns.heatmap(df.loc[top].T, cmap='viridis', xticklabels=True) plt.tight_layout() plt.savefig('pathway_heatmap.png')
Related Skills
- metagenomics/metaphlan-profiling - Taxonomic profiling (input for HUMAnN)
- metagenomics/kraken-classification - Alternative taxonomy
- metagenomics/metagenome-visualization - Visualization methods
- pathway-analysis/kegg-pathways - KEGG pathway interpretation