install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Genomics/Metagenomics/bioSkills/abundance-estimation" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-abundance-estimati && rm -rf "$T"
manifest:
Skills/Genomics/Metagenomics/bioSkills/abundance-estimation/SKILL.mdsource content
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-metagenomics-abundance description: Species abundance estimation using Bracken with Kraken2 output. Redistributes reads from higher taxonomic levels to species for more accurate estimates. Use when accurate species-level abundances are needed from Kraken2 classification output. tool_type: cli primary_tool: bracken measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
Abundance Estimation with Bracken
Basic Abundance Estimation
# Run Bracken on Kraken2 report bracken -d /path/to/kraken2_db \ -i kraken_report.txt \ -o bracken_output.txt \ -r 150 \ # Read length (100, 150, 200, 250, 300) -l S # Taxonomic level
Full Workflow with Kraken2
# Step 1: Classify with Kraken2 kraken2 --db /path/to/kraken2_db \ --threads 8 \ --paired \ --report sample_kraken_report.txt \ reads_R1.fastq.gz reads_R2.fastq.gz # Step 2: Estimate abundances with Bracken bracken -d /path/to/kraken2_db \ -i sample_kraken_report.txt \ -o sample_bracken_species.txt \ -w sample_bracken_report.txt \ -r 150 \ -l S
Different Taxonomic Levels
# Species level (default) bracken -d db -i report.txt -o species.txt -r 150 -l S # Genus level bracken -d db -i report.txt -o genus.txt -r 150 -l G # Family level bracken -d db -i report.txt -o family.txt -r 150 -l F # Phylum level bracken -d db -i report.txt -o phylum.txt -r 150 -l P
Build Bracken Database
# Build Bracken database for specific read lengths # Run AFTER building Kraken2 database bracken-build -d /path/to/kraken2_db -t 8 -l 150 # Build for multiple read lengths bracken-build -d /path/to/kraken2_db -t 8 -l 100 bracken-build -d /path/to/kraken2_db -t 8 -l 250
Output Format
name taxonomy_id taxonomy_lvl kraken_assigned_reads added_reads new_est_reads fraction_total_reads Escherichia coli 562 S 5234 1245 6479 0.52 Staphylococcus aureus 1280 S 2156 456 2612 0.21
Filter Low-Abundance Taxa
# Use threshold for minimum reads bracken -d db \ -i report.txt \ -o bracken.txt \ -r 150 \ -l S \ -t 10 # Minimum reads threshold
Combine Multiple Samples
# Run Bracken on each sample for report in kraken_reports/*.txt; do sample=$(basename $report _kraken_report.txt) bracken -d db -i $report -o bracken/${sample}_species.txt -r 150 -l S done # Combine into abundance matrix combine_bracken_outputs.py --files bracken/*_species.txt -o combined_abundance.txt
Parse Bracken Output in Python
import pandas as pd bracken = pd.read_csv('bracken_output.txt', sep='\t') bracken_sorted = bracken.sort_values('new_est_reads', ascending=False) bracken_sorted[['name', 'fraction_total_reads']].head(20) total_reads = bracken['new_est_reads'].sum() bracken['relative_abundance'] = bracken['new_est_reads'] / total_reads * 100
Convert to Relative Abundance
import pandas as pd df = pd.read_csv('bracken_output.txt', sep='\t') total = df['new_est_reads'].sum() df['relative_abundance'] = df['new_est_reads'] / total * 100 df.to_csv('bracken_relative_abundance.txt', sep='\t', index=False)
Create Abundance Matrix
import pandas as pd import os files = [f for f in os.listdir('bracken') if f.endswith('_species.txt')] dfs = [] for f in files: sample = f.replace('_species.txt', '') df = pd.read_csv(f'bracken/{f}', sep='\t') df = df[['name', 'new_est_reads']].rename(columns={'new_est_reads': sample}) dfs.append(df) merged = dfs[0] for df in dfs[1:]: merged = merged.merge(df, on='name', how='outer') merged = merged.fillna(0) merged.to_csv('abundance_matrix.txt', sep='\t', index=False)
Key Parameters
| Parameter | Description |
|---|---|
| -d | Kraken2 database path |
| -i | Input Kraken2 report |
| -o | Output abundance file |
| -w | Output updated report (optional) |
| -r | Read length used |
| -l | Taxonomic level |
| -t | Minimum read threshold |
Taxonomic Levels
| Level | Code | Description |
|---|---|---|
| Kingdom | K | Bacteria, Archaea |
| Phylum | P | Major divisions |
| Class | C | Class level |
| Order | O | Order level |
| Family | F | Family level |
| Genus | G | Genus level |
| Species | S | Species level |
Read Length Options
Pre-built databases typically include: 50, 75, 100, 150, 200, 250, 300 bp
Choose the length closest to your actual read length.
Related Skills
- kraken-classification - Generate Kraken2 report
- metaphlan-profiling - Alternative profiling method
- metagenome-visualization - Visualize abundances