LLMs-Universal-Life-Science-and-Clinical-Skills- bulkrna-enrichment
install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Transcriptomics/bulkrna-enrichment" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-bulkrna-enrichment && rm -rf "$T"
manifest:
Skills/Transcriptomics/bulkrna-enrichment/SKILL.md
Bulk RNA-seq Pathway Enrichment
Pathway enrichment analysis for bulk RNA-seq differential expression results. The primary engine is GSEApy, used for ORA (via Enrichr) and GSEA (pre-ranked). When GSEApy is not installed, the skill falls back to built-in methods: a hypergeometric test for ORA and a rank-based permutation test for GSEA, both with Benjamini-Hochberg FDR correction.
Why This Exists
- Without it: Researchers must manually extract significant gene lists, format them for external enrichment tools, and cross-reference multiple pathway databases.
- With it: A single Python command runs ORA or GSEA on DE results, produces publication-ready bar and dot plots, and exports filtered enrichment tables ready for biological interpretation.
- Why OmicsClaw: Wraps the standard hypergeometric and rank-based enrichment approaches into the OmicsClaw reporting framework with automatic fallback when optional dependencies are unavailable.
Workflow
- Load: Read a DE results table (CSV with gene, log2FoldChange, pvalue, padj columns).
- Filter: For ORA, extract significant genes at user-specified padj and log2FC cutoffs. For GSEA, rank all genes by the signed score log2FC * -log10(pvalue).
- Enrich: Test gene lists against pathway gene sets using hypergeometric ORA or rank-based GSEA.
- Correct: Apply Benjamini-Hochberg multiple testing correction across all terms.
- Visualize: Generate enrichment bar plot and dot plot of top enriched terms.
- Report: Write a markdown report, result.json, the full and filtered enrichment tables, and a reproducibility script.
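The Load and Filter steps can be sketched in pandas as follows. This is a minimal illustration, not the skill's implementation: the function names are hypothetical, and the cutoff defaults (padj < 0.05, |log2FC| >= 1) are assumptions, since the documented defaults are not shown in the Parameters table.

```python
import io

import numpy as np
import pandas as pd

# Columns documented as required in the DE results CSV.
REQUIRED = ["gene", "log2FoldChange", "pvalue", "padj"]


def load_de_table(path_or_buffer):
    """Read a DE results CSV and check that the documented columns exist."""
    df = pd.read_csv(path_or_buffer)
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"DE table is missing columns: {missing}")
    # Drop genes without statistics (e.g. NA padj from independent filtering).
    return df.dropna(subset=["log2FoldChange", "pvalue", "padj"])


def significant_genes(df, padj_cutoff=0.05, lfc_cutoff=1.0):
    """ORA input; the cutoff defaults here are hypothetical."""
    mask = (df["padj"] < padj_cutoff) & (df["log2FoldChange"].abs() >= lfc_cutoff)
    return df.loc[mask, "gene"].tolist()


def ranked_scores(df):
    """GSEA input: log2FC * -log10(pvalue), most positive first."""
    # Clip p-values at the smallest positive float so -log10 stays finite.
    p = df["pvalue"].clip(lower=np.finfo(float).tiny)
    score = df["log2FoldChange"] * -np.log10(p)
    return pd.Series(score.to_numpy(), index=df["gene"]).sort_values(ascending=False)


# Usage on a tiny in-memory table.
demo = io.StringIO(
    "gene,log2FoldChange,pvalue,padj\n"
    "TP53,2.5,1e-6,1e-4\n"
    "MYC,-1.8,1e-4,0.003\n"
    "GAPDH,0.1,0.8,0.95\n"
)
de = load_de_table(demo)
hits = significant_genes(de)   # ['TP53', 'MYC']
ranks = ranked_scores(de)      # order: TP53, GAPDH, MYC
```

The signed score puts strongly up-regulated genes at the top and strongly down-regulated genes at the bottom, with uninformative genes in the middle.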
CLI Reference
python bulkrna_enrichment.py --input <de_results.csv> --output <dir> --method ora
python bulkrna_enrichment.py --input <de_results.csv> --output <dir> --method gsea
python bulkrna_enrichment.py --demo --output /tmp/bulkrna_enrichment_demo
python bulkrna_enrichment.py --input <de.csv> --output <dir> --gene-set-file custom_sets.json
python omicsclaw.py run bulkrna-enrichment --demo
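A hypothetical argparse front end that accepts the invocations above might look like this. It mirrors only the flags that appear in the CLI reference. Treating --input and --demo as mutually exclusive and defaulting --method to ora are assumptions, and the cutoff flags are omitted because their names are not shown.

```python
import argparse


def build_parser():
    """Hypothetical parser covering only the flags shown in the CLI reference."""
    parser = argparse.ArgumentParser(prog="bulkrna_enrichment.py")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--input", help="DE results CSV")
    source.add_argument("--demo", action="store_true", help="run on bundled demo data")
    parser.add_argument("--output", required=True, help="output directory")
    # Default of 'ora' is an assumption; the documented default is not shown.
    parser.add_argument("--method", choices=["ora", "gsea"], default="ora")
    parser.add_argument("--gene-set-file", default=None,
                        help="JSON mapping term names to gene lists")
    return parser


demo_args = build_parser().parse_args(["--demo", "--output", "/tmp/bulkrna_enrichment_demo"])
gsea_args = build_parser().parse_args(
    ["--input", "de.csv", "--output", "out", "--method", "gsea"]
)
```

Restricting --method with `choices` makes a typo such as `--method gse` fail at parse time rather than deep inside the analysis.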
Example Queries
- "Run pathway enrichment on my bulk RNA DE results"
- "Perform GSEA on this differential expression table"
- "Which GO terms are enriched in my upregulated genes?"
- "Run ORA with KEGG pathways on these DEGs"
Algorithm / Methodology
- ORA (Over-Representation Analysis): For each gene set, compute a hypergeometric test p-value measuring the overlap between the user's significant gene list and the gene set, relative to the background gene count.
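- GSEA (Gene Set Enrichment Analysis): Rank all genes by log2FC * -log10(pvalue). With GSEApy, each gene set is then scored by the running-sum enrichment score of pre-ranked GSEA (Subramanian et al., 2005). The built-in fallback instead compares each gene set's mean rank against random gene sets via permutation testing.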
- Multiple testing: Benjamini-Hochberg correction across all tested terms.
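- GSEApy integration: When gseapy is installed, the skill uses its Enrichr and pre-ranked GSEA implementations. Otherwise, the built-in hypergeometric and rank-based methods cover the same two analyses, although their statistics (in particular the mean-rank substitute for GSEA) are not numerically identical to GSEApy's.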
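The ORA, GSEA and multiple-testing steps above can be sketched with SciPy and NumPy. This is an illustrative fallback under stated assumptions, not the skill's code. The ORA p-value is taken as the upper tail P(overlap >= k) of a hypergeometric distribution over the background. BH is implemented by hand. The mean-rank permutation test is made two-sided, which is an assumption because the source does not state the direction. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import hypergeom


def ora_pvalue(hits, background, gene_set):
    """One-sided hypergeometric ORA p-value P(overlap >= observed), plus the overlap."""
    universe = set(background)
    gs = set(gene_set) & universe          # only genes in the background count
    drawn = set(hits) & universe
    k = len(drawn & gs)
    # hypergeom(M, n, N): M = background size, n = gene-set size, N = list size.
    p = hypergeom.sf(k - 1, len(universe), len(gs), len(drawn))
    return float(p), k


def bh_fdr(pvals):
    """Benjamini-Hochberg step-up adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest p-value down keeps adjusted values monotone.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def mean_rank_test(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Two-sided permutation test on the mean rank of a gene set's members."""
    pos = {g: i for i, g in enumerate(ranked_genes)}
    idx = np.array([pos[g] for g in gene_set if g in pos])
    if idx.size == 0:
        raise ValueError("no gene-set members in the ranked list")
    n = len(ranked_genes)
    expected = (n - 1) / 2                 # mean rank of a random set
    observed = abs(idx.mean() - expected)
    rng = np.random.default_rng(seed)
    null = np.array([
        abs(rng.choice(n, size=idx.size, replace=False).mean() - expected)
        for _ in range(n_perm)
    ])
    # +1 in numerator and denominator so the p-value is never exactly zero.
    return (np.sum(null >= observed) + 1) / (n_perm + 1)


# Usage: 100-gene background, gene set = first 10 genes.
background = [f"G{i}" for i in range(100)]
p_strong, k_strong = ora_pvalue(background[:10], background, background[:10])
p_none, k_none = ora_pvalue(background[50:60], background, background[:10])
fdr = bh_fdr([0.01, 0.04, 0.03, 0.5])

ranked = [f"R{i}" for i in range(1000)]
p_top = mean_rank_test(ranked, ranked[:20])        # set packed at the top
p_spread = mean_rank_test(ranked, ranked[::50])    # set spread evenly
```

Intersecting the gene set and hit list with the background before testing matters: genes that were never measured cannot be drawn, so counting them would distort the hypergeometric parameters.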
Input Formats
| Format | Extension | Required Columns | Example |
|---|---|---|---|
| DE results CSV | `.csv` | `gene`, `log2FoldChange`, `pvalue`, `padj` | Output from `bulkrna-de` |
Output Structure
output_directory/
├── report.md
├── result.json
├── figures/
│   ├── enrichment_barplot.png
│   └── enrichment_dotplot.png
├── tables/
│   ├── enrichment_results.csv
│   └── enrichment_significant.csv
└── reproducibility/
    └── commands.sh
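The two figures under figures/ could be produced with matplotlib roughly as below. This is a sketch: the result-table column names (term, fdr, overlap, set_size), the top_n cutoff and the styling are assumptions, not the skill's actual schema.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_enrichment(res, outdir, top_n=10):
    """Write bar and dot plots of the top terms (column names are assumed)."""
    # Reverse so the most significant term ends up at the top of a horizontal plot.
    top = res.nsmallest(top_n, "fdr").iloc[::-1]
    terms = top["term"].tolist()
    score = -np.log10(top["fdr"].clip(lower=1e-300).to_numpy())
    figdir = os.path.join(outdir, "figures")
    os.makedirs(figdir, exist_ok=True)
    height = 0.4 * len(top) + 1

    fig, ax = plt.subplots(figsize=(6, height))
    ax.barh(terms, score)
    ax.set_xlabel("-log10(FDR)")
    fig.tight_layout()
    bar_path = os.path.join(figdir, "enrichment_barplot.png")
    fig.savefig(bar_path, dpi=150)
    plt.close(fig)

    # Dot plot: x = fraction of the set hit, size = overlap, colour = significance.
    ratio = (top["overlap"] / top["set_size"]).to_numpy()
    fig, ax = plt.subplots(figsize=(6, height))
    sc = ax.scatter(ratio, terms, s=20 * top["overlap"].to_numpy(), c=score, cmap="viridis")
    fig.colorbar(sc, ax=ax, label="-log10(FDR)")
    ax.set_xlabel("Overlap / gene-set size")
    fig.tight_layout()
    dot_path = os.path.join(figdir, "enrichment_dotplot.png")
    fig.savefig(dot_path, dpi=150)
    plt.close(fig)
    return bar_path, dot_path


res = pd.DataFrame({
    "term": ["Term A", "Term B", "Term C"],
    "fdr": [1e-5, 0.01, 0.2],
    "overlap": [12, 5, 3],
    "set_size": [40, 30, 60],
})
bar_png, dot_png = plot_enrichment(res, tempfile.mkdtemp())
```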
Parameters
| Parameter | Default | Description |
|---|---|---|
| `--method` | | `ora` or `gsea` |
| | | Adjusted p-value significance threshold |
| | | Absolute log2 fold-change threshold (ORA gene filter) |
| `--gene-set-file` | None | Path to custom gene sets JSON (keys = term names, values = gene lists) |
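A custom gene-set file in the documented shape (keys = term names, values = gene lists) could be loaded and sanity-checked as follows. The de-duplication and the `min_size` filter are illustrative choices, not documented behavior of --gene-set-file, and `load_gene_sets` is a hypothetical name.

```python
import json
import os
import tempfile


def load_gene_sets(path, min_size=1):
    """Read {term: [gene, ...]} JSON; de-duplicate genes and drop sets below min_size."""
    with open(path) as fh:
        raw = json.load(fh)
    if not isinstance(raw, dict):
        raise ValueError("gene-set JSON must map term names to gene lists")
    gene_sets = {}
    for term, genes in raw.items():
        if not isinstance(genes, list):
            raise ValueError(f"term {term!r}: expected a list of genes")
        unique = sorted({str(g) for g in genes})
        if len(unique) >= min_size:
            gene_sets[term] = unique
    return gene_sets


# Usage: write a small file, then load it.
path = os.path.join(tempfile.mkdtemp(), "custom_sets.json")
with open(path, "w") as fh:
    json.dump({"TERM_A": ["VEGFA", "HIF1A", "VEGFA"], "EMPTY_SET": []}, fh)
sets = load_gene_sets(path)
```

Duplicate genes inside one set would inflate its size in the hypergeometric test, which is why they are collapsed here.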
Dependencies
Required: numpy, pandas, scipy, matplotlib
Optional: gseapy (recommended; provides Enrichr and full GSEA functionality)
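The automatic fallback can be expressed as an import check. This is a minimal sketch, assuming the engine is chosen once per run and that the returned note is what would be recorded as a fallback event. The function name and note text are hypothetical.

```python
import importlib.util


def choose_engine(method):
    """Return ('gseapy', None) if gseapy is importable, else ('builtin', note)."""
    if method not in ("ora", "gsea"):
        raise ValueError(f"unknown method: {method}")
    if importlib.util.find_spec("gseapy") is not None:
        return "gseapy", None
    fallback = "hypergeometric ORA" if method == "ora" else "rank-based permutation test"
    return "builtin", f"gseapy not installed; falling back to built-in {fallback}"


engine, note = choose_engine("gsea")
```

`importlib.util.find_spec` checks availability without importing the package, so the decision is cheap and does not pull gseapy's dependencies into memory.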
Safety
- Local-first: All processing runs locally; no data is uploaded to external services.
- Disclaimer: Every report includes the standard OmicsClaw disclaimer.
- Audit trail: Parameters, method used (including fallback events), and input checksums are recorded in result.json.
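The audit-trail record can be sketched with the standard library. The JSON keys and function names below are hypothetical; only the kinds of information recorded (parameters, method including fallback events, input checksum) come from the description above.

```python
import hashlib
import json
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large DE tables need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def write_result_json(outdir, input_path, params, engine, fallback_note=None):
    """Write a hypothetical result.json with parameters, engine, fallback and checksum."""
    record = {
        "parameters": params,
        "engine": engine,
        "fallback_event": fallback_note,
        "input_sha256": sha256_of(input_path),
    }
    os.makedirs(outdir, exist_ok=True)
    out_path = os.path.join(outdir, "result.json")
    with open(out_path, "w") as fh:
        json.dump(record, fh, indent=2)
    return out_path


# Usage with a throwaway input file.
workdir = tempfile.mkdtemp()
de_path = os.path.join(workdir, "de.csv")
with open(de_path, "wb") as fh:
    fh.write(b"gene,log2FoldChange,pvalue,padj\n")
result_path = write_result_json(workdir, de_path, {"method": "ora"}, "builtin",
                                "gseapy not installed")
with open(result_path) as fh:
    saved = json.load(fh)
```

Recording the checksum lets a later reader confirm that a report was produced from exactly the same DE table, byte for byte.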
Integration with Orchestrator
Trigger conditions:
- Automatically invoked when user intent matches bulk RNA pathway enrichment keywords.
Chaining partners:
- bulkrna-de -- Upstream: differential expression to produce DE tables
- bulkrna-coexpression -- Parallel: co-expression modules for functional interpretation
Citations
- GSEApy -- Python wrapper for GSEA/Enrichr
- MSigDB -- Molecular Signatures Database
- Benjamini-Hochberg -- Benjamini & Hochberg, JRSSB 1995
- Subramanian et al. -- Gene set enrichment analysis, PNAS 2005
Related Skills
- bulkrna-de -- Differential expression analysis upstream
- bulkrna-coexpression -- Co-expression network analysis
- bulkrna-deconvolution -- Cell type deconvolution of bulk samples