LLMs-Universal-Life-Science-and-Clinical-Skills- bulkrna-de
install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Transcriptomics/bulkrna-de" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-bulkrna-de && rm -rf "$T"
manifest: Skills/Transcriptomics/bulkrna-de/SKILL.md
Bulk RNA-seq Differential Expression
Differential expression analysis for bulk RNA-seq count data. Primary engine is PyDESeq2 (a pure-Python re-implementation of DESeq2); falls back to a scipy Welch's t-test with Benjamini-Hochberg FDR correction when PyDESeq2 is not installed.
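The Benjamini-Hochberg step used by the fallback path can be illustrated with a short, dependency-free sketch. The function name `benjamini_hochberg` and the sample p-values are illustrative assumptions, not the skill's actual code, and missing (NaN) p-values are not handled here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Hypothetical helper for illustration; not taken from bulkrna_de.py.
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay
    # monotone: each is the minimum of p * m / rank over ranks >= its own.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values for four genes.
pvals = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(pvals)
print(padj)
```

Adjusted values are capped at 1.0 and never fall below the raw p-value, which is the behaviour expected of an FDR correction of this kind.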
CLI Reference
python omicsclaw.py run bulkrna-de --demo
python omicsclaw.py run bulkrna-de --input <counts.csv> --output <dir>
python bulkrna_de.py --input counts.csv --output results/ --control-prefix ctrl --treat-prefix treat
python bulkrna_de.py --demo --output /tmp/bulkrna_de_demo
python bulkrna_de.py --input counts.csv --output results/ --method ttest --padj-cutoff 0.01 --lfc-cutoff 1.5
Why This Exists
- Without it: Researchers must manually install and configure DESeq2 in R, handle count normalization, dispersion estimation, and multiple-testing correction across thousands of genes.
- With it: A single Python command runs the full DESeq2 pipeline (via PyDESeq2), produces publication-ready volcano and MA plots, and exports filtered DE tables ready for downstream enrichment.
- Why OmicsClaw: Wraps the gold-standard negative-binomial GLM approach into the OmicsClaw reporting framework with automatic fallback to simpler statistics when dependencies are unavailable.
Workflow
- Load: Read a genes-by-samples raw count matrix (CSV with a `gene` column and sample columns prefixed by condition).
- Partition: Split sample columns into control and treatment groups by prefix matching.
- Model: Fit a negative-binomial GLM per gene using PyDESeq2 (size-factor normalization, dispersion shrinkage, Wald test). Falls back to Welch's t-test with manual Benjamini-Hochberg FDR if PyDESeq2 is not available.
- Filter: Identify significant genes at the user-specified padj and log2FC cutoffs.
- Visualize: Generate volcano plot, MA plot, and DE summary bar chart.
- Report: Write markdown report, result.json, full and filtered DE tables, and a reproducibility script.
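The Load and Partition steps above can be sketched with the standard library alone. This is a minimal illustration that assumes the `gene` column and condition-prefixed sample columns described in the workflow; the function name `partition_samples`, the inline demo CSV, and the error message are hypothetical and not taken from the skill's source.

```python
import csv
import io


def partition_samples(columns, control_prefix, treat_prefix, gene_column="gene"):
    """Split sample column names into control and treatment groups by prefix."""
    samples = [c for c in columns if c != gene_column]
    control = [c for c in samples if c.startswith(control_prefix)]
    treatment = [c for c in samples if c.startswith(treat_prefix)]
    if not control or not treatment:
        # Fail early: a two-group comparison needs samples on both sides.
        raise ValueError(
            f"No columns matched prefix {control_prefix!r} or {treat_prefix!r}"
        )
    return control, treatment


# Made-up count matrix standing in for counts.csv.
demo_csv = (
    "gene,ctrl_1,ctrl_2,treat_1,treat_2\n"
    "GeneA,10,12,30,28\n"
    "GeneB,5,7,6,4\n"
)
reader = csv.DictReader(io.StringIO(demo_csv))
control_cols, treat_cols = partition_samples(reader.fieldnames, "ctrl", "treat")
print(control_cols, treat_cols)
```

Prefix matching keeps the command line simple but depends on consistent column naming; a column matching neither prefix is silently ignored in this sketch.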
PyDESeq2 Enhancements Adopted
- Multi-factor + continuous covariates: Accept formulas like `~ batch + condition` or `~ age + condition`, mirroring the design support described in the PyDESeq2 repository (v0.5+) so users can adjust for confounders without leaving Python.
- Installer parity: `pip install pydeseq2` (PyPI) or `conda install -c bioconda pydeseq2`; OmicsClaw checks for either path at runtime before falling back to the Welch t-test.
- CI-aligned dependency floor: Default environment pins to pandas ≥2.0, numpy ≥1.24, scipy ≥1.11, matching PyDESeq2's GitHub Actions matrix to reduce drift.
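One way such a runtime check could look is sketched below. It assumes the package is importable as `pydeseq2` however it was installed; the function name `choose_de_method`, the `"pydeseq2"` method label, and the warning text are hypothetical, and only the `ttest` value appears in the CLI reference.

```python
import importlib.util
import warnings


def choose_de_method(requested="pydeseq2"):
    """Return (method, fallback_used) for the requested DE engine.

    Hypothetical helper; the real selection logic in bulkrna_de.py may differ.
    """
    if requested == "ttest":
        # The user asked for the t-test explicitly, so this is not a fallback.
        return "ttest", False
    # find_spec only locates the package; it does not import it, so the
    # check stays cheap and has no side effects.
    if importlib.util.find_spec("pydeseq2") is not None:
        return "pydeseq2", False
    warnings.warn(
        "pydeseq2 is not installed; falling back to Welch's t-test. "
        "Install it with 'pip install pydeseq2'.",
        stacklevel=2,
    )
    return "ttest", True


method, fallback_used = choose_de_method()
print(method, fallback_used)
```

Returning the fallback flag alongside the method name makes it straightforward to record the event later rather than only printing it.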
Reliability Notes
- Treat `pydeseq2` import failures as actionable warnings; we emit remediation steps sourced from the upstream README (e.g., missing formulaic or anndata dependencies).
- Every packaged report lists the reliability score for the underlying DE engine so downstream orchestration can decide whether to re-run with an alternate method (edgeR/limma) if desired.
Example Queries
- "Run differential expression on my bulk RNA-seq counts"
- "Find DEGs between control and treatment using DESeq2"
- "Perform bulk RNA DE analysis with a log2FC cutoff of 2"
- "Run bulk DE with t-test fallback on this count matrix"
Output Structure
output_directory/
├── report.md
├── result.json
├── figures/
│   ├── volcano_plot.png
│   ├── ma_plot.png
│   └── de_barplot.png
├── tables/
│   ├── de_results.csv
│   └── de_significant.csv
└── reproducibility/
    └── commands.sh
Safety
- Local-first: All processing runs locally; no data is uploaded to external services.
- Disclaimer: Every report includes the standard OmicsClaw disclaimer.
- Audit trail: Parameters, method used (including fallback events), and input checksums are recorded in result.json.
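As an illustration of the audit trail, the sketch below hashes an input file and assembles a record that could be written to result.json. The hash algorithm (SHA-256), the key names, and the helper functions are assumptions; the actual result.json schema is not shown in this document.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large count matrices are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_audit_record(input_path, method, fallback_used, params):
    """Collect parameters, method, and input checksum (hypothetical schema)."""
    return {
        "input": {"path": str(input_path), "sha256": sha256_of_file(input_path)},
        "method": method,
        "fallback_used": fallback_used,
        "parameters": params,
    }


# Demo against a throwaway file so the example has no external inputs.
with tempfile.TemporaryDirectory() as tmp:
    counts_path = Path(tmp) / "counts.csv"
    counts_path.write_bytes(b"gene,ctrl_1,treat_1\nGeneA,10,30\n")
    record = build_audit_record(
        counts_path, "ttest", True, {"padj_cutoff": 0.01, "lfc_cutoff": 1.5}
    )
    (Path(tmp) / "result.json").write_text(json.dumps(record, indent=2))

print(record["input"]["sha256"])
```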
Integration with Orchestrator
Trigger conditions:
- Automatically invoked when user intent matches bulk RNA differential expression keywords.
Chaining partners:
- `bulkrna-alignment` (upstream): aligned BAM to count matrix
- `bulkrna-enrichment` (downstream): pathway/GO enrichment of significant genes
Parameters
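| Parameter | Default | Description |
|---|---|---|
| `--method` | | DE engine to use; `ttest` selects the Welch's t-test path |
| `--control-prefix` | | Column name prefix for control samples |
| `--treat-prefix` | | Column name prefix for treatment samples |
| `--padj-cutoff` | | Adjusted p-value significance threshold |
| `--lfc-cutoff` | | Absolute log2 fold-change threshold |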
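How the two thresholds combine can be shown with a small filter over result rows. The column names `padj` and `log2FoldChange` follow DESeq2 conventions and are assumed here, as is the choice of a strict `<` for padj and `>=` for the fold change. The cutoff values are taken from the CLI example, not from the skill's defaults, and the rows are made up.

```python
import math


def is_significant(row, padj_cutoff, lfc_cutoff):
    """True when a gene passes both the padj and absolute log2FC thresholds."""
    padj = row["padj"]
    lfc = row["log2FoldChange"]
    # Genes without a usable statistic are never called significant.
    if padj is None or lfc is None or math.isnan(padj) or math.isnan(lfc):
        return False
    return padj < padj_cutoff and abs(lfc) >= lfc_cutoff


# Illustrative rows standing in for de_results.csv.
results = [
    {"gene": "GeneA", "log2FoldChange": 2.3, "padj": 0.001},
    {"gene": "GeneB", "log2FoldChange": 0.4, "padj": 0.0005},
    {"gene": "GeneC", "log2FoldChange": -1.8, "padj": 0.2},
    {"gene": "GeneD", "log2FoldChange": -2.1, "padj": 0.004},
    {"gene": "GeneE", "log2FoldChange": 3.0, "padj": float("nan")},
]

significant = [
    r for r in results if is_significant(r, padj_cutoff=0.01, lfc_cutoff=1.5)
]
print([r["gene"] for r in significant])
```

Taking the absolute value of the fold change keeps both up- and down-regulated genes, so GeneD passes despite its negative log2FC.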
Version Compatibility
Reference examples tested with: PyDESeq2 0.4+, scipy 1.11+, pandas 2.0+, numpy 1.24+
Dependencies
Required: numpy, pandas, scipy, matplotlib
Optional: pydeseq2 (recommended; pure-Python DESeq2 implementation)
Citations
- DESeq2 — Love et al., Genome Biology 2014
- PyDESeq2 — Muzellec et al., Bioinformatics 2023
- Benjamini-Hochberg — Benjamini & Hochberg, JRSSB 1995
Related Skills
- `bulkrna-alignment`: Read alignment and counting upstream
- `bulkrna-enrichment`: Pathway enrichment of DE genes downstream
- `bulkrna-coexpression`: Co-expression network analysis