LLMs-Universal-Life-Science-and-Clinical-Skills- bulkrna-de

install

source · Clone the upstream repo

git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Transcriptomics/bulkrna-de" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-bulkrna-de && rm -rf "$T"

manifest: Skills/Transcriptomics/bulkrna-de/SKILL.md

Bulk RNA-seq Differential Expression

Differential expression analysis for bulk RNA-seq count data. Primary engine is PyDESeq2 (a pure-Python re-implementation of DESeq2); falls back to a scipy Welch's t-test with Benjamini-Hochberg FDR correction when PyDESeq2 is not installed.

CLI Reference

python omicsclaw.py run bulkrna-de --demo
python omicsclaw.py run bulkrna-de --input <counts.csv> --output <dir>
python bulkrna_de.py --input counts.csv --output results/ --control-prefix ctrl --treat-prefix treat
python bulkrna_de.py --demo --output /tmp/bulkrna_de_demo
python bulkrna_de.py --input counts.csv --output results/ --method ttest --padj-cutoff 0.01 --lfc-cutoff 1.5

Why This Exists

Without it: Researchers must manually install and configure DESeq2 in R, handle count normalization, dispersion estimation, and multiple-testing correction across thousands of genes.
With it: A single Python command runs the full DESeq2 pipeline (via PyDESeq2), produces publication-ready volcano and MA plots, and exports filtered DE tables ready for downstream enrichment.
Why OmicsClaw: Wraps the gold-standard negative-binomial GLM approach into the OmicsClaw reporting framework with automatic fallback to simpler statistics when dependencies are unavailable.

Workflow

Load: Read a genes-by-samples raw count matrix (CSV with a `gene` column and sample columns prefixed by condition).
Partition: Split sample columns into control and treatment groups by prefix matching.
Model: Fit a negative-binomial GLM per gene using PyDESeq2 (size-factor normalization, dispersion shrinkage, Wald test). Falls back to Welch's t-test with manual Benjamini-Hochberg FDR if PyDESeq2 is not available.
Filter: Identify significant genes at the user-specified padj and log2FC cutoffs.
Visualize: Generate volcano plot, MA plot, and DE summary bar chart.
Report: Write markdown report, result.json, full and filtered DE tables, and a reproducibility script.
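The fallback path through the steps above (partition by prefix, Welch's t-test, Benjamini-Hochberg correction, cutoff filter) can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the function names, the output column names, and the choice to test on log2(count + 1) values without library-size normalization are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from scipy import stats


def split_by_prefix(columns, control_prefix="ctrl", treat_prefix="treat"):
    """Partition sample column names into (control, treatment) lists."""
    control = [c for c in columns if c.startswith(control_prefix)]
    treat = [c for c in columns if c.startswith(treat_prefix)]
    if len(control) < 2 or len(treat) < 2:
        raise ValueError("Welch's t-test needs at least two samples per group")
    return control, treat


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i.
    scaled = p[order] * n / np.arange(1, n + 1)
    # A running minimum from the largest p-value down keeps the result monotone.
    monotone = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(monotone, 0.0, 1.0)
    return adjusted


def welch_de(counts, control_prefix="ctrl", treat_prefix="treat",
             padj_cutoff=0.05, lfc_cutoff=1.0):
    """Welch t-test fallback on a DataFrame with a 'gene' column plus samples."""
    samples = [c for c in counts.columns if c != "gene"]
    control, treat = split_by_prefix(samples, control_prefix, treat_prefix)
    # Assumption: test on log2(count + 1); no library-size normalization.
    log_ctrl = np.log2(counts[control].to_numpy(dtype=float) + 1.0)
    log_treat = np.log2(counts[treat].to_numpy(dtype=float) + 1.0)
    _, pvals = stats.ttest_ind(log_treat, log_ctrl, axis=1, equal_var=False)
    pvals = np.asarray(pvals, dtype=float)
    # Genes with zero variance in both groups yield NaN; keep them out of BH.
    finite = np.isfinite(pvals)
    padj = np.full(pvals.shape, np.nan)
    padj[finite] = benjamini_hochberg(pvals[finite])
    result = pd.DataFrame({
        "gene": counts["gene"].to_numpy(),
        # Difference of mean log2 values: a rough fold-change estimate.
        "log2FoldChange": log_treat.mean(axis=1) - log_ctrl.mean(axis=1),
        "pvalue": pvals,
        "padj": padj,
    })
    result["significant"] = (
        (result["padj"] < padj_cutoff)
        & (result["log2FoldChange"].abs() >= lfc_cutoff)
    )
    return result
```

Calling `welch_de(pd.read_csv("counts.csv"))` would return one row per gene. Unlike the PyDESeq2 path, this sketch does not model count dispersion, so it is best read as a coarse screen.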

PyDESeq2 Enhancements Adopted

Multi-factor + continuous covariates: Accept formulas like `~ batch + condition` or `~ age + condition`, mirroring the design support described in the PyDESeq2 repository (v0.5+) so users can adjust for confounders without leaving Python.
Installer parity: `pip install pydeseq2` (PyPI) or `conda install -c bioconda pydeseq2`; OmicsClaw checks for either path at runtime before falling back to the Welch t-test.
CI-aligned dependency floor: Default environment pins to pandas ≥2.0, numpy ≥1.24, scipy ≥1.11, matching PyDESeq2's GitHub Actions matrix to reduce drift.
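One way to verify the dependency floor at startup is sketched below. It is an assumption about how such a check might look, not OmicsClaw's own code: `FLOORS`, `version_tuple`, `meets_floor`, and `check_floors` are hypothetical names, and the parser compares only the leading numeric release segment, so pre-releases are treated like final releases.

```python
import re
from importlib import metadata

# Floors quoted in the text above.
FLOORS = {"pandas": "2.0", "numpy": "1.24", "scipy": "1.11"}


def version_tuple(version):
    """Parse the leading numeric release segment of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_floor(installed, floor):
    """True if `installed` is at least `floor`, padding short versions with zeros."""
    a, b = version_tuple(installed), version_tuple(floor)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_floors(floors=FLOORS):
    """Map each package name to 'ok', 'too old (<version>)', or 'missing'."""
    report = {}
    for name, floor in floors.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_floor(installed, floor) else f"too old ({installed})"
    return report
```

For stricter handling of pre-release and post-release tags, a dedicated version parser would be a safer choice than the regular expression used here.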

Reliability Notes

Treat `pydeseq2` import failures as actionable warnings; we emit remediation steps sourced from the upstream README (e.g., missing formulaic or anndata dependencies).
Every packaged report lists the reliability score for the underlying DE engine so downstream orchestration can decide whether to re-run with an alternate method (edgeR/limma) if desired.
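The import-failure handling described above might look something like the following sketch. The function name `select_engine`, its `module_name` parameter (included only so the fallback can be exercised without uninstalling anything), and the warning wording are hypothetical; only the two install commands come from this document.

```python
import importlib
import warnings

# Install commands quoted earlier in this document.
REMEDIATION = "pip install pydeseq2  (or: conda install -c bioconda pydeseq2)"


def select_engine(requested="pydeseq2", module_name="pydeseq2"):
    """Return 'pydeseq2' or 'ttest', warning whenever a fallback happens."""
    if requested not in ("pydeseq2", "ttest"):
        raise ValueError(f"unknown method: {requested!r}")
    if requested == "ttest":
        return "ttest"
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # exc.name is the module that failed to import; it may be a
        # dependency of pydeseq2 rather than pydeseq2 itself.
        missing = exc.name or module_name
        warnings.warn(
            f"Cannot import {missing!r}; falling back to Welch's t-test. "
            f"Install or repair with: {REMEDIATION}",
            RuntimeWarning,
            stacklevel=2,
        )
        return "ttest"
    return "pydeseq2"
```

Returning the engine actually used, rather than only warning, lets the caller record the fallback event alongside the other run parameters.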

Example Queries

"Run differential expression on my bulk RNA-seq counts"
"Find DEGs between control and treatment using DESeq2"
"Perform bulk RNA DE analysis with a log2FC cutoff of 2"
"Run bulk DE with t-test fallback on this count matrix"

Output Structure

output_directory/
├── report.md
├── result.json
├── figures/
│   ├── volcano_plot.png
│   ├── ma_plot.png
│   └── de_barplot.png
├── tables/
│   ├── de_results.csv
│   └── de_significant.csv
└── reproducibility/
    └── commands.sh

Safety

Local-first: All processing runs locally; no data is uploaded to external services.
Disclaimer: Every report includes the standard OmicsClaw disclaimer.
Audit trail: Parameters, method used (including fallback events), and input checksums are recorded in result.json.
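As an illustration of the audit trail, the sketch below hashes the input file and writes a `result.json` record. The JSON keys, the function names, and the use of SHA-256 are assumptions for the example; the source only states that parameters, the method used, fallback events, and input checksums are recorded.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large count matrices are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_audit_record(output_dir, input_path, params, requested, used):
    """Write result.json (hypothetical schema) and return its path."""
    record = {
        "input": {"path": str(input_path), "sha256": sha256_of(input_path)},
        "parameters": params,
        "method_requested": requested,
        "method_used": used,
        # True when the run did not use the engine that was asked for.
        "fallback": requested != used,
    }
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / "result.json"
    target.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return target
```

Storing both the requested and the used method makes a fallback visible to anyone re-reading the record later.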

Integration with Orchestrator

Trigger conditions:

Automatically invoked when user intent matches bulk RNA differential expression keywords.

Chaining partners:

`bulkrna-alignment` (upstream): aligned BAM to count matrix
`bulkrna-enrichment` (downstream): pathway/GO enrichment of significant genes

Parameters

| Parameter | Default | Description |
|---|---|---|
| `--method` | `pydeseq2` | `pydeseq2` or `ttest` |
| `--control-prefix` | `ctrl` | Column name prefix for control samples |
| `--treat-prefix` | `treat` | Column name prefix for treatment samples |
| `--padj-cutoff` | `0.05` | Adjusted p-value significance threshold |
| `--lfc-cutoff` | `1.0` | Absolute log2 fold-change threshold |
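The flags and defaults in the table above map naturally onto an `argparse` parser. The sketch below is illustrative only: `build_parser` is a hypothetical name, and the handling of `--input`, `--output`, and `--demo` (which appear in the CLI reference but not in the table) is an assumption, including the help strings and the fact that none of them is marked required.

```python
import argparse


def build_parser():
    """Build an illustrative parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(
        prog="bulkrna_de.py",
        description="Bulk RNA-seq differential expression (illustrative parser).",
    )
    # Flags seen in the CLI reference; their exact semantics are assumed.
    parser.add_argument("--input", help="Raw count matrix (CSV)")
    parser.add_argument("--output", help="Output directory")
    parser.add_argument("--demo", action="store_true", help="Run on demo data")
    # Flags and defaults taken from the parameter table.
    parser.add_argument("--method", choices=["pydeseq2", "ttest"], default="pydeseq2",
                        help="DE engine")
    parser.add_argument("--control-prefix", default="ctrl",
                        help="Column name prefix for control samples")
    parser.add_argument("--treat-prefix", default="treat",
                        help="Column name prefix for treatment samples")
    parser.add_argument("--padj-cutoff", type=float, default=0.05,
                        help="Adjusted p-value significance threshold")
    parser.add_argument("--lfc-cutoff", type=float, default=1.0,
                        help="Absolute log2 fold-change threshold")
    return parser
```

Restricting `--method` with `choices` means a mistyped engine name fails at parse time instead of partway through a run.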

Version Compatibility

Reference examples tested with: PyDESeq2 0.4+, scipy 1.11+, pandas 2.0+, numpy 1.24+

Dependencies

Required: numpy, pandas, scipy, matplotlib
Optional: pydeseq2 (recommended; pure-Python DESeq2 implementation)

Citations

DESeq2 — Love et al., Genome Biology 2014
PyDESeq2 — Muzellec et al., Bioinformatics 2023
Benjamini-Hochberg — Benjamini & Hochberg, JRSSB 1995

Related Skills

`bulkrna-alignment`: Read alignment and counting upstream
`bulkrna-enrichment`: Pathway enrichment of DE genes downstream
`bulkrna-coexpression`: Co-expression network analysis