BioClaw differential-expression
Bulk transcriptomics differential expression with count-aware modeling, design validation, contrast handling, thresholded exports, and publication-ready DE figures.
install
source · Clone the upstream repo
git clone https://github.com/Runchuan-BU/BioClaw
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/Runchuan-BU/BioClaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/container/skills/differential-expression" ~/.claude/skills/runchuan-bu-bioclaw-differential-expression && rm -rf "$T"
manifest:
container/skills/differential-expression/SKILL.md
source content
Differential Expression
Version Compatibility
Reference examples assume:
- pydeseq2 0.4+
- pandas 2.2+
- numpy 1.26+
- matplotlib 3.8+
Verify before use:
- Python:
python -c "import pydeseq2, pandas; print(pydeseq2.__version__, pandas.__version__)"
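For a check that covers all four packages and compares against the minimums listed above, something like the following could be used. It is a minimal sketch: the helper names `major_minor`, `meets_minimum`, and `check_environment` are hypothetical, and only the major and minor version components are compared.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the list above.
MINIMUMS = {"pydeseq2": "0.4", "pandas": "2.2", "numpy": "1.26", "matplotlib": "3.8"}


def major_minor(v: str) -> tuple:
    """Return (major, minor) as integers, ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in v.split(".")[:2]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, so that 0.10 counts as newer than 0.4."""
    return major_minor(installed) >= major_minor(minimum)


def check_environment(minimums=MINIMUMS) -> dict:
    """Report 'ok', 'missing', or a 'too old' message for each package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


print(check_environment())
```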
Overview
Use this skill for count-based DE from bulk RNA-seq or similar count matrices when the user needs:
- robust model fitting
- explicit contrasts
- ranked gene tables
- volcano and MA plots
- pathway-ready output tables
When To Use This Skill
- raw count matrix and sample metadata are available
- the task is condition, treatment, or genotype comparison
- batch or pairing terms may need explicit modeling
Quick Route
- no replicates: do not pretend formal DE is robust
- 2 replicates per group: possible but conservative interpretation
- 3 or more replicates per group: standard starting point
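The tiers above can be checked against the sample metadata before any model is fit. The following is a minimal sketch, assuming a metadata table with a `condition` column; the function name `replicate_route`, the tier labels, and the toy data are hypothetical.

```python
import pandas as pd


def replicate_route(metadata: pd.DataFrame, group_col: str = "condition") -> pd.DataFrame:
    """Count samples per group and label each group with a Quick Route tier."""
    counts = metadata[group_col].value_counts().sort_index()

    def tier(n: int) -> str:
        if n < 2:
            return "no replicates: formal DE not robust"
        if n == 2:
            return "minimal: interpret conservatively"
        return "standard"

    return pd.DataFrame({"n_samples": counts, "route": counts.map(tier)})


# Toy metadata, for illustration only.
metadata_df = pd.DataFrame(
    {"condition": ["control", "control", "control", "treated", "treated"]},
    index=["s1", "s2", "s3", "s4", "s5"],
)
route = replicate_route(metadata_df)
print(route)
```

The weakest group decides how the whole comparison should be interpreted, so it is worth reading the table by its smallest `n_samples` value.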
Progressive Disclosure
- Read technical_reference.md for design formulas, confounding checks, and contrast logic.
- Read commands_and_thresholds.md for PyDESeq2 code, recommended filters, and output file conventions.
Prerequisites
| Requirement | Recommendation |
|---|---|
| minimum replicates per group | |
| preferred replicates per group | |
| input values | raw integer counts |
Expected Inputs
- raw count matrix
- sample metadata
- explicit contrast such as treated vs control
Expected Outputs
- results/de_results.tsv
- results/de_ranked_genes.tsv
- figures/volcano.pdf
- figures/ma_plot.pdf
- qc/sample_pca.pdf
Starter Pattern
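from pydeseq2.dds import DeseqDataSet
from pydeseq2.ds import DeseqStats

# counts_df: samples x genes matrix of raw integer counts
# metadata_df: one row per sample, indexed like counts_df
dds = DeseqDataSet(
    counts=counts_df,
    metadata=metadata_df,
    design_factors=["condition", "batch"],
)
dds.deseq2()

# Test treated vs control on the condition factor
stats = DeseqStats(dds, contrast=("condition", "treated", "control"))
stats.summary()

# Sort by adjusted p-value and export the full table
res = stats.results_df.sort_values("padj")
res.to_csv("results/de_results.tsv", sep="\t")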
Workflow
1. Validate the design
Check:
- replicate counts
- factor levels
- batch balance
- paired structure
- confounded variables
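Batch balance and confounding can be inspected with a simple cross-tabulation. This is a sketch under stated assumptions: the metadata has `condition` and `batch` columns, the helper name `check_design` is hypothetical, and "fully confounded" is taken to mean that every batch contains samples from only one condition level.

```python
import pandas as pd


def check_design(metadata: pd.DataFrame, condition: str = "condition", batch: str = "batch"):
    """Return the condition-by-batch sample table and a full-confounding flag."""
    table = pd.crosstab(metadata[condition], metadata[batch])
    # Number of distinct condition levels observed within each batch.
    conditions_per_batch = (table > 0).sum(axis=0)
    # If every batch holds a single condition, batch and condition effects
    # cannot be separated by the model.
    fully_confounded = bool((conditions_per_batch == 1).all())
    return table, fully_confounded


# Toy designs, for illustration only.
balanced = pd.DataFrame(
    {"condition": ["control", "control", "treated", "treated"],
     "batch": ["b1", "b2", "b1", "b2"]}
)
confounded = pd.DataFrame(
    {"condition": ["control", "control", "treated", "treated"],
     "batch": ["b1", "b1", "b2", "b2"]}
)

balanced_table, balanced_flag = check_design(balanced)
confounded_table, confounded_flag = check_design(confounded)
print(balanced_table)
print("fully confounded:", balanced_flag, confounded_flag)
# The table could be saved for review, e.g.
# balanced_table.to_csv("qc/design_check.tsv", sep="\t")
```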
2. Fit a count-aware model
Use raw counts, not TPM or log-normalized expression, for count-based DE frameworks.
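A quick sanity check can catch a normalized matrix that was passed in by mistake. The sketch below assumes the counts are held in a numeric pandas DataFrame; the function name `looks_like_raw_counts` is hypothetical, and passing the check does not prove the values are raw counts, only that they are non-negative whole numbers.

```python
import numpy as np
import pandas as pd


def looks_like_raw_counts(counts: pd.DataFrame) -> bool:
    """True if every value is a finite, non-negative whole number."""
    values = counts.to_numpy(dtype=float)
    if not np.isfinite(values).all():
        return False
    is_non_negative = (values >= 0).all()
    is_whole = (np.mod(values, 1) == 0).all()
    return bool(is_non_negative and is_whole)


# Toy matrices, for illustration only (samples as rows, genes as columns).
raw = pd.DataFrame({"geneA": [10, 5], "geneB": [0, 200]}, index=["s1", "s2"])
tpm = pd.DataFrame({"geneA": [10.5, 3.2], "geneB": [0.0, 7.7]}, index=["s1", "s2"])

print(looks_like_raw_counts(raw), looks_like_raw_counts(tpm))
```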
3. Apply explicit filtering and ranking
Common reporting thresholds:
- padj < 0.05
- abs(log2FoldChange) >= 1
Export both the full table and a thresholded table.
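The thresholding step can be sketched as follows, assuming a results table with `padj` and `log2FoldChange` columns such as the one produced in the Starter Pattern. The function name `split_significant` and the toy values are hypothetical; genes with a missing `padj` are left out of the thresholded table but stay in the full one.

```python
import pandas as pd


def split_significant(res: pd.DataFrame, padj_cutoff: float = 0.05, lfc_cutoff: float = 1.0) -> pd.DataFrame:
    """Return the rows passing both reporting thresholds, sorted by padj."""
    # Comparisons against NaN evaluate to False, so untested genes drop out here.
    mask = (res["padj"] < padj_cutoff) & (res["log2FoldChange"].abs() >= lfc_cutoff)
    return res[mask].sort_values("padj")


# Toy results, for illustration only.
res = pd.DataFrame(
    {
        "log2FoldChange": [2.0, 3.0, 0.5, -4.0, -1.0],
        "padj": [0.001, 0.20, 0.01, float("nan"), 0.04],
    },
    index=["geneA", "geneB", "geneC", "geneD", "geneE"],
)
sig = split_significant(res)
print(sig)

# Export both tables (directory assumed to exist):
# res.to_csv("results/de_results.tsv", sep="\t")
# sig.to_csv("results/de_significant.tsv", sep="\t")
```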
4. Visualize results
At minimum:
- sample PCA
- volcano plot
- MA plot
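The volcano and MA plots can be drawn directly from the results table with matplotlib. This is a minimal sketch, assuming `baseMean`, `log2FoldChange`, and `padj` columns; the function names, colors, and randomly generated toy data are illustrative choices, not BioClaw conventions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, suitable for scripts
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_volcano(res, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Scatter log2 fold change against -log10(padj), highlighting hits."""
    res = res.dropna(subset=["padj", "log2FoldChange"])
    sig = (res["padj"] < padj_cutoff) & (res["log2FoldChange"].abs() >= lfc_cutoff)
    colors = ["crimson" if s else "grey" for s in sig]
    fig, ax = plt.subplots(figsize=(5, 4))
    # Clip padj away from zero so that -log10 stays finite.
    ax.scatter(res["log2FoldChange"], -np.log10(res["padj"].clip(lower=1e-300)), c=colors, s=8)
    ax.axhline(-np.log10(padj_cutoff), ls="--", lw=0.8, color="black")
    ax.axvline(lfc_cutoff, ls="--", lw=0.8, color="black")
    ax.axvline(-lfc_cutoff, ls="--", lw=0.8, color="black")
    ax.set_xlabel("log2 fold change")
    ax.set_ylabel("-log10(padj)")
    return fig


def plot_ma(res, padj_cutoff=0.05):
    """Scatter baseMean (log scale) against log2 fold change."""
    res = res.dropna(subset=["padj", "baseMean", "log2FoldChange"])
    res = res[res["baseMean"] > 0]  # a log axis cannot show zero means
    colors = ["crimson" if p < padj_cutoff else "grey" for p in res["padj"]]
    fig, ax = plt.subplots(figsize=(5, 4))
    ax.scatter(res["baseMean"], res["log2FoldChange"], c=colors, s=8)
    ax.set_xscale("log")
    ax.axhline(0, lw=0.8, color="black")
    ax.set_xlabel("baseMean")
    ax.set_ylabel("log2 fold change")
    return fig


# Random toy results, for illustration only.
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    {
        "baseMean": rng.lognormal(5, 1.5, 200),
        "log2FoldChange": rng.normal(0, 1.5, 200),
        "padj": rng.uniform(0, 1, 200),
    }
)
volcano_fig = plot_volcano(toy)
ma_fig = plot_ma(toy)
# volcano_fig.savefig("figures/volcano.pdf"); ma_fig.savefig("figures/ma_plot.pdf")
```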
5. Export pathway-ready artifacts
Produce a ranked gene list sorted by signed effect or Wald statistic for enrichment workflows.
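A ranked list for enrichment tools can be derived from the results table in a few lines. The sketch below assumes the table carries a `stat` column holding the Wald statistic, as PyDESeq2 results tables are expected to; the function name `ranked_gene_list` and the toy values are hypothetical. Pass `score_col="log2FoldChange"` to rank by signed effect instead.

```python
import pandas as pd


def ranked_gene_list(res: pd.DataFrame, score_col: str = "stat") -> pd.DataFrame:
    """Return genes with a valid score, ordered from most up to most down."""
    return res[[score_col]].dropna().sort_values(score_col, ascending=False)


# Toy results, for illustration only.
res = pd.DataFrame(
    {"stat": [5.2, -3.1, float("nan"), 0.4]},
    index=["geneA", "geneB", "geneC", "geneD"],
)
ranked = ranked_gene_list(res)
print(ranked)

# Write the ranking (directory assumed to exist):
# ranked.to_csv("results/de_ranked_genes.tsv", sep="\t")
```

Unlike the thresholded table, the ranking keeps every gene that has a score, because rank-based enrichment methods use the whole distribution.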
Output Artifacts
results/
├── de_results.tsv
├── de_significant.tsv
└── de_ranked_genes.tsv
figures/
├── sample_pca.pdf
├── volcano.pdf
└── ma_plot.pdf
qc/
└── design_check.tsv
Quality Review
- raw counts only for model fitting
- no fully confounded batch and condition
- outlier samples reviewed before publication claims
- all final tables should include baseMean, log2FoldChange, pvalue, and padj
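The column requirement for final tables (baseMean, log2FoldChange, pvalue, padj) is easy to enforce before export. A minimal sketch, with a hypothetical helper name `missing_columns` and a toy table:

```python
import pandas as pd

# Columns every final table is expected to carry.
REQUIRED_COLUMNS = ("baseMean", "log2FoldChange", "pvalue", "padj")


def missing_columns(table: pd.DataFrame) -> list:
    """List the required columns that are absent from the table."""
    return [col for col in REQUIRED_COLUMNS if col not in table.columns]


# Toy table lacking the adjusted p-value, for illustration only.
incomplete = pd.DataFrame({"baseMean": [120.0], "log2FoldChange": [1.4], "pvalue": [0.002]})
print(missing_columns(incomplete))
```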
Anti-Patterns
- running DE on TPM as if it were count-based
- omitting batch or pairing terms that clearly exist
- showing only thresholded genes and hiding the full table
- using p-value alone without effect size
Related Skills
- Bulk RNA Expression
- RNA Quantification
- Pathway Analysis
Optional Supplements
pydeseq2