OpenClaw-Medical-Skills bulk-rna-seq-deseq2-analysis-with-omicverse
Walk Claude through PyDESeq2-based differential expression, including ID mapping, DE testing, fold-change thresholding, and enrichment visualisation.
install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bulk-deseq2-analysis" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bulk-rna-seq-deseq2-analysis-with-om && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bulk-deseq2-analysis" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bulk-rna-seq-deseq2-analysis-with-om && rm -rf "$T"
manifest:
skills/bulk-deseq2-analysis/SKILL.md
Bulk RNA-seq DESeq2 analysis with omicverse
Overview
Use this skill when a user wants to reproduce the DESeq2 workflow showcased in `t_deseq2.ipynb`. It covers loading raw featureCounts matrices, mapping Ensembl IDs to symbols, running PyDESeq2 via `ov.bulk.pyDEG`, and exploring downstream enrichment plots.
Instructions
- Import and format the expression matrix
  - Run `import omicverse as ov` and call `ov.utils.ov_plot_set()` to standardise visuals.
  - Read tab-separated count data from featureCounts using `ov.utils.read(..., index_col=0, header=1)`. `header=1` skips the comment line that featureCounts writes above the column header.
  - Strip the directory prefix and trailing `.bam` from column names with `data.columns = [c.split('/')[-1].replace('.bam', '') for c in data.columns]`.
- Map gene identifiers
  - Ensure the appropriate mapping pair exists by running `ov.utils.download_geneid_annotation_pair()`.
  - Replace `gene_id` values with gene symbols using `data = ov.bulk.Matrix_ID_mapping(data, 'genesets/pair_<GENOME>.tsv')`, substituting the genome build that matches the annotation used for counting.
- Initialise the DEG object
  - Create `dds = ov.bulk.pyDEG(data)` from the mapped counts.
  - Several Ensembl IDs can map to one symbol, so resolve duplicate gene names with `dds.drop_duplicates_index()` and confirm success in the logs.
- Define contrasts and run DESeq2
  - Collect sample labels into `treatment_groups` and `control_groups` lists that match the column names exactly.
  - Execute `dds.deg_analysis(treatment_groups, control_groups, method='DEseq2')` to invoke PyDESeq2.
- Filter and tune thresholds
  - Inspect the result shape (`dds.result.shape`) and optionally drop low-expression genes, e.g. `dds.result = dds.result.loc[dds.result['log2(BaseMean)'] > 1]`.
  - Set thresholds via `dds.foldchange_set(fc_threshold=-1, pval_threshold=0.05, logp_max=6)`. Passing `fc_threshold=-1` lets omicverse pick the fold-change cutoff automatically.
- Visualise differential genes
  - Draw volcano plots with `dds.plot_volcano(...)` and summarise the key genes.
  - Produce per-gene boxplots with `dds.plot_boxplot(genes=[...], treatment_groups=..., control_groups=..., figsize=(2, 3))`.
- Run enrichment analyses (optional)
  - Download enrichment libraries using `ov.utils.download_pathway_database()` and load one into `pathway_dict` with `ov.utils.geneset_prepare`.
  - Rank genes for GSEA with `rnk = dds.ranking2gsea()`.
  - Instantiate `gsea_obj = ov.bulk.pyGSEA(rnk, pathway_dict)` and call `gsea_obj.enrichment()` to compute enriched terms.
  - Plot enrichment bubble charts via `gsea_obj.plot_enrichment(...)` and GSEA curves with `gsea_obj.plot_gsea(term_num=..., ...)`.
- Troubleshooting
  - If PyDESeq2 raises errors about size factors, remind users to provide raw integer counts (not normalised or log-transformed values).
  - `gene_id` mapping depends on species. If few genes map to symbols, direct users to download the genome pair that matches their organism.
  - Large pathway libraries may require raising recursion limits or filtering to the top N terms before plotting.
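For the "Import and format the expression matrix" step, the following stdlib-only sketch shows what reading and renaming does to a featureCounts table. It assumes the standard featureCounts layout: a leading `#` comment line, then `Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length`, and one column per BAM path. The helper `read_featurecounts` is hypothetical and is not how `ov.utils.read` works. The sketch also drops the annotation columns, because they are not counts. Check whether your matrix still contains them before building `pyDEG`.

```python
import csv

# Annotation columns in standard featureCounts output (assumed layout); they are not counts.
ANNOTATION_COLUMNS = {"Chr", "Start", "End", "Strand", "Length"}


def read_featurecounts(text):
    """Return (sample_names, {gene_id: [counts]}) from featureCounts text output."""
    lines = text.splitlines()
    # Line 0 is the '# Program:featureCounts ...' comment and the header is line 1,
    # which is why the omicverse call passes header=1.
    reader = csv.reader(lines[1:], delimiter="\t")
    header = next(reader)
    keep = [i for i, name in enumerate(header) if i > 0 and name not in ANNOTATION_COLUMNS]
    # Same clean-up as the list comprehension: drop the directory and the '.bam' suffix.
    samples = [header[i].split("/")[-1].replace(".bam", "") for i in keep]
    counts = {}
    for row in reader:
        if row:  # skip blank lines
            counts[row[0]] = [int(row[i]) for i in keep]
    return samples, counts
```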
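For the "Map gene identifiers" step, this sketch maps IDs to symbols and reports the fraction that mapped, which is a quick way to spot the sparse-result problem described under Troubleshooting. Several details are assumptions for illustration, not documented behaviour of `ov.bulk.Matrix_ID_mapping`: the two-column, tab-separated pair layout, the dropping of unmapped IDs, and the stripping of a trailing `.N` Ensembl version suffix. The helper names are also hypothetical.

```python
import csv


def load_id_pairs(lines):
    """Read an assumed two-column, tab-separated gene_id -> symbol table."""
    pairs = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) >= 2:
            pairs[row[0]] = row[1]
    return pairs


def map_ids(counts, pairs):
    """Re-key {gene_id: counts} by symbol; return (mapped_rows, fraction_mapped).

    Unmapped IDs are dropped. Duplicate symbols are kept so that a later step
    can decide which row to retain.
    """
    mapped = []
    for gene_id, values in counts.items():
        # Assumption: the pair table uses unversioned IDs, so strip a '.N' suffix.
        symbol = pairs.get(gene_id.split(".")[0])
        if symbol is not None:
            mapped.append((symbol, values))
    fraction = len(mapped) / len(counts) if counts else 0.0
    return mapped, fraction
```

A low `fraction` is the cue to check that the species and genome build of the pair file match the data.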
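For the "Initialise the DEG object" step, resolving duplicate gene names amounts to keeping one row per symbol. The input here is a list of `(symbol, counts)` pairs. The rule used (keep the row with the largest total count, with the first row winning on ties) is a hypothetical choice, and `dds.drop_duplicates_index()` may apply a different criterion.

```python
def collapse_duplicates(mapped):
    """Keep one row per symbol from [(symbol, counts), ...].

    Hypothetical rule: keep the row with the largest total count; on a tie,
    the first row seen wins.
    """
    best = {}
    for symbol, values in mapped:
        if symbol not in best or sum(values) > sum(best[symbol]):
            best[symbol] = values
    return best
```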
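For the "Define contrasts and run DESeq2" step, the group lists must match the column names exactly. A small guard run before `deg_analysis` catches typos early. `check_groups` is a hypothetical helper and is not part of omicverse.

```python
def check_groups(columns, treatment_groups, control_groups):
    """Raise ValueError unless every label is a column and no sample is in both groups."""
    present = set(columns)
    missing = [s for s in treatment_groups + control_groups if s not in present]
    if missing:
        raise ValueError(f"labels not found among matrix columns: {missing}")
    shared = sorted(set(treatment_groups) & set(control_groups))
    if shared:
        raise ValueError(f"samples listed in both groups: {shared}")
```

For example, `check_groups(list(data.columns), treatment_groups, control_groups)` would run just before `dds.deg_analysis(...)`.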
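For the "Filter and tune thresholds" step, this sketch shows the filter-then-label logic on plain dictionaries. Only `log2(BaseMean)` comes from the steps above. The keys `gene`, `log2FC` and `pvalue` are hypothetical stand-ins for the result table's columns. `auto_fc_cutoff` (mean plus two standard deviations of |log2FC|) is likewise an illustrative rule, not the documented behaviour of `foldchange_set(fc_threshold=-1)`.

```python
import statistics


def auto_fc_cutoff(log2fcs):
    """Hypothetical automatic cutoff: mean + 2 * stdev of |log2FC| (needs >= 2 values)."""
    magnitudes = [abs(x) for x in log2fcs]
    return statistics.mean(magnitudes) + 2 * statistics.stdev(magnitudes)


def classify_degs(rows, fc_cutoff, p_cutoff=0.05, min_log2_basemean=1.0):
    """Label genes 'up', 'down' or 'normal'; rows are dicts with hypothetical keys."""
    labels = {}
    for row in rows:
        # Mirrors result.loc[result['log2(BaseMean)'] > 1]: low-expression genes are dropped.
        if row["log2(BaseMean)"] <= min_log2_basemean:
            continue
        significant = row["pvalue"] < p_cutoff
        if significant and row["log2FC"] >= fc_cutoff:
            labels[row["gene"]] = "up"
        elif significant and row["log2FC"] <= -fc_cutoff:
            labels[row["gene"]] = "down"
        else:
            labels[row["gene"]] = "normal"
    return labels
```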
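For the "Visualise differential genes" step, a volcano plot places each gene at (log2 fold change, -log10 p-value). This sketch computes those coordinates and picks the most significant genes to annotate. The row keys and the `top_n` and `min_p` parameters are hypothetical, and the drawing itself is left to `dds.plot_volcano(...)`.

```python
import math


def volcano_points(rows, top_n=3, min_p=1e-300):
    """Return ([(gene, log2FC, -log10 p), ...], genes_to_label).

    p-values are floored at min_p so that p == 0 does not break log10.
    """
    points = [
        (r["gene"], r["log2FC"], -math.log10(max(r["pvalue"], min_p)))
        for r in rows
    ]
    # Label the most significant genes (largest -log10 p).
    ranked = sorted(points, key=lambda point: point[2], reverse=True)
    return points, [gene for gene, _, _ in ranked[:top_n]]
```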
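For the GSEA ranking step (`rnk = dds.ranking2gsea()`), a pre-ranked list orders genes from most up-regulated to most down-regulated. This sketch scores each gene as sign(log2FC) × -log10(p), a common pre-rank metric. It is an assumption that `ranking2gsea` uses this particular metric, and the row keys are hypothetical.

```python
import math


def rank_for_gsea(rows, min_p=1e-300):
    """Return [(gene, score), ...] sorted by sign(log2FC) * -log10(p), highest first."""
    scored = []
    for r in rows:
        # copysign gives -log10(p) the sign of the fold change.
        score = math.copysign(-math.log10(max(r["pvalue"], min_p)), r["log2FC"])
        scored.append((r["gene"], score))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```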
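The GSEA curves drawn by `gsea_obj.plot_gsea(...)` show a running enrichment score. This sketch implements the weighted running sum from the original GSEA method. Each gene-set hit steps the sum up by |score|^p, normalised over the set. Each miss steps it down by 1/(N - N_H), where N is the number of ranked genes and N_H the number of hits. The enrichment score is the point furthest from zero. The sketch uses weight p = 1 by default. It illustrates the statistic only: it is not `ov.bulk.pyGSEA`'s code and it omits permutation-based significance testing.

```python
def running_enrichment_score(ranked, gene_set, weight=1.0):
    """GSEA weighted running sum.

    ranked: [(gene, score), ...] sorted from most positive to most negative score.
    Returns (es, running): es is the running-sum value furthest from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    norm = sum(abs(score) ** weight for (_, score), hit in zip(ranked, hits) if hit)
    if n_miss == 0 or norm == 0:
        raise ValueError("gene set must hit some, but not all, genes with nonzero scores")
    running, total = [], 0.0
    for (_, score), hit in zip(ranked, hits):
        # Hits step up in proportion to |score|**weight; misses step down evenly.
        total += abs(score) ** weight / norm if hit else -1.0 / n_miss
        running.append(total)
    return max(running, key=abs), running
```

By construction, the hits sum to +1 and the misses to -1, so the curve returns to zero at the end of the ranking.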
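For the size-factor error under Troubleshooting: DESeq2 normalises with median-of-ratios size factors, which work on logarithms of the counts. Negative or non-integer values, typical of log-transformed or normalised data, therefore signal that the wrong matrix was passed. This sketch first checks for raw counts. It then computes size factors the way DESeq2 describes them: for each sample, take the median over genes (in log space) of each count's ratio to that gene's geometric mean, skipping genes that are not positive in every sample. The helper names are hypothetical.

```python
import math
import statistics


def looks_like_raw_counts(matrix):
    """True if every value is a non-negative whole number (3.0 is accepted)."""
    return all(v >= 0 and float(v).is_integer() for row in matrix for v in row)


def size_factors(matrix):
    """Median-of-ratios size factors; matrix rows are genes, columns are samples."""
    n_samples = len(matrix[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in matrix:
        if any(v <= 0 for v in row):
            continue  # geometric mean is zero or undefined, so the gene is skipped
        log_geo_mean = sum(math.log(v) for v in row) / n_samples
        for j, v in enumerate(row):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene is positive in every sample; size factors are undefined")
    # Take the median in log space, then back-transform.
    return [math.exp(statistics.median(r)) for r in log_ratios]
```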
Examples
- "Run PyDESeq2 on treated vs control replicates and highlight the top enriched WikiPathways terms."
- "Filter DEGs to genes with log2(BaseMean) > 1, auto-select fold-change cutoffs, and create volcano and boxplots."
- "Generate the ranked gene list for GSEA and plot the enrichment curve for the top pathway."
References
- Tutorial notebook: `t_deseq2.ipynb`
- Sample featureCounts matrix: `sample/counts.txt`
- Quick copy/paste commands: `reference.md`