OpenClaw-Medical-Skills bulk-rna-seq-differential-expression-with-omicverse
Guide Claude through omicverse's bulk RNA-seq DEG pipeline, from gene ID mapping and DESeq2 normalization to statistical testing, visualization, and pathway enrichment. Use when a user has bulk count matrices and needs differential expression analysis in omicverse.
install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bulk-deg-analysis" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bulk-rna-seq-differential-expression && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bulk-deg-analysis" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bulk-rna-seq-differential-expression && rm -rf "$T"
manifest: skills/bulk-deg-analysis/SKILL.md
Bulk RNA-seq differential expression with omicverse
Overview
Follow this skill to run the end-to-end differential expression (DEG) workflow showcased in `t_deg.ipynb`. It assumes the user provides a raw gene-level count matrix (e.g., from featureCounts) and wants to analyse bulk RNA-seq cohorts inside omicverse.
Instructions
- Set up the session
  - Import `omicverse as ov`, `scanpy as sc`, and `matplotlib.pyplot as plt`.
  - Call `ov.plot_set()` so downstream plots adopt omicverse styling.
- Prepare ID mapping assets
  - When gene IDs must be converted to gene symbols, instruct the user to download mapping pairs via `ov.utils.download_geneid_annotation_pair()` and store them under `genesets/`.
  - Mention the available prebuilt genomes (T2T-CHM13, GRCh38, GRCh37, GRCm39, danRer7, danRer11) and that users can generate their own mapping from GTF files if needed.
- Load the raw counts
  - Read tab-delimited featureCounts output with `ov.pd.read_csv(..., sep='\t', header=1, index_col=0)`.
  - Strip trailing `.bam` segments from column names using a list comprehension so sample IDs are clean.
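The column clean-up in this step can be sketched with the standard library alone. The column labels below are made up, and `clean_sample_name` is a hypothetical helper rather than part of omicverse. In a real session, the same comprehension would be assigned to the DataFrame's `columns` after `ov.pd.read_csv`.

```python
def clean_sample_name(column: str) -> str:
    """Drop any directory prefix and a trailing '.bam' from a column label."""
    name = column.split("/")[-1]        # keep only the file name
    if name.endswith(".bam"):
        name = name[: -len(".bam")]     # strip the trailing extension only
    return name


# Hypothetical sample columns, as an aligner/featureCounts run might label them.
raw_columns = ["align/ctrl_1.bam", "align/ctrl_2.bam", "treat_1.bam", "treat_2"]
sample_ids = [clean_sample_name(c) for c in raw_columns]
print(sample_ids)
```

Stripping only a trailing `.bam` (rather than replacing every occurrence) avoids mangling sample names that happen to contain the substring elsewhere.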
- Map gene identifiers
  - Run `ov.bulk.Matrix_ID_mapping(counts_df, 'genesets/pair_<GENOME>.tsv')` to replace `gene_id` entries with gene symbols.
- Initialise the DEG object
  - Create `dds = ov.bulk.pyDEG(mapped_counts)`.
  - Handle duplicate gene symbols with `dds.drop_duplicates_index()` to keep the highest expressed version.
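To illustrate what "keep the highest expressed version" can mean, here is a minimal stdlib sketch. It assumes "highest expressed" is judged by total counts across samples. How `drop_duplicates_index()` ranks duplicates internally is not stated here, so treat the criterion, the helper name `drop_duplicate_symbols`, and the toy rows as assumptions.

```python
def drop_duplicate_symbols(rows):
    """Keep one row per gene symbol: the one with the largest total count.

    rows: iterable of (symbol, [count per sample]) pairs.
    Returns a dict symbol -> counts, ordered by first appearance of each symbol.
    """
    best = {}
    for symbol, counts in rows:
        if symbol not in best or sum(counts) > sum(best[symbol]):
            best[symbol] = counts
    return best


# Toy matrix in which two gene IDs mapped to the same symbol.
rows = [
    ("GeneA", [1, 2, 0]),
    ("GeneB", [5, 5, 5]),
    ("GeneA", [10, 20, 30]),   # same symbol, higher expression
]
deduplicated = drop_duplicate_symbols(rows)
print(deduplicated)
```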
- Normalise and estimate size factors
  - Execute `dds.normalize()` to calculate DESeq2 size factors, which correct for differences in library size (sequencing depth) between samples; size factors alone do not remove batch effects.
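For readers unfamiliar with DESeq2 size factors, the median-of-ratios idea behind them can be sketched in plain Python. This is an illustration of the published technique on toy numbers, not omicverse's implementation; `size_factors` and the gene names are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict gene -> list of raw counts, one entry per sample.
    Returns one size factor per sample.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if any(c <= 0 for c in gene_counts):
            continue  # a zero count gives no finite geometric mean; skip the gene
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The per-sample median ratio to the geometric-mean reference is the size factor.
    return [math.exp(median(r)) for r in log_ratios]


# Toy data: sample 2 was sequenced twice as deeply as sample 1.
toy_counts = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10], "g4": [0, 7]}
factors = size_factors(toy_counts)
normalized = {g: [c / f for c, f in zip(row, factors)] for g, row in toy_counts.items()}
print(factors)
```

Dividing each sample's counts by its size factor puts the two toy samples on the same scale, which is the depth correction the step describes.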
- Run differential testing
  - Collect treatment and control replicate labels into lists.
  - Call `dds.deg_analysis(treatment_groups, control_groups, method='ttest')` for the default Welch t-test.
  - Offer optional alternatives: `method='edgepy'` for edgeR-like tests and `method='limma'` for limma-style modelling.
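As background for the t-test option, the Welch statistic for a single gene can be computed by hand. The sketch below uses only the standard library and toy values. It returns the t statistic and the Welch-Satterthwaite degrees of freedom. Turning these into a p-value needs a Student t distribution, which the standard library does not provide, so that step is left out. `welch_t` is a hypothetical helper and omicverse's internals may differ.

```python
import math
from statistics import mean, variance


def welch_t(treatment, control):
    """Welch's t statistic and degrees of freedom for two groups of values."""
    n1, n2 = len(treatment), len(control)
    v1 = variance(treatment) / n1    # squared standard error, treatment group
    v2 = variance(control) / n2      # squared standard error, control group
    t_stat = (mean(treatment) - mean(control)) / math.sqrt(v1 + v2)
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, dof


# Toy normalized expression for one gene across three replicates per group.
t_stat, dof = welch_t([10, 12, 14], [4, 5, 6])
print(round(t_stat, 3), round(dof, 3))
```

Unlike Student's t-test, the Welch form does not assume equal variance in the two groups, which is why the degrees of freedom are estimated rather than fixed at `n1 + n2 - 2`.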
- Filter and threshold results
  - Note that lowly expressed genes are retained by default; filter using `dds.result.loc[dds.result['log2(BaseMean)'] > 1]` when needed.
  - Set dynamic fold-change and significance cutoffs via `dds.foldchange_set(fc_threshold=-1, pval_threshold=0.05, logp_max=6)` (`fc_threshold=-1` auto-selects based on log2FC distribution).
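The effect of the two filters in this step can be illustrated on a toy table. The sketch assumes fixed cutoffs (it does not reproduce the automatic choice made by `fc_threshold=-1`) and assumes the significance labels are `'up'`, `'down'`, and `'normal'`; only `'normal'` is confirmed by this document. `classify_gene` and the rows are hypothetical.

```python
def classify_gene(log2fc, pvalue, fc_threshold=1.0, pval_threshold=0.05):
    """Label one gene as 'up', 'down', or 'normal' from fixed cutoffs."""
    if pvalue < pval_threshold:
        if log2fc > fc_threshold:
            return "up"
        if log2fc < -fc_threshold:
            return "down"
    return "normal"


# Hypothetical result rows: gene -> (log2FC, p-value, log2(BaseMean)).
results = {
    "GeneA": (2.3, 0.001, 5.0),
    "GeneB": (-1.8, 0.020, 3.2),
    "GeneC": (0.4, 0.010, 6.1),   # significant but small change
    "GeneD": (3.0, 0.001, 0.5),   # strong change but lowly expressed
}

# Step 1: drop lowly expressed genes (log2(BaseMean) <= 1).
expressed = {g: r for g, r in results.items() if r[2] > 1}
# Step 2: apply the fold-change and p-value cutoffs.
labels = {g: classify_gene(fc, p) for g, (fc, p, _) in expressed.items()}
deg_genes = [g for g, label in labels.items() if label != "normal"]
print(labels, deg_genes)
```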
- Visualise differential expression
  - Produce volcano plots with `dds.plot_volcano(title=..., figsize=..., plot_genes=... or plot_genes_num=...)` to highlight key genes.
  - Generate per-gene boxplots using `dds.plot_boxplot(genes=[...], treatment_groups=..., control_groups=..., figsize=..., legend_bbox=...)`; adjust y-axis tick labels if required.
- Perform pathway enrichment (optional)
  - Download curated pathway libraries through `ov.utils.download_pathway_database()`.
  - Load genesets with `ov.utils.geneset_prepare(<path>, organism='Mouse'|'Human'|...)`.
  - Build the DEG gene list from `dds.result.loc[dds.result['sig'] != 'normal'].index`.
  - Run enrichment with `ov.bulk.geneset_enrichment(gene_list=deg_genes, pathways_dict=..., pvalue_type='auto', organism=...)`. Encourage users without internet access to provide a `background` gene list.
  - Visualise single-library results via `ov.bulk.geneset_plot(...)` and combine multiple ontologies using `ov.bulk.geneset_plot_multi(enr_dict, colors_dict, num=...)`.
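To show why the background gene list matters, here is a small sketch of the hypergeometric test that over-representation analysis is commonly built on. It is an assumption that the enrichment call above behaves this way; the sketch only illustrates the statistic, and `hypergeom_pvalue` is a hypothetical helper.

```python
from math import comb


def hypergeom_pvalue(overlap, deg_size, pathway_size, background_size):
    """P(X >= overlap) when deg_size genes are drawn without replacement from a
    background of background_size genes, pathway_size of which are in the pathway."""
    upper = min(deg_size, pathway_size)
    total = comb(background_size, deg_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, deg_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 4 DEGs, 3 of them in a 5-gene pathway.
p_small_background = hypergeom_pvalue(3, 4, 5, 20)
p_large_background = hypergeom_pvalue(3, 4, 5, 2000)
print(p_small_background, p_large_background)
```

The same overlap looks far more surprising against the larger background, so the choice of background directly changes which pathways pass a significance cutoff.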
- Document outputs
  - Suggest exporting `dds.result` and enrichment tables to CSV for downstream reporting.
  - Encourage users to save figures generated by matplotlib (`plt.savefig(...)`) when running outside notebooks.
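The core steps up to this point can be strung together roughly as follows. This is a sketch that uses only the calls named in this skill, with the arguments shown here; it has not been checked against a specific omicverse release, and `run_bulk_deg`, its parameters, and the default output file name are hypothetical. omicverse is imported inside the function so that defining it does not require the package.

```python
def run_bulk_deg(counts_path, pair_path, treatment_groups, control_groups,
                 out_csv="deg_results.csv"):
    """Run the DEG steps described above and return the pyDEG object."""
    import omicverse as ov  # lazy import: only needed when the function is called

    ov.plot_set()

    # Load featureCounts output and clean the sample column names.
    counts = ov.pd.read_csv(counts_path, sep="\t", header=1, index_col=0)
    names = [c.split("/")[-1] for c in counts.columns]
    counts.columns = [n[:-4] if n.endswith(".bam") else n for n in names]

    # Map gene IDs to symbols, then build and normalise the DEG object.
    counts = ov.bulk.Matrix_ID_mapping(counts, pair_path)
    dds = ov.bulk.pyDEG(counts)
    dds.drop_duplicates_index()
    dds.normalize()

    # Test, threshold, and export.
    dds.deg_analysis(treatment_groups, control_groups, method="ttest")
    dds.foldchange_set(fc_threshold=-1, pval_threshold=0.05, logp_max=6)
    dds.result.to_csv(out_csv)
    return dds
```

A call would pass the counts file, a `genesets/pair_<GENOME>.tsv` mapping file, and the two lists of sample labels; plotting and enrichment can then be run on the returned object.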
- Troubleshooting tips
  - Ensure sample labels in `treatment_groups`/`control_groups` exactly match column names post-cleanup.
  - Verify required packages (`omicverse`, `pyComplexHeatmap`, `gseapy`) are installed for enrichment visualisations.
  - Remind users that internet access is required the first time they download gene mappings or pathway databases.
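The first troubleshooting tip can be turned into a quick pre-flight check. The helper below is hypothetical and uses made-up sample labels; it reports any group label that is not present among the cleaned column names.

```python
def missing_labels(columns, treatment_groups, control_groups):
    """Return group labels that do not appear among the matrix columns."""
    known = set(columns)
    return [label for label in treatment_groups + control_groups if label not in known]


# Hypothetical cleaned column names and group definitions.
columns = ["ctrl_1", "ctrl_2", "treat_1", "treat_2"]
problems = missing_labels(columns, ["treat_1", "treat_2"], ["ctrl_1", "ctrl_2.bam"])
print(problems)
```

An empty list means every label matches; anything else usually points to an uncleaned suffix or a typo in the group lists.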
Examples
- "I have a featureCounts matrix for mouse tumour samples—normalize it with DESeq2, run t-test DEG, and highlight the top 8 genes in a volcano plot."
- "Use omicverse to compute edgeR-style differential expression between treated and control replicates, then run GO enrichment on significant genes."
- "Guide me through converting Ensembl IDs to symbols, performing limma DEG, and plotting boxplots for Krtap9-5 and Lef1."
References
- Detailed walkthrough notebook: `t_deg.ipynb`
- Sample count matrix for testing: `sample/counts.txt`
- Quick copy/paste commands: `reference.md`