OpenClaw-Medical-Skills bulk-rna-seq-differential-expression-with-omicverse

Guide Claude through omicverse's bulk RNA-seq DEG pipeline, from gene ID mapping and DESeq2 normalization to statistical testing, visualization, and pathway enrichment. Use when a user has bulk count matrices and needs differential expression analysis in omicverse.

install

source · Clone the upstream repo

git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bulk-deg-analysis" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bulk-rna-seq-differential-expression && rm -rf "$T"

OpenClaw · Install into ~/.openclaw/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bulk-deg-analysis" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bulk-rna-seq-differential-expression && rm -rf "$T"

manifest: skills/bulk-deg-analysis/SKILL.md

Bulk RNA-seq differential expression with omicverse

Overview

Follow this skill to run the end-to-end differential expression (DEG) workflow showcased in

t_deg.ipynb

. It assumes the user provides a raw gene-level count matrix (e.g., from featureCounts) and wants to analyse bulk RNA-seq cohorts inside omicverse.

Instructions

Set up the session
- Import
```
omicverse as ov
```
  ,
```
scanpy as sc
```
  , and
```
matplotlib.pyplot as plt
```
  .
- Call
```
ov.plot_set()
```
  so downstream plots adopt omicverse styling.
Prepare ID mapping assets
- When gene IDs must be converted to gene symbols, instruct the user to download mapping pairs via
```
ov.utils.download_geneid_annotation_pair()
```
  and store them under
```
genesets/
```
  .
- Mention the available prebuilt genomes (T2T-CHM13, GRCh38, GRCh37, GRCm39, danRer7, danRer11) and that users can generate their own mapping from GTF files if needed.
Load the raw counts
- Read tab-delimited featureCounts output with
```
ov.pd.read_csv(..., sep='\t', header=1, index_col=0)
```
  .
- Strip trailing
```
.bam
```
  segments from column names using list comprehension so sample IDs are clean.

Map gene identifiers

Run

ov.bulk.Matrix_ID_mapping(counts_df, 'genesets/pair_<GENOME>.tsv')

to replace

gene_id

entries with gene symbols.

Initialise the DEG object
- Create
```
dds = ov.bulk.pyDEG(mapped_counts)
```
  .
- Handle duplicate gene symbols with
```
dds.drop_duplicates_index()
```
  to keep the highest expressed version.
Normalise and estimate size factors
- Execute
```
dds.normalize()
```
  to calculate DESeq2 size factors, correcting for library size and batch differences.
Run differential testing
- Collect treatment and control replicate labels into lists.
- Call
```
dds.deg_analysis(treatment_groups, control_groups, method='ttest')
```
  for the default Welch t-test.
- Offer optional alternatives:
```
method='edgepy'
```
  for edgeR-like tests and
```
method='limma'
```
  for limma-style modelling.
Filter and threshold results
- Note that lowly expressed genes are retained by default; filter using
```
dds.result.loc[dds.result['log2(BaseMean)'] > 1]
```
  when needed.
- Set dynamic fold-change and significance cutoffs via
```
dds.foldchange_set(fc_threshold=-1, pval_threshold=0.05, logp_max=6)
```
  (
```
fc_threshold=-1
```
  auto-selects based on log2FC distribution).

Visualise differential expression

Produce volcano plots with

dds.plot_volcano(title=..., figsize=..., plot_genes=... or plot_genes_num=...)

to highlight key genes.

Generate per-gene boxplots using

dds.plot_boxplot(genes=[...], treatment_groups=..., control_groups=..., figsize=..., legend_bbox=...)

; adjust y-axis tick labels if required.

Perform pathway enrichment (optional)

Download curated pathway libraries through
```
ov.utils.download_pathway_database()
```
.

Load genesets with

ov.utils.geneset_prepare(<path>, organism='Mouse'|'Human'|...)

Build the DEG gene list from

dds.result.loc[dds.result['sig'] != 'normal'].index

Run enrichment with

ov.bulk.geneset_enrichment(gene_list=deg_genes, pathways_dict=..., pvalue_type='auto', organism=...)

. Encourage users without internet access to provide a

background

gene list.

Visualise single-library results via

ov.bulk.geneset_plot(...)

and combine multiple ontologies using

ov.bulk.geneset_plot_multi(enr_dict, colors_dict, num=...)

Document outputs
- Suggest exporting
```
dds.result
```
  and enrichment tables to CSV for downstream reporting.
- Encourage users to save figures generated by matplotlib (
```
plt.savefig(...)
```
  ) when running outside notebooks.
Troubleshooting tips
- Ensure sample labels in
```
treatment_groups
```
  /
```
control_groups
```
  exactly match column names post-cleanup.
- Verify required packages (
```
omicverse
```
  ,
```
pyComplexHeatmap
```
  ,
```
gseapy
```
  ) are installed for enrichment visualisations.
- Remind users that internet access is required the first time they download gene mappings or pathway databases.

Examples

"I have a featureCounts matrix for mouse tumour samples—normalize it with DESeq2, run t-test DEG, and highlight the top 8 genes in a volcano plot."
"Use omicverse to compute edgeR-style differential expression between treated and control replicates, then run GO enrichment on significant genes."
"Guide me through converting Ensembl IDs to symbols, performing limma DEG, and plotting boxplots for Krtap9-5 and Lef1."

References

Detailed walkthrough notebook:
```
t_deg.ipynb
```
Sample count matrix for testing:
```
sample/counts.txt
```
Quick copy/paste commands:
```
reference.md
```