OpenClaw-Medical-Skills bulk-rna-seq-differential-expression-with-omicverse

Guide Claude through omicverse's bulk RNA-seq DEG pipeline, from gene ID mapping and DESeq2 normalization to statistical testing, visualization, and pathway enrichment. Use when a user has bulk count matrices and needs differential expression analysis in omicverse.

install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bulk-deg-analysis" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bulk-rna-seq-differential-expression && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bulk-deg-analysis" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bulk-rna-seq-differential-expression && rm -rf "$T"
manifest: skills/bulk-deg-analysis/SKILL.md
source content

Bulk RNA-seq differential expression with omicverse

Overview

Follow this skill to run the end-to-end differential expression (DEG) workflow showcased in t_deg.ipynb. It assumes the user provides a raw gene-level count matrix (e.g., from featureCounts) and wants to analyse bulk RNA-seq cohorts inside omicverse.

Instructions

  1. Set up the session
    • Import omicverse as ov, scanpy as sc, and matplotlib.pyplot as plt.
    • Call ov.plot_set() so downstream plots adopt omicverse styling.
  2. Prepare ID mapping assets
    • When gene IDs must be converted to gene symbols, instruct the user to download mapping pairs via ov.utils.download_geneid_annotation_pair() and store them under genesets/.
    • Mention the available prebuilt genomes (T2T-CHM13, GRCh38, GRCh37, GRCm39, danRer7, danRer11) and that users can generate their own mapping from GTF files if needed.
  3. Load the raw counts
    • Read tab-delimited featureCounts output with ov.pd.read_csv(..., sep='\t', header=1, index_col=0).
    • Strip trailing .bam segments from column names using a list comprehension so sample IDs are clean.
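The column clean-up in step 3 can be sketched in plain Python. The sample paths below are invented for illustration, and stripping the leading directory is an assumption that only matters when featureCounts was given BAM paths rather than bare file names.

```python
# Hypothetical featureCounts column headers: one BAM path per sample.
raw_columns = [
    "alignments/ctrl_rep1.bam",
    "alignments/ctrl_rep2.bam",
    "treated_rep1.bam",
]


def clean_sample_name(column):
    """Drop any leading directory and a single trailing '.bam' suffix."""
    name = column.split("/")[-1]
    return name[: -len(".bam")] if name.endswith(".bam") else name


# The list comprehension named in the step above.
sample_ids = [clean_sample_name(c) for c in raw_columns]
print(sample_ids)
```

On a count DataFrame the same comprehension would be assigned back to the column index, for example counts.columns = [clean_sample_name(c) for c in counts.columns].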
  4. Map gene identifiers
    • Run ov.bulk.Matrix_ID_mapping(counts_df, 'genesets/pair_<GENOME>.tsv') to replace gene_id entries with gene symbols.
  5. Initialise the DEG object
    • Create dds = ov.bulk.pyDEG(mapped_counts).
    • Handle duplicate gene symbols with dds.drop_duplicates_index() to keep the highest expressed version.
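To make the "keep the highest expressed version" rule concrete, here is a minimal plain-Python sketch. It is not omicverse's implementation: ranking duplicates by their total count across samples is an assumption made for illustration, and the toy rows are invented.

```python
def keep_highest_expressed(rows):
    """rows: iterable of (symbol, counts) pairs -> {symbol: counts}.

    When a symbol occurs more than once, keep the row with the largest
    total count (assumed measure of "highest expressed").
    """
    best = {}
    for symbol, counts in rows:
        if symbol not in best or sum(counts) > sum(best[symbol]):
            best[symbol] = counts
    return best


# Two gene IDs mapped to the same symbol, plus one unique symbol.
rows = [
    ("Lef1", [5, 7, 6]),
    ("Lef1", [50, 61, 58]),
    ("Actb", [900, 950, 910]),
]
deduplicated = keep_highest_expressed(rows)
print(deduplicated)
```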
  6. Normalise and estimate size factors
    • Execute dds.normalize() to calculate DESeq2 size factors, which correct for differences in library size (sequencing depth) between samples; they do not model batch effects.
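For readers who want to see what a DESeq2-style size factor is, the sketch below implements the standard median-of-ratios estimator in plain Python. It illustrates the idea only and is not the code that dds.normalize() runs; the toy counts are invented.

```python
import math
import statistics


def size_factors(counts):
    """counts: {gene: [count per sample]} -> list of per-sample size factors."""
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c <= 0 for c in row):
            continue  # a zero count gives no finite log geometric mean; skip the gene
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor = median ratio of a sample's counts to the per-gene geometric mean.
    return [math.exp(statistics.median(r)) for r in log_ratios]


# Toy matrix: the second sample was sequenced twice as deeply as the first.
toy = {"geneA": [10, 20], "geneB": [100, 200], "geneC": [5, 10], "geneD": [0, 3]}
factors = size_factors(toy)
normalised = {g: [c / f for c, f in zip(row, factors)] for g, row in toy.items()}
print(factors)
```

Dividing each sample by its size factor puts the two toy samples on a common scale, which is the sense in which the step corrects for library size.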
  7. Run differential testing
    • Collect treatment and control replicate labels into lists.
    • Call dds.deg_analysis(treatment_groups, control_groups, method='ttest') for the default Welch t-test.
    • Offer optional alternatives: method='edgepy' for edgeR-like tests and method='limma' for limma-style modelling.
  8. Filter and threshold results
    • Note that lowly expressed genes are retained by default; filter using dds.result.loc[dds.result['log2(BaseMean)'] > 1] when needed.
    • Set dynamic fold-change and significance cutoffs via dds.foldchange_set(fc_threshold=-1, pval_threshold=0.05, logp_max=6) (fc_threshold=-1 auto-selects the cutoff based on the log2FC distribution).
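Steps 3 to 8 can be chained into one helper, sketched below. Every omicverse call is copied from the steps above; the wrapper name run_bulk_deg, its parameters, and the label check are hypothetical additions, and the behaviour of the omicverse calls should be confirmed against the installed version.

```python
def run_bulk_deg(counts_path, pair_path, treatment_groups, control_groups,
                 method="ttest", min_log2_basemean=None):
    """Hypothetical wrapper around steps 3 to 8; returns the pyDEG object."""
    import omicverse as ov  # imported lazily so this file loads without omicverse

    ov.plot_set()

    # Step 3: load counts and clean sample names.
    # Raw featureCounts tables usually also carry Chr/Start/End/Strand/Length
    # columns; drop them first if present (not covered by the steps above).
    counts = ov.pd.read_csv(counts_path, sep="\t", header=1, index_col=0)
    counts.columns = [c.split("/")[-1].replace(".bam", "") for c in counts.columns]

    # Fail early if a requested label does not match a cleaned column name.
    requested = list(treatment_groups) + list(control_groups)
    missing = [s for s in requested if s not in counts.columns]
    if missing:
        raise ValueError(f"labels not found in count matrix: {missing}")

    # Step 4: gene IDs -> gene symbols, e.g. 'genesets/pair_<GENOME>.tsv'.
    counts = ov.bulk.Matrix_ID_mapping(counts, pair_path)

    # Step 5: build the DEG object and resolve duplicate symbols.
    dds = ov.bulk.pyDEG(counts)
    dds.drop_duplicates_index()

    # Step 6: DESeq2 size-factor normalisation.
    dds.normalize()

    # Step 7: differential testing ('ttest', 'edgepy' or 'limma').
    dds.deg_analysis(treatment_groups, control_groups, method=method)

    # Step 8: optional low-expression filter, then thresholds.
    if min_log2_basemean is not None:
        keep = dds.result["log2(BaseMean)"] > min_log2_basemean
        dds.result = dds.result.loc[keep]
    dds.foldchange_set(fc_threshold=-1, pval_threshold=0.05, logp_max=6)
    return dds
```

The import sits inside the function so the helper can be defined, and its signature inspected, on a machine where omicverse is not yet installed.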
  9. Visualise differential expression
    • Produce volcano plots with dds.plot_volcano(title=..., figsize=..., plot_genes=... or plot_genes_num=...) to highlight key genes.
    • Generate per-gene boxplots using dds.plot_boxplot(genes=[...], treatment_groups=..., control_groups=..., figsize=..., legend_bbox=...); adjust y-axis tick labels if required.
  10. Perform pathway enrichment (optional)
    • Download curated pathway libraries through ov.utils.download_pathway_database().
    • Load genesets with ov.utils.geneset_prepare(<path>, organism='Mouse'|'Human'|...).
    • Build the DEG gene list from dds.result.loc[dds.result['sig'] != 'normal'].index.
    • Run enrichment with ov.bulk.geneset_enrichment(gene_list=deg_genes, pathways_dict=..., pvalue_type='auto', organism=...). Encourage users without internet access to provide a background gene list.
    • Visualise single-library results via ov.bulk.geneset_plot(...) and combine multiple ontologies using ov.bulk.geneset_plot_multi(enr_dict, colors_dict, num=...).
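The enrichment bullets in step 10 can be collected into a small helper, sketched here. The omicverse calls and keyword names are taken from the bullets; the wrapper name run_enrichment and its parameters are hypothetical, and passing the same organism string to both calls is an assumption to check against the installed omicverse version.

```python
def run_enrichment(dds, geneset_path, organism, background=None):
    """Hypothetical wrapper: enrichment of the significant genes in dds.result."""
    import omicverse as ov  # imported lazily so this file loads without omicverse

    # ov.utils.download_pathway_database() fetches the libraries on first use;
    # run it once beforehand when the geneset file is not on disk yet.
    pathways = ov.utils.geneset_prepare(geneset_path, organism=organism)

    # Genes flagged by foldchange_set as anything other than 'normal'.
    deg_genes = dds.result.loc[dds.result["sig"] != "normal"].index.tolist()

    kwargs = {}
    if background is not None:
        # An explicit background list, as suggested for offline use.
        kwargs["background"] = background

    return ov.bulk.geneset_enrichment(
        gene_list=deg_genes,
        pathways_dict=pathways,
        pvalue_type="auto",
        organism=organism,
        **kwargs,
    )
```

The returned table is what the plotting bullet then passes to ov.bulk.geneset_plot(...).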
  11. Document outputs
    • Suggest exporting dds.result and enrichment tables to CSV for downstream reporting.
    • Encourage users to save figures generated by matplotlib (plt.savefig(...)) when running outside notebooks.
  12. Troubleshooting tips
    • Ensure sample labels in treatment_groups / control_groups exactly match column names post-cleanup.
    • Verify required packages (omicverse, pyComplexHeatmap, gseapy) are installed for enrichment visualisations.
    • Remind users that internet access is required the first time they download gene mappings or pathway databases.
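The first troubleshooting tip, exact label matching, is easy to check before any testing runs. The helper below is a hypothetical plain-Python sketch with invented sample names; it reports labels that are absent, labels that differ from a column only by case or surrounding whitespace, and labels placed in both groups.

```python
def check_group_labels(columns, treatment_groups, control_groups):
    """Return label problems; empty lists mean the groups look consistent."""
    columns = list(columns)
    requested = list(treatment_groups) + list(control_groups)
    normalised = {c.strip().lower() for c in columns}
    missing = [s for s in requested if s not in columns]
    return {
        "missing": missing,
        # Missing labels that would match after trimming and lower-casing.
        "near_misses": [s for s in missing if s.strip().lower() in normalised],
        "in_both_groups": sorted(set(treatment_groups) & set(control_groups)),
    }


columns = ["ctrl_1", "ctrl_2", "trt_1", "trt_2"]
report = check_group_labels(
    columns,
    treatment_groups=["trt_1", "Trt_2"],   # wrong capitalisation
    control_groups=["ctrl_1", "ctrl_2 "],  # trailing space
)
print(report)
```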

Examples

  • "I have a featureCounts matrix for mouse tumour samples—normalize it with DESeq2, run t-test DEG, and highlight the top 8 genes in a volcano plot."
  • "Use omicverse to compute edgeR-style differential expression between treated and control replicates, then run GO enrichment on significant genes."
  • "Guide me through converting Ensembl IDs to symbols, performing limma DEG, and plotting boxplots for Krtap9-5 and Lef1."

References