OpenClaw-Medical-Skills bulk-rna-seq-batch-correction-with-combat

Use omicverse's pyComBat wrapper to remove batch effects from merged bulk RNA-seq or microarray cohorts, export corrected matrices, and benchmark pre/post correction visualisations.

install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bulk-combat-correction" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bulk-rna-seq-batch-correction-with-c && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bulk-combat-correction" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bulk-rna-seq-batch-correction-with-c && rm -rf "$T"
manifest: skills/bulk-combat-correction/SKILL.md
source content

Bulk RNA-seq batch correction with ComBat

Overview

Apply this skill when a user has multiple bulk expression matrices measured across different batches and needs to harmonise them before downstream analysis. It follows

t_bulk_combat.ipynb
, w hich demonstrates the pyComBat workflow on ovarian cancer microarray cohorts.

Instructions

  1. Import core libraries
    • Load
      omicverse as ov
      ,
      anndata
      ,
      pandas as pd
      , and
      matplotlib.pyplot as plt
      .
    • Call
      ov.ov_plot_set()
      (aliased
      ov.plot_set()
      in some releases) to align figures with omicverse styling.
  2. Load each batch separately
    • Read the prepared pickled matrices (or user-provided expression tables) with
      pd.read_pickle(...)
      /
      pd.read_csv(...)
      .
    • Transpose to gene × sample before wrapping them in
      anndata.AnnData
      objects so
      adata.obs
      stores sample metadata.
    • Assign a
      batch
      column for every cohort (
      adata.obs['batch'] = '1'
      ,
      '2'
      , ...). Encourage descriptive labels when availa ble.
  3. Concatenate on shared genes
    • Use
      anndata.concat([adata1, adata2, adata3], merge='same')
      to retain the intersection of genes across batches.
    • Confirm the combined
      adata
      reports balanced sample counts per batch; if not, prompt users to re-check inputs.
  4. Run ComBat batch correction
    • Execute
      ov.bulk.batch_correction(adata, batch_key='batch')
      .
    • Explain that corrected values are stored in
      adata.layers['batch_correction']
      while the original counts remain in
      adata.X
      .
  5. Export corrected and raw matrices
    • Obtain DataFrames via
      adata.to_df().T
      (raw) and
      adata.to_df(layer='batch_correction').T
      (corrected).
    • Encourage saving both tables (
      .to_csv(...)
      ) plus the harmonised AnnData (
      adata.write_h5ad('adata_batch.h5ad', compressio n='gzip')
      ).
  6. Benchmark the correction
    • For per-sample variance checks, draw before/after boxplots and recolour boxes using
      ov.utils.red_color
      ,
      blue_color
      ,
      gree n_color
      palettes to match batches.
    • Copy raw counts to a named layer with
      adata.layers['raw'] = adata.X.copy()
      before PCA.
    • Run
      ov.pp.pca(adata, layer='raw', n_pcs=50)
      and
      ov.pp.pca(adata, layer='batch_correction', n_pcs=50)
      .
    • Visualise embeddings with
      ov.utils.embedding(..., basis='raw|original|X_pca', color='batch', frameon='small')
      and repeat fo r the corrected layer to verify mixing.
  7. Troubleshooting tips
    • Mismatched gene identifiers cause dropped features—remind users to harmonise feature names (e.g., gene symbols) before conca tenation.
    • pyComBat expects log-scale intensities or similarly distributed counts; recommend log-transforming strongly skewed matrices.
    • If
      batch_correction
      layer is missing, ensure the
      batch_key
      matches the column name in
      adata.obs
      .

Examples

  • "Combine three GEO ovarian cohorts, run ComBat, and export both the raw and corrected CSV matrices."
  • "Plot PCA embeddings before and after batch correction to confirm that batches 1–3 overlap."
  • "Save the harmonised AnnData file so I can reload it later for downstream DEG analysis."

References