OpenClaw-Medical-Skills bulk-rna-seq-batch-correction-with-combat
Use omicverse's pyComBat wrapper to remove batch effects from merged bulk RNA-seq or microarray cohorts, export corrected matrices, and benchmark pre/post correction visualisations.
install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bulk-combat-correction" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bulk-rna-seq-batch-correction-with-c && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bulk-combat-correction" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bulk-rna-seq-batch-correction-with-c && rm -rf "$T"
manifest:
skills/bulk-combat-correction/SKILL.md
source content
Bulk RNA-seq batch correction with ComBat
Overview
Apply this skill when a user has multiple bulk expression matrices measured across different batches and needs to harmonise them before downstream analysis. It follows `t_bulk_combat.ipynb`, which demonstrates the pyComBat workflow on ovarian cancer microarray cohorts.
Instructions
- Import core libraries
  - Load `omicverse as ov`, `anndata`, `pandas as pd`, and `matplotlib.pyplot as plt`.
  - Call `ov.ov_plot_set()` (aliased `ov.plot_set()` in some releases) to align figures with omicverse styling.
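Because the styling helper goes by two names across releases, a small wrapper can pick whichever one is present. This is a minimal sketch: `apply_omicverse_style` is a hypothetical helper name, not part of omicverse, and it only assumes the two function names mentioned above.

```python
def apply_omicverse_style(ov_module) -> str:
    """Call whichever plotting-style helper this omicverse release exposes.

    `ov_module` is the imported omicverse module (``import omicverse as ov``).
    Returns the name of the helper that was called.
    """
    for name in ("ov_plot_set", "plot_set"):
        setter = getattr(ov_module, name, None)
        if callable(setter):
            setter()
            return name  # report which helper was used
    raise AttributeError("no ov_plot_set() or plot_set() found on the module")


# Typical use (requires omicverse to be installed):
# import omicverse as ov
# apply_omicverse_style(ov)
```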
- Load each batch separately
  - Read the prepared pickled matrices (or user-provided expression tables) with `pd.read_pickle(...)` / `pd.read_csv(...)`.
  - Transpose gene × sample tables to sample × gene before wrapping them in `anndata.AnnData` objects so `adata.obs` stores sample metadata.
  - Assign a `batch` column for every cohort (`adata.obs['batch'] = '1'`, `'2'`, ...). Encourage descriptive labels when available.
- Concatenate on shared genes
  - Use `anndata.concat([adata1, adata2, adata3], merge='same')` to retain the intersection of genes across batches (the default `join='inner'` keeps only shared genes).
  - Confirm the combined `adata` reports balanced sample counts per batch; if not, prompt users to re-check inputs.
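The per-batch sample count check can be sketched with pandas alone, since `adata.obs` is a DataFrame. The helper name `batch_sample_counts` and the `max_ratio` cut-off are illustrative assumptions; what counts as "balanced" depends on the study design.

```python
import pandas as pd


def batch_sample_counts(obs: pd.DataFrame, batch_key: str = "batch",
                        max_ratio: float = 3.0):
    """Return per-batch sample counts and a rough balance flag.

    `max_ratio` is a hypothetical threshold: the largest batch may hold at
    most `max_ratio` times as many samples as the smallest one.
    """
    counts = obs[batch_key].value_counts()
    balanced = bool(counts.max() <= max_ratio * counts.min())
    return counts, balanced


# Typical use on a merged AnnData object:
# counts, balanced = batch_sample_counts(adata.obs)
# print(counts)
```

If the flag comes back `False`, it is worth re-checking that every input table was loaded and labelled before moving on to correction.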
- Run ComBat batch correction
  - Execute `ov.bulk.batch_correction(adata, batch_key='batch')`.
  - Explain that corrected values are stored in `adata.layers['batch_correction']` while the original counts remain in `adata.X`.
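The load, concatenate, and correct steps up to this point might be combined as follows. This is a sketch under stated assumptions: `combat_from_tables` is a hypothetical wrapper, the input tables are assumed to be gene × sample DataFrames keyed by batch label, and anndata and omicverse are imported inside the function so the definition itself loads without them.

```python
import pandas as pd


def combat_from_tables(tables: dict[str, pd.DataFrame], batch_key: str = "batch"):
    """Wrap gene × sample tables in AnnData, merge them, and run ComBat.

    `tables` maps a batch label to a gene × sample DataFrame (assumption).
    Returns the merged AnnData with the corrected layer added in place.
    """
    if len(tables) < 2:
        raise ValueError("batch correction needs at least two batches")

    # Heavy dependencies are imported lazily, only when the workflow runs.
    import anndata
    import omicverse as ov

    adatas = []
    for label, table in tables.items():
        adata = anndata.AnnData(table.T)  # transpose so samples become obs rows
        adata.obs[batch_key] = label      # tag every sample with its cohort
        adatas.append(adata)

    merged = anndata.concat(adatas, merge="same")  # shared genes only
    ov.bulk.batch_correction(merged, batch_key=batch_key)
    return merged  # corrected values live in merged.layers['batch_correction']


# Typical use (file names are placeholders):
# tables = {"1": pd.read_pickle("cohort1.pickle"),
#           "2": pd.read_pickle("cohort2.pickle")}
# adata = combat_from_tables(tables)
```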
- Export corrected and raw matrices
  - Obtain DataFrames via `adata.to_df().T` (raw) and `adata.to_df(layer='batch_correction').T` (corrected).
  - Encourage saving both tables (`.to_csv(...)`) plus the harmonised AnnData (`adata.write_h5ad('adata_batch.h5ad', compression='gzip')`).
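A possible export helper is sketched below. `export_matrices` and the CSV file names are hypothetical; only `to_df`, `to_csv`, and `write_h5ad` with `compression='gzip'` come from the steps above.

```python
from pathlib import Path


def export_matrices(adata, out_dir, layer: str = "batch_correction"):
    """Write raw and corrected gene × sample tables plus the AnnData file."""
    if layer not in adata.layers:
        raise KeyError(f"layer {layer!r} not found; run batch correction first")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    raw = adata.to_df().T                   # values from adata.X
    corrected = adata.to_df(layer=layer).T  # ComBat-corrected values

    # CSV names are placeholders; pick whatever suits the project.
    raw.to_csv(out / "raw_matrix.csv")
    corrected.to_csv(out / "corrected_matrix.csv")
    adata.write_h5ad(out / "adata_batch.h5ad", compression="gzip")
    return raw, corrected
```

Returning both DataFrames lets the caller inspect them straight away without re-reading the CSV files.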
- Benchmark the correction
  - For per-sample variance checks, draw before/after boxplots and recolour boxes using the `ov.utils.red_color`, `blue_color`, `green_color` palettes to match batches.
  - Copy raw counts to a named layer with `adata.layers['raw'] = adata.X.copy()` before PCA.
  - Run `ov.pp.pca(adata, layer='raw', n_pcs=50)` and `ov.pp.pca(adata, layer='batch_correction', n_pcs=50)`.
  - Visualise embeddings with `ov.utils.embedding(..., basis='raw|original|X_pca', color='batch', frameon='small')` and repeat for the corrected layer to verify mixing.
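Alongside the plots, a single number can summarise how strongly batch drives the leading principal components. The sketch below is not an omicverse function: `batch_variance_fraction` is a hypothetical helper that runs a plain NumPy SVD-based PCA and reports the share of top-PC variance lying between batch centroids (close to 1 means batches separate, close to 0 means they mix).

```python
import numpy as np
import pandas as pd


def batch_variance_fraction(matrix: pd.DataFrame, batches, n_pcs: int = 2) -> float:
    """Fraction of top-PC variance explained by batch membership.

    `matrix` is sample × gene; `batches` holds one label per row, in row order.
    """
    x = matrix.to_numpy(dtype=float)
    x = x - x.mean(axis=0)                      # centre each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_pcs] * s[:n_pcs]           # sample coordinates on top PCs

    total = float((scores ** 2).sum())          # total sum of squares (mean is 0)
    if total == 0.0:
        return 0.0

    labels = np.asarray(batches)
    between = 0.0
    for label in np.unique(labels):
        members = scores[labels == label]
        # size-weighted squared distance of the batch centroid from the origin
        between += len(members) * float((members.mean(axis=0) ** 2).sum())
    return between / total


# Typical use on a corrected AnnData object:
# before = batch_variance_fraction(adata.to_df(), adata.obs["batch"])
# after = batch_variance_fraction(adata.to_df(layer="batch_correction"),
#                                 adata.obs["batch"])
```

A clear drop from `before` to `after` is consistent with better mixing, though it should be read together with the embeddings rather than in place of them.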
- Troubleshooting tips
  - Mismatched gene identifiers cause dropped features; remind users to harmonise feature names (e.g., gene symbols) before concatenation.
  - pyComBat expects log-scale intensities or similarly distributed counts; recommend log-transforming strongly skewed matrices.
  - If the `batch_correction` layer is missing, ensure the `batch_key` matches the column name in `adata.obs`.
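The first two tips can be checked before any AnnData object is built. The helpers below are a pandas/NumPy sketch: both function names are hypothetical, the tables are assumed to be gene × sample DataFrames, and the `max_threshold` of 50 is only a rough heuristic for spotting unlogged values.

```python
import numpy as np
import pandas as pd


def shared_gene_report(tables: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Count how many genes each batch would lose when intersecting."""
    shared = None
    for table in tables.values():
        shared = table.index if shared is None else shared.intersection(table.index)
    rows = {
        label: {
            "n_genes": len(table.index),
            "n_shared": len(shared),
            "n_dropped": len(table.index) - len(shared),
        }
        for label, table in tables.items()
    }
    return pd.DataFrame.from_dict(rows, orient="index")


def log2_if_skewed(table: pd.DataFrame, max_threshold: float = 50.0) -> pd.DataFrame:
    """Apply log2(x + 1) when values look unlogged (heuristic assumption)."""
    values = table.to_numpy(dtype=float)
    if values.min() >= 0 and values.max() > max_threshold:
        return np.log2(table + 1.0)
    return table  # already on a log-like scale, or contains negatives
```

A large `n_dropped` for one batch usually points to a different identifier type (for example probe IDs versus gene symbols) rather than genuinely missing genes.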
Examples
- "Combine three GEO ovarian cohorts, run ComBat, and export both the raw and corrected CSV matrices."
- "Plot PCA embeddings before and after batch correction to confirm that batches 1–3 overlap."
- "Save the harmonised AnnData file so I can reload it later for downstream DEG analysis."
References
- Tutorial notebook: `t_bulk_combat.ipynb`
- Example inputs: `omicverse_guide/docs/Tutorials-bulk/data/combat/`
- Quick copy/paste commands: `reference.md`