OpenClaw-Medical-Skills single-cell-clustering-and-batch-correction-with-omicverse
Guide Claude through omicverse's single-cell clustering workflow, covering preprocessing, QC, multimethod clustering, topic modeling, cNMF, and cross-batch integration as demonstrated in t_cluster.ipynb and t_single_batch.ipynb.
install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/single-clustering" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-single-cell-clustering-and-batch-cor && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/single-clustering" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-single-cell-clustering-and-batch-cor && rm -rf "$T"
manifest:
skills/single-clustering/SKILL.md
Single-cell clustering and batch correction with omicverse
Overview
This skill distills the single-cell tutorials `t_cluster.ipynb` and `t_single_batch.ipynb`. Use it when a user wants to preprocess an `AnnData` object, explore clustering alternatives (Leiden, Louvain, scICE, GMM, topic/cNMF models), and evaluate or harmonise batches with omicverse utilities.
Instructions
- Import libraries and set plotting defaults
  - Load `omicverse as ov`, `scanpy as sc`, and plotting helpers (`scvelo as scv` when using dentate gyrus demo data).
  - Apply `ov.plot_set()` or `ov.utils.ov_plot_set()` so figures adopt omicverse styling before embedding plots.
- Load data and annotate batches
  - For demo clustering, fetch `scv.datasets.dentategyrus()`; for integration, read the provided `.h5ad` files via `ov.read()` and set `adata.obs['batch']` identifiers for each cohort.
  - Confirm inputs are sparse numeric matrices; convert with `adata.X = adata.X.astype(np.int64)` when required for QC steps.
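The `astype(np.int64)` cast silently truncates non-integer values, so it is only safe when `adata.X` still holds raw counts rather than normalised data. Below is a minimal heuristic sketch, assuming `adata.X` is either a dense NumPy array or a SciPy sparse matrix; `looks_like_raw_counts` is a hypothetical helper, not an omicverse function.

```python
import numpy as np

def looks_like_raw_counts(X, n_check=100_000):
    """Heuristic: True if the inspected values are non-negative integers.

    Hypothetical helper (not an omicverse API). For SciPy sparse matrices
    only the stored entries in X.data are inspected; dense input is flattened.
    """
    if hasattr(X, "tocsr"):                 # SciPy sparse matrix or array
        values = np.asarray(X.data)
    else:                                   # dense array-like
        values = np.asarray(X).ravel()
    values = values[:n_check]               # a prefix is enough for a quick check
    return bool(np.all(values >= 0) and np.all(values == np.round(values)))

# Usage, assuming an AnnData object named `adata`:
# if looks_like_raw_counts(adata.X):
#     adata.X = adata.X.astype(np.int64)
```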
- Run quality control
  - Execute `ov.pp.qc(adata, tresh={'mito_perc': 0.2, 'nUMIs': 500, 'detected_genes': 250}, batch_key='batch')` to drop low-quality cells and inspect summary statistics per batch.
  - Save intermediate filtered objects (`adata.write_h5ad(...)`) so users can resume from clean checkpoints.
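The three `tresh` thresholds can be read as simple per-cell filters. Below is a minimal NumPy sketch of that reading. It assumes `mito_perc` is a fraction of total counts (0.2 = 20%), that `nUMIs` and `detected_genes` are inclusive lower bounds, and that mitochondrial genes are supplied as a boolean mask. It is not omicverse's implementation, which may apply additional steps.

```python
import numpy as np

def qc_pass_mask(counts, is_mito, mito_perc=0.2, n_umis=500, detected_genes=250):
    """Boolean mask of cells passing simple count-based QC.

    counts: dense cells x genes matrix of raw counts.
    is_mito: boolean mask over genes marking mitochondrial genes.
    The threshold semantics are an assumption mirroring the `tresh` dict keys.
    """
    total = counts.sum(axis=1)                               # nUMIs per cell
    detected = (counts > 0).sum(axis=1)                      # genes with >= 1 count
    mito_frac = counts[:, is_mito].sum(axis=1) / np.maximum(total, 1)
    return (mito_frac <= mito_perc) & (total >= n_umis) & (detected >= detected_genes)

# Usage with a hypothetical mito mask built from gene names:
# is_mito = adata.var_names.str.upper().str.startswith("MT-")
```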
- Preprocess and select features
  - Call `ov.pp.preprocess(adata, mode='shiftlog|pearson', n_HVGs=3000, batch_key=None)` to normalise, log-transform, and flag highly variable genes; assign `adata.raw = adata` and subset with `adata = adata[:, adata.var.highly_variable_features]` for downstream modelling.
  - Scale expression (`ov.pp.scale(adata)`) and compute PCA scores with `ov.pp.pca(adata, layer='scaled', n_pcs=50)`. Encourage reviewing variance explained via `ov.utils.plot_pca_variance_ratio(adata)`.
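To make `mode='shiftlog|pearson'` concrete, here is a minimal NumPy sketch. It assumes the first half of the mode string selects shifted-logarithm normalisation and the second half selects HVGs by the variance of analytic Pearson residuals (the idea behind scanpy's experimental `pearson_residuals` flavour). `target_sum=1e4`, `theta=100`, and clipping at √n_cells are illustrative defaults, not values read from omicverse. The last function computes what a PCA variance-ratio plot shows: each PC's share of total variance after per-gene z-scoring.

```python
import numpy as np

def shifted_log(counts, target_sum=1e4):
    """Scale each cell to target_sum total counts, then apply log1p."""
    size = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / np.maximum(size, 1e-12) * target_sum)

def pearson_residual_hvgs(counts, n_hvgs, theta=100.0):
    """Indices of the n_hvgs genes with the largest Pearson-residual variance."""
    counts = np.asarray(counts, dtype=float)
    # Expected counts under a null with no biological variation:
    # cell total x gene total / grand total.
    mu = counts.sum(axis=1, keepdims=True) * counts.sum(axis=0, keepdims=True) / counts.sum()
    denom = np.sqrt(mu + mu**2 / theta)                  # negative-binomial standard deviation
    resid = np.divide(counts - mu, denom, out=np.zeros_like(mu), where=denom > 0)
    clip = np.sqrt(counts.shape[0])
    resid = np.clip(resid, -clip, clip)                  # limit the influence of outlier cells
    return np.argsort(resid.var(axis=0))[::-1][:n_hvgs]

def pca_variance_ratio(X, n_pcs=50):
    """Share of total variance captured by each of the first n_pcs PCs after z-scoring."""
    sd = X.std(axis=0)
    Z = (X - X.mean(axis=0)) / np.where(sd > 0, sd, 1.0)  # per-gene z-score (centred)
    s = np.linalg.svd(Z, compute_uv=False)                 # singular values, descending
    return (s**2 / np.sum(s**2))[:n_pcs]
```

In the omicverse workflow these steps are handled by `ov.pp.preprocess`, `ov.pp.scale`, and `ov.pp.pca`; the sketch only illustrates the arithmetic.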
- Construct neighbourhood graph and baseline clustering
  - Build the neighbour graph using `sc.pp.neighbors(adata, n_neighbors=15, n_pcs=50, use_rep='scaled|original|X_pca')` or `ov.pp.neighbors(...)`.
  - Generate Leiden or Louvain labels through `ov.utils.cluster(adata, method='leiden', resolution=1)` (or `method='louvain'`), `ov.single.leiden(adata, resolution=1.0)`, or `ov.pp.leiden(adata, resolution=1)`; remind users that resolution tunes granularity, with higher values yielding more, smaller clusters.
  - IMPORTANT - Dependency checks: always verify prerequisites before clustering or plotting:
    ```python
    # Before clustering: check the neighbours graph exists
    if 'neighbors' not in adata.uns:
        if 'X_pca' in adata.obsm:
            ov.pp.neighbors(adata, n_neighbors=15, use_rep='X_pca')
        elif 'scaled|original|X_pca' in adata.obsm:
            # key written by ov.pp.pca(adata, layer='scaled', ...)
            ov.pp.neighbors(adata, n_neighbors=15, use_rep='scaled|original|X_pca')
        else:
            raise ValueError("PCA must be computed before the neighbours graph")

    # Before plotting by cluster: check clustering was performed
    if 'leiden' not in adata.obs:
        ov.single.leiden(adata, resolution=1.0)
    ```
  - Visualise embeddings with `ov.pl.embedding(adata, basis='X_umap', color=['clusters','leiden'], frameon='small', wspace=0.5)` and confirm cluster separation. Always check that the columns passed to `color=` exist in `adata.obs` before plotting.
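Because `color=` entries missing from `adata.obs` make the plot call fail, a small guard can list them up front. This is a sketch with a hypothetical helper name. It also accepts gene names as a fallback, assuming `ov.pl.embedding` behaves like scanpy's embedding plots in allowing genes as colour keys.

```python
def missing_color_keys(color, obs_columns, var_names=()):
    """Return the entries of `color` found neither in obs columns nor in var names.

    Hypothetical helper; plain iterables of names are accepted, so it works with
    adata.obs.columns and adata.var_names without importing anndata.
    """
    if isinstance(color, str):
        color = [color]
    available = set(obs_columns) | set(var_names)
    return [key for key in color if key not in available]

# Usage, assuming an AnnData object `adata`:
# missing = missing_color_keys(['clusters', 'leiden'], adata.obs.columns, adata.var_names)
# if missing:
#     raise KeyError(f"Compute these annotations before plotting: {missing}")
```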
- Explore advanced clustering strategies
  - scICE consensus: instantiate `model = ov.utils.cluster(adata, method='scICE', use_rep='scaled|original|X_pca', resolution_range=(4,20), n_boot=50, n_steps=11)` and inspect stability via `model.plot_ic(figsize=(6,4))` before selecting `model.best_k` groups.
  - Gaussian mixtures: run `ov.utils.cluster(..., method='GMM', n_components=21, covariance_type='full', tol=1e-9, max_iter=1000)` for model-based assignments.
  - Topic modelling: fit `LDA_obj = ov.utils.LDA_topic(...)`, review `LDA_obj.plot_topic_contributions(6)`, derive cluster calls with `LDA_obj.predicted(k)`, and optionally refine using `LDA_obj.get_results_rfc(...)`.
  - cNMF programs: initialise `cnmf_obj = ov.single.cNMF(... components=np.arange(5,11), n_iter=20, num_highvar_genes=2000, output_dir=...)`, factorise (`factorize`, `combine`), select K via `k_selection_plot`, and propagate usage scores back with `cnmf_obj.get_results(...)` and `cnmf_obj.get_results_rfc(...)`.
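The GMM keyword arguments above (`n_components`, `covariance_type`, `tol`, `max_iter`) match those of scikit-learn's `sklearn.mixture.GaussianMixture`. A plausible reading, assumed here rather than confirmed from omicverse's source, is that they are forwarded to it and fitted on a low-dimensional representation such as the PCA scores. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_labels(embedding, n_components, seed=0):
    """Fit a full-covariance GMM to an embedding (cells x dims) and return hard labels."""
    gmm = GaussianMixture(
        n_components=n_components,
        covariance_type="full",   # one unrestricted covariance matrix per component
        tol=1e-9,
        max_iter=1000,
        random_state=seed,        # EM depends on initialisation; fix it for reproducibility
    )
    return gmm.fit_predict(embedding)

# Two well-separated synthetic blobs standing in for PCA scores:
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(50, 5)), rng.normal(10, 0.5, size=(50, 5))])
labels = gmm_labels(X, n_components=2)
```

On real data the input would be something like `adata.obsm['scaled|original|X_pca']`. Full covariances let each component adopt an elongated, tilted shape, at the cost of more parameters per component than `'diag'` or `'spherical'`.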
- Evaluate clustering quality
  - Compare predicted labels against known references with `adjusted_rand_score(adata.obs['clusters'], adata.obs['leiden'])` (from `sklearn.metrics`) and report the score for each method (Leiden, Louvain, GMM, LDA variants, cNMF models) to justify the chosen parameters.
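For reference, the chance-corrected pair-counting formula behind `adjusted_rand_score` fits in a few lines of standard-library Python. This sketch only shows what the score measures: 1.0 for identical partitions up to relabelling, about 0 for chance-level agreement, and negative values when agreement is worse than chance. In practice, call `sklearn.metrics.adjusted_rand_score` directly.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0
    # Pairs of cells placed together in both labelings (contingency-table cells).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together within each labeling separately (row and column sums).
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)       # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:                   # degenerate case, e.g. both single-cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```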
- Embed with multiple layouts
  - Use `ov.utils.mde(...)` to create MDE projections from different latent spaces (`adata.obsm["scaled|original|X_pca"]`, harmonised embeddings, topic compositions) and plot via `ov.utils.embedding(..., color=['batch','cell_type'])` or `ov.pl.embedding` for consistent review of cluster/batch mixing.
- Perform batch correction and integration
  - Apply `ov.single.batch_correction(adata, batch_key='batch', methods='harmony', n_pcs=50, ...)` sequentially, setting `methods` to `'harmony'`, `'combat'`, `'scanorama'`, `'scVI'`, and `'CellANOVA'` in turn, to generate harmonised embeddings stored in `adata.obsm` (`X_harmony`, `X_combat`, `X_scanorama`, `X_scVI`, `X_cellanova`). For `scVI`, mention the latent size (`n_latent=30`) and `gene_likelihood="nb"`; for CellANOVA, define control pools via `control_dict`.
  - After each correction, project to 2D with `ov.utils.mde` and visualise `batch` vs `cell_type` to check mixing and conservation.
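Visual inspection of `batch` colouring can be complemented with a number. Below is a minimal NumPy sketch of one simple mixing score, chosen here for illustration and not one of the scib metrics used in the benchmarking step. For each cell, it takes the k nearest neighbours in a corrected embedding, computes the entropy of their batch labels, and normalises by log(number of batches). Values near 1 indicate well-mixed neighbourhoods (for balanced batches); 0 means every neighbourhood is single-batch. The brute-force distance matrix is O(n²), so subsample cells on real data.

```python
import numpy as np

def knn_batch_entropy(embedding, batches, k=15):
    """Mean normalised entropy of batch labels among each cell's k nearest neighbours."""
    X = np.asarray(embedding, dtype=float)
    _, codes = np.unique(np.asarray(batches), return_inverse=True)
    codes = codes.ravel()
    n_batches = int(codes.max()) + 1
    if n_batches < 2:
        return 0.0
    # Brute-force squared Euclidean distances between all cells.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)                       # a cell is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]                 # k nearest neighbours per cell
    counts = np.stack([np.bincount(row, minlength=n_batches) for row in codes[nn]])
    p = counts / k
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)    # treat 0 * log(0) as 0
    entropy = -plogp.sum(axis=1)
    return float(np.mean(entropy / np.log(n_batches)))

# Usage, assuming a corrected embedding is stored in adata.obsm:
# score = knn_batch_entropy(adata.obsm['X_harmony'], adata.obs['batch'].to_numpy())
```

High mixing alone is not the goal. Compare the same embedding coloured by `cell_type` to confirm that biological structure survives.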
- Benchmark integration performance
  - Persist the final object (`adata.write_h5ad('neurips2021_batch_all.h5ad', compression='gzip')`) and reload when necessary.
  - Use `scib_metrics.benchmark.Benchmarker` with the embeddings list (`["X_pca", "X_combat", "X_harmony", "X_cellanova", "X_scanorama", "X_mira_topic", "X_mira_feature", "X_scVI"]`) to compute batch-vs-biology trade-offs via `bm.benchmark()`, and summarise with `bm.plot_results_table(min_max_scale=False)`.
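The embeddings list above includes `X_mira_topic` and `X_mira_feature`, which none of the correction steps described here produce, and `Benchmarker` will likely fail if a listed key is absent from `adata.obsm`. Following the defensive-check pattern recommended in this skill, here is a small sketch that keeps only the keys that exist and reports the rest. The helper name `available_embeddings` is hypothetical, and `label_key="cell_type"` in the commented usage is an assumption based on the colouring used above.

```python
def available_embeddings(obsm_keys, wanted):
    """Split `wanted` embedding keys into those present in obsm and those missing."""
    present = set(obsm_keys)
    found = [key for key in wanted if key in present]
    missing = [key for key in wanted if key not in present]
    return found, missing

wanted = ["X_pca", "X_combat", "X_harmony", "X_cellanova", "X_scanorama",
          "X_mira_topic", "X_mira_feature", "X_scVI"]

# Usage, assuming an AnnData `adata` and scib-metrics installed:
# from scib_metrics.benchmark import Benchmarker
# keys, missing = available_embeddings(adata.obsm.keys(), wanted)
# if missing:
#     print("Skipping embeddings not found in adata.obsm:", missing)
# bm = Benchmarker(adata, batch_key="batch", label_key="cell_type",
#                  embedding_obsm_keys=keys, n_jobs=-1)
# bm.benchmark()
# bm.plot_results_table(min_max_scale=False)
```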
- General troubleshooting
  - Ensure `adata.raw` captures the unscaled log-normalised matrix before subsetting to HVGs.
  - Confirm that the key passed as `use_rep` (for example `'scaled|original|X_pca'`) exists in `adata.obsm` prior to clustering; rerun preprocessing if it is missing.
  - Monitor memory when running cNMF or scVI; adjust `n_iter`, `components`, or latent dimensions for smaller datasets.
  - Pipeline dependency errors: when you encounter errors like "Could not find 'leiden' in adata.obs", always check for and add the missing prerequisites:
    - Before Leiden/Louvain clustering → ensure `'neighbors' in adata.uns`
    - Before plotting by cluster → ensure the cluster column exists in `adata.obs`
    - Before UMAP/embedding → ensure PCA or another dimensionality reduction is complete
  - Code generation pattern: when generating multi-step code, use defensive checks rather than assuming prior steps completed successfully. This prevents cascading failures when users run steps out of order or in separate sessions.
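The prerequisite checks listed under troubleshooting can be centralised in one lookup table. The sketch below is hypothetical: the step names and the mapping from steps to required `adata.uns`, `adata.obsm`, and `adata.obs` keys are assumptions drawn from the prerequisites listed in this skill, not an omicverse API. It accepts plain key collections, so it can be called with `adata.uns.keys()`, `adata.obsm.keys()`, and `adata.obs.columns`.

```python
# Hypothetical mapping: step -> list of (AnnData slot, acceptable alternative keys).
PREREQUISITES = {
    "neighbors": [("obsm", ("X_pca", "scaled|original|X_pca"))],
    "leiden": [("uns", ("neighbors",))],
    "louvain": [("uns", ("neighbors",))],
    "umap": [("uns", ("neighbors",))],
    "plot_leiden": [("obs", ("leiden",))],
}

def missing_prerequisites(step, uns_keys=(), obsm_keys=(), obs_columns=()):
    """Return readable descriptions of the prerequisites missing for `step`."""
    slots = {"uns": set(uns_keys), "obsm": set(obsm_keys), "obs": set(obs_columns)}
    problems = []
    for slot, alternatives in PREREQUISITES.get(step, []):
        if not any(key in slots[slot] for key in alternatives):
            problems.append(f"{step}: need one of {list(alternatives)} in adata.{slot}")
    return problems

# Usage, assuming an AnnData object `adata`:
# for msg in missing_prerequisites("leiden", adata.uns.keys(), adata.obsm.keys(), adata.obs.columns):
#     print(msg)   # e.g. run ov.pp.neighbors(...) before clustering
```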
Examples
- "Normalise dentate gyrus cells, compare Leiden, scICE, and GMM clusters, and report ARI scores versus provided `clusters`."
- "Batch-correct three NeurIPS datasets with Harmony and scVI, produce MDE embeddings coloured by `batch` and `cell_type`, and benchmark the embeddings."
- "Fit topic and cNMF models on a preprocessed AnnData object, retrieve classifier-refined cluster calls, and visualise the resulting programs on UMAP."
References
- Clustering walkthrough: `t_cluster.ipynb`
- Batch integration walkthrough: `t_single_batch.ipynb`
- Quick copy/paste commands: `reference.md`