OpenClaw-Medical-Skills single-cell-downstream-analysis
Checklist-style reference for OmicVerse downstream tutorials covering AUCell scoring, metacell DEG, and related exports.
install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/single-downstream-analysis" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-single-cell-downstream-analysis && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/single-downstream-analysis" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-single-cell-downstream-analysis && rm -rf "$T"
manifest: skills/single-downstream-analysis/SKILL.md
Single-cell downstream analysis quick-reference
This skill sheet distills the OmicVerse single-cell downstream tutorials into an executable checklist. Each module highlights prerequisites, the core API entry points, interpretation checkpoints, resource planning notes, and any optional validation or export steps surfaced in the notebooks.
AUCell pathway scoring (`t_aucell.ipynb`)
- Prerequisites
  - Download pathway collections (GO, KEGG, or custom) that match the organism under study before running the tutorial.
  - Ensure an `AnnData` object with clustering/embedding (`adata.obsm['X_umap']`) is prepared.
- Core calls
  - `ov.single.geneset_aucell` for one pathway; `ov.single.pathway_aucell` for multiple pathways.
  - `ov.single.pathway_aucell_enrichment` to score all pathways in a library (set `num_workers` for parallelism).
- Result checks
  - Interpret AUCell scores as expression-like values (0–1). Use `sc.pl.embedding` to confirm pathway activity patterns.
  - Run `sc.tl.rank_genes_groups` on the AUCell `AnnData` to find cluster-enriched pathways and visualize with `sc.pl.rank_genes_groups_dotplot`.
- Resources
  - Library-wide scoring can be CPU-intensive; allocate workers (`num_workers=8` in tutorial) and sufficient memory for the dense AUCell matrix.
- Optional validation / exports
  - Persist scores with `adata_aucs.write_h5ad('...')` for reuse.
  - Plot enriched pathways via `ov.single.pathway_enrichment` and `ov.single.pathway_enrichment_plot` heatmaps.
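
To make the 0–1 range of AUCell scores concrete, here is a minimal pure-Python sketch of the idea behind the method: rank the genes of one cell by expression, walk the top of the ranking, and measure how early the pathway genes are recovered. This is a toy re-implementation for intuition only, not the OmicVerse code path; the function name `aucell_score` and the `top_fraction` parameter are assumptions made for this example.

```python
def aucell_score(expression, gene_set, top_fraction=0.05):
    """Toy AUCell-style score for a single cell (illustrative only).

    expression:   dict mapping gene name -> expression value in this cell
    gene_set:     iterable of gene names forming the pathway
    top_fraction: share of the ranking to scan (assumed value)
    """
    # Rank genes from highest to lowest expression.
    ranked = sorted(expression, key=expression.get, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    members = set(gene_set) & set(ranked)
    if not members:
        return 0.0

    # Area under the recovery curve: cumulative hits summed over the top ranks.
    hits = 0
    area = 0
    for gene in ranked[:n_top]:
        if gene in members:
            hits += 1
        area += hits

    # Best case: every recoverable member sits at the very top of the ranking.
    k = min(len(members), n_top)
    max_area = sum(min(i, k) for i in range(1, n_top + 1))
    return area / max_area


# Ten genes with strictly decreasing expression: g0 is highest, g9 lowest.
cell = {f"g{i}": 10 - i for i in range(10)}
print(aucell_score(cell, ["g0", "g1"], top_fraction=0.5))
print(aucell_score(cell, ["g8", "g9"], top_fraction=0.5))
```

A pathway whose genes sit at the top of the ranking scores close to 1, and a pathway absent from the scanned window scores 0, which is why the scores can be plotted on embeddings like an expression value.
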
scRNA-seq DEG (bulk-style meta cell) (`t_scdeg.ipynb`)
- Prerequisites
  - Run quality control and preprocessing (`ov.pp.qc`, `ov.pp.preprocess`, `ov.pp.scale`, `ov.pp.pca`).
  - Retain raw counts in `adata.raw` before HVG filtering.
- Core calls
  - Construct differential objects with `ov.bulk.pyDEG(test_adata.to_df(...).T)` for full-cell and metacell views.
  - Build metacells via `ov.single.MetaCell(..., use_gpu=True)` when GPU is available for acceleration.
- Result checks
  - Inspect volcano plots (`dds.plot_volcano`) and targeted boxplots (`dds.plot_boxplot`) for top DEGs.
  - Map DEG markers back to UMAP embeddings using `ov.utils.embedding` to confirm localization.
- Resources
  - Metacell construction benefits from GPU but can fall back to CPU; ensure enough memory for transposed dense matrices passed to `pyDEG`.
- Optional validation / exports
  - Save metacell embeddings with matplotlib figures; adjust `legend_*` settings for publication-ready visuals.
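
The "bulk-style" framing can be illustrated with a small sketch: cells are pooled into metacells, and the pooled profiles are compared like bulk samples. The code below is a toy illustration under the assumption that pooling is a plain sum of raw counts; it does not reproduce how `ov.single.MetaCell` assigns cells or how `pyDEG` tests genes. The helper names `aggregate_metacells` and `log2_fold_change` are hypothetical.

```python
import math
from collections import defaultdict


def aggregate_metacells(counts, assignment):
    """Sum per-cell raw counts into one profile per metacell.

    counts:     dict cell_id -> dict gene -> raw count
    assignment: dict cell_id -> metacell_id (assumed to be given)
    """
    metacells = defaultdict(lambda: defaultdict(int))
    for cell_id, genes in counts.items():
        metacell_id = assignment[cell_id]
        for gene, value in genes.items():
            metacells[metacell_id][gene] += value
    return {mc: dict(genes) for mc, genes in metacells.items()}


def log2_fold_change(group_a, group_b, gene, pseudocount=1.0):
    """log2 ratio of the mean count of `gene` between two lists of profiles."""
    mean_a = sum(p.get(gene, 0) for p in group_a) / len(group_a)
    mean_b = sum(p.get(gene, 0) for p in group_b) / len(group_b)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


counts = {
    "c1": {"A": 3, "B": 0},
    "c2": {"A": 4, "B": 1},
    "c3": {"A": 0, "B": 0},
    "c4": {"A": 0, "B": 2},
}
assignment = {"c1": "m1", "c2": "m1", "c3": "m2", "c4": "m2"}
profiles = aggregate_metacells(counts, assignment)
print(profiles)
print(log2_fold_change([profiles["m1"]], [profiles["m2"]], "A"))
```

Pooling reduces the number of zeros per profile, which is the main reason bulk-style statistics behave better on metacells than on individual cells.
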
scRNA-seq DEG (cell-type & composition) (`t_deg_single.ipynb`)
- Prerequisites
  - Annotated `adata` with `condition`, `cell_label`, and optional `batch` metadata.
  - Initialize mixed CPU/GPU resources when using graph-based DA methods (`ov.settings.cpu_gpu_mixed_init()`).
- Core calls
  - `ov.single.DEG(..., method='wilcoxon'|'t-test'|'memento-de')` with `deg_obj.run(...)` to target cell types.
  - `ov.single.DCT(..., method='sccoda'|'milo')` for differential composition testing.
  - Graph setup for Milo: `ov.pp.preprocess`, `ov.single.batch_correction`, `ov.pp.neighbors`, `ov.pp.umap`.
- Result checks
  - Review DEG tables from `deg_obj` (Wilcoxon / memento) and adjust capture rate / bootstraps for stability.
  - For scCODA, tune FDR via `sim_results.set_fdr()`; interpret boxplots with condition-level shifts.
  - Milo diagnostics: histogram of P-values, logFC vs -log10 FDR scatter, beeswarm of differential abundance.
- Resources
  - Memento and Milo require multiple CPUs (`num_cpus`, `num_boot`, high `k`); ensure adequate compute time.
  - Harmony/scVI batch correction needs GPU memory when enabled; plan for VRAM usage.
- Optional validation / exports
  - Visual diagnostics include UMAP overlays (`ov.pl.embedding`), Milo beeswarm plots, and custom color palettes.
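
The logFC vs -log10 FDR diagnostic is easier to read with the multiple-testing step in mind. The sketch below applies the plain Benjamini-Hochberg adjustment to a list of P-values and builds the scatter coordinates. Note the assumption: Milo itself uses a weighted, graph-aware variant of this correction, so treat this as a simplified stand-in. The function names are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def volcano_points(log_fc, p_values):
    """Pair each logFC with -log10(FDR) for a scatter plot."""
    fdr = benjamini_hochberg(p_values)
    # Clamp to avoid log10(0) for vanishing adjusted values.
    return [(lfc, -math.log10(max(q, 1e-300))) for lfc, q in zip(log_fc, fdr)]


p_values = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(p_values))
print(volcano_points([1.2, -0.3, 0.8, 2.1], p_values))
```

A flat P-value histogram combined with few points above the chosen -log10 FDR line suggests little differential abundance, whereas a spike near zero should show up as a clear upper arm in the scatter.
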
scDrug response prediction (`t_scdrug.ipynb`)
- Prerequisites
  - Fetch tumor-focused dataset (e.g., `infercnvpy.datasets.maynard2020_3k`).
  - Download reference assets before running predictions:
    - Gene annotations via `ov.utils.get_gene_annotation` (requires GTF from GENCODE or T2T-CHM13).
    - `ov.utils.download_GDSC_data()` and `ov.utils.download_CaDRReS_model()` for drug-response models.
    - Clone CaDRReS-Sc repo (`git clone https://github.com/CSB5/CaDRReS-Sc`).
- Core calls
  - Tumor resolution detection: `ov.single.autoResolution(adata, cpus=4)`.
  - Drug response runner: `ov.single.Drug_Response(adata, scriptpath='CaDRReS-Sc', modelpath='models/', output='result')`.
- Result checks
  - Inspect clustering and IC50 outputs stored under `output`; cross-reference with inferred CNV states.
- Resources
  - Requires external CaDRReS-Sc environment (Python/R dependencies) and storage for model downloads.
  - Running inferCNV preprocessing may need multiple CPUs and substantial RAM.
- Optional validation / exports
  - Persist intermediate `AnnData` (`adata.write('scanpyobj.h5ad')`) to reuse for downstream analyses or re-runs.
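
Because the drug-response step depends on assets fetched outside of Python, a short preflight check can catch a missing clone or model directory before a long run starts. The sketch below uses only the standard library. The default paths mirror the `scriptpath`, `modelpath`, and `output` values shown above, while the helper names are hypothetical and whether `Drug_Response` creates the output directory itself is not assumed either way.

```python
from pathlib import Path


def missing_scdrug_assets(scriptpath="CaDRReS-Sc", modelpath="models/"):
    """Return labels of required directories that are not present."""
    required = {
        "CaDRReS-Sc checkout": Path(scriptpath),
        "model directory": Path(modelpath),
    }
    return [label for label, path in required.items() if not path.is_dir()]


def prepare_output_dir(output="result"):
    """Create the output directory if needed and return it as a Path."""
    out = Path(output)
    out.mkdir(parents=True, exist_ok=True)
    return out


if __name__ == "__main__":
    missing = missing_scdrug_assets()
    if missing:
        print("Missing before running predictions:", ", ".join(missing))
    else:
        print("All drug-response assets found.")
```
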
SCENIC regulon discovery (`t_scenic.ipynb`)
- Prerequisites
  - Mouse hematopoiesis dataset loaded via `ov.single.mouse_hsc_nestorowa16()` (or provide preprocessed data with raw counts).
  - Download cisTarget ranking databases (`*.feather`) and motif annotations (`motifs-*.tbl`) for the species; allocate 3 GB disk space and verify paths (`db_glob`, `motif_path`).
- Core calls
  - Initialize analysis: `ov.single.SCENIC(adata, db_glob=..., motif_path=..., n_jobs=12)`.
  - Run RegDiffusion-based GRN inference, regulon pruning, and AUCell scoring via the SCENIC object methods.
- Result checks
  - Examine regulon activity matrices (`scenic_obj.auc_mtx.head()`), RSS scores, and embeddings colored by regulon activity.
  - Use RSS plots, dendrograms, and AUCell distributions to interpret TF specificity and activity thresholds.
- Resources
  - Multi-core CPU recommended (`n_jobs` matches available cores); ensure enough RAM for motif enrichment.
  - Large downloads and intermediate objects (pickle/h5ad) require disk space.
- Optional validation / exports
  - Save `scenic_obj` (`ov.utils.save`) and regulon AnnData (`regulon_ad.write`).
  - Optional plots: RSS per cell type, regulon embeddings, AUC histograms with threshold lines, GRN network visualizations.
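
For interpreting RSS values, it may help to see the score computed by hand. The sketch below follows the commonly used definition, one minus the square root of the Jensen-Shannon divergence between a regulon's normalized activity across cells and the normalized indicator of a cell type. It is a toy version in pure Python; the base-2 logarithm (which bounds the score to 0–1) and the function names are assumptions of this example, and library implementations may scale the score differently.

```python
import math


def _normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


def _kl_bits(p, q):
    """Kullback-Leibler divergence in bits, skipping zero-probability terms."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def regulon_specificity_score(activity, is_cell_type):
    """RSS of one regulon for one cell type.

    activity:     per-cell regulon activity (e.g. AUCell values)
    is_cell_type: per-cell booleans marking membership of the cell type
    """
    p = _normalize(activity)
    q = _normalize([1.0 if flag else 0.0 for flag in is_cell_type])
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * _kl_bits(p, m) + 0.5 * _kl_bits(q, m)
    return 1.0 - math.sqrt(max(jsd, 0.0))


labels = [True, True, False, False]
print(regulon_specificity_score([0.5, 0.5, 0.0, 0.0], labels))  # active only in the cell type
print(regulon_specificity_score([0.0, 0.0, 0.3, 0.3], labels))  # active only elsewhere
```

Under these assumptions, a regulon active exclusively and evenly in one cell type scores 1, and a regulon active only in other cells scores 0, so ranking regulons by RSS per cell type highlights candidate identity-defining TFs.
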
cNMF program discovery (`t_cnmf.ipynb`)
- Prerequisites
  - Preprocess with HVG selection (`ov.pp.preprocess`), scaling (`ov.pp.scale`), PCA, and have UMAP embeddings for inspection.
  - Select component range (e.g., `np.arange(5, 11)`) and iterations; ensure output directory exists.
- Core calls
  - Instantiate analysis: `ov.single.cNMF(..., output_dir='...', name='...')`.
  - Factorization workflow: `cnmf_obj.factorize(...)`, `cnmf_obj.combine(...)`, `cnmf_obj.k_selection_plot()`, `cnmf_obj.consensus(...)`.
  - Extract results: `cnmf_obj.load_results(...)`, `cnmf_obj.get_results(...)`, optional RF classifier via `get_results_rfc`.
- Result checks
  - Evaluate stability via K-selection plot and local density histogram; confirm chosen K with consensus heatmaps.
  - Inspect topic usage embeddings (`ov.pl.embedding`), cluster labels, and dotplots of top genes.
- Resources
  - Multiple iterations and components are CPU-heavy; consider distributing workers (`total_workers`) and verifying disk space for intermediate factorization files.
- Optional validation / exports
  - Visualizations include Euclidean distance heatmaps, density histograms, UMAP overlays for topics/clusters, and dotplots.
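
When sizing a cNMF run, it is useful to count the factorization jobs up front. The sketch below enumerates one job per (K, iteration) pair and splits them across workers. Two points are assumptions of this example: the round-robin split is a guess at how `total_workers` divides work, and the helper names are hypothetical. The half-open range is standard behavior: `np.arange(5, 11)` yields K = 5 through 10, as does the built-in `range(5, 11)` used here.

```python
from pathlib import Path


def plan_factorization_jobs(components, n_iter, total_workers):
    """List (K, iteration) jobs and split them round-robin across workers."""
    jobs = [(k, it) for k in components for it in range(n_iter)]
    return [jobs[worker::total_workers] for worker in range(total_workers)]


def ensure_output_dir(output_dir, name):
    """Create <output_dir>/<name> if it does not exist and return the path."""
    run_dir = Path(output_dir) / name
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


# K = 5..10 (the stop value 11 is excluded), 20 iterations each, 4 workers.
shards = plan_factorization_jobs(range(5, 11), n_iter=20, total_workers=4)
print([len(shard) for shard in shards])
```

Six values of K with 20 iterations each give 120 factorizations, so each of 4 workers handles 30; widening the K range or raising the iteration count scales the runtime and the intermediate files linearly.
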
NOCD overlapping communities (`t_nocd.ipynb`)
- Prerequisites
  - Prepare AnnData via `ov.single.scanpy_lazy` (automated preprocessing) before running NOCD.
  - Note: Tutorial warns NOCD implementation is under active development; expect variability.
- Core calls
  - Pipeline wrapper: `scbrca = ov.single.scnocd(adata)` followed by chained methods (`matrix_transform`, `matrix_normalize`, `GNN_configure`, `GNN_preprocess`, `GNN_model`, `GNN_result`, `GNN_plot`, `cal_nocd`, `calculate_nocd`).
- Result checks
  - Compare standard Leiden clusters versus NOCD outputs on UMAP embeddings to identify multi-fate cells.
- Resources
  - Graph neural network stages can be GPU-accelerated; ensure CUDA availability or be prepared for longer CPU runtimes.
  - Track memory usage when constructing large adjacency matrices.
- Optional validation / exports
  - Generate multiple UMAP overlays (`sc.pl.umap`) for `nocd`, `nocd_n`, and Leiden labels using shared color maps.
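
The difference between Leiden labels and overlapping communities can be shown with a small sketch: instead of one label per cell, each cell carries a strength per community, and every community above a threshold counts as a membership. The threshold of 0.5, the input layout, and the function names are assumptions for illustration; they do not describe how `scnocd` stores or thresholds its results.

```python
def overlapping_assignments(affiliations, threshold=0.5):
    """Map each cell to the list of community indices it belongs to.

    affiliations: dict cell_id -> list of non-negative strengths, one per community
    threshold:    minimum strength to count as a member (assumed value)
    """
    return {
        cell: [i for i, strength in enumerate(strengths) if strength > threshold]
        for cell, strengths in affiliations.items()
    }


def multi_community_cells(assignments):
    """Cells assigned to more than one community (candidate multi-fate cells)."""
    return sorted(cell for cell, comms in assignments.items() if len(comms) > 1)


affiliations = {
    "cell_a": [0.9, 0.1, 0.0],
    "cell_b": [0.7, 0.8, 0.0],  # strong in two communities
    "cell_c": [0.2, 0.1, 0.3],  # below threshold everywhere
}
assignments = overlapping_assignments(affiliations)
print(assignments)
print(multi_community_cells(assignments))
```

Cells such as `cell_b` in this toy input are the ones worth locating on the UMAP, since a hard clustering would force them into a single group.
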
Lazy pipeline & reporting (`t_lazy.ipynb`)
- Prerequisites
  - Install OmicVerse ≥1.7.0 with lazy utilities; supported species currently human/mouse.
  - Prepare batch metadata (`sample_key`) and optionally initialize hybrid compute (`ov.settings.cpu_gpu_mixed_init()`).
- Core calls
  - Turnkey preprocessing: `ov.single.lazy(adata, species='mouse', sample_key='batch', ...)` with optional `reforce_steps` and module-specific kwargs.
  - Reporting: `ov.single.generate_scRNA_report(...)` to build HTML summary; `ov.generate_reference_table(adata)` for citation tracking.
- Result checks
  - Inspect generated embeddings (`ov.pl.embedding`) for quality and annotation alignment.
  - Review HTML report for QC metrics, normalization, batch correction, and embeddings.
- Resources
  - Steps like Harmony or scVI may invoke GPU; confirm hardware availability or adjust `reforce_steps` accordingly.
  - Report generation writes to disk; ensure output path is writable.
- Optional validation / exports
  - Customize embeddings by color key; store HTML report and reference table alongside project documentation.
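
The prerequisites for the lazy pipeline (minimum version, supported species, and a batch column) can be checked before the run starts. The following is a minimal sketch using only the standard library; `lazy_preflight` and `version_tuple` are hypothetical helpers, and the column names are passed in as a plain list so the check does not depend on any particular AnnData object.

```python
import re

SUPPORTED_SPECIES = {"human", "mouse"}


def version_tuple(version):
    """Parse 'major.minor.patch' into a tuple, ignoring suffixes such as 'rc1'."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def lazy_preflight(obs_columns, species, sample_key, installed_version, minimum="1.7.0"):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if version_tuple(installed_version) < version_tuple(minimum):
        problems.append(f"OmicVerse {installed_version} is older than {minimum}")
    if species not in SUPPORTED_SPECIES:
        problems.append(f"species '{species}' is not one of {sorted(SUPPORTED_SPECIES)}")
    if sample_key not in obs_columns:
        problems.append(f"sample_key '{sample_key}' not found in the cell metadata")
    return problems


print(lazy_preflight(["batch", "n_genes"], "mouse", "batch", "1.7.2"))
print(lazy_preflight(["n_genes"], "zebrafish", "batch", "1.6.9"))
```

In practice the installed version could be read with `importlib.metadata.version("omicverse")` and the column list taken from `adata.obs.columns`.
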