OpenClaw-Medical-Skills single-cell-multi-omics-integration

Quick-reference sheet for OmicVerse tutorials spanning MOFA, GLUE pairing, SIMBA integration, TOSICA transfer, and StaVIA cartography.

install

source · Clone the upstream repo

git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/single-multiomics" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-single-cell-multi-omics-integration && rm -rf "$T"

OpenClaw · Install into ~/.openclaw/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/single-multiomics" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-single-cell-multi-omics-integration && rm -rf "$T"

manifest: skills/single-multiomics/SKILL.md

Single-Cell Multi-Omics Tutorials Cheat Sheet

This skill walk-through summarizes the OmicVerse notebooks that cover paired and unpaired multi-omic integration, multi-batch embedding, reference transfer, and trajectory cartography.

MOFA on paired scRNA + scATAC (

t_mofa.ipynb

)

Data preparation: Load preprocessed AnnData objects for RNA (
```
rna_p_n_raw.h5ad
```
) and ATAC (
```
atac_p_n_raw.h5ad
```
) with
```
ov.utils.read
```
, and initialise
```
pyMOFA
```
with matching
```
omics
```
and
```
omics_name
```
lists.
Model training: Call
```
mofa_preprocess()
```
to select highly variable features and run the factor model with
```
mofa_run(outfile=...)
```
, which exports the learned MOFA+ factors to an HDF5 model file.
Result inspection: Reload downstream AnnData, append factor scores via
```
ov.single.factor_exact
```
, and explore factor–cluster associations using
```
factor_correlation
```
,
```
get_weights
```
, and the plotting helpers in
```
pyMOFAART
```
(
```
plot_r2
```
,
```
plot_cor
```
,
```
plot_factor
```
,
```
plot_weights
```
, etc.).
Export workflow: Persist factors and weights through the MOFA HDF5 artifact and reuse them by instantiating
```
pyMOFAART(model_path=...)
```
for later annotation or visualisation sessions.
Dependencies & hardware: Requires
```
mofapy2
```
; plots optionally rely on
```
pymde
```
/
```
scvi-tools
```
but run on CPU.

MOFA after GLUE pairing (

t_mofa_glue.ipynb

)

Data preparation: Start from GLUE-derived embeddings (
```
rna-emb.h5ad
```
,
```
atac.emb.h5ad
```
), build a
```
GLUE_pair
```
object, and run
```
correlation()
```
to align unpaired cells before subsetting to highly variable features.
Model training: Instantiate
```
pyMOFA
```
with the aligned AnnData objects, run
```
mofa_preprocess()
```
, and save the joint factors through
```
mofa_run(outfile='models/chen_rna_atac.hdf5')
```
.
Result inspection: Use
```
pyMOFAART
```
plus AnnData that now contains the GLUE embeddings to compute factors (
```
get_factors
```
) and visualise variance explained, factor–cluster correlations, and ranked feature weights.
Export workflow: Reuse the saved MOFA HDF5 model for downstream inspection; GLUE embeddings can be embedded with
```
scvi.model.utils.mde
```
(GPU-accelerated MDE is optional,
```
sc.tl.umap
```
works on CPU).
Dependencies & hardware: Requires both
```
mofapy2
```
and the GLUE tooling (
```
scglue
```
,
```
scvi-tools
```
,
```
pymde
```
); GPU acceleration only affects optional MDE visualisation.

SIMBA batch integration (

t_simba.ipynb

)

Data preparation: Fetch the concatenated AnnData (
```
simba_adata_raw.h5ad
```
) derived from multiple pancreas studies and pass it, alongside a results directory, to
```
pySIMBA
```
.
Model training: Execute
```
preprocess(...)
```
to bin features and build a SIMBA-compatible graph, then call
```
gen_graph()
```
followed by
```
train(num_workers=...)
```
to launch PyTorch-BigGraph optimisation (can scale with CPU workers) and
```
load(...)
```
to resume trained checkpoints.
Result inspection: Apply
```
batch_correction()
```
to obtain the harmonised AnnData with SIMBA embeddings (
```
X_simba
```
) and visualise using
```
mde
```
/
```
sc.tl.umap
```
coloured by cell type or batch.
Export workflow: Training outputs reside in the workdir (e.g.,
```
result_human_pancreas/pbg/graph0
```
); reuse them with
```
simba_object.load(...)
```
for later analyses.
Dependencies & hardware: Requires installing
```
simba
```
and
```
simba_pbg
```
(PyTorch BigGraph backend). GPU is optional; make sure adequate CPU threads and memory are available for graph training.

TOSICA reference transfer (

t_tosica.ipynb

)

Data preparation: Download demo AnnData references (
```
demo_train.h5ad
```
,
```
demo_test.h5ad
```
) and required gene-set GMT files via
```
ov.utils.download_tosica_gmt()
```
; confirm datasets are log-normalised before training.
Model training: Create
```
pyTOSICA
```
with the reference AnnData, chosen pathway mask, label key, project directory, and batch size; train with
```
train(epochs=...)
```
, then persist weights with
```
save()
```
and optionally reload via
```
load()
```
.
Result inspection: Generate predictions on query AnnData through
```
predicted(pre_adata=...)
```
, embed with OmicVerse preprocessing and GPU-enabled
```
mde
```
(UMAP fallback available), and explore pathway attention to interpret transformer heads.
Export workflow: Saved project folder keeps model checkpoints and attention summaries; reuse the exported assets to annotate future datasets without retraining from scratch.
Dependencies & hardware: Needs TOSICA (PyTorch transformer) plus downloaded gene-set masks; avoid setting
```
depth=2
```
if memory is constrained. GPU acceleration improves embedding (
```
mde
```
) but training runs on standard PyTorch (CPU/GPU depending on environment).

StaVIA trajectory cartography (

t_stavia.ipynb

)

Data preparation: Load example dentate gyrus velocity data via
```
scvelo.datasets.dentategyrus()
```
, preprocess with OmicVerse (
```
preprocess
```
,
```
scale
```
,
```
pca
```
, neighbours, UMAP) to populate the AnnData matrices used by VIA.
Model training: Configure VIA hyperparameters (components, neighbours, seeds, root selection) and instantiate/run
```
VIA.core.VIA
```
on the chosen representation (
```
adata.obsm['scaled|original|X_pca']
```
).
Result inspection: Store outputs such as pseudotime (
```
single_cell_pt_markov
```
), cluster graph abstractions, trajectory curves, atlas views, and stream plots through VIA plotting helpers.
Export workflow: Persist derived visualisations and animations (e.g.,
```
animate_streamplot_ov
```
,
```
animate_atlas
```
) to files (
```
.gif
```
) for reporting; recompute edge bundles via
```
make_edgebundle_milestone
```
when needed.
Dependencies & hardware: Relies on
```
scvelo
```
,
```
pyVIA
```
, and OmicVerse plotting; computations are CPU-bound though producing large stream/animation outputs benefits from ample memory.

OpenClaw-Medical-Skills single-cell-multi-omics-integration

Single-Cell Multi-Omics Tutorials Cheat Sheet

MOFA on paired scRNA + scATAC (
`t_mofa.ipynb`
)

MOFA after GLUE pairing (
`t_mofa_glue.ipynb`
)

SIMBA batch integration (
`t_simba.ipynb`
)

TOSICA reference transfer (
`t_tosica.ipynb`
)

StaVIA trajectory cartography (
`t_stavia.ipynb`
)

OpenClaw-Medical-Skills single-cell-multi-omics-integration

Single-Cell Multi-Omics Tutorials Cheat Sheet

MOFA on paired scRNA + scATAC (t_mofa.ipynb)

MOFA after GLUE pairing (t_mofa_glue.ipynb)

SIMBA batch integration (t_simba.ipynb)

TOSICA reference transfer (t_tosica.ipynb)

StaVIA trajectory cartography (t_stavia.ipynb)

MOFA on paired scRNA + scATAC (
`t_mofa.ipynb`
)

MOFA after GLUE pairing (
`t_mofa_glue.ipynb`
)

SIMBA batch integration (
`t_simba.ipynb`
)

TOSICA reference transfer (
`t_tosica.ipynb`
)

StaVIA trajectory cartography (
`t_stavia.ipynb`
)