AlterLab-Academic-Skills alterlab-scvi-tools
Deep generative models for single-cell omics. Use when you need probabilistic batch correction (scVI), transfer learning, differential expression with uncertainty, or multimodal integration (totalVI, MultiVI). Best for advanced modeling, batch effects, and multimodal data. For standard analysis pipelines, use scanpy. Part of the AlterLab Academic Skills suite.
git clone https://github.com/AlterLab-IEU/AlterLab-Academic-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/AlterLab-IEU/AlterLab-Academic-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bioinformatics/alterlab-scvi-tools" ~/.claude/skills/alterlab-ieu-alterlab-academic-skills-alterlab-scvi-tools && rm -rf "$T"
skills/bioinformatics/alterlab-scvi-tools/SKILL.md
scvi-tools
Overview
scvi-tools is a comprehensive Python framework for probabilistic models in single-cell genomics. Built on PyTorch and PyTorch Lightning, it provides deep generative models using variational inference for analyzing diverse single-cell data modalities.
When to Use This Skill
Use this skill when:
- Analyzing single-cell RNA-seq data (dimensionality reduction, batch correction, integration)
- Working with single-cell ATAC-seq or chromatin accessibility data
- Integrating multimodal data (CITE-seq, multiome, paired/unpaired datasets)
- Analyzing spatial transcriptomics data (deconvolution, spatial mapping)
- Performing differential expression analysis on single-cell data
- Conducting cell type annotation or transfer learning tasks
- Working with specialized single-cell modalities (methylation, cytometry, RNA velocity)
- Building custom probabilistic models for single-cell analysis
Core Capabilities
scvi-tools provides models organized by data modality:
1. Single-Cell RNA-seq Analysis
Core models for expression analysis, batch correction, and integration. See references/models-scrna-seq.md for:
- scVI: Unsupervised dimensionality reduction and batch correction
- scANVI: Semi-supervised cell type annotation and integration
- AUTOZI: Zero-inflation detection and modeling
- VeloVI: RNA velocity analysis
- contrastiveVI: Perturbation effect isolation
2. Chromatin Accessibility (ATAC-seq)
Models for analyzing single-cell chromatin data. See references/models-atac-seq.md for:
- PeakVI: Peak-based ATAC-seq analysis and integration
- PoissonVI: Quantitative fragment count modeling
- scBasset: Deep learning approach with motif analysis
3. Multimodal & Multi-omics Integration
Joint analysis of multiple data types. See references/models-multimodal.md for:
- totalVI: CITE-seq protein and RNA joint modeling
- MultiVI: Paired and unpaired multi-omic integration
- MrVI: Multi-resolution cross-sample analysis
4. Spatial Transcriptomics
Spatially resolved transcriptomics analysis. See references/models-spatial.md for:
- DestVI: Multi-resolution spatial deconvolution
- Stereoscope: Cell type deconvolution
- Tangram: Spatial mapping and integration
- scVIVA: Cell-environment relationship analysis
5. Specialized Modalities
Additional specialized analysis tools. See references/models-specialized.md for:
- MethylVI/MethylANVI: Single-cell methylation analysis
- CytoVI: Flow/mass cytometry batch correction
- Solo: Doublet detection
- CellAssign: Marker-based cell type annotation
Typical Workflow
All scvi-tools models follow a consistent API pattern:

    # 1. Load and preprocess data (AnnData format)
    import scanpy as sc
    import scvi

    adata = scvi.data.heart_cell_atlas_subsampled()
    sc.pp.filter_genes(adata, min_counts=3)
    adata.layers["counts"] = adata.X.copy()  # Preserve raw counts for the model
    sc.pp.highly_variable_genes(
        adata,
        n_top_genes=1200,
        subset=True,
        layer="counts",
        flavor="seurat_v3",  # Works on raw counts; requires scikit-misc
        batch_key="cell_source",
    )

    # 2. Register data with model (specify layers, covariates)
    scvi.model.SCVI.setup_anndata(
        adata,
        layer="counts",  # Use raw counts, not log-normalized
        batch_key="cell_source",  # Batch column in this example dataset
        categorical_covariate_keys=["donor"],
        continuous_covariate_keys=["percent_mito"],
    )

    # 3. Create and train model
    model = scvi.model.SCVI(adata)
    model.train()

    # 4. Extract latent representations and normalized values
    latent = model.get_latent_representation()
    normalized = model.get_normalized_expression(library_size=1e4)

    # 5. Store in AnnData for downstream analysis
    adata.obsm["X_scVI"] = latent
    adata.layers["scvi_normalized"] = normalized

    # 6. Downstream analysis with scanpy
    sc.pp.neighbors(adata, use_rep="X_scVI")
    sc.tl.umap(adata)
    sc.tl.leiden(adata)

Key Design Principles:
- Raw counts required: Models expect unnormalized count data for optimal performance
- Unified API: Consistent interface across all models (setup → train → extract)
- AnnData-centric: Seamless integration with the scanpy ecosystem
- GPU acceleration: Automatic utilization of available GPUs
- Batch correction: Handle technical variation through covariate registration
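The raw-counts requirement above is easy to violate by accident, for example by passing a log-normalized matrix. The following is a minimal sketch of a sanity check, not part of scvi-tools; the function name `looks_like_raw_counts` and the tolerance are assumptions made for illustration.

```python
def looks_like_raw_counts(values, tol=1e-8):
    """Return True if every value is a non-negative integer (within tol).

    `values` is any iterable of numbers, e.g. a flattened slice of a
    count matrix. Hypothetical helper, not a scvi-tools function.
    """
    for v in values:
        v = float(v)
        # Raw counts are never negative and never fractional.
        if v < 0 or abs(v - round(v)) > tol:
            return False
    return True


raw_ok = looks_like_raw_counts([0, 3, 12.0, 1])
log_normalized_ok = looks_like_raw_counts([0.0, 1.386, 2.303])
print(raw_ok, log_normalized_ok)
```

With an AnnData object, one plausible use is to pass a sample of the stored values, such as `adata.layers["counts"].data[:10000]` for a sparse layer or `adata.X.ravel()[:10000]` for a dense matrix, before calling `setup_anndata`.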
Common Analysis Tasks
Differential Expression
Probabilistic DE analysis using the learned generative models:

    de_results = model.differential_expression(
        groupby="cell_type",
        group1="TypeA",
        group2="TypeB",
        mode="change",  # Use composite hypothesis testing
        delta=0.25,  # Minimum effect size threshold
    )

See references/differential-expression.md for detailed methodology and interpretation.
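To make the role of `delta` concrete, the sketch below summarizes a set of posterior log-fold-change samples for a single gene: the probability that the absolute effect reaches the threshold, and the corresponding log odds. It is an illustration of the idea behind composite hypothesis testing, not the library's implementation; the function name `composite_de_summary`, the `eps` constant, and the sample values are all made up for the example.

```python
import math


def composite_de_summary(lfc_samples, delta=0.25, eps=1e-8):
    """Summarize posterior log-fold-change samples for one gene.

    Hypothetical helper: in practice the samples would come from the
    trained model's posterior, not from a hand-written list.
    """
    n = len(lfc_samples)
    if n == 0:
        raise ValueError("need at least one posterior sample")
    # Fraction of samples whose absolute effect reaches the threshold.
    proba_de = sum(abs(s) >= delta for s in lfc_samples) / n
    # Log odds of "effect >= delta" versus "effect < delta"; eps avoids log(0).
    log_odds = math.log(proba_de + eps) - math.log(1.0 - proba_de + eps)
    return {
        "proba_de": proba_de,
        "log_odds": log_odds,
        "lfc_mean": sum(lfc_samples) / n,
    }


samples = [0.9, 1.1, 0.7, 0.1, 1.3, 0.8, -0.05, 1.0]  # made-up values
summary = composite_de_summary(samples, delta=0.25)
print(summary)
```

A larger `delta` demands a bigger effect before a sample counts as evidence for differential expression, so the reported probability can only stay the same or decrease as `delta` grows.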
Model Persistence
Save and load trained models:

    # Save model
    model.save("./model_directory", overwrite=True)

    # Load model
    model = scvi.model.SCVI.load("./model_directory", adata=adata)

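Saving and loading combine naturally into a "load if present, otherwise train" pattern. The sketch below shows that pattern with stand-in callables so it runs without scvi-tools installed. The helper `load_or_train` is hypothetical, and the marker file name `model.pt` is an assumption about what the save directory contains; adjust it to match what your version writes.

```python
import tempfile
from pathlib import Path


def load_or_train(model_dir, load_fn, train_fn, marker="model.pt"):
    """Load a saved model if `marker` exists in `model_dir`, else train one.

    `load_fn(model_dir)` and `train_fn(model_dir)` are caller-supplied;
    `train_fn` is expected to save the model into `model_dir`.
    """
    model_dir = Path(model_dir)
    if (model_dir / marker).exists():
        return load_fn(model_dir), "loaded"
    return train_fn(model_dir), "trained"


# Stand-in callables for the demo (no real model is trained here).
def fake_train(model_dir):
    model_dir.mkdir(parents=True, exist_ok=True)
    (model_dir / "model.pt").write_text("weights")
    return "new-model"


def fake_load(model_dir):
    return "saved-model"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "model_directory"
    first = load_or_train(target, fake_load, fake_train)   # nothing saved yet
    second = load_or_train(target, fake_load, fake_train)  # marker now exists

print(first, second)
```

With a real model, `load_fn` could wrap `scvi.model.SCVI.load(str(model_dir), adata=adata)`, and `train_fn` could create the model, call `model.train()`, and then `model.save(str(model_dir), overwrite=True)`.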
Batch Correction and Integration
Integrate datasets across batches or studies:

    # Register batch information
    scvi.model.SCVI.setup_anndata(adata, batch_key="study")

    # Model automatically learns batch-corrected representations
    model = scvi.model.SCVI(adata)
    model.train()
    latent = model.get_latent_representation()  # Batch-corrected

Theoretical Foundations
scvi-tools is built on:
- Variational inference: Approximate posterior distributions for scalable Bayesian inference
- Deep generative models: VAE architectures that learn complex data distributions
- Amortized inference: Shared neural networks for efficient learning across cells
- Probabilistic modeling: Principled uncertainty quantification and statistical testing
See references/theoretical-foundations.md for detailed background on the mathematical framework.
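As a point of orientation, variational inference in a VAE maximizes the evidence lower bound (ELBO). The generic form is given below; the notation is chosen here for illustration, with $x$ the observed counts for a cell, $z$ its latent representation, $q_\phi$ the encoder (the amortized approximate posterior), and $p_\theta$ the decoder. Individual models add their own terms, such as batch covariates or library-size factors, so this should be read as the common skeleton rather than any one model's exact objective.

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
\;-\;
\mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
```

The first term rewards accurate reconstruction of the counts, and the second keeps the approximate posterior close to the prior $p(z)$.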
Additional Resources
- Workflows: references/workflows.md contains common workflows, best practices, hyperparameter tuning, and GPU optimization
- Model References: Detailed documentation for each model category in the references/ directory
- Official Documentation: https://docs.scvi-tools.org/en/stable/
- Tutorials: https://docs.scvi-tools.org/en/stable/tutorials/index.html
- API Reference: https://docs.scvi-tools.org/en/stable/api/index.html
Installation

    uv pip install scvi-tools

    # For GPU support (quoted so shells such as zsh do not expand the brackets)
    uv pip install "scvi-tools[cuda]"

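After installing, a quick check that the expected packages are importable can save a confusing failure later. The sketch below uses only the standard library and does not import the packages themselves; the function name `check_environment` and the default package list are assumptions for illustration.

```python
import importlib.util


def check_environment(packages=("scvi", "scanpy", "anndata", "torch")):
    """Report which top-level packages can be found, without importing them."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in packages
    }


report = check_environment()
for name, found in report.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If `torch` is reported as found, running `torch.cuda.is_available()` in the same environment indicates whether a CUDA-capable GPU is visible.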
Best Practices
- Use raw counts: Always provide unnormalized count data to models
- Filter genes: Remove low-count genes before analysis (e.g., min_counts=3)
- Register covariates: Include known technical factors (batch, donor, etc.) in setup_anndata
- Feature selection: Use highly variable genes for improved performance
- Model saving: Always save trained models to avoid retraining
- GPU usage: Enable GPU acceleration for large datasets (accelerator="gpu")
- Scanpy integration: Store outputs in AnnData objects for downstream analysis