Knowledge-work-plugins scvi-tools
Deep learning for single-cell analysis using scvi-tools. This skill should be used when users need (1) data integration and batch correction with scVI/scANVI, (2) ATAC-seq analysis with PeakVI, (3) CITE-seq multi-modal analysis with totalVI, (4) multiome RNA+ATAC analysis with MultiVI, (5) spatial transcriptomics deconvolution with DestVI, (6) label transfer and reference mapping with scANVI/scArches, (7) RNA velocity with veloVI, or (8) any deep learning-based single-cell method. Triggers include mentions of scVI, scANVI, totalVI, PeakVI, MultiVI, DestVI, veloVI, sysVI, scArches, variational autoencoder, VAE, batch correction, data integration, multi-modal, CITE-seq, multiome, reference mapping, latent space.
git clone https://github.com/anthropics/knowledge-work-plugins
T=$(mktemp -d) && git clone --depth=1 https://github.com/anthropics/knowledge-work-plugins "$T" && mkdir -p ~/.claude/skills && cp -r "$T/bio-research/skills/scvi-tools" ~/.claude/skills/anthropics-knowledge-work-plugins-scvi-tools && rm -rf "$T"
bio-research/skills/scvi-tools/SKILL.md
scvi-tools Deep Learning Skill
This skill provides guidance for deep learning-based single-cell analysis using scvi-tools, a widely used framework for probabilistic models in single-cell genomics.
How to Use This Skill
- Identify the appropriate workflow from the model/workflow tables below
- Read the corresponding reference file for detailed steps and code
- Use scripts in scripts/ to avoid rewriting common code
- For installation or GPU issues, consult references/environment_setup.md
- For debugging, consult references/troubleshooting.md
When to Use This Skill
- When scvi-tools, scVI, scANVI, or related models are mentioned
- When deep learning-based batch correction or integration is needed
- When working with multi-modal data (CITE-seq, multiome)
- When reference mapping or label transfer is required
- When analyzing ATAC-seq or spatial transcriptomics data
- When learning latent representations of single-cell data
Model Selection Guide
| Data Type | Model | Primary Use Case |
|---|---|---|
| scRNA-seq | scVI | Unsupervised integration, DE, imputation |
| scRNA-seq + labels | scANVI | Label transfer, semi-supervised integration |
| CITE-seq (RNA+protein) | totalVI | Multi-modal integration, protein denoising |
| scATAC-seq | PeakVI | Chromatin accessibility analysis |
| Multiome (RNA+ATAC) | MultiVI | Joint modality analysis |
| Spatial + scRNA reference | DestVI | Cell type deconvolution |
| RNA velocity | veloVI | Transcriptional dynamics |
| Cross-technology | sysVI | System-level batch correction |
Workflow Reference Files
| Workflow | Reference File | Description |
|---|---|---|
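| Environment Setup | references/environment_setup.md | Installation, GPU, version info |
| Data Preparation | | Formatting data for any model |
| scRNA Integration | references/scrna_integration.md | scVI/scANVI batch correction |
| ATAC-seq Analysis | references/atac_peakvi.md | PeakVI for accessibility |
| CITE-seq Analysis | references/citeseq_totalvi.md | totalVI for protein+RNA |
| Multiome Analysis | references/multiome_multivi.md | MultiVI for RNA+ATAC |
| Spatial Deconvolution | references/spatial_deconvolution.md | DestVI spatial analysis |
| Label Transfer | references/label_transfer.md | scANVI reference mapping |
| scArches Mapping | references/scarches_mapping.md | Query-to-reference mapping |
| Batch Correction | references/batch_correction_sysvi.md | Advanced batch methods |
| RNA Velocity | references/rna_velocity_velovi.md | veloVI dynamics |
| Troubleshooting | references/troubleshooting.md | Common issues and solutions |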
CLI Scripts
Modular scripts for common workflows. Chain together or modify as needed.
Pipeline Scripts
| Script | Purpose | Usage |
|---|---|---|
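| prepare_data.py | QC, filter, HVG selection | |
| train_model.py | Train any scvi-tools model | |
| cluster_embed.py | Neighbors, UMAP, Leiden | |
| differential_expression.py | DE analysis | |
| | Label transfer with scANVI | |
| | Multi-dataset integration | |
| validate_adata.py | Check data compatibility | |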
Example Workflow

    # 1. Validate input data
    python scripts/validate_adata.py raw.h5ad --batch-key batch --suggest

    # 2. Prepare data (QC, HVG selection)
    python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch --n-hvgs 2000

    # 3. Train model
    python scripts/train_model.py prepared.h5ad results/ --model scvi --batch-key batch

    # 4. Cluster and visualize
    python scripts/cluster_embed.py results/adata_trained.h5ad results/ --resolution 0.8

    # 5. Differential expression
    python scripts/differential_expression.py results/model results/adata_clustered.h5ad results/de.csv --groupby leiden
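
The five steps above can also be driven from Python. The following is a minimal sketch, not part of the skill itself: `build_pipeline_commands` and `run_pipeline` are hypothetical helper names, and the script paths, flags, and intermediate file names are copied from the example workflow rather than from the scripts' own documentation. By default it only prints the commands (dry run), so nothing is executed until you opt in.

```python
import shlex
import subprocess


def build_pipeline_commands(raw="raw.h5ad", prepared="prepared.h5ad",
                            results="results", batch_key="batch",
                            n_hvgs=2000, resolution=0.8):
    """Return the example workflow as a list of argv lists (hypothetical helper)."""
    py = "python"
    return [
        # 1. Validate input data
        [py, "scripts/validate_adata.py", raw, "--batch-key", batch_key, "--suggest"],
        # 2. Prepare data (QC, HVG selection)
        [py, "scripts/prepare_data.py", raw, prepared,
         "--batch-key", batch_key, "--n-hvgs", str(n_hvgs)],
        # 3. Train model
        [py, "scripts/train_model.py", prepared, f"{results}/",
         "--model", "scvi", "--batch-key", batch_key],
        # 4. Cluster and visualize (input name taken from the example workflow)
        [py, "scripts/cluster_embed.py", f"{results}/adata_trained.h5ad",
         f"{results}/", "--resolution", str(resolution)],
        # 5. Differential expression (paths taken from the example workflow)
        [py, "scripts/differential_expression.py", f"{results}/model",
         f"{results}/adata_clustered.h5ad", f"{results}/de.csv",
         "--groupby", "leiden"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(build_pipeline_commands(), dry_run=True)
```

Passing `dry_run=False` runs the steps in order and stops at the first non-zero exit code, which keeps a failed validation or training step from feeding stale files into later steps.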
Python Utilities
scripts/model_utils.py provides importable functions for custom workflows:
| Function | Purpose |
|---|---|
| | Data preparation (QC, HVG, layer setup) |
| | Train scVI or scANVI |
| | Compute integration metrics |
| | Extract DE markers |
| | Save model, data, plots |
| | Suggest best model |
| | Neighbors + UMAP + Leiden |
Critical Requirements
- Raw counts required: scvi-tools models require integer count data

      import scvi
      adata.layers["counts"] = adata.X.copy()  # Before normalization
      scvi.model.SCVI.setup_anndata(adata, layer="counts")

- HVG selection: Use 2000-4000 highly variable genes

      import scanpy as sc
      sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key="batch", layer="counts", flavor="seurat_v3")
      adata = adata[:, adata.var['highly_variable']].copy()

- Batch information: Specify batch_key for integration

      scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")
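
The three requirements above can be combined into one short routine. This is a sketch under stated assumptions rather than the skill's own implementation: `looks_like_raw_counts` and `integrate_with_scvi` are hypothetical names, the `"batch"` column and the `"X_scVI"` key are illustrative choices, and training uses the library defaults. The scanpy and scvi-tools imports are deferred into the function so the count check can be used on its own.

```python
from numbers import Real
from typing import Iterable


def looks_like_raw_counts(values: Iterable[Real]) -> bool:
    """Return True if every value is a non-negative whole number.

    Pass a flat iterable, e.g. a sample of adata.X.data for a sparse matrix.
    """
    return all(v >= 0 and float(v).is_integer() for v in values)


def integrate_with_scvi(adata, batch_key="batch", n_top_genes=2000):
    """Hypothetical helper: HVG selection, setup, training, latent embedding."""
    # Imported lazily so the count check above works without these packages.
    import scanpy as sc
    import scvi

    # Keep raw counts in a dedicated layer before any normalization of adata.X.
    adata.layers["counts"] = adata.X.copy()

    # Select highly variable genes per batch from the raw counts.
    sc.pp.highly_variable_genes(
        adata, n_top_genes=n_top_genes, batch_key=batch_key,
        layer="counts", flavor="seurat_v3",
    )
    adata = adata[:, adata.var["highly_variable"]].copy()

    # Register the counts layer and batch column, then train with defaults.
    scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key=batch_key)
    model = scvi.model.SCVI(adata)
    model.train()

    # Store the latent representation for downstream neighbors/UMAP/Leiden.
    adata.obsm["X_scVI"] = model.get_latent_representation()
    return adata, model
```

Checking a sample of values with `looks_like_raw_counts` before calling `integrate_with_scvi` catches the common mistake of passing normalized or log-transformed data as counts.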
Quick Decision Tree

    Need to integrate scRNA-seq data?
    ├── Have cell type labels? → scANVI (references/label_transfer.md)
    └── No labels? → scVI (references/scrna_integration.md)

    Have multi-modal data?
    ├── CITE-seq (RNA + protein)? → totalVI (references/citeseq_totalvi.md)
    ├── Multiome (RNA + ATAC)? → MultiVI (references/multiome_multivi.md)
    └── scATAC-seq only? → PeakVI (references/atac_peakvi.md)

    Have spatial data?
    └── Need cell type deconvolution? → DestVI (references/spatial_deconvolution.md)

    Have pre-trained reference model?
    └── Map query to reference? → scArches (references/scarches_mapping.md)

    Need RNA velocity?
    └── veloVI (references/rna_velocity_velovi.md)

    Strong cross-technology batch effects?
    └── sysVI (references/batch_correction_sysvi.md)
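
The decision tree can be expressed as a small lookup. The sketch below is illustrative only: `choose_workflow` and the task keys (`"citeseq"`, `"atac"`, and so on) are hypothetical labels invented for this example, while the model names and reference file paths are the ones listed in the tree.

```python
# Leaves of the decision tree: task -> (model, reference file).
# The task keys are hypothetical labels chosen for this sketch.
_WORKFLOWS = {
    "citeseq": ("totalVI", "references/citeseq_totalvi.md"),
    "multiome": ("MultiVI", "references/multiome_multivi.md"),
    "atac": ("PeakVI", "references/atac_peakvi.md"),
    "spatial_deconvolution": ("DestVI", "references/spatial_deconvolution.md"),
    "reference_mapping": ("scArches", "references/scarches_mapping.md"),
    "rna_velocity": ("veloVI", "references/rna_velocity_velovi.md"),
    "cross_technology": ("sysVI", "references/batch_correction_sysvi.md"),
}


def choose_workflow(task, has_labels=False):
    """Return (model, reference_file) for a task, following the tree above."""
    if task == "scrna_integration":
        # The only branch that depends on whether cell type labels exist.
        if has_labels:
            return ("scANVI", "references/label_transfer.md")
        return ("scVI", "references/scrna_integration.md")
    try:
        return _WORKFLOWS[task]
    except KeyError:
        raise ValueError(f"Unknown task: {task!r}") from None


if __name__ == "__main__":
    model, reference = choose_workflow("scrna_integration", has_labels=True)
    print(f"{model}: see {reference}")
```

Keeping the leaves in a dictionary means a new model only needs one extra entry, and unknown tasks fail loudly instead of silently falling back to a default model.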