git clone https://github.com/ComeOnOliver/skillshub
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/K-Dense-AI/claude-scientific-skills/scvelo" ~/.claude/skills/comeonoliver-skillshub-scvelo && rm -rf "$T"
skills/K-Dense-AI/claude-scientific-skills/scvelo/SKILL.mdscVelo — RNA Velocity Analysis
Overview
scVelo is the leading Python package for RNA velocity analysis in single-cell RNA-seq data. It infers cell state transitions by modeling the kinetics of mRNA splicing — using the ratio of unspliced (pre-mRNA) to spliced (mature mRNA) abundances to determine whether a gene is being upregulated or downregulated in each cell. This allows reconstruction of developmental trajectories and identification of cell fate decisions without requiring time-course data.
Installation:
pip install scvelo
Key resources:
- Documentation: https://scvelo.readthedocs.io/
- GitHub: https://github.com/theislab/scvelo
- Paper: Bergen et al. (2020) Nature Biotechnology. PMID: 32747759
When to Use This Skill
Use scVelo when:
- Trajectory inference from snapshot data: Determine which direction cells are differentiating
- Cell fate prediction: Identify progenitor cells and their downstream fates
- Driver gene identification: Find genes whose dynamics best explain observed trajectories
- Developmental biology: Model hematopoiesis, neurogenesis, epithelial-to-mesenchymal transitions
- Latent time estimation: Order cells along a pseudotime derived from splicing dynamics
- Complement to Scanpy: Add directional information to UMAP embeddings
Prerequisites
scVelo requires count matrices for both unspliced and spliced RNA. These are generated by:
- STARsolo or kallisto|bustools with
modelamanno - velocyto CLI:
/velocyto run10xvelocyto run - alevin-fry / simpleaf with spliced/unspliced output
Data is stored in an
AnnData object with layers["spliced"] and layers["unspliced"].
Standard RNA Velocity Workflow
1. Setup and Data Loading
import scvelo as scv import scanpy as sc import numpy as np import matplotlib.pyplot as plt # Configure settings scv.settings.verbosity = 3 # Show computation steps scv.settings.presenter_view = True scv.settings.set_figure_params('scvelo') # Load data (AnnData with spliced/unspliced layers) # Option A: Load from loom (velocyto output) adata = scv.read("cellranger_output.loom", cache=True) # Option B: Merge velocyto loom with Scanpy-processed AnnData adata_processed = sc.read_h5ad("processed.h5ad") # Has UMAP, clusters adata_velocity = scv.read("velocyto.loom") adata = scv.utils.merge(adata_processed, adata_velocity) # Verify layers print(adata) # obs × var: N × G # layers: 'spliced', 'unspliced' (required) # obsm['X_umap'] (required for visualization)
2. Preprocessing
# Filter and normalize (follows Scanpy conventions) scv.pp.filter_and_normalize( adata, min_shared_counts=20, # Minimum counts in spliced+unspliced n_top_genes=2000 # Top highly variable genes ) # Compute first and second order moments (means and variances) # knn_connectivities must be computed first sc.pp.neighbors(adata, n_neighbors=30, n_pcs=30) scv.pp.moments( adata, n_pcs=30, n_neighbors=30 )
3. Velocity Estimation — Stochastic Model
The stochastic model is fast and suitable for exploratory analysis:
# Stochastic velocity (faster, less accurate) scv.tl.velocity(adata, mode='stochastic') scv.tl.velocity_graph(adata) # Visualize scv.pl.velocity_embedding_stream( adata, basis='umap', color='leiden', title="RNA Velocity (Stochastic)" )
4. Velocity Estimation — Dynamical Model (Recommended)
The dynamical model fits the full splicing kinetics and is more accurate:
# Recover dynamics (computationally intensive; ~10-30 min for 10K cells) scv.tl.recover_dynamics(adata, n_jobs=4) # Compute velocity from dynamical model scv.tl.velocity(adata, mode='dynamical') scv.tl.velocity_graph(adata)
5. Latent Time
The dynamical model enables computation of a shared latent time (pseudotime):
# Compute latent time scv.tl.latent_time(adata) # Visualize latent time on UMAP scv.pl.scatter( adata, color='latent_time', color_map='gnuplot', size=80, title='Latent time' ) # Identify top genes ordered by latent time top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).index[:300] scv.pl.heatmap( adata, var_names=top_genes, sortby='latent_time', col_color='leiden', n_convolve=100 )
6. Driver Gene Analysis
# Identify genes with highest velocity fit scv.tl.rank_velocity_genes(adata, groupby='leiden', min_corr=0.3) df = scv.DataFrame(adata.uns['rank_velocity_genes']['names']) print(df.head(10)) # Speed and coherence scv.tl.velocity_confidence(adata) scv.pl.scatter( adata, c=['velocity_length', 'velocity_confidence'], cmap='coolwarm', perc=[5, 95] ) # Phase portraits for specific genes scv.pl.velocity(adata, ['Cpe', 'Gnao1', 'Ins2'], ncols=3, figsize=(16, 4))
7. Velocity Arrows and Pseudotime
# Arrow plot on UMAP scv.pl.velocity_embedding( adata, arrow_length=3, arrow_size=2, color='leiden', basis='umap' ) # Stream plot (cleaner visualization) scv.pl.velocity_embedding_stream( adata, basis='umap', color='leiden', smooth=0.8, min_mass=4 ) # Velocity pseudotime (alternative to latent time) scv.tl.velocity_pseudotime(adata) scv.pl.scatter(adata, color='velocity_pseudotime', cmap='gnuplot')
8. PAGA Trajectory Graph
# PAGA graph with velocity-informed transitions scv.tl.paga(adata, groups='leiden') df = scv.get_df(adata, 'paga/transitions_confidence', precision=2).T df.style.background_gradient(cmap='Blues').format('{:.2g}') # Plot PAGA with velocity scv.pl.paga( adata, basis='umap', size=50, alpha=0.1, min_edge_width=2, node_size_scale=1.5 )
Complete Workflow Script
import scvelo as scv import scanpy as sc def run_rna_velocity(adata, n_top_genes=2000, mode='dynamical', n_jobs=4): """ Complete RNA velocity workflow. Args: adata: AnnData with 'spliced' and 'unspliced' layers, UMAP in obsm n_top_genes: Number of top HVGs for velocity mode: 'stochastic' (fast) or 'dynamical' (accurate) n_jobs: Parallel jobs for dynamical model Returns: Processed AnnData with velocity information """ scv.settings.verbosity = 2 # 1. Preprocessing scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=n_top_genes) if 'neighbors' not in adata.uns: sc.pp.neighbors(adata, n_neighbors=30) scv.pp.moments(adata, n_pcs=30, n_neighbors=30) # 2. Velocity estimation if mode == 'dynamical': scv.tl.recover_dynamics(adata, n_jobs=n_jobs) scv.tl.velocity(adata, mode=mode) scv.tl.velocity_graph(adata) # 3. Downstream analyses if mode == 'dynamical': scv.tl.latent_time(adata) scv.tl.rank_velocity_genes(adata, groupby='leiden', min_corr=0.3) scv.tl.velocity_confidence(adata) scv.tl.velocity_pseudotime(adata) return adata
Key Output Fields in AnnData
After running the workflow, the following fields are added:
| Location | Key | Description |
|---|---|---|
| | RNA velocity per gene per cell |
| | Fitted latent time per gene per cell |
| | 2D velocity vectors on UMAP |
| | Pseudotime from velocity |
| | Latent time from dynamical model |
| | Speed of each cell |
| | Confidence score per cell |
| | Gene-level model fit quality |
| | Transcription rate |
| | Splicing rate |
| | Degradation rate |
| | Cell-cell transition probability matrix |
Velocity Models Comparison
| Model | Speed | Accuracy | When to Use |
|---|---|---|---|
| Fast | Moderate | Exploratory; large datasets |
| Medium | Moderate | Simple linear kinetics |
| Slow | High | Publication-quality; identifies driver genes |
Best Practices
- Start with stochastic mode for exploration; switch to dynamical for final analysis
- Need good coverage of unspliced reads: Short reads (< 100 bp) may miss intron coverage
- Minimum 2,000 cells: RNA velocity is noisy with fewer cells
- Velocity should be coherent: Arrows should follow known biology; randomness indicates issues
- k-NN bandwidth matters: Too few neighbors → noisy velocity; too many → oversmoothed
- Sanity check: Root cells (progenitors) should have high unspliced/spliced ratios for marker genes
- Dynamical model requires distinct kinetic states: Works best for clear differentiation processes
Troubleshooting
| Problem | Solution |
|---|---|
| Missing unspliced layer | Re-run velocyto or use STARsolo with |
| Very few velocity genes | Lower ; check sequencing depth |
| Random-looking arrows | Try different or velocity model |
| Memory error with dynamical | Set ; reduce |
| Negative velocity everywhere | Check that spliced/unspliced layers are not swapped |
Additional Resources
- scVelo documentation: https://scvelo.readthedocs.io/
- Tutorial notebooks: https://scvelo.readthedocs.io/tutorials/
- GitHub: https://github.com/theislab/scvelo
- Paper: Bergen V et al. (2020) Nature Biotechnology. PMID: 32747759
- velocyto (preprocessing): http://velocyto.org/
- CellRank (fate prediction, extends scVelo): https://cellrank.readthedocs.io/
- dynamo (metabolic labeling alternative): https://dynamo-release.readthedocs.io/