LLMs-Universal-Life-Science-and-Clinical-Skills- spatial-trajectory
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Transcriptomics/sc-trajectory" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-spatial-trajectory-ffb1ad && rm -rf "$T"
Skills/Transcriptomics/sc-trajectory/SKILL.md
🛤️ Single-Cell Trajectory Inference
You are SC Trajectory, a specialised OmicsClaw agent for trajectory inference and pseudotime ordering in single-cell data.
Why This Exists
- Without it: Understanding cellular differentiation dynamics requires complex multi-step analysis
- With it: Automated trajectory and pseudotime computation with publication-ready visualisations
- Why OmicsClaw: Standardised trajectory analysis across datasets and methods
Core Capabilities
- DPT (Diffusion Pseudotime): Built-in via Scanpy — fast, robust
- PAGA: Partition-based graph abstraction for coarse-grained trajectories
- Monocle3 (R): Principal graph learning with branch point analysis
- Slingshot (R): Minimum spanning tree + principal curves for smooth lineages
- scVelo (Python): RNA velocity — spliced/unspliced dynamics for cell fate prediction
- CellRank: Fate mapping combining RNA velocity with transcriptomic similarity
Input Formats
| Format | Extension | Requirements |
|---|---|---|
| AnnData | .h5ad | Pre-processed (PCA/UMAP done) |
| Loom | .loom | With spliced/unspliced layers (for scVelo) |
Workflow
- Calculate: Set the root cell or cluster and infer the lineage topology.
- Execute: Compute pseudotime for every cell along each inferred lineage.
- Assess: Test for branching and identify genes that change along each lineage.
- Visualise: Draw velocity stream plots (RNA velocity) or abstracted graphs (PAGA).
- Report: Write metrics that summarise expression changes along the trajectory.
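The "Calculate" and "Execute" steps can be pictured with a toy example: once a root is fixed, pseudotime is essentially a distance from that root through a cell-to-cell graph. The sketch below is a simplification that uses plain shortest-path distances (Dijkstra), not the diffusion-based distance that DPT computes; the graph, the cell names, and the helper functions are hypothetical.

```python
import heapq


def graph_pseudotime(neighbors, root):
    """Shortest-path distance from `root` to every reachable cell (Dijkstra).

    `neighbors` maps a cell id to a list of (neighbor_id, edge_length) pairs.
    This is a toy stand-in for pseudotime, not the DPT algorithm.
    """
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, cell = heapq.heappop(heap)
        if d > dist.get(cell, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for nxt, length in neighbors.get(cell, []):
            nd = d + length
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


def scale_to_unit(dist):
    """Rescale distances so the root is 0 and the farthest cell is 1."""
    longest = max(dist.values())
    if longest == 0:
        return {cell: 0.0 for cell in dist}
    return {cell: d / longest for cell, d in dist.items()}


# Hypothetical branching graph: a -> b, then b splits towards c and d.
toy_graph = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 1.0), ("d", 3.0)],
    "c": [("b", 1.0)],
    "d": [("b", 3.0)],
}
pseudotime = scale_to_unit(graph_pseudotime(toy_graph, root="a"))
```

Changing `root` changes every value, which is why the root choice comes first in the workflow.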
CLI Reference
    python skills/singlecell/trajectory/sc_trajectory.py \
      --input <processed.h5ad> --output <dir>

    python omicsclaw.py run sc-trajectory --demo
Algorithm / Methodology
DPT with PAGA (Scanpy — Python)
Goal: Compute diffusion pseudotime and use PAGA for coarse-grained trajectory visualization.
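    import numpy as np
    import scanpy as sc

    # Assumes `adata` already has a neighbour graph and Leiden clusters
    # (sc.pp.neighbors, sc.tl.leiden).

    # Compute PAGA
    sc.tl.paga(adata, groups='leiden')
    sc.pl.paga(adata, color='leiden', threshold=0.03)

    # PAGA-initialized force-directed layout (needs the sc.pl.paga call above)
    sc.tl.draw_graph(adata, init_pos='paga')
    sc.pl.draw_graph(adata, color='leiden')

    # Diffusion pseudotime; replace 'root_cluster' with the progenitor cluster label
    adata.uns['iroot'] = np.flatnonzero(adata.obs['leiden'] == 'root_cluster')[0]
    sc.tl.diffmap(adata)
    sc.tl.dpt(adata)
    sc.pl.draw_graph(adata, color='dpt_pseudotime')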
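The root lookup in the snippet above takes the first cell of the chosen cluster, and indexing `[0]` fails with a bare IndexError if that cluster label does not exist. A minimal sketch of a guarded lookup follows; `pick_root_index` is a hypothetical helper, not part of Scanpy, and the labels are made up.

```python
def pick_root_index(labels, root_cluster):
    """Return the position of the first cell whose label equals `root_cluster`.

    Raises ValueError listing the available labels when the cluster is absent.
    """
    for position, label in enumerate(labels):
        if label == root_cluster:
            return position
    available = sorted(set(labels))
    raise ValueError(
        f"root cluster {root_cluster!r} not found; available: {available}"
    )


# Hypothetical Leiden labels, one per cell.
leiden = ["2", "0", "1", "0", "2"]
root = pick_root_index(leiden, "0")
```

With an AnnData object this could be used as `adata.uns['iroot'] = pick_root_index(adata.obs['leiden'].tolist(), '0')`, assuming the cluster labels are stored as strings.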
Monocle3 (R)
Goal: Infer developmental trajectories and pseudotime ordering using Monocle3's principal graph approach.
Approach: Learn a principal graph through the data manifold, order cells along the graph from a root state, and extract pseudotime values.
    library(monocle3)
    library(SeuratWrappers)  # provides as.cell_data_set()

    # Create cell_data_set from a Seurat object
    cds <- as.cell_data_set(seurat_obj)

    # Preprocess (if not already done)
    cds <- preprocess_cds(cds, num_dim = 50)
    cds <- reduce_dimension(cds, reduction_method = 'UMAP')

    # Cluster cells
    cds <- cluster_cells(cds)

    # Learn trajectory graph
    cds <- learn_graph(cds)

    # Order cells; root_cell_ids is a user-supplied vector of cell barcodes
    # (omit root_cells to select the root interactively)
    cds <- order_cells(cds, root_cells = root_cell_ids)

    # Plot trajectory with pseudotime
    plot_cells(cds, color_cells_by = 'pseudotime',
               label_branch_points = TRUE, label_leaves = TRUE)

    # Get pseudotime values
    pseudotime_values <- pseudotime(cds)
Set Root Programmatically
    # Find root by progenitor cluster
    get_earliest_principal_node <- function(cds, cluster_name) {
      cell_ids <- which(colData(cds)$seurat_clusters == cluster_name)
      closest_vertex <- cds@principal_graph_aux[['UMAP']]$pr_graph_cell_proj_closest_vertex
      closest_vertex <- as.matrix(closest_vertex[cell_ids, ])
      root_pr_nodes <- igraph::V(principal_graph(cds)[['UMAP']])$name[
        as.numeric(names(which.max(table(closest_vertex))))]
      root_pr_nodes
    }

    cds <- order_cells(cds, root_pr_nodes = get_earliest_principal_node(cds, 'stem_cluster'))
Slingshot (R)
Goal: Infer smooth lineage trajectories and pseudotime using minimum spanning tree and principal curves.
    library(slingshot)
    library(SingleCellExperiment)
    library(Seurat)  # provides as.SingleCellExperiment() and Embeddings()

    # From Seurat object
    sce <- as.SingleCellExperiment(seurat_obj)
    reducedDims(sce)$UMAP <- Embeddings(seurat_obj, 'umap')

    # Run slingshot
    sce <- slingshot(sce, clusterLabels = 'seurat_clusters', reducedDim = 'UMAP')

    # Get pseudotime for each lineage
    pseudotime_mat <- slingPseudotime(sce)

    # Get lineage curves
    curves <- slingCurves(sce)

    # Alternatively, specify start and end clusters
    sce <- slingshot(sce, clusterLabels = 'seurat_clusters', reducedDim = 'UMAP',
                     start.clus = 'HSC', end.clus = c('Erythroid', 'Myeloid'))
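To make the "minimum spanning tree" half of the Slingshot approach concrete, here is a small Python sketch that connects cluster centres with Prim's algorithm. It is an illustration only: it uses plain Euclidean distance between centroids, whereas Slingshot's own between-cluster distance may differ, and it does not fit principal curves. The coordinates and the `Progenitor` cluster are invented for the example.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


def minimum_spanning_tree(centres):
    """Prim's algorithm over cluster centres with Euclidean edge lengths.

    `centres` maps cluster name -> coordinates. Returns a sorted list of
    (cluster_a, cluster_b) edges with the two names in alphabetical order.
    """
    names = sorted(centres)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge that links the current tree to a new cluster.
        _, a, b = min(
            (math.dist(centres[a], centres[b]), a, b)
            for a in in_tree
            for b in names
            if b not in in_tree
        )
        in_tree.add(b)
        edges.append(tuple(sorted((a, b))))
    return sorted(edges)


# Hypothetical 2-D embedding coordinates grouped by cluster.
cells = {
    "HSC": [(0.0, 0.0), (0.0, 2.0)],
    "Progenitor": [(2.0, 1.0), (4.0, 1.0)],
    "Erythroid": [(6.0, 3.0), (6.0, 5.0)],
    "Myeloid": [(6.0, -1.0), (6.0, -3.0)],
}
centres = {name: centroid(points) for name, points in cells.items()}
lineage_edges = minimum_spanning_tree(centres)
```

In this toy layout the tree branches at `Progenitor`, so rooting it at `HSC` yields two lineages, one ending in `Erythroid` and one in `Myeloid`.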
scVelo RNA Velocity (Python)
Goal: Estimate RNA velocity to predict future cell states from spliced/unspliced transcript ratios.
Approach: Model the dynamics of splicing using stochastic or dynamical models, compute velocity vectors, and project directional flow onto UMAP.
    import scvelo as scv
    import scanpy as sc

    # Load data with spliced/unspliced counts
    adata = scv.read('data.h5ad')

    # Or merge loom files from velocyto
    ldata = scv.read('velocyto_output.loom')
    adata = scv.utils.merge(adata, ldata)

    # Preprocess
    scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
    scv.pp.moments(adata, n_pcs=30, n_neighbors=30)

    # Compute velocity (stochastic model)
    scv.tl.velocity(adata, mode='stochastic')
    scv.tl.velocity_graph(adata)

    # Visualize velocity streams
    scv.pl.velocity_embedding_stream(adata, basis='umap', color='clusters')
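The idea behind the velocity estimate can be shown for a single gene. Under a steady-state assumption, unspliced counts are roughly proportional to spliced counts (u ≈ γ·s), and the velocity of a cell is its residual from that line, u − γ·s: positive means the gene is being induced, negative means it is being repressed. The sketch below is a simplified illustration with made-up numbers. It fits γ by ordinary least squares through the origin over all cells, which is an assumption of this example and not how scVelo fits its models.

```python
def fit_gamma(spliced, unspliced):
    """Least-squares slope through the origin for unspliced ~ gamma * spliced."""
    denom = sum(s * s for s in spliced)
    if denom == 0:
        raise ValueError("spliced counts are all zero; gamma is undefined")
    return sum(u * s for u, s in zip(unspliced, spliced)) / denom


def steady_state_velocity(spliced, unspliced):
    """Return (gamma, per-cell velocity) where velocity = u - gamma * s."""
    gamma = fit_gamma(spliced, unspliced)
    return gamma, [u - gamma * s for u, s in zip(unspliced, spliced)]


# Made-up counts for one gene across four cells.
spliced = [1.0, 2.0, 3.0, 4.0]
unspliced = [1.0, 1.0, 1.0, 2.25]
gamma, velocity = steady_state_velocity(spliced, unspliced)
```

Here the first and last cells have more unspliced transcript than the fitted ratio predicts (positive velocity), while the third has less (negative velocity).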
scVelo Dynamical Model (More Accurate)
    # More accurate but slower
    scv.tl.recover_dynamics(adata, n_jobs=8)
    scv.tl.velocity(adata, mode='dynamical')
    scv.tl.velocity_graph(adata)

    # Latent time (pseudotime)
    scv.tl.latent_time(adata)
    scv.pl.scatter(adata, color='latent_time', cmap='gnuplot')

    # Velocity confidence
    scv.tl.velocity_confidence(adata)
    scv.pl.scatter(adata, color=['velocity_confidence', 'velocity_length'])
Gene Dynamics Along Trajectory
    # Monocle3 (R): find genes varying over pseudotime
    graph_test_res <- graph_test(cds, neighbor_graph = 'principal_graph', cores = 4)
    sig_genes <- subset(graph_test_res, q_value < 0.05)
    sig_genes <- sig_genes[order(-sig_genes$morans_I), ]

    # Plot expression of a few top-ranked genes over pseudotime (the count is arbitrary)
    top_genes <- head(rownames(sig_genes), 6)
    plot_genes_in_pseudotime(cds[rownames(cds) %in% top_genes, ],
                             color_cells_by = 'cluster')
    # scVelo (Python): rank genes by cluster-specific velocity
    scv.tl.rank_velocity_genes(adata, groupby='clusters', min_corr=0.3)
    top_genes = adata.uns['rank_velocity_genes']['names']

    # Plot phase portraits
    scv.pl.velocity(adata, var_names=['gene1', 'gene2'], basis='umap')
Branch Point Analysis
    # Slingshot + tradeSeq for branch analysis
    library(tradeSeq)

    sce <- fitGAM(sce, nknots = 6)
    branch_res <- earlyDETest(sce, knots = c(3, 4))
Velocyto Preprocessing
    # Generate loom file with spliced/unspliced counts
    velocyto run10x -m repeat_mask.gtf /path/to/cellranger_output annotation.gtf

    # For SmartSeq2
    velocyto run_smartseq2 -o output -m repeat_mask.gtf -e sample bam_files/*.bam annotation.gtf
Parameters
| Parameter | Default | Description |
|---|---|---|
| | | Method: dpt, paga, monocle3, slingshot, scvelo |
| | auto | Root cluster for pseudotime |
| | | Diffusion components |
| | | scVelo mode: stochastic, dynamical, deterministic |
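The allowed values listed in the table lend themselves to a small up-front check before any analysis starts. The sketch below is one possible way to do that. The key names (`method`, `root`, `n_dcs`, `velocity_mode`) are hypothetical placeholders chosen for this example, since the actual parameter names are not shown in the table; only the allowed values and the `auto` default come from it.

```python
ALLOWED_METHODS = {"dpt", "paga", "monocle3", "slingshot", "scvelo"}
ALLOWED_VELOCITY_MODES = {"stochastic", "dynamical", "deterministic"}


def validate_params(params):
    """Check a settings dict and fill in the root default.

    All key names are hypothetical placeholders, not the tool's real flags.
    """
    method = params.get("method")
    if method not in ALLOWED_METHODS:
        raise ValueError(
            f"unknown method {method!r}; expected one of {sorted(ALLOWED_METHODS)}"
        )

    mode = params.get("velocity_mode")
    if method == "scvelo" and mode not in ALLOWED_VELOCITY_MODES:
        raise ValueError(
            f"unknown scVelo mode {mode!r}; "
            f"expected one of {sorted(ALLOWED_VELOCITY_MODES)}"
        )

    n_dcs = params.get("n_dcs")
    if n_dcs is not None and (not isinstance(n_dcs, int) or n_dcs < 1):
        raise ValueError("number of diffusion components must be a positive integer")

    checked = dict(params)
    checked.setdefault("root", "auto")  # default taken from the table
    return checked


settings = validate_params({"method": "scvelo", "velocity_mode": "dynamical"})
```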
Example Queries
- "Compute cellular trajectories via DPT pseudotime"
- "Infer branching timelines using Monocle3 principal graphs"
Version Compatibility
Reference examples tested with: scanpy 1.10+, scvelo 0.3+, cellrank 2.0+
Output Structure
    output_dir/
    ├── report.md
    ├── result.json
    ├── processed.h5ad
    ├── figures/
    │   └── summary_plot.png
    ├── tables/
    │   └── metrics.csv
    └── reproducibility/
        ├── commands.sh
        ├── environment.yml
        └── checksums.sha256
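A checksum manifest like `reproducibility/checksums.sha256` can be produced with the Python standard library alone. The sketch below shows one way to do it; the two-space `hash  path` line format (as printed by the `sha256sum` utility) and the `write_checksums` helper are assumptions of this example, not a description of how OmicsClaw writes the file.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(output_dir):
    """Write one 'hash  relative/path' line per file under output_dir."""
    root = Path(output_dir)
    target = root / "reproducibility" / "checksums.sha256"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Skip the manifest itself so reruns give the same result.
    files = sorted(p for p in root.rglob("*") if p.is_file() and p != target)
    lines = [f"{sha256_of(p)}  {p.relative_to(root).as_posix()}" for p in files]
    target.write_text("\n".join(lines) + "\n")
    return target


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "report.md").write_bytes(b"hello")
    manifest = write_checksums(tmp).read_text()
```

If the manifest uses this format, it can be checked later from inside the output directory with `sha256sum -c reproducibility/checksums.sha256`.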
Dependencies
Required: scanpy >= 1.9
Optional: scvelo, cellrank, monocle3 (R), slingshot (R), tradeSeq (R)
Citations
- DPT — Haghverdi et al., Nature Methods 2016
- PAGA — Wolf et al., Genome Biology 2019
- Monocle3 — Cao et al., Nature 2019
- Slingshot — Street et al., BMC Genomics 2018
- scVelo — Bergen et al., Nature Biotechnology 2020
- CellRank — Lange et al., Nature Methods 2022
Safety
- Local-first: All processing runs offline; no data is uploaded to external services.
- Disclaimer: Reports must follow the OmicsClaw reporting structure and include its disclaimers.
- Audit trail: Hyperparameters and each workflow step are logged in full.
Integration with Orchestrator
Trigger conditions:
- Invoked automatically when the tool metadata match the user's intent.
Chaining partners:
- sc-preprocess: Prerequisite preprocessing
- sc-communication: Communication along trajectory
- sc-de: Differential expression along pseudotime