OpenClaw-Medical-Skills tooluniverse-spatial-transcriptomics
Analyze spatial transcriptomics data to map gene expression in tissue architecture. Supports 10x Visium, MERFISH, seqFISH, Slide-seq, and imaging-based platforms. Performs spatial clustering, domain identification, cell-cell proximity analysis, spatial gene expression patterns, tissue architecture mapping, and integration with single-cell data. Use when analyzing spatial transcriptomics datasets, studying tissue organization, identifying spatial expression patterns, mapping cell-cell interactions in tissue context, characterizing tumor microenvironment spatial structure, or integrating spatial and single-cell RNA-seq data for comprehensive tissue analysis.
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tooluniverse-spatial-transcriptomics" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-tooluniverse-spatial-transcriptomics && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/tooluniverse-spatial-transcriptomics" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-tooluniverse-spatial-transcriptomics && rm -rf "$T"
skills/tooluniverse-spatial-transcriptomics/SKILL.mdSpatial Transcriptomics Analysis
Comprehensive analysis of spatially-resolved transcriptomics data to understand gene expression patterns in tissue architecture context. Combines expression profiling with spatial coordinates to reveal tissue organization, cell-cell interactions, and spatially variable genes.
When to Use This Skill
Triggers:
- User has spatial transcriptomics data (Visium, MERFISH, seqFISH, etc.)
- Questions about tissue architecture or spatial organization
- Spatial gene expression pattern analysis
- Cell-cell proximity or neighborhood analysis requests
- Tumor microenvironment spatial structure questions
- Integration of spatial with single-cell data
- Spatial domain identification
- Tissue morphology correlation with expression
Example Questions This Skill Solves:
- "Analyze this 10x Visium dataset to identify spatial domains"
- "Which genes show spatially variable expression in this tissue?"
- "Map the tumor microenvironment spatial organization"
- "Find genes enriched at tissue boundaries"
- "Identify cell-cell interactions based on spatial proximity"
- "Integrate spatial transcriptomics with scRNA-seq annotations"
- "Characterize spatial gradients in gene expression"
- "Map ligand-receptor pairs in tissue context"
Core Capabilities
| Capability | Description |
|---|---|
| Data Import | 10x Visium, MERFISH, seqFISH, Slide-seq, STARmap, Xenium formats |
| Quality Control | Spot/cell QC, spatial alignment verification, tissue coverage |
| Normalization | Spatial-aware normalization accounting for tissue heterogeneity |
| Spatial Clustering | Identify spatial domains with similar expression profiles |
| Spatial Variable Genes | Find genes with non-random spatial patterns |
| Neighborhood Analysis | Cell-cell proximity, spatial neighborhoods, niche identification |
| Spatial Patterns | Gradients, boundaries, hotspots, expression waves |
| Integration | Merge with scRNA-seq for cell type mapping |
| Ligand-Receptor Spatial | Map cell communication in tissue context |
| Visualization | Spatial plots, heatmaps on tissue, 3D reconstruction |
Workflow Overview
Input: Spatial Transcriptomics Data + Tissue Image | v Phase 1: Data Import & QC |-- Load spatial coordinates + expression matrix |-- Load tissue histology image |-- Quality control per spot/cell |-- Filter low-quality spots |-- Align spatial coordinates to tissue | v Phase 2: Preprocessing |-- Normalization (spatial-aware methods) |-- Highly variable gene selection |-- Dimensionality reduction (PCA) |-- Spatial lag smoothing (optional) | v Phase 3: Spatial Clustering |-- Identify spatial domains/regions |-- Graph-based clustering with spatial constraints |-- Annotate domains with marker genes |-- Visualize domains on tissue | v Phase 4: Spatial Variable Genes |-- Test for spatial autocorrelation (Moran's I, Geary's C) |-- Identify genes with spatial patterns |-- Classify pattern types (gradient, hotspot, boundary) |-- Rank by spatial significance | v Phase 5: Neighborhood Analysis |-- Define spatial neighborhoods (k-NN, radius) |-- Calculate neighborhood composition |-- Identify interaction zones |-- Niche characterization | v Phase 6: Integration with scRNA-seq |-- Cell type deconvolution per spot |-- Map cell types to spatial locations |-- Predict cell type spatial distributions |-- Validate with marker genes | v Phase 7: Spatial Cell Communication |-- Identify proximal cell type pairs |-- Query ligand-receptor database (OmniPath) |-- Score spatial interactions |-- Map communication hotspots | v Phase 8: Generate Spatial Report |-- Tissue overview with domains |-- Spatially variable genes |-- Cell type spatial maps |-- Interaction networks in tissue context |-- 3D visualization (if applicable)
Phase Details
Phase 1: Data Import & Quality Control
Objective: Load spatial data and assess quality.
Supported platforms:
10x Visium (most common):
- Spots: 55μm diameter, ~50 cells per spot
- Resolution: ~5,000-10,000 spots per capture area
- Data: Expression matrix + spatial coordinates + H&E image
MERFISH/seqFISH (imaging-based):
- Single-cell resolution
- Targeted gene panels (100-10,000 genes)
- Absolute coordinates per cell
Slide-seq/Slide-seqV2:
- 10μm bead resolution
- Genome-wide profiling
Xenium (10x single-cell spatial):
- Single-cell resolution
- Large gene panels (300+ genes)
- Subcellular resolution
Data loading (Visium):
def load_visium_data(data_dir): """ Load 10x Visium spatial transcriptomics data. Expected structure: data_dir/ ├── filtered_feature_bc_matrix/ │ ├── barcodes.tsv.gz │ ├── features.tsv.gz │ └── matrix.mtx.gz ├── spatial/ │ ├── tissue_positions_list.csv │ ├── scalefactors_json.json │ └── tissue_hires_image.png Returns: AnnData object with spatial coordinates """ import scanpy as sc import pandas as pd # Load expression data adata = sc.read_visium(data_dir) # Spatial coordinates are in adata.obsm['spatial'] # Tissue image in adata.uns['spatial'] return adata
Quality Control:
- Spot-level QC:
def spatial_qc(adata): """ Quality control for spatial transcriptomics data. """ import scanpy as sc # Calculate QC metrics sc.pp.calculate_qc_metrics(adata, inplace=True) # Visualize QC metrics spatially sc.pl.spatial(adata, color='n_genes_by_counts', title='Genes per Spot') sc.pl.spatial(adata, color='total_counts', title='UMI Counts per Spot') # Filter criteria # - Min 200 genes per spot # - Min 500 UMI counts per spot # - Max mitochondrial content < 20% sc.pp.filter_cells(adata, min_genes=200) sc.pp.filter_cells(adata, min_counts=500) # Mitochondrial filtering adata.var['mt'] = adata.var_names.str.startswith('MT-') sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], inplace=True) adata = adata[adata.obs['pct_counts_mt'] < 20].copy() return adata
- Spatial alignment verification:
def verify_spatial_alignment(adata): """ Verify spatial coordinates align with tissue image. """ import matplotlib.pyplot as plt # Plot spots on tissue image fig, ax = plt.subplots(figsize=(10, 10)) # Tissue image img = adata.uns['spatial']['tissue_hires_image'] ax.imshow(img) # Overlay spot coordinates coords = adata.obsm['spatial'] ax.scatter(coords[:, 0], coords[:, 1], c='red', s=1, alpha=0.5) ax.set_title('Spatial Alignment Verification') plt.axis('off')
Phase 2: Preprocessing & Normalization
Objective: Normalize data accounting for spatial heterogeneity.
Normalization:
def normalize_spatial(adata): """ Normalize spatial transcriptomics data. """ import scanpy as sc # Filter genes (min 3 spots) sc.pp.filter_genes(adata, min_cells=3) # Normalize to median total counts sc.pp.normalize_total(adata, target_sum=1e4) # Log-transform sc.pp.log1p(adata) # Store raw counts adata.raw = adata return adata
Highly variable genes:
def select_hvg_spatial(adata): """ Select highly variable genes for spatial analysis. """ import scanpy as sc # Standard HVG selection sc.pp.highly_variable_genes(adata, n_top_genes=2000) # Optionally: weight by spatial autocorrelation # Genes with spatial patterns are more informative return adata
Spatial smoothing (optional):
def spatial_smooth(adata, radius=2): """ Smooth expression by averaging over spatial neighbors. Useful for noisy data, but can blur boundaries. """ from sklearn.neighbors import NearestNeighbors # Find spatial neighbors coords = adata.obsm['spatial'] nn = NearestNeighbors(n_neighbors=radius, metric='euclidean') nn.fit(coords) distances, indices = nn.kneighbors(coords) # Smooth expression matrix X_smooth = adata.X.copy() for i in range(adata.n_obs): neighbors = indices[i] X_smooth[i] = adata.X[neighbors].mean(axis=0) adata.layers['smoothed'] = X_smooth return adata
Phase 3: Spatial Clustering
Objective: Identify spatial domains (regions with distinct expression).
Graph-based clustering with spatial constraints:
def spatial_clustering(adata, n_neighbors=6): """ Cluster spots into spatial domains. Uses both expression similarity AND spatial proximity. """ import scanpy as sc import squidpy as sq # PCA for dimensionality reduction sc.pp.pca(adata, n_comps=50) # Build spatial neighbor graph sq.gr.spatial_neighbors(adata, coord_type='generic', n_neighs=n_neighbors) # Clustering with spatial constraints # Uses both PCA space and spatial graph sc.tl.leiden(adata, resolution=1.0, key_added='spatial_domain') # Visualize domains on tissue sc.pl.spatial(adata, color='spatial_domain', title='Spatial Domains') return adata
Domain marker genes:
def find_domain_markers(adata): """ Identify marker genes for each spatial domain. """ import scanpy as sc # Differential expression per domain sc.tl.rank_genes_groups(adata, groupby='spatial_domain', method='wilcoxon') # Get top markers per domain markers = sc.get.rank_genes_groups_df(adata, group=None) return markers
Phase 4: Spatially Variable Genes
Objective: Find genes with non-random spatial patterns.
Moran's I (spatial autocorrelation):
def identify_spatial_genes(adata): """ Test for spatial autocorrelation using Moran's I. Moran's I > 0: Positive spatial autocorrelation (clustering) Moran's I ~ 0: Random spatial distribution Moran's I < 0: Negative autocorrelation (checkerboard) """ import squidpy as sq # Calculate Moran's I for all genes sq.gr.spatial_autocorr( adata, mode='moran', n_perms=100, n_jobs=-1 ) # Results in adata.uns['moranI'] spatial_genes = adata.uns['moranI'].sort_values('I', ascending=False) # Filter significant spatial genes (FDR < 0.05) sig_spatial = spatial_genes[spatial_genes['pval_norm_fdr_bh'] < 0.05] return sig_spatial
Spatial pattern classification:
def classify_spatial_patterns(adata, spatial_genes): """ Classify types of spatial patterns. Pattern types: - Gradient: Smooth directional change - Hotspot: Localized high expression - Boundary: Expression at domain edges - Periodic: Regular spacing """ patterns = {} for gene in spatial_genes.index[:100]: # Top 100 spatial genes # Get expression and coordinates expr = adata[:, gene].X.toarray().flatten() coords = adata.obsm['spatial'] # Detect pattern type pattern_type = detect_pattern_type(expr, coords) patterns[gene] = pattern_type return patterns
Phase 5: Neighborhood Analysis
Objective: Analyze cell-cell proximity and spatial niches.
Define spatial neighborhoods:
def analyze_neighborhoods(adata, radius=150): """ Analyze spatial neighborhood composition. For each spot, characterize its microenvironment. """ import squidpy as sq # Calculate neighborhood enrichment # Tests if cell types are enriched in proximity sq.gr.nhood_enrichment(adata, cluster_key='spatial_domain') # Visualize neighborhood enrichment sq.pl.nhood_enrichment(adata, cluster_key='spatial_domain') # Results: which domains are spatially proximal? return adata
Interaction zones:
def identify_interaction_zones(adata, domain_a, domain_b): """ Find boundary regions between two spatial domains. These are hotspots for cell-cell interactions. """ # Get spots from each domain spots_a = adata.obs['spatial_domain'] == domain_a spots_b = adata.obs['spatial_domain'] == domain_b # Find spots that neighbor the other domain # (spots from A that have neighbors in B) coords = adata.obsm['spatial'] from sklearn.neighbors import NearestNeighbors nn = NearestNeighbors(n_neighbors=6) nn.fit(coords) distances, indices = nn.kneighbors(coords) interaction_spots = [] for i, spot_in_a in enumerate(spots_a): if spot_in_a: neighbors = indices[i] if any(spots_b[neighbors]): interaction_spots.append(i) # Mark interaction zone adata.obs['interaction_zone'] = False adata.obs.loc[interaction_spots, 'interaction_zone'] = True return adata
Phase 6: Integration with Single-Cell RNA-seq
Objective: Map cell types from scRNA-seq to spatial locations.
Cell type deconvolution:
def deconvolve_cell_types(adata_spatial, adata_sc): """ Predict cell type composition per spatial spot. Uses scRNA-seq reference to deconvolve Visium spots. Methods: Cell2location, Tangram, SPOTlight """ import cell2location # Prepare single-cell reference # Extract signature genes per cell type cell_type_signatures = extract_signatures(adata_sc) # Run cell2location # Estimates cell type abundances per spot mod = cell2location.models.Cell2location( adata_spatial, cell_state_df=cell_type_signatures ) mod.train(max_epochs=30000) # Add cell type proportions to adata_spatial adata_spatial.obsm['cell_type_fractions'] = mod.get_cell_type_fractions() return adata_spatial
Spatial cell type mapping:
def map_cell_types_spatial(adata): """ Visualize cell type spatial distributions. """ import scanpy as sc # For each cell type, plot abundance on tissue cell_types = adata.obsm['cell_type_fractions'].columns for ct in cell_types: sc.pl.spatial( adata, color=adata.obsm['cell_type_fractions'][ct], title=f'{ct} Spatial Distribution' )
Phase 7: Spatial Cell Communication
Objective: Map ligand-receptor interactions in tissue context.
Spatial proximity-based communication:
def spatial_cell_communication(adata): """ Identify cell-cell communication based on spatial proximity. Requires: - Cell type annotations (from deconvolution) - Ligand-receptor database (OmniPath) """ import squidpy as sq from tooluniverse import ToolUniverse tu = ToolUniverse() # Get ligand-receptor pairs from OmniPath lr_pairs = tu.run_one_function({ "name": "OmniPath_get_ligand_receptor_interactions", "arguments": {"partners": ""} # Get all pairs }) # For each cell type pair that are spatially proximal # Calculate interaction scores sq.gr.ligrec( adata, n_perms=100, cluster_key='cell_type', interactions=lr_pairs, copy=False ) # Visualize significant interactions sq.pl.ligrec(adata, cluster_key='cell_type') return adata
Communication hotspot mapping:
def map_communication_hotspots(adata, ligand, receptor): """ Map spatial locations of specific L-R interactions. """ import matplotlib.pyplot as plt # Get ligand expression ligand_expr = adata[:, ligand].X.toarray().flatten() # Get receptor expression receptor_expr = adata[:, receptor].X.toarray().flatten() # Interaction score = ligand × receptor interaction_score = ligand_expr * receptor_expr # Add to adata adata.obs[f'{ligand}_{receptor}_score'] = interaction_score # Visualize on tissue sc.pl.spatial(adata, color=f'{ligand}_{receptor}_score', title=f'{ligand}-{receptor} Interaction Hotspots')
Phase 8: Spatial Report Generation
Generate comprehensive spatial report:
# Spatial Transcriptomics Analysis Report ## Dataset Summary - **Platform**: 10x Visium - **Tissue**: Breast cancer tumor section - **Spots**: 3,562 (after QC filtering) - **Genes**: 18,432 detected - **Resolution**: 55μm spot diameter (~50 cells/spot) ## Quality Control - **Mean genes per spot**: 3,245 - **Mean UMI counts**: 12,543 - **Mitochondrial content**: 8.2% average - **Tissue coverage**: 85% of capture area ## Spatial Domains Identified - **7 distinct spatial domains** detected via graph-based clustering - Domain 1: Tumor core (32% of tissue) - Domain 2: Invasive margin (18%) - Domain 3: Stromal region (25%) - Domain 4: Immune infiltrate (12%) - Domain 5: Necrotic region (8%) - Domain 6: Normal epithelium (3%) - Domain 7: Adipose tissue (2%) ## Top Marker Genes per Domain ### Domain 1 (Tumor Core) - EPCAM, KRT19, MKI67, CCNB1, TOP2A (proliferative tumor) ### Domain 2 (Invasive Margin) - VIM, FN1, MMP2, SNAI2 (EMT signature) ### Domain 4 (Immune Infiltrate) - CD3D, CD8A, CD4, PTPRC (T cell enriched) - CD68, CD14 (macrophage enriched) ## Spatially Variable Genes - **456 genes with significant spatial patterns** (Moran's I, FDR < 0.05) ### Top 10 Spatial Genes 1. **MKI67** (I=0.82) - Hotspot pattern in tumor core 2. **CD8A** (I=0.78) - Gradient from margin to stroma 3. **VIM** (I=0.75) - Boundary enrichment at invasive margin 4. **COL1A1** (I=0.71) - Stromal-specific expression 5. **EPCAM** (I=0.69) - Tumor region pattern ## Cell Type Deconvolution Integration with scRNA-seq reference (Bassez et al. 2021) ### Cell Type Spatial Distributions - **Tumor cells**: Concentrated in core, sparse at margin - **T cells**: Enriched at invasive margin and infiltrate zones - **CAFs**: Stromal region and invasive margin - **Macrophages**: Scattered, enriched near necrosis - **B cells**: Lymphoid aggregates (2% of tissue) ### Tumor Microenvironment Composition - Tumor core: 85% tumor cells, 10% CAFs, 5% immune - Invasive margin: 45% tumor, 30% CAFs, 25% immune (T cell rich) - Immune infiltrate: 70% T cells, 20% macrophages, 10% B cells ## Spatial Cell Communication ### Top L-R Interactions (Spatially Proximal) 1. **Tumor → T cell**: CD274 (PD-L1) → PDCD1 (PD-1) - Hotspot: Invasive margin - Interpretation: Immune checkpoint evasion 2. **CAF → Tumor**: TGFB1 → TGFBR2 - Hotspot: Stromal-tumor interface - Interpretation: TGF-β-driven EMT 3. **Macrophage → Tumor**: TNF → TNFRSF1A - Scattered across tumor - Interpretation: Inflammatory signaling ### Interaction Zones - **Tumor-Immune Interface**: 245 spots (7% of tissue) - High expression: CXCL10, CXCL9 (chemokines) - T cell recruitment and activation - **Stromal-Tumor Interface**: 387 spots (11% of tissue) - High expression: MMP2, MMP9 (matrix remodeling) - Invasion-promoting niche ## Spatial Gradients - **Hypoxia gradient**: HIF1A, VEGFA increase toward tumor core - **Proliferation gradient**: MKI67, TOP2A decrease from core to margin - **Immune gradient**: CD8A, GZMB peak at invasive margin ## Biological Interpretation Spatial analysis reveals distinct tumor microenvironment organization: 1. **Tumor core**: Highly proliferative, hypoxic, immune-excluded 2. **Invasive margin**: Active EMT, high immune infiltration, checkpoint expression 3. **Stromal barrier**: CAF-rich, matrix remodeling, immunosuppressive signals The invasive margin shows hallmarks of immune-tumor interaction with PD-L1/PD-1 checkpoint engagement, suggesting potential for checkpoint blockade therapy. CAF-mediated TGF-β signaling may drive EMT and therapy resistance at tumor-stroma interface. ## Clinical Relevance - **Checkpoint inhibitor response**: High immune infiltration at margin suggests potential - **Resistance mechanisms**: CAF barrier and TGF-β signaling - **Biomarkers**: Spatial arrangement of immune cells more predictive than bulk tumor metrics
Integration with ToolUniverse Skills
| Skill | Used For | Phase |
|---|---|---|
| scRNA-seq reference for deconvolution | Phase 6 |
(Phase 10) | L-R database for communication | Phase 7 |
| Pathway enrichment for spatial domains | Phase 3 |
| Integrate with other omics | Phase 8 |
Example Use Cases
Use Case 1: Tumor Microenvironment Mapping
Question: "Map the spatial organization of tumor, immune, and stromal cells"
Workflow:
- Load Visium data, QC and normalize
- Spatial clustering → 7 domains identified
- Cell type deconvolution using scRNA-seq reference
- Map cell type distributions spatially
- Identify interaction zones (tumor-immune, tumor-stroma)
- Analyze L-R interactions in each zone
- Report: Comprehensive TME spatial architecture
Use Case 2: Developmental Gradient Analysis
Question: "Identify spatial gene expression gradients in developing tissue"
Workflow:
- Load spatial data (e.g., mouse embryo)
- Identify spatially variable genes
- Classify gradient patterns (anterior-posterior, dorsal-ventral)
- Map morphogen expression (WNT, BMP, FGF)
- Correlate with cell fate markers
- Report: Developmental spatial patterns
Use Case 3: Brain Region Identification
Question: "Automatically segment brain tissue into anatomical regions"
Workflow:
- Load Visium mouse brain data
- Spatial clustering with high resolution
- Match domains to known brain regions (cortex, hippocampus, etc.)
- Identify region-specific marker genes
- Validate with Allen Brain Atlas
- Report: Automated brain region annotation
Quantified Minimums
| Component | Requirement |
|---|---|
| Spots/cells | At least 500 spatial locations |
| QC | Filter low-quality spots, verify alignment |
| Spatial clustering | At least one method (graph-based or spatial) |
| Spatial genes | Moran's I or similar spatial test |
| Visualization | Spatial plots on tissue images |
| Report | Domains, spatial genes, visualizations |
Limitations
- Resolution: Visium spots contain multiple cells (not single-cell)
- Gene coverage: Imaging methods have limited gene panels
- 3D structure: Most platforms are 2D sections
- Tissue quality: Requires well-preserved tissue for imaging
- Computational: Large datasets require significant memory
- Reference dependency: Deconvolution quality depends on scRNA-seq reference
References
Methods:
- Squidpy: https://doi.org/10.1038/s41592-021-01358-2
- Cell2location: https://doi.org/10.1038/s41587-021-01139-4
- SpatialDE: https://doi.org/10.1038/nmeth.4636
Platforms: