LLMs-Universal-Life-Science-and-Clinical-Skills- spatial-de
Install
Source: Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code: Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Spatial_Omics/spatial-de" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-spatial-de && rm -rf "$T"
Manifest: Skills/Spatial_Omics/spatial-de/SKILL.md
🧬 Spatial DE
You are Spatial DE, the differential expression and marker gene discovery skill for OmicsClaw. Your role is to identify differentially expressed genes between spatial clusters or user-defined groups, producing ranked marker gene tables, dot plots, and volcano plots.
Why This Exists
- Without it: Users manually run `sc.tl.rank_genes_groups` with inconsistent parameters and no structured output
- With it: One command discovers markers per cluster or between two groups, with publication-ready figures and reproducible reports
- Why OmicsClaw: Standardised DE ensures consistent methodology across spatial analysis pipelines
Core Capabilities
- Cluster-vs-rest markers: Rank genes per cluster using Wilcoxon, t-test, or PyDESeq2
- Two-group comparison: Compare any two groups within a groupby column
- Multiple methods: Wilcoxon (default, non-parametric), t-test (parametric, fast), PyDESeq2 (pseudobulk negative binomial model; the most rigorous option when sample-level replicates exist)
- Dot plot: Top marker genes per cluster
- Volcano plot: Log2 fold-change vs. −log10 adjusted p-value for two-group comparisons
- Marker table: CSV of top N markers per cluster with scores, p-values, and log fold-changes
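A dot plot summarises two statistics per group and gene: the fraction of cells expressing the gene (dot size) and the mean expression (dot colour). The sketch below computes both with pandas, assuming a cells × genes DataFrame of normalised expression and a per-cell group label. `dotplot_stats` is a hypothetical helper, not part of the skill; it is meant to mirror scanpy's `sc.pl.dotplot` defaults as documented for `expression_cutoff` (values above 0 count as expressed) and `mean_only_expressed` (the mean is taken over all cells in the group).

```python
import pandas as pd


def dotplot_stats(expr: pd.DataFrame, groups: pd.Series, genes: list):
    """Return (fraction expressing, mean expression) as groups x genes tables.

    expr:   cells x genes DataFrame of normalised expression (assumed layout).
    groups: per-cell group labels, indexed like expr.
    """
    sub = expr.loc[:, genes]
    # Dot size: share of cells in each group with expression above 0.
    frac = (sub > 0).astype(float).groupby(groups).mean()
    # Dot colour: mean expression over all cells in the group, expressing or not.
    mean = sub.groupby(groups).mean()
    return frac, mean


expr = pd.DataFrame(
    {"g1": [0.0, 2.0, 1.0, 3.0], "g2": [1.0, 0.0, 0.0, 0.0]},
    index=["c1", "c2", "c3", "c4"],
)
groups = pd.Series(["a", "a", "b", "b"], index=expr.index, name="cluster")
frac, mean = dotplot_stats(expr, groups, ["g1", "g2"])
print(frac)
print(mean)
```

Averaging over all cells (rather than only expressing ones) keeps colour and size as separate signals: a gene expressed strongly in a few cells shows as a small dot with moderate colour.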
Input Formats
| Format | Extension | Required | Example |
|---|---|---|---|
| Preprocessed AnnData | `.h5ad` | Normalised, with clusters in `adata.obs` | `processed.h5ad` |
| Demo | n/a | `--demo` flag | Runs spatial-preprocess demo first |
Workflow
- Load: Read preprocessed h5ad (output of spatial-preprocess)
- Validate: Ensure the groupby column exists; fall back to minimal preprocessing if it is missing
- Rank genes: `sc.tl.rank_genes_groups(adata, groupby, method=method)` for cluster-vs-rest
- Two-group (optional): If `--group1` and `--group2` are provided, run a pairwise comparison
- Tables: Extract the top N markers per group to `markers_top.csv` and the full results to `de_full.csv`
- Figures: Dot plot of top markers; volcano plot in two-group mode
- Report: Write report.md, result.json, processed.h5ad, figures, reproducibility bundle
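The table step can be sketched with pandas, assuming a long-format DE table with the column names that `sc.get.rank_genes_groups_df` uses (`group`, `names`, `scores`, `logfoldchanges`, `pvals_adj`). The helper `top_n_markers` is hypothetical; scanpy already returns genes sorted by score within each group, so the explicit sort here is defensive.

```python
import pandas as pd


def top_n_markers(de: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Keep the n highest-scoring genes per group from a long-format DE table."""
    ranked = de.sort_values(["group", "scores"], ascending=[True, False])
    # groupby(...).head(n) keeps the first n rows of each group, in sorted order.
    return ranked.groupby("group", sort=False).head(n).reset_index(drop=True)


de = pd.DataFrame({
    "group": ["0", "0", "0", "1", "1"],
    "names": ["A", "B", "C", "D", "E"],
    "scores": [5.0, 3.0, 9.0, 1.0, 2.0],
    "logfoldchanges": [1.2, 0.4, 2.5, 0.1, 0.3],
    "pvals_adj": [1e-4, 0.2, 1e-9, 0.9, 0.5],
})
top = top_n_markers(de, n=2)
print(top[["group", "names", "scores"]])
```

Writing `top` with `to_csv(..., index=False)` gives a flat per-group marker table; the full, unfiltered `de` frame corresponds to the complete results table.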
CLI Reference
```bash
# Cluster-vs-rest markers (default: Wilcoxon)
python skills/spatial-de/spatial_de.py \
  --input <processed.h5ad> --output <report_dir>

# Two-group comparison
python skills/spatial-de/spatial_de.py \
  --input <processed.h5ad> --output <dir> --group1 0 --group2 1

# Use t-test method
python skills/spatial-de/spatial_de.py \
  --input <file> --method t-test --output <dir>

# Use PyDESeq2 for pseudobulk DE
python skills/spatial-de/spatial_de.py \
  --input <file> --method pydeseq2 --group1 0 --group2 1 --output <dir>

# Demo mode
python skills/spatial-de/spatial_de.py --demo --output /tmp/de_demo

# Via OmicsClaw runner
python omicsclaw.py run spatial-de --input <file> --output <dir>
python omicsclaw.py run spatial-de --demo
```
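How the documented flags select an analysis mode can be sketched with `argparse`. This is a hypothetical reconstruction, not the skill's actual parser: the flag names come from the CLI examples, while the `wilcoxon` choice string, the rule that `--group1` and `--group2` must be given together, and the requirement for `--input` unless `--demo` is set are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the flags shown in the CLI examples.
    p = argparse.ArgumentParser(prog="spatial_de.py")
    p.add_argument("--input", help="preprocessed .h5ad file")
    p.add_argument("--output", help="report directory")
    p.add_argument("--method", default="wilcoxon",
                   choices=["wilcoxon", "t-test", "pydeseq2"])
    p.add_argument("--group1", help="first group (kept as a string label)")
    p.add_argument("--group2", help="reference group")
    p.add_argument("--demo", action="store_true")
    return p


def parse_cli(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed rules; parser.error() prints a message and raises SystemExit.
    if not args.demo and args.input is None:
        parser.error("--input is required unless --demo is given")
    if (args.group1 is None) != (args.group2 is None):
        parser.error("--group1 and --group2 must be given together")
    args.mode = "two-group" if args.group1 is not None else "cluster-vs-rest"
    return args


args = parse_cli(["--input", "processed.h5ad", "--output", "out",
                  "--group1", "0", "--group2", "1"])
print(args.mode, args.method)  # two-group wilcoxon
```

Group labels stay strings because cluster labels in `adata.obs` are typically stored as categorical strings, so `--group1 0` should match the label `"0"` rather than the integer 0.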
Algorithm / Methodology
Wilcoxon (default)
- Cluster-vs-rest: `sc.tl.rank_genes_groups(adata, groupby=groupby, method='wilcoxon')`
- Non-parametric: Robust to non-normal distributions
- Scalable: Suitable for large datasets, although slower than the t-test
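The cluster-vs-rest Wilcoxon rank-sum test for a single gene can be sketched from scratch with NumPy and pandas: rank the pooled values, compare the cluster's rank sum with its null expectation, and convert the standardised statistic to a two-sided p-value via the normal approximation. This is an illustrative re-implementation, not scanpy's code; scanpy vectorises the same idea across genes and exposes tie correction through its `tie_correct` parameter (off by default), so the correction is a switch here too.

```python
import math

import numpy as np
import pandas as pd


def rank_sum_test(x, y, tie_correct=False):
    """Two-sided Wilcoxon rank-sum test (normal approximation) of x vs y.

    x: expression of one gene in the cluster; y: the same gene in all other cells.
    Returns (z, p). Positive z means x tends to rank higher than y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.size, y.size
    n = n1 + n2
    pooled = np.concatenate([x, y])
    # Average ranks: tied values share the mean of the ranks they span.
    ranks = pd.Series(pooled).rank(method="average").to_numpy()
    r1 = ranks[:n1].sum()
    expected = n1 * (n + 1) / 2.0
    tie_term = 0.0
    if tie_correct:
        _, counts = np.unique(pooled, return_counts=True)
        tie_term = float(np.sum(counts.astype(float) ** 3 - counts))
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:  # every value identical: no evidence either way
        return 0.0, 1.0
    z = (r1 - expected) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


z, p = rank_sum_test([3.0, 4.0, 5.0], [1.0, 2.0, 3.0], tie_correct=True)
print(z, p)
```

Because only ranks enter the statistic, a handful of extreme values cannot dominate it, which is where the robustness to non-normal expression comes from.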
t-test
- Parametric: `sc.tl.rank_genes_groups(adata, groupby=groupby, method='t-test')`
- Welch's t-test: Assumes normality, faster than Wilcoxon
- Use case: Quick exploratory analysis
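Welch's t-test divides the difference in group means by a standard error built from each group's own variance, t = (mean_x − mean_y) / sqrt(s_x²/n_x + s_y²/n_y), and uses the Welch-Satterthwaite approximation for the degrees of freedom. A minimal NumPy sketch for one gene (the function name is illustrative, not part of the skill or scanpy):

```python
import numpy as np


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for x vs y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Squared standard error of each mean, from unbiased (ddof=1) variances.
    v1 = x.var(ddof=1) / x.size
    v2 = y.var(ddof=1) / y.size
    t = (x.mean() - y.mean()) / np.sqrt(v1 + v2)
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (x.size - 1) + v2 ** 2 / (y.size - 1))
    return float(t), float(dof)


t, dof = welch_t([3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
print(t, dof)  # about 2.449 (sqrt(6)) and 4.0, up to floating-point rounding
```

The two-sided p-value is then `2 * scipy.stats.t.sf(abs(t), dof)`; `scipy.stats.ttest_ind(x, y, equal_var=False)` runs the same test in one call.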
PyDESeq2
- Pseudobulk: Aggregates counts per sample/replicate
- Negative binomial GLM: Widely used standard model for bulk RNA-seq DE
- Requires: Sample-level replicates for proper statistical modeling
- Use case: Publication-quality DE with proper dispersion estimation
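The pseudobulk step can be sketched in pandas: sum raw counts over all cells that share a sample and a group, giving one profile per sample-group pair plus a metadata table. This is the kind of count matrix and design metadata a negative binomial tool such as PyDESeq2 takes as input; the helper name, the `sample`/`group` column names, and the `n_cells` column are assumptions, and the sketch stops before any PyDESeq2 call. Summation needs raw integer counts, not normalised values.

```python
import pandas as pd


def pseudobulk(counts, obs, sample_col="sample", group_col="group"):
    """Sum raw counts per (sample, group) pair.

    counts: cells x genes DataFrame of raw integer counts.
    obs:    per-cell metadata indexed like counts (column names are assumptions).
    Returns (pseudobulk counts, metadata), both indexed by "<sample>_<group>".
    """
    keys = [obs.loc[counts.index, sample_col], obs.loc[counts.index, group_col]]
    grouped = counts.groupby(keys)
    summed = grouped.sum()
    meta = summed.index.to_frame(index=False)
    meta.columns = [sample_col, group_col]
    # Number of cells behind each profile; very small pools are often filtered out.
    meta["n_cells"] = grouped.size().to_numpy()
    labels = [f"{s}_{g}" for s, g in summed.index]
    summed.index = labels
    meta.index = labels
    return summed, meta


counts = pd.DataFrame({"g1": [1, 2, 3, 4], "g2": [0, 1, 0, 5]},
                      index=["c1", "c2", "c3", "c4"])
obs = pd.DataFrame({"sample": ["A", "A", "A", "B"], "group": ["0", "0", "1", "1"]},
                   index=counts.index)
pb, meta = pseudobulk(counts, obs)
print(pb)
print(meta)
```

Testing at this level treats each sample, not each cell, as a replicate, which is why the method needs sample-level replicates to estimate dispersion.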
Common steps
- Two-group comparison: `sc.tl.rank_genes_groups(adata, groupby=groupby, groups=[group1], reference=group2, method=method)`
- Marker extraction: `sc.get.rank_genes_groups_df` to produce structured DataFrames
- Volcano plot: x-axis = log2 fold-change (`logfoldchanges`), y-axis = −log10(adjusted p-value)
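The volcano coordinates follow directly from the DE table: x is the log2 fold-change and y is −log10 of the adjusted p-value. The sketch below also shows Benjamini-Hochberg adjustment (scanpy's default `corr_method` for `pvals_adj`) and floors adjusted p-values of exactly 0 so their height stays finite. Column names follow `sc.get.rank_genes_groups_df`; the helper names and the highlight thresholds (|log2FC| ≥ 1, adjusted p < 0.05) are illustrative assumptions, not the skill's settings.

```python
import numpy as np
import pandas as pd


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: each value becomes the minimum over all larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def volcano_coords(de, lfc_thresh=1.0, padj_thresh=0.05):
    """x = log2 fold-change, y = -log10(adjusted p), plus a highlight flag."""
    padj = de["pvals_adj"].to_numpy(dtype=float)
    # Floor at the smallest positive double so padj == 0 maps to a finite height.
    y = -np.log10(np.clip(padj, np.finfo(float).tiny, 1.0))
    x = de["logfoldchanges"].to_numpy(dtype=float)
    return pd.DataFrame(
        {"names": de["names"].to_numpy(), "x": x, "y": y,
         "significant": (np.abs(x) >= lfc_thresh) & (padj < padj_thresh)},
        index=de.index,
    )


de = pd.DataFrame({"names": ["A", "B", "C", "D"],
                   "logfoldchanges": [2.0, -0.5, -1.5, 0.2],
                   "pvals": [0.01, 0.04, 0.03, 0.2]})
de["pvals_adj"] = bh_adjust(de["pvals"])
vc = volcano_coords(de)
print(vc)
```

Gene C shows why the adjusted value matters: its raw p-value of 0.03 passes 0.05, but after BH adjustment (about 0.053) it is no longer highlighted.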
Example Queries
- "Find marker genes for all my spatial clusters"
- "Identify differentially expressed genes between cluster 1 and cluster 3"
Parameters
| Parameter | Default | Description |
|---|---|---|
| | | Column in `adata.obs` to group by |
| `--method` | `wilcoxon` | Statistical test: `wilcoxon`, `t-test`, or `pydeseq2` |
| | | Number of top markers per group |
| `--group1` | (none) | First group for pairwise comparison |
| `--group2` | (none) | Second group (reference) for pairwise comparison |
Output Structure
```
output_dir/
├── report.md
├── result.json
├── processed.h5ad
├── figures/
│   ├── marker_dotplot.png
│   └── de_volcano.png        (only if --group1/--group2)
├── tables/
│   ├── markers_top.csv
│   └── de_full.csv
└── reproducibility/
    ├── commands.sh
    ├── environment.yml
    └── checksums.sha256
```
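A `checksums.sha256` file like the one in the reproducibility bundle can be produced with the standard library. This sketch assumes the common `sha256sum` line format (hex digest, two spaces, path relative to `output_dir/`) and skips the checksum file itself; the function name and the skip rule are hypothetical, not taken from the skill.

```python
import hashlib
from pathlib import Path


def sha256_lines(root, skip=("reproducibility/checksums.sha256",)):
    """Return 'digest  relative/path' lines for every file under root."""
    root = Path(root)
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if rel in skip:
            continue
        digest = hashlib.sha256()
        with path.open("rb") as fh:
            # Hash in 1 MiB chunks so large .h5ad files are not read into memory at once.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        lines.append(f"{digest.hexdigest()}  {rel}")
    return lines
```

With relative paths in that format, running `sha256sum -c reproducibility/checksums.sha256` from inside `output_dir/` verifies every listed file.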
Dependencies
Required: scanpy >= 1.9, anndata >= 0.11, matplotlib, numpy, pandas
Optional:
- `pydeseq2`: PyDESeq2 pseudobulk differential expression
Safety
- Local-first: Strict offline processing without external upload.
- Disclaimer: Outputs follow OmicsClaw reporting structures and include its standard disclaimers.
- Audit trail: All parameters and workflow steps are logged.
Integration with Orchestrator
Trigger conditions:
- Invoked automatically when tool metadata matches the user's intent.
- Keywords: differential expression, marker gene, DE, Wilcoxon, group comparison
Chaining: Expects `processed.h5ad` from spatial-preprocess as input. Demo mode runs spatial-preprocess automatically.
Citations
- Scanpy — analysis framework
- Wilcoxon rank-sum test — non-parametric test
- Leiden algorithm — community detection (for cluster labels)