LLMs-Universal-Life-Science-and-Clinical-Skills- spatial-de

install

source · Clone the upstream repo

git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Spatial_Omics/spatial-de" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-spatial-de && rm -rf "$T"

manifest: Skills/Spatial_Omics/spatial-de/SKILL.md

🧬 Spatial DE

You are Spatial DE, the differential expression and marker gene discovery skill for OmicsClaw. Your role is to identify differentially expressed genes between spatial clusters or user-defined groups, producing ranked marker gene tables, dot plots, and volcano plots.

Why This Exists

Without it: Users manually run `sc.tl.rank_genes_groups` with inconsistent parameters and no structured output
With it: One command discovers markers per cluster or between two groups, with publication-ready figures and reproducible reports
Why OmicsClaw: Standardised DE ensures consistent methodology across spatial analysis pipelines

Core Capabilities

Cluster-vs-rest markers: Rank genes per cluster using Wilcoxon, t-test, or PyDESeq2
Two-group comparison: Compare any two groups within a groupby column
Multiple methods: Wilcoxon (default, non-parametric), t-test (parametric, fast), PyDESeq2 (pseudobulk, gold standard)
Dot plot: Top marker genes per cluster
Volcano plot: Log2 fold-change vs. −log10 adjusted p-value for two-group comparisons
Marker table: CSV of top N markers per cluster with scores, p-values, and log fold-changes

Input Formats

Format	Extension	Requirements	Example / notes
Preprocessed AnnData	`.h5ad`	Normalised, with clusters in `.obs`	`processed.h5ad`
Demo	n/a	`--demo` flag	Runs spatial-preprocess demo first

Workflow

Load: Read preprocessed h5ad (output of spatial-preprocess)
Validate: Ensure the groupby column exists in `adata.obs`; fall back to minimal preprocessing if it is missing

Rank genes: `sc.tl.rank_genes_groups(adata, groupby=groupby, method=method)` for cluster-vs-rest

Two-group (optional): If `--group1` and `--group2` are provided, run a pairwise comparison
Tables: Extract top N markers per group to `markers_top.csv`; full results to `de_full.csv`
Figures: Dot plot of top markers; volcano plot if two-group mode
Report: Write report.md, result.json, processed.h5ad, figures, reproducibility bundle

CLI Reference

# Cluster-vs-rest markers (default: Wilcoxon)
python skills/spatial-de/spatial_de.py \
  --input <processed.h5ad> --output <report_dir>

# Two-group comparison
python skills/spatial-de/spatial_de.py \
  --input <processed.h5ad> --output <dir> --group1 0 --group2 1

# Use t-test method
python skills/spatial-de/spatial_de.py \
  --input <file> --method t-test --output <dir>

# Use PyDESeq2 for pseudobulk DE
python skills/spatial-de/spatial_de.py \
  --input <file> --method pydeseq2 --group1 0 --group2 1 --output <dir>

# Demo mode
python skills/spatial-de/spatial_de.py --demo --output /tmp/de_demo

# Via OmicsClaw runner
python omicsclaw.py run spatial-de --input <file> --output <dir>
python omicsclaw.py run spatial-de --demo

Algorithm / Methodology

Wilcoxon (default)

Cluster-vs-rest: `sc.tl.rank_genes_groups(adata, groupby=groupby, method='wilcoxon')`

Non-parametric: Robust to non-normal distributions
Fast: Suitable for large datasets
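
To make the "non-parametric" point concrete, here is a small standard-library sketch of the rank-sum z-score for one gene, comparing one cluster against the rest. It uses the plain normal approximation with no tie or continuity correction. It is an illustration of the statistic, not Scanpy's implementation, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over every value tied with the one at `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_z(group, rest):
    """Z-score of the rank sum of `group` under the normal approximation."""
    n1, n2 = len(group), len(rest)
    ranks = average_ranks(list(group) + list(rest))
    rank_sum = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (rank_sum - expected) / std


z = rank_sum_z([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
print(round(z, 3))  # 1.964
```

Only the ordering of the expression values enters the statistic, which is why the test does not depend on the values being normally distributed.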

t-test

Parametric: `sc.tl.rank_genes_groups(adata, groupby=groupby, method='t-test')`

Welch's t-test: Assumes normality, faster than Wilcoxon
Use case: Quick exploratory analysis
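
For reference, Welch's t statistic for a single gene can be computed by hand as below. This is a standard-library sketch of the textbook formula (unpooled variances, Welch-Satterthwaite degrees of freedom). The function name `welch_t` is hypothetical, and Scanpy's own computation may differ in detail.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n1, n2 = len(a), len(b)
    # Squared standard error of each group mean; variances are not pooled.
    se1 = statistics.variance(a) / n1
    se2 = statistics.variance(b) / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


t, df = welch_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(round(t, 4), round(df, 4))  # 1.5492 2.9412
```

Because the statistic needs only per-group means and variances, it is cheap to compute across many genes, which fits the "quick exploratory analysis" use case.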

PyDESeq2

Pseudobulk: Aggregates counts per sample/replicate
Negative binomial GLM: Gold standard for RNA-seq DE
Requires: Sample-level replicates for proper statistical modeling
Use case: Publication-quality DE with proper dispersion estimation
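
The pseudobulk aggregation step can be illustrated with pandas alone. The sketch below assumes raw counts are available as a spots × genes DataFrame and that `obs` carries a sample column alongside the cluster column. The function name `pseudobulk` and the column name `sample` are assumptions for the example, and the PyDESeq2 model fit itself is not shown.

```python
import pandas as pd


def pseudobulk(counts, obs, sample_col, group_col):
    """Sum raw counts over all spots that share a (sample, group) pair."""
    keys = obs.loc[counts.index, [sample_col, group_col]]
    return counts.groupby([keys[sample_col], keys[group_col]]).sum()


spots = [f"spot_{i}" for i in range(6)]
counts = pd.DataFrame(
    {"geneA": [1, 2, 3, 4, 5, 6], "geneB": [0, 1, 0, 1, 0, 1]}, index=spots
)
obs = pd.DataFrame(
    {
        "sample": ["s1", "s1", "s1", "s2", "s2", "s2"],
        "leiden": ["0", "0", "1", "0", "1", "1"],
    },
    index=spots,
)

pb = pseudobulk(counts, obs, sample_col="sample", group_col="leiden")
print(pb)
```

Each row of `pb` is one sample-by-cluster profile. With only one sample per group there are no replicates to estimate dispersion from, which is why the method requires sample-level replication.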

Common steps

Two-group comparison: `sc.tl.rank_genes_groups(adata, groupby=groupby, groups=[group1], reference=group2, method=method)`
Marker extraction: `sc.get.rank_genes_groups_df` to produce structured DataFrames
Volcano plot: x-axis = log2 fold-change (`logfoldchanges`), y-axis = −log10(adjusted p-value)
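
As a sketch of how the volcano coordinates might be derived from a table shaped like the output of `sc.get.rank_genes_groups_df`: the cutoffs (`lfc_cutoff=1.0`, `padj_cutoff=0.05`), the `status` labels, and the function name are illustrative assumptions, not values taken from the skill.

```python
import numpy as np
import pandas as pd


def volcano_coordinates(de, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Add a -log10(adjusted p) column and an up/down/ns label per gene."""
    out = de.copy()
    # Clip p-values of exactly zero so that -log10 stays finite.
    padj = out["pvals_adj"].clip(lower=np.finfo(float).tiny)
    out["neg_log10_padj"] = -np.log10(padj)
    significant = (out["pvals_adj"] < padj_cutoff) & (
        out["logfoldchanges"].abs() >= lfc_cutoff
    )
    direction = np.where(out["logfoldchanges"] > 0, "up", "down")
    out["status"] = np.where(significant, direction, "ns")
    return out


de = pd.DataFrame(
    {
        "names": ["a", "b", "c", "d"],
        "logfoldchanges": [2.5, -1.8, 0.3, 3.0],
        "pvals_adj": [1e-6, 0.01, 0.5, 0.0],
    }
)
coords = volcano_coordinates(de)
print(coords[["names", "logfoldchanges", "neg_log10_padj", "status"]])
```

Plotting `logfoldchanges` against `neg_log10_padj`, coloured by `status`, would then give the volcano layout described above.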

Example Queries

"Find marker genes for all my spatial clusters"
"Identify differentially expressed genes between cluster 1 and cluster 3"

Parameters

Parameter	Default	Description
`--groupby`	`leiden`	Column in `adata.obs` to group by
`--method`	`wilcoxon`	Statistical test: `wilcoxon` , `t-test` , or `pydeseq2`
`--n-top-genes`	`10`	Number of top markers per group
`--group1`	(none)	First group for pairwise comparison
`--group2`	(none)	Second group (reference) for pairwise comparison
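
A parser matching the table could look like the following `argparse` sketch. It is a hypothetical reconstruction for illustration: the flag names and defaults come from the table and the CLI reference, while the help strings, the handling of `--input`, `--output`, and `--demo`, and the `is_two_group` helper are assumptions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Hypothetical re-creation of the spatial-de CLI"
    )
    parser.add_argument("--input", help="Preprocessed .h5ad file")
    parser.add_argument("--output", help="Report directory")
    parser.add_argument("--demo", action="store_true", help="Run on demo data")
    parser.add_argument("--groupby", default="leiden",
                        help="Column in adata.obs to group by")
    parser.add_argument("--method", default="wilcoxon",
                        choices=["wilcoxon", "t-test", "pydeseq2"])
    parser.add_argument("--n-top-genes", type=int, default=10,
                        help="Number of top markers per group")
    parser.add_argument("--group1", default=None, help="First group")
    parser.add_argument("--group2", default=None, help="Second (reference) group")
    return parser


def is_two_group(args):
    """Pairwise mode is selected only when both groups are given."""
    return args.group1 is not None and args.group2 is not None


args = build_parser().parse_args(
    ["--input", "processed.h5ad", "--output", "out", "--group1", "0", "--group2", "1"]
)
print(args.groupby, args.method, args.n_top_genes, is_two_group(args))
# leiden wilcoxon 10 True
```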

Output Structure

output_dir/
├── report.md
├── result.json
├── processed.h5ad
├── figures/
│   ├── marker_dotplot.png
│   └── de_volcano.png          (only if --group1/--group2)
├── tables/
│   ├── markers_top.csv
│   └── de_full.csv
└── reproducibility/
    ├── commands.sh
    ├── environment.yml
    └── checksums.sha256
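
One plausible way to produce a file like `reproducibility/checksums.sha256` is sketched below with the standard library. The line format (hex digest, two spaces, relative path) follows the `sha256sum` convention and is an assumption here, as is the function name. The skill's actual format is not documented in this file.

```python
import hashlib
import tempfile
from pathlib import Path


def write_checksums(output_dir):
    """Hash every file under output_dir and write a sha256sum-style manifest."""
    output_dir = Path(output_dir)
    manifest = output_dir / "reproducibility" / "checksums.sha256"
    manifest.parent.mkdir(parents=True, exist_ok=True)
    lines = []
    for path in sorted(output_dir.rglob("*")):
        # Skip directories and the manifest itself.
        if path.is_file() and path != manifest:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.relative_to(output_dir).as_posix()}")
    manifest.write_text("\n".join(lines) + "\n")
    return lines


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "tables").mkdir()
    (Path(tmp) / "tables" / "markers_top.csv").write_bytes(b"names,scores\n")
    (Path(tmp) / "report.md").write_bytes(b"# Report\n")
    entries = write_checksums(tmp)
    print("\n".join(entries))
```

A manifest in this format can be checked later with `sha256sum -c checksums.sha256` run from the output directory.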

Dependencies

Required: scanpy >= 1.9, anndata >= 0.11, matplotlib, numpy, pandas

Optional: `pydeseq2` (PyDESeq2 pseudobulk differential expression)

Safety

Local-first: All processing runs offline; no data is uploaded to external services.
Disclaimer: Reports must follow the OmicsClaw reporting structure and include its disclaimers.
Audit trail: Hyperparameters and each workflow step are logged in full.

Integration with Orchestrator

Trigger conditions:

Invoked automatically when the tool metadata matches the user's intent.
Keywords: differential expression, marker gene, DE, Wilcoxon, group comparison

Chaining: Expects `processed.h5ad` from spatial-preprocess as input. Demo mode runs spatial-preprocess automatically.

Citations

Scanpy — analysis framework
Wilcoxon rank-sum test — non-parametric test
Leiden algorithm — community detection (for cluster labels)