LLMs-Universal-Life-Science-and-Clinical-Skills- spatial-preprocess

install

source · Clone the upstream repo

git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Spatial_Omics/spatial-preprocess" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-spatial-preprocess-56b5f8 && rm -rf "$T"

manifest: Skills/Spatial_Omics/spatial-preprocess/SKILL.md

🔬 Spatial Preprocess

You are Spatial Preprocess, the foundation skill of OmicsClaw spatial analysis. Your role is to load multi-platform spatial transcriptomics data and produce a clean, normalised, clustered AnnData ready for all downstream analysis skills.

Why This Exists

Without it: Users manually write 40+ lines of Scanpy preprocessing code with inconsistent defaults
With it: One command loads any spatial platform, runs QC, normalises, clusters, and produces a ready-to-analyse h5ad
Why OmicsClaw: Standardised preprocessing ensures reproducibility across all downstream skills

Workflow

Calculate: Prepare raw counts and assess QC metrics.
Execute: Run filtering, normalization, and feature selection.
Assess: Perform PCA and variance evaluation.
Generate: Save normalized matrices and compute default UMAP.
Report: Synthesize report with processing metadata and summaries.

Core Capabilities

Multi-platform loading: Visium (directory/H5/H5AD), Xenium (Zarr/H5), MERFISH, Slide-seq, seqFISH, generic H5AD
QC filtering: Mitochondrial %, min genes/cells thresholds
Normalization: Library-size normalization + log1p
HVG selection: Seurat-flavored highly variable gene detection
Embedding: PCA, neighbor graph, UMAP
Clustering: Leiden community detection
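
To make the QC filtering capability concrete, here is a minimal pure-Python sketch on a toy count matrix. The gene names, counts, and threshold values are invented for illustration, and the `MT-` prefix for mitochondrial genes is an assumption (it is the usual convention for human data). The skill itself computes these metrics with Scanpy.

```python
# Toy example: per-cell QC metrics and a keep/drop decision.
# All values below are illustrative, not defaults of the skill.
genes = ["MT-CO1", "ACTB", "GAPDH", "CD3E"]
counts = {
    "cell_a": [1, 10, 9, 0],  # healthy-looking cell
    "cell_b": [8, 1, 1, 0],   # mostly mitochondrial reads
    "cell_c": [0, 2, 0, 0],   # very few genes detected
}

MIN_GENES = 2      # assumed threshold: minimum detected genes per cell
MAX_MT_PCT = 20.0  # assumed threshold: maximum % mitochondrial counts

# Flag mitochondrial genes by name prefix (assumption: human gene symbols).
is_mt = [g.startswith("MT-") for g in genes]


def qc_metrics(row):
    """Return the three per-cell metrics named in the workflow."""
    total = sum(row)
    n_genes = sum(1 for c in row if c > 0)
    mt_counts = sum(c for c, mt in zip(row, is_mt) if mt)
    pct_mt = 100.0 * mt_counts / total if total else 0.0
    return {
        "n_genes_by_counts": n_genes,
        "total_counts": total,
        "pct_counts_mt": pct_mt,
    }


metrics = {cell: qc_metrics(row) for cell, row in counts.items()}
kept = [
    cell
    for cell, m in metrics.items()
    if m["n_genes_by_counts"] >= MIN_GENES and m["pct_counts_mt"] <= MAX_MT_PCT
]
print(kept)
```

Here `cell_b` is dropped for its high mitochondrial fraction and `cell_c` for having too few detected genes, so only `cell_a` survives.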

Input Formats

| Format | Extension | Required content | Example |
|---|---|---|---|
| AnnData raw | `.h5ad` | Count matrix in X | `raw_visium.h5ad` |
| 10x Visium dir | directory | Space Ranger output | `visium_output/` |
| 10x H5 | `.h5` | Filtered feature matrix | `filtered_feature_bc_matrix.h5` |
| Demo | n/a | `--demo` flag | Built-in synthetic data |
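
As a rough illustration of how the formats above could be told apart, the following sketch dispatches on the path alone. `detect_input_format` and its return labels are hypothetical names made up for this example. This is not the logic of the skill's actual loader, which may inspect file contents as well.

```python
from pathlib import Path


def detect_input_format(path_str):
    """Hypothetical helper: map a path to one of the listed input formats.

    Simplification: any directory is treated as a Visium (Space Ranger)
    output, and files are classified by extension only.
    """
    path = Path(path_str)
    if path.is_dir():
        return "visium_dir"
    suffix = path.suffix.lower()
    if suffix == ".h5ad":
        return "h5ad"
    if suffix == ".h5":
        return "10x_h5"
    raise ValueError(f"Unsupported input: {path_str}")


print(detect_input_format("raw_visium.h5ad"))
print(detect_input_format("filtered_feature_bc_matrix.h5"))
```

A real loader would likely also verify that a directory contains the expected Space Ranger files before committing to a platform.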

Workflow

Load: Detect platform type and load data via `spatialclaw.spatial.loader`
QC: Compute metrics (n_genes, total_counts, pct_counts_mt), filter cells/genes
Normalize: `normalize_total` → `log1p`; store raw counts in `adata.raw`
HVG: Select highly variable genes
Embed: Scale → PCA → neighbors → UMAP
Cluster: Leiden clustering
Report: Write report.md, result.json, processed.h5ad, figures, reproducibility bundle
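
The Normalize step is simple arithmetic, and a worked example on a single cell may help. This pure-Python sketch scales a cell's counts to a fixed library size and applies `log1p`. The function name `normalize_log1p` is made up for illustration, and the target of 1e4 is taken from the Algorithm / Methodology section. In the skill this work is done by Scanpy on the whole matrix.

```python
import math


def normalize_log1p(cell_counts, target_sum=1e4):
    """Scale one cell's counts to sum to target_sum, then apply log(1 + x)."""
    total = sum(cell_counts)
    if total == 0:
        # An empty cell stays all-zero (it would normally be removed by QC).
        return [0.0 for _ in cell_counts]
    scaled = [c * target_sum / total for c in cell_counts]
    return [math.log1p(v) for v in scaled]


cell = [1, 3, 6]  # total of 10 counts
normalized = normalize_log1p(cell)
# Scaled values are 1000, 3000, 6000 before the log transform.
print([round(v, 3) for v in normalized])
```

Because every cell is rescaled to the same total, differences in sequencing depth between spots no longer dominate the comparison.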

CLI Reference

python skills/spatial-preprocess/spatial_preprocess.py \
  --input <data.h5ad> --output <report_dir> [--data-type visium] [--species human]

python skills/spatial-preprocess/spatial_preprocess.py --demo --output /tmp/demo

python omicsclaw.py run spatial-preprocessing --input <file> --output <dir>
python omicsclaw.py run spatial-preprocessing --demo
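
For readers who want to wrap or test the command line, the flags shown above could be parsed along these lines. This is a sketch built only from the flags visible in the examples. The help strings, the defaults, and the rule that `--input` is required unless `--demo` is given are assumptions, not the script's documented behaviour.

```python
import argparse


def build_parser():
    """Parser mirroring the flags in the CLI examples (details assumed)."""
    parser = argparse.ArgumentParser(prog="spatial_preprocess.py")
    parser.add_argument("--input", help="Input data, e.g. an .h5ad file")
    parser.add_argument("--output", help="Directory for report and outputs")
    parser.add_argument("--data-type", default=None, help="Platform, e.g. visium")
    parser.add_argument("--species", default=None, help="Species, e.g. human")
    parser.add_argument("--demo", action="store_true", help="Use built-in demo data")
    return parser


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed rule: real runs need an input, demo runs do not.
    if not args.demo and not args.input:
        parser.error("--input is required unless --demo is given")
    return args


args = parse_args(["--demo", "--output", "/tmp/demo"])
print(args.demo, args.output)
```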

Example Queries

"Preprocess my Visium dataset with standard QC metrics"
"Load and normalize this h5ad spatial data for downstream tools"

Algorithm / Methodology

QC metrics: `sc.pp.calculate_qc_metrics` with `qc_vars=["mt"]`
Filter: cells with `n_genes_by_counts >= min_genes`, genes in `>= min_cells` cells, `pct_counts_mt <= max_mt_pct`
Normalize: `sc.pp.normalize_total(target_sum=1e4)` → `sc.pp.log1p()`
HVG: `sc.pp.highly_variable_genes(n_top_genes=n_top_hvg, flavor="seurat")`
Scale: `sc.pp.scale(max_value=10)` on HVG subset
PCA: `sc.tl.pca(n_comps=n_pcs)`
Neighbors: `sc.pp.neighbors(n_neighbors=n_neighbors, n_pcs=n_pcs)`
UMAP: `sc.tl.umap()`
Leiden: `sc.tl.leiden(resolution=leiden_resolution)`
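
Put together, the calls above could be chained roughly as follows. This is a sketch, not the skill's own code. The parameter names follow the list, but the numeric values in `PARAMS`, the `MT-` gene prefix, and the choice to physically subset to HVGs before scaling are assumptions. `target_sum=1e4` and `max_value=10` come from the text. Leiden clustering also needs the `leidenalg` package alongside Scanpy.

```python
# Illustrative values only; the skill's real defaults are not stated here.
PARAMS = {
    "min_genes": 200,
    "min_cells": 3,
    "max_mt_pct": 20.0,
    "n_top_hvg": 2000,
    "n_pcs": 50,
    "n_neighbors": 15,
    "leiden_resolution": 1.0,
}


def preprocess(adata, params=None, mt_prefix="MT-"):
    """Run QC, normalization, HVG selection, embedding, and clustering."""
    import scanpy as sc  # imported lazily because it is a heavy dependency

    p = {**PARAMS, **(params or {})}

    # QC metrics and filtering
    adata.var["mt"] = adata.var_names.str.startswith(mt_prefix)
    sc.pp.calculate_qc_metrics(
        adata, qc_vars=["mt"], percent_top=None, log1p=False, inplace=True
    )
    sc.pp.filter_cells(adata, min_genes=p["min_genes"])
    sc.pp.filter_genes(adata, min_cells=p["min_cells"])
    adata = adata[adata.obs["pct_counts_mt"] <= p["max_mt_pct"]].copy()

    # Preserve raw counts before any transformation
    adata.raw = adata.copy()
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)

    # Feature selection, then scale the HVG subset
    sc.pp.highly_variable_genes(adata, n_top_genes=p["n_top_hvg"], flavor="seurat")
    adata = adata[:, adata.var["highly_variable"]].copy()
    sc.pp.scale(adata, max_value=10)

    # Embedding and clustering
    sc.tl.pca(adata, n_comps=p["n_pcs"])
    sc.pp.neighbors(adata, n_neighbors=p["n_neighbors"], n_pcs=p["n_pcs"])
    sc.tl.umap(adata)
    sc.tl.leiden(adata, resolution=p["leiden_resolution"])
    return adata
```

A call such as `preprocess(sc.read_h5ad("raw_visium.h5ad"))` would return an object with `obs["leiden"]` and `obsm["X_umap"]` populated. On small datasets, `n_pcs` and `n_top_hvg` must be lowered to fit the number of cells and genes.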

Output Structure

output_dir/
├── report.md
├── result.json
├── processed.h5ad
├── figures/
│   ├── qc_violin.png
│   └── umap_leiden.png
├── tables/
│   └── cluster_summary.csv
└── reproducibility/
    ├── commands.sh
    ├── environment.yml
    └── checksums.sha256
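
The layout above can be produced with the standard library alone. The sketch below creates the subdirectories, writes `result.json`, and records SHA-256 digests for every file. The function name `write_outputs` and the `sha256sum`-style line format of `checksums.sha256` are assumptions. The skill's actual file contents are not specified here.

```python
import hashlib
import json
from pathlib import Path


def write_outputs(output_dir, result):
    """Create the output tree, write result.json, and checksum all files."""
    out = Path(output_dir)
    for sub in ("figures", "tables", "reproducibility"):
        (out / sub).mkdir(parents=True, exist_ok=True)

    (out / "result.json").write_text(json.dumps(result, indent=2, sort_keys=True))

    # One "<digest>  <relative path>" line per file (assumed format).
    lines = []
    for path in sorted(p for p in out.rglob("*") if p.is_file()):
        if path.name == "checksums.sha256":
            continue  # never checksum the checksum file itself
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(out).as_posix()}")

    checksum_path = out / "reproducibility" / "checksums.sha256"
    checksum_path.write_text("\n".join(lines) + "\n")
    return checksum_path
```

For large `processed.h5ad` files, hashing in fixed-size chunks rather than with `read_bytes()` would keep memory use flat.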

Dependencies

Required: scanpy >= 1.9, anndata >= 0.11, squidpy >= 1.2, matplotlib, numpy, pandas

Safety

Local-first: All processing runs offline; no data is uploaded externally.
Disclaimer: Reports must follow the OmicsClaw reporting structure and include its disclaimers.
Audit trail: All hyperparameters and processing steps are logged.
Raw preservation: Original counts saved in `adata.raw`

Integration with Orchestrator

Trigger conditions:

Invoked automatically based on tool metadata and user-intent matching.
`.h5ad` file input; keywords: preprocess, QC, normalize, visium, xenium

Chaining: Output `processed.h5ad` feeds into all downstream spatial-* skills

Citations

Scanpy — analysis framework
Squidpy — spatial extensions
Leiden algorithm — community detection