LLMs-Universal-Life-Science-and-Clinical-Skills- spatial-preprocess
install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Spatial_Omics/spatial-preprocess" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-spatial-preprocess-56b5f8 && rm -rf "$T"
manifest: Skills/Spatial_Omics/spatial-preprocess/SKILL.md
🔬 Spatial Preprocess
You are Spatial Preprocess, the foundation skill of OmicsClaw spatial analysis. Your role is to load multi-platform spatial transcriptomics data and produce a clean, normalised, clustered AnnData ready for all downstream analysis skills.
Why This Exists
- Without it: Users manually write 40+ lines of Scanpy preprocessing code with inconsistent defaults
- With it: One command loads any spatial platform, runs QC, normalises, clusters, and produces a ready-to-analyse h5ad
- Why OmicsClaw: Standardised preprocessing ensures reproducibility across all downstream skills
Workflow
- Calculate: Prepare raw counts and assess QC metrics.
- Execute: Run filtering, normalization, and feature selection.
- Assess: Perform PCA and variance evaluation.
- Generate: Save normalized matrices and compute default UMAP.
- Report: Synthesize report with processing metadata and summaries.
Core Capabilities
- Multi-platform loading: Visium (directory/H5/H5AD), Xenium (Zarr/H5), MERFISH, Slide-seq, seqFISH, generic H5AD
- QC filtering: Mitochondrial %, min genes/cells thresholds
- Normalization: Library-size normalization + log1p
- HVG selection: Seurat-flavored highly variable gene detection
- Embedding: PCA, neighbor graph, UMAP
- Clustering: Leiden community detection
Input Formats
| Format | Extension | Required | Example |
|---|---|---|---|
| AnnData raw | .h5ad | Count matrix in X | |
| 10x Visium dir | directory | Space Ranger output | |
| 10x H5 | .h5 | Filtered feature matrix | |
| Demo | n/a | --demo flag | Built-in synthetic data |
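As a rough illustration of how the formats in the table above could be told apart, the sketch below classifies an input by its path. The function name guess_input_kind and the labels it returns are hypothetical, and the .h5ad and .h5 extensions are assumed from the CLI examples in this document; the actual loader may inspect file contents rather than extensions.

```python
from pathlib import Path
from typing import Optional


def guess_input_kind(path: Optional[str], demo: bool = False) -> str:
    """Classify an input by path (hypothetical helper, not the skill's loader)."""
    if demo:
        return "demo"            # --demo flag: built-in synthetic data
    if path is None:
        raise ValueError("either an input path or demo=True is required")
    p = Path(path)
    if p.is_dir():
        return "visium_dir"      # Space Ranger output directory
    suffix = p.suffix.lower()
    if suffix == ".h5ad":
        return "anndata"         # count matrix expected in X
    if suffix == ".h5":
        return "10x_h5"          # filtered feature matrix
    raise ValueError(f"unrecognised input format: {path}")
```

Checking for a directory first matters because a Space Ranger output folder has no meaningful extension of its own.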
Workflow
- Load: Detect platform type and load data via spatialclaw.spatial.loader
- QC: Compute metrics (n_genes, total_counts, pct_counts_mt), filter cells/genes
- Normalize: normalize_total → log1p; store raw counts in adata.raw
- HVG: Select highly variable genes
- Embed: Scale → PCA → neighbors → UMAP
- Cluster: Leiden clustering
- Report: Write report.md, result.json, processed.h5ad, figures, reproducibility bundle
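To make the Normalize step concrete, here is a small pure-Python sketch of what library-size normalization followed by log1p does to one cell's counts. The target of 10,000 counts per cell matches the target_sum=1e4 listed under Algorithm / Methodology. The helper name normalize_log1p is hypothetical, and the real pipeline uses Scanpy, not this code.

```python
import math


def normalize_log1p(counts, target_sum=1e4):
    """Scale one cell's counts so they sum to target_sum, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]   # empty cell: nothing to scale
    scale = target_sum / total
    return [math.log1p(c * scale) for c in counts]


cell = [1, 3, 0, 4]                    # raw counts for four genes in one cell
normalized = normalize_log1p(cell)

# Undoing log1p recovers the scaled counts, which sum to target_sum.
recovered_total = sum(math.expm1(v) for v in normalized)
```

Zero counts stay at zero after the transform, which is why log1p is used instead of a plain logarithm.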
CLI Reference
python skills/spatial-preprocess/spatial_preprocess.py \
  --input <data.h5ad> --output <report_dir> [--data-type visium] [--species human]
python skills/spatial-preprocess/spatial_preprocess.py --demo --output /tmp/demo
python omicsclaw.py run spatial-preprocessing --input <file> --output <dir>
python omicsclaw.py run spatial-preprocessing --demo
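For readers who want to see how the flags above fit together, the following is a minimal argparse sketch that accepts the same options. It is not the script's actual parser: treating --input and --demo as mutually exclusive, leaving --output optional, and the None defaults are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the CLI reference."""
    parser = argparse.ArgumentParser(prog="spatial_preprocess.py")
    # Assumption: exactly one data source is given, a file or the demo data.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--input", help="path to the input data, e.g. data.h5ad")
    source.add_argument("--demo", action="store_true",
                        help="use built-in synthetic data")
    parser.add_argument("--output", help="report directory")
    parser.add_argument("--data-type", dest="data_type", default=None,
                        help="platform hint, e.g. visium")
    parser.add_argument("--species", default=None, help="e.g. human")
    return parser


args = build_parser().parse_args(["--demo", "--output", "/tmp/demo"])
```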
Example Queries
- "Preprocess my Visium dataset with standard QC metrics"
- "Load and normalize this h5ad spatial data for downstream tools"
Algorithm / Methodology
- QC metrics: sc.pp.calculate_qc_metrics with qc_vars=["mt"]
- Filter: cells with n_genes_by_counts >= min_genes, genes in >= min_cells cells, pct_counts_mt <= max_mt_pct
- Normalize: sc.pp.normalize_total(target_sum=1e4) → sc.pp.log1p()
- HVG: sc.pp.highly_variable_genes(n_top_genes=n_top_hvg, flavor="seurat")
- Scale: sc.pp.scale(max_value=10) on HVG subset
- PCA: sc.tl.pca(n_comps=n_pcs)
- Neighbors: sc.pp.neighbors(n_neighbors=n_neighbors, n_pcs=n_pcs)
- UMAP: sc.tl.umap()
- Leiden: sc.tl.leiden(resolution=leiden_resolution)
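The steps above can be strung together roughly as follows. This is a sketch under stated assumptions, not the skill's own implementation: the function name preprocess, every default value in its signature, and the "MT-" gene prefix (a common convention for human data) are placeholders chosen for illustration. Only the Scanpy calls and parameter names come from the list above.

```python
def preprocess(adata, min_genes=200, min_cells=3, max_mt_pct=20.0,
               n_top_hvg=2000, n_pcs=50, n_neighbors=15,
               leiden_resolution=1.0, mt_prefix="MT-"):
    """Run the listed steps in order; all defaults are illustrative assumptions."""
    import scanpy as sc  # imported lazily so this module loads without Scanpy

    # QC metrics: flag mitochondrial genes, then compute per-cell metrics.
    adata.var["mt"] = adata.var_names.str.startswith(mt_prefix)
    sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], percent_top=None,
                               inplace=True)

    # Filter cells by gene count and mitochondrial fraction, then filter genes.
    keep = ((adata.obs["n_genes_by_counts"] >= min_genes)
            & (adata.obs["pct_counts_mt"] <= max_mt_pct))
    adata = adata[keep].copy()
    sc.pp.filter_genes(adata, min_cells=min_cells)

    # Preserve the original counts before any transformation.
    adata.raw = adata

    # Normalize: library-size scaling followed by log1p.
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)

    # HVG selection, then scale only the HVG subset.
    sc.pp.highly_variable_genes(adata, n_top_genes=n_top_hvg, flavor="seurat")
    adata = adata[:, adata.var["highly_variable"]].copy()
    sc.pp.scale(adata, max_value=10)

    # Embed and cluster.
    sc.tl.pca(adata, n_comps=n_pcs)
    sc.pp.neighbors(adata, n_neighbors=n_neighbors, n_pcs=n_pcs)
    sc.tl.umap(adata)
    sc.tl.leiden(adata, resolution=leiden_resolution)
    return adata
```

The function returns a new AnnData restricted to the highly variable genes, while adata.raw still holds the filtered raw counts for all genes. Note that it also writes the "mt" flag and QC columns onto the object passed in.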
Output Structure
output_dir/
├── report.md
├── result.json
├── processed.h5ad
├── figures/
│   ├── qc_violin.png
│   └── umap_leiden.png
├── tables/
│   └── cluster_summary.csv
└── reproducibility/
    ├── commands.sh
    ├── environment.yml
    └── checksums.sha256
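As an illustration of how the checksums.sha256 file in the reproducibility bundle might be produced, the sketch below hashes every file under the output directory and writes one line per file in the format accepted by sha256sum -c. The helper names sha256_of and write_checksums are hypothetical, and which files the skill actually includes in its manifest is not specified here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large processed.h5ad is not read into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(output_dir: str) -> Path:
    """Write digests of all outputs to reproducibility/checksums.sha256."""
    root = Path(output_dir)
    manifest = root / "reproducibility" / "checksums.sha256"
    manifest.parent.mkdir(parents=True, exist_ok=True)
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path == manifest:
            continue  # the manifest cannot contain its own digest
        lines.append(f"{sha256_of(path)}  {path.relative_to(root).as_posix()}")
    manifest.write_text("\n".join(lines) + "\n")
    return manifest
```

Because paths are written relative to the output directory, the manifest can be verified by running sha256sum -c reproducibility/checksums.sha256 from inside that directory.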
Dependencies
Required: scanpy >= 1.9, anndata >= 0.11, squidpy >= 1.2, matplotlib, numpy, pandas
Safety
- Local-first: Strict offline processing without external upload.
- Disclaimer: Reports must follow the OmicsClaw reporting structure and include its disclaimers.
- Audit trail: All hyperparameters and pipeline step states are logged.
- Raw preservation: Original counts saved in adata.raw
Integration with Orchestrator
Trigger conditions:
- Invoked automatically based on tool metadata and user intent matching.
- .h5ad file input; keywords: preprocess, QC, normalize, visium, xenium
Chaining: Output processed.h5ad feeds into all downstream spatial-* skills
Citations
- Scanpy — analysis framework
- Squidpy — spatial extensions
- Leiden algorithm — community detection