OpenClaw-Medical-Skills scrna-orchestrator
Local Scanpy pipeline for single-cell RNA-seq QC, clustering, marker discovery, and optional two-group differential expression from raw-count .h5ad.
install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/scrna-orchestrator" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-scrna-orchestrator && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/scrna-orchestrator" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-scrna-orchestrator && rm -rf "$T"
manifest: skills/scrna-orchestrator/SKILL.md
🦖 scRNA Orchestrator
You are scRNA Orchestrator, a specialised ClawBio agent for local single-cell RNA-seq analysis with Scanpy.
Why This Exists
Single-cell workflows are easy to misconfigure and hard to reproduce when run ad hoc.
- Without it: Users manually stitch QC, normalization, clustering, and marker/DE steps with inconsistent defaults.
- With it: One command produces a consistent `report.md`, figures, tables, and reproducibility bundle.
- Why ClawBio: The workflow is local-first, explicit about assumptions (raw counts), and ships machine-readable outputs.
Core Capabilities
- QC and Filtering: Mitochondrial percentage filtering and min genes/cells thresholds.
- Preprocessing: Library-size normalization, `log1p`, and HVG selection.
- Embedding and Clustering: PCA, neighbors graph, UMAP, Leiden clustering.
- Cluster Markers: Wilcoxon cluster-vs-rest marker detection.
- Optional Group DE (v1): Two-group Wilcoxon DE on any `obs` column.
- Optional Volcano Plot: Generate DE volcano plot with `--de-volcano`.
- Reporting: Markdown report, CSV/TSV tables, PNG figures, reproducibility files.
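To make the QC and preprocessing capabilities above concrete, here is a minimal NumPy sketch of what the metrics and thresholds measure. It is not the skill's code, which uses Scanpy. The function names, the default thresholds, the cells-then-genes filter order, and the inclusive `<=` comparison on mitochondrial percentage are all assumptions chosen for illustration.

```python
import numpy as np


def qc_metrics(counts, is_mt):
    """Per-cell QC metrics for a dense cells x genes raw-count matrix."""
    counts = np.asarray(counts, dtype=float)
    is_mt = np.asarray(is_mt, dtype=bool)
    total_counts = counts.sum(axis=1)
    n_genes_by_counts = (counts > 0).sum(axis=1)
    mt_counts = counts[:, is_mt].sum(axis=1)
    # Cells with zero total counts get 0% instead of a division error.
    pct_counts_mt = np.divide(
        100.0 * mt_counts,
        total_counts,
        out=np.zeros_like(total_counts),
        where=total_counts > 0,
    )
    return n_genes_by_counts, total_counts, pct_counts_mt


def qc_filter(counts, is_mt, min_genes=200, min_cells=3, max_mt_pct=20.0):
    """Return boolean masks (keep_cells, keep_genes).

    Thresholds are hypothetical defaults, not the skill's documented values.
    """
    counts = np.asarray(counts)
    n_genes, _, pct_mt = qc_metrics(counts, is_mt)
    keep_cells = (n_genes >= min_genes) & (pct_mt <= max_mt_pct)
    # Assumption: genes are filtered on the cells that survive cell filtering.
    n_cells_per_gene = (counts[keep_cells] > 0).sum(axis=0)
    keep_genes = n_cells_per_gene >= min_cells
    return keep_cells, keep_genes


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell to target_sum total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave all-zero cells as zeros
    return np.log1p(counts / totals * target_sum)
```

For example, a cell with 10 total counts of which 5 fall in mitochondrial genes has `pct_counts_mt` of 50 and would be dropped at any `max_mt_pct` below 50.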
Input Formats
| Format | Extension | Required Fields | Example |
|---|---|---|---|
| AnnData raw counts | `.h5ad` | Raw count matrix in `X`; cell metadata in `obs`; gene metadata in `var` | |
| Demo mode | n/a | none | |
Notes:
- Processed/normalized/scaled `.h5ad` inputs are rejected with an actionable error.
- `pbmc3k_processed`-style inputs are out of scope for this skill.
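The notes above do not say how processed inputs are detected. One common heuristic, sketched here as an assumption rather than as the skill's actual check, is that raw counts are finite, non-negative, and integer-valued, while normalized, log-transformed, or scaled matrices are not. The function names and the error wording are hypothetical.

```python
import numpy as np


def looks_like_raw_counts(values, tol=1e-8):
    """Heuristic: raw counts are finite, non-negative, and integer-valued.

    `values` is any array-like of matrix entries; for a SciPy sparse
    matrix, pass its `.data` array (the stored non-zero entries).
    """
    values = np.asarray(values, dtype=float).ravel()
    if values.size == 0 or not np.all(np.isfinite(values)):
        return False
    non_negative = values.min() >= 0
    integer_valued = np.all(np.abs(values - np.round(values)) <= tol)
    return bool(non_negative and integer_valued)


def require_raw_counts(values):
    """Raise an actionable error when the matrix looks processed."""
    if not looks_like_raw_counts(values):
        raise ValueError(
            "Input does not look like raw counts (found negative or "
            "non-integer values). Provide an .h5ad with unnormalized counts."
        )
```

Scaled data usually fails on the non-negativity test and log-normalized data on the integer test, so the two conditions together catch the processed cases the notes mention.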
Workflow
When the user asks for scRNA QC/clustering/markers/DE:
- Validate: Check the `.h5ad` input (or `--demo`), and reject processed-like matrices.
- Process: Run QC filtering, normalization, HVG selection, PCA, neighbors, UMAP, and Leiden.
- Analyze:
  - Always run cluster marker analysis (`leiden`, Wilcoxon).
  - Optionally run DE if `--de-groupby`, `--de-group1`, and `--de-group2` are all provided.
- Generate: Write `report.md`, `result.json`, tables, figures, and reproducibility bundle.
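The "all three DE flags" rule in the Analyze step can be sketched with `argparse`. The flag names come from the CLI reference; the helper names are hypothetical. Raising an error when only some flags are given is an assumption, since the workflow only says DE runs when all three are present.

```python
import argparse

DE_FLAGS = ("de_groupby", "de_group1", "de_group2")


def parse_de_args(argv):
    """Parse only the DE flags; other arguments are ignored here."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--de-groupby")
    parser.add_argument("--de-group1")
    parser.add_argument("--de-group2")
    args, _unknown = parser.parse_known_args(argv)
    return args


def resolve_de_request(args):
    """Return the DE settings as a dict, or None when DE was not requested."""
    values = {name: getattr(args, name) for name in DE_FLAGS}
    provided = [name for name, value in values.items() if value is not None]
    if not provided:
        return None  # DE is optional: skip it entirely
    if len(provided) < len(DE_FLAGS):
        missing = [name for name in DE_FLAGS if name not in provided]
        # Assumption: a partial specification is treated as a user error.
        raise ValueError(
            "DE needs all of --de-groupby, --de-group1, --de-group2; "
            "missing: " + ", ".join(missing)
        )
    return values
```

Failing fast on a partial specification avoids silently skipping an analysis the user evidently wanted.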
CLI Reference
```bash
# Standard usage
python skills/scrna-orchestrator/scrna_orchestrator.py \
  --input <input.h5ad> --output <report_dir>

# Demo mode
python skills/scrna-orchestrator/scrna_orchestrator.py \
  --demo --output <report_dir>

# Optional two-group DE
python skills/scrna-orchestrator/scrna_orchestrator.py \
  --input <input.h5ad> --output <report_dir> \
  --de-groupby <obs_column> --de-group1 <group_a> --de-group2 <group_b>

# Optional DE volcano plot
python skills/scrna-orchestrator/scrna_orchestrator.py \
  --input <input.h5ad> --output <report_dir> \
  --de-groupby <obs_column> --de-group1 <group_a> --de-group2 <group_b> \
  --de-volcano

# Via ClawBio runner
python clawbio.py run scrna --input <input.h5ad> --output <report_dir>
python clawbio.py run scrna --demo
```
Demo
python clawbio.py run scrna --demo
Expected output:
- `report.md` with QC, clustering, and marker summaries
- figure files (`qc_violin.png`, `umap_leiden.png`, `marker_dotplot.png`)
- optional DE figure (`de_volcano.png`) when `--de-volcano` is set
- marker tables and reproducibility bundle
Algorithm / Methodology
- QC:
  - Compute QC metrics (`n_genes_by_counts`, `total_counts`, `pct_counts_mt`)
  - Filter by `min_genes`, `min_cells`, `max_mt_pct`
- Preprocess:
  - Normalize total counts to `1e4`
  - Apply `log1p`
  - Select HVGs (`flavor="seurat"`)
- Embed and cluster:
  - Scale (`max_value=10`)
  - PCA, neighbors graph, UMAP
  - Leiden clustering
- Markers:
  - `scanpy.tl.rank_genes_groups(groupby="leiden", method="wilcoxon", pts=True)`
- Optional DE v1:
  - `scanpy.tl.rank_genes_groups(groupby=<de_groupby>, groups=[group1], reference=group2, method="wilcoxon", pts=True)`
  - Export full statistics and top genes by score
- Optional volcano plot:
  - Plot `logfoldchanges` vs `-log10(pvals_adj)` (fallback to `pvals` if needed)
  - Highlight genes with `p < 0.05` and `|log2FC| >= 1`
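The volcano step above can be illustrated with a small NumPy sketch that computes the plot coordinates and the highlight mask. The function name is hypothetical. Two points are assumptions because the text does not specify them: the fallback to `pvals` happens when adjusted p-values are unavailable, and the highlight test uses the same p-values as the y-axis.

```python
import numpy as np


def volcano_coordinates(logfoldchanges, pvals_adj=None, pvals=None,
                        p_cutoff=0.05, lfc_cutoff=1.0):
    """Return (x, y, highlight) arrays for a DE volcano plot."""
    lfc = np.asarray(logfoldchanges, dtype=float)
    # Assumption: fall back to raw p-values only when adjusted ones are absent.
    p = pvals_adj if pvals_adj is not None else pvals
    if p is None:
        raise ValueError("Need pvals_adj or pvals to build a volcano plot.")
    p = np.asarray(p, dtype=float)
    # Clip to the smallest positive float so p == 0 does not map to infinity.
    tiny = np.finfo(float).tiny
    neg_log10_p = -np.log10(np.clip(p, tiny, 1.0))
    # Highlight rule from the text: p < 0.05 and |log2FC| >= 1.
    highlight = (p < p_cutoff) & (np.abs(lfc) >= lfc_cutoff)
    return lfc, neg_log10_p, highlight
```

The three arrays can be passed to a scatter plot, with `highlight` selecting the point colour. Note that the comparison on p is strict, so a gene with p exactly 0.05 is not highlighted.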
Example Queries
- "Run standard QC and clustering on my h5ad file"
- "Find marker genes for each cluster"
- "Generate a UMAP coloured by cluster"
- "Run differential expression for treated vs control"
Output Structure
```
output_directory/
├── report.md
├── result.json
├── figures/
│   ├── qc_violin.png
│   ├── umap_leiden.png
│   ├── marker_dotplot.png
│   └── de_volcano.png        # only when DE volcano is enabled
├── tables/
│   ├── cluster_summary.csv
│   ├── markers_top.csv
│   ├── markers_top.tsv
│   ├── de_full.csv           # only when DE is enabled
│   └── de_top.csv            # only when DE is enabled
└── reproducibility/
    ├── commands.sh
    ├── environment.yml
    └── checksums.sha256
```
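As an illustration of how the reproducibility bundle might be used, the sketch below re-checks output files against `checksums.sha256`. The manifest format is not documented here, so this assumes the `sha256sum` convention of one `<hex digest>  <relative path>` entry per line, with paths relative to the output directory. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(output_dir, manifest="reproducibility/checksums.sha256"):
    """Return the relative paths whose digest is missing or does not match.

    Assumes sha256sum-style lines; the skill's real format may differ.
    """
    output_dir = Path(output_dir)
    mismatches = []
    for line in (output_dir / manifest).read_text().splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.lstrip("*").strip()  # "*" marks binary mode
        target = output_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            mismatches.append(rel_path)
    return mismatches
```

An empty return value means every listed file is present and unchanged since the report was written.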
Dependencies
Required:
- `scanpy` >= 1.10
- `anndata` >= 0.10
- `numpy`, `pandas`, `matplotlib`, `leidenalg`, `python-igraph`
Optional (future):
- `celltypist` (cell-type annotation)
- `scvi-tools` (deep generative modeling)
Safety
- Local-first: No patient data upload.
- Disclaimer: Reports include the ClawBio medical disclaimer.
- Input guardrails: Rejects processed-like matrices to reduce invalid biological inferences.
- Reproducibility: Writes command/environment/checksum bundle.
Integration with Bio Orchestrator
Trigger conditions:
- File extension `.h5ad`
- User intent includes scRNA terms (single-cell, Scanpy, clustering, marker genes, DE)
Current limitations:
- Raw-count `.h5ad` only
- Seurat input/output is not implemented in the Python path
- Multi-group pairwise DE, within-cluster DE, and automated annotation are future work
Citations
- Scanpy documentation — analysis API and methods.
- AnnData documentation — data model.
- Leiden algorithm paper — community detection.