OpenClaw-Medical-Skills scrna-orchestrator

Local Scanpy pipeline for single-cell RNA-seq QC, clustering, marker discovery, and optional two-group differential expression from raw-count .h5ad.

install

source · Clone the upstream repo

git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/scrna-orchestrator" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-scrna-orchestrator && rm -rf "$T"

OpenClaw · Install into ~/.openclaw/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/scrna-orchestrator" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-scrna-orchestrator && rm -rf "$T"

manifest: skills/scrna-orchestrator/SKILL.md

🦖 scRNA Orchestrator

You are scRNA Orchestrator, a specialised ClawBio agent for local single-cell RNA-seq analysis with Scanpy.

Why This Exists

Single-cell workflows are easy to misconfigure and hard to reproduce when run ad hoc.

Without it: Users manually stitch QC, normalization, clustering, and marker/DE steps with inconsistent defaults.
With it: One command produces a consistent `report.md`, figures, tables, and a reproducibility bundle.
Why ClawBio: The workflow is local-first, explicit about assumptions (raw counts), and ships machine-readable outputs.

Core Capabilities

QC and Filtering: Mitochondrial percentage filtering and min genes/cells thresholds.
Preprocessing: Library-size normalization, `log1p`, and HVG selection.
Embedding and Clustering: PCA, neighbors graph, UMAP, Leiden clustering.
Cluster Markers: Wilcoxon cluster-vs-rest marker detection.
Optional Group DE (v1): Two-group Wilcoxon DE on any `obs` column.
Optional Volcano Plot: Generate a DE volcano plot with `--de-volcano`.
Reporting: Markdown report, CSV/TSV tables, PNG figures, reproducibility files.

Input Formats

| Format | Extension | Required Fields | Example |
|---|---|---|---|
| AnnData raw counts | `.h5ad` | Raw count matrix; cell metadata in `obs`; gene metadata in `var` | `pbmc_raw.h5ad` |
| Demo mode | n/a | none | `python clawbio.py run scrna --demo` |

Notes:

Processed/normalized/scaled `.h5ad` inputs are rejected with an actionable error.
`pbmc3k_processed`-style inputs are out of scope for this skill.
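The skill's actual rejection logic is not shown here. As a rough illustration of what "processed-like" could mean, the sketch below uses a common heuristic: raw counts are non-negative and integer-valued, while log-transformed or scaled matrices are not. The function name `looks_like_raw_counts` and the tolerance are hypothetical, and the example works on dense arrays only.

```python
import numpy as np


def looks_like_raw_counts(matrix, atol=1e-8):
    """Heuristic check (an assumption, not the skill's own rule):
    raw counts are non-negative and integer-valued."""
    values = np.asarray(matrix, dtype=float)
    if values.size == 0:
        return False
    if np.any(values < 0):
        # Scaled (z-scored) matrices contain negative values.
        return False
    # Log-normalized matrices contain non-integer values.
    return bool(np.all(np.abs(values - np.round(values)) <= atol))


raw = np.array([[0, 3, 1], [2, 0, 5]])
logged = np.log1p(raw)            # mimics a log-transformed input
scaled = raw - raw.mean(axis=0)   # mimics a centered/scaled input

for name, mat in [("raw", raw), ("logged", logged), ("scaled", scaled)]:
    print(name, looks_like_raw_counts(mat))
```

A sparse matrix could be checked the same way on its stored non-zero values, since implicit zeros already satisfy both conditions.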

Workflow

When the user asks for scRNA QC/clustering/markers/DE:

Validate: Check `.h5ad` input (or `--demo`), and reject processed-like matrices.
Process: Run QC filtering, normalization, HVG selection, PCA, neighbors, UMAP, and Leiden.
Analyze:

Always run cluster marker analysis (`leiden`, Wilcoxon).
Optionally run DE if `--de-groupby --de-group1 --de-group2` are all provided.

Generate: Write `report.md`, `result.json`, tables, figures, and reproducibility bundle.
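The gating of the optional steps can be sketched with `argparse`. The flag names come from the CLI reference. The helper names `build_parser` and `plan_steps`, the step labels, and the choice to draw the volcano plot only when DE itself runs are assumptions made for illustration, not the skill's confirmed behaviour.

```python
import argparse


def build_parser():
    """Parser covering only the DE-related flags named in this document."""
    parser = argparse.ArgumentParser(description="DE flag handling sketch")
    parser.add_argument("--de-groupby")
    parser.add_argument("--de-group1")
    parser.add_argument("--de-group2")
    parser.add_argument("--de-volcano", action="store_true")
    return parser


def plan_steps(args):
    """Return the ordered workflow steps for a parsed argument set."""
    steps = ["validate", "process", "markers"]  # markers always run
    de_requested = all(
        value is not None
        for value in (args.de_groupby, args.de_group1, args.de_group2)
    )
    if de_requested:
        steps.append("de")
        # Assumption: a volcano plot only makes sense when DE ran.
        if args.de_volcano:
            steps.append("volcano")
    steps.append("report")
    return steps


# "condition", "treated" and "control" are placeholder values.
args = build_parser().parse_args(
    ["--de-groupby", "condition", "--de-group1", "treated",
     "--de-group2", "control", "--de-volcano"]
)
print(plan_steps(args))
```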

CLI Reference

# Standard usage
python skills/scrna-orchestrator/scrna_orchestrator.py \
  --input <input.h5ad> --output <report_dir>

# Demo mode
python skills/scrna-orchestrator/scrna_orchestrator.py \
  --demo --output <report_dir>

# Optional two-group DE
python skills/scrna-orchestrator/scrna_orchestrator.py \
  --input <input.h5ad> --output <report_dir> \
  --de-groupby <obs_column> --de-group1 <group_a> --de-group2 <group_b>

# Optional DE volcano plot
python skills/scrna-orchestrator/scrna_orchestrator.py \
  --input <input.h5ad> --output <report_dir> \
  --de-groupby <obs_column> --de-group1 <group_a> --de-group2 <group_b> \
  --de-volcano

# Via ClawBio runner
python clawbio.py run scrna --input <input.h5ad> --output <report_dir>
python clawbio.py run scrna --demo

Demo

python clawbio.py run scrna --demo

Expected output:

`report.md` with QC, clustering, and marker summaries
figure files (`qc_violin.png`, `umap_leiden.png`, `marker_dotplot.png`)
optional DE figure (`de_volcano.png`) when `--de-volcano` is set
marker tables and reproducibility bundle

Algorithm / Methodology

Compute QC metrics (`n_genes_by_counts`, `total_counts`, `pct_counts_mt`)

Filter by `min_genes`, `min_cells`, `max_mt_pct`
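To make the three QC metrics and the three thresholds concrete, here is a NumPy-only sketch on a toy cells-by-genes matrix. It is not the skill's code. The `MT-` gene-symbol prefix, the filtering order (cells first, then genes), the helper names, and the threshold values in the example are all assumptions.

```python
import numpy as np


def qc_metrics(counts, gene_names, mt_prefix="MT-"):
    """Per-cell QC metrics for a cells x genes count matrix."""
    counts = np.asarray(counts, dtype=float)
    is_mt = np.array([name.upper().startswith(mt_prefix) for name in gene_names])
    total_counts = counts.sum(axis=1)
    n_genes_by_counts = (counts > 0).sum(axis=1)
    mt_counts = counts[:, is_mt].sum(axis=1)
    # Guard against empty cells to avoid division by zero.
    pct_counts_mt = np.divide(
        100.0 * mt_counts, total_counts,
        out=np.zeros_like(total_counts), where=total_counts > 0,
    )
    return n_genes_by_counts, total_counts, pct_counts_mt


def filter_counts(counts, gene_names, min_genes, min_cells, max_mt_pct):
    """Drop low-complexity / high-mito cells, then rarely detected genes."""
    counts = np.asarray(counts, dtype=float)
    n_genes, _, pct_mt = qc_metrics(counts, gene_names)
    keep_cells = (n_genes >= min_genes) & (pct_mt <= max_mt_pct)
    kept = counts[keep_cells]
    keep_genes = (kept > 0).sum(axis=0) >= min_cells
    return kept[:, keep_genes], keep_cells, keep_genes


genes = ["MT-CO1", "CD3E", "LYZ", "MS4A1"]
counts = [
    [1, 4, 0, 5],  # 3 genes detected, 10% mitochondrial
    [6, 2, 2, 0],  # 3 genes detected, 60% mitochondrial
    [0, 0, 3, 0],  # 1 gene detected
    [0, 5, 4, 1],  # 3 genes detected, 0% mitochondrial
]
filtered, keep_cells, keep_genes = filter_counts(
    counts, genes, min_genes=2, min_cells=2, max_mt_pct=20
)
print(keep_cells, keep_genes)
print(filtered)
```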

Preprocess:

Normalize total counts to `1e4`
Apply `log1p`
Select HVGs (`flavor="seurat"`)
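The first two preprocessing steps amount to simple arithmetic, sketched below in plain NumPy rather than through Scanpy. The function name `normalize_log1p` is hypothetical, and leaving all-zero cells at zero is an assumption. HVG selection is not covered by this sketch.

```python
import numpy as np


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell to `target_sum` total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Assumption: cells with zero total are left as all zeros.
    safe_totals = np.where(totals > 0, totals, 1.0)
    normalized = counts / safe_totals * target_sum
    return np.log1p(normalized)


# Two cells with the same composition but a 10x difference in depth,
# plus one empty cell.
counts = np.array([[1, 3], [10, 30], [0, 0]])
logged = normalize_log1p(counts)
print(logged)
```

After normalization the first two cells become identical, which is the point of library-size normalization: differences in sequencing depth no longer dominate the comparison.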

Embed and cluster:

Scale (`max_value=10`)
PCA, neighbors graph, UMAP
Leiden clustering
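In Scanpy terms, the embed-and-cluster steps listed above map onto a short sequence of calls. The sketch below is a plausible reading, not the skill's source. Only `max_value=10` is taken from this document. The number of PCs, neighbors, and the Leiden resolution are left at Scanpy defaults because they are not stated here, and the function name `embed_and_cluster` is hypothetical.

```python
def embed_and_cluster(adata, max_value=10):
    """Scale, embed, and cluster an AnnData holding log-normalized data.

    Modifies `adata` in place and returns it for convenience.
    """
    # Imported lazily so this sketch can be loaded without Scanpy installed.
    import scanpy as sc

    sc.pp.scale(adata, max_value=max_value)  # clip scaled values at max_value
    sc.tl.pca(adata)                         # PCA with Scanpy defaults
    sc.pp.neighbors(adata)                   # k-nearest-neighbor graph
    sc.tl.umap(adata)                        # 2-D embedding in adata.obsm["X_umap"]
    sc.tl.leiden(adata, key_added="leiden")  # cluster labels in adata.obs["leiden"]
    return adata
```

Running it requires `scanpy` plus the `leidenalg` and `python-igraph` packages listed under Dependencies.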

Markers:

scanpy.tl.rank_genes_groups(adata, groupby="leiden", method="wilcoxon", pts=True)

Optional DE v1:

scanpy.tl.rank_genes_groups(adata, groupby=<de_groupby>, groups=[group1], reference=group2, method="wilcoxon", pts=True)

Export full statistics and top genes by score

Optional volcano plot:

Plot `logfoldchanges` vs `-log10(pvals_adj)` (fallback to `pvals` if needed)
Highlight genes with `p < 0.05` and `|log2FC| >= 1`
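The volcano coordinates and the highlight rule reduce to a few lines of arithmetic. This sketch assumes the `p < 0.05` cutoff applies to whichever p-value is plotted, and it floors p-values at a tiny positive number so that `p = 0` does not produce an infinite y-value. The function name `volcano_point` and the floor value are hypothetical.

```python
import math


def volcano_point(logfoldchange, pval, p_cutoff=0.05, lfc_cutoff=1.0, floor=1e-300):
    """Return (x, y, highlighted) for one gene on a volcano plot."""
    y = -math.log10(max(pval, floor))  # floor avoids log10(0)
    highlighted = pval < p_cutoff and abs(logfoldchange) >= lfc_cutoff
    return logfoldchange, y, highlighted


# Placeholder (log fold change, p-value) pairs for illustration only.
examples = [(2.0, 0.001), (0.5, 0.001), (-1.5, 0.01), (-1.0, 0.2)]
for lfc, p in examples:
    print(volcano_point(lfc, p))
```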

Example Queries

"Run standard QC and clustering on my h5ad file"
"Find marker genes for each cluster"
"Generate a UMAP coloured by cluster"
"Run differential expression for treated vs control"

Output Structure

output_directory/
├── report.md
├── result.json
├── figures/
│   ├── qc_violin.png
│   ├── umap_leiden.png
│   ├── marker_dotplot.png
│   └── de_volcano.png    # only when DE volcano is enabled
├── tables/
│   ├── cluster_summary.csv
│   ├── markers_top.csv
│   ├── markers_top.tsv
│   ├── de_full.csv      # only when DE is enabled
│   └── de_top.csv       # only when DE is enabled
└── reproducibility/
    ├── commands.sh
    ├── environment.yml
    └── checksums.sha256
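How the skill builds `checksums.sha256` is not described here. One plausible approach, sketched with the standard library only, hashes every file under the output directory and writes lines in the `sha256sum` format (digest, two spaces, relative path). The function name `write_checksums` and the choice of which files to include are assumptions.

```python
import hashlib
from pathlib import Path


def write_checksums(output_dir, manifest_name="checksums.sha256"):
    """Write SHA-256 digests of all files under output_dir to the manifest."""
    output_dir = Path(output_dir)
    manifest = output_dir / "reproducibility" / manifest_name
    manifest.parent.mkdir(parents=True, exist_ok=True)
    lines = []
    for path in sorted(output_dir.rglob("*")):
        # Skip directories and the manifest itself.
        if path.is_file() and path != manifest:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.relative_to(output_dir).as_posix()}")
    manifest.write_text("\n".join(lines) + "\n")
    return manifest
```

A manifest in this format can later be verified from inside the output directory with `sha256sum -c reproducibility/checksums.sha256`.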

Dependencies

Required:

`scanpy` >= 1.10
`anndata` >= 0.10
`numpy`
`pandas`
`matplotlib`
`leidenalg`
`python-igraph`

Optional (future):

`celltypist` (cell-type annotation)
`scvi-tools` (deep generative modeling)

Safety

Local-first: No patient data upload.
Disclaimer: Reports include the ClawBio medical disclaimer.
Input guardrails: Rejects processed-like matrices to reduce invalid biological inferences.
Reproducibility: Writes command/environment/checksum bundle.

Integration with Bio Orchestrator

Trigger conditions:

File extension `.h5ad`
User intent includes scRNA terms (single-cell, Scanpy, clustering, marker genes, DE)

Current limitations:

Raw-count `.h5ad` only
Seurat input/output is not implemented in the Python path
Multi-group pairwise DE, within-cluster DE, and automated annotation are future work

Citations

Scanpy documentation — analysis API and methods.
AnnData documentation — data model.
Leiden algorithm paper — community detection.