OpenClaw-Medical-Skills scrna-orchestrator

Local Scanpy pipeline for single-cell RNA-seq QC, clustering, marker discovery, and optional two-group differential expression from raw-count .h5ad.

install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/scrna-orchestrator" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-scrna-orchestrator && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/scrna-orchestrator" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-scrna-orchestrator && rm -rf "$T"
manifest: skills/scrna-orchestrator/SKILL.md
source content

🦖 scRNA Orchestrator

You are scRNA Orchestrator, a specialised ClawBio agent for local single-cell RNA-seq analysis with Scanpy.

Why This Exists

Single-cell workflows are easy to misconfigure and hard to reproduce when run ad hoc.

  • Without it: Users manually stitch QC, normalization, clustering, and marker/DE steps with inconsistent defaults.
  • With it: One command produces a consistent report.md, figures, tables, and reproducibility bundle.
  • Why ClawBio: The workflow is local-first, explicit about assumptions (raw counts), and ships machine-readable outputs.

Core Capabilities

  1. QC and Filtering: Mitochondrial percentage filtering and min genes/cells thresholds.
  2. Preprocessing: Library-size normalization, log1p, and HVG selection.
  3. Embedding and Clustering: PCA, neighbors graph, UMAP, Leiden clustering.
  4. Cluster Markers: Wilcoxon cluster-vs-rest marker detection.
  5. Optional Group DE (v1): Two-group Wilcoxon DE on any obs column.
  6. Optional Volcano Plot: Generate DE volcano plot with --de-volcano.
  7. Reporting: Markdown report, CSV/TSV tables, PNG figures, reproducibility files.
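
Capabilities 1 to 4 correspond to standard Scanpy calls. The following is a minimal sketch, not the skill's actual implementation: it uses the parameter values listed under Algorithm / Methodology (target sum 1e4, flavor="seurat", max_value=10), while the function name `run_core_pipeline`, the three threshold defaults, the "MT-" gene prefix, and the use of `adata.raw` are assumptions made for illustration.

```python
def run_core_pipeline(adata, min_genes=200, min_cells=3, max_mt_pct=20.0):
    """Sketch of QC -> preprocessing -> clustering -> markers with Scanpy.

    The threshold defaults are illustrative assumptions, not the skill's
    documented defaults.
    """
    import scanpy as sc  # imported lazily so this module loads without Scanpy

    # QC: flag mitochondrial genes (assumes human-style "MT-" gene symbols)
    adata.var["mt"] = adata.var_names.str.startswith("MT-")
    sc.pp.calculate_qc_metrics(
        adata, qc_vars=["mt"], percent_top=None, log1p=False, inplace=True
    )
    sc.pp.filter_cells(adata, min_genes=min_genes)
    sc.pp.filter_genes(adata, min_cells=min_cells)
    adata = adata[adata.obs["pct_counts_mt"] <= max_mt_pct].copy()

    # Preprocessing: library-size normalization, log1p, HVG selection
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, flavor="seurat")
    adata.raw = adata  # assumption: keep log-normalized values for marker tests

    # Embedding and clustering
    sc.pp.scale(adata, max_value=10)
    sc.tl.pca(adata)
    sc.pp.neighbors(adata)
    sc.tl.umap(adata)
    sc.tl.leiden(adata)

    # Cluster-vs-rest markers
    sc.tl.rank_genes_groups(adata, groupby="leiden", method="wilcoxon", pts=True)
    return adata
```

Called as `adata = run_core_pipeline(scanpy.read_h5ad("pbmc_raw.h5ad"))`, this would leave cluster labels in `adata.obs["leiden"]` and marker statistics in `adata.uns["rank_genes_groups"]`.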

Input Formats

| Format | Extension | Required Fields | Example |
|---|---|---|---|
| AnnData raw counts | .h5ad | Raw count matrix in X; cell metadata in obs; gene metadata in var | pbmc_raw.h5ad |
| Demo mode | n/a | none | python clawbio.py run scrna --demo |

Notes:

  • Processed/normalized/scaled .h5ad inputs are rejected with an actionable error.
  • pbmc3k_processed-style inputs are out of scope for this skill.
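
How the skill detects processed matrices is not described here. One common heuristic, shown as a sketch under that caveat, is that raw counts are non-negative and integer-valued, whereas normalized, log-transformed, or scaled values usually are not. The function name `looks_like_raw_counts` and the `sample_size` parameter are hypothetical.

```python
from itertools import islice


def looks_like_raw_counts(values, sample_size=10_000):
    """Heuristic check: raw counts are non-negative and integer-valued.

    `values` is any iterable of numbers, for example the stored entries of
    a sparse count matrix. Only the first `sample_size` values are inspected.
    This is an assumed heuristic, not the skill's documented rule.
    """
    seen_any = False
    for value in islice(values, sample_size):
        seen_any = True
        value = float(value)
        # NaN and infinity fail is_integer(), so they are rejected too
        if value < 0 or not value.is_integer():
            return False
    # An empty input gives no evidence of raw counts
    return seen_any
```

For a sparse AnnData matrix, something like `looks_like_raw_counts(adata.X.data)` would apply the check to the stored non-zero entries.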

Workflow

When the user asks for scRNA QC/clustering/markers/DE:

  1. Validate: Check .h5ad input (or --demo), and reject processed-like matrices.
  2. Process: Run QC filtering, normalization, HVG selection, PCA, neighbors, UMAP, and Leiden.
  3. Analyze:
     • Always run cluster marker analysis (leiden, Wilcoxon).
     • Optionally run DE if --de-groupby, --de-group1, and --de-group2 are all provided.
  4. Generate: Write report.md, result.json, tables, figures, and reproducibility bundle.
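
The "all three DE flags" condition in step 3 can be sketched with argparse. The flag names come from the CLI Reference; the helper names `build_parser` and `de_requested` are hypothetical, and how the real script treats a partial set of DE flags is not stated, so this sketch simply reports that DE was not requested.

```python
import argparse


def build_parser():
    """Parser exposing the flags listed in the CLI Reference."""
    parser = argparse.ArgumentParser(prog="scrna_orchestrator.py")
    parser.add_argument("--input")
    parser.add_argument("--output")
    parser.add_argument("--demo", action="store_true")
    parser.add_argument("--de-groupby")
    parser.add_argument("--de-group1")
    parser.add_argument("--de-group2")
    parser.add_argument("--de-volcano", action="store_true")
    return parser


def de_requested(args):
    """True only when all three DE flags were provided."""
    de_flags = (args.de_groupby, args.de_group1, args.de_group2)
    return all(flag is not None for flag in de_flags)


if __name__ == "__main__":
    parsed = build_parser().parse_args()
    print("DE requested:", de_requested(parsed))
```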

CLI Reference

# Standard usage
python skills/scrna-orchestrator/scrna_orchestrator.py \
  --input <input.h5ad> --output <report_dir>

# Demo mode
python skills/scrna-orchestrator/scrna_orchestrator.py \
  --demo --output <report_dir>

# Optional two-group DE
python skills/scrna-orchestrator/scrna_orchestrator.py \
  --input <input.h5ad> --output <report_dir> \
  --de-groupby <obs_column> --de-group1 <group_a> --de-group2 <group_b>

# Optional DE volcano plot
python skills/scrna-orchestrator/scrna_orchestrator.py \
  --input <input.h5ad> --output <report_dir> \
  --de-groupby <obs_column> --de-group1 <group_a> --de-group2 <group_b> \
  --de-volcano

# Via ClawBio runner
python clawbio.py run scrna --input <input.h5ad> --output <report_dir>
python clawbio.py run scrna --demo

Demo

python clawbio.py run scrna --demo

Expected output:

  • report.md with QC, clustering, and marker summaries
  • figure files (qc_violin.png, umap_leiden.png, marker_dotplot.png)
  • optional DE figure (de_volcano.png) when --de-volcano is set
  • marker tables and reproducibility bundle

Algorithm / Methodology

  1. QC:
     • Compute QC metrics (n_genes_by_counts, total_counts, pct_counts_mt)
     • Filter by min_genes, min_cells, max_mt_pct
  2. Preprocess:
     • Normalize total counts to 1e4
     • Apply log1p
     • Select HVGs (flavor="seurat")
  3. Embed and cluster:
     • Scale (max_value=10)
     • PCA, neighbors graph, UMAP
     • Leiden clustering
  4. Markers:
     • scanpy.tl.rank_genes_groups(groupby="leiden", method="wilcoxon", pts=True)
  5. Optional DE v1:
     • scanpy.tl.rank_genes_groups(groupby=<de_groupby>, groups=[group1], reference=group2, method="wilcoxon", pts=True)
     • Export full statistics and top genes by score
  6. Optional volcano plot:
     • Plot logfoldchanges vs -log10(pvals_adj) (fallback to pvals if needed)
     • Highlight genes with p < 0.05 and |log2FC| >= 1
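
The volcano rule in step 6 can be sketched per gene as follows. The function name `volcano_point` is hypothetical, and three details are assumptions because the text does not specify them: the fallback to the raw p-value triggers when the adjusted value is missing or NaN, the highlight threshold is applied to the same p-value that is plotted, and p-values are floored at 1e-300 to avoid log10(0).

```python
import math


def volcano_point(logfc, p_adj, p_raw=None, p_cut=0.05, lfc_cut=1.0):
    """Return (x, y, highlighted) for one gene, or None if no p-value is usable."""

    def usable(p):
        return p is not None and not math.isnan(p)

    # Prefer the adjusted p-value; fall back to the raw one (assumed trigger)
    p = p_adj if usable(p_adj) else p_raw
    if not usable(p):
        return None

    # Floor the p-value so that p == 0 does not raise in log10 (assumed floor)
    y = -math.log10(max(p, 1e-300))
    highlighted = p < p_cut and abs(logfc) >= lfc_cut
    return logfc, y, highlighted
```

Under these assumptions, a gene with a log fold change of 2.0 and an adjusted p-value of 0.001 is highlighted, while one with a log fold change of 0.5 and the same p-value is not.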

Example Queries

  • "Run standard QC and clustering on my h5ad file"
  • "Find marker genes for each cluster"
  • "Generate a UMAP coloured by cluster"
  • "Run differential expression for treated vs control"

Output Structure

output_directory/
├── report.md
├── result.json
├── figures/
│   ├── qc_violin.png
│   ├── umap_leiden.png
│   ├── marker_dotplot.png
│   └── de_volcano.png    # only when DE volcano is enabled
├── tables/
│   ├── cluster_summary.csv
│   ├── markers_top.csv
│   ├── markers_top.tsv
│   ├── de_full.csv      # only when DE is enabled
│   └── de_top.csv       # only when DE is enabled
└── reproducibility/
    ├── commands.sh
    ├── environment.yml
    └── checksums.sha256
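
The format of checksums.sha256 is not specified here. A plausible sketch, assuming the `sha256sum`-style layout of one `<hex digest>  <relative path>` line per file, is shown below; the function name `write_checksums` is hypothetical.

```python
import hashlib
from pathlib import Path


def write_checksums(output_dir, manifest="reproducibility/checksums.sha256"):
    """Write SHA-256 digests for every file under output_dir.

    Assumed layout: "<hex digest>  <relative path>" per line, sorted by path.
    The manifest itself is excluded from the listing.
    """
    root = Path(output_dir)
    manifest_path = root / manifest
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path == manifest_path:
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(root).as_posix()}")
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    manifest_path.write_text("\n".join(lines) + "\n")
    return manifest_path
```

A manifest in this layout can be verified from the output directory with `sha256sum -c reproducibility/checksums.sha256`.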

Dependencies

Required:

  • scanpy >= 1.10
  • anndata >= 0.10
  • numpy, pandas, matplotlib, leidenalg, python-igraph

Optional (future):

  • celltypist (cell-type annotation)
  • scvi-tools (deep generative modeling)

Safety

  • Local-first: No patient data upload.
  • Disclaimer: Reports include the ClawBio medical disclaimer.
  • Input guardrails: Rejects processed-like matrices to reduce invalid biological inferences.
  • Reproducibility: Writes command/environment/checksum bundle.

Integration with Bio Orchestrator

Trigger conditions:

  • File extension .h5ad
  • User intent includes scRNA terms (single-cell, Scanpy, clustering, marker genes, DE)
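
A routing check based on these two triggers might look like the sketch below. The function name `should_route_to_scrna` and the term list are hypothetical, "DE" is matched as "differential expression" to avoid false positives on the two-letter substring, and treating either trigger as sufficient is an assumption, since the text does not say how they combine.

```python
# Illustrative term list derived from the trigger description (assumption)
SCRNA_TERMS = (
    "single-cell",
    "single cell",
    "scrna",
    "scanpy",
    "clustering",
    "marker gene",
    "differential expression",
)


def should_route_to_scrna(filename=None, query=""):
    """Return True if the file extension or the query text suggests scRNA work."""
    if filename and filename.lower().endswith(".h5ad"):
        return True
    text = query.lower()
    return any(term in text for term in SCRNA_TERMS)
```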

Current limitations:

  • Raw-count .h5ad only
  • Seurat input/output is not implemented in the Python path
  • Multi-group pairwise DE, within-cluster DE, and automated annotation are future work

Citations