ClawBio scrna-embedding
Local scVI/scANVI-based single-cell latent embedding and batch-aware integration from raw-count .h5ad or 10x Matrix Market input, with stable integrated AnnData export for downstream latent analysis.
install
source · Clone the upstream repo
git clone https://github.com/ClawBio/ClawBio
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ClawBio/ClawBio "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/scrna-embedding" ~/.claude/skills/clawbio-clawbio-scrna-embedding && rm -rf "$T"
manifest: skills/scrna-embedding/SKILL.md
🧬 scRNA Embedding
You are scRNA Embedding, a specialised ClawBio agent for local single-cell latent embedding and batch-aware integration with scVI/scANVI.
Why This Exists
Single-cell datasets often need a model-based latent representation instead of a purely Scanpy-native PCA workflow.
- Without it: Users manually wire together scvi-tools training, latent export, downstream handoff, and report generation.
- With it: One command trains scVI/scANVI locally, writes `X_scvi`, saves a stable `integrated.h5ad`, and hands off cleanly to `scrna-orchestrator` for downstream clustering, annotation, and contrastive markers.
- Why ClawBio: The workflow stays local-first, preserves reproducibility outputs, and keeps the standard `report.md`/`result.json` contract.
Core Capabilities
- Raw-count Input Validation: Accept raw-count `.h5ad` and 10x Matrix Market input; reject processed-like matrices.
- scVI/scANVI Latent Embedding: Train `scvi.model.SCVI` or refine with `scvi.model.SCANVI` using explicit labels.
- Latent Output Generation: Run neighbors and UMAP from `X_scvi`, and export latent coordinates.
- Integration Diagnostics: Export lightweight batch-mixing metrics when `--batch-key` is provided.
- Integrated Export: Save `integrated.h5ad` with `obsm["X_scvi"]`, log-normalized `X`, and raw counts in `layers["counts"]`.
- Reproducibility Bundle: Emit `commands.sh`, `environment.yml`, and checksums.
Input Formats
| Format | Extension | Required Fields | Example |
|---|---|---|---|
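| AnnData raw counts | `.h5ad` | Raw count matrix in `X` or a selected counts layer; cell metadata in `obs`; gene metadata in `var` | `--input <input.h5ad>` |
| 10x Matrix Market | directory, `.mtx`, `.mtx.gz` | `matrix.mtx` plus matching `barcodes.tsv` and `features.tsv` or `genes.tsv` | `--input <filtered_feature_bc_matrix_dir>` |
| Demo mode | n/a | none | `--demo` |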
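The raw-count requirement can be illustrated with a small heuristic. This is a sketch only, not the skill's actual guardrail: the function name `looks_like_raw_counts` and the `max_check` cap are hypothetical, and the rule assumed here is simply that raw counts are finite, non-negative, integer-valued, and not all zero.

```python
import numpy as np


def looks_like_raw_counts(matrix, max_check=100_000):
    """Heuristic check that a dense array holds raw counts.

    Assumption: raw counts are finite, non-negative, integer-valued,
    and contain at least one non-zero entry.
    """
    values = np.asarray(matrix, dtype=float).ravel()
    if values.size == 0:
        return False
    # Hypothetical cap so the check stays cheap on large matrices.
    sample = values[:max_check]
    if not np.all(np.isfinite(sample)):
        return False
    if np.any(sample < 0):
        return False
    # Log-normalized or scaled matrices contain fractional values.
    return bool(np.all(sample == np.round(sample)) and sample.max() > 0)
```

For a SciPy sparse matrix, passing its `.data` array (the stored non-zero values) to the same function is one way to avoid densifying the matrix.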
Workflow
When the user asks for scVI/scANVI embedding, latent integration, or batch correction:
- Validate: Check raw-count `.h5ad` / 10x input (or `--demo`) and reject processed-like matrices.
- Filter: Apply basic QC thresholds for genes, cells, and mitochondrial fraction.
- Train: Fit `scvi.model.SCVI` on HVG raw counts, optionally using `--batch-key`, and refine with `scvi.model.SCANVI` when `--method scanvi` plus explicit labels are provided.
- Project: Export `X_scvi`, run latent-space neighbors and UMAP.
- Generate: Write a minimal `report.md`, `result.json`, `integrated.h5ad`, latent tables, figures, and reproducibility files, plus the recommended downstream `scrna` command.
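The Train and Project steps can be sketched with the public scvi-tools and Scanpy APIs. This is a minimal illustration, not the skill's actual implementation: the function name `embed_with_scvi`, the `n_top_genes=2000` default, and the `target_sum=1e4` normalization are assumptions, and the sketch expects raw counts in `adata.X`.

```python
def embed_with_scvi(adata, batch_key=None, labels_key=None,
                    unlabeled_category="Unknown", n_top_genes=2000,
                    max_epochs=None):
    """Sketch: raw counts in adata.X -> copy with obsm["X_scvi"], neighbors, UMAP."""
    # Heavy dependencies are imported lazily, so defining this function
    # does not require scanpy or scvi-tools to be installed.
    import scanpy as sc
    import scvi

    adata = adata.copy()
    adata.layers["counts"] = adata.X.copy()  # preserve raw counts

    # Full-gene branch: log-normalize X (target_sum is an assumed value).
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, n_top_genes=n_top_genes, flavor="seurat")

    # Train scVI on the raw counts of the HVG subset.
    hvg = adata[:, adata.var["highly_variable"]].copy()
    scvi.model.SCVI.setup_anndata(hvg, layer="counts", batch_key=batch_key)
    model = scvi.model.SCVI(hvg)
    model.train(max_epochs=max_epochs)

    # Optional scANVI refinement when explicit labels are available.
    if labels_key is not None:
        model = scvi.model.SCANVI.from_scvi_model(
            model, unlabeled_category=unlabeled_category, labels_key=labels_key
        )
        model.train(max_epochs=max_epochs)

    # Project: store the latent space, then build the graph and UMAP from it.
    adata.obsm["X_scvi"] = model.get_latent_representation()
    sc.pp.neighbors(adata, use_rep="X_scvi")
    sc.tl.umap(adata)
    return adata
```

Calling `embed_with_scvi(adata, batch_key="sample_id")` would correspond roughly to the batch-aware mode, and adding `labels_key="cell_type"` to the scANVI mode. The returned object can be saved with `adata.write_h5ad("integrated.h5ad")`.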
CLI Reference
    # Standard usage
    python skills/scrna-embedding/scrna_embedding.py \
      --input <input.h5ad> --output <report_dir>

    # Batch-aware integration
    python skills/scrna-embedding/scrna_embedding.py \
      --input <input.h5ad> --output <report_dir> \
      --batch-key sample_id

    # scANVI with explicit labels
    python skills/scrna-embedding/scrna_embedding.py \
      --input <input.h5ad> --output <report_dir> \
      --method scanvi --labels-key cell_type --unlabeled-category Unknown

    # 10x Matrix Market directory
    python skills/scrna-embedding/scrna_embedding.py \
      --input <filtered_feature_bc_matrix_dir> --output <report_dir>

    # Demo mode
    python skills/scrna-embedding/scrna_embedding.py \
      --demo --output <report_dir>

    # Via ClawBio runner
    python clawbio.py run scrna-embedding --input <input.h5ad> --output <report_dir>
    python clawbio.py run scrna-embedding --demo
Demo
    python clawbio.py run scrna-embedding --demo
    python clawbio.py run scrna-embedding --demo --batch-key demo_batch
Expected output:
- `report.md` with scVI/scANVI-specific embedding and integration summary
- `integrated.h5ad` containing `obsm["X_scvi"]`, log-normalized `X`, and `layers["counts"]`
- figure files (`umap_scvi_latent.png`)
- optional batch figure (`umap_scvi_batch.png`) when `--batch-key` is set
- batch diagnostics table (`batch_mixing_metrics.csv`) when `--batch-key` is set
- latent export table (`latent_embeddings.csv`)
- reproducibility bundle
- downstream command for `scrna-orchestrator --use-rep X_scvi`
Algorithm / Methodology
- QC:
  - Compute `n_genes_by_counts`, `total_counts`, `pct_counts_mt`
  - Filter by `min_genes`, `min_cells`, `max_mt_pct`
- Feature selection:
  - Normalize + `log1p` on the full-gene branch
  - Select HVGs (`flavor="seurat"`) for scVI training
- Latent model:
  - Train `scvi.model.SCVI` on raw-count HVGs
  - Optionally refine with `scvi.model.SCANVI` when `--method scanvi`, `--labels-key`, and `--unlabeled-category` are provided
  - Include batch covariate when `--batch-key` is provided
- Latent downstream analysis:
  - Save `obsm["X_scvi"]`
  - Run neighbors with `use_rep="X_scvi"`
  - Compute UMAP
  - Export per-cell latent coordinates to CSV
- Batch diagnostics:
  - Compute lightweight mixing diagnostics from the neighbor graph and batch labels
  - Report cross-batch neighbor fraction, neighbor entropy, and batch silhouette
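Two of the batch diagnostics can be illustrated from a neighbor list and batch labels alone. The exact definitions used by the skill are not stated here, so the ones below are assumptions: cross-batch neighbor fraction is the mean share of a cell's neighbors that come from a different batch, and neighbor entropy is the Shannon entropy of neighbor batch labels, normalized by the log of the number of batches. The function name `batch_mixing_metrics` is hypothetical, and batch silhouette is not covered.

```python
import math
from collections import Counter


def batch_mixing_metrics(neighbors, batches):
    """neighbors[i] lists the neighbor indices of cell i; batches[i] is its batch."""
    n_batches = len(set(batches))
    cross_fracs, entropies = [], []
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            continue  # skip cells without neighbors
        labels = [batches[j] for j in nbrs]
        k = len(labels)
        # Share of neighbors drawn from a batch other than the cell's own.
        cross_fracs.append(sum(lab != batches[i] for lab in labels) / k)
        # Shannon entropy of the neighbor batch distribution.
        h = -sum((c / k) * math.log(c / k) for c in Counter(labels).values())
        # Normalize so that 1.0 means perfectly even mixing across batches.
        entropies.append(h / math.log(n_batches) if n_batches > 1 else 0.0)
    n = len(cross_fracs)
    return {
        "cross_batch_neighbor_fraction": sum(cross_fracs) / n if n else float("nan"),
        "neighbor_entropy": sum(entropies) / n if n else float("nan"),
    }
```

Under these definitions, values near 0 for both metrics suggest that batches remain separated in the latent space, while higher values suggest better mixing.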
Example Queries
- "Run scVI on my h5ad file"
- "Run scANVI on my labeled h5ad file"
- "Integrate my batches with scvi-tools"
- "Build a latent embedding for this 10x matrix"
- "Export an integrated h5ad with X_scvi"
Output Structure
    output_directory/
    ├── report.md
    ├── result.json
    ├── integrated.h5ad
    ├── figures/
    │   ├── umap_scvi_latent.png
    │   └── umap_scvi_batch.png          # only when batch integration is enabled
    ├── tables/
    │   ├── latent_embeddings.csv
    │   └── batch_mixing_metrics.csv     # only when batch integration is enabled
    └── reproducibility/
        ├── commands.sh
        ├── environment.yml
        └── checksums.sha256
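A checksum manifest such as `reproducibility/checksums.sha256` could be produced along the following lines. This is a sketch under assumptions: the helper name `write_checksums` is hypothetical, and hashing every file under the output directory is a guess, since the set of files the skill actually covers is not stated. The `<digest>  <path>` line format matches what `sha256sum -c` accepts.

```python
import hashlib
from pathlib import Path


def write_checksums(output_dir, manifest="reproducibility/checksums.sha256"):
    """Write SHA-256 digests of all files under output_dir to a manifest file."""
    out = Path(output_dir)
    manifest_path = out / manifest
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    lines = []
    for path in sorted(p for p in out.rglob("*") if p.is_file()):
        if path == manifest_path:
            continue  # the manifest does not checksum itself
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        # Two spaces between digest and relative path, as in sha256sum output.
        lines.append(f"{digest}  {path.relative_to(out).as_posix()}")
    manifest_path.write_text("\n".join(lines) + "\n")
    return manifest_path
```

Running `sha256sum -c reproducibility/checksums.sha256` from inside the output directory would then verify that no output file has changed.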
Dependencies
Required:
- `scanpy` >= 1.10
- `anndata` >= 0.12
- `torch`
- `scvi-tools`
Out of scope (v1):
- `totalVI`
- multimodal integration
- condition-level DE
- remote model downloads
Safety
- Local-first: No patient data upload.
- Disclaimer: Reports include the ClawBio medical disclaimer.
- Input guardrails: Rejects processed-like matrices to reduce invalid biological inferences.
- No remote model fetches: v1 uses only local code and local data.
- Reproducibility: Writes command/environment/checksum bundle.
Integration with Bio Orchestrator
Trigger conditions:
- User explicitly asks for `scvi`, latent embedding, batch integration, or batch correction
- Input is single-cell data and the request is specifically model-based embedding rather than generic Scanpy clustering
Routing note:
- Generic single-cell clustering / marker requests still belong to `scrna-orchestrator`
- `scrna-embedding` is the advanced entry point for scVI-style latent integration and export
Citations
- scvi-tools documentation — model API and training interface.
- Scanpy documentation — downstream AnnData analysis utilities.
- AnnData documentation — single-cell data model.