OpenClaw-Medical-Skills bio-machine-learning-atlas-mapping

<!--

install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-machine-learning-atlas-mapping" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-machine-learning-atlas-mapping && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-machine-learning-atlas-mapping" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-machine-learning-atlas-mapping && rm -rf "$T"
manifest: skills/bio-machine-learning-atlas-mapping/SKILL.md
source content
<!-- # COPYRIGHT NOTICE # This file is part of the "Universal Biomedical Skills" project. # Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu> # All Rights Reserved. # # This code is proprietary and confidential. # Unauthorized copying of this file, via any medium is strictly prohibited. # # Provenance: Authenticated by MD BABU MIA -->

name: bio-machine-learning-atlas-mapping
description: Maps query single-cell data to reference atlases using scArches transfer learning with scVI and scANVI models. Transfers cell type labels without retraining on combined data. Use when annotating new single-cell datasets using pre-trained reference models.
tool_type: python
primary_tool: scvi-tools
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:

  • read_file
  • run_shell_command

Transfer Learning for Single-Cell Data

scVI Reference Mapping (scArches)

import scvi
import scanpy as sc

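# Load the reference data and the pre-trained reference model
adata_ref = sc.read_h5ad('reference.h5ad')
# Because adata is passed explicitly, the model does not need to have been
# saved with save_anndata=True; load() re-applies the saved setup, so the
# setup_anndata call below mainly documents the keys the reference used
scvi.model.SCVI.setup_anndata(adata_ref, layer='counts', batch_key='batch')
ref_model = scvi.model.SCVI.load('reference_model/', adata=adata_ref)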

# Prepare query data (needs the same 'counts' layer and 'batch' column as the reference)
adata_query = sc.read_h5ad('query.h5ad')

# Align query genes to the reference in place: genes are reordered to match,
# reference genes missing from the query are zero-padded, extra genes are dropped.
# Indexing with adata_ref.var_names directly would raise a KeyError on any missing gene.
scvi.model.SCVI.prepare_query_anndata(adata_query, ref_model)

# Create the query model from the reference weights ("surgery")
query_model = scvi.model.SCVI.load_query_data(adata_query, ref_model)

# Surgical training: update only query-specific parameters
# weight_decay=0.0: Standard for surgery; prevents reference drift
query_model.train(max_epochs=200, plan_kwargs={'weight_decay': 0.0})

# Get latent representation
adata_query.obsm['X_scVI'] = query_model.get_latent_representation()
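
Before mapping, it can help to check how many reference genes the query actually contains, since zero-padded genes carry no signal. The following is a minimal sketch in plain Python; the function name `gene_overlap` and the 0.8 warning threshold are illustrative assumptions, not part of the scvi-tools API, and the gene lists are toy stand-ins for `adata_query.var_names` and `adata_ref.var_names`.

```python
def gene_overlap(query_genes, ref_genes):
    """Return (fraction of reference genes found in the query, sorted missing genes)."""
    ref = list(ref_genes)
    if not ref:
        raise ValueError('reference gene list is empty')
    query = set(query_genes)
    missing = sorted(g for g in ref if g not in query)
    return 1 - len(missing) / len(ref), missing


# Toy gene lists standing in for adata_query.var_names and adata_ref.var_names
fraction, missing = gene_overlap(
    ['CD3E', 'MS4A1', 'ACTB'],
    ['CD3E', 'MS4A1', 'LYZ', 'NKG7'],
)
if fraction < 0.8:  # illustrative threshold, chosen for this sketch
    print(f'Only {fraction:.0%} of reference genes are in the query; missing: {missing}')
```

A low overlap does not stop the mapping from running, but it is a reasonable cue to inspect gene naming conventions (symbols versus Ensembl IDs) before training.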

scANVI for Label Transfer

import scvi
import scanpy as sc

# Reference with cell type labels
adata_ref = sc.read_h5ad('reference_labeled.h5ad')
scvi.model.SCVI.setup_anndata(adata_ref, layer='counts', batch_key='batch')
# Architecture settings that scvi-tools recommends for scArches-compatible references
ref_vae = scvi.model.SCVI(
    adata_ref, n_latent=30,
    use_layer_norm='both', use_batch_norm='none', encode_covariates=True,
)
ref_vae.train(max_epochs=100)

# Convert to scANVI (semi-supervised); from_scvi_model registers the labels itself,
# so a separate SCANVI.setup_anndata call is not needed
ref_scanvi = scvi.model.SCANVI.from_scvi_model(ref_vae, labels_key='cell_type', unlabeled_category='Unknown')
ref_scanvi.train(max_epochs=50)
ref_scanvi.save('reference_scanvi/')

# Map query data
adata_query = sc.read_h5ad('query.h5ad')

# Reorder genes to match the reference and zero-pad missing ones in place
scvi.model.SCANVI.prepare_query_anndata(adata_query, ref_scanvi)
# The labels column must exist in the query; all query cells start unlabeled
adata_query.obs['cell_type'] = 'Unknown'

query_scanvi = scvi.model.SCANVI.load_query_data(adata_query, ref_scanvi)
query_scanvi.train(max_epochs=100, plan_kwargs={'weight_decay': 0.0})

# Transfer labels
adata_query.obs['predicted_cell_type'] = query_scanvi.predict()
adata_query.obsm['X_scANVI'] = query_scanvi.get_latent_representation()

Prediction Confidence

# Get prediction probabilities
soft_predictions = query_scanvi.predict(soft=True)
adata_query.obs['prediction_confidence'] = soft_predictions.max(axis=1)

# Flag low-confidence predictions
# confidence < 0.5: May be novel cell type or poor mapping
low_conf = adata_query.obs['prediction_confidence'] < 0.5
print(f'Low confidence predictions: {low_conf.sum()} ({low_conf.mean():.1%})')
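
One way to act on the confidence score is to replace low-confidence labels with a placeholder rather than trusting them. The following is a minimal sketch using pandas on a toy probability table shaped like the output of `predict(soft=True)` (rows are cells, columns are labels). The helper name `threshold_labels`, the 'Uncertain' placeholder, and the probability values are hypothetical; the 0.5 cutoff mirrors the flag above.

```python
import pandas as pd


def threshold_labels(soft, min_confidence=0.5, placeholder='Uncertain'):
    """Return the top label per cell, replaced by a placeholder when its probability is too low."""
    confidence = soft.max(axis=1)   # highest class probability per cell
    labels = soft.idxmax(axis=1)    # label that attains that probability
    return labels.where(confidence >= min_confidence, placeholder), confidence


# Toy probabilities for three cells over three labels (hypothetical values)
soft = pd.DataFrame(
    {
        'T cell': [0.90, 0.40, 0.10],
        'B cell': [0.05, 0.35, 0.80],
        'NK cell': [0.05, 0.25, 0.10],
    },
    index=['cell_1', 'cell_2', 'cell_3'],
)
labels, confidence = threshold_labels(soft)
print(labels.tolist())
```

Cells labeled with the placeholder are candidates for manual review, since a low maximum probability may reflect a cell type absent from the reference rather than a modeling failure.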

Joint Embedding Visualization

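import anndata as ad
import scanpy as sc

# Embed both datasets with the query-updated model so they share one latent space
adata_ref.obsm['X_scVI'] = query_model.get_latent_representation(adata_ref)
adata_query.obsm['X_scVI'] = query_model.get_latent_representation()

# Combine reference and query for visualization (AnnData.concatenate is deprecated);
# join='outer' keeps obs columns such as cell_type that exist in only one dataset
adata_combined = ad.concat([adata_ref, adata_query], label='dataset', keys=['reference', 'query'], join='outer', index_unique='-')

# Use latent space for neighbors/UMAP
sc.pp.neighbors(adata_combined, use_rep='X_scVI')
sc.tl.umap(adata_combined)
sc.pl.umap(adata_combined, color=['dataset', 'cell_type'], save='_transfer.png')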

Pre-trained Reference Atlases

| Atlas | Model | URL |
|---|---|---|
| Human Lung Cell Atlas | scANVI | cellxgene.cziscience.com |
| Tabula Sapiens | scVI | tabula-sapiens-portal.ds.czbiohub.org |
| Mouse Cell Atlas | scVI | bis.zju.edu.cn/MCA |

Training Parameters

| Parameter | Surgical | Full Retrain | Notes |
|---|---|---|---|
| weight_decay | 0.0 | 0.001 | 0.0 preserves reference |
| max_epochs | 100-200 | 200-400 | Less for surgery |
| early_stopping | True | True | Prevents overfitting |
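
The table can be captured as keyword arguments for `model.train()`. The following is a minimal sketch; the helper name `train_kwargs` and the choice of the upper end of each epoch range are assumptions made for illustration, and the values simply restate the table.

```python
def train_kwargs(mode):
    """Build keyword arguments for model.train() from the training parameters table."""
    settings = {
        'surgical': {'max_epochs': 200, 'weight_decay': 0.0},
        'full_retrain': {'max_epochs': 400, 'weight_decay': 0.001},
    }
    if mode not in settings:
        raise ValueError(f'unknown mode: {mode!r}')
    chosen = settings[mode]
    return {
        'max_epochs': chosen['max_epochs'],
        'early_stopping': True,
        # weight_decay is an optimizer setting, so it goes through plan_kwargs
        'plan_kwargs': {'weight_decay': chosen['weight_decay']},
    }


surgical = train_kwargs('surgical')
print(surgical)
# Usage with a query model from the sections above: query_model.train(**surgical)
```

Keeping the two regimes in one place makes it harder to fine-tune a query model with full-retrain settings by accident.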

Related Skills

  • single-cell/cell-annotation - Manual annotation methods
  • single-cell/batch-integration - Batch effect correction
  • single-cell/preprocessing - Data preparation before transfer
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->