OpenClaw-Medical-Skills scfoundation-model-agent

<!--

install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/scfoundation-model-agent" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-scfoundation-model-agent && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/scfoundation-model-agent" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-scfoundation-model-agent && rm -rf "$T"
manifest: skills/scfoundation-model-agent/SKILL.md
source content
<!-- # COPYRIGHT NOTICE # This file is part of the "Universal Biomedical Skills" project. # Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu> # All Rights Reserved. # # This code is proprietary and confidential. # Unauthorized copying of this file, via any medium is strictly prohibited. # # Provenance: Authenticated by MD BABU MIA -->

name: 'scfoundation-model-agent'
description: 'Unified agent for leveraging single-cell foundation models (scGPT, scBERT, Geneformer, scFoundation) for cross-species annotation, perturbation prediction, and gene network inference.'
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:

  • read_file
  • run_shell_command

scFoundation Model Agent

The scFoundation Model Agent provides a unified interface to leverage state-of-the-art single-cell foundation models for diverse downstream tasks. It integrates scGPT, scBERT, Geneformer, scFoundation, and emerging models to enable cross-species cell annotation, in silico perturbation prediction, gene regulatory network inference, and batch integration.

When to Use This Skill

  • Annotating cell types within or across species (human, mouse).
  • Predicting perturbation effects (knockouts, drug treatments) in silico.
  • Inferring gene regulatory networks from single-cell data.
  • Integrating batches without losing biological signal.
  • Generating cell embeddings for downstream analysis.

Core Capabilities

  1. Cross-Species Cell Annotation: Transfer cell type labels across species using unified embeddings.

  2. In Silico Perturbation: Predict gene expression changes from knockouts/treatments.

  3. Gene Regulatory Network Inference: Discover TF-target relationships from attention patterns.

  4. Batch Integration: Remove technical variation while preserving biology.

  5. Cell Embedding Generation: Generate universal cell representations for any downstream task.

  6. Multi-Model Ensemble: Combine predictions from multiple foundation models.

Supported Foundation Models

| Model | Parameters | Training Data | Strengths |
|---|---|---|---|
| scGPT | 50M | 33M human cells | General purpose, perturbations |
| Geneformer | 10M | 30M cells | Chromatin, gene networks |
| scBERT | 20M | 1.2M cells | Cell type annotation |
| scFoundation | 100M | 50M cells | Large-scale, multi-species |
| scTab | 15M | 22M cells | Tabular prediction |
| UCE (Universal Cell Embeddings) | 100M | 36M cells | Cross-species transfer |

Workflow

  1. Input: Single-cell RNA-seq data (AnnData format).

  2. Model Selection: Choose appropriate model(s) for task.

  3. Preprocessing: Tokenize genes, normalize expression.

  4. Inference: Generate embeddings or predictions.

  5. Task Execution: Annotation, perturbation, or network inference.

  6. Ensemble (Optional): Combine multi-model predictions.

  7. Output: Annotated data, predictions, networks.
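
Step 3 of the workflow can be pictured with a small, library-free sketch. It assumes a simplified rank-based tokenization: counts are scaled to a fixed total, log-transformed, and genes are then ordered by expression. This is similar in spirit to rank-based encodings, but it is not the exact tokenizer of any model listed here. The function names, the 10,000 target sum, and the toy vocabulary are illustrative assumptions.

```python
import math


def normalize_log1p(counts, target_sum=10_000):
    """Scale one cell's raw counts to target_sum, then apply log1p."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: math.log1p(c / total * target_sum) for gene, c in counts.items()}


def rank_tokenize(expr, vocab, max_len=2048):
    """Return token ids of expressed, in-vocabulary genes, highest expression first."""
    expressed = [(g, v) for g, v in expr.items() if v > 0 and g in vocab]
    # Sort by descending expression; break ties by gene name for determinism.
    expressed.sort(key=lambda gv: (-gv[1], gv[0]))
    return [vocab[g] for g, _ in expressed[:max_len]]


# Hypothetical toy vocabulary and one cell's raw counts (not real data).
vocab = {"TP53": 0, "GAPDH": 1, "CD3E": 2, "MKI67": 3}
cell = {"GAPDH": 120, "TP53": 30, "CD3E": 0, "UNKNOWN1": 50}

tokens = rank_tokenize(normalize_log1p(cell), vocab)
print(tokens)  # [1, 0]
```

Genes outside the vocabulary (here `UNKNOWN1`) and unexpressed genes are dropped, which is one reason gene coverage matters when choosing a model.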

Example Usage

User: "Use scGPT to predict the effect of CRISPR knockout of TP53 on these cancer cells."

Agent Action:

python3 Skills/Genomics/scFoundation_Model_Agent/foundation_predict.py \
    --input cancer_cells.h5ad \
    --model scgpt \
    --task perturbation \
    --perturbation "TP53 knockout" \
    --model_checkpoint scgpt_human_gene_v1.pt \
    --output tp53_ko_predictions.h5ad

Task-Specific Usage

Cell Type Annotation

python3 foundation_predict.py \
    --input query_cells.h5ad \
    --model geneformer \
    --task annotation \
    --reference tabula_sapiens.h5ad \
    --output annotated_cells.h5ad

Gene Network Inference

python3 foundation_predict.py \
    --input cells.h5ad \
    --model scgpt \
    --task grn_inference \
    --transcription_factors tf_list.txt \
    --output gene_network.csv

Batch Integration

python3 foundation_predict.py \
    --input multi_batch.h5ad \
    --model scfoundation \
    --task integration \
    --batch_key batch \
    --output integrated.h5ad
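
The invocations above share one flag pattern, so a wrapper could assemble them programmatically. The sketch below assumes `foundation_predict.py` accepts the flags exactly as they appear in these examples; `build_command` and its `extra` argument are hypothetical helpers, not part of the skill.

```python
import shlex


def build_command(input_path, model, task, output_path, extra=None,
                  script="foundation_predict.py"):
    """Assemble an argv list using only flags that appear in the examples above."""
    argv = ["python3", script,
            "--input", input_path,
            "--model", model,
            "--task", task]
    # Task-specific flags such as --batch_key or --reference go in `extra`.
    for flag, value in (extra or {}).items():
        argv += [f"--{flag}", str(value)]
    argv += ["--output", output_path]
    return argv


cmd = build_command("multi_batch.h5ad", "scfoundation", "integration",
                    "integrated.h5ad", extra={"batch_key": "batch"})
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with values that contain spaces, such as `"TP53 knockout"`.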

Output Formats

| Task | Output | Format |
|---|---|---|
| Annotation | Cell type labels | .h5ad obs column |
| Perturbation | Predicted expression | .h5ad layer |
| GRN | TF-target edges | .csv, .graphml |
| Integration | Corrected embeddings | .h5ad obsm |
| Embeddings | Cell representations | .h5ad obsm |

Performance Benchmarks

| Task | Model | Dataset | Performance |
|---|---|---|---|
| Annotation | scGPT | Tabula Sapiens | 93% accuracy |
| Annotation | Geneformer | HLCA | 91% accuracy |
| Perturbation (R²) | scGPT | Norman 2019 | 0.87 |
| Integration (kBET) | scFoundation | Multi-atlas | 0.92 |
| Cross-species | UCE | Human→Mouse | 85% F1 |
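
For readers who want to compute comparable numbers on their own data, two of the metrics named in the table have standard definitions. The sketch below implements accuracy and one common form of R² (the coefficient of determination), which may differ from the variant behind the table. The toy values are made up for illustration and are unrelated to the benchmarks; kBET is not covered.

```python
def accuracy(y_true, y_pred):
    """Fraction of cells whose predicted label matches the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only.
acc = accuracy(["T", "B", "T", "NK"], ["T", "B", "B", "NK"])
r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(acc, r2)  # 0.75 0.8
```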

AI/ML Architecture

Transformer Backbone:

  • Gene-level tokenization
  • Attention-based gene interactions
  • Masked expression prediction pretraining
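
Masked expression pretraining hides a fraction of a cell's expression values and trains the model to reconstruct them. The sketch below shows only the masking and the loss bookkeeping, with no model. The 15% mask rate and the `-1.0` sentinel are assumptions chosen for illustration, not settings taken from any model listed in this document.

```python
import random

MASK_VALUE = -1.0  # hypothetical sentinel marking a hidden expression value


def mask_expression(values, mask_rate=0.15, seed=0):
    """Hide a random subset of values; return (masked copy, {index: true value})."""
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)
    n_mask = max(1, round(len(values) * mask_rate))
    masked_idx = sorted(rng.sample(range(len(values)), n_mask))
    masked = list(values)
    targets = {}
    for i in masked_idx:
        targets[i] = masked[i]
        masked[i] = MASK_VALUE
    return masked, targets


def masked_mse(predictions, targets):
    """Mean squared error computed over the masked positions only."""
    return sum((predictions[i] - v) ** 2 for i, v in targets.items()) / len(targets)


expression = [float(i) for i in range(20)]
masked, targets = mask_expression(expression)
# A perfect reconstruction gives zero loss on the masked positions.
print(len(targets), masked_mse(expression, targets))
```

Restricting the loss to masked positions is the usual design choice, since the unmasked values are visible to the model and carry no reconstruction signal.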

Perturbation Module:

  • Conditional generation
  • Counterfactual prediction
  • Dose-response modeling

Transfer Learning:

  • Zero-shot annotation
  • Few-shot fine-tuning
  • Domain adaptation
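
Zero-shot annotation is commonly implemented as label transfer in embedding space. The sketch below assumes cell embeddings have already been produced by a foundation model and assigns each query cell the label of the nearest reference centroid by cosine similarity. Nearest-centroid matching is chosen here for simplicity; the source does not state which transfer method the agent uses, and the two-dimensional embeddings are toy values.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_centroids(ref_embeddings, ref_labels):
    """Average the reference embeddings of each cell type."""
    groups = defaultdict(list)
    for emb, label in zip(ref_embeddings, ref_labels):
        groups[label].append(emb)
    return {label: [sum(col) / len(embs) for col in zip(*embs)]
            for label, embs in groups.items()}


def annotate(query_embeddings, centroids):
    """Assign each query cell the label of its most similar centroid."""
    return [max(centroids, key=lambda lab: cosine(q, centroids[lab]))
            for q in query_embeddings]


# Toy 2-D embeddings standing in for model output.
ref = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
ref_labels = ["T cell", "T cell", "B cell", "B cell"]
query = [[0.8, 0.2], [0.2, 0.7]]

labels = annotate(query, label_centroids(ref, ref_labels))
print(labels)  # ['T cell', 'B cell']
```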

Prerequisites

  • Python 3.10+
  • PyTorch 2.0+
  • transformers, flash-attn
  • Scanpy, AnnData
  • Model-specific weights
  • GPU with 16GB+ VRAM
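
A quick pre-flight check can confirm these prerequisites before a long run. The sketch below uses only the standard library to detect installed packages and queries PyTorch for GPU memory when it is available. The import names (`flash_attn` for flash-attn, `scanpy`, `anndata`) are the packages' usual module names, and `check_environment` is a hypothetical helper.

```python
import importlib.util
import sys

REQUIRED_MODULES = ["torch", "transformers", "flash_attn", "scanpy", "anndata"]


def check_environment(min_python=(3, 10)):
    """Report Python version, installed packages, and GPU memory in GiB (or None)."""
    report = {"python_ok": sys.version_info >= min_python}
    for name in REQUIRED_MODULES:
        report[name] = importlib.util.find_spec(name) is not None
    report["gpu_vram_gib"] = None
    if report["torch"]:
        import torch
        if torch.cuda.is_available():
            total_bytes = torch.cuda.get_device_properties(0).total_memory
            report["gpu_vram_gib"] = round(total_bytes / 1024**3, 1)
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

The GPU figure is reported rather than compared against a threshold, because cards marketed as 16 GB typically report slightly less usable memory.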

Related Skills

  • Nicheformer_Spatial_Agent - For spatial foundation models
  • scGPT_Agent - Dedicated scGPT workflows
  • Cell_Type_Annotation - Traditional annotation methods
  • Pathway_Analysis - Gene set enrichment

Model Selection Guide

| Use Case | Recommended Model | Reason |
|---|---|---|
| General annotation | scGPT | Broad training, robust |
| Cross-species | UCE | Species-agnostic embeddings |
| Perturbation | scGPT | Best perturbation performance |
| GRN inference | Geneformer | Attention → regulatory links |
| Large-scale | scFoundation | Efficient, scalable |
| Tabular prediction | scTab | Optimized for classification |
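
The guide above maps directly onto a lookup table. In the sketch below the use-case keys are hypothetical names; `scgpt`, `geneformer`, and `scfoundation` match the `--model` values used in the command examples, while `uce` and `sctab` are assumed identifiers that do not appear in those examples.

```python
# Mirrors the Model Selection Guide; keys are illustrative names.
RECOMMENDED_MODEL = {
    "general_annotation": "scgpt",
    "cross_species": "uce",          # assumed identifier
    "perturbation": "scgpt",
    "grn_inference": "geneformer",
    "large_scale": "scfoundation",
    "tabular_prediction": "sctab",   # assumed identifier
}


def recommend_model(use_case):
    """Return the recommended model id, or raise with the list of known use cases."""
    try:
        return RECOMMENDED_MODEL[use_case]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED_MODEL))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {known}") from None


print(recommend_model("grn_inference"))  # geneformer
```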

Special Considerations

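  1. Gene Coverage: Each model is trained on its own gene vocabulary; check how many of your dataset's genes it covers before running inference.
  2. Species: Some models are trained on human data only; use UCE for cross-species tasks.
  3. Compute: Larger models need substantial GPU memory (see Prerequisites).
  4. Fine-Tuning: Task-specific fine-tuning improves performance.
  5. Versioning: Model weights are updated frequently; record the checkpoint version used for each run.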
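
The gene coverage check in item 1 reduces to a set comparison. The sketch below assumes both the dataset's gene names and the model vocabulary are available as lists of symbols in the same naming scheme; `gene_overlap` and the toy gene lists are illustrative.

```python
def gene_overlap(dataset_genes, model_vocab):
    """Summarize how much of a dataset's gene list a model vocabulary covers."""
    data, vocab = set(dataset_genes), set(model_vocab)
    shared = data & vocab
    return {
        "n_shared": len(shared),
        "fraction_of_dataset": len(shared) / len(data) if data else 0.0,
        "missing": sorted(data - vocab),
    }


# Toy gene lists for illustration.
summary = gene_overlap(
    dataset_genes=["TP53", "GAPDH", "CD3E", "NOVEL1"],
    model_vocab=["TP53", "GAPDH", "CD3E", "MKI67"],
)
print(summary)
# {'n_shared': 3, 'fraction_of_dataset': 0.75, 'missing': ['NOVEL1']}
```

If identifiers differ between the two lists (for example Ensembl IDs versus gene symbols), they need to be mapped to a common scheme first, or the overlap will be understated.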

Ensemble Strategies

| Strategy | Method | Benefit |
|---|---|---|
| Majority Vote | Mode of predictions | Robust to outliers |
| Weighted Average | Confidence-weighted | Leverages uncertainty |
| Stacking | Meta-model | Learns model strengths |
| Attention Fusion | Cross-model attention | Deep integration |
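
The first two strategies are simple enough to sketch for a single cell. The example below assumes each model returns a label and, for the weighted variant, a confidence score; summing confidences per label is one simple reading of "confidence-weighted", not necessarily the agent's implementation. Ties are broken alphabetically so results are deterministic.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the most frequent label; ties go to the alphabetically first label."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def weighted_vote(predictions):
    """predictions: (label, confidence) pairs, one per model; sum confidence per label."""
    scores = defaultdict(float)
    for label, confidence in predictions:
        scores[label] += confidence
    # Iterating over sorted labels makes max() pick the alphabetically first on ties.
    return max(sorted(scores), key=scores.get)


# Hypothetical per-model outputs for one cell.
preds = [("T cell", 0.40), ("NK cell", 0.95), ("T cell", 0.30)]

print(majority_vote([label for label, _ in preds]))  # T cell
print(weighted_vote(preds))                          # NK cell
```

The two strategies disagree here because one highly confident model outweighs two uncertain ones, which is the trade-off the table describes.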

Author

AI Group - Biomedical AI Platform

<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->