OpenClaw-Medical-Skills bulk-rna-seq-deconvolution-with-bulk2single
Turn bulk RNA-seq cohorts into synthetic single-cell datasets using omicverse's Bulk2Single workflow for cell fraction estimation, beta-VAE generation, and quality control comparisons against reference scRNA-seq.
install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bulk-to-single-deconvolution" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bulk-rna-seq-deconvolution-with-bulk && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bulk-to-single-deconvolution" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bulk-rna-seq-deconvolution-with-bulk && rm -rf "$T"
manifest: skills/bulk-to-single-deconvolution/SKILL.md
source content
Bulk RNA-seq deconvolution with Bulk2Single
Overview
Use this skill when a user wants to reconstruct single-cell profiles from bulk RNA-seq together with a matched reference scRNA-seq atlas. It follows `t_bulk2single.ipynb`, which demonstrates how to harmonise PDAC bulk replicates, train the beta-VAE generator, and benchmark the output cells against dentate gyrus scRNA-seq.
Instructions
- Load libraries and data
  - Import `omicverse as ov`, `scanpy as sc`, `scvelo as scv`, `anndata`, and `matplotlib.pyplot as plt`, then call `ov.plot_set()` to match omicverse styling.
  - Read the bulk counts table with `ov.read(...)`/`ov.utils.read(...)` and harmonise gene identifiers via `ov.bulk.Matrix_ID_mapping(<df>, 'genesets/pair_GRCm39.tsv')`.
  - Load the reference scRNA-seq AnnData (e.g., `scv.datasets.dentategyrus()`) and confirm the cluster labels (stored in `adata.obs['clusters']`).
- Initialise the Bulk2Single model
  - Instantiate `ov.bulk2single.Bulk2Single(bulk_data=bulk_df, single_data=adata, celltype_key='clusters', bulk_group=['dg_d_1', 'dg_d_2', 'dg_d_3'], top_marker_num=200, ratio_num=1, gpu=0)`.
  - Explain GPU selection (`gpu=-1` forces CPU) and how `bulk_group` names align with column IDs in the bulk matrix.
- Estimate cell fractions
  - Call `model.predicted_fraction()` to run the integrated TAPE estimator, then plot stacked bar charts per sample to validate proportions.
  - Encourage saving the fraction table for downstream reporting (`df.to_csv(...)`).
- Preprocess for beta-VAE
  - Execute `model.bulk_preprocess_lazy()`, `model.single_preprocess_lazy()`, and `model.prepare_input()` to produce matched feature spaces.
  - Clarify that the lazy preprocessing expects raw counts; if the user has already log-normalised the data, skip it and provide aligned matrices manually instead.
- Train or load the beta-VAE
  - Train with `model.train(batch_size=512, learning_rate=1e-4, hidden_size=256, epoch_num=3500, vae_save_dir='...', vae_save_name='dg_vae', generate_save_dir='...', generate_save_name='dg')`.
  - Mention early stopping via `patience` and how to resume by reloading weights with `model.load('.../dg_vae.pth')`.
  - Use `model.plot_loss()` to monitor convergence.
- Generate and filter synthetic cells
  - Produce an AnnData using `model.generate()` and reduce noise through `model.filtered(generate_adata, leiden_size=25)`.
  - Store the filtered AnnData (`.write_h5ad`) for reuse, noting it contains PCA embeddings in `.obsm['X_pca']`.
- Benchmark against the reference atlas
  - Plot cell-type compositions with `ov.bulk2single.bulk2single_plot_cellprop(...)` for both generated and reference data.
  - Assess correlation using `ov.bulk2single.bulk2single_plot_correlation(single_data, generate_adata, celltype_key='clusters')`.
  - Embed with `generate_adata.obsm['X_mde'] = ov.utils.mde(generate_adata.obsm['X_pca'])` and visualise via `ov.utils.embedding(..., color=['clusters'], palette=ov.utils.pyomic_palette())`.
- Troubleshooting tips
  - If marker selection fails, increase `top_marker_num` or provide a curated marker list.
  - Alignment errors typically stem from mismatched `bulk_group` names; double-check column IDs in the bulk matrix.
  - Training on CPU can take several hours; advise switching `gpu` to an available CUDA device for speed.
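The `bulk_group` alignment tip can be checked before the model is instantiated. The following is a minimal sketch using only the Python standard library. `check_bulk_group` is a hypothetical helper, not part of omicverse, and the column IDs are made up to stand in for `list(bulk_df.columns)`.

```python
from difflib import get_close_matches


def check_bulk_group(columns, bulk_group):
    """Map each bulk_group name missing from `columns` to similar column IDs.

    Hypothetical helper for illustration; an empty dict means every
    requested name was found among the bulk matrix columns.
    """
    columns = [str(c) for c in columns]
    present = set(columns)
    problems = {}
    for name in bulk_group:
        if name not in present:
            # Offer up to three near matches to help spot typos.
            problems[name] = get_close_matches(name, columns, n=3, cutoff=0.6)
    return problems


# Made-up column IDs; in practice pass list(bulk_df.columns).
columns = ["dg_d_1", "dg_d_2", "dg_d3", "other_1"]
problems = check_bulk_group(columns, ["dg_d_1", "dg_d_2", "dg_d_3"])

for missing, suggestions in problems.items():
    print(f"{missing!r} not in bulk matrix; similar columns: {suggestions}")
```

An empty result suggests the names line up. A non-empty one points at the sample IDs to fix before calling `Bulk2Single`.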
Examples
- "Estimate cell fractions for PDAC bulk replicates and generate synthetic scRNA-seq using Bulk2Single."
- "Load a pre-trained Bulk2Single model, regenerate cells, and compare cluster proportions to the dentate gyrus atlas."
- "Plot correlation heatmaps between generated cells and reference clusters after filtering noisy synthetic cells."
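For orientation, the first prompt might translate into a call sequence like the sketch below. It only stitches together the calls and parameter values named in the instructions. The wrapper `run_bulk2single`, its `save_dir` argument, and the output file names are hypothetical, and it assumes `predicted_fraction()` returns a pandas DataFrame and that `bulk_df` and `adata` hold raw counts. Check the call signatures against the installed omicverse version before relying on it.

```python
import os


def run_bulk2single(bulk_df, adata, save_dir):
    """Hypothetical wrapper chaining the steps listed in the instructions."""
    # Imported lazily so this definition loads even without omicverse installed.
    import omicverse as ov

    ov.plot_set()

    model = ov.bulk2single.Bulk2Single(
        bulk_data=bulk_df,
        single_data=adata,
        celltype_key="clusters",
        bulk_group=["dg_d_1", "dg_d_2", "dg_d_3"],
        top_marker_num=200,
        ratio_num=1,
        gpu=0,  # per the instructions, gpu=-1 forces CPU
    )

    # Estimate cell fractions and save the table (assumed to be a DataFrame).
    fractions = model.predicted_fraction()
    fractions.to_csv(os.path.join(save_dir, "fractions.csv"))  # hypothetical file name

    # Lazy preprocessing expects raw counts.
    model.bulk_preprocess_lazy()
    model.single_preprocess_lazy()
    model.prepare_input()

    # Train the beta-VAE with the settings quoted in the instructions.
    model.train(
        batch_size=512,
        learning_rate=1e-4,
        hidden_size=256,
        epoch_num=3500,
        vae_save_dir=save_dir,
        vae_save_name="dg_vae",
        generate_save_dir=save_dir,
        generate_save_name="dg",
    )
    model.plot_loss()

    # Generate synthetic cells, filter noisy clusters, and store the result.
    generate_adata = model.generate()
    generate_adata = model.filtered(generate_adata, leiden_size=25)
    generate_adata.write_h5ad(os.path.join(save_dir, "generated.h5ad"))  # hypothetical file name

    return fractions, generate_adata
```

Keeping the steps in one function makes it straightforward to swap in a different `bulk_group` or save directory per cohort. For the "load a pre-trained model" prompt, the training call would be replaced by the `model.load(...)` step described in the instructions.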
References
- Tutorial notebook: `t_bulk2single.ipynb`
- Example data and weights: `omicverse_guide/docs/Tutorials-bulk2single/data/`
- Quick copy/paste commands: `reference.md`