OpenClaw-Medical-Skills bulktrajblend-trajectory-interpolation

Extend scRNA-seq developmental trajectories with BulkTrajBlend by generating intermediate cells from bulk RNA-seq, training beta-VAE and GNN models, and interpolating missing states.

install

source · Clone the upstream repo

git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bulk-trajblend-interpolation" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bulktrajblend-trajectory-interpolati && rm -rf "$T"

OpenClaw · Install into ~/.openclaw/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bulk-trajblend-interpolation" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bulktrajblend-trajectory-interpolati && rm -rf "$T"

manifest: skills/bulk-trajblend-interpolation/SKILL.md

BulkTrajBlend trajectory interpolation

Overview

Invoke this skill when users need to bridge gaps in single-cell developmental trajectories using matched bulk RNA-seq. It follows `t_bulktrajblend.ipynb`, which shows how BulkTrajBlend deconvolves dentate gyrus bulk samples, identifies overlapping communities with a GNN, and interpolates "interrupted" cell states.

Instructions

Prepare libraries and inputs
- Import `omicverse as ov`, `scanpy as sc`, `scvelo as scv`, and helper functions like `from omicverse.utils import mde`; run `ov.plot_set()`.
- Load the reference scRNA-seq AnnData (`scv.datasets.dentategyrus()`) and raw bulk counts with `ov.utils.read(...)` followed by `ov.bulk.Matrix_ID_mapping(...)` for gene ID harmonisation.

Configure BulkTrajBlend

Instantiate `ov.bulk2single.BulkTrajBlend(bulk_seq=bulk_df, single_seq=adata, bulk_group=['dg_d_1','dg_d_2','dg_d_3'], celltype_key='clusters')`.

Explain that the `bulk_group` names correspond to columns of the raw bulk matrix and that the method expects unscaled counts.
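Both requirements can be checked before the object is built. The sketch below is a minimal, dependency-free illustration: the helper names `missing_bulk_columns` and `looks_like_raw_counts` are hypothetical (they are not part of omicverse), and the column list is a toy stand-in for `bulk_df.columns`.

```python
def missing_bulk_columns(columns, bulk_group):
    """Return the bulk_group names that are absent from the bulk matrix columns."""
    available = set(columns)
    return [name for name in bulk_group if name not in available]


def looks_like_raw_counts(values):
    """True when every value is a non-negative whole number, i.e. unscaled counts."""
    return all(v >= 0 and float(v).is_integer() for v in values)


# Toy stand-ins for bulk_df.columns and for the values of one bulk column.
columns = ["dg_d_1", "dg_d_2", "dg_d_3", "other_sample"]

print(missing_bulk_columns(columns, ["dg_d_1", "dg_d_2", "dg_d_3"]))  # []
print(missing_bulk_columns(columns, ["dg_d_1", "dg_d_4"]))            # ['dg_d_4']
print(looks_like_raw_counts([0, 12, 305]))                            # True
print(looks_like_raw_counts([0.0, 1.73, -0.4]))                       # False
```

With a real DataFrame the same helpers would be called as `missing_bulk_columns(bulk_df.columns, bulk_group)` and `looks_like_raw_counts(bulk_df['dg_d_1'])`.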

Set beta-VAE expectations
- Call `bulktb.vae_configure(cell_target_num=100)` (or pass a dictionary) to define expected cell counts per cluster. Mention that omitting the argument triggers TAPE-based estimation.
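When a dictionary is preferred, it can be assembled from a default plus a few overrides. This is only a sketch: `build_cell_targets` is a hypothetical helper, and keying the dictionary by cluster label is an assumption that should be checked against the notebook before use.

```python
def build_cell_targets(cluster_names, default=100, overrides=None):
    """Build a {cluster: target cell count} mapping from a default and overrides."""
    overrides = overrides or {}
    unknown = set(overrides) - set(cluster_names)
    if unknown:
        # Catch typos early instead of silently ignoring an override.
        raise ValueError(f"Unknown clusters in overrides: {sorted(unknown)}")
    return {name: int(overrides.get(name, default)) for name in cluster_names}


# Ask for more OPC cells than for the other clusters (values are illustrative).
targets = build_cell_targets(
    ["OPC", "Granule mature", "Astrocytes"],
    overrides={"OPC": 300},
)
print(targets)  # {'OPC': 300, 'Granule mature': 100, 'Astrocytes': 100}

# Assumed usage: bulktb.vae_configure(cell_target_num=targets)
```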

Train or load the beta-VAE

Use `bulktb.vae_train(batch_size=512, learning_rate=1e-4, hidden_size=256, epoch_num=3500, vae_save_dir='...', vae_save_name='dg_btb_vae', generate_save_dir='...', generate_save_name='dg_btb')`.

Highlight resuming with `bulktb.vae_load('.../dg_btb_vae.pth')` and the need to regenerate cells with consistent random seeds for reproducibility.
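The train-or-resume decision can be wrapped in a small helper. This is a sketch under stated assumptions: `seed_everything` and `train_or_load_vae` are hypothetical names, the checkpoint is assumed to be `<vae_save_dir>/<vae_save_name>.pth` (inferred from the path quoted above), and seeding these generators may not cover every source of randomness in the pipeline.

```python
import os
import random
import sys


def seed_everything(seed=0):
    """Seed Python's RNG, plus NumPy and PyTorch if they are already imported."""
    random.seed(seed)
    np = sys.modules.get("numpy")
    if np is not None:
        np.random.seed(seed)
    torch = sys.modules.get("torch")
    if torch is not None:
        torch.manual_seed(seed)


def train_or_load_vae(bulktb, save_dir, save_name="dg_btb_vae",
                      generate_dir=None, generate_name="dg_btb", seed=0):
    """Load the beta-VAE checkpoint when it exists, otherwise train from scratch."""
    seed_everything(seed)
    # Assumption: the checkpoint file is named <save_name>.pth inside save_dir.
    checkpoint = os.path.join(save_dir, save_name + ".pth")
    if os.path.exists(checkpoint):
        bulktb.vae_load(checkpoint)
        return "loaded"
    bulktb.vae_train(
        batch_size=512, learning_rate=1e-4, hidden_size=256, epoch_num=3500,
        vae_save_dir=save_dir, vae_save_name=save_name,
        # Assumption: reuse save_dir for generated cells when no directory is given.
        generate_save_dir=generate_dir or save_dir,
        generate_save_name=generate_name,
    )
    return "trained"
```

Calling `train_or_load_vae(bulktb, save_dir)` with the same `seed` before each `vae_generate` call keeps the seeded generators in a known state between runs.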

Generate synthetic cells
- Produce filtered AnnData via `bulktb.vae_generate(leiden_size=25)` and inspect compositions with `ov.bulk2single.bulk2single_plot_cellprop(...)`.
- Save outputs to disk for reuse (`adata.write_h5ad`).
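The two steps above can be chained as follows. `generate_and_save` is a hypothetical wrapper and `out_path` is a placeholder; only `vae_generate(leiden_size=...)` and `write_h5ad` come from the text.

```python
def generate_and_save(bulktb, out_path, leiden_size=25):
    """Generate filtered synthetic cells and persist them for later reuse."""
    generate_adata = bulktb.vae_generate(leiden_size=leiden_size)
    # Writing to disk lets later steps reuse exactly the same generated cells.
    generate_adata.write_h5ad(out_path)
    return generate_adata

# ov.bulk2single.bulk2single_plot_cellprop(...) can then be called on the
# returned AnnData to inspect the cell-type composition.
```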

Configure and train the GNN

Call `bulktb.gnn_configure(max_epochs=2000, use_rep='X', neighbor_rep='X_pca', gpu=0, ...)` to set hyperparameters.

Train using `bulktb.gnn_train()`; reload checkpoints with `bulktb.gnn_load('save_model/gnn.pth')`.

Generate overlapping community assignments through `bulktb.gnn_generate()`.
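A minimal sketch of the configure, train-or-load, generate sequence is shown below. `train_or_load_gnn` is a hypothetical helper; it assumes that training writes its checkpoint to the path quoted above, which should be confirmed against the notebook.

```python
import os


def train_or_load_gnn(bulktb, checkpoint="save_model/gnn.pth"):
    """Configure the GNN, reuse a checkpoint if present, and return assignments."""
    # Hyperparameters are the ones quoted in the text.
    bulktb.gnn_configure(max_epochs=2000, use_rep="X", neighbor_rep="X_pca", gpu=0)
    if os.path.exists(checkpoint):
        bulktb.gnn_load(checkpoint)
    else:
        bulktb.gnn_train()
    return bulktb.gnn_generate()
```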

Visualise community structure

Create MDE embeddings with `bulktb.nocd_obj.adata.obsm['X_mde'] = mde(bulktb.nocd_obj.adata.obsm['X_pca'])`.

Plot clusters against the discovered communities using `sc.pl.embedding(..., color=['clusters','nocd_n'], palette=ov.utils.pyomic_palette())`, then repeat the plot on a filtered subset that excludes synthetic labels containing hyphens.
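The hyphen filter is plain string matching, illustrated here on toy labels. The label strings are illustrative only and are not taken from real output; `has_no_hyphen` is a hypothetical helper.

```python
def has_no_hyphen(label):
    """True for labels without a hyphen, i.e. the ones kept by the filter."""
    return "-" not in str(label)


# Illustrative community labels; real values live in adata.obs['nocd_n'].
labels = ["nocd_OPC", "nocd_OPC-Astrocytes", "nocd_Neuroblast"]
keep = [has_no_hyphen(label) for label in labels]
print(keep)  # [True, False, True]

# Assumed equivalent on the AnnData object:
#   adata = bulktb.nocd_obj.adata
#   subset = adata[~adata.obs['nocd_n'].str.contains('-')]
# Note: a cluster name that itself contains a hyphen is dropped by this test too.
```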

Interpolate missing states
- Run `bulktb.interpolation('OPC')` (replace with target lineage) to synthesise continuity, then preprocess the interpolated AnnData (HVG selection, scaling, PCA).
- Compute embeddings with `mde`, visualise with `ov.utils.embedding`, and compare to the original atlas.
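The interpolation and preprocessing steps might look like the sketch below. `interpolate_and_embed` is a hypothetical wrapper, and the HVG thresholds, `max_value`, and `n_comps` are assumed values borrowed from common scanpy defaults, not settings confirmed by this skill. Imports are kept inside the function so the definition itself has no heavy dependencies.

```python
def interpolate_and_embed(bulktb, celltype="OPC", n_comps=50):
    """Interpolate one lineage, then run HVG selection, scaling, PCA and MDE."""
    import scanpy as sc
    from omicverse.utils import mde

    adata_interp = bulktb.interpolation(celltype)
    adata_interp.raw = adata_interp  # keep the full matrix before subsetting

    # Assumed thresholds; adjust to the dataset.
    sc.pp.highly_variable_genes(adata_interp, min_mean=0.0125, max_mean=3, min_disp=0.5)
    adata_interp = adata_interp[:, adata_interp.var.highly_variable].copy()

    sc.pp.scale(adata_interp, max_value=10)
    sc.tl.pca(adata_interp, n_comps=n_comps)

    # MDE embedding on the PCA coordinates, as for the community plots.
    adata_interp.obsm["X_mde"] = mde(adata_interp.obsm["X_pca"])
    return adata_interp
```

The returned object can then be passed to `ov.utils.embedding` alongside the original atlas for a side-by-side comparison.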

Analyse trajectories
- Initialise `ov.single.pyVIA` on both original and interpolated data to derive pseudotime, followed by `get_pseudotime`, `sc.pp.neighbors`, `ov.utils.cal_paga`, and `ov.utils.plot_paga` for topology validation.
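Once both PAGA graphs exist, one simple way to compare their topology is to diff the edge sets. This sketch is an illustration, not part of the notebook: `compare_edges` is a hypothetical helper, edges are assumed to have been extracted as pairs of cluster names, and the toy edges below are not real results.

```python
def compare_edges(original_edges, interpolated_edges):
    """Diff two undirected edge lists given as (cluster_a, cluster_b) pairs."""
    def normalise(edges):
        # frozenset makes ("a", "b") and ("b", "a") the same undirected edge.
        return {frozenset(edge) for edge in edges}

    def as_sorted_pairs(edge_set):
        return sorted(tuple(sorted(edge)) for edge in edge_set)

    a, b = normalise(original_edges), normalise(interpolated_edges)
    return {
        "shared": as_sorted_pairs(a & b),
        "gained": as_sorted_pairs(b - a),
        "lost": as_sorted_pairs(a - b),
    }


# Toy edge lists for illustration only.
original = [("nIPC", "Neuroblast"), ("Neuroblast", "Granule immature")]
interpolated = [("Neuroblast", "nIPC"), ("Neuroblast", "Granule immature"), ("OPC", "OL")]

result = compare_edges(original, interpolated)
print(result["gained"])  # [('OL', 'OPC')]
print(result["lost"])    # []
```

A non-empty `gained` list touching the interpolated lineage would be consistent with the interpolation having reconnected it, while entries in `lost` deserve a closer look.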

Troubleshooting tips
- If the VAE collapses (high reconstruction loss), lower `learning_rate` or reduce `hidden_size`.
- Ensure the same generated dataset is used before calling `gnn_train`; regenerating cells changes the graph and can break checkpoint loading.
- Sparse clusters may need adjusted `cell_target_num` thresholds or a smaller `leiden_size` filter to retain rare populations.

Examples

"Train BulkTrajBlend on PDAC cohorts, then interpolate missing OPC states in the trajectory."
"Load saved beta-VAE and GNN weights to regenerate overlapping communities and plot cluster vs. nocd labels."
"Run VIA on interpolated cells and compare PAGA graphs with the original scRNA-seq trajectory."

References

Tutorial notebook: `t_bulktrajblend.ipynb`

Example datasets and checkpoints: `omicverse_guide/docs/Tutorials-bulk2single/data/`

Quick copy/paste commands: `reference.md`