OpenClaw-Medical-Skills tcga-bulk-data-preprocessing-with-omicverse
Guide Claude through ingesting TCGA sample sheets, expression archives, and clinical carts into omicverse, initialising survival metadata, and exporting annotated AnnData files.
install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tcga-preprocessing" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-tcga-bulk-data-preprocessing-with-om && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/tcga-preprocessing" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-tcga-bulk-data-preprocessing-with-om && rm -rf "$T"
manifest: skills/tcga-preprocessing/SKILL.md
source content
TCGA bulk data preprocessing with omicverse
Overview
Follow this skill to recreate the preprocessing routine from `t_tcga.ipynb`. It automates loading TCGA downloads, generating raw/normalised matrices, initialising metadata, and running survival analyses through `ov.bulk.pyTCGA`.
Instructions
- Gather required downloads
  - Confirm the user has:
    - `gdc_sample_sheet.<date>.tsv` from the TCGA Sample Sheet export.
    - The decompressed `gdc_download_xxxxx` directory containing expression archives.
    - The `clinical.cart.<date>` directory with clinical XML/JSON files.
  - Mention that sample data are available under `omicverse_guide/docs/Tutorials-bulk/data/TCGA_OV/`.
- Initialise the TCGA helper
  - Import `omicverse as ov` (and `scanpy as sc` if plotting), then call `ov.plot_set()`.
  - Instantiate `aml_tcga = ov.bulk.pyTCGA(sample_sheet_path, download_dir, clinical_dir)`.
  - Run `aml_tcga.adata_init()` to build the AnnData object with raw counts, FPKM, and TPM layers.
- Persist the dataset
  - Encourage saving the initial AnnData: `aml_tcga.adata.write_h5ad('data/TCGA_OV/ov_tcga_raw.h5ad', compression='gzip')`.
  - When reloading, reconstruct the class with the same paths and call `aml_tcga.adata_read(<path>)`.
- Initialise metadata and clinical information
  - Populate sample metadata using `aml_tcga.adata_meta_init()` to convert gene IDs to symbols and attach patient info.
  - Add survival attributes via `aml_tcga.survial_init()` (note the intentional spelling in the API).
- Perform survival analyses
  - Plot gene-level survival curves with `aml_tcga.survival_analysis('GENE', layer='deseq_normalize', plot=True)`.
  - To process all genes, call `aml_tcga.survial_analysis_all()`; warn that it may take time.
- Export results
  - Save enriched metadata to a new AnnData file (`aml_tcga.adata.write_h5ad('.../ov_tcga_survial_all.h5ad', compression='gzip')`).
  - Suggest exporting summary tables (e.g., survival statistics) if users need to share outputs outside Python.
- Troubleshooting tips
  - Ensure TCGA archives are fully extracted; missing XML/TSV files trigger parsing errors.
  - The helper expects matching case IDs between the sample sheet and expression files; direct users to re-download if IDs do not align.
  - Survival plots require clinical dates; if absent, instruct users to check the `clinical_cart` contents.
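The download checks and troubleshooting tips above can be turned into a quick pre-flight test before the helper is constructed. The sketch below is only an illustration: `check_tcga_inputs` is a hypothetical helper (not part of omicverse), and it assumes that clinical files end in `.xml` or `.json` and that a leftover `.tar`/`.tar.gz` file at the top of the download directory signals an incomplete extraction.

```python
from pathlib import Path


def check_tcga_inputs(sample_sheet_path, download_dir, clinical_dir):
    """Return a list of problems found; an empty list means the inputs look usable.

    Hypothetical helper for illustration only; it is not part of omicverse.
    """
    problems = []

    # 1. The sample sheet must exist as a regular file.
    sample_sheet = Path(sample_sheet_path)
    if not sample_sheet.is_file():
        problems.append(f"sample sheet not found: {sample_sheet}")

    # 2. The expression download directory must exist and contain something.
    downloads = Path(download_dir)
    if not downloads.is_dir():
        problems.append(f"download directory not found: {downloads}")
    else:
        entries = list(downloads.iterdir())
        if not entries:
            problems.append(f"download directory is empty: {downloads}")
        # Assumption: a tarball left at the top level means extraction was skipped.
        archives = [
            p.name for p in entries
            if p.is_file() and p.name.endswith((".tar", ".tar.gz"))
        ]
        if archives:
            problems.append(f"unextracted archives in {downloads}: {archives}")

    # 3. The clinical directory must hold at least one XML or JSON file.
    clinical = Path(clinical_dir)
    if not clinical.is_dir():
        problems.append(f"clinical directory not found: {clinical}")
    else:
        # Assumption: clinical files may be nested in per-case subfolders.
        has_clinical = any(
            p.is_file() and p.suffix.lower() in {".xml", ".json"}
            for p in clinical.rglob("*")
        )
        if not has_clinical:
            problems.append(f"no clinical XML/JSON files under: {clinical}")

    return problems
```

Calling `check_tcga_inputs(sample_sheet_path, download_dir, clinical_dir)` and reporting any returned messages to the user first tends to give clearer feedback than a parsing error raised midway through loading.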
Examples
- "Read my TCGA OV download, initialise metadata, and plot MYC survival curves using DESeq-normalised counts."
- "Reload a saved AnnData file, attach survival annotations, and export the updated `.h5ad`."
- "Run survival analysis for all genes and store the enriched dataset."
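As a rough sketch, the first prompt could map onto the calls listed under Instructions as follows. The wrapper function `plot_gene_survival` and its `raw_out` parameter are hypothetical names introduced here for illustration; every `ov.` and `aml_tcga.` call is taken from the steps above and has not been checked against a particular omicverse release.

```python
def plot_gene_survival(sample_sheet_path, download_dir, clinical_dir,
                       gene="MYC", raw_out=None):
    """Load a TCGA download, attach metadata, and plot survival for one gene.

    Hypothetical wrapper around the calls named in the Instructions section.
    """
    # Imported lazily so the function can be defined without omicverse installed.
    import omicverse as ov

    ov.plot_set()

    # Build the helper and the initial AnnData object.
    aml_tcga = ov.bulk.pyTCGA(sample_sheet_path, download_dir, clinical_dir)
    aml_tcga.adata_init()

    # Optionally persist the freshly built AnnData before enriching it.
    if raw_out is not None:
        aml_tcga.adata.write_h5ad(raw_out, compression="gzip")

    # Attach gene symbols, patient info, and survival attributes.
    aml_tcga.adata_meta_init()
    aml_tcga.survial_init()  # spelling follows the API as documented above

    # Plot the survival curve for the requested gene.
    aml_tcga.survival_analysis(gene, layer="deseq_normalize", plot=True)
    return aml_tcga
```

For the bundled sample data, the three path arguments would point at the sample sheet, the decompressed download directory, and the clinical cart directory under `omicverse_guide/docs/Tutorials-bulk/data/TCGA_OV/`.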
References
- Tutorial notebook: `t_tcga.ipynb`
- Sample dataset: `data/TCGA_OV/`
- Quick copy/paste commands: `reference.md`