OpenClaw-Medical-Skills tcga-bulk-data-preprocessing-with-omicverse

Guide Claude through ingesting TCGA sample sheets, expression archives, and clinical carts into omicverse, initialising survival metadata, and exporting annotated AnnData files.

install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tcga-preprocessing" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-tcga-bulk-data-preprocessing-with-om && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/tcga-preprocessing" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-tcga-bulk-data-preprocessing-with-om && rm -rf "$T"
manifest: skills/tcga-preprocessing/SKILL.md
source content

TCGA bulk data preprocessing with omicverse

Overview

Follow this skill to recreate the preprocessing routine from t_tcga.ipynb. It automates loading TCGA downloads, generating raw/normalised matrices, initialising metadata, and running survival analyses through ov.bulk.pyTCGA.

Instructions

  1. Gather required downloads
    • Confirm the user has:
      • gdc_sample_sheet.<date>.tsv from the TCGA Sample Sheet export.
      • The decompressed gdc_download_xxxxx directory containing expression archives.
      • The clinical.cart.<date> directory with clinical XML/JSON files.
    • Mention that sample data are available under omicverse_guide/docs/Tutorials-bulk/data/TCGA_OV/.
  2. Initialise the TCGA helper
    • Import omicverse as ov (and scanpy as sc if plotting), then call ov.plot_set().
    • Instantiate aml_tcga = ov.bulk.pyTCGA(sample_sheet_path, download_dir, clinical_dir).
    • Run aml_tcga.adata_init() to build the AnnData object with raw counts, FPKM, and TPM layers.
  3. Persist the dataset
    • Encourage saving the initial AnnData: aml_tcga.adata.write_h5ad('data/TCGA_OV/ov_tcga_raw.h5ad', compression='gzip').
    • When reloading, reconstruct the class with the same three paths and call aml_tcga.adata_read(<path>).
  4. Initialise metadata and clinical information
    • Populate sample metadata using aml_tcga.adata_meta_init() to convert gene IDs to symbols and attach patient info.
    • Add survival attributes via aml_tcga.survial_init() (the method name is spelled "survial" in the API; this is intentional, not a typo in this guide).
  5. Perform survival analyses
    • Plot gene-level survival curves with aml_tcga.survival_analysis('GENE', layer='deseq_normalize', plot=True), replacing 'GENE' with a gene symbol such as 'MYC'.
    • To process all genes, call aml_tcga.survial_analysis_all(); warn that it may take a long time.
  6. Export results
    • Save enriched metadata to a new AnnData file (aml_tcga.adata.write_h5ad('.../ov_tcga_survial_all.h5ad', compression='gzip')).
    • Suggest exporting summary tables (e.g., survival statistics) if users need to share outputs outside Python.
  7. Troubleshooting tips
    • Ensure TCGA archives are fully extracted; missing XML/TSV files trigger parsing errors.
    • The helper expects matching case IDs between the sample sheet and expression files; direct users to re-download if the IDs do not align.
    • Survival plots require clinical dates; if they are absent, instruct users to check the contents of the clinical.cart.<date> directory.
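The troubleshooting checks above can be run before calling the helper. The following is a minimal sketch using only the standard library, not part of omicverse. The function name check_tcga_inputs is hypothetical. The 'File ID' and 'File Name' column names and the <download_dir>/<File ID>/<File Name> layout are assumptions about the GDC export format. Treating leftover .tar/.tar.gz/.zip files as a sign of incomplete extraction is a heuristic.

```python
import csv
from pathlib import Path

# Heuristic: archives left inside an extracted folder suggest incomplete extraction.
ARCHIVE_SUFFIXES = (".tar.gz", ".tar", ".zip")


def check_tcga_inputs(sample_sheet, download_dir, clinical_dir):
    """Return a list of problems found; an empty list means the inputs look usable."""
    problems = []
    sheet = Path(sample_sheet)
    download_dir = Path(download_dir)
    clinical_dir = Path(clinical_dir)

    # Both directories must exist, contain files, and hold no unextracted archives.
    for label, folder in (("expression download", download_dir),
                          ("clinical cart", clinical_dir)):
        if not folder.is_dir():
            problems.append(f"{label} directory not found: {folder}")
            continue
        files = [p for p in folder.rglob("*") if p.is_file()]
        if not files:
            problems.append(f"{label} directory is empty: {folder}")
        leftover = sorted(p.name for p in files if p.name.endswith(ARCHIVE_SUFFIXES))
        if leftover:
            problems.append(f"{label} directory still contains archives: {leftover}")

    if not sheet.is_file():
        problems.append(f"sample sheet not found: {sheet}")
        return problems

    # Assumed layout: each sample-sheet row points at <download_dir>/<File ID>/<File Name>.
    if download_dir.is_dir():
        with sheet.open(newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                file_id, file_name = row.get("File ID"), row.get("File Name")
                if not file_id or not file_name:
                    problems.append("sample sheet lacks 'File ID'/'File Name' columns")
                    break
                if not (download_dir / file_id / file_name).is_file():
                    problems.append(f"missing expression file: {file_id}/{file_name}")
    return problems
```

If the returned list is non-empty, report the problems to the user and suggest re-extracting or re-downloading before constructing ov.bulk.pyTCGA.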

Examples

  • "Read my TCGA OV download, initialise metadata, and plot MYC survival curves using DESeq-normalised counts."
  • "Reload a saved AnnData file, attach survival annotations, and export the updated .h5ad."
  • "Run survival analysis for all genes and store the enriched dataset."
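As an illustration, a request like the first example might translate into the sketch below. It chains only the calls named in the instructions (ov.plot_set, ov.bulk.pyTCGA, adata_init, adata_meta_init, survial_init, survival_analysis, survial_analysis_all, write_h5ad). The wrapper run_tcga_workflow and its parameters are hypothetical. The CSV export assumes that per-gene survival statistics end up in adata.var, which should be verified against the installed omicverse version.

```python
from pathlib import Path


def run_tcga_workflow(sample_sheet, download_dir, clinical_dir,
                      raw_path, final_path, gene="MYC", all_genes=False):
    """Sketch of steps 2-6; the wrapper and its parameters are illustrative only."""
    # Imported here so this module can be loaded without omicverse installed.
    import omicverse as ov

    ov.plot_set()

    # Step 2: build the helper and the initial AnnData object.
    tcga = ov.bulk.pyTCGA(str(sample_sheet), str(download_dir), str(clinical_dir))
    tcga.adata_init()

    # Step 3: persist the freshly built dataset.
    raw_path = Path(raw_path)
    raw_path.parent.mkdir(parents=True, exist_ok=True)
    tcga.adata.write_h5ad(raw_path, compression="gzip")

    # Step 4: gene symbols, patient info, then survival attributes.
    tcga.adata_meta_init()
    tcga.survial_init()  # "survial" is the spelling used by the API

    # Step 5: one gene always; every gene only on request, since it is slow.
    tcga.survival_analysis(gene, layer="deseq_normalize", plot=True)
    if all_genes:
        tcga.survial_analysis_all()

    # Step 6: save the enriched AnnData file.
    final_path = Path(final_path)
    final_path.parent.mkdir(parents=True, exist_ok=True)
    tcga.adata.write_h5ad(final_path, compression="gzip")

    # Assumption: per-gene statistics live in adata.var; export them as CSV
    # so they can be shared outside Python.
    tcga.adata.var.to_csv(final_path.with_suffix(".var.csv"))
    return tcga
```

A call such as run_tcga_workflow(sheet, downloads, clinical, 'data/TCGA_OV/ov_tcga_raw.h5ad', 'data/TCGA_OV/ov_tcga_survial_all.h5ad', all_genes=True) would then cover the third example as well. Keeping all_genes off by default avoids the long all-gene run unless the user asks for it.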
