LLMs-Universal-Life-Science-and-Clinical-Skills- splicing-quantification

<!--

install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Transcriptomics/alternative-splicing/splicing-quantification" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-splicing-quantific && rm -rf "$T"
manifest: Skills/Transcriptomics/alternative-splicing/splicing-quantification/SKILL.md
source content
<!-- # COPYRIGHT NOTICE # This file is part of the "Universal Biomedical Skills" project. # Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu> # All Rights Reserved. # # This code is proprietary and confidential. # Unauthorized copying of this file, via any medium is strictly prohibited. # # Provenance: Authenticated by MD BABU MIA -->

name: bio-splicing-quantification
description: Quantifies alternative splicing events (PSI/percent spliced in) from RNA-seq using SUPPA2 from transcript TPM or rMATS-turbo from BAM files. Calculates inclusion levels for skipped exons, alternative splice sites, mutually exclusive exons, and retained introns. Use when measuring splice site usage or isoform ratios from RNA-seq data.
tool_type: python
primary_tool: SUPPA2
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:

  • read_file
  • run_shell_command

Splicing Quantification

Quantify alternative splicing events as PSI (percent spliced in) values from RNA-seq data.

Event Types

Type | rMATS code | SUPPA2 code | Description
Skipped exon | SE | SE | Exon inclusion/exclusion
Alternative 5' splice site | A5SS | A5 | Alternative donor site
Alternative 3' splice site | A3SS | A3 | Alternative acceptor site
Mutually exclusive exons | MXE | MX | One of two exons included
Retained intron | RI | RI | Intron retention

Tool Selection

SUPPA2 (transcript TPM-based)

  • Input: Transcript TPM from Salmon/kallisto
  • Faster, requires transcript quantification
  • Better for isoform-level analysis
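
SUPPA2 derives each PSI from transcript abundances alone: the summed TPM of the transcripts that carry the event form of interest, divided by the summed TPM of all transcripts that define the event. The ratio can be sketched as follows; the transcript IDs, TPM values, and function name are hypothetical and not part of SUPPA2.

```python
import math

def psi_from_tpm(tpm, alternative, total, min_total_tpm=0.0):
    """PSI = sum(TPM of event-form transcripts) / sum(TPM of all event transcripts)."""
    alt_sum = sum(tpm.get(t, 0.0) for t in alternative)
    total_sum = sum(tpm.get(t, 0.0) for t in total)
    if total_sum <= min_total_tpm:
        return math.nan  # no expression: PSI is undefined
    return alt_sum / total_sum

# Hypothetical gene with three isoforms; tx1 and tx2 include the exon, tx3 skips it
tpm = {'tx1': 30.0, 'tx2': 10.0, 'tx3': 40.0}
psi = psi_from_tpm(tpm, alternative=['tx1', 'tx2'], total=['tx1', 'tx2', 'tx3'])
print(f'PSI = {psi:.2f}')  # PSI = 0.50
```

Because the estimate depends only on transcript TPM, its accuracy follows the quality of the annotation and of the upstream quantification.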

rMATS-turbo (BAM-based)

  • Input: Aligned BAM files
  • Junction read counting
  • Better for novel junction discovery
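
For the BAM-based route, PSI comes from read counts supporting the inclusion and skipping isoforms, each normalized by the effective length of its isoform (rMATS reports these lengths as IncFormLen and SkipFormLen). A minimal sketch of that calculation, with hypothetical counts and lengths and an illustrative function name:

```python
import math

def psi_from_junctions(inc_reads, skip_reads, inc_len, skip_len):
    """Length-normalized PSI from inclusion and skipping read counts."""
    inc_density = inc_reads / inc_len
    skip_density = skip_reads / skip_len
    denom = inc_density + skip_density
    return inc_density / denom if denom > 0 else math.nan

# Hypothetical skipped exon: the inclusion form spans two junctions and the
# skipping form one, so the inclusion form has twice the effective length here.
psi = psi_from_junctions(inc_reads=60, skip_reads=20, inc_len=298, skip_len=149)
print(round(psi, 3))  # 0.6
```

Without the length normalization the same counts would give 60 / 80 = 0.75, overstating inclusion.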

SUPPA2 Workflow

import subprocess
import pandas as pd

gtf_file = 'annotation.gtf'
tpm_file = 'transcript_tpm.tsv'
output_prefix = 'events'

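# Step 1: Generate splicing events from annotation
subprocess.run([
    'suppa.py', 'generateEvents',
    '-i', gtf_file,
    '-o', output_prefix,
    '-f', 'ioe',  # IOE format for PSI calculation
    # SS expands to A5 and A3 events; FL expands to AF and AL (alternative first/last exon)
    '-e', 'SE', 'SS', 'MX', 'RI', 'FL'
], check=True)

# Step 2: Calculate PSI values for the event types covered by this skill
# (AF and AL files are also generated above; add them to the list if needed)
# tpm_file: transcript IDs in the first column, header row listing sample names only
for event_type in ['SE', 'A5', 'A3', 'MX', 'RI']:
    ioe_file = f'{output_prefix}_{event_type}_strict.ioe'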
    subprocess.run([
        'suppa.py', 'psiPerEvent',
        '-i', ioe_file,
        '-e', tpm_file,
        '-o', f'psi_{event_type}'
    ], check=True)

# Load and examine PSI values
psi_se = pd.read_csv('psi_SE.psi', sep='\t', index_col=0)
print(f'Quantified {len(psi_se)} skipped exon events')
print(psi_se.head())
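
The workflow above assumes that transcript_tpm.tsv already exists. One way to build it from Salmon output is sketched below. It relies on the standard quant.sf columns (Name, TPM); the helper names and sample paths are hypothetical. SUPPA2 expects a header that lists sample names only, which is why the index label is suppressed on write.

```python
import pandas as pd

def build_tpm_table(quant_files):
    """Combine Salmon quant.sf files into a transcripts x samples TPM table.

    quant_files maps a sample name to the path of its quant.sf file.
    """
    columns = {}
    for sample, path in quant_files.items():
        quant = pd.read_csv(path, sep='\t', index_col='Name')
        columns[sample] = quant['TPM']
    return pd.DataFrame(columns)

def write_suppa_expression(tpm, path):
    """Write the table with no header cell above the transcript ID column."""
    tpm.to_csv(path, sep='\t', index_label=False)

# Example usage (hypothetical paths):
# tpm = build_tpm_table({'ctrl_1': 'salmon/ctrl_1/quant.sf',
#                        'ctrl_2': 'salmon/ctrl_2/quant.sf'})
# write_suppa_expression(tpm, 'transcript_tpm.tsv')
```

Transcript IDs in the table must match the transcript_id values in the GTF passed to generateEvents, so any version suffixes should be handled consistently in both.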

rMATS-turbo Workflow

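# rMATS-turbo for BAM-based quantification
# --b1/--b2 are text files, each holding a comma-separated list of BAM paths
# Set --readLength to the actual read length of the library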
rmats.py \
    --b1 condition1_bams.txt \
    --b2 condition2_bams.txt \
    --gtf annotation.gtf \
    -t paired \
    --readLength 150 \
    --nthread 8 \
    --od output_dir \
    --tmp tmp_dir \
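    --statoff  # Use for quantification only, no differential testing

The per-event tables can then be loaded in Python:

import pandas as pd

# Load rMATS output (junction-count based estimates)
se_jc = pd.read_csv('output_dir/SE.MATS.JC.txt', sep='\t')

def split_values(series):
    """Expand a comma-separated rMATS column into one numeric column per replicate."""
    wide = series.astype(str).str.split(',', expand=True)
    return wide.apply(pd.to_numeric, errors='coerce')  # 'NA' becomes NaN

# Calculate average PSI across samples
# IncLevel1/IncLevel2 hold comma-separated per-replicate PSI values;
# IncLevelDifference is a difference, not a PSI, so it is excluded
inc_cols = [c for c in ('IncLevel1', 'IncLevel2') if c in se_jc.columns]
psi = pd.concat([split_values(se_jc[c]) for c in inc_cols], axis=1)
se_jc['mean_PSI'] = psi.mean(axis=1)

# Filter for reliable events (sufficient junction reads)
# Minimum 10-20 junction reads recommended for reliable PSI
# IJC/SJC columns are also comma-separated; require the threshold in every sample 1 replicate
total = split_values(se_jc['IJC_SAMPLE_1']) + split_values(se_jc['SJC_SAMPLE_1'])
se_jc['min_junction_reads'] = total.min(axis=1)
reliable_events = se_jc[se_jc['min_junction_reads'] >= 20]
print(f'{len(reliable_events)} events with sufficient coverage')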

Quality Thresholds

Metric | Threshold | Rationale
Junction reads | >= 10-20 | Minimum for reliable PSI estimation
PSI range | 0.1-0.9 | Events outside this range are nearly constitutive
Missing values | < 50% samples | High missingness indicates low expression
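
The PSI-range and missing-value thresholds can be applied to an events x samples PSI matrix as in the sketch below. Applying the range check to the mean PSI across samples is an assumption made here for illustration, and the function name and example values are hypothetical.

```python
import numpy as np
import pandas as pd

def filter_psi(psi, max_missing=0.5, low=0.1, high=0.9):
    """Keep events with enough observed samples and a mean PSI inside [low, high]."""
    missing_frac = psi.isna().mean(axis=1)
    mean_psi = psi.mean(axis=1)  # NaN values are skipped
    keep = (missing_frac < max_missing) & mean_psi.between(low, high)
    return psi[keep]

# Hypothetical PSI matrix: events x samples
psi = pd.DataFrame(
    {'s1': [0.50, 0.98, np.nan, 0.30],
     's2': [0.60, 0.99, np.nan, np.nan],
     's3': [0.40, 0.97, 0.50, 0.35],
     's4': [0.55, 1.00, np.nan, 0.25]},
    index=['ev1', 'ev2', 'ev3', 'ev4'],
)
print(filter_psi(psi).index.tolist())  # ['ev1', 'ev4']
```

Here ev2 is dropped as nearly constitutive and ev3 because it is missing in three of four samples.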

Output Interpretation

PSI values range from 0 to 1:

  • PSI = 1.0: Event fully included (e.g., exon always present)
  • PSI = 0.5: Equal inclusion/exclusion
  • PSI = 0.0: Event fully excluded (e.g., exon always skipped)

Related Skills

  • differential-splicing - Compare PSI between conditions
  • rna-quantification/alignment-free-quant - Generate transcript TPM for SUPPA2
  • read-alignment/star-alignment - Align reads with junction detection
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->