install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Transcriptomics/alternative-splicing/splicing-quantification" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-splicing-quantific && rm -rf "$T"
manifest: Skills/Transcriptomics/alternative-splicing/splicing-quantification/SKILL.md
source content
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-splicing-quantification
description: Quantifies alternative splicing events (PSI/percent spliced in) from RNA-seq using SUPPA2 from transcript TPM or rMATS-turbo from BAM files. Calculates inclusion levels for skipped exons, alternative splice sites, mutually exclusive exons, and retained introns. Use when measuring splice site usage or isoform ratios from RNA-seq data.
tool_type: python
primary_tool: SUPPA2
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command
Splicing Quantification
Quantify alternative splicing events as PSI (percent spliced in) values from RNA-seq data.
Event Types
| Type | Code | Description |
|---|---|---|
| Skipped exon | SE | Exon inclusion/exclusion |
| Alternative 5' splice site | A5SS | Alternative donor site |
| Alternative 3' splice site | A3SS | Alternative acceptor site |
| Mutually exclusive exons | MXE | One of two exons included |
| Retained intron | RI | Intron retention |
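The codes in the table follow rMATS naming. SUPPA2 uses shorter codes for the same events in its `.ioe` and `.psi` file names (`A5`, `A3`, `MX`). The lookup below is a minimal sketch for translating between the two; the names `RMATS_TO_SUPPA`, `suppa_code`, and `rmats_output_file` are hypothetical helpers, not part of either tool.

```python
# rMATS-style event codes (as in the table above) mapped to the codes
# SUPPA2 uses in its event and PSI file names.
RMATS_TO_SUPPA = {
    "SE": "SE",
    "A5SS": "A5",
    "A3SS": "A3",
    "MXE": "MX",
    "RI": "RI",
}


def suppa_code(rmats_code: str) -> str:
    """Return the SUPPA2 event code for an rMATS-style code."""
    try:
        return RMATS_TO_SUPPA[rmats_code.upper()]
    except KeyError:
        raise ValueError(f"Unknown event type: {rmats_code!r}") from None


def rmats_output_file(rmats_code: str, counts: str = "JC") -> str:
    """Build an rMATS result file name, e.g. SE.MATS.JC.txt.

    counts is "JC" (junction reads only) or "JCEC" (junction + exon reads).
    """
    if counts not in ("JC", "JCEC"):
        raise ValueError("counts must be 'JC' or 'JCEC'")
    if rmats_code.upper() not in RMATS_TO_SUPPA:
        raise ValueError(f"Unknown event type: {rmats_code!r}")
    return f"{rmats_code.upper()}.MATS.{counts}.txt"


print(suppa_code("A5SS"))
print(rmats_output_file("MXE"))
```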
Tool Selection
SUPPA2 (transcript TPM-based)
- Input: Transcript TPM from Salmon/kallisto
- Faster, requires transcript quantification
- Better for isoform-level analysis
rMATS-turbo (BAM-based)
- Input: Aligned BAM files
- Junction read counting
- Better for novel junction discovery
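To make the TPM-based approach concrete: for each event, a transcript-level PSI is the summed abundance of the transcripts carrying the inclusion form divided by the summed abundance of all transcripts that define the event. The sketch below illustrates that ratio on toy numbers. The function name `psi_from_tpm` and the transcript IDs are hypothetical, and returning `None` for unexpressed events is an assumption of this sketch rather than SUPPA2's exact behavior.

```python
def psi_from_tpm(tpm, inclusion_ids, total_ids):
    """Transcript-based PSI for one event.

    tpm:           dict mapping transcript ID -> TPM in one sample
    inclusion_ids: transcripts that contain the inclusion form of the event
    total_ids:     all transcripts that define the event (both forms)
    Returns None when none of the defining transcripts is expressed.
    """
    included = sum(tpm.get(tx, 0.0) for tx in inclusion_ids)
    total = sum(tpm.get(tx, 0.0) for tx in total_ids)
    if total <= 0:
        return None
    return included / total


# Toy example: tx1 includes the exon, tx2 skips it, tx3 is unrelated.
tpm = {"tx1": 30.0, "tx2": 10.0, "tx3": 5.0}
print(psi_from_tpm(tpm, ["tx1"], ["tx1", "tx2"]))  # 0.75
```

Because the estimate depends entirely on the annotated transcripts, junctions absent from the GTF cannot contribute, which is why the BAM-based route suits novel junction discovery better.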
SUPPA2 Workflow
```python
import subprocess

import pandas as pd

gtf_file = 'annotation.gtf'
tpm_file = 'transcript_tpm.tsv'
output_prefix = 'events'

# Step 1: Generate splicing events from annotation
subprocess.run([
    'suppa.py', 'generateEvents',
    '-i', gtf_file,
    '-o', output_prefix,
    '-f', 'ioe',  # IOE format for PSI calculation
    # SS yields A5 and A3 events; FL yields AF and AL (first/last exon) events
    '-e', 'SE', 'SS', 'MX', 'RI', 'FL'
], check=True)

# Step 2: Calculate PSI values for the five event types listed under Event Types
for event_type in ['SE', 'A5', 'A3', 'MX', 'RI']:
    ioe_file = f'{output_prefix}_{event_type}_strict.ioe'
    subprocess.run([
        'suppa.py', 'psiPerEvent',
        '-i', ioe_file,
        '-e', tpm_file,
        '-o', f'psi_{event_type}'  # written as psi_<event_type>.psi
    ], check=True)

# Load and examine PSI values (rows are events, columns are samples)
psi_se = pd.read_csv('psi_SE.psi', sep='\t', index_col=0)
print(f'Quantified {len(psi_se)} skipped exon events')
print(psi_se.head())
```
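The workflow assumes `transcript_tpm.tsv` already exists. As far as the SUPPA2 documentation describes it, the expression file is tab-separated with transcript IDs in the first column and a header row that lists only the sample names (one field fewer than the data rows). The sketch below builds such a file from Salmon `quant.sf` outputs using only the standard library. The function names and the sample-to-path mapping are hypothetical, and the transcript IDs in `quant.sf` are assumed to match those in the GTF exactly.

```python
import csv


def read_salmon_tpm(quant_sf):
    """Return {transcript_id: TPM} from one Salmon quant.sf file."""
    with open(quant_sf, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["Name"]: float(row["TPM"]) for row in reader}


def write_suppa_expression(samples, out_path):
    """Merge per-sample TPMs into one table for psiPerEvent.

    samples:  dict mapping sample name -> path to that sample's quant.sf
    out_path: destination TSV; the header holds sample names only
    """
    names = list(samples)
    tables = [read_salmon_tpm(samples[name]) for name in names]
    transcripts = sorted(set().union(*tables))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(names)  # no label for the transcript ID column
        for tx in transcripts:
            # Transcripts missing from a sample are written as 0.0 TPM
            writer.writerow([tx] + [table.get(tx, 0.0) for table in tables])
    return out_path
```

A call such as `write_suppa_expression({'ctrl_1': 'ctrl_1/quant.sf', 'ctrl_2': 'ctrl_2/quant.sf'}, 'transcript_tpm.tsv')` would produce the input expected by the `psiPerEvent` step; the paths here are placeholders.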
rMATS-turbo Workflow
```bash
# rMATS-turbo for BAM-based quantification
rmats.py \
    --b1 condition1_bams.txt \
    --b2 condition2_bams.txt \
    --gtf annotation.gtf \
    -t paired \
    --readLength 150 \
    --nthread 8 \
    --od output_dir \
    --tmp tmp_dir \
    --statoff  # Use for quantification only, no differential testing
```
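The `--b1` and `--b2` arguments each take a text file that lists the BAM paths for one condition, separated by commas on a single line. A minimal sketch for writing those files follows; `write_bam_list` is a hypothetical helper and the BAM paths in the usage note are placeholders.

```python
from pathlib import Path


def write_bam_list(bam_paths, out_path):
    """Write BAM paths as one comma-separated line for rmats.py --b1/--b2."""
    paths = [str(p) for p in bam_paths]
    if not paths:
        raise ValueError("At least one BAM file is required")
    if any("," in p for p in paths):
        # A comma inside a path would be read as a separator
        raise ValueError("BAM paths must not contain commas")
    Path(out_path).write_text(",".join(paths) + "\n")
    return out_path
```

For example, `write_bam_list(['ctrl_rep1.bam', 'ctrl_rep2.bam'], 'condition1_bams.txt')` writes the single line `ctrl_rep1.bam,ctrl_rep2.bam`.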
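```python
import pandas as pd

# Load rMATS output (junction-count table for skipped exons)
se_jc = pd.read_csv('output_dir/SE.MATS.JC.txt', sep='\t')

def split_values(column):
    """Split rMATS comma-separated per-replicate values into numeric columns."""
    return column.astype(str).str.split(',', expand=True).apply(pd.to_numeric, errors='coerce')

# Calculate average PSI across samples
# IncLevel1/IncLevel2 hold comma-separated PSI values per replicate ('NA' becomes NaN);
# IncLevelDifference is excluded because it is not a PSI value
inc_cols = [c for c in ('IncLevel1', 'IncLevel2') if c in se_jc.columns]
psi = pd.concat([split_values(se_jc[c]) for c in inc_cols], axis=1)
se_jc['mean_PSI'] = psi.mean(axis=1)

# Filter for reliable events (sufficient junction reads)
# Minimum 10-20 junction reads recommended for reliable PSI
# Counts are summed over the replicates of sample group 1
ijc = split_values(se_jc['IJC_SAMPLE_1'])
sjc = split_values(se_jc['SJC_SAMPLE_1'])
se_jc['total_junction_reads'] = ijc.sum(axis=1) + sjc.sum(axis=1)
reliable_events = se_jc[se_jc['total_junction_reads'] >= 20]
print(f'{len(reliable_events)} events with sufficient coverage')
```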
Quality Thresholds
| Metric | Threshold | Rationale |
|---|---|---|
| Junction reads | >= 10-20 | Minimum for reliable PSI estimation |
| PSI range | 0.1-0.9 | Events outside this range are nearly constitutive |
| Missing values | < 50% samples | High missingness indicates low expression |
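The three thresholds in the table can be combined into a single per-event check. The sketch below is one possible reading, not a prescribed procedure. It assumes the read threshold applies to every sample that has a PSI value, and that the PSI range is tested on the mean across samples. `passes_quality` and its parameter names are hypothetical.

```python
import math


def is_missing(value):
    """True for None or NaN, the usual encodings of an unquantified PSI."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def passes_quality(psi, reads, min_reads=10, psi_range=(0.1, 0.9), max_missing=0.5):
    """Apply the table's thresholds to one event.

    psi:   per-sample PSI values (None or NaN when missing)
    reads: per-sample total junction reads, in the same sample order
    """
    if not psi:
        return False
    observed = [(p, r) for p, r in zip(psi, reads) if not is_missing(p)]
    missing_fraction = 1 - len(observed) / len(psi)
    if missing_fraction >= max_missing:
        return False  # too many samples without a PSI estimate
    if any(r < min_reads for _, r in observed):
        return False  # at least one PSI rests on too few junction reads
    mean_psi = sum(p for p, _ in observed) / len(observed)
    return psi_range[0] <= mean_psi <= psi_range[1]


print(passes_quality([0.4, 0.5, None], [25, 30, 0]))  # True
print(passes_quality([0.95, 0.97], [50, 60]))         # False
```

Raising `min_reads` to 20 gives the stricter end of the recommended range.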
Output Interpretation
PSI values range from 0 to 1:
- PSI = 1.0: Event fully included (e.g., exon always present)
- PSI = 0.5: Equal inclusion/exclusion
- PSI = 0.0: Event fully excluded (e.g., exon always skipped)
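For junction-based quantification, these values come from the ratio of inclusion-supporting reads to all reads supporting either form. rMATS additionally divides each count by an effective length (reported as `IncFormLen` and `SkipFormLen`), since the inclusion form of a skipped exon offers more positions for a read to land on. The sketch below shows that calculation; `psi_from_counts` is a hypothetical helper, and the default lengths of 1 reduce it to a plain read ratio.

```python
def psi_from_counts(inclusion_reads, skipping_reads, inclusion_len=1, skipping_len=1):
    """Length-normalized PSI from junction read counts.

    Returns None when no reads support either form.
    """
    inc = inclusion_reads / inclusion_len
    skip = skipping_reads / skipping_len
    if inc + skip == 0:
        return None
    return inc / (inc + skip)


print(psi_from_counts(40, 0))   # 1.0: exon always present
print(psi_from_counts(0, 40))   # 0.0: exon always skipped
# Inclusion form twice as long: 40 inclusion reads balance 20 skipping reads
print(psi_from_counts(40, 20, inclusion_len=2, skipping_len=1))  # 0.5
```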
Related Skills
- differential-splicing - Compare PSI between conditions
- rna-quantification/alignment-free-quant - Generate transcript TPM for SUPPA2
- read-alignment/star-alignment - Align reads with junction detection