LLMs-Universal-Life-Science-and-Clinical-Skills- differential-splicing

<!--

install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Transcriptomics/alternative-splicing/differential-splicing" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-differential-splic && rm -rf "$T"
manifest: Skills/Transcriptomics/alternative-splicing/differential-splicing/SKILL.md
<!-- # COPYRIGHT NOTICE # This file is part of the "Universal Biomedical Skills" project. # Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu> # All Rights Reserved. # # This code is proprietary and confidential. # Unauthorized copying of this file, via any medium is strictly prohibited. # # Provenance: Authenticated by MD BABU MIA -->

name: bio-differential-splicing
description: Detects differential alternative splicing between conditions using rMATS-turbo (BAM-based) or SUPPA2 diffSplice (TPM-based). Reports events with FDR-corrected significance and delta PSI effect sizes. Use when comparing splicing patterns between treatment groups, tissues, or disease states.
tool_type: mixed
primary_tool: rMATS-turbo
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
  - read_file
  - run_shell_command

Differential Splicing

Detect differential alternative splicing events between experimental conditions.

Tool Comparison

| Tool | Input | Approach | Strengths |
|---|---|---|---|
| rMATS-turbo | BAM | Junction counting | Novel junctions, statistical model |
| SUPPA2 | TPM | Transcript ratios | Speed, isoform-aware |
| leafcutter | BAM | Intron clustering | Novel events, no annotation bias |

rMATS-turbo Analysis

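# Create one sample list per condition: a single line of comma-separated BAM paths
# condition1_bams.txt: /path/to/sample1.bam,/path/to/sample2.bam
# condition2_bams.txt: /path/to/sample3.bam,/path/to/sample4.bam
# --readLength must match the sequenced read length (add --variable-read-length for trimmed reads)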

rmats.py \
    --b1 condition1_bams.txt \
    --b2 condition2_bams.txt \
    --gtf annotation.gtf \
    -t paired \
    --readLength 150 \
    --nthread 8 \
    --od rmats_output \
    --tmp rmats_tmp
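Because `--b1` and `--b2` each point to a file holding one line of comma-separated BAM paths, the lists are easy to generate. A minimal sketch, assuming you already know which BAMs belong to each condition; `write_rmats_bam_list` is a hypothetical helper and the paths are placeholders.

```python
from pathlib import Path

def write_rmats_bam_list(bams, out_path):
    """Write BAM paths as a single comma-separated line (the --b1/--b2 file layout)."""
    Path(out_path).write_text(','.join(str(b) for b in bams) + '\n')

# Placeholder paths; replace with the aligned BAMs for each condition
write_rmats_bam_list(['/path/to/sample1.bam', '/path/to/sample2.bam'], 'condition1_bams.txt')
write_rmats_bam_list(['/path/to/sample3.bam', '/path/to/sample4.bam'], 'condition2_bams.txt')
```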
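
import pandas as pd

# Load rMATS results for skipped exons
# (*.MATS.JC.txt uses junction reads only; *.MATS.JCEC.txt also counts exon-body reads)
se = pd.read_csv('rmats_output/SE.MATS.JC.txt', sep='\t')

# Filter significant differential splicing events
# |deltaPSI| > 0.1 (lenient) or > 0.2 (stringent); FDR < 0.05
# IncLevelDifference = mean PSI of the --b1 samples minus mean PSI of the --b2 samples
significant = se[
    (se['FDR'] < 0.05) &
    (se['IncLevelDifference'].abs() > 0.1)
].copy()

def min_reads(col):
    """Smallest replicate count in an rMATS count column (one comma-separated value per replicate)."""
    return col.astype(str).str.split(',').apply(lambda x: min(int(v) for v in x))

# Additional filtering by junction read support: in every replicate of both conditions,
# require at least 10 inclusion (IJC) or 10 skipping (SJC) junction reads
significant = significant[
    ((min_reads(significant['IJC_SAMPLE_1']) >= 10) | (min_reads(significant['SJC_SAMPLE_1']) >= 10)) &
    ((min_reads(significant['IJC_SAMPLE_2']) >= 10) | (min_reads(significant['SJC_SAMPLE_2']) >= 10))
].copy()

print(f'{len(significant)} significant SE events')
print(significant[['GeneID', 'geneSymbol', 'IncLevelDifference', 'FDR']].head(10))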

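The IncLevelDifference values filtered above come from per-replicate inclusion levels. A sketch, assuming the length-normalized junction-count definition PSI = (IJC/IncFormLen) / (IJC/IncFormLen + SJC/SkipFormLen), where IncFormLen and SkipFormLen are columns of the same output table; `inc_level` and `inc_level_difference` are hypothetical helpers, and rMATS rounds its own values, so the last decimals may differ.

```python
import numpy as np

def inc_level(ijc, sjc, inc_len, skip_len):
    """Per-replicate PSI from junction counts, normalized by the effective lengths
    of the inclusion (IncFormLen) and skipping (SkipFormLen) forms."""
    inc = np.asarray(ijc, dtype=float) / inc_len
    skip = np.asarray(sjc, dtype=float) / skip_len
    total = inc + skip
    # Replicates with no junction reads get NaN instead of a 0/0 division
    return np.where(total > 0, inc / np.where(total > 0, total, 1.0), np.nan)

def inc_level_difference(ijc1, sjc1, ijc2, sjc2, inc_len, skip_len):
    """Mean PSI of the --b1 replicates minus mean PSI of the --b2 replicates (NaN skipped)."""
    psi1 = inc_level(ijc1, sjc1, inc_len, skip_len)
    psi2 = inc_level(ijc2, sjc2, inc_len, skip_len)
    return float(np.nanmean(psi1) - np.nanmean(psi2))

# Made-up counts and lengths for one event, two replicates per condition
print(inc_level_difference([20, 30], [10, 10], [4, 6], [18, 17], inc_len=2, skip_len=1))
```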
SUPPA2 Differential Analysis

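import subprocess

import pandas as pd

# Inputs prepared beforehand:
#   events_SE_strict.ioe: events from suppa.py generateEvents
#   condition1.psi / condition2.psi: per-condition PSI tables from suppa.py psiPerEvent
#   condition1.tpm / condition2.tpm: per-condition transcript TPM tables (same samples as the PSI tables)

# Run differential splicing
subprocess.run([
    'suppa.py', 'diffSplice',
    '-m', 'empirical',  # Empirical p-value calculation
    '-i', 'events_SE_strict.ioe',
    '-p', 'condition1.psi', 'condition2.psi',
    '-e', 'condition1.tpm', 'condition2.tpm',
    '-o', 'diff_SE'
], check=True)

# Load results; columns are named '<cond1>-<cond2>_dPSI' and '<cond1>-<cond2>_p-val'
diff = pd.read_csv('diff_SE.dpsi', sep='\t', index_col=0)
dpsi_col = next(c for c in diff.columns if c.endswith('_dPSI'))
pval_col = next(c for c in diff.columns if c.endswith('_p-val'))

# SUPPA2 reports nominal p-values (corrected per gene only when -gc is passed),
# so this cut-off is not equivalent to the rMATS FDR filter
significant = diff[
    (diff[pval_col] < 0.05) &
    (diff[dpsi_col].abs() > 0.1)
]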

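SUPPA2 works from transcript abundances rather than junction reads: an event's PSI is the summed TPM of the transcripts carrying the alternative form divided by the summed TPM of all transcripts that define the event. A minimal sketch of that ratio for one event, assuming a transcripts-by-samples TPM DataFrame and the two transcript ID lists the .ioe file gives for the event; `event_psi` is a hypothetical helper, not part of SUPPA2.

```python
import pandas as pd

def event_psi(tpm, alternative, total):
    """PSI of one event in every sample.

    tpm: DataFrame of transcript TPMs (rows = transcript IDs, columns = samples).
    alternative: transcript IDs carrying the alternative form of the event.
    total: all transcript IDs that define the event (a superset of `alternative`).
    """
    num = tpm.loc[list(alternative)].sum(axis=0)
    den = tpm.loc[list(total)].sum(axis=0)
    # Samples where none of the event's transcripts are expressed get NaN
    return num / den.where(den > 0)

# Made-up TPMs for a three-transcript gene in two samples
tpm = pd.DataFrame({'s1': [10.0, 30.0, 5.0], 's2': [0.0, 0.0, 7.0]},
                   index=['tx1', 'tx2', 'tx3'])
print(event_psi(tpm, ['tx1'], ['tx1', 'tx2']))
```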
leafcutter Analysis

library(leafcutter)

# Convert BAMs to junction files with regtools
# (-s 0 means unstranded in older regtools; check `regtools junctions extract` help,
#  since newer releases take XS, RF or FR instead of 0/1/2)
system('for bam in *.bam; do
    regtools junctions extract -a 8 -m 50 -s 0 $bam -o ${bam%.bam}.junc
done')

# Create junction file list
writeLines(list.files(pattern = '\\.junc$'), 'juncfiles.txt')

# Cluster introns (writes leafcutter_perind_numers.counts.gz)
system('python leafcutter_cluster_regtools.py -j juncfiles.txt -o leafcutter')

# Groups file for leafcutter_ds.R: tab-separated sample and group columns with no header line;
# sample names must match the column names of the counts file
groups <- data.frame(
    sample = c('sample1', 'sample2', 'sample3', 'sample4'),
    group = c('control', 'control', 'treatment', 'treatment')
)
write.table(groups, 'groups.txt', sep = '\t', quote = FALSE, row.names = FALSE, col.names = FALSE)

# Differential intron usage
system('leafcutter_ds.R --num_threads 4 leafcutter_perind_numers.counts.gz groups.txt')

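leafcutter compares intron usage within clusters instead of PSI for predefined events. A descriptive sketch of the per-sample usage ratio (each intron's reads over its cluster's total), assuming a DataFrame of intron read counts whose index uses a 'chrom:start:end:cluster' ID layout; that layout is an assumption to check against your counts file, and `intron_usage_ratios` is a hypothetical helper. It does not reproduce the statistical model that leafcutter_ds.R fits.

```python
import pandas as pd

def intron_usage_ratios(counts):
    """Fraction of each cluster's junction reads that use each intron, per sample.

    counts: DataFrame of intron read counts (rows = intron IDs, columns = samples).
    Assumes IDs of the form 'chrom:start:end:cluster' (hypothetical layout; check yours).
    """
    cluster = counts.index.to_series().str.split(':').str[3]
    totals = counts.groupby(cluster).transform('sum')
    # Clusters with no reads in a sample give NaN rather than 0/0
    return counts / totals.where(totals > 0)

# Made-up counts: two clusters with two introns each
counts = pd.DataFrame(
    {'s1': [3, 1, 0, 0], 's2': [2, 2, 5, 0]},
    index=['chr1:100:200:clu_1', 'chr1:100:300:clu_1',
           'chr1:500:600:clu_2', 'chr1:500:700:clu_2'])
print(intron_usage_ratios(counts))
```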
Significance Thresholds

| Stringency | abs(deltaPSI) | FDR | Use case |
|---|---|---|---|
| Lenient | > 0.1 | < 0.05 | Discovery, exploratory |
| Standard | > 0.15 | < 0.05 | Publication |
| Stringent | > 0.2 | < 0.01 | High-confidence set |

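The tiers in the thresholds table can be applied programmatically, labeling each event with the strictest tier it passes. A minimal sketch, assuming deltaPSI and FDR arrays such as the rMATS IncLevelDifference and FDR columns; `TIERS` and `assign_tier` are hypothetical names.

```python
import numpy as np

# (tier, minimum |deltaPSI|, maximum FDR), strictest first, as in the thresholds table
TIERS = [('stringent', 0.2, 0.01), ('standard', 0.15, 0.05), ('lenient', 0.1, 0.05)]

def assign_tier(delta_psi, fdr):
    """Return the strictest tier each event passes, or None if it passes none."""
    dpsi = np.abs(np.asarray(delta_psi, dtype=float))
    fdr = np.asarray(fdr, dtype=float)
    labels = np.full(dpsi.shape, None, dtype=object)
    # Apply the loosest tier first so stricter tiers overwrite it; NaN never passes
    for name, min_dpsi, max_fdr in reversed(TIERS):
        labels[(dpsi > min_dpsi) & (fdr < max_fdr)] = name
    return labels
```

For example, `se['tier'] = assign_tier(se['IncLevelDifference'], se['FDR'])` would add the label to the rMATS table.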
Result Prioritization

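import numpy as np

# Prioritize the filtered rMATS events (`significant` from the rMATS step) by effect size and significance;
# FDR is floored so events reported with FDR = 0 get a finite score
significant = significant.copy()
significant['score'] = -np.log10(significant['FDR'].clip(lower=1e-300)) * significant['IncLevelDifference'].abs()
top_events = significant.nlargest(50, 'score')

# Annotate with gene function
# Consider protein domain disruption, NMD sensitivity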

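One quick proxy for NMD sensitivity is reading-frame phase: skipping an exon whose length is not a multiple of 3 shifts the downstream frame when the exon is coding, which can introduce a premature stop codon. A sketch, assuming rMATS SE coordinates (exonStart_0base is 0-based and exonEnd is the end coordinate, so length = exonEnd - exonStart_0base); `flag_frameshift` is a hypothetical helper, and it does not check whether the exon overlaps the CDS, which needs the annotation.

```python
import pandas as pd

def flag_frameshift(se):
    """Add the skipped-exon length and whether skipping it would shift the reading frame.
    The flag is only meaningful for exons inside the coding sequence (not checked here)."""
    out = se.copy()
    out['exon_len'] = out['exonEnd'] - out['exonStart_0base']
    out['frameshift_if_coding'] = out['exon_len'] % 3 != 0
    return out

# Made-up coordinates for two skipped exons
events = pd.DataFrame({'exonStart_0base': [99, 199], 'exonEnd': [249, 300]})
print(flag_frameshift(events))
```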
Related Skills

  • splicing-quantification - Calculate PSI values first
  • isoform-switching - Functional consequence analysis
  • sashimi-plots - Visualize significant events
  • read-alignment/star-alignment - STAR 2-pass alignment required
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->