LLMs-Universal-Life-Science-and-Clinical-Skills- structural-variants

<!--

<!-- # COPYRIGHT NOTICE # This file is part of the "Universal Biomedical Skills" project. # Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu> # All Rights Reserved. # # This code is proprietary and confidential. # Unauthorized copying of this file, via any medium is strictly prohibited. # # Provenance: Authenticated by MD BABU MIA -->

name: bio-longread-structural-variants
description: Detect structural variants from long-read alignments using Sniffles, cuteSV, and SVIM. Use when detecting deletions, insertions, inversions, translocations, or complex rearrangements from ONT or PacBio data, especially those missed by short-read methods.
tool_type: cli
primary_tool: sniffles
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:

  • read_file
  • run_shell_command

Structural Variant Detection

Sniffles2 - Basic SV Calling

# Call SVs from aligned BAM
sniffles --input aligned.bam \
    --vcf structural_variants.vcf \
    --reference reference.fa \
    --threads 4
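
Sniffles expects a coordinate-sorted, indexed BAM, so a quick pre-flight check can save a failed run. The following is a minimal sketch; `has_bam_index` is a hypothetical helper name, and the index locations it probes (`.bam.bai`, `.bai`, `.bam.csi`) are assumptions about how the index was written.

```bash
#!/bin/sh
# Sketch: confirm that a BAM file has an index next to it before calling SVs.
# `has_bam_index` is an illustrative name, not part of Sniffles or samtools.
has_bam_index() {
    bam=$1
    [ -f "$bam" ] || { echo "BAM not found: $bam" >&2; return 1; }
    # Probe the usual index locations (assumed naming conventions).
    for idx in "$bam.bai" "${bam%.bam}.bai" "$bam.csi"; do
        [ -f "$idx" ] && return 0
    done
    echo "no index for $bam; try: samtools index $bam" >&2
    return 1
}

# Example use (not executed here):
# has_bam_index aligned.bam && sniffles --input aligned.bam --vcf structural_variants.vcf
```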

Sniffles2 - Common Options

# --minsupport     minimum supporting reads
# --minsvlen       minimum SV length (bp)
# --mapq           minimum mapping quality
# --output-rnames  include supporting read names in the VCF
# --mosaic         detect mosaic SVs
sniffles --input aligned.bam \
    --vcf structural_variants.vcf \
    --reference reference.fa \
    --threads 8 \
    --minsupport 3 \
    --minsvlen 50 \
    --mapq 20 \
    --output-rnames \
    --mosaic

Sniffles2 - Population Calling

# Step 1: Call SVs per sample with SNF output
sniffles --input sample1.bam --snf sample1.snf --reference reference.fa
sniffles --input sample2.bam --snf sample2.snf --reference reference.fa

# Step 2: Merge and genotype
sniffles --input sample1.snf sample2.snf \
    --vcf population_svs.vcf \
    --reference reference.fa
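
With more than a couple of samples, the two steps above are usually scripted. A minimal sketch follows, assuming BAM paths without spaces are passed as arguments. The `run` helper, the `DRY_RUN` variable, and `population_call` are hypothetical names; by default the script only prints the commands it would run.

```bash
#!/bin/sh
# Sketch: per-sample SNF generation followed by a joint merge.
# Assumptions (not from the original): `run`, DRY_RUN and `population_call`
# are illustrative names; set DRY_RUN=0 to actually execute sniffles.

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: population_call REFERENCE OUT_VCF BAM...
population_call() {
    ref=$1; out_vcf=$2; shift 2
    snfs=""
    for bam in "$@"; do
        snf="${bam%.bam}.snf"
        # Step 1: one SNF file per sample
        run sniffles --input "$bam" --snf "$snf" --reference "$ref" || return 1
        snfs="$snfs $snf"
    done
    # Step 2: merge and genotype across all SNF files.
    # $snfs is left unquoted on purpose so each path becomes its own argument.
    run sniffles --input $snfs --vcf "$out_vcf" --reference "$ref"
}

population_call reference.fa population_svs.vcf sample1.bam sample2.bam
```

Reviewing the printed commands before setting `DRY_RUN=0` is a cheap way to catch a wrong path.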

cuteSV - Alternative Caller

# cuteSV SV calling (work_dir/ should exist before the run)
mkdir -p work_dir
cuteSV aligned.bam reference.fa output.vcf work_dir/ \
    --threads 8 \
    --min_support 3 \
    --min_size 50 \
    --genotype

cuteSV - ONT Optimized

# Settings optimized for ONT
cuteSV aligned.bam reference.fa output.vcf work_dir/ \
    --threads 8 \
    --max_cluster_bias_INS 100 \
    --diff_ratio_merging_INS 0.3 \
    --max_cluster_bias_DEL 100 \
    --diff_ratio_merging_DEL 0.3 \
    --genotype

cuteSV - PacBio HiFi Optimized

# Settings optimized for HiFi
cuteSV aligned.bam reference.fa output.vcf work_dir/ \
    --threads 8 \
    --max_cluster_bias_INS 1000 \
    --diff_ratio_merging_INS 0.9 \
    --max_cluster_bias_DEL 1000 \
    --diff_ratio_merging_DEL 0.5 \
    --genotype
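
The ONT and HiFi examples differ only in four clustering flags, so a small lookup keeps them in one place. This is a sketch; `cutesv_preset` is a hypothetical helper name, and the values are the ones listed in the two examples above.

```bash
#!/bin/sh
# Sketch: choose cuteSV clustering flags by sequencing platform.
# `cutesv_preset` is an illustrative name; only "ont" and "hifi" are covered
# because those are the two presets shown in this document.
cutesv_preset() {
    case "$1" in
        ont)
            echo "--max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.3 --max_cluster_bias_DEL 100 --diff_ratio_merging_DEL 0.3" ;;
        hifi)
            echo "--max_cluster_bias_INS 1000 --diff_ratio_merging_INS 0.9 --max_cluster_bias_DEL 1000 --diff_ratio_merging_DEL 0.5" ;;
        *)
            echo "unknown platform: $1 (expected ont or hifi)" >&2
            return 1 ;;
    esac
}

# Print the full command for review instead of running it.
platform=ont
echo "cuteSV aligned.bam reference.fa output.vcf work_dir/ --threads 8 $(cutesv_preset "$platform") --genotype"
```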

SVIM - Another Alternative

# SVIM for ONT data
svim alignment output_dir/ aligned.bam reference.fa \
    --insertion_sequences \
    --read_names \
    --sample sample_name

pbsv - PacBio Specific

# Discover signatures
pbsv discover aligned.bam signatures.svsig.gz

# Call SVs
pbsv call reference.fa signatures.svsig.gz structural_variants.vcf

Filter SV Calls

# Filter by quality and size
bcftools filter -i 'QUAL>=20 && ABS(SVLEN)>=50' svs.vcf > svs.filtered.vcf

# Keep only PASS
bcftools view -f PASS svs.vcf > svs.pass.vcf

# Filter specific SV types
bcftools view -i 'SVTYPE="DEL"' svs.vcf > deletions.vcf
bcftools view -i 'SVTYPE="INS"' svs.vcf > insertions.vcf
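
A size test on SVLEN can only match records that carry that tag, and breakend (BND) records often do not, so they may drop out of a size-filtered file. The sketch below shows one way to keep them, using plain awk on an uncompressed VCF. `filter_sv_size` is a hypothetical helper, and passing tagless records through unchanged is an assumption about the desired behavior.

```bash
#!/bin/sh
# Sketch: size filter that passes records without SVLEN through unchanged.
# `filter_sv_size` is an illustrative name; it reads an uncompressed VCF on
# stdin and takes the minimum absolute SV length as its first argument.
filter_sv_size() {
    awk -v min="${1:-50}" -F'\t' '
        /^#/ { print; next }                  # keep header lines
        {
            len = ""
            n = split($8, kv, ";")            # INFO is column 8
            for (i = 1; i <= n; i++)
                if (kv[i] ~ /^SVLEN=/) len = substr(kv[i], 7)
            if (len == "") { print; next }    # no SVLEN (e.g. BND): keep
            len += 0                          # force numeric
            if (len < 0) len = -len           # deletions may be negative
            if (len >= min) print
        }'
}

# Tiny inline VCF for demonstration: a 120 bp DEL, a 30 bp INS, and a BND.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t100\tsv1\tN\t<DEL>\t60\tPASS\tSVTYPE=DEL;SVLEN=-120\nchr1\t500\tsv2\tN\t<INS>\t60\tPASS\tSVTYPE=INS;SVLEN=30\nchr2\t900\tsv3\tN\tN]chr5:100]\t60\tPASS\tSVTYPE=BND\n' |
    filter_sv_size 50
```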

Merge Multiple Callers

# Use SURVIVOR to merge SV callsets
SURVIVOR merge sample_files.txt 1000 2 1 1 0 50 merged_svs.vcf

# sample_files.txt contains VCF paths, one per line
# Parameters: max_distance, min_callers, type_agree, strand_agree, est_distance, min_size
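
The list file is easy to get wrong when a caller failed silently, so it can help to build it with a check. A minimal sketch, where `write_vcf_list` is a hypothetical helper and the VCF names in the usage comment are placeholders:

```bash
#!/bin/sh
# Sketch: write the SURVIVOR input list, failing early on a missing or empty VCF.
# `write_vcf_list` is an illustrative name, not part of SURVIVOR.
# Usage: write_vcf_list LIST_FILE VCF...
write_vcf_list() {
    list=$1; shift
    : > "$list"                       # create or truncate the list file
    for vcf in "$@"; do
        if [ ! -s "$vcf" ]; then
            echo "missing or empty VCF: $vcf" >&2
            return 1
        fi
        printf '%s\n' "$vcf" >> "$list"
    done
}

# Example (file names are placeholders, not executed here):
# write_vcf_list sample_files.txt sniffles.vcf cutesv.vcf svim.vcf &&
#     SURVIVOR merge sample_files.txt 1000 2 1 1 0 50 merged_svs.vcf
```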

Annotate SVs

# Annotate with AnnotSV
AnnotSV -SVinputFile svs.vcf \
    -genomeBuild GRCh38 \
    -outputFile annotated_svs

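# Or with bcftools (the annotation VCF must be bgzipped and indexed).
# Records are matched by position and allele, so calls whose breakpoints
# differ slightly from the annotation set will be left unannotated.
bcftools annotate -a gnomad_sv.vcf.gz -c INFO svs.vcf > svs.annotated.vcf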

SV Types

| Type | Code | Description |
|------|------|-------------|
| Deletion | DEL | Sequence removed |
| Insertion | INS | Sequence added |
| Inversion | INV | Sequence inverted |
| Duplication | DUP | Sequence duplicated |
| Translocation | BND | Breakend (complex) |
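
A per-type tally is a quick sanity check on a callset. The sketch below counts the `SVTYPE` codes from the table using awk on an uncompressed VCF. `count_sv_types` is a hypothetical helper name, and records without an `SVTYPE` tag are grouped under `UNKNOWN` as an assumption.

```bash
#!/bin/sh
# Sketch: count records per SVTYPE in an uncompressed VCF read from stdin.
# `count_sv_types` is an illustrative name; output is "TYPE<TAB>COUNT", sorted.
count_sv_types() {
    awk -F'\t' '
        /^#/ { next }                         # skip header lines
        {
            type = "UNKNOWN"                  # assumed label when SVTYPE is absent
            n = split($8, kv, ";")            # INFO is column 8
            for (i = 1; i <= n; i++)
                if (kv[i] ~ /^SVTYPE=/) type = substr(kv[i], 8)
            count[type]++
        }
        END { for (t in count) print t "\t" count[t] }' | sort
}

# Example (not executed here):
# count_sv_types < structural_variants.vcf
```

With bcftools available, `bcftools query -f '%INFO/SVTYPE\n' svs.vcf | sort | uniq -c` gives a similar tally.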

Key Parameters - Sniffles2

| Parameter | Default | Description |
|-----------|---------|-------------|
| --minsupport | auto | Min supporting reads |
| --minsvlen | 50 | Min SV length |
| --mapq | 20 | Min mapping quality |
| --reference | none | Reference (for INS sequences) |
| --tandem-repeats | none | BED of tandem repeats |
| --mosaic | off | Detect mosaic SVs |

Key Parameters - cuteSV

| Parameter | Default | Description |
|-----------|---------|-------------|
| --min_support | 10 | Min supporting reads |
| --min_size | 30 | Min SV length |
| --max_size | 100000 | Max SV length |
| --genotype | off | Output genotypes |
| --report_readid | off | Report read IDs |

Coverage Guidelines

| Coverage | SV Detection |
|----------|--------------|
| 5-10x | Large SVs (>1kb) |
| 10-20x | Most SVs |
| 20-30x | High confidence |
| >30x | Mosaic/rare SVs |
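
The guideline table can be turned into a small lookup for use in a pipeline log. This is a sketch: `coverage_tier` is a hypothetical helper, and because the table's ranges share endpoints, the boundary handling is an assumption (10x maps to "Most SVs", 20x and 30x to "High confidence").

```bash
#!/bin/sh
# Sketch: map a mean coverage value to the guideline tier from the table.
# `coverage_tier` is an illustrative name. Boundary handling is assumed:
# each shared endpoint is assigned to the higher tier, except 30x, which
# stays in "High confidence" because the top tier is strictly >30x.
coverage_tier() {
    cov=${1%%.*}                      # integer part, so "31.5" is treated as 31
    if   [ "$cov" -lt 5 ];  then echo "Below guideline range"
    elif [ "$cov" -lt 10 ]; then echo "Large SVs (>1kb)"
    elif [ "$cov" -lt 20 ]; then echo "Most SVs"
    elif [ "$cov" -le 30 ]; then echo "High confidence"
    else                         echo "Mosaic/rare SVs"
    fi
}

# Example: mean depth could come from samtools (not executed here):
# cov=$(samtools depth -a aligned.bam | awk '{s += $3} END {print s / NR}')
coverage_tier 15
```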

Related Skills

  • long-read-alignment - Generate input BAM
  • medaka-polishing - Polish assembly with SVs
  • variant-calling/structural-variant-calling - Short-read SV comparison
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->