OpenClaw-Medical-Skills tooluniverse-structural-variant-analysis
Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tooluniverse-structural-variant-analysis" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-tooluniverse-structural-variant-anal && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/tooluniverse-structural-variant-analysis" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-tooluniverse-structural-variant-anal && rm -rf "$T"
skills/tooluniverse-structural-variant-analysis/SKILL.md
Structural Variant Analysis Workflow
Systematic analysis of structural variants (deletions, duplications, inversions, translocations, complex rearrangements) for clinical genomics interpretation using ACMG-adapted criteria.
KEY PRINCIPLES:
- Report-first approach - Create SV_analysis_report.md FIRST, then populate progressively
- ACMG-style classification - Pathogenic/Likely Pathogenic/VUS/Likely Benign/Benign with explicit evidence
- Evidence grading - Grade all findings by confidence level (★★★/★★☆/★☆☆)
- Dosage sensitivity critical - Gene dosage effects drive SV pathogenicity
- Breakpoint precision matters - Exact gene disruption vs dosage-only effects
- Population context essential - gnomAD SVs for frequency assessment
- English-first queries - Always use English terms in tool calls (gene names, disease names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language
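The report-first principle can be sketched as follows. This is a minimal illustration, not the skill's own implementation: the section titles mirror the seven workflow phases, and the helper names `create_report_skeleton` and `fill_section` and the `_Pending_` placeholder are assumptions introduced here.

```python
from pathlib import Path

# Section titles follow the seven workflow phases (wording is an assumption).
REPORT_SECTIONS = [
    "1. SV Identity & Classification",
    "2. Gene Content Analysis",
    "3. Dosage Sensitivity Assessment",
    "4. Population Frequency Context",
    "5. Pathogenicity Scoring",
    "6. Literature & Clinical Evidence",
    "7. ACMG-Adapted Classification",
]
PLACEHOLDER = "_Pending_"  # hypothetical marker for sections not yet filled


def create_report_skeleton(path="SV_analysis_report.md", sv_label="(SV not yet specified)"):
    """Write an empty report with one placeholder per phase and return its path."""
    lines = [f"# Structural Variant Analysis Report: {sv_label}", ""]
    for title in REPORT_SECTIONS:
        lines += [f"## {title}", "", PLACEHOLDER, ""]
    report = Path(path)
    report.write_text("\n".join(lines), encoding="utf-8")
    return report


def fill_section(path, title, body):
    """Replace the placeholder under one section heading with real content."""
    report = Path(path)
    text = report.read_text(encoding="utf-8")
    marker = f"## {title}\n\n{PLACEHOLDER}"
    if marker not in text:
        raise ValueError(f"Section not found or already filled: {title}")
    report.write_text(text.replace(marker, f"## {title}\n\n{body}", 1), encoding="utf-8")
```

Calling `create_report_skeleton()` once at the start and `fill_section(...)` as each phase completes keeps a partial but readable report on disk even if a later phase fails.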
Problem This Skill Solves
Structural variants (SVs) present unique interpretation challenges:
- Complex molecular consequences - SVs can cause gene dosage changes, gene disruption, gene fusions, position effects
- Size matters - Pathogenicity depends on size, gene content, and breakpoint precision
- Limited databases - Fewer curated SVs in ClinVar compared to SNVs
- Dosage sensitivity - Haploinsufficiency and triplosensitivity are critical but gene-specific
- Population frequency - Large benign CNVs are common; distinguishing pathogenic from benign is challenging
This skill provides: A systematic workflow integrating SV classification, gene content analysis, dosage sensitivity assessment, population frequencies, and ACMG-adapted criteria into clinically actionable interpretations.
Triggers
Use this skill when users:
- Ask about structural variant interpretation
- Have CNV data from array or sequencing
- Ask "is this deletion/duplication pathogenic?"
- Need ACMG classification for SVs
- Want to assess gene dosage effects
- Ask about chromosomal rearrangements
- Have large-scale genomic alterations requiring interpretation
Workflow Overview

    STRUCTURAL VARIANT INTERPRETATION

    Phase 1: SV IDENTITY & CLASSIFICATION
    ├── Normalize SV coordinates (hg19/hg38)
    ├── Determine SV type (DEL/DUP/INV/TRA/CPX)
    ├── Calculate SV size
    └── Assess breakpoint precision

    Phase 2: GENE CONTENT ANALYSIS
    ├── Identify genes fully contained in SV
    ├── Identify genes with breakpoints (disrupted)
    ├── Annotate gene function and disease associations
    ├── Identify regulatory elements affected
    └── Assess gene orientation (for inversions/translocations)

    Phase 3: DOSAGE SENSITIVITY ASSESSMENT
    ├── ClinGen dosage sensitivity scores
    │   └─ Haploinsufficiency / Triplosensitivity ratings
    ├── DECIPHER haploinsufficiency predictions
    ├── pLI scores (gnomAD) for loss-of-function intolerance
    ├── OMIM gene-disease associations (dominant/recessive)
    └── Known dosage-sensitive genes from literature

    Phase 4: POPULATION FREQUENCY CONTEXT
    ├── gnomAD SV database (overlapping SVs)
    ├── DGV (Database of Genomic Variants)
    ├── ClinVar (known pathogenic/benign SVs)
    └── Calculate reciprocal overlap with population SVs

    Phase 5: PATHOGENICITY SCORING
    ├── Pathogenicity score (0-10 scale)
    │   ├─ Gene content weight (40%)
    │   ├─ Dosage sensitivity weight (30%)
    │   ├─ Population frequency weight (20%)
    │   └─ Inheritance/phenotype match weight (10%)
    ├── Apply ACMG SV criteria
    └── Generate classification recommendation

    Phase 6: LITERATURE & CLINICAL EVIDENCE
    ├── PubMed: Similar SVs, gene disruption studies
    ├── DECIPHER: Developmental disorder cases
    ├── Clinical case reports
    └── Functional evidence for gene dosage effects

    Phase 7: ACMG-ADAPTED CLASSIFICATION
    ├── Apply SV-specific evidence codes
    ├── Calculate final classification
    ├── Identify limiting factors
    └── Generate clinical recommendations

Phase Details
Phase 1: SV Identity & Classification
Goal: Standardize SV notation and classify type
SV Types:
| Type | Abbreviation | Description | Molecular Effect |
|---|---|---|---|
| Deletion | DEL | Loss of genomic segment | Haploinsufficiency, gene disruption |
| Duplication | DUP | Gain of genomic segment | Triplosensitivity, gene dosage imbalance |
| Inversion | INV | Segment flipped in orientation | Gene disruption at breakpoints, position effects |
| Translocation | TRA | Segment moved to different chromosome | Gene fusions, disruption, position effects |
| Complex | CPX | Multiple rearrangement types | Variable effects |
Key Information to Capture:
- Chromosome(s) involved
- Coordinates (start, end) in hg19/hg38
- SV size (bp or Mb)
- SV type (DEL/DUP/INV/TRA/CPX)
- Breakpoint precision (±50bp, ±1kb, etc.)
- Inheritance pattern (de novo, inherited, unknown)
Example:

    SV: arr[GRCh38] 17q21.31(44039927-44352659)x1
    - Type: Deletion (heterozygous)
    - Size: 313 kb
    - Genes: MAPT, KANSL1 (fully contained)
    - Breakpoints: Well-defined (array resolution ±5kb)

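The fields listed under "Key Information to Capture" can be pulled out of an ISCN-style array string such as the one in the example. The sketch below is a rough illustration only: `SVRecord` and `parse_iscn_array` are hypothetical names, the regular expression covers just the simple single-segment form shown above, and the copy-number-to-type mapping assumes an autosomal, diploid baseline.

```python
import re
from dataclasses import dataclass

# Matches e.g. "arr[GRCh38] 17q21.31(44039927-44352659)x1"; "_" is also accepted
# as the coordinate separator. Complex or multi-segment notations are not handled.
ISCN_ARRAY = re.compile(
    r"arr\[(?P<build>[^\]]+)\]\s*"
    r"(?P<chrom>[0-9]{1,2}|[XY])(?P<band>[pq][0-9.]+)"
    r"\((?P<start>\d+)[-_](?P<end>\d+)\)x(?P<cn>\d+)"
)


@dataclass
class SVRecord:
    build: str
    chrom: str
    band: str
    start: int
    end: int
    copy_number: int

    @property
    def size_bp(self) -> int:
        # Coordinates are treated as inclusive on both ends.
        return self.end - self.start + 1

    @property
    def sv_type(self) -> str:
        # Assumes a diploid autosome: fewer than 2 copies is a loss, more is a gain.
        if self.copy_number < 2:
            return "DEL"
        if self.copy_number > 2:
            return "DUP"
        return "NORMAL"


def parse_iscn_array(text: str) -> SVRecord:
    match = ISCN_ARRAY.search(text)
    if match is None:
        raise ValueError(f"Unrecognized array notation: {text!r}")
    return SVRecord(
        build=match["build"],
        chrom=match["chrom"],
        band=match["band"],
        start=int(match["start"]),
        end=int(match["end"]),
        copy_number=int(match["cn"]),
    )


record = parse_iscn_array("arr[GRCh38] 17q21.31(44039927-44352659)x1")
print(record.sv_type, f"{record.size_bp / 1000:.0f} kb")
```

Inheritance and breakpoint precision are not encoded in the string and still have to be recorded separately.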
Phase 2: Gene Content Analysis
Goal: Comprehensive annotation of genes affected by SV
Tools:
| Tool | Purpose | Key Data |
|---|---|---|
| Ensembl | Gene structure, coordinates | Gene boundaries, exons, transcripts |
| `NCBI_gene_search` | Gene information | Official symbol, aliases, description |
| Gene Ontology | Gene function | Biological process, molecular function |
| `OMIM_search`, `OMIM_get_entry` | Disease associations | Inheritance, clinical features |
| `DisGeNET_search_gene` | Gene-disease associations | Evidence scores |
Gene Categories:
- Fully contained genes - Entire gene within SV boundaries
  - Deletion: Complete loss of one copy (haploinsufficiency)
  - Duplication: Extra copy (triplosensitivity)
- Partially disrupted genes - Breakpoint within gene
  - Likely loss-of-function for affected allele
  - Check if critical domains disrupted
- Flanking genes - Within 1 Mb of breakpoints
  - May be affected by position effects
  - Regulatory disruption possible
Example Gene Content Analysis:

    def analyze_gene_content(tu, chrom, sv_start, sv_end, sv_type, genes_in_region):
        """
        Identify and annotate all genes within SV region.

        genes_in_region: list of dicts with 'symbol', 'start', 'end' for genes on
        `chrom` near the SV. The caller must fetch it (e.g. from Ensembl); the
        lookup itself depends on which tools are available.
        """
        genes = {
            'fully_contained': [],
            'partially_disrupted': [],
            'flanking': []
        }

        for gene in genes_in_region:
            gene_start = gene['start']
            gene_end = gene['end']

            # Classify gene relationship to SV
            if gene_start >= sv_start and gene_end <= sv_end:
                # Fully contained
                category = 'fully_contained'
            elif (gene_start < sv_start < gene_end) or (gene_start < sv_end < gene_end):
                # Partially disrupted (a breakpoint falls inside the gene)
                category = 'partially_disrupted'
            elif abs(gene_start - sv_end) < 1000000 or abs(gene_end - sv_start) < 1000000:
                # Flanking (within 1 Mb)
                category = 'flanking'
            else:
                continue  # Too far from the SV to be relevant

            genes[category].append(annotate_gene(tu, gene['symbol']))

        return genes

    def annotate_gene(tu, gene_symbol):
        """
        Comprehensive gene annotation.
        """
        # OMIM associations
        omim = tu.tools.OMIM_search(
            operation="search",
            query=gene_symbol,
            limit=5
        )

        # DisGeNET associations
        disgenet = tu.tools.DisGeNET_search_gene(
            operation="search_gene",
            gene=gene_symbol,
            limit=10
        )

        # NCBI gene record (supplies the gene ID needed for Gene Ontology lookups)
        ncbi = tu.tools.NCBI_gene_search(
            term=gene_symbol,
            organism="human"
        )

        return {
            'symbol': gene_symbol,
            'omim': omim,
            'disgenet': disgenet,
            'ncbi': ncbi
        }

Report Section:

    ### 2.1 Fully Contained Genes (Complete Dosage Effect)

    | Gene | Function | Disease Association | Inheritance | Evidence |
    |------|----------|---------------------|-------------|----------|
    | **MAPT** | Microtubule-associated protein tau | Frontotemporal dementia (AD) | Autosomal Dominant | ★★★ |
    | **KANSL1** | Histone acetyltransferase complex | Koolen-De Vries syndrome (AD) | Autosomal Dominant | ★★★ |

    **Interpretation**: Deletion results in haploinsufficiency of two dosage-sensitive genes. KANSL1 haploinsufficiency is the primary cause of pathogenicity.

    *Sources: OMIM, DisGeNET, Ensembl*

    ### 2.2 Partially Disrupted Genes (Breakpoint Within Gene)

    | Gene | Breakpoint Location | Effect | Critical Domains Lost |
    |------|-------------------|--------|----------------------|
    | **NF1** | Intron 28 of 58 | 5' portion deleted | Yes - GTPase-activating domain |

    **Interpretation**: Breakpoint disrupts NF1 coding sequence, likely resulting in loss-of-function. NF1 is haploinsufficient (causes neurofibromatosis type 1).

    ### 2.3 Flanking Genes (Potential Position Effects)

    | Gene | Distance from SV | Regulatory Risk | Evidence |
    |------|------------------|-----------------|----------|
    | **KCNJ2** | 450 kb upstream | Low | ★☆☆ |

    **Note**: Position effects are possible but less common. Consider if phenotype unexplained by contained genes.

Phase 3: Dosage Sensitivity Assessment
Goal: Determine if affected genes are dosage-sensitive
Tools:
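| Tool | Purpose | Key Data |
|---|---|---|
| `ClinGen_search_dosage_sensitivity` | Gold standard curation | HI/TS scores (0-3) |
| `ClinGen_search_gene_validity` | Gene-disease validity | Definitive/Strong/Moderate |
| gnomAD (pLI) | Loss-of-function intolerance | pLI score (0-1) |
| `DECIPHER_search` | Developmental disorders | Patient phenotypes with similar SVs |
| `OMIM_search` | Inheritance pattern | AD inheritance suggests dosage sensitivity; AR argues against it for heterozygous SVs |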
ClinGen Dosage Sensitivity Scores:
| Score | Haploinsufficiency (HI) | Triplosensitivity (TS) | Interpretation |
|---|---|---|---|
| 3 | Sufficient evidence | Sufficient evidence | Gene IS dosage-sensitive |
| 2 | Emerging evidence | Emerging evidence | Likely dosage-sensitive |
| 1 | Little evidence | Little evidence | Insufficient evidence |
| 0 | No evidence | No evidence | No established dosage sensitivity |
pLI Score Interpretation (gnomAD):
| pLI Range | Interpretation | LoF Intolerance |
|---|---|---|
| ≥0.9 | Extremely intolerant | High - likely haploinsufficient |
| 0.5-0.9 | Moderately intolerant | Moderate |
| <0.5 | Tolerant | Low - likely NOT haploinsufficient |
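The two lookup tables above translate directly into code. The sketch below is illustrative only; `interpret_pli` and `summarize_gene` are hypothetical helpers, and the boundary handling (0.9 and 0.5 inclusive at the lower end of each band) is an assumption read off the table.

```python
# Labels copied from the ClinGen dosage sensitivity score table above.
CLINGEN_SCORE_MEANING = {
    3: "Sufficient evidence",
    2: "Emerging evidence",
    1: "Little evidence",
    0: "No evidence",
}


def interpret_pli(pli: float) -> str:
    """Map a gnomAD pLI value (0-1) to the interpretation bands in the table."""
    if not 0.0 <= pli <= 1.0:
        raise ValueError(f"pLI must be between 0 and 1, got {pli}")
    if pli >= 0.9:
        return "Extremely intolerant"
    if pli >= 0.5:
        return "Moderately intolerant"
    return "Tolerant"


def summarize_gene(symbol: str, hi_score: int, pli: float) -> str:
    """One-line dosage summary combining the curated score and the pLI band."""
    return (
        f"{symbol}: HI {hi_score} ({CLINGEN_SCORE_MEANING[hi_score]}), "
        f"pLI {pli:.2f} ({interpret_pli(pli)})"
    )


print(summarize_gene("KANSL1", 3, 0.99))
```

Keeping the curated ClinGen score and the predicted pLI band side by side makes it obvious when a gene is only predicted, rather than curated, to be haploinsufficient.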
Implementation:

    def assess_dosage_sensitivity(tu, gene_list):
        """
        Assess dosage sensitivity for all genes in SV.
        Returns dosage scores and interpretation.
        """
        dosage_data = []

        for gene_symbol in gene_list:
            # 1. ClinGen dosage sensitivity (gold standard); use the first entry
            clingen = tu.tools.ClinGen_search_dosage_sensitivity(
                gene=gene_symbol
            )
            hi_score = None
            ts_score = None
            if clingen.get('data'):
                entry = clingen['data'][0]
                hi_score = entry.get('Haploinsufficiency Score')
                ts_score = entry.get('Triplosensitivity Score')

            # 2. ClinGen gene validity (supports dosage sensitivity)
            validity = tu.tools.ClinGen_search_gene_validity(
                gene=gene_symbol
            )
            validity_level = None
            if validity.get('data'):
                validity_level = validity['data'][0].get('Classification')

            # 3. pLI score from gnomAD (if available via gene search)
            # Note: May need to use myvariant or other tools
            # pli_score = get_pli_score(tu, gene_symbol)

            # 4. OMIM inheritance pattern
            omim = tu.tools.OMIM_search(
                operation="search",
                query=gene_symbol,
                limit=3
            )
            inheritance_pattern = None
            if omim.get('data', {}).get('entries'):
                for entry in omim['data']['entries']:
                    mim = entry.get('mimNumber')
                    details = tu.tools.OMIM_get_entry(
                        operation="get_entry",
                        mim_number=str(mim)
                    )
                    # Extract inheritance from details (parser not shown here)
                    # inheritance_pattern = parse_inheritance(details)

            # Integrate evidence. Scores are compared as strings so that both
            # 3 and '3' are recognized.
            dosage_assessment = {
                'gene': gene_symbol,
                'hi_score': hi_score,
                'ts_score': ts_score,
                'validity_level': validity_level,
                'inheritance': inheritance_pattern,
                'is_dosage_sensitive': (str(hi_score) == '3' or str(ts_score) == '3'),
                'evidence_grade': calculate_evidence_grade(hi_score, ts_score, validity_level)
            }
            dosage_data.append(dosage_assessment)

        return dosage_data

    def calculate_evidence_grade(hi_score, ts_score, validity):
        """
        Calculate evidence grade for dosage sensitivity.
        """
        hi, ts = str(hi_score), str(ts_score)
        if (hi == '3' or ts == '3') and validity == 'Definitive':
            return '★★★'  # High confidence
        elif hi in ['2', '3'] or ts in ['2', '3']:
            return '★★☆'  # Moderate confidence
        else:
            return '★☆☆'  # Low confidence

Report Section:

    ### 3. Dosage Sensitivity Assessment

    #### Haploinsufficient Genes (Deletions/Disruptions)

    | Gene | ClinGen HI Score | pLI | Validity | Disease | Evidence |
    |------|-----------------|-----|----------|---------|----------|
    | **KANSL1** | 3 (Sufficient) | 0.99 | Definitive | Koolen-De Vries syndrome | ★★★ |
    | **MAPT** | 2 (Emerging) | 0.85 | Strong | FTD (rare) | ★★☆ |

    **Interpretation**: KANSL1 has definitive evidence for haploinsufficiency. Deletion of one copy is expected to cause Koolen-De Vries syndrome (intellectual disability, hypotonia, distinctive facial features).

    *Sources: ClinGen Dosage Sensitivity Map, gnomAD pLI*

    #### Triplosensitive Genes (Duplications)

    | Gene | ClinGen TS Score | Disease Mechanism | Evidence |
    |------|-----------------|-------------------|----------|
    | **MECP2** | 3 (Sufficient) | MECP2 duplication syndrome | ★★★ |
    | **PMP22** | 3 (Sufficient) | Charcot-Marie-Tooth 1A | ★★★ |

    **Note**: For this deletion, triplosensitivity is not applicable. Listed for reference.

    #### Non-Dosage-Sensitive Genes

    | Gene | HI Score | TS Score | Interpretation |
    |------|----------|----------|----------------|
    | **GENE_X** | 0 | 0 | No established dosage sensitivity |
    | **GENE_Y** | 1 | 1 | Insufficient evidence |

    **Interpretation**: These genes lack evidence for dosage sensitivity. Deletion/duplication less likely to be pathogenic solely due to these genes.

Phase 4: Population Frequency Context
Goal: Determine if SV is common in general population (likely benign) or rare (supports pathogenicity)
Tools:
| Tool | Purpose | Key Data |
|---|---|---|
| gnomAD SV | Population SV frequencies | Overlapping SVs, frequencies |
| `ClinVar_search_variants` | Known pathogenic/benign SVs | Classification, review status |
| `DECIPHER_search` | Patient SVs with phenotypes | Case reports, phenotype similarity |
Frequency Interpretation (adapted from ACMG):
| SV Frequency | ACMG Code | Interpretation |
|---|---|---|
| ≥1% in gnomAD SVs | BA1 (Stand-alone Benign) | Too common for rare disease |
| 0.1-1% | BS1 (Strong Benign) | Likely benign common variant |
| <0.01% | PM2 (Supporting Pathogenic) | Rare, supports pathogenicity |
| Absent | PM2 (Supporting) | Very rare, supports pathogenicity |
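As a rough sketch, the frequency table can be encoded as a single lookup. The function name `frequency_to_acmg` is an assumption, frequencies are taken as fractions (0.01 means 1%), and the band between 0.01% and 0.1%, which the table does not cover, is deliberately returned without a code rather than guessed.

```python
from typing import Optional, Tuple


def frequency_to_acmg(freq: Optional[float]) -> Tuple[Optional[str], str]:
    """Map a population SV frequency to the code suggested by the table above.

    freq is a fraction (0.01 == 1%); None or 0 means the SV was not observed.
    """
    if freq is None or freq == 0:
        return "PM2", "Absent: very rare, supports pathogenicity"
    if freq >= 0.01:
        return "BA1", "Too common for rare disease"
    if freq >= 0.001:
        return "BS1", "Likely benign common variant"
    if freq < 0.0001:
        return "PM2", "Rare, supports pathogenicity"
    # 0.01% <= freq < 0.1%: the table assigns no code to this band.
    return None, "Frequency alone is not informative in this range"


for value in (None, 0.02, 0.005, 0.0005, 0.00005):
    print(value, frequency_to_acmg(value))
```

Returning `None` for the uncovered band keeps the gap visible to the caller instead of silently defaulting to a pathogenic or benign code.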
Reciprocal Overlap Calculation:
For proper comparison, calculate reciprocal overlap between query SV and population SV:

    Reciprocal Overlap = min(overlap_with_A, overlap_with_B)

    where:
      overlap_with_A = (overlap length) / (SV_A length)
      overlap_with_B = (overlap length) / (SV_B length)

    Threshold: ≥70% reciprocal overlap = "same" SV

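The reciprocal overlap formula is short enough to show in full. This is a minimal sketch assuming both SVs are on the same chromosome and are given as half-open `[start, end)` intervals; the function names and the population SV coordinates in the usage lines are made up for illustration.

```python
def reciprocal_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Return the reciprocal overlap (0.0-1.0) of two intervals on one chromosome."""
    if a_end <= a_start or b_end <= b_start:
        raise ValueError("Each interval must have end > start")
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0  # Intervals do not overlap
    # The smaller of the two fractions is the reciprocal overlap.
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def is_same_sv(a_start, a_end, b_start, b_end, threshold=0.70) -> bool:
    """Apply the >=70% reciprocal overlap rule for calling two SVs 'the same'."""
    return reciprocal_overlap(a_start, a_end, b_start, b_end) >= threshold


# Query deletion from the Phase 1 example vs. a hypothetical population SV
ro = reciprocal_overlap(44039927, 44352659, 44030000, 44350000)
print(f"{ro:.1%}", is_same_sv(44039927, 44352659, 44030000, 44350000))
```

Taking the minimum of the two fractions is what prevents a small benign CNV nested inside a large query SV (or the reverse) from counting as a match. SV type should still be compared separately, since a deletion and a duplication over the same interval are not the same variant.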
Implementation:

    def assess_population_frequency(tu, chrom, sv_start, sv_end, sv_type):
        """
        Check population databases for overlapping SVs.
        """
        # 1. Check ClinVar for known pathogenic/benign SVs
        clinvar = tu.tools.ClinVar_search_variants(
            chromosome=str(chrom),
            start=sv_start,
            stop=sv_end,
            variant_type=sv_type.upper()
        )

        known_svs = []
        if clinvar.get('data'):
            for variant in clinvar['data']:
                classification = variant.get('clinical_significance')
                known_svs.append({
                    'database': 'ClinVar',
                    'classification': classification,
                    'review_status': variant.get('review_status'),
                    'coordinates': f"{variant.get('chromosome')}:{variant.get('start')}-{variant.get('stop')}"
                })

        # 2. gnomAD SVs (if available)
        # Note: gnomAD SV database may not have direct API access via ToolUniverse
        # May need to use genomic coordinate search

        # 3. DECIPHER for similar patient cases
        decipher_search = tu.tools.DECIPHER_search(
            query=f"chr{chrom}:{sv_start}-{sv_end}",
            search_type="region"
        )

        patient_cases = []
        if decipher_search.get('data'):
            patient_cases = decipher_search['data']

        return {
            'clinvar_matches': known_svs,
            'decipher_cases': patient_cases,
            'frequency_interpretation': interpret_frequency(known_svs)
        }

    def interpret_frequency(known_svs):
        """
        Interpret frequency based on ClinVar matches.
        """
        if any(sv['classification'] == 'Benign' for sv in known_svs):
            return {
                'acmg_code': 'BA1 or BS1',
                'interpretation': 'Likely benign based on ClinVar benign classification',
                'evidence_grade': '★★★'
            }
        elif any(sv['classification'] == 'Pathogenic' for sv in known_svs):
            return {
                'acmg_code': 'PS1',
                'interpretation': 'Pathogenic based on ClinVar pathogenic classification',
                'evidence_grade': '★★★'
            }
        else:
            return {
                'acmg_code': 'PM2',
                'interpretation': 'Rare variant, not found in ClinVar or population databases',
                'evidence_grade': '★★☆'
            }

Report Section:

    ### 4. Population Frequency Context

    #### ClinVar Matches (Overlapping SVs)

    | VCV ID | Classification | Size | Overlap | Review Status | Genes |
    |--------|----------------|------|---------|---------------|-------|
    | VCV000012345 | Pathogenic | 320 kb | 95% reciprocal | ★★★ Reviewed by expert panel | KANSL1, MAPT |

    **Match Found**: Query deletion has 95% reciprocal overlap with known pathogenic deletion in ClinVar (VCV000012345). This is the Koolen-De Vries syndrome deletion.

    **ACMG Code**: **PS1** (Strong) - Same genomic region as established pathogenic SV

    *Source: ClinVar via `ClinVar_search_variants`*

    #### gnomAD SV Database

    **Search Result**: No overlapping deletions found in gnomAD SV v4.0 (>10,000 genomes)

    **Interpretation**: Absence from gnomAD supports rarity and pathogenic potential.

    **ACMG Code**: **PM2** (Moderate) - Absent from population databases

    *Note: gnomAD SVs queried via browser (no direct API access)*

    #### DECIPHER Patient Cases

    | Case ID | Phenotype | SV Type | Size | Overlap | Similarity |
    |---------|-----------|---------|------|---------|------------|
    | 12345 | Intellectual disability, hypotonia | DEL | 315 kb | 98% | High |
    | 67890 | Developmental delay, facial dysmorphism | DEL | 305 kb | 92% | High |

    **Phenotype Match**: 8/10 DECIPHER patients have intellectual disability and hypotonia, consistent with Koolen-De Vries syndrome.

    **ACMG Support**: **PP4** (Supporting) - Patient phenotype consistent with gene's disease association

    *Source: DECIPHER via `DECIPHER_search`*

Phase 5: Pathogenicity Scoring
Goal: Quantitative pathogenicity assessment (0-10 scale)
Scoring Components:
Each component is scored in points (100 in total); the component subtotals are then divided by 10 to give the 0-10 score.
- Gene Content (40 points max):
  - 10 points per dosage-sensitive gene (HI/TS score 3)
  - 5 points per likely dosage-sensitive gene (score 2)
  - 2 points per gene with disease association
  - Cap at 40 points
- Dosage Sensitivity Evidence (30 points max):
  - 30 points: Multiple genes with definitive HI/TS (score 3)
  - 20 points: One gene with definitive HI/TS
  - 10 points: Genes with emerging evidence (score 2)
  - 5 points: Predicted haploinsufficiency (pLI >0.9)
- Population Frequency (20 points max):
  - 20 points: Absent from gnomAD, DGV
  - 10 points: Rare (<0.01%)
  - 0 points: Common (>0.1%)
  - -20 points: Very common (>1%) - likely benign
- Clinical Evidence (10 points max):
  - 10 points: Matching ClinVar pathogenic SV
  - 8 points: DECIPHER cases with matching phenotype
  - 5 points: Literature support for gene dosage effects
  - 3 points: Phenotype consistent with genes
Pathogenicity Score Interpretation:
| Score | Classification | Confidence | Interpretation |
|---|---|---|---|
| ≥9 | Pathogenic | ★★★ | High confidence pathogenic |
| 7 to <9 | Likely Pathogenic | ★★☆ | Strong evidence for pathogenicity |
| 4 to <7 | VUS | ★☆☆ | Uncertain significance |
| 2 to <4 | Likely Benign | ★★☆ | Strong evidence for benign |
| <2 | Benign | ★★★ | High confidence benign |
Implementation:

    def calculate_pathogenicity_score(gene_content, dosage_data, frequency_data, clinical_data):
        """
        Calculate comprehensive pathogenicity score (0-10 scale).
        """
        breakdown = {}

        # 1. Gene content scoring (40 points max)
        gene_score = 0
        for gene in gene_content['fully_contained'] + gene_content['partially_disrupted']:
            dosage_info = next((d for d in dosage_data if d['gene'] == gene['symbol']), None)
            if dosage_info:
                hi = str(dosage_info['hi_score'])  # Accept 3 or '3'
                if hi == '3':
                    gene_score += 10
                elif hi == '2':
                    gene_score += 5
                elif gene.get('omim_disease'):
                    gene_score += 2
        gene_score = min(gene_score, 40)  # Cap at 40
        breakdown['gene_content'] = gene_score / 40 * 4  # Scale to 0-4

        # 2. Dosage sensitivity scoring (30 points max)
        definitive_genes = sum(1 for d in dosage_data if str(d['hi_score']) == '3')
        emerging_genes = sum(1 for d in dosage_data if str(d['hi_score']) == '2')
        if definitive_genes >= 2:
            dosage_score = 30
        elif definitive_genes == 1:
            dosage_score = 20
        elif emerging_genes >= 1:
            dosage_score = 10  # Matches the rubric: emerging evidence = 10 points
        else:
            dosage_score = 0
        breakdown['dosage_sensitivity'] = dosage_score / 30 * 3  # Scale to 0-3

        # 3. Population frequency scoring (20 points max)
        freq = frequency_data.get('frequency')
        freq_score = 0
        if freq is None:
            freq_score = 20  # Absent
        elif freq < 0.0001:
            freq_score = 10  # Rare
        elif freq < 0.001:
            freq_score = 5  # Uncommon
        elif freq > 0.01:
            freq_score = -20  # Common - likely benign
        breakdown['population_frequency'] = freq_score / 20 * 2  # Scale to -2 to 2

        # 4. Clinical evidence scoring (10 points max)
        clinical_score = 0
        if clinical_data.get('clinvar_pathogenic'):
            clinical_score = 10
        elif clinical_data.get('decipher_matching_phenotype'):
            clinical_score = 8
        elif clinical_data.get('literature_support'):
            clinical_score = 5
        breakdown['clinical_evidence'] = clinical_score / 10 * 1  # Scale to 0-1

        # Total score, clamped to the 0-10 range
        total_score = sum(breakdown.values())
        total_score = max(0, min(10, total_score))

        return {
            'total_score': round(total_score, 1),
            'breakdown': breakdown,
            'classification': classify_score(total_score)
        }

    def classify_score(score):
        """Map score to ACMG-style classification."""
        if score >= 9:
            return 'Pathogenic'
        elif score >= 7:
            return 'Likely Pathogenic'
        elif score >= 4:
            return 'VUS'
        elif score >= 2:
            return 'Likely Benign'
        else:
            return 'Benign'

Report Section:

    ### 5. Pathogenicity Scoring

    #### Quantitative Assessment (0-10 Scale)

    | Component | Points | Max | Contribution | Rationale |
    |-----------|--------|-----|-------------|-----------|
    | **Gene Content** | 4.0 | 4 | 40% | KANSL1 (HI score 3), MAPT (HI score 2) |
    | **Dosage Sensitivity** | 2.5 | 3 | 25% | One definitive HI gene (KANSL1) |
    | **Population Frequency** | 2.0 | 2 | 20% | Absent from gnomAD SVs |
    | **Clinical Evidence** | 1.0 | 1 | 10% | ClinVar pathogenic match |
    | **Total Score** | **9.5** | 10 | 100% | |

    **Classification**: **Pathogenic** (★★★ High Confidence)

    **Interpretation**: Score of 9.5/10 indicates high confidence pathogenic SV. Deletion encompasses established haploinsufficient gene (KANSL1), absent from population databases, and matches known pathogenic ClinVar variant.

    #### Score Breakdown Visualization

    Gene Content:       ████████████████████████████████████████ 4.0/4
    Dosage Sensitivity: █████████████████████████████████░░░░░░░ 2.5/3
    Population Freq:    ████████████████████████████████████████ 2.0/2
    Clinical Evidence:  ████████████████████████████████████████ 1.0/1
                        ────────────────────────────────────────
    Total:              ██████████████████████████████████████░░ 9.5/10

    **Key Drivers of Pathogenicity**:
    1. KANSL1 haploinsufficiency (definitive evidence)
    2. Exact match to known pathogenic deletion
    3. Absence from population databases
    4. Phenotype consistency with Koolen-De Vries syndrome

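The score breakdown bars in the report can be generated rather than typed by hand. The sketch below is one possible way to do it; `render_bar` and `render_breakdown` are hypothetical helpers, and the 40-character width is an assumption.

```python
def render_bar(value: float, maximum: float, width: int = 40,
               filled: str = "█", empty: str = "░") -> str:
    """Return a fixed-width text bar showing value as a fraction of maximum."""
    fraction = max(0.0, min(1.0, value / maximum))  # Clamp negatives and overflow
    n_filled = round(fraction * width)
    return filled * n_filled + empty * (width - n_filled)


def render_breakdown(rows, width: int = 40):
    """rows: list of (label, value, maximum). Returns one aligned line per row."""
    label_width = max(len(label) for label, _, _ in rows) + 1  # +1 for the colon
    lines = []
    for label, value, maximum in rows:
        name = (label + ":").ljust(label_width)
        lines.append(f"{name} {render_bar(value, maximum, width)} {value:.1f}/{maximum:g}")
    return lines


rows = [
    ("Gene Content", 4.0, 4),
    ("Dosage Sensitivity", 2.5, 3),
    ("Population Freq", 2.0, 2),
    ("Clinical Evidence", 1.0, 1),
    ("Total", 9.5, 10),
]
print("\n".join(render_breakdown(rows)))
```

Generating the bars from the same `breakdown` values used for the table keeps the visualization and the numbers from drifting apart.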
Phase 6: Literature & Clinical Evidence
Goal: Find case reports, functional studies, and clinical validation
Tools:
| Tool | Purpose | Coverage |
|---|---|---|
| `PubMed_search` | Peer-reviewed literature | Comprehensive |
| `DECIPHER_search` | Patient case database | Developmental disorders |
| | European literature | Additional coverage |
Search Strategies:

    def comprehensive_literature_search(tu, genes, sv_type, phenotype):
        """
        Search literature for SV evidence.
        """
        # 1. Gene-specific searches
        literature = []
        for gene in genes:
            # Dosage sensitivity literature
            dosage_papers = tu.tools.PubMed_search(
                query=f'"{gene}" AND (haploinsufficiency OR dosage sensitivity OR deletion syndrome)',
                max_results=20
            )

            # Case reports
            case_papers = tu.tools.PubMed_search(
                query=f'"{gene}" AND deletion AND {phenotype}',
                max_results=15
            )

            literature.append({
                'gene': gene,
                'dosage_papers': dosage_papers,
                'case_reports': case_papers
            })

        # 2. SV-specific searches (only deletions are handled; stays None otherwise)
        sv_papers = None
        if sv_type == 'DEL':
            sv_papers = tu.tools.PubMed_search(
                query=f'deletion AND {" AND ".join(genes[:3])} AND syndrome',
                max_results=25
            )

        # 3. DECIPHER cases
        decipher_cases = []
        for gene in genes:
            cases = tu.tools.DECIPHER_search(
                query=gene,
                search_type="gene"
            )
            decipher_cases.append(cases)

        return {
            'gene_literature': literature,
            'sv_literature': sv_papers,
            'decipher_cases': decipher_cases
        }

Report Section:

    ### 6. Literature & Clinical Evidence

    #### Key Publications

    | Study | Finding | Evidence Type | PMID |
    |-------|---------|---------------|------|
    | Koolen et al., 2006 | Described 17q21.31 microdeletion syndrome | Original description | 16222315 |
    | Koolen et al., 2008 | KANSL1 haploinsufficiency confirmed | Functional validation | 18394581 |
    | Zollino et al., 2012 | Phenotype characterization (n=52) | Clinical series | 22736773 |

    **Key Findings**:
    - 17q21.31 deletion is recurrent (mediated by LCRs)
    - KANSL1 haploinsufficiency is primary mechanism
    - Phenotype: ID (100%), hypotonia (95%), friendly demeanor (85%)
    - Penetrance: >95% for developmental features

    *Source: PubMed via `PubMed_search`*

    #### DECIPHER Patient Cases (n=45)

    **Phenotype Frequency in DECIPHER Cohort**:

    | Feature | Frequency | Match to Patient |
    |---------|-----------|------------------|
    | Intellectual disability | 45/45 (100%) | ✓ Yes |
    | Hypotonia | 42/45 (93%) | ✓ Yes |
    | Feeding difficulties | 38/45 (84%) | ✓ Yes |
    | Distinctive facies | 40/45 (89%) | ✓ Yes |
    | Friendly personality | 35/45 (78%) | Unknown |

    **Phenotype Match**: Patient phenotype highly consistent with DECIPHER cohort (4/4 assessable features present).

    **ACMG Code**: **PP4** (Supporting) - Patient's clinical features consistent with gene's known phenotype

    *Source: DECIPHER via `DECIPHER_search`*

    #### Functional Evidence for KANSL1 Dosage Sensitivity

    | Study | Model | Finding | PMID |
    |-------|-------|---------|------|
    | Koolen et al., 2012 | Patient cells | Reduced KANSL1 protein | 22736773 |
    | Zollino et al., 2015 | Mouse model | Kansl1+/- recapitulates phenotype | 25607366 |
    | Arbogast et al., 2017 | Zebrafish | kansl1 knockdown → developmental defects | 28666126 |

    **Strength of Evidence**: ★★★ (High) - Multiple independent studies confirm haploinsufficiency mechanism

    **ACMG Code**: **PS3_Moderate** - Well-established functional studies showing dosage sensitivity

Phase 7: ACMG-Adapted Classification
Goal: Apply ACMG/ClinGen criteria adapted for SVs
SV-Specific ACMG Criteria:
Pathogenic Evidence Codes
| Code | Strength | Criteria | SV Application |
|---|---|---|---|
| PVS1 | Very Strong | Null variant in HI gene | Complete deletion of HI gene |
| PS1 | Strong | Same SV as known pathogenic | ≥70% reciprocal overlap with ClinVar pathogenic |
| PS2 | Strong | De novo (maternity/paternity confirmed) | De novo SV in patient with matching phenotype |
| PS3 | Strong | Functional studies | Gene dosage effects demonstrated |
| PS4 | Strong | Case-control enrichment | SV enriched in cases vs controls |
| PM1 | Moderate | Critical region | Deletion of exons in HI gene |
| PM2 | Moderate | Absent from controls | Not in gnomAD SVs, DGV |
| PM3 | Moderate | Recessive: homozygous or compound het | Both alleles affected (rare for SVs) |
| PM4 | Moderate | Protein length change | In-frame deletion/duplication |
| PM5 | Moderate | Similar SVs pathogenic | Nearby SVs in ClinVar pathogenic |
| PM6 | Moderate | De novo (no confirmation) | De novo SV, phenotype consistent |
| PP1 | Supporting | Segregation in family | SV segregates with phenotype |
| PP2 | Supporting | Gene/pathway relevant | Genes in SV match phenotype |
| PP3 | Supporting | Computational evidence | Multiple predictors support haploinsufficiency |
| PP4 | Supporting | Phenotype consistent | Patient phenotype matches gene-disease |
Benign Evidence Codes
| Code | Strength | Criteria | SV Application |
|---|---|---|---|
| BA1 | Stand-Alone | MAF >5% | SV frequency >5% in gnomAD |
| BS1 | Strong | MAF too high for disease | SV frequency >1% |
| BS2 | Strong | Healthy adult with phenotype-associated genotype | SV in healthy individual (careful - reduced penetrance) |
| BS3 | Strong | Functional studies show no effect | No dosage sensitivity demonstrated |
| BS4 | Strong | Non-segregation | SV doesn't segregate with phenotype |
| BP1 | Supporting | Missense in gene without known LOF | N/A for SVs |
| BP2 | Supporting | Observed in trans with pathogenic | SV + pathogenic SNV = compound het (patient unaffected) |
| BP4 | Supporting | Computational evidence benign | Predictors suggest no haploinsufficiency |
| BP5 | Supporting | Found in case with alt cause | Phenotype explained by different variant |
| BP7 | Supporting | Synonymous with no splice effect | N/A for SVs |
Classification Algorithm (ACMG SV Criteria):
| Classification | Evidence Required |
|---|---|
| Pathogenic | PVS1 + PS1; OR 2 Strong; OR 1 Strong + 3 Moderate |
| Likely Pathogenic | 1 Very Strong + 1 Moderate; OR 1 Strong + 2 Moderate; OR 3 Moderate |
| VUS | Criteria not met; OR conflicting evidence |
| Likely Benign | 1 Strong + 1 Supporting; OR 2 Supporting |
| Benign | BA1; OR BS1 + BS2; OR 2 Strong |
Implementation:
```python
def apply_acmg_criteria(gene_content, dosage_data, frequency_data, clinical_data, inheritance):
    """
    Apply ACMG SV criteria and calculate classification.
    """
    evidence = {
        'pathogenic': [],
        'benign': []
    }

    # Read the frequency once; None means no population record was found.
    # Guarding on None avoids a TypeError in the BA1/BS1 comparisons below.
    frequency = frequency_data.get('frequency')

    # PVS1: Complete deletion of HI gene
    hi_genes = [d for d in dosage_data if d['hi_score'] == '3']
    if len(hi_genes) > 0 and len(gene_content['fully_contained']) > 0:
        evidence['pathogenic'].append({
            'code': 'PVS1',
            'strength': 'Very Strong',
            'rationale': f"Complete deletion of haploinsufficient gene(s): {', '.join(g['gene'] for g in hi_genes)}"
        })

    # PS1: Same as known pathogenic SV
    if clinical_data.get('clinvar_pathogenic_match'):
        evidence['pathogenic'].append({
            'code': 'PS1',
            'strength': 'Strong',
            'rationale': f"≥70% overlap with ClinVar pathogenic SV: {clinical_data['clinvar_id']}"
        })

    # PS2: De novo with phenotype match
    if inheritance == 'de_novo' and clinical_data.get('phenotype_match'):
        evidence['pathogenic'].append({
            'code': 'PS2',
            'strength': 'Strong',
            'rationale': "De novo occurrence in patient with consistent phenotype"
        })

    # PS3: Functional studies
    if clinical_data.get('functional_evidence'):
        evidence['pathogenic'].append({
            'code': 'PS3',
            'strength': 'Strong',
            'rationale': "Well-established functional studies demonstrate dosage sensitivity"
        })

    # PM2: Absent from controls
    if frequency is None or frequency == 0:
        evidence['pathogenic'].append({
            'code': 'PM2',
            'strength': 'Moderate',
            'rationale': "Absent from gnomAD SV database and DGV"
        })

    # PP4: Phenotype consistent
    if clinical_data.get('phenotype_consistent'):
        evidence['pathogenic'].append({
            'code': 'PP4',
            'strength': 'Supporting',
            'rationale': "Patient phenotype highly consistent with gene-disease association"
        })

    # BA1: Common variant
    if frequency is not None and frequency > 0.05:
        evidence['benign'].append({
            'code': 'BA1',
            'strength': 'Stand-Alone',
            'rationale': f"Frequency {frequency:.3f} too high for rare disease"
        })

    # BS1: High frequency
    if frequency is not None and 0.01 < frequency <= 0.05:
        evidence['benign'].append({
            'code': 'BS1',
            'strength': 'Strong',
            'rationale': f"Frequency {frequency:.3f} exceeds expected for disease"
        })

    # Calculate classification
    classification = determine_classification(evidence)

    return {
        'evidence': evidence,
        'classification': classification['class'],
        'confidence': classification['confidence']
    }


def determine_classification(evidence):
    """
    Apply ACMG classification rules.
    """
    path = evidence['pathogenic']
    ben = evidence['benign']

    # Count evidence by strength
    very_strong = len([e for e in path if e['strength'] == 'Very Strong'])
    strong_path = len([e for e in path if e['strength'] == 'Strong'])
    moderate_path = len([e for e in path if e['strength'] == 'Moderate'])
    supporting_path = len([e for e in path if e['strength'] == 'Supporting'])

    standalone_ben = len([e for e in ben if e['strength'] == 'Stand-Alone'])
    strong_ben = len([e for e in ben if e['strength'] == 'Strong'])
    supporting_ben = len([e for e in ben if e['strength'] == 'Supporting'])

    # Benign criteria (take precedence when strong enough)
    if standalone_ben >= 1:
        return {'class': 'Benign', 'confidence': '★★★'}
    if strong_ben >= 2:
        return {'class': 'Benign', 'confidence': '★★★'}
    if strong_ben >= 1 and supporting_ben >= 1:
        return {'class': 'Likely Benign', 'confidence': '★★☆'}
    if supporting_ben >= 2:
        return {'class': 'Likely Benign', 'confidence': '★★☆'}

    # Pathogenic criteria
    if very_strong >= 1 and strong_path >= 1:
        return {'class': 'Pathogenic', 'confidence': '★★★'}
    if strong_path >= 2:
        return {'class': 'Pathogenic', 'confidence': '★★★'}
    if very_strong >= 1 and moderate_path >= 1:
        return {'class': 'Likely Pathogenic', 'confidence': '★★☆'}
    if strong_path >= 1 and moderate_path >= 2:
        return {'class': 'Likely Pathogenic', 'confidence': '★★☆'}
    if strong_path >= 1 and moderate_path >= 1 and supporting_path >= 1:
        return {'class': 'Likely Pathogenic', 'confidence': '★★☆'}
    if moderate_path >= 3:
        return {'class': 'Likely Pathogenic', 'confidence': '★☆☆'}

    # Default to VUS
    return {'class': 'VUS', 'confidence': '★☆☆'}
```
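The PS1 rationale in the implementation above cites "≥70% overlap" with a ClinVar pathogenic SV, but the source does not show how that overlap is measured. The following is a minimal sketch, assuming reciprocal overlap on half-open coordinates; the function names `reciprocal_overlap` and `matches_known_sv` and the reciprocal interpretation of the threshold are illustrative assumptions, not part of the skill.

```python
def reciprocal_overlap(start_a, end_a, start_b, end_b):
    """Return the reciprocal overlap fraction of two intervals on one chromosome.

    Coordinates are treated as half-open [start, end). The shared length is
    divided by the longer interval, which equals the smaller of the two
    per-interval overlap fractions.
    """
    if end_a <= start_a or end_b <= start_b:
        raise ValueError("intervals must have positive length")
    shared = min(end_a, end_b) - max(start_a, start_b)
    if shared <= 0:
        return 0.0
    return shared / max(end_a - start_a, end_b - start_b)


def matches_known_sv(query, known, threshold=0.70):
    """Hypothetical helper: True when two (chrom, start, end) SVs meet the threshold."""
    if query[0] != known[0]:
        return False
    return reciprocal_overlap(query[1], query[2], known[1], known[2]) >= threshold
```

A result from `matches_known_sv` could then populate `clinical_data['clinvar_pathogenic_match']` before the criteria are applied. Using the reciprocal fraction keeps a small query SV from "matching" a much larger known SV that merely contains it.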
Report Section:
```markdown
### 7. ACMG-Adapted Classification

#### Evidence Codes Applied

**Pathogenic Evidence**:

| Code | Strength | Rationale |
|------|----------|-----------|
| **PVS1** | Very Strong | Complete deletion of haploinsufficient gene (KANSL1, HI score 3) |
| **PS1** | Strong | ≥95% overlap with ClinVar pathogenic deletion (VCV000012345) |
| **PM2** | Moderate | Absent from gnomAD SV database (>10,000 genomes) |
| **PP4** | Supporting | Patient phenotype consistent with Koolen-De Vries syndrome |

**Benign Evidence**: None

#### Evidence Summary

| Pathogenic | Benign |
|------------|--------|
| 1 Very Strong (PVS1) | None |
| 1 Strong (PS1) | |
| 1 Moderate (PM2) | |
| 1 Supporting (PP4) | |

#### Classification: **PATHOGENIC** ★★★

**Rationale**: Meets ACMG criteria for Pathogenic (1 Very Strong + 1 Strong). Complete deletion of established haploinsufficient gene (KANSL1) with exact match to known pathogenic deletion.

**Confidence**: ★★★ (High) - Multiple independent lines of strong evidence

#### Classification Certainty Factors

✅ **Strengths**:
- Exact match to well-characterized pathogenic deletion
- Complete deletion of definitive HI gene (KANSL1)
- Absent from population databases
- Phenotype highly consistent with gene-disease

⚠ **Limitations**:
- None significant - this is a well-established pathogenic SV
```
Output Structure
Report File: SV_analysis_report.md
```markdown
# Structural Variant Analysis Report: [SV_IDENTIFIER]

**Generated**: [Date] | **Analyst**: ToolUniverse SV Interpreter

---

## Executive Summary

| Field | Value |
|-------|-------|
| **SV Type** | Deletion / Duplication / Inversion / Translocation |
| **Coordinates** | chr17:44039927-44352659 (GRCh38) |
| **Size** | 313 kb |
| **Gene Content** | 2 genes fully contained, 0 partially disrupted |
| **Classification** | Pathogenic / Likely Pathogenic / VUS / Likely Benign / Benign |
| **Pathogenicity Score** | X.X / 10 |
| **Confidence** | ★★★ / ★★☆ / ★☆☆ |
| **Key Finding** | [One-sentence summary] |

**Clinical Action**: [Required / Recommended / None]

---

## 1. SV Identity & Classification
{SV type, coordinates, size, breakpoint precision, inheritance}

---

## 2. Gene Content Analysis
### 2.1 Fully Contained Genes
{Table of genes with functions, disease associations}
### 2.2 Partially Disrupted Genes
{Genes with breakpoints, domains affected}
### 2.3 Flanking Genes
{Genes near breakpoints, position effect risk}

---

## 3. Dosage Sensitivity Assessment
### 3.1 Haploinsufficient Genes
{ClinGen HI scores, pLI, evidence}
### 3.2 Triplosensitive Genes
{ClinGen TS scores, duplication syndromes}
### 3.3 Non-Dosage-Sensitive Genes
{Genes without established dosage effects}

---

## 4. Population Frequency Context
### 4.1 ClinVar Matches
{Known pathogenic/benign SVs}
### 4.2 gnomAD SV Database
{Population frequencies}
### 4.3 DECIPHER Patient Cases
{Similar SVs, phenotype matching}

---

## 5. Pathogenicity Scoring
### 5.1 Quantitative Assessment
{0-10 score with breakdown}
### 5.2 Score Components
{Gene content, dosage, frequency, clinical}

---

## 6. Literature & Clinical Evidence
### 6.1 Key Publications
{Functional studies, case series}
### 6.2 DECIPHER Cohort Analysis
{Phenotype frequencies, matching}
### 6.3 Functional Evidence
{Gene dosage studies}

---

## 7. ACMG-Adapted Classification
### 7.1 Evidence Codes Applied
{Pathogenic and benign codes with rationale}
### 7.2 Classification
{Final classification with confidence}
### 7.3 Certainty Factors
{Strengths and limitations}

---

## 8. Clinical Recommendations
### 8.1 For Affected Individual
{Testing, management, surveillance}
### 8.2 For Family Members
{Cascade testing, genetic counseling}
### 8.3 Reproductive Considerations
{Recurrence risk, prenatal testing}

---

## 9. Limitations & Uncertainties
{Missing data, conflicting evidence, knowledge gaps}

---

## Data Sources
{All tools and databases queried with results}
```
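The report-first principle means the file exists with every section heading before any analysis runs. A minimal sketch of that step is shown here, assuming the section titles from the template above; the helper names `build_skeleton` and `create_report` and the placeholder text are hypothetical.

```python
from pathlib import Path

# Section titles copied from the report template.
SECTION_TITLES = [
    "Executive Summary",
    "1. SV Identity & Classification",
    "2. Gene Content Analysis",
    "3. Dosage Sensitivity Assessment",
    "4. Population Frequency Context",
    "5. Pathogenicity Scoring",
    "6. Literature & Clinical Evidence",
    "7. ACMG-Adapted Classification",
    "8. Clinical Recommendations",
    "9. Limitations & Uncertainties",
    "Data Sources",
]

PLACEHOLDER = "[Pending analysis]"  # assumed marker for sections not yet filled in


def build_skeleton(sv_identifier):
    """Return the Markdown text of an empty report with all section headings."""
    lines = [f"# Structural Variant Analysis Report: {sv_identifier}", ""]
    for title in SECTION_TITLES:
        lines += ["---", "", f"## {title}", "", PLACEHOLDER, ""]
    return "\n".join(lines)


def create_report(path, sv_identifier):
    """Write the skeleton to disk and return the path, ready to be populated."""
    path = Path(path)
    path.write_text(build_skeleton(sv_identifier), encoding="utf-8")
    return path
```

Each later analysis phase can then replace one placeholder, so an interrupted run still leaves a readable partial report.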
Evidence Grading System
| Symbol | Confidence | Criteria |
|---|---|---|
| ★★★ | High | ClinGen definitive, ClinVar expert reviewed, multiple independent studies |
| ★★☆ | Moderate | ClinGen strong/moderate, single good study, DECIPHER cohort support |
| ★☆☆ | Limited | Computational predictions only, case reports, emerging evidence |
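The grading table can be read as a lookup from evidence type to the highest tier that evidence supports. The sketch below assumes a set of evidence-type labels invented for illustration (`clingen_definitive`, `case_report`, and so on); the skill itself does not define these identifiers.

```python
# Tiers ordered from strongest to weakest, following the grading table.
# The evidence-type labels are hypothetical identifiers.
GRADE_RULES = [
    ("★★★", "High", {"clingen_definitive", "clinvar_expert_reviewed",
                     "multiple_independent_studies"}),
    ("★★☆", "Moderate", {"clingen_strong", "clingen_moderate",
                         "single_good_study", "decipher_cohort_support"}),
    ("★☆☆", "Limited", {"computational_prediction", "case_report",
                        "emerging_evidence"}),
]


def grade_evidence(evidence_types):
    """Return (symbol, label) for the strongest tier matched, or None if none match."""
    found = set(evidence_types)
    for symbol, label, members in GRADE_RULES:
        if found & members:
            return symbol, label
    return None
```

Returning `None` for unrecognized evidence, rather than defaulting to the lowest tier, is a design choice that keeps ungraded findings visible.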
Special Scenarios
Scenario 1: Recurrent Microdeletion Syndrome
Additional considerations:
- Check for recurrence mechanism (LCRs, NAHR)
- Look for founder effects
- Population-specific frequencies
- Incomplete penetrance
- Variable expressivity
Examples: 22q11.2 deletion, 17q21.31 deletion (Koolen-De Vries syndrome)
Scenario 2: Balanced Translocation (No Gene Disruption)
Assessment approach:
- If no genes disrupted: Likely benign (in most cases)
- Check for cryptic imbalances
- Consider position effects (rare)
- Reproductive risk (unbalanced offspring)
Classification: Usually VUS or Likely Benign unless offspring affected
Scenario 3: Complex Rearrangement
Analysis strategy:
- Break down into component SVs
- Assess each breakpoint independently
- Look for chromothripsis pattern
- Consider cumulative gene dosage effects
- Check for DNA repair defects
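The first and fourth steps of this strategy (decompose, then consider cumulative dosage) can be sketched as a per-gene sum of copy-number changes. The type codes, the one-copy change per event, and the gene names in the usage note are assumptions for illustration only.

```python
from collections import defaultdict

# Assumed copy-number change per component SV type; balanced events
# (inversions, translocations) are treated as dosage-neutral here.
COPY_DELTA = {"DEL": -1, "DUP": +1, "INV": 0, "TRA": 0}


def cumulative_dosage(components):
    """Sum copy-number changes per gene across the component SVs.

    `components` is an iterable of dicts with a 'type' key (one of
    COPY_DELTA) and a 'genes' key listing the genes that component spans.
    """
    net = defaultdict(int)
    for comp in components:
        delta = COPY_DELTA[comp["type"]]
        for gene in comp["genes"]:
            net[gene] += delta
    return dict(net)
```

For example, a deletion spanning `GENE_A` and `GENE_B` combined with a duplication spanning `GENE_B` and `GENE_C` yields a net change of -1, 0, and +1 respectively. Genes with a net change of zero may still be disrupted at a breakpoint, so this sum complements the independent breakpoint assessment and does not replace it.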
Scenario 4: Small In-Frame Deletion/Duplication
Special considerations:
- May not cause haploinsufficiency
- Check if critical domain affected
- Look for similar variants in ClinVar
- Consider protein structural impact
- May need functional studies
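Two of these checks are mechanical: whether the change preserves the reading frame, and whether it overlaps a critical domain. A minimal sketch follows, assuming the SV and the domains are expressed in the same coding-sequence coordinates with closed intervals; the function names and domain labels are hypothetical.

```python
def is_in_frame(coding_bases_changed):
    """True when the number of coding bases removed or added is a multiple of three."""
    if coding_bases_changed <= 0:
        raise ValueError("coding_bases_changed must be positive")
    return coding_bases_changed % 3 == 0


def domains_hit(sv_range, critical_domains):
    """Return the sorted names of domains whose closed interval overlaps the SV."""
    sv_start, sv_end = sv_range
    return sorted(
        name
        for name, (dom_start, dom_end) in critical_domains.items()
        if sv_start <= dom_end and dom_start <= sv_end
    )


def triage_small_sv(coding_bases_changed, sv_range, critical_domains):
    """Combine both checks into one summary dict for the report."""
    return {
        "in_frame": is_in_frame(coding_bases_changed),
        "domains_hit": domains_hit(sv_range, critical_domains),
    }
```

An in-frame result with no domains hit does not establish that the variant is benign; it only indicates that haploinsufficiency should not be assumed and that the remaining considerations in the list apply.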
Quantified Minimums
| Section | Requirement |
|---|---|
| Gene content | All genes in SV region annotated |
| Dosage sensitivity | ClinGen scores for all genes (if available) |
| Population frequency | Check gnomAD SV + ClinVar + DGV |
| Literature search | ≥2 search strategies (PubMed + DECIPHER) |
| ACMG codes | All applicable codes listed |
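These minimums lend themselves to an automated completeness check before a report is finalized. The sketch below assumes a hypothetical `record` dict that tracks what the analysis has done; none of the key names come from the skill itself.

```python
REQUIRED_FREQUENCY_SOURCES = {"gnomAD SV", "ClinVar", "DGV"}
MIN_LITERATURE_STRATEGIES = 2


def check_minimums(record):
    """Return a list of unmet requirements; an empty list means all minimums are met.

    `record` is a hypothetical dict with these keys:
      genes_in_region       - gene symbols inside the SV
      annotated_genes       - gene symbols that were annotated
      dosage_lookups        - gene symbol -> ClinGen score, or None if unavailable
      frequency_sources     - databases actually queried
      literature_strategies - search strategies used
      acmg_codes            - evidence codes listed in the report
    """
    problems = []
    genes = set(record.get("genes_in_region", []))

    unannotated = sorted(genes - set(record.get("annotated_genes", [])))
    if unannotated:
        problems.append("Gene content: not annotated: " + ", ".join(unannotated))

    # A gene with a None score was looked up but has no ClinGen score; that is allowed.
    not_looked_up = sorted(genes - set(record.get("dosage_lookups", {})))
    if not_looked_up:
        problems.append("Dosage sensitivity: no ClinGen lookup for: " + ", ".join(not_looked_up))

    unchecked = sorted(REQUIRED_FREQUENCY_SOURCES - set(record.get("frequency_sources", [])))
    if unchecked:
        problems.append("Population frequency: not checked: " + ", ".join(unchecked))

    if len(set(record.get("literature_strategies", []))) < MIN_LITERATURE_STRATEGIES:
        problems.append("Literature search: fewer than 2 search strategies")

    if "acmg_codes" not in record:
        problems.append("ACMG codes: evidence code list missing")

    return problems
```

The check can confirm that a code list exists, but whether every applicable code was considered still needs human review.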
Tools Reference
Core Tools for SV Analysis
| Tool | Purpose | Required? |
|---|---|---|
| | HI/TS scores | Required |
| | Gene-disease validity | Required |
| | Known pathogenic/benign SVs | Required |
| | Patient cases, phenotypes | Highly recommended |
| | Gene coordinates, structure | Required |
| | Gene-disease associations | Required |
| | Additional disease associations | Recommended |
| | Literature evidence | Recommended |
| | Gene function | Supporting |
Report File Naming
SV_analysis_[TYPE]_chr[CHR]_[START]_[END]_[GENES].md

Examples:
- SV_analysis_DEL_chr17_44039927_44352659_KANSL1_MAPT.md
- SV_analysis_DUP_chr22_17400000_17800000_TBX1.md
- SV_analysis_INV_chr11_2100000_2400000_complex.md
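A small helper can build names that follow this pattern consistently. This is a sketch only: the function name `report_filename` and its argument handling are assumptions, and it accepts either a list of gene symbols or a single label such as `complex`.

```python
def report_filename(sv_type, chrom, start, end, gene_label):
    """Build a report file name following the SV_analysis naming pattern.

    `chrom` may be given with or without the 'chr' prefix. `gene_label` is
    either one string or a list of gene symbols joined with underscores.
    """
    if start >= end:
        raise ValueError("start must be smaller than end")
    chrom = str(chrom)
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    if isinstance(gene_label, str):
        label = gene_label
    else:
        label = "_".join(gene_label)
    return f"SV_analysis_{sv_type}_chr{chrom}_{start}_{end}_{label}.md"


print(report_filename("DEL", "chr17", 44039927, 44352659, ["KANSL1", "MAPT"]))
# SV_analysis_DEL_chr17_44039927_44352659_KANSL1_MAPT.md
```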
Clinical Recommendations Framework
For Pathogenic/Likely Pathogenic SVs
| SV Type | Recommendations |
|---|---|
| Deletion (HI gene) | Genetic counseling, cascade testing, phenotype-specific surveillance |
| Duplication (TS gene) | Same as deletion; check for dosage-specific syndrome |
| Translocation (disruption) | Assess both breakpoints, consider reproductive counseling |
| Complex | Multidisciplinary evaluation, research enrollment |
For VUS
| Action | Details |
|---|---|
| Clinical management | Base on phenotype, not genotype |
| Follow-up | Reinterpret in 1-2 years or when phenotype evolves |
| Research | Functional studies if research-grade samples available |
| Family studies | Segregation analysis can reclassify |
For Benign/Likely Benign
| Action | Details |
|---|---|
| Clinical | Not expected to cause rare disease |
| Family | No cascade testing needed (unless recurrent/reproductive risk) |
| Reproductive | Balanced translocation carriers may have offspring risk |
When NOT to Use This Skill
- Single nucleotide variants (SNVs) → Use the tooluniverse-variant-interpretation skill
- Small indels (<50 bp) → Use the variant interpretation skill
- Somatic variants in cancer → Different framework needed
- Mitochondrial variants → Specialized interpretation required
- Repeat expansions → Different mechanism
Use this skill for structural variants ≥50 bp requiring dosage sensitivity assessment and ACMG-adapted classification.
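The scope rules above amount to a routing decision made before any analysis starts. A minimal sketch, assuming hypothetical `kind` labels and flags; only the two skill names and the 50 bp threshold come from this document.

```python
SV_MIN_SIZE_BP = 50  # size threshold stated in the scope rule

SV_SKILL = "tooluniverse-structural-variant-analysis"
SMALL_VARIANT_SKILL = "tooluniverse-variant-interpretation"


def route_variant(kind, size_bp=0, somatic=False, mitochondrial=False):
    """Return the skill name to use, or None when neither skill applies.

    `kind` is a hypothetical label such as 'snv', 'deletion', 'duplication',
    or 'repeat_expansion'.
    """
    # Somatic, mitochondrial, and repeat-expansion variants need other frameworks.
    if somatic or mitochondrial or kind == "repeat_expansion":
        return None
    # SNVs and indels below the threshold go to the variant interpretation skill.
    if kind == "snv" or size_bp < SV_MIN_SIZE_BP:
        return SMALL_VARIANT_SKILL
    return SV_SKILL
```

Checking the exclusions first means a large somatic deletion is rejected even though its size alone would qualify it as a structural variant.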
See Also
- Sample SV interpretations: EXAMPLES.md
- Quick start guide: README.md
- For SNVs and small indels: tooluniverse-variant-interpretation
- ClinGen Dosage Sensitivity Map: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/
- ACMG SV Guidelines: Riggs et al., Genet Med 2020 (PMID: 31690835)