LLMs-Universal-Life-Science-and-Clinical-Skills- genomics-cnv-calling
Install
Source: clone the upstream repo
```bash
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
```
Claude Code: install into `~/.claude/skills/`
```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Genomics/genomics-cnv-calling" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-genomics-cnv-calli && rm -rf "$T"
```
Manifest: `Skills/Genomics/genomics-cnv-calling/SKILL.md`
📊 Copy Number Variant Calling
Detect copy number variants from targeted/exome/WGS sequencing data.
Core Capabilities
- CNVkit: Read-depth-based CNV detection for exome/targeted/WGS data
- GATK gCNV: GATK's germline CNV discovery tool
- Control-FREEC: Copy number and LOH caller that can run with or without a matched control
CLI Reference
```bash
python omicsclaw.py run cnv-calling --demo
python omicsclaw.py run cnv-calling --input <tumor.bam> --output <dir>
```
Algorithm / Methodology
CNVkit Basic Workflow
Goal: Run the complete CNVkit pipeline on a tumor-normal pair.
```bash
# Complete pipeline for a tumor-normal pair
cnvkit.py batch tumor.bam \
    --normal normal.bam \
    --targets targets.bed \
    --fasta reference.fa \
    --output-reference my_reference.cnn \
    --output-dir results/
```
Build Reference from Panel of Normals
```bash
# Step 1: Build a reference from multiple normals (recommended)
cnvkit.py batch \
    --normal normal1.bam normal2.bam normal3.bam \
    --targets targets.bed \
    --fasta reference.fa \
    --output-reference pooled_reference.cnn

# Step 2: Run on tumor samples using the pre-built reference
cnvkit.py batch tumor1.bam tumor2.bam \
    --reference pooled_reference.cnn \
    --output-dir results/
```
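Conceptually, a pooled reference is an estimate of each bin's expected coverage taken across the normals. The sketch below illustrates that idea under simplifying assumptions: each normal's bin depths are divided by that sample's median, and the normals are combined with a plain per-bin median. CNVkit's `reference` command uses its own robust estimators and bias corrections, so this is not its algorithm; `pooled_reference` and `floor` are hypothetical names.

```python
import math
from statistics import median


def pooled_reference(normals, floor=1e-3):
    """Per-bin reference log2 depth from several normal samples (hypothetical helper).

    normals: list of equal-length lists of raw bin depths, one list per normal.
    Each sample is divided by its own median depth, then the per-bin median
    across samples is taken and returned in log2 space.
    """
    n_bins = len(normals[0])
    if any(len(sample) != n_bins for sample in normals):
        raise ValueError("all normals must cover the same bins")
    scaled = []
    for sample in normals:
        center = median(sample)
        # Relative depth: the sample's median bin becomes 1.0
        scaled.append([depth / center for depth in sample])
    reference = []
    for i in range(n_bins):
        per_bin = median(sample[i] for sample in scaled)
        # Floor avoids log2(0) for bins with no coverage in most normals
        reference.append(math.log2(max(per_bin, floor)))
    return reference


# Toy depths for three normals over three bins (made-up numbers)
ref = pooled_reference([[10, 20, 30], [20, 40, 60], [15, 30, 45]])
print(ref)  # bins at half, equal to, and 1.5x the median depth
```

Median-normalizing each sample first keeps a deeply sequenced normal from dominating the pooled value.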
Flat Reference (No Matched Normal)
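```bash
# --normal with no BAM files tells CNVkit to build a flat (neutral) reference
cnvkit.py batch tumor.bam \
    --normal \
    --targets targets.bed \
    --fasta reference.fa \
    --output-reference flat_reference.cnn \
    --output-dir results/
```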
WGS Mode
```bash
cnvkit.py batch tumor.bam \
    --normal normal.bam \
    --fasta reference.fa \
    --method wgs \
    --output-dir results/
```
Step-by-Step Pipeline
```bash
# 1. Generate target and antitarget regions
cnvkit.py target targets.bed --annotate refFlat.txt -o targets.target.bed
cnvkit.py antitarget targets.bed -o targets.antitarget.bed

# 2. Calculate coverage
cnvkit.py coverage tumor.bam targets.target.bed -o tumor.targetcoverage.cnn
cnvkit.py coverage tumor.bam targets.antitarget.bed -o tumor.antitargetcoverage.cnn
cnvkit.py coverage normal.bam targets.target.bed -o normal.targetcoverage.cnn
cnvkit.py coverage normal.bam targets.antitarget.bed -o normal.antitargetcoverage.cnn

# 3. Build reference
cnvkit.py reference normal.targetcoverage.cnn normal.antitargetcoverage.cnn \
    --fasta reference.fa -o reference.cnn

# 4. Fix, segment, and call
cnvkit.py fix tumor.targetcoverage.cnn tumor.antitargetcoverage.cnn reference.cnn -o tumor.cnr
cnvkit.py segment tumor.cnr -o tumor.cns
cnvkit.py call tumor.cns -o tumor.call.cns
```
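The `fix` step turns coverage into per-bin log2 copy ratios against the reference. A minimal sketch of that arithmetic, assuming a reference already expressed as log2 relative depth and skipping the GC, edge, and RepeatMasker corrections CNVkit applies: divide by the sample median, take log2, subtract the reference, then re-center so the median bin sits at 0. `log2_copy_ratios` is a hypothetical helper.

```python
import math
from statistics import median


def log2_copy_ratios(sample_depths, reference_log2, floor=1e-3):
    """Median-centered log2(sample / reference) per bin (hypothetical helper).

    sample_depths: raw read depths per bin for the test sample.
    reference_log2: expected log2 relative depth per bin, e.g. from a pooled reference.
    No GC, edge, or RepeatMasker correction is applied here.
    """
    if len(sample_depths) != len(reference_log2):
        raise ValueError("sample and reference must cover the same bins")
    center = median(sample_depths)
    ratios = []
    for depth, ref in zip(sample_depths, reference_log2):
        # Scale by the sample median, floor to avoid log2(0), subtract the reference
        relative = max(depth / center, floor)
        ratios.append(math.log2(relative) - ref)
    # Assume most bins are copy-neutral: shift so the median ratio is 0
    shift = median(ratios)
    return [r - shift for r in ratios]


# Toy example: flat reference, one bin doubled, one bin halved
print(log2_copy_ratios([20, 20, 40, 10, 20], [0.0] * 5))  # [0.0, 0.0, 1.0, -1.0, 0.0]
```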
Segmentation Options
```bash
# Default CBS (Circular Binary Segmentation)
cnvkit.py segment sample.cnr -o sample.cns

# HMM for tumor samples (broader state transitions)
cnvkit.py segment sample.cnr --method hmm-tumor -o sample.cns

# HMM for germline (tighter priors around diploid)
cnvkit.py segment sample.cnr --method hmm-germline -o sample.cns
```
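CBS searches the log2 ratios recursively for change points. The sketch below is a deliberately simplified binary segmentation, not CBS: it tries one split at a time, scores it with a size-weighted difference of means, and accepts it only above a fixed `min_shift`, whereas CBS also considers circular arcs and assesses significance by permutation. `binary_segment`, `min_shift`, and `min_size` are hypothetical.

```python
from statistics import mean


def binary_segment(values, min_shift=0.3, min_size=3):
    """Toy recursive binary segmentation of log2 ratios (not CBS).

    Returns (start, end, mean_log2) tuples with `end` exclusive. A split is kept
    only if both sides have at least `min_size` bins and their means differ by
    at least `min_shift`.
    """
    if not values:
        return []

    def split(lo, hi):
        best = None  # (score, split_index, mean_difference)
        for k in range(lo + min_size, hi - min_size + 1):
            diff = abs(mean(values[lo:k]) - mean(values[k:hi]))
            # Scale by segment sizes, in the spirit of a two-sample statistic
            score = diff * ((k - lo) * (hi - k) / (hi - lo)) ** 0.5
            if best is None or score > best[0]:
                best = (score, k, diff)
        if best is None or best[2] < min_shift:
            return [(lo, hi, mean(values[lo:hi]))]
        k = best[1]
        return split(lo, k) + split(k, hi)

    return split(0, len(values))


# Toy log2 ratios: neutral, a one-unit gain, neutral again
toy = [0.0] * 5 + [1.0] * 5 + [0.0] * 5
print(binary_segment(toy))  # [(0, 5, 0.0), (5, 10, 1.0), (10, 15, 0.0)]
```

Lowering `min_shift` makes the toy segmenter more sensitive but also more likely to split on noise, the same trade-off CBS controls through its significance threshold.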
CNV Calling with Ploidy/Purity
```bash
cnvkit.py call sample.cns --purity 0.7 --ploidy 2 -o sample.call.cns

# With B-allele frequencies (from VCF)
cnvkit.py call sample.cns --vcf sample.vcf --purity 0.7 -o sample.call.cns
```
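Purity matters because normal-cell admixture pulls log2 ratios toward 0. Assuming a simple two-component mix (a fraction `p` of tumor cells with `n` copies, the rest normal at ploidy `P`, measured against a reference of ploidy `P`), the observed ratio satisfies `2**log2 = (p*n + (1 - p)*P) / P`, which rearranges to `n = (P * 2**log2 - P*(1 - p)) / p`. The sketch below implements only that formula; `absolute_copies` is hypothetical, and CNVkit's `call` may additionally round states, adjust for sex chromosomes, and use B-allele frequencies.

```python
import math


def absolute_copies(log2_ratio, purity=1.0, ploidy=2):
    """Tumor copy number implied by a log2 ratio under a simple admixture model.

    Assumes a fraction `purity` of tumor cells and (1 - purity) normal cells,
    with the normal cells and the reference both at `ploidy` copies.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    observed = ploidy * 2 ** log2_ratio      # average copies across the cell mixture
    from_normal = ploidy * (1 - purity)      # copies contributed by normal cells
    return (observed - from_normal) / purity


# A one-copy loss in a 70%-pure tumor: 0.7*1 + 0.3*2 = 1.3 copies on average
observed_log2 = math.log2(1.3 / 2)           # about -0.62 rather than -1
print(round(absolute_copies(observed_log2, purity=0.7), 6))  # 1.0
```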
Visualization
```bash
# Scatter plot with segments
cnvkit.py scatter sample.cnr -s sample.cns -o sample_scatter.png

# Single chromosome
cnvkit.py scatter sample.cnr -s sample.cns -c chr17 -o sample_chr17.png

# Diagram (ideogram style)
cnvkit.py diagram sample.cnr -s sample.cns -o sample_diagram.pdf

# Heatmap across samples
cnvkit.py heatmap *.cns -o heatmap.pdf
```
Export Results
```bash
cnvkit.py export bed sample.call.cns -o sample.cnv.bed
cnvkit.py export vcf sample.call.cns -o sample.cnv.vcf
cnvkit.py export seg *.cns -o samples.seg  # For GISTIC2
```
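The SEG format consumed by GISTIC2 is a tab-separated table with one row per segment. The sketch below shows the shape of that conversion, assuming segment rows with `chromosome`, `start`, `end`, `probes`, and `log2` fields and the common SEG header `ID chrom loc.start loc.end num.mark seg.mean`. It also assumes 0-based input starts written out 1-based; the exact columns and coordinate handling of `export seg` may differ, and `cns_rows_to_seg` is hypothetical.

```python
import csv
import io

# Column names commonly used for SEG files (an assumption; check `export seg` output)
SEG_HEADER = ["ID", "chrom", "loc.start", "loc.end", "num.mark", "seg.mean"]


def cns_rows_to_seg(sample_id, rows):
    """Render segment dicts as tab-separated SEG text (hypothetical helper).

    rows: iterable of dicts with chromosome, start, end, probes and log2 keys.
    Starts are assumed 0-based (BED-style) and written 1-based.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(SEG_HEADER)
    for row in rows:
        writer.writerow([
            sample_id,
            row["chromosome"],
            int(row["start"]) + 1,
            int(row["end"]),
            int(row["probes"]),
            f"{float(row['log2']):.4f}",
        ])
    return out.getvalue()


# Toy segment (made-up values)
print(cns_rows_to_seg("S1", [
    {"chromosome": "chr1", "start": 0, "end": 1000, "probes": 12, "log2": -0.5},
]), end="")
```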
Python API
```python
import cnvlib

# Load bin-level ratios (.cnr) and segments (.cns)
cnr = cnvlib.read('sample.cnr')
cns = cnvlib.read('sample.cns')

# Filter bins by chromosome
chr17 = cnr[cnr.chromosome == 'chr17']

# log2 > 0.5 (roughly 3+ copies in a pure diploid sample): moderate amplification
amps = cns[cns['log2'] > 0.5]
# log2 < -0.5 (a pure one-copy loss is log2 = -1; impurity shrinks it): moderate deletion
dels = cns[cns['log2'] < -0.5]
```
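If `cnvlib` is not importable (for example in a lightweight reporting environment), the same filters can be applied to the tab-separated `.cns` text directly. A standard-library sketch, assuming the file has a header row that includes `chromosome`, `start`, `end`, `gene`, and `log2` columns; `load_segments` and `split_events` are hypothetical names, and the ±0.5 defaults simply mirror the cnvlib example.

```python
import csv
import io


def load_segments(handle):
    """Parse a CNVkit-style .cns table (tab-separated, with header) from a text handle."""
    segments = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        row["log2"] = float(row["log2"])
        segments.append(row)
    return segments


def split_events(segments, amp_cutoff=0.5, del_cutoff=-0.5):
    """Return (amplified, deleted) segments using log2 cut-offs."""
    amps = [s for s in segments if s["log2"] > amp_cutoff]
    dels = [s for s in segments if s["log2"] < del_cutoff]
    return amps, dels


# Toy table standing in for open("sample.cns"); gene names are placeholders
toy_cns = io.StringIO(
    "chromosome\tstart\tend\tgene\tlog2\n"
    "chr17\t0\t1000\tgeneA\t1.2\n"
    "chr9\t0\t2000\tgeneB\t-1.4\n"
    "chr1\t0\t5000\t-\t0.02\n"
)
amps, dels = split_events(load_segments(toy_cns))
print([s["gene"] for s in amps], [s["gene"] for s in dels])  # ['geneA'] ['geneB']
```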
Quality Control
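```bash
# Noise metrics of bin ratios relative to their segments
cnvkit.py metrics *.cnr -s *.cns

# Infer chromosomal sex from X/Y coverage
cnvkit.py sex *.cnr *.cnn

# Per-segment statistics with confidence (--ci) and prediction (--pi) intervals
cnvkit.py segmetrics sample.cnr -s sample.cns --ci --pi -o sample.segmetrics.cns

# Gene-level gains/losses beyond a log2 change of 0.2
cnvkit.py genemetrics sample.cnr -s sample.cns --threshold 0.2 --ci -o sample.genemetrics.tsv
```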
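Several of these QC numbers summarize how far bin-level log2 ratios scatter around their segment means; lower spread means cleaner data. A sketch of that idea, assuming bins and segments are plain tuples and that a bin belongs to the segment whose `[start, end)` interval contains its start: it reports the standard deviation, unscaled median absolute deviation (MAD), and interquartile range of the residuals. CNVkit's `metrics` reports similar statistics, but its exact estimators may differ; `residual_spread` is hypothetical.

```python
from statistics import median, pstdev, quantiles


def residual_spread(bins, segments):
    """Noise summary of bin log2 values around their segment means (hypothetical helper).

    bins: (chrom, start, log2) tuples.
    segments: (chrom, start, end, seg_log2) tuples with `end` exclusive.
    Bins that fall outside every segment are skipped.
    """
    residuals = []
    for chrom, pos, value in bins:
        for s_chrom, s_start, s_end, s_value in segments:
            if chrom == s_chrom and s_start <= pos < s_end:
                residuals.append(value - s_value)
                break
    if len(residuals) < 2:
        raise ValueError("need at least two bins inside segments")
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)  # unscaled MAD
    q1, _, q3 = quantiles(residuals, n=4)
    return {"stdev": pstdev(residuals), "mad": mad, "iqr": q3 - q1}


# Toy data: four bins alternating +/-0.1 around a neutral segment
toy_bins = [("chr1", 0, 0.1), ("chr1", 100, -0.1), ("chr1", 200, 0.1), ("chr1", 300, -0.1)]
toy_segments = [("chr1", 0, 400, 0.0)]
print(residual_spread(toy_bins, toy_segments))
```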
Key Output Files
| Extension | Description |
|---|---|
| `.cnn` | Reference or coverage file |
| `.cnr` | Copy ratios (log2) per bin |
| `.cns` | Segmented copy ratios |
| `.call.cns` | Called copy number states |
Key Parameters
| Parameter | Default | Description |
|---|---|---|
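| `--method` (`batch`) | `hybrid` | hybrid, wgs, amplicon |
| `--method` (`segment`) | `cbs` | cbs, hmm, hmm-tumor, hmm-germline |
| `--purity` (`call`) | none | Tumor purity (0-1) |
| `--ploidy` (`call`) | 2 | Sample ploidy |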
Why This Exists
- Without it: Raw read depth is noisy and biased (e.g., by GC content), which produces false-positive CNV calls
- With it: Normalization against a reference followed by segmentation reveals true CNV events
- Why OmicsClaw: Wraps complex tools such as CNVkit into reproducible one-shot commands
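GC bias means that bins with unusually high or low GC content collect systematically more or fewer reads regardless of copy number. One simple way to remove it, shown here as a sketch, is to subtract from each bin the median log2 ratio of bins in the same GC-content stratum. CNVkit's `fix` performs its own GC correction, which this does not reproduce; `gc_correct` and the equal-width `n_strata` scheme are assumptions for illustration.

```python
from statistics import median


def gc_correct(log2_values, gc_fractions, n_strata=10):
    """Median-center log2 ratios within equal-width GC-content strata (hypothetical helper).

    log2_values: per-bin log2 ratios; gc_fractions: per-bin GC content in [0, 1].
    Caveat: if a stratum is dominated by a real CNV, this also removes signal.
    """
    if len(log2_values) != len(gc_fractions):
        raise ValueError("inputs must have the same length")
    strata = {}
    for i, gc in enumerate(gc_fractions):
        key = min(int(gc * n_strata), n_strata - 1)  # GC = 1.0 goes in the top stratum
        strata.setdefault(key, []).append(i)
    corrected = list(log2_values)
    for indices in strata.values():
        shift = median(log2_values[i] for i in indices)
        for i in indices:
            corrected[i] = log2_values[i] - shift
    return corrected


# Toy data: GC-rich bins read 0.3 higher purely because of GC bias
print(gc_correct([0.3, 0.3, 0.0, 0.0], [0.65, 0.62, 0.41, 0.44]))  # [0.0, 0.0, 0.0, 0.0]
```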
Workflow
- Calculate: Compute local read depth across target (and, where used, off-target) bins.
- Execute: Compute log2 copy ratios over target intervals relative to the reference.
- Assess: Segment the ratios with an algorithm such as CBS.
- Generate: Output segments with their breakpoints and copy number states.
- Report: Tabulate key amplification and deletion events.
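The Report step can be approximated as mapping each segment's log2 ratio to an integer copy-number state and counting gains and losses. A sketch under stated assumptions: the cut-offs in `THRESHOLDS` are hypothetical placeholders for a diploid sample (CNVkit's `call -m threshold` has its own defaults and accepts custom values via `-t`), and `call_state` and `tabulate_events` are hypothetical helpers.

```python
from collections import Counter

# Hypothetical cut-offs for a diploid sample: log2 below THRESHOLDS[k] -> k copies
THRESHOLDS = (-1.1, -0.4, 0.3, 0.7)


def call_state(log2_ratio, thresholds=THRESHOLDS):
    """Map a segment log2 ratio to an integer copy number (hypothetical helper)."""
    for copies, cutoff in enumerate(thresholds):
        if log2_ratio < cutoff:
            return copies
    # Above the last cut-off: use the pure-diploid conversion, at least len(thresholds)
    return max(len(thresholds), round(2 * 2 ** log2_ratio))


def tabulate_events(segments, ploidy=2):
    """Count gain and loss segments per chromosome from (chrom, log2) pairs."""
    counts = Counter()
    for chrom, log2_ratio in segments:
        state = call_state(log2_ratio)
        if state > ploidy:
            counts[(chrom, "gain")] += 1
        elif state < ploidy:
            counts[(chrom, "loss")] += 1
    return counts


# Toy segments (made-up values)
toy = [("chr17", 1.2), ("chr9", -1.5), ("chr9", -0.6), ("chr1", 0.05)]
print(dict(tabulate_events(toy)))  # {('chr17', 'gain'): 1, ('chr9', 'loss'): 2}
```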
Example Queries
- "Call copy number variants on this bam using CNVkit"
- "Detect amplifications in the tumor matched normal pair"
Output Structure
```
output_directory/
├── report.md
├── result.json
├── segments.cns
├── figures/
│   └── scatter_diagram.png
├── tables/
│   └── cnv_calls.csv
└── reproducibility/
    ├── commands.sh
    ├── environment.yml
    └── checksums.sha256
```
Safety
- Local-first: All processing runs locally; no data is uploaded externally.
- Disclaimer: Reports must follow the OmicsClaw report structure and carry its disclaimers.
- Audit trail: All parameters and executed steps are logged.
Integration with Orchestrator
Trigger conditions:
- Invoked automatically when the user's intent matches this skill's tool metadata.
Chaining partners:
- `align`: Upstream BAM processing
- `annotation`: Downstream gene mapping of duplications
Version Compatibility
Reference examples tested with: CNVkit 0.9+, GATK 4.5+
Dependencies
Required: CNVkit (`cnvkit.py`)
Optional: GATK (for gCNV), Control-FREEC
Citations
- CNVkit — Talevich et al., PLoS Computational Biology 2016
- GATK gCNV — Broad Institute
- Control-FREEC — Boeva et al., Bioinformatics 2012
Related Skills
- `variant-call`: SNV/indel calling in the same samples
- `sv-detect`: Structural variant detection
- `variant-annotate`: Annotate CNV regions