Claude-skill-registry bio-copy-number-gatk-cnv

Call copy number variants using GATK best practices workflow. Supports both somatic (tumor-normal) and germline CNV detection from WGS or WES data. Use when following GATK best practices or integrating CNV calling with other GATK variant pipelines.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/gatk-cnv" ~/.claude/skills/majiayu000-claude-skill-registry-bio-copy-number-gatk-cnv && rm -rf "$T"
manifest: skills/data/gatk-cnv/SKILL.md
source content

GATK CNV Workflow

Somatic CNV Workflow Overview

1. PreprocessIntervals → intervals.interval_list
2. CollectReadCounts → sample.counts.hdf5
3. CreateReadCountPanelOfNormals → pon.hdf5
4. DenoiseReadCounts → sample.denoised.tsv
5. CollectAllelicCounts → sample.allelicCounts.tsv
6. ModelSegments → sample.modelFinal.seg
7. CallCopyRatioSegments → sample.called.seg

Step 1: Preprocess Intervals

# For WES/targeted
gatk PreprocessIntervals \
    -R reference.fa \
    -L targets.interval_list \
    --bin-length 0 \
    --interval-merging-rule OVERLAPPING_ONLY \
    -O preprocessed.interval_list

# For WGS
gatk PreprocessIntervals \
    -R reference.fa \
    --bin-length 1000 \
    --padding 0 \
    -O wgs.interval_list

Step 2: Collect Read Counts

# For each sample
gatk CollectReadCounts \
    -R reference.fa \
    -I sample.bam \
    -L preprocessed.interval_list \
    --interval-merging-rule OVERLAPPING_ONLY \
    -O sample.counts.hdf5

Step 3: Create Panel of Normals

# Combine multiple normal samples
gatk CreateReadCountPanelOfNormals \
    -I normal1.counts.hdf5 \
    -I normal2.counts.hdf5 \
    -I normal3.counts.hdf5 \
    --minimum-interval-median-percentile 5.0 \
    -O cnv_pon.hdf5

Step 4: Denoise Read Counts

# Using panel of normals
gatk DenoiseReadCounts \
    -I tumor.counts.hdf5 \
    --count-panel-of-normals cnv_pon.hdf5 \
    --standardized-copy-ratios tumor.standardized.tsv \
    --denoised-copy-ratios tumor.denoised.tsv

Step 5: Collect Allelic Counts

# From known SNP sites (for LOH detection)
gatk CollectAllelicCounts \
    -R reference.fa \
    -I tumor.bam \
    -L common_snps.vcf \
    -O tumor.allelicCounts.tsv

Step 6: Model Segments

# Somatic with matched normal allelic counts
gatk ModelSegments \
    --denoised-copy-ratios tumor.denoised.tsv \
    --allelic-counts tumor.allelicCounts.tsv \
    --normal-allelic-counts normal.allelicCounts.tsv \
    --output-prefix tumor \
    -O results/

# Output files: tumor.cr.seg, tumor.modelFinal.seg, tumor.hets.tsv

Step 7: Call Copy Ratio Segments

gatk CallCopyRatioSegments \
    -I results/tumor.cr.seg \
    -O results/tumor.called.seg

Plotting

# Plot copy ratios and segments
gatk PlotDenoisedCopyRatios \
    --standardized-copy-ratios tumor.standardized.tsv \
    --denoised-copy-ratios tumor.denoised.tsv \
    --sequence-dictionary reference.dict \
    --minimum-contig-length 46709983 \
    --output-prefix tumor \
    -O plots/

# Plot segments with allelic information
gatk PlotModeledSegments \
    --denoised-copy-ratios tumor.denoised.tsv \
    --allelic-counts results/tumor.hets.tsv \
    --segments results/tumor.modelFinal.seg \
    --sequence-dictionary reference.dict \
    --minimum-contig-length 46709983 \
    --output-prefix tumor \
    -O plots/

Germline CNV Workflow

# For germline: use cohort mode
# 1. Collect counts (same as above)

# 2. Determine contig ploidy
gatk DetermineGermlineContigPloidy \
    -I sample1.counts.hdf5 \
    -I sample2.counts.hdf5 \
    --model cohort_ploidy_model \
    --contig-ploidy-priors ploidy_priors.tsv \
    -O ploidy-calls/

# 3. Call germline CNVs
gatk GermlineCNVCaller \
    --run-mode COHORT \
    -I sample1.counts.hdf5 \
    -I sample2.counts.hdf5 \
    --contig-ploidy-calls ploidy-calls/ploidy_calls \
    --annotated-intervals annotated_intervals.tsv \
    --output-prefix cohort \
    -O germline_cnv_calls/

# 4. Post-process calls per sample
gatk PostprocessGermlineCNVCalls \
    --calls-shard-path germline_cnv_calls/cohort-calls \
    --model-shard-path germline_cnv_calls/cohort-model \
    --sample-index 0 \
    --contig-ploidy-calls ploidy-calls/ploidy_calls \
    --sequence-dictionary reference.dict \
    --output-genotyped-intervals sample1.genotyped.tsv \
    --output-denoised-copy-ratios sample1.denoised.tsv \
    -O sample1_segments.vcf

Complete Somatic Pipeline Script

#!/bin/bash
REFERENCE=reference.fa
INTERVALS=targets.interval_list
PON=cnv_pon.hdf5
SNP_SITES=common_snps.vcf
TUMOR=$1
NORMAL=$2
OUTDIR=$3

mkdir -p $OUTDIR

# Collect read counts
gatk CollectReadCounts -R $REFERENCE -I $TUMOR -L $INTERVALS \
    -O $OUTDIR/tumor.counts.hdf5
gatk CollectReadCounts -R $REFERENCE -I $NORMAL -L $INTERVALS \
    -O $OUTDIR/normal.counts.hdf5

# Denoise
gatk DenoiseReadCounts -I $OUTDIR/tumor.counts.hdf5 \
    --count-panel-of-normals $PON \
    --standardized-copy-ratios $OUTDIR/tumor.standardized.tsv \
    --denoised-copy-ratios $OUTDIR/tumor.denoised.tsv

# Allelic counts
gatk CollectAllelicCounts -R $REFERENCE -I $TUMOR -L $SNP_SITES \
    -O $OUTDIR/tumor.allelicCounts.tsv
gatk CollectAllelicCounts -R $REFERENCE -I $NORMAL -L $SNP_SITES \
    -O $OUTDIR/normal.allelicCounts.tsv

# Model and call
gatk ModelSegments \
    --denoised-copy-ratios $OUTDIR/tumor.denoised.tsv \
    --allelic-counts $OUTDIR/tumor.allelicCounts.tsv \
    --normal-allelic-counts $OUTDIR/normal.allelicCounts.tsv \
    --output-prefix tumor -O $OUTDIR/

gatk CallCopyRatioSegments -I $OUTDIR/tumor.cr.seg -O $OUTDIR/tumor.called.seg

Key Output Files

FileDescription
.counts.hdf5Raw read counts per interval
.denoised.tsvDenoised log2 copy ratios
.modelFinal.segSegmented copy ratios with confidence
.called.segFinal called segments with CN state
.hets.tsvHeterozygous SNP allelic counts

Related Skills

  • copy-number/cnvkit-analysis - Alternative CNV caller
  • copy-number/cnv-visualization - Plotting results
  • alignment-files/bam-statistics - Input BAM QC
  • variant-calling/variant-calling - SNP calling for allelic counts