BioSkills bio-workflows-cnv-pipeline

End-to-end copy number variant detection workflow from BAM files. Covers CNVkit analysis for exome/targeted sequencing with visualization and annotation. Use when detecting copy number alterations from sequencing data.

install
source · Clone the upstream repo
git clone https://github.com/GPTomics/bioSkills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/GPTomics/bioSkills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/workflows/cnv-pipeline" ~/.claude/skills/gptomics-bioskills-bio-workflows-cnv-pipeline && rm -rf "$T"
manifest: workflows/cnv-pipeline/SKILL.md

Version Compatibility

Reference examples tested with: CNVkit 0.9+, GATK 4.5+

Before using code patterns, verify installed versions match. If versions differ:

  • CLI: run <tool> --version, then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
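The version check above can be scripted so the pipeline stops early when a tool is absent. This is a minimal sketch, assuming a POSIX shell; `require_tool` is a hypothetical helper name and is not part of CNVkit or GATK.

```shell
# Hypothetical helper: confirm a CLI tool is on PATH before running the pipeline.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Example use at the top of a pipeline script:
# require_tool cnvkit.py || exit 1
```

It only checks that the tool exists; confirming the version and flags still means reading the tool's own --version and --help output.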

CNV Pipeline

"Detect copy number variants from my sequencing data" → Orchestrate CNVkit coverage analysis, segmentation, calling, visualization, and annotation for exome or targeted sequencing panels.

Complete workflow for detecting copy number variants from exome or targeted sequencing data.

Workflow Overview

BAM files (tumor/normal or germline)
    |
    v
[1. Target Preparation] --> Create/access target BED
    |
    v
[2. Coverage Calculation] --> Read depth per target
    |
    v
[3. Reference Creation] --> Pool of normals
    |
    v
[4. CNV Calling] --------> Log2 ratios, segmentation
    |
    v
[5. Visualization] ------> Scatter plots, heatmaps
    |
    v
[6. Annotation] ---------> Gene-level CNVs
    |
    v
CNV calls with gene annotations
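As a quick orientation, the per-sample files that the stages above produce can be listed with a small function. This is a sketch only: `expected_outputs` is a hypothetical helper, and the coverage/ and cnv/ paths follow the naming used in the steps of this document rather than anything CNVkit enforces.

```shell
# Hypothetical helper: print the files each stage writes for one sample,
# following the coverage/ and cnv/ naming used in this document.
expected_outputs() {
    sample=$1
    printf '%s\n' \
        "coverage/${sample}.targetcoverage.cnn" \
        "coverage/${sample}.antitargetcoverage.cnn" \
        "cnv/${sample}.cnr" \
        "cnv/${sample}.cns" \
        "cnv/${sample}.call.cns"
}

expected_outputs tumor1
```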

Primary Path: CNVkit

Step 1: Prepare Target Regions

# If using exome capture kit BED
cnvkit.py target capture_targets.bed \
    --annotate refFlat.txt \
    --split \
    -o targets.bed

# Accessible (mappable) regions, used to place off-target (antitarget) bins
cnvkit.py access genome.fa \
    -o access.bed

cnvkit.py antitarget targets.bed \
    --access access.bed \
    -o antitargets.bed

Step 2: Calculate Coverage

# For each sample (create the output directory first)
mkdir -p coverage
for bam in *.bam; do
    sample=$(basename "$bam" .bam)

    # Target coverage
    cnvkit.py coverage "$bam" targets.bed \
        -o "coverage/${sample}.targetcoverage.cnn"

    # Antitarget coverage
    cnvkit.py coverage "$bam" antitargets.bed \
        -o "coverage/${sample}.antitargetcoverage.cnn"
done
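Later steps need both a target and an antitarget coverage file for every sample, so it can help to check for gaps before continuing. The sketch below assumes the file naming from the loop above; `missing_antitarget` is a hypothetical helper name.

```shell
# Hypothetical helper: list samples in a coverage directory that have a
# target .cnn file but no matching antitarget .cnn file.
missing_antitarget() {
    dir=$1
    for t in "$dir"/*.targetcoverage.cnn; do
        [ -e "$t" ] || continue          # glob matched nothing
        sample=$(basename "$t" .targetcoverage.cnn)
        [ -e "$dir/$sample.antitargetcoverage.cnn" ] || echo "$sample"
    done
}

# Prints nothing when every sample has both files.
missing_antitarget coverage
```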

Step 3: Create Reference (Pool of Normals)

# From normal samples
cnvkit.py reference \
    coverage/normal*.targetcoverage.cnn \
    coverage/normal*.antitargetcoverage.cnn \
    --fasta genome.fa \
    -o reference.cnn

# Or flat reference (no normals available)
cnvkit.py reference \
    --fasta genome.fa \
    --targets targets.bed \
    --antitargets antitargets.bed \
    -o flat_reference.cnn

Step 4: Call CNVs

mkdir -p cnv
for bam in tumor*.bam; do
    sample=$(basename "$bam" .bam)

    # Fix: correct coverage biases and normalize against the reference
    cnvkit.py fix \
        "coverage/${sample}.targetcoverage.cnn" \
        "coverage/${sample}.antitargetcoverage.cnn" \
        reference.cnn \
        -o "cnv/${sample}.cnr"

    # Segment the bin-level log2 ratios
    cnvkit.py segment "cnv/${sample}.cnr" \
        -o "cnv/${sample}.cns"

    # Call integer copy numbers
    cnvkit.py call "cnv/${sample}.cns" \
        -o "cnv/${sample}.call.cns"
done
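Once calls exist, a tally of segments per copy-number state gives a quick sanity check on each sample. This sketch assumes the .call.cns file is tab-separated with a header row that includes a `cn` column; `cn_summary` is a hypothetical helper, and the column is located by name rather than by position.

```shell
# Hypothetical helper: count segments per copy-number state in a .call.cns file.
# Assumes a tab-separated file with a header row that includes a "cn" column.
cn_summary() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "cn") c = i; next }
        c       { n[$c]++ }
        END     { for (k in n) print k "\t" n[k] }
    ' "$1" | sort -n
}

# Example: cn_summary cnv/tumor1.call.cns
```

Each output line is a copy-number state followed by the number of segments in that state. If the header has no `cn` column, nothing is printed.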

Step 5: Visualization

# Scatter plot for single sample (create the output directory first)
mkdir -p plots
cnvkit.py scatter cnv/tumor1.cnr \
    -s cnv/tumor1.cns \
    -o plots/tumor1_scatter.pdf

# Chromosome-specific
cnvkit.py scatter cnv/tumor1.cnr \
    -s cnv/tumor1.cns \
    -c chr17 \
    -o plots/tumor1_chr17.pdf

# Diagram (chromosome ideogram)
cnvkit.py diagram cnv/tumor1.cnr \
    -s cnv/tumor1.cns \
    -o plots/tumor1_diagram.pdf

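# Heatmap for multiple samples
# Note: cnv/*.cns also matches the *.call.cns files from Step 4, so each sample
# appears twice unless the two sets are kept in separate directories.
cnvkit.py heatmap cnv/*.cns \
    -o plots/cohort_heatmap.pdf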

Step 6: Export and Annotation

# Export to various formats
cnvkit.py export seg cnv/*.cns -o cnv/cohort.seg
cnvkit.py export vcf cnv/tumor1.call.cns -o cnv/tumor1.vcf

# Gene-level summary
cnvkit.py genemetrics cnv/tumor1.cnr \
    -s cnv/tumor1.cns \
    --threshold 0.2 \
    -o cnv/tumor1_genes.tsv

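# Filter for significant CNVs (finds the log2 column by header name; keeps the header row)
awk -F'\t' 'NR==1 {for (i=1; i<=NF; i++) if ($i=="log2") c=i; print; next} c && ($c < -0.4 || $c > 0.3)' \
    cnv/tumor1_genes.tsv > cnv/tumor1_significant_genes.tsv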
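For reporting, it is often handier to label each gene as a gain or a loss than to keep the raw rows. The sketch below assumes the genemetrics table is tab-separated with a header row naming `gene` and `log2` columns; `label_genes` is a hypothetical helper, and its default cutoffs simply reuse the -0.4 and 0.3 values from the filter above.

```shell
# Hypothetical helper: label genes as gain or loss from a genemetrics table.
# Assumes tab-separated input whose header row names "gene" and "log2" columns.
# Usage: label_genes FILE [LOSS_CUTOFF] [GAIN_CUTOFF]
label_genes() {
    awk -F'\t' -v lo="${2:--0.4}" -v hi="${3:-0.3}" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "gene") g = i
                if ($i == "log2") l = i
            }
            next
        }
        g && l && ($l + 0 < lo + 0) { print $g "\tloss" }
        g && l && ($l + 0 > hi + 0) { print $g "\tgain" }
    ' "$1"
}

# Example: label_genes cnv/tumor1_genes.tsv
```

Passing different cutoffs as the second and third arguments makes it easy to compare stricter and looser definitions on the same table.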

Batch Processing Script

#!/bin/bash
set -e

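TARGETS="targets.bed"
ANTITARGETS="antitargets.bed"
REFERENCE="reference.cnn"
OUTDIR="cnv_results"

mkdir -p ${OUTDIR}/{coverage,cnv,plots}

# Process all tumor samples
for bam in tumor*.bam; do
    sample=$(basename "$bam" .bam)
    echo "Processing ${sample}..."

    # Coverage (the fix step below needs both the target and antitarget files)
    cnvkit.py coverage "$bam" ${TARGETS} \
        -o ${OUTDIR}/coverage/${sample}.targetcoverage.cnn
    cnvkit.py coverage "$bam" ${ANTITARGETS} \
        -o ${OUTDIR}/coverage/${sample}.antitargetcoverage.cnn
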
    # Fix
    cnvkit.py fix \
        ${OUTDIR}/coverage/${sample}.targetcoverage.cnn \
        ${OUTDIR}/coverage/${sample}.antitargetcoverage.cnn \
        ${REFERENCE} \
        -o ${OUTDIR}/cnv/${sample}.cnr

    # Segment
    cnvkit.py segment ${OUTDIR}/cnv/${sample}.cnr \
        -o ${OUTDIR}/cnv/${sample}.cns

    # Call
    cnvkit.py call ${OUTDIR}/cnv/${sample}.cns \
        -o ${OUTDIR}/cnv/${sample}.call.cns

    # Plot
    cnvkit.py scatter ${OUTDIR}/cnv/${sample}.cnr \
        -s ${OUTDIR}/cnv/${sample}.cns \
        -o ${OUTDIR}/plots/${sample}.pdf
done

# Cohort heatmap
cnvkit.py heatmap ${OUTDIR}/cnv/*.cns -o ${OUTDIR}/plots/heatmap.pdf

Germline CNV Calling

# For germline analysis (no tumor-normal)
cnvkit.py batch sample*.bam \
    --normal normal*.bam \
    --targets targets.bed \
    --fasta genome.fa \
    --output-reference reference.cnn \
    --output-dir cnv_output \
    --scatter --diagram

# Or use a flat reference: pass --normal with no BAM files
cnvkit.py batch sample.bam \
    --normal \
    --method hybrid \
    --targets targets.bed \
    --fasta genome.fa \
    --output-dir cnv_output

Parameter Recommendations

Step         Parameter    Value
target       --split      Yes (for WES)
segment      --method     cbs (default)
call         --ploidy     2 (adjust if known)
call         --purity     Estimate if tumor
genemetrics  --threshold  0.2
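To see why purity and ploidy matter for calling, consider a simple mixture model in which the observed ratio is (purity × n + (1 − purity) × ploidy) / ploidy, with n the tumor copy number. The sketch below solves that model for n. It is an illustration under that stated assumption, not a description of what `cnvkit.py call` does internally, and `abs_copies` is a hypothetical helper.

```shell
# Hypothetical helper: estimate tumor copy number from a log2 ratio.
# Model (an assumption): ratio = (purity*n + (1-purity)*ploidy) / ploidy
# Usage: abs_copies LOG2 [PURITY] [PLOIDY]
abs_copies() {
    awk -v l="$1" -v p="${2:-1}" -v y="${3:-2}" 'BEGIN {
        ratio = exp(l * log(2))             # 2^log2
        n = (y * ratio - y * (1 - p)) / p   # solve the mixture model for n
        if (n < 0) n = 0                    # negative values are not meaningful
        printf "%.2f\n", n
    }'
}

abs_copies -1            # pure diploid sample, log2 = -1
abs_copies -0.415 0.5    # same one-copy loss seen at 50% purity
```

Both calls print 1.00: a single-copy loss gives log2 = -1 in a pure sample but only about -0.415 at 50% purity, which is why an unadjusted threshold can miss losses in impure tumors.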

Troubleshooting

Issue            Likely Cause                   Solution
Noisy signal     Low coverage                   Increase sequencing depth
No CNVs          Flat reference, normal sample  Check reference creation
Many small CNVs  Over-segmentation              Increase segment min size
Batch effects    Different capture kits         Match samples to correct reference
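A rough screen for the over-segmentation case is to count segments per sample and look for outliers. The sketch below assumes each .cns file has one header row followed by one segment per line; `segment_counts` is a hypothetical helper. What counts as too many depends on the assay and cohort, so no cutoff is suggested here.

```shell
# Hypothetical helper: print "<segment count><TAB><file>" for each .cns file,
# highest count first. Assumes one header row per file.
segment_counts() {
    for f in "$@"; do
        [ -e "$f" ] || continue
        n=$(awk 'NR > 1 && NF { c++ } END { print c + 0 }' "$f")
        printf '%s\t%s\n' "$n" "$f"
    done | sort -rn
}

# Example: segment_counts cnv/*.cns
```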

Complete Pipeline Script

#!/bin/bash
set -e

GENOME="genome.fa"
TARGETS="capture_targets.bed"
REFFLAT="refFlat.txt"
NORMAL_BAMS="normal*.bam"
TUMOR_BAMS="tumor*.bam"
OUTDIR="cnv_results"

mkdir -p ${OUTDIR}/{coverage,cnv,plots,annotation}

# Step 1: Prepare targets
cnvkit.py target ${TARGETS} --annotate ${REFFLAT} --split -o ${OUTDIR}/targets.bed
cnvkit.py access ${GENOME} -o ${OUTDIR}/access.bed
cnvkit.py antitarget ${OUTDIR}/targets.bed --access ${OUTDIR}/access.bed -o ${OUTDIR}/antitargets.bed

# Step 2: Coverage (normals)
for bam in ${NORMAL_BAMS}; do
    sample=$(basename $bam .bam)
    cnvkit.py coverage $bam ${OUTDIR}/targets.bed -o ${OUTDIR}/coverage/${sample}.targetcoverage.cnn
    cnvkit.py coverage $bam ${OUTDIR}/antitargets.bed -o ${OUTDIR}/coverage/${sample}.antitargetcoverage.cnn
done

# Step 3: Reference
cnvkit.py reference ${OUTDIR}/coverage/normal*.cnn --fasta ${GENOME} -o ${OUTDIR}/reference.cnn

# Step 4-5: Process tumors
for bam in ${TUMOR_BAMS}; do
    sample=$(basename $bam .bam)
    cnvkit.py coverage $bam ${OUTDIR}/targets.bed -o ${OUTDIR}/coverage/${sample}.targetcoverage.cnn
    cnvkit.py coverage $bam ${OUTDIR}/antitargets.bed -o ${OUTDIR}/coverage/${sample}.antitargetcoverage.cnn
    cnvkit.py fix ${OUTDIR}/coverage/${sample}.targetcoverage.cnn \
        ${OUTDIR}/coverage/${sample}.antitargetcoverage.cnn \
        ${OUTDIR}/reference.cnn -o ${OUTDIR}/cnv/${sample}.cnr
    cnvkit.py segment ${OUTDIR}/cnv/${sample}.cnr -o ${OUTDIR}/cnv/${sample}.cns
    cnvkit.py call ${OUTDIR}/cnv/${sample}.cns -o ${OUTDIR}/cnv/${sample}.call.cns
    cnvkit.py scatter ${OUTDIR}/cnv/${sample}.cnr -s ${OUTDIR}/cnv/${sample}.cns -o ${OUTDIR}/plots/${sample}.pdf
    cnvkit.py genemetrics ${OUTDIR}/cnv/${sample}.cnr -s ${OUTDIR}/cnv/${sample}.cns -o ${OUTDIR}/annotation/${sample}_genes.tsv
done

echo "Pipeline complete. Results in ${OUTDIR}/"

Related Skills

  • copy-number/cnvkit-analysis - CNVkit details
  • copy-number/cnv-visualization - Plotting options
  • copy-number/cnv-annotation - Gene annotations
  • copy-number/gatk-cnv - GATK alternative