Claude-skill-registry bio-variant-calling-deepvariant
Deep learning-based variant calling with Google DeepVariant. Provides high accuracy for germline SNPs and indels from Illumina, PacBio, and ONT data. Use when calling variants with the DeepVariant deep learning caller.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/deepvariant" ~/.claude/skills/majiayu000-claude-skill-registry-bio-variant-calling-deepvariant && rm -rf "$T"
manifest: skills/data/deepvariant/SKILL.md
DeepVariant Variant Calling
Installation
Docker (Recommended)
docker pull google/deepvariant:1.6.1

# Or with GPU support
docker pull google/deepvariant:1.6.1-gpu
Singularity
singularity pull docker://google/deepvariant:1.6.1
Basic Usage
One-Step Run (run_deepvariant)
docker run -v "${PWD}:/input" -v "${PWD}/output:/output" \
    google/deepvariant:1.6.1 \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=WGS \
    --ref=/input/reference.fa \
    --reads=/input/sample.bam \
    --output_vcf=/output/sample.vcf.gz \
    --output_gvcf=/output/sample.g.vcf.gz \
    --num_shards=16
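DeepVariant expects the reference FASTA and the BAM to be indexed, and a missing index only surfaces after the container has started. The sketch below checks for both up front. `check_inputs` is a hypothetical helper name, and the index file names are the usual samtools conventions rather than anything this guide specifies.

```shell
# Hypothetical helper: confirm the reference and BAM are present and indexed
# before launching the container. The body runs in a subshell, so its
# variables stay local and `exit` only ends the function.
check_inputs() (
    ref=$1
    bam=$2
    status=0
    for f in "$ref" "$ref.fai" "$bam"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    # samtools index writes sample.bam.bai; some tools write sample.bai instead.
    if [ ! -s "$bam.bai" ] && [ ! -s "${bam%.bam}.bai" ]; then
        echo "missing BAM index for: $bam (try: samtools index $bam)" >&2
        status=1
    fi
    exit "$status"
)
```

Typical use is `check_inputs reference.fa sample.bam && docker run ...`, so the run is skipped when anything is missing.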
Model Types
| Model | Data Type | Use Case |
|---|---|---|
| `WGS` | Illumina WGS | Whole genome sequencing |
| `WES` | Illumina WES | Whole exome/targeted |
| `PACBIO` | PacBio HiFi | Long-read HiFi |
| `ONT_R104` | ONT R10.4 | Oxford Nanopore |
| `HYBRID_PACBIO_ILLUMINA` | Mixed | Combined PacBio HiFi and Illumina reads |
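Because `--model_type` is a free-form string on the command line, a typo is only caught once the container is running. A minimal sketch of an up-front check follows. It assumes the 1.6.1 image accepts `WGS`, `WES`, `PACBIO`, `ONT_R104`, and `HYBRID_PACBIO_ILLUMINA`; the function name is hypothetical.

```shell
# Sketch: reject unknown --model_type values before starting a long run.
# The accepted list is an assumption for DeepVariant 1.6.1; adjust it for
# other releases.
validate_model_type() {
    case "$1" in
        WGS|WES|PACBIO|ONT_R104|HYBRID_PACBIO_ILLUMINA)
            printf '%s\n' "$1" ;;
        *)
            echo "unsupported model type: $1" >&2
            return 1 ;;
    esac
}
```

The function echoes a valid value back, so it can be used inline: `--model_type="$(validate_model_type "$MODEL_TYPE")"`.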
Step-by-Step Workflow
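For more control, run each step separately. run_deepvariant chains these same three programs; passing --dry_run=true to it prints the commands it would run, including any model-specific flags, without executing them.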
Step 1: Make Examples
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
    /opt/deepvariant/bin/make_examples \
    --mode calling \
    --ref /data/reference.fa \
    --reads /data/sample.bam \
    --examples /data/examples.tfrecord.gz \
    --gvcf /data/gvcf.tfrecord.gz
Step 2: Call Variants
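# The checkpoint location can differ between releases; list /opt/models
# inside the image to confirm the path before running.
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
    /opt/deepvariant/bin/call_variants \
    --outfile /data/call_variants.tfrecord.gz \
    --examples /data/examples.tfrecord.gz \
    --checkpoint /opt/models/wgs/model.ckpt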
Step 3: Postprocess Variants
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
    /opt/deepvariant/bin/postprocess_variants \
    --ref /data/reference.fa \
    --infile /data/call_variants.tfrecord.gz \
    --outfile /data/output.vcf.gz \
    --gvcf_outfile /data/output.g.vcf.gz \
    --nonvariant_site_tfrecord_path /data/gvcf.tfrecord.gz
GPU Acceleration
docker run --gpus all -v "${PWD}:/data" \
    google/deepvariant:1.6.1-gpu \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=WGS \
    --ref=/data/reference.fa \
    --reads=/data/sample.bam \
    --output_vcf=/data/output.vcf.gz \
    --num_shards=16
PacBio HiFi Calling
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=PACBIO \
    --ref=/data/reference.fa \
    --reads=/data/hifi_aligned.bam \
    --output_vcf=/data/hifi_variants.vcf.gz \
    --num_shards=16
ONT Calling
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=ONT_R104 \
    --ref=/data/reference.fa \
    --reads=/data/ont_aligned.bam \
    --output_vcf=/data/ont_variants.vcf.gz \
    --num_shards=16
Exome/Targeted Sequencing
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=WES \
    --ref=/data/reference.fa \
    --reads=/data/exome.bam \
    --regions=/data/targets.bed \
    --output_vcf=/data/exome_variants.vcf.gz \
    --num_shards=8
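A malformed targets file is an easy way to lose calls in an exome run, so it can help to sanity-check the BED passed to `--regions` first. The sketch below counts lines that are not valid three-column, tab-separated intervals. The function name is hypothetical, and the check covers only basic structure, not whether contig names match the reference.

```shell
# Sketch: count malformed lines in a BED file: fewer than three tab-separated
# fields, non-integer coordinates, or start >= end. Blank lines and
# header/comment lines are skipped.
count_bad_bed_lines() {
    awk -F '\t' '
        /^$/ || /^(#|track|browser)/ { next }
        NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 { bad++ }
        END { print bad + 0 }
    ' "$1"
}
```

A result of `0` from `count_bad_bed_lines targets.bed` means every interval line passed these basic checks.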
Joint Calling with GLnexus
For multi-sample cohorts, use gVCFs with GLnexus:
# Generate a gVCF for each sample
for bam in *.bam; do
    sample=$(basename "$bam" .bam)
    docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
        /opt/deepvariant/bin/run_deepvariant \
        --model_type=WGS \
        --ref=/data/reference.fa \
        --reads="/data/${bam}" \
        --output_vcf="/data/${sample}.vcf.gz" \
        --output_gvcf="/data/${sample}.g.vcf.gz" \
        --num_shards=16
done

# Joint genotyping with GLnexus.
# The glob is expanded on the host and each name is prefixed with /data/,
# because a literal /data/*.g.vcf.gz would reach the container unexpanded.
gvcfs=(*.g.vcf.gz)
docker run -v "${PWD}:/data" quay.io/mlin/glnexus:v1.4.1 \
    /usr/local/bin/glnexus_cli \
    --config DeepVariantWGS \
    "${gvcfs[@]/#//data/}" \
    | bcftools view - -Oz -o cohort.vcf.gz
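Joint genotyping assumes every sample in the per-sample loop actually produced a gVCF. If one run failed, GLnexus would simply genotype a smaller cohort. A small sketch of a completeness check follows; `missing_gvcfs` is a hypothetical helper, and it assumes the `<sample>.bam` to `<sample>.g.vcf.gz` naming used in the loop.

```shell
# Sketch: list samples in a directory whose BAM has no non-empty
# <sample>.g.vcf.gz next to it. Runs in a subshell so the cd does not leak.
missing_gvcfs() (
    cd "$1" || exit 1
    for bam in *.bam; do
        [ -e "$bam" ] || continue      # no BAMs: the glob stayed literal
        sample=${bam%.bam}
        [ -s "$sample.g.vcf.gz" ] || printf '%s\n' "$sample"
    done
)
```

Running `missing_gvcfs .` before the GLnexus step and stopping on any output keeps a partial cohort from going unnoticed.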
GLnexus Configurations
| Config | Use Case |
|---|---|
| `DeepVariantWGS` | Illumina WGS |
| `DeepVariantWES` | Illumina exome |
| `DeepVariant_unfiltered` | Keep all variants |
Output Quality Metrics
# Variant statistics
bcftools stats output.vcf.gz > stats.txt

# Filter by quality
bcftools view -i 'QUAL>20 && FMT/GQ>20' output.vcf.gz -Oz -o filtered.vcf.gz

# Ti/Tv ratio (expect ~2.0-2.1 for WGS)
bcftools stats output.vcf.gz | grep TSTV
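The `grep TSTV` output still has to be read by eye. For scripted QC, the ratio can be pulled out and compared against a window. The sketch below assumes the usual `bcftools stats` layout, in which the `TSTV` record carries ts/tv in its fifth tab-separated field. The function names and the default 1.9 to 2.2 window are illustrative choices, not thresholds from this guide.

```shell
# Sketch: print the ts/tv value (5th tab-separated field of the TSTV record)
# from `bcftools stats` output read on stdin.
tstv_ratio() {
    awk -F '\t' '$1 == "TSTV" { print $5; exit }'
}

# Succeed when the ratio lies inside [low, high]. The defaults are an
# illustrative window around the ~2.0-2.1 expected for WGS.
tstv_in_range() {
    awk -v r="$1" -v lo="${2:-1.9}" -v hi="${3:-2.2}" \
        'BEGIN { exit !(r + 0 >= lo + 0 && r + 0 <= hi + 0) }'
}
```

For example, `ratio=$(bcftools stats output.vcf.gz | tstv_ratio); tstv_in_range "$ratio" || echo "Ti/Tv out of range: $ratio"`. Exome data generally shows a higher ratio, so the window needs adjusting for it.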
Benchmarking Against Truth Set
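# Using hap.py for GIAB benchmarking (truth VCF first, query VCF second).
# hap.py's -f option takes a confident-regions BED; without it, calls outside
# the GIAB high-confidence regions are scored as well.
docker run -v "${PWD}:/data" jmcdani20/hap.py:latest \
    /opt/hap.py/bin/hap.py \
    /data/HG002_GRCh38_truth.vcf.gz \
    /data/deepvariant_output.vcf.gz \
    -r /data/reference.fa \
    -o /data/benchmark \
    --threads 16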
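With `-o /data/benchmark`, hap.py writes its headline numbers to `benchmark.summary.csv`. The sketch below extracts the F1 score for the PASS rows and looks columns up by header name, so it does not depend on column order. It assumes the standard `Type`, `Filter`, and `METRIC.F1_Score` headers; the function name is hypothetical.

```shell
# Sketch: print "Type F1" for the PASS rows of a hap.py summary CSV.
# Columns are located by header name; the run aborts if one is missing.
happy_f1() {
    awk -F ',' '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("Type" in col) || !("Filter" in col) || !("METRIC.F1_Score" in col)) {
                print "expected hap.py summary columns not found" > "/dev/stderr"
                exit 1
            }
            next
        }
        $(col["Filter"]) == "PASS" {
            print $(col["Type"]), $(col["METRIC.F1_Score"])
        }
    ' "$1"
}
```

`happy_f1 benchmark.summary.csv` then yields one line each for INDEL and SNP, which is convenient for tracking accuracy across runs.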
Complete Workflow Script
#!/bin/bash
# Usage: <script> BAM REFERENCE OUTPUT_PREFIX [MODEL_TYPE] [THREADS]
# BAM and REFERENCE are paths relative to the current directory,
# which is mounted into the container at /data.
set -euo pipefail

BAM=$1
REFERENCE=$2
OUTPUT_PREFIX=$3
MODEL_TYPE=${4:-WGS}
THREADS=${5:-16}

echo "=== DeepVariant: ${MODEL_TYPE} mode ==="
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type="${MODEL_TYPE}" \
    --ref="/data/${REFERENCE}" \
    --reads="/data/${BAM}" \
    --output_vcf="/data/${OUTPUT_PREFIX}.vcf.gz" \
    --output_gvcf="/data/${OUTPUT_PREFIX}.g.vcf.gz" \
    --intermediate_results_dir="/data/${OUTPUT_PREFIX}_tmp" \
    --num_shards="${THREADS}"

echo "=== Indexing ==="
# -f overwrites an index that already exists next to the output,
# which would otherwise stop the script under set -e
bcftools index -f -t "${OUTPUT_PREFIX}.vcf.gz"
bcftools index -f -t "${OUTPUT_PREFIX}.g.vcf.gz"

echo "=== Statistics ==="
bcftools stats "${OUTPUT_PREFIX}.vcf.gz" > "${OUTPUT_PREFIX}_stats.txt"

echo "=== Complete ==="
echo "VCF: ${OUTPUT_PREFIX}.vcf.gz"
echo "gVCF: ${OUTPUT_PREFIX}.g.vcf.gz"
Comparison with Other Callers
| Caller | Speed | Accuracy | Best For |
|---|---|---|---|
| DeepVariant | Moderate | Highest | Production, benchmarking |
| GATK HaplotypeCaller | Moderate | High | GATK ecosystem |
| bcftools | Fast | Good | Quick analysis |
| Clair3 | Fast | High | Long reads |
Resource Requirements
| Configuration | Memory | Approximate Run Time |
|---|---|---|
| WGS (30x), CPU | 64 GB | ~4-6 hours |
| WES, CPU | 32 GB | ~30 min |
| WGS (30x), with GPU | 32 GB | ~1-2 hours |
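The examples in this guide hard-code `--num_shards=16`, which oversubscribes a machine with fewer cores. One way to choose the value is sketched below. `pick_shards` is a hypothetical helper, and capping at the online core count is a simple heuristic rather than a DeepVariant recommendation.

```shell
# Sketch: cap the requested shard count at the number of available cores.
# The optional second argument overrides the detected core count.
pick_shards() {
    requested=$1
    cores=${2:-$(getconf _NPROCESSORS_ONLN)}
    if [ "$requested" -gt "$cores" ]; then
        echo "$cores"
    else
        echo "$requested"
    fi
}
```

It can be used inline as `--num_shards="$(pick_shards 16)"`.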
Related Skills
- variant-calling/gatk-variant-calling - GATK alternative
- variant-calling/variant-calling - bcftools calling
- long-read-sequencing/clair3-variants - Long-read alternative
- variant-calling/filtering-best-practices - Post-calling filtering