install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/variant-interpretation-acmg/bioSkills/deepvariant" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-deepvariant && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/variant-interpretation-acmg/bioSkills/deepvariant" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-deepvariant && rm -rf "$T"
manifest:
skills/variant-interpretation-acmg/bioSkills/deepvariant/SKILL.md
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-variant-calling-deepvariant
description: Deep learning-based variant calling with Google DeepVariant. Provides high accuracy for germline SNPs and indels from Illumina, PacBio, and ONT data. Use when calling variants with the DeepVariant deep learning caller.
tool_type: cli
primary_tool: DeepVariant
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command
DeepVariant Variant Calling
Installation
Docker (Recommended)
docker pull google/deepvariant:1.6.1
# Or with GPU support
docker pull google/deepvariant:1.6.1-gpu
Singularity
singularity pull docker://google/deepvariant:1.6.1
Basic Usage
One-Step Run (run_deepvariant)
docker run -v "${PWD}:/input" -v "${PWD}/output:/output" \
  google/deepvariant:1.6.1 \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=WGS \
  --ref=/input/reference.fa \
  --reads=/input/sample.bam \
  --output_vcf=/output/sample.vcf.gz \
  --output_gvcf=/output/sample.g.vcf.gz \
  --num_shards=16
Model Types
| Model (--model_type) | Data Type | Use Case |
|---|---|---|
| WGS | Illumina WGS | Whole genome sequencing |
| WES | Illumina WES | Whole exome/targeted |
| PACBIO | PacBio HiFi | Long-read HiFi |
| ONT_R104 | ONT R10.4 | Oxford Nanopore |
| HYBRID_PACBIO_ILLUMINA | Mixed | Combined PacBio HiFi + Illumina reads from one sample |
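When a wrapper script handles several platforms, the --model_type value can be looked up from a platform label instead of being edited by hand. The sketch below is hypothetical: the function name and the left-hand labels (illumina-wgs, pacbio-hifi, ont-r10.4, and so on) are our own; only the returned values come from the table above.

```bash
# Hypothetical helper: map our own platform labels to a DeepVariant --model_type value.
# Only the echoed values are DeepVariant model types; the labels are illustrative.
model_type_for() {
  case "$1" in
    illumina-wgs) echo "WGS" ;;
    illumina-wes|illumina-targeted) echo "WES" ;;
    pacbio-hifi) echo "PACBIO" ;;
    ont-r10.4) echo "ONT_R104" ;;
    hybrid-pacbio-illumina) echo "HYBRID_PACBIO_ILLUMINA" ;;
    *)
      # Fail loudly instead of silently defaulting to WGS
      echo "unknown platform label: $1" >&2
      return 1
      ;;
  esac
}
```

For example, --model_type="$(model_type_for pacbio-hifi)" expands to --model_type=PACBIO, while an unrecognized label stops the script rather than running the wrong model.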
Step-by-Step Workflow
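For more control, run each step separately. Note that run_deepvariant passes model-specific flags to make_examples (for example additional input channels) that must match the checkpoint, and checkpoint paths inside the image can change between releases, so print the exact commands your version uses with run_deepvariant --dry_run=true and adapt the steps below to match: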
Step 1: Make Examples
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
  /opt/deepvariant/bin/make_examples \
  --mode calling \
  --ref /data/reference.fa \
  --reads /data/sample.bam \
  --examples /data/examples.tfrecord.gz \
  --gvcf /data/gvcf.tfrecord.gz
Step 2: Call Variants
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
  /opt/deepvariant/bin/call_variants \
  --outfile /data/call_variants.tfrecord.gz \
  --examples /data/examples.tfrecord.gz \
  --checkpoint /opt/models/wgs/model.ckpt
Step 3: Postprocess Variants
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
  /opt/deepvariant/bin/postprocess_variants \
  --ref /data/reference.fa \
  --infile /data/call_variants.tfrecord.gz \
  --outfile /data/output.vcf.gz \
  --gvcf_outfile /data/output.g.vcf.gz \
  --nonvariant_site_tfrecord_path /data/gvcf.tfrecord.gz
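Step 1 runs as a single process as written. make_examples can instead be launched once per shard with a sharded output spec such as /data/examples.tfrecord@16.gz plus --task 0 through --task 15, and call_variants then accepts the same @16 spec. The helper below is a hypothetical sketch (function name ours) that lists the per-shard file names, assuming the <base>-NNNNN-of-NNNNN<suffix> naming DeepVariant uses for sharded outputs, so you can confirm every task wrote its file before starting Step 2.

```bash
# Expand a sharded spec such as "examples.tfrecord@4.gz" into per-shard file names,
# assuming the <base>-NNNNN-of-NNNNN<suffix> convention (hypothetical helper).
expand_sharded_spec() {
  local spec="$1" base n suffix i
  if [[ "$spec" =~ ^(.*)@([0-9]+)(.*)$ ]]; then
    base="${BASH_REMATCH[1]}"
    n=$((10#${BASH_REMATCH[2]}))   # force base 10 so "016" is not read as octal
    suffix="${BASH_REMATCH[3]}"
  else
    echo "not a sharded spec: ${spec}" >&2
    return 1
  fi
  for ((i = 0; i < n; i++)); do
    printf '%s-%05d-of-%05d%s\n' "$base" "$i" "$n" "$suffix"
  done
}
```

For example: for f in $(expand_sharded_spec examples.tfrecord@16.gz); do [ -s "$f" ] || echo "missing: $f"; done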
GPU Acceleration
docker run --gpus all -v "${PWD}:/data" \
  google/deepvariant:1.6.1-gpu \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=WGS \
  --ref=/data/reference.fa \
  --reads=/data/sample.bam \
  --output_vcf=/data/output.vcf.gz \
  --num_shards=16
PacBio HiFi Calling
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=PACBIO \
  --ref=/data/reference.fa \
  --reads=/data/hifi_aligned.bam \
  --output_vcf=/data/hifi_variants.vcf.gz \
  --num_shards=16
ONT Calling
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=ONT_R104 \
  --ref=/data/reference.fa \
  --reads=/data/ont_aligned.bam \
  --output_vcf=/data/ont_variants.vcf.gz \
  --num_shards=16
Exome/Targeted Sequencing
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=WES \
  --ref=/data/reference.fa \
  --reads=/data/exome.bam \
  --regions=/data/targets.bed \
  --output_vcf=/data/exome_variants.vcf.gz \
  --num_shards=8
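Because --regions restricts calling to the BED intervals, a wrong or truncated targets file silently shrinks the analysis. A quick look at the territory the file covers can catch that. The function below is a hypothetical sketch (name ours): it sums end minus start over the BED intervals, which are 0-based and half-open, and it does not merge overlaps, so run bedtools merge first if your targets overlap.

```bash
# Sum interval lengths (end - start) of a BED file, skipping header/track/browser
# lines and rows with fewer than 3 columns. Overlapping intervals are NOT merged.
bed_territory_bp() {
  awk -F'\t' '
    /^(#|track|browser)/ || NF < 3 { next }
    { total += $3 - $2 }
    END { printf "%.0f\n", total }   # %.0f avoids integer overflow on genome-sized totals
  ' "$1"
}
```

For example, bed_territory_bp targets.bed prints the covered base pairs, which you can compare with the capture kit's documented target size.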
Joint Calling with GLnexus
For multi-sample cohorts, use gVCFs with GLnexus:
# Generate gVCFs for each sample
for bam in *.bam; do
  sample=$(basename "$bam" .bam)
  docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=WGS \
    --ref=/data/reference.fa \
    --reads="/data/${bam}" \
    --output_vcf="/data/${sample}.vcf.gz" \
    --output_gvcf="/data/${sample}.g.vcf.gz" \
    --num_shards=16
done

# Joint genotyping with GLnexus.
# The host shell expands globs and /data exists only inside the container,
# so list the gVCFs here and pass their container paths explicitly.
gvcfs=()
for f in *.g.vcf.gz; do gvcfs+=("/data/${f}"); done
docker run -v "${PWD}:/data" quay.io/mlin/glnexus:v1.4.1 \
  /usr/local/bin/glnexus_cli \
  --config DeepVariantWGS \
  "${gvcfs[@]}" \
  | bcftools view - -Oz -o cohort.vcf.gz
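GLnexus joint-genotypes whatever gVCFs it is given, so a sample whose DeepVariant run failed would simply be absent from cohort.vcf.gz. A hypothetical pre-flight check (function name ours), assuming the <sample>.bam to <sample>.g.vcf.gz naming used in the loop above, prints each sample that lacks a non-empty gVCF and returns non-zero if any do.

```bash
# Print every <sample> whose <sample>.bam has no non-empty <sample>.g.vcf.gz
# in the given directory (default: current directory); return 1 if any are missing.
missing_gvcfs() {
  local dir="${1:-.}" bam sample status=0
  for bam in "$dir"/*.bam; do
    [ -e "$bam" ] || continue   # no BAMs at all: the glob stays literal, skip it
    sample="$(basename "$bam" .bam)"
    if [ ! -s "$dir/${sample}.g.vcf.gz" ]; then
      echo "$sample"
      status=1
    fi
  done
  return "$status"
}
```

For example, run missing_gvcfs . || exit 1 just before the GLnexus step.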
GLnexus Configurations
| Config | Use Case |
|---|---|
| DeepVariantWGS | Illumina WGS |
| DeepVariantWES | Illumina exome |
| DeepVariant_unfiltered | Keep all variants |
Output Quality Metrics
# Variant statistics (PASS records only; DeepVariant also writes RefCall-filtered records)
bcftools stats -f PASS output.vcf.gz > stats.txt
# Filter by quality
bcftools view -f PASS -i 'QUAL>20 && FMT/GQ>20' output.vcf.gz -Oz -o filtered.vcf.gz
# Ti/Tv ratio (expect ~2.0-2.1 for WGS)
bcftools stats -f PASS output.vcf.gz | grep TSTV
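grep TSTV also prints bcftools' comment lines for that section. To pull out just the overall ratio and compare it with the ~2.0-2.1 WGS expectation, the hypothetical helpers below (names ours) read bcftools stats output and take column 5 of the TSTV data row, which bcftools labels ts/tv.

```bash
# Print the overall Ti/Tv ratio from `bcftools stats` output on stdin.
# Data rows start with the literal "TSTV"; comment rows start with "#".
tstv_ratio() {
  awk -F'\t' '$1 == "TSTV" { print $5; exit }'
}

# Return 0 if ratio $1 lies within [$2, $3], non-zero otherwise.
tstv_in_range() {
  awk -v r="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { exit !(r + 0 >= lo + 0 && r + 0 <= hi + 0) }'
}
```

For example: tstv_in_range "$(tstv_ratio < stats.txt)" 2.0 2.1 || echo "Ti/Tv outside the expected WGS range" >&2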
Benchmarking Against Truth Set
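# Using hap.py for GIAB benchmarking
# -f restricts scoring to the GIAB confident regions (the BED filename is a placeholder)
docker run -v "${PWD}:/data" jmcdani20/hap.py:latest \
  /opt/hap.py/bin/hap.py \
  /data/HG002_GRCh38_truth.vcf.gz \
  /data/deepvariant_output.vcf.gz \
  -f /data/HG002_GRCh38_confident_regions.bed \
  -r /data/reference.fa \
  -o /data/benchmark \
  --threads 16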
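hap.py writes its headline numbers to <prefix>.summary.csv (here benchmark.summary.csv). The sketch below, with a hypothetical function name, prints recall, precision and F1 for the PASS rows. It looks columns up by header name, assuming hap.py's Type, Filter, METRIC.Recall, METRIC.Precision and METRIC.F1_Score headers, and fails if any of them is missing.

```bash
# Print "Type Recall Precision F1" for PASS rows of a hap.py summary CSV.
# Columns are found by header name; exits 1 if an expected header is absent.
happy_pass_metrics() {
  awk -F',' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("Type" in col) || !("Filter" in col) || !("METRIC.Recall" in col) ||
          !("METRIC.Precision" in col) || !("METRIC.F1_Score" in col)) {
        print "unexpected hap.py summary header" > "/dev/stderr"
        exit 1
      }
      next
    }
    $(col["Filter"]) == "PASS" {
      print $(col["Type"]), $(col["METRIC.Recall"]), $(col["METRIC.Precision"]), $(col["METRIC.F1_Score"])
    }
  ' "$1"
}
```

For example: happy_pass_metrics benchmark.summary.csv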
Complete Workflow Script
#!/bin/bash
set -euo pipefail

# Usage: script.sh <bam> <reference.fa> <output_prefix> [model_type] [shards]
# Paths are relative to the current directory, which is mounted at /data.
BAM=${1:?missing BAM}
REFERENCE=${2:?missing reference FASTA}
OUTPUT_PREFIX=${3:?missing output prefix}
MODEL_TYPE=${4:-WGS}
THREADS=${5:-16}

echo "=== DeepVariant: ${MODEL_TYPE} mode ==="
docker run -v "${PWD}:/data" google/deepvariant:1.6.1 \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type="${MODEL_TYPE}" \
  --ref="/data/${REFERENCE}" \
  --reads="/data/${BAM}" \
  --output_vcf="/data/${OUTPUT_PREFIX}.vcf.gz" \
  --output_gvcf="/data/${OUTPUT_PREFIX}.g.vcf.gz" \
  --intermediate_results_dir="/data/${OUTPUT_PREFIX}_tmp" \
  --num_shards="${THREADS}"

echo "=== Indexing ==="
# -f overwrites any .tbi that postprocess_variants may already have written
bcftools index -f -t "${OUTPUT_PREFIX}.vcf.gz"
bcftools index -f -t "${OUTPUT_PREFIX}.g.vcf.gz"

echo "=== Statistics ==="
bcftools stats -f PASS "${OUTPUT_PREFIX}.vcf.gz" > "${OUTPUT_PREFIX}_stats.txt"

echo "=== Complete ==="
echo "VCF: ${OUTPUT_PREFIX}.vcf.gz"
echo "gVCF: ${OUTPUT_PREFIX}.g.vcf.gz"
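The workflow script launches a multi-hour container run before it can discover a missing index. DeepVariant reads the reference through its .fai index and the reads through a BAM index, so a hypothetical pre-flight function like the one below (name and messages ours) can fail fast. This sketch accepts either sample.bam.bai or sample.bai as the BAM index.

```bash
# Report missing inputs for a DeepVariant run; return 1 if anything is absent or empty.
check_deepvariant_inputs() {
  local ref="$1" bam="$2" missing=0
  [ -s "$ref" ] || { echo "missing reference: $ref" >&2; missing=1; }
  [ -s "${ref}.fai" ] || { echo "missing FASTA index: ${ref}.fai (samtools faidx)" >&2; missing=1; }
  [ -s "$bam" ] || { echo "missing BAM: $bam" >&2; missing=1; }
  # Accept either common BAM index name
  if [ ! -s "${bam}.bai" ] && [ ! -s "${bam%.bam}.bai" ]; then
    echo "missing BAM index: ${bam}.bai (samtools index)" >&2
    missing=1
  fi
  return "$missing"
}
```

For example, calling check_deepvariant_inputs "$REFERENCE" "$BAM" || exit 1 just before the docker run line stops the script before any container starts.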
Comparison with Other Callers
| Caller | Speed | Accuracy | Best For |
|---|---|---|---|
| DeepVariant | Moderate | Highest | Production, benchmarking |
| GATK HaplotypeCaller | Moderate | High | GATK ecosystem |
| bcftools | Fast | Good | Quick analysis |
| Clair3 | Fast | High | Long reads |
Resource Requirements
| Data Type | Memory | Approx. Run Time |
|---|---|---|
| WGS (30x) | 64 GB | ~4-6 hours |
| WES | 32 GB | ~30 min |
| WGS (30x) with GPU | 32 GB | ~1-2 hours |
Related Skills
- variant-calling/gatk-variant-calling - GATK alternative
- variant-calling/variant-calling - bcftools calling
- long-read-sequencing/clair3-variants - Long-read alternative
- variant-calling/filtering-best-practices - Post-calling filtering