install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/variant-interpretation-acmg/bioSkills/vcf-manipulation" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-vcf-manipulation && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/variant-interpretation-acmg/bioSkills/vcf-manipulation" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-vcf-manipulation && rm -rf "$T"
manifest:
skills/variant-interpretation-acmg/bioSkills/vcf-manipulation/SKILL.md
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-vcf-manipulation
description: Merge, concatenate, sort, intersect, and subset VCF files using bcftools. Use when combining variant files, comparing call sets, or restructuring VCF data.
tool_type: cli
primary_tool: bcftools
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command
VCF Manipulation
Merge, concat, sort, and compare VCF files using bcftools.
Operations Overview
| Operation | Command | Use Case |
|---|---|---|
| Merge | `bcftools merge` | Combine samples from multiple VCFs |
| Concat | `bcftools concat` | Combine regions from multiple VCFs |
| Sort | `bcftools sort` | Sort unsorted VCF |
| Intersect | `bcftools isec` | Compare/intersect call sets |
| Subset | `bcftools view` | Extract samples or regions |
bcftools merge
Combine VCF files that contain different samples into a single multi-sample VCF. Input files must be bgzip-compressed and indexed.
Basic Merge
bcftools merge sample1.vcf.gz sample2.vcf.gz -Oz -o merged.vcf.gz
Merge Multiple Files
bcftools merge *.vcf.gz -Oz -o all_samples.vcf.gz
Merge from File List
# files.txt: one VCF path per line
bcftools merge -l files.txt -Oz -o merged.vcf.gz
Handle Missing Genotypes
# Output missing genotypes as ./. (default)
bcftools merge sample1.vcf.gz sample2.vcf.gz -Oz -o merged.vcf.gz

# Output missing as reference (0/0)
bcftools merge --missing-to-ref sample1.vcf.gz sample2.vcf.gz -Oz -o merged.vcf.gz
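To make the two behaviours concrete, here is a minimal pure-Python sketch of what happens to genotypes at sites that only one input contains. It is a toy model, not how bcftools is implemented: the function name `merge_genotypes` and the dictionary layout are assumptions made for illustration, and real merging also reconciles alleles, INFO, and FORMAT fields.

```python
from typing import Dict, Tuple

# Hypothetical site key: (CHROM, POS, REF, ALT)
Site = Tuple[str, int, str, str]


def merge_genotypes(a: Dict[Site, str], b: Dict[Site, str],
                    missing_to_ref: bool = False) -> Dict[Site, Tuple[str, str]]:
    """Return {site: (genotype_a, genotype_b)} over the union of sites."""
    # Fill value for a sample whose input file lacks the site.
    fill = "0/0" if missing_to_ref else "./."
    merged = {}
    for site in sorted(set(a) | set(b)):
        merged[site] = (a.get(site, fill), b.get(site, fill))
    return merged


sample1 = {("chr1", 100, "A", "G"): "0/1"}
sample2 = {("chr1", 100, "A", "G"): "1/1", ("chr1", 250, "C", "T"): "0/1"}

default = merge_genotypes(sample1, sample2)
as_ref = merge_genotypes(sample1, sample2, missing_to_ref=True)

print(default[("chr1", 250, "C", "T")])  # ('./.', '0/1')
print(as_ref[("chr1", 250, "C", "T")])   # ('0/0', '0/1')
```

Treating missing sites as 0/0 is only safe when the absence of a record really means the sample was confidently called as reference there.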
Force Sample Names
When sample names conflict:
bcftools merge --force-samples sample1.vcf.gz sample2.vcf.gz -Oz -o merged.vcf.gz
Merge Specific Regions
bcftools merge -r chr1:1000000-2000000 sample1.vcf.gz sample2.vcf.gz -Oz -o merged.vcf.gz
bcftools concat
Combine VCF files that contain the same samples, in the same order, but cover different regions.
Concatenate Chromosomes
bcftools concat chr1.vcf.gz chr2.vcf.gz chr3.vcf.gz -Oz -o genome.vcf.gz
Concatenate All Chromosomes
bcftools concat chr*.vcf.gz -Oz -o genome.vcf.gz
From File List
# files.txt: one VCF path per line (in order)
bcftools concat -f files.txt -Oz -o concatenated.vcf.gz
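Because the file list must be in order, and plain lexicographic sorting puts chr10 before chr2, a small helper can write the list in natural chromosome order. The sketch below is one possible approach; `natural_key` and `write_file_list` are hypothetical helper names, and it assumes the desired order is numeric chromosomes first, then lettered ones such as chrX.

```python
import re
from typing import Iterable, List


def natural_key(name: str) -> list:
    """Split digit runs from text so 'chr2' sorts before 'chr10'."""
    # re.split with a capture group alternates text, digits, text, ...
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r"(\d+)", name)]


def write_file_list(paths: Iterable[str], out_path: str) -> List[str]:
    """Write one VCF path per line in natural order and return that order."""
    ordered = sorted(paths, key=natural_key)
    with open(out_path, "w") as handle:
        handle.write("\n".join(ordered) + "\n")
    return ordered


names = ["chr10.vcf.gz", "chr2.vcf.gz", "chrX.vcf.gz", "chr1.vcf.gz"]
print(sorted(names, key=natural_key))
# ['chr1.vcf.gz', 'chr2.vcf.gz', 'chr10.vcf.gz', 'chrX.vcf.gz']
```

Check the resulting order against the contig order in your VCF headers before concatenating.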
Allow Overlapping Regions
bcftools concat -a chr1_part1.vcf.gz chr1_part2.vcf.gz -Oz -o chr1.vcf.gz
Remove Duplicates
bcftools concat -a -d all file1.vcf.gz file2.vcf.gz -Oz -o merged.vcf.gz
Options for `-d`:
- `snps`: Remove duplicate SNPs
- `indels`: Remove duplicate indels
- `both`: Remove duplicate SNPs and indels
- `all`: Remove all duplicates
- `exact`: Remove exact duplicates only
bcftools sort
Sort VCF by chromosome and position.
Basic Sort
bcftools sort input.vcf -Oz -o sorted.vcf.gz
With Temporary Directory
For large files:
bcftools sort -T /tmp input.vcf.gz -Oz -o sorted.vcf.gz
Memory Limit
bcftools sort -m 4G input.vcf.gz -Oz -o sorted.vcf.gz
bcftools isec
Intersect and compare VCF files.
Find Shared Variants
bcftools isec -p output_dir sample1.vcf.gz sample2.vcf.gz
Creates:
- `0000.vcf`: Private to sample1
- `0001.vcf`: Private to sample2
- `0002.vcf`: Shared (sample1 records)
- `0003.vcf`: Shared (sample2 records)
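The four output files correspond to simple set operations on the two call sets. The following toy sketch illustrates that partition in pure Python. It assumes variants are matched on (CHROM, POS, REF, ALT), which approximates exact-allele matching, and `isec_partition` is a hypothetical name rather than anything provided by bcftools.

```python
from typing import Dict, Iterable, List, Tuple

Variant = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def isec_partition(first: Iterable[Variant],
                   second: Iterable[Variant]) -> Dict[str, List[Variant]]:
    """Group variants the way the four isec output files are organised."""
    a, b = set(first), set(second)
    shared = sorted(a & b)
    return {
        "0000": sorted(a - b),  # private to the first file
        "0001": sorted(b - a),  # private to the second file
        "0002": shared,         # shared, records taken from the first file
        "0003": shared,         # shared, records taken from the second file
    }


calls1 = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")]
calls2 = [("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")]

parts = isec_partition(calls1, calls2)
print(parts["0000"])  # [('chr1', 100, 'A', 'G')]
print(parts["0002"])  # [('chr1', 200, 'C', 'T')]
```

In real output, `0002.vcf` and `0003.vcf` cover the same sites but can differ in annotations and genotypes because each keeps the records from its own input.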
Output Compressed
bcftools isec -p output_dir -Oz sample1.vcf.gz sample2.vcf.gz
Intersection Only
# Only outputs variants present in exactly 2 files
bcftools isec -p output_dir -n=2 sample1.vcf.gz sample2.vcf.gz
Comparison Options
| Flag | Description |
|---|---|
| `-n=2` | Present in exactly 2 files |
| `-n+2` | Present in 2 or more files |
| `-n-2` | Present in 2 or fewer files |
| `-n~11` | Boolean: file1 AND file2 |
| `-n~10` | Boolean: file1 AND NOT file2 |
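The `-n` selector takes either a count with a prefix (`=`, `+`, `-`) or a `~` bitmask with one digit per input file. The sketch below models that selection rule in Python, assuming the semantics described in the bcftools manual, where `-` means "this many or fewer". The function `nfiles_matches` is a hypothetical helper for reasoning about a selector, not part of bcftools.

```python
from typing import Sequence


def nfiles_matches(spec: str, present: Sequence[bool]) -> bool:
    """Decide whether a site is kept, given which input files contain it.

    spec    -- the value after -n, for example '=2', '+2', '-2' or '~10'
    present -- one boolean per input file, in command-line order
    """
    op, arg = spec[0], spec[1:]
    count = sum(bool(flag) for flag in present)
    if op == "=":
        return count == int(arg)
    if op == "+":
        return count >= int(arg)
    if op == "-":
        return count <= int(arg)
    if op == "~":
        if len(arg) != len(present):
            raise ValueError("bitmask needs one digit per input file")
        return [digit == "1" for digit in arg] == [bool(f) for f in present]
    raise ValueError(f"unsupported selector: {spec!r}")


print(nfiles_matches("=2", [True, True]))    # True
print(nfiles_matches("~10", [True, False]))  # True
print(nfiles_matches("~10", [True, True]))   # False
```

With two input files, `~11` selects the same sites as `=2`, and `~10` selects sites private to the first file.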
Two-File Intersection
# Variants in both files
bcftools isec -n=2 -w1 sample1.vcf.gz sample2.vcf.gz -Oz -o shared.vcf.gz

# Variants only in sample1
bcftools isec -n~10 -w1 sample1.vcf.gz sample2.vcf.gz -Oz -o only_sample1.vcf.gz
Complement Mode
# Variants in file1 not in file2 (-w1 writes the matching records from file1 as VCF)
bcftools isec -C -w1 sample1.vcf.gz sample2.vcf.gz -Oz -o unique.vcf.gz
Subsetting VCF Files
Extract Samples
bcftools view -s sample1,sample2 input.vcf.gz -Oz -o subset.vcf.gz
Exclude Samples
bcftools view -s ^sample3 input.vcf.gz -Oz -o without_sample3.vcf.gz
From Sample List File
# samples.txt: one sample name per line
bcftools view -S samples.txt input.vcf.gz -Oz -o subset.vcf.gz
Extract Region
bcftools view -r chr1:1000000-2000000 input.vcf.gz -Oz -o region.vcf.gz
Extract Multiple Regions
bcftools view -R regions.bed input.vcf.gz -Oz -o targets.vcf.gz
Renaming Samples
Single Sample
echo "old_name new_name" > rename.txt
bcftools reheader -s rename.txt input.vcf.gz -o renamed.vcf.gz
Multiple Samples
# rename.txt format: old_name new_name
cat > rename.txt << EOF
sample1 patient_001
sample2 patient_002
sample3 patient_003
EOF
bcftools reheader -s rename.txt input.vcf.gz -o renamed.vcf.gz
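When the mapping comes from a script rather than being typed by hand, it helps to validate it before writing the file. The following sketch writes the `old_name new_name` format shown above and rejects names that contain whitespace or new names that collide. `write_rename_file` is a hypothetical helper, and the whitespace check is a conservative assumption rather than a bcftools requirement.

```python
import os
import tempfile
from typing import Dict


def write_rename_file(mapping: Dict[str, str], path: str) -> str:
    """Write 'old_name new_name' pairs, one per line, after basic checks."""
    for old, new in mapping.items():
        for name in (old, new):
            # Whitespace separates the two columns, so reject it in names.
            if not name or any(ch.isspace() for ch in name):
                raise ValueError(f"invalid sample name: {name!r}")
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("new sample names must be unique")
    with open(path, "w") as handle:
        for old, new in mapping.items():
            handle.write(f"{old} {new}\n")
    return path


with tempfile.TemporaryDirectory() as demo_dir:
    rename_path = write_rename_file(
        {"sample1": "patient_001", "sample2": "patient_002"},
        os.path.join(demo_dir, "rename.txt"),
    )
    with open(rename_path) as handle:
        rename_text = handle.read()

print(rename_text, end="")
```

The resulting file can then be passed to `bcftools reheader -s` exactly as in the examples above.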
Splitting VCF Files
Split by Sample
for sample in $(bcftools query -l input.vcf.gz); do
    bcftools view -s "$sample" input.vcf.gz -Oz -o "${sample}.vcf.gz"
done
Split by Chromosome
# Requires an indexed input; contig IDs are read from the ##contig header lines
for chr in $(bcftools view -h input.vcf.gz | grep "^##contig" | sed 's/.*ID=\([^,>]*\).*/\1/'); do
    bcftools view -r "$chr" input.vcf.gz -Oz -o "${chr}.vcf.gz"
done
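The loop depends on pulling contig IDs out of `##contig` header lines. As a hedged illustration of that parsing step, the sketch below extracts the IDs in Python and also handles lines where `ID` is the only field, so no trailing `>` ends up in the name. The pattern `CONTIG_ID` and the function `contig_ids` are hypothetical names, and the header text is made-up sample input.

```python
import re
from typing import List

# Capture the ID value up to the next comma or the closing '>'.
CONTIG_ID = re.compile(r"^##contig=<.*?\bID=([^,>]+)")


def contig_ids(header_text: str) -> List[str]:
    """Return contig IDs in the order they appear in a VCF header."""
    ids = []
    for line in header_text.splitlines():
        match = CONTIG_ID.match(line)
        if match:
            ids.append(match.group(1))
    return ids


example_header = """##fileformat=VCFv4.2
##contig=<ID=chr1,length=248956422>
##contig=<ID=chr2,length=242193529>
##contig=<ID=chrM>
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
"""

print(contig_ids(example_header))  # ['chr1', 'chr2', 'chrM']
```

Contigs listed in the header may have no records, so some per-chromosome outputs can be header-only.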
Split Multiallelic Sites
bcftools norm -m-any input.vcf.gz -Oz -o split.vcf.gz
Common Workflows
Merge Cohort VCFs
# Create file list
ls *.vcf.gz > files.txt

# Merge all samples
bcftools merge -l files.txt -Oz -o cohort.vcf.gz
bcftools index cohort.vcf.gz
Combine Chromosome VCFs
# After parallel variant calling by chromosome
bcftools concat chr{1..22}.vcf.gz chrX.vcf.gz chrY.vcf.gz -Oz -o genome.vcf.gz
bcftools index genome.vcf.gz
Compare Two Callers
# Find variants called by both GATK and bcftools
bcftools isec -p comparison gatk.vcf.gz bcftools.vcf.gz

# Count variant records per output file (header lines excluded)
grep -vc '^#' comparison/*.vcf
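Raw line counts on the output files include VCF header lines, so a per-file record count is more informative when comparing callers. The sketch below counts only variant records in the uncompressed files of an isec output directory. `count_records`, `summarize_isec`, and the labels in `ISEC_FILES` are hypothetical names, and the demo data is synthetic.

```python
import os
import tempfile
from typing import Dict

# Hypothetical labels for the first three files written by `bcftools isec -p`.
ISEC_FILES = {
    "0000.vcf": "private_to_first",
    "0001.vcf": "private_to_second",
    "0002.vcf": "shared",
}


def count_records(path: str) -> int:
    """Count non-header, non-empty lines in an uncompressed VCF."""
    with open(path) as handle:
        return sum(1 for line in handle
                   if line.strip() and not line.startswith("#"))


def summarize_isec(directory: str) -> Dict[str, int]:
    """Map each label to the number of variant records in its file."""
    return {label: count_records(os.path.join(directory, name))
            for name, label in ISEC_FILES.items()}


# Demo with synthetic files standing in for a real comparison/ directory.
header = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
record = "chr1\t{pos}\t.\tA\tG\t50\tPASS\t.\n"
with tempfile.TemporaryDirectory() as demo_dir:
    for name, n in (("0000.vcf", 2), ("0001.vcf", 0), ("0002.vcf", 3)):
        with open(os.path.join(demo_dir, name), "w") as handle:
            handle.write(header)
            handle.writelines(record.format(pos=100 + i) for i in range(n))
    summary = summarize_isec(demo_dir)

print(summary)
# {'private_to_first': 2, 'private_to_second': 0, 'shared': 3}
```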
Extract Passing Variants
bcftools view -f PASS input.vcf.gz -Oz -o pass_only.vcf.gz
bcftools index pass_only.vcf.gz
cyvcf2 Python Operations
Note: True VCF merging (combining samples at matching positions) is complex. Use `bcftools merge` for production work. cyvcf2 is better for filtering/querying.
Concatenate Records (Not True Merge)
from cyvcf2 import VCF, Writer

# WARNING: This concatenates records, not a true merge
# For actual merging of samples, use bcftools merge
vcf1 = VCF('file1.vcf.gz')
writer = Writer('combined.vcf', vcf1)

for variant in vcf1:
    writer.write_record(variant)

writer.close()
vcf1.close()
Find Shared Positions
from cyvcf2 import VCF

# Load positions from first VCF
vcf1_positions = set()
for variant in VCF('sample1.vcf.gz'):
    vcf1_positions.add((variant.CHROM, variant.POS))

# Check second VCF
shared = 0
unique = 0
for variant in VCF('sample2.vcf.gz'):
    if (variant.CHROM, variant.POS) in vcf1_positions:
        shared += 1
    else:
        unique += 1

print(f'Shared: {shared}')
print(f'Unique to sample2: {unique}')
Quick Reference
| Task | Command |
|---|---|
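| Merge samples | `bcftools merge sample1.vcf.gz sample2.vcf.gz -Oz -o merged.vcf.gz` |
| Concat regions | `bcftools concat chr1.vcf.gz chr2.vcf.gz -Oz -o genome.vcf.gz` |
| Sort VCF | `bcftools sort input.vcf -Oz -o sorted.vcf.gz` |
| Intersect | `bcftools isec -p output_dir sample1.vcf.gz sample2.vcf.gz` |
| Extract samples | `bcftools view -s sample1,sample2 input.vcf.gz -Oz -o subset.vcf.gz` |
| Rename samples | `bcftools reheader -s rename.txt input.vcf.gz -o renamed.vcf.gz` |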
Common Errors
| Error | Cause | Solution |
|---|---|---|
| | merge vs concat confusion | Use `bcftools merge` for samples, `bcftools concat` for regions |
| | Unsorted input to concat | Sort first or use the `-a` flag |
| | Duplicate sample names | Use `--force-samples` |
| | Missing index for merge/isec | Run `bcftools index` first |
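The missing-index error is easy to catch before launching a long merge or isec run. The following sketch checks each input for a sibling `.csi` or `.tbi` file, the two index suffixes `bcftools index` can produce. `missing_indexes` is a hypothetical helper, and the demo uses empty placeholder files rather than real VCFs.

```python
import os
import tempfile
from typing import Iterable, List


def missing_indexes(vcf_paths: Iterable[str]) -> List[str]:
    """Return the inputs that have neither a .csi nor a .tbi index file."""
    missing = []
    for path in vcf_paths:
        if not any(os.path.exists(path + ext) for ext in (".csi", ".tbi")):
            missing.append(path)
    return missing


# Demo with empty placeholder files: only sample1 has an index.
with tempfile.TemporaryDirectory() as demo_dir:
    indexed = os.path.join(demo_dir, "sample1.vcf.gz")
    unindexed = os.path.join(demo_dir, "sample2.vcf.gz")
    for path in (indexed, indexed + ".csi", unindexed):
        open(path, "w").close()
    need_index = [os.path.basename(p)
                  for p in missing_indexes([indexed, unindexed])]

print(need_index)  # ['sample2.vcf.gz']
```

Any file reported here can be indexed with `bcftools index` before retrying the command.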
Related Skills
- vcf-basics - View and query VCF files
- filtering-best-practices - Filter variants before manipulation
- variant-normalization - Normalize before comparing
- vcf-statistics - Compare statistics after manipulation