install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-phasing-imputation-genotype-imputation" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-phasing-imputation-genotype-impu && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-phasing-imputation-genotype-imputation" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-phasing-imputation-genotype-impu && rm -rf "$T"
manifest:
skills/bio-phasing-imputation-genotype-imputation/SKILL.mdsource content
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-phasing-imputation-genotype-imputation description: Impute missing genotypes using reference panels with Beagle or Minimac4. Use when increasing variant density for GWAS, harmonizing data across genotyping platforms, or inferring variants not directly typed in array data. tool_type: cli primary_tool: beagle measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
Genotype Imputation
Beagle Imputation
# Basic imputation java -jar beagle.jar \ gt=study.vcf.gz \ ref=reference_panel.vcf.gz \ map=genetic_map.txt \ out=imputed # Output: imputed.vcf.gz with imputed genotypes
Beagle with Options
java -Xmx32g -jar beagle.jar \ gt=study.vcf.gz \ ref=reference_panel.vcf.gz \ map=genetic_map.txt \ out=imputed \ nthreads=8 \ gp=true \ # Output genotype probabilities ap=true \ # Output allele probabilities impute=true \ # Perform imputation (default) ne=20000 # Effective population size
Impute Per Chromosome
for chr in {1..22}; do java -Xmx32g -jar beagle.jar \ gt=study.chr${chr}.vcf.gz \ ref=ref.chr${chr}.vcf.gz \ map=genetic_maps/plink.chr${chr}.GRCh38.map \ out=imputed.chr${chr} \ gp=true \ nthreads=8 done # Concatenate bcftools concat imputed.chr*.vcf.gz -Oz -o imputed.all.vcf.gz bcftools index imputed.all.vcf.gz
IMPUTE5 (Alternative)
# Newer IMPUTE software impute5 \ --h reference.bcf \ --m genetic_map.txt \ --g study.vcf.gz \ --r chr22 \ --o imputed.chr22.vcf.gz \ --threads 8
Minimac4 (Michigan Imputation Server)
# Often used via web server, but can run locally minimac4 \ --refHaps reference.m3vcf.gz \ --haps study.vcf.gz \ --prefix imputed \ --format GT,DS,GP \ --cpus 8
Input Preparation
# 1. Align to reference (strand, allele order) bcftools +fixref study.vcf.gz -Oz -o fixed.vcf.gz -- \ -f reference.fa -m flip # 2. Filter to sites in reference bcftools isec -n=2 -w1 fixed.vcf.gz reference_sites.vcf.gz \ -Oz -o study_overlap.vcf.gz # 3. Phase first (if not already phased) java -jar beagle.jar gt=study_overlap.vcf.gz out=phased # 4. Then impute java -jar beagle.jar gt=phased.vcf.gz ref=reference.vcf.gz out=imputed
Extract Imputation Quality
# INFO/DR2 or INFO/R2 contains imputation quality bcftools query -f '%CHROM\t%POS\t%ID\t%INFO/DR2\n' imputed.vcf.gz > info_scores.txt # Filter by quality bcftools view -i 'INFO/DR2 > 0.3' imputed.vcf.gz -Oz -o imputed_filtered.vcf.gz
Output Formats
| Format | Field | Description |
|---|---|---|
| GT | 0|0, 0|1, 1|1 | Hard-called genotype |
| DS | 0.0-2.0 | Dosage (expected ALT allele count) |
| GP | 0.0-1.0,0.0-1.0,0.0-1.0 | Genotype probabilities (AA,AB,BB) |
| DR2/R2 | 0.0-1.0 | Imputation quality score |
Using Dosages for GWAS
import pandas as pd # Extract dosages # bcftools query -f '%CHROM\t%POS\t%ID[\t%DS]\n' imputed.vcf.gz > dosages.txt dosages = pd.read_csv('dosages.txt', sep='\t') # Dosage-based association (treats uncertainty) # Use --dosage in PLINK2 or similar
# PLINK2 with dosages plink2 --vcf imputed.vcf.gz dosage=DS \ --glm \ --pheno phenotypes.txt \ --out gwas_results
Quality Thresholds
| Analysis | Minimum INFO/R2 |
|---|---|
| GWAS discovery | 0.3 |
| GWAS fine-mapping | 0.8 |
| Meta-analysis | 0.5 |
| Polygenic scores | 0.9 |
Key Parameters
| Parameter | Beagle | Description |
|---|---|---|
| gt | input VCF | Study genotypes |
| ref | reference VCF | Reference panel |
| map | genetic map | Recombination map |
| gp | true/false | Output genotype probs |
| ne | 20000 | Effective population size |
| nthreads | N | CPU threads |
| window | 40 | Window size (cM) |
Imputation Servers
For large-scale imputation, consider web-based servers:
- Michigan Imputation Server: imputationserver.sph.umich.edu
- TOPMed Imputation Server: imputation.biodatacatalyst.nhlbi.nih.gov
- Sanger Imputation Server: imputation.sanger.ac.uk
# Prepare input for server # Most require VCF.GZ per chromosome for chr in {1..22}; do bcftools view -r chr${chr} study.vcf.gz -Oz -o study.chr${chr}.vcf.gz done
Related Skills
- phasing-imputation/haplotype-phasing - Pre-phasing step
- phasing-imputation/reference-panels - Reference panel setup
- phasing-imputation/imputation-qc - Quality control
- population-genetics/association-testing - GWAS with imputed data