OpenClaw-Medical-Skills bio-phasing-imputation-haplotype-phasing

<!--

install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-phasing-imputation-haplotype-phasing" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-phasing-imputation-haplotype-pha && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-phasing-imputation-haplotype-phasing" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-phasing-imputation-haplotype-pha && rm -rf "$T"
manifest: skills/bio-phasing-imputation-haplotype-phasing/SKILL.md
safety · automated scan (low risk)
This is a pattern-based risk scan, not a security review. Our crawler flagged:
  • downloads files (wget)
Always read a skill's source content before installing. Patterns alone don't mean the skill is malicious — but they warrant attention.
source content
<!-- # COPYRIGHT NOTICE # This file is part of the "Universal Biomedical Skills" project. # Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu> # All Rights Reserved. # # This code is proprietary and confidential. # Unauthorized copying of this file, via any medium is strictly prohibited. # # Provenance: Authenticated by MD BABU MIA -->

name: bio-phasing-imputation-haplotype-phasing description: Phase genotypes into haplotypes using Beagle or SHAPEIT. Resolves which alleles are inherited together on each chromosome. Use when preparing VCF files for imputation, HLA typing, or population genetic analyses requiring phased haplotypes. tool_type: cli primary_tool: beagle measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:

  • read_file
  • run_shell_command

Haplotype Phasing

Beagle 5.4 Phasing (Recommended)

# Download Beagle 5.4
wget https://faculty.washington.edu/browning/beagle/beagle.22Jul22.46e.jar

# Basic phasing
java -jar beagle.22Jul22.46e.jar \
    gt=input.vcf.gz \
    out=phased

# Output: phased.vcf.gz (phased genotypes)

# With genetic map (improves accuracy)
java -jar beagle.22Jul22.46e.jar \
    gt=input.vcf.gz \
    map=plink.chr22.GRCh38.map \
    out=phased

Beagle Options

java -jar beagle.22Jul22.46e.jar \
    gt=input.vcf.gz \
    out=phased \
    map=genetic_map.txt \
    nthreads=8 \
    window=40 \
    overlap=4 \
    ne=20000 \              # Effective population size
    seed=12345              # For reproducibility

Phase Per Chromosome

# Process each chromosome separately
for chr in {1..22}; do
    java -Xmx16g -jar beagle.jar \
        gt=input.chr${chr}.vcf.gz \
        map=genetic_maps/plink.chr${chr}.GRCh38.map \
        out=phased.chr${chr} \
        nthreads=8
done

# Concatenate chromosomes
bcftools concat phased.chr*.vcf.gz -Oz -o phased.all.vcf.gz
bcftools index phased.all.vcf.gz

SHAPEIT5 Phasing (for Large Datasets)

# Phase common variants first
shapeit5_phase_common \
    --input input.vcf.gz \
    --map genetic_map.txt \
    --output phased_common.bcf \
    --thread 8 \
    --log phased.log

# Then phase rare variants
shapeit5_phase_rare \
    --input input.vcf.gz \
    --scaffold phased_common.bcf \
    --map genetic_map.txt \
    --output phased.bcf \
    --thread 8

SHAPEIT5 with Reference Panel

# Improves phasing using reference haplotypes
shapeit5_phase_common \
    --input input.vcf.gz \
    --reference reference_panel.bcf \
    --map genetic_map.txt \
    --output phased.bcf \
    --thread 8

Beagle with Reference Panel

# Use reference panel for better phasing
java -jar beagle.22Jul22.46e.jar \
    gt=input.vcf.gz \
    ref=reference.vcf.gz \
    map=genetic_map.txt \
    out=phased \
    nthreads=8

Input Preparation

# Filter variants before phasing
bcftools view -m2 -M2 -v snps input.vcf.gz -Oz -o biallelic_snps.vcf.gz

# Remove missing genotypes (optional)
bcftools view -g ^miss biallelic_snps.vcf.gz -Oz -o no_missing.vcf.gz

# Normalize (important!)
bcftools norm -f reference.fa -Oz -o normalized.vcf.gz input.vcf.gz

Check Phasing Results

# View phased genotypes (| instead of /)
bcftools query -f '%CHROM\t%POS\t[%GT\t]\n' phased.vcf.gz | head

# Unphased: 0/1
# Phased: 0|1 or 1|0

# Count phased vs unphased
bcftools query -f '[%GT\n]' phased.vcf.gz | grep -c '|'

Genetic Maps

# Download genetic maps (GRCh38)
wget https://faculty.washington.edu/browning/beagle/genetic_maps/plink.GRCh38.map.zip
unzip plink.GRCh38.map.zip

# Format: chromosome position rate(cM/Mb) genetic_position(cM)
# chr1 55550 2.981822 0.000000

Key Parameters

ParameterBeagleSHAPEIT5Description
Threadsnthreads--threadCPU threads
Windowwindow--windowAnalysis window size
Eff. pop sizene--effective-sizeFor LD modeling
Seedseed--seedRandom seed

Memory Requirements

Dataset SizeBeagle MemorySHAPEIT5 Memory
1,000 samples8 GB4 GB
10,000 samples32 GB16 GB
100,000 samples64+ GB32 GB

Phasing Accuracy Metrics

  • Switch error rate: Rate of phase switches vs truth
  • Mismatch error rate: Overall haplotype differences
  • Measure using trio data or known haplotypes

Related Skills

  • phasing-imputation/genotype-imputation - Impute after phasing
  • phasing-imputation/reference-panels - Get reference data
  • variant-calling/filtering-best-practices - Prepare input VCF
  • population-genetics/linkage-disequilibrium - LD analysis
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->