git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/popeve-variant-predictor-agent" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-popeve-variant-predictor-agent && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/popeve-variant-predictor-agent" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-popeve-variant-predictor-agent && rm -rf "$T"
skills/popeve-variant-predictor-agent/SKILL.mdname: 'popeve-variant-predictor-agent' description: 'AI-powered genetic variant pathogenicity prediction using PopEVE deep learning model for population-aware disease variant identification and rare disease diagnosis.' measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
PopEVE Variant Predictor Agent
The PopEVE Variant Predictor Agent leverages the PopEVE deep learning model from Harvard Medical School to predict pathogenicity of genetic variants. PopEVE analyzes evolutionary conservation, protein structure, and population frequency to identify disease-causing variants, having identified over 100 previously unrecognized variants responsible for undiagnosed rare genetic diseases.
When to Use This Skill
- When predicting pathogenicity of missense variants genome-wide.
- For rare disease diagnosis with variants of uncertain significance (VUS).
- To prioritize candidate variants in exome/genome sequencing.
- When interpreting novel variants not in ClinVar or literature.
- For population-stratified variant interpretation.
Core Capabilities
-
Pathogenicity Prediction: Score any missense variant for disease likelihood.
-
VUS Resolution: Reclassify variants of uncertain significance.
-
Rare Disease Diagnosis: Identify causal variants in undiagnosed patients.
-
Population-Aware Scoring: Account for ancestry-specific variant frequencies.
-
Protein Context Analysis: Integrate structural and functional domains.
-
Batch Variant Scoring: Process thousands of variants efficiently.
Model Architecture
| Component | Description | Data Source |
|---|---|---|
| Evolutionary Module | Deep sequence alignment | UniRef90, 250M seqs |
| Structural Module | AlphaFold2 structures | 200M+ structures |
| Population Module | gnomAD frequencies | 800K+ individuals |
| Clinical Module | ClinVar training | 100K+ classifications |
| Integration | Multi-task neural network | Combined features |
Scoring Thresholds
| PopEVE Score | Interpretation | Suggested Action |
|---|---|---|
| > 0.9 | Likely Pathogenic | High priority |
| 0.7 - 0.9 | Possibly Pathogenic | Review carefully |
| 0.3 - 0.7 | Uncertain | Additional evidence needed |
| 0.1 - 0.3 | Possibly Benign | Lower priority |
| < 0.1 | Likely Benign | Deprioritize |
Workflow
-
Input: VCF file, gene list, or individual variants.
-
Annotation: Map variants to transcripts and proteins.
-
Feature Extraction: Compute evolutionary, structural, population features.
-
Prediction: Run PopEVE model for pathogenicity scores.
-
Population Adjustment: Apply ancestry-specific calibration.
-
Ranking: Prioritize variants by score and gene relevance.
-
Output: Scored variants with interpretations.
Example Usage
User: "Score all missense variants from this rare disease patient's exome to identify potential causal variants."
Agent Action:
python3 Skills/Genomics/PopEVE_Variant_Predictor_Agent/popeve_predict.py \ --vcf patient_exome.vcf \ --genome GRCh38 \ --ancestry EUR \ --gene_panel rare_disease_genes.txt \ --min_score 0.5 \ --output pathogenicity_scores.tsv
Input Formats
| Format | Description | Example |
|---|---|---|
| VCF | Standard variant calls | patient.vcf.gz |
| TSV | Simple variant list | chr, pos, ref, alt |
| HGVS | Protein notation | NP_000546.1:p.Arg248Gln |
| Gene + Position | Gene-centric | TP53:R248Q |
Output Components
| Column | Description |
|---|---|
| Variant | Genomic/protein notation |
| PopEVE_Score | 0-1 pathogenicity score |
| Classification | Benign/VUS/Pathogenic |
| Confidence | Prediction confidence |
| EVE_Score | Evolutionary component |
| Structure_Score | Structural impact |
| Population_AF | Population frequency |
| Gene | Affected gene |
| Domain | Protein domain affected |
| ClinVar | Existing classification if any |
Comparison with Other Tools
| Tool | PopEVE Advantage |
|---|---|
| SIFT/PolyPhen | More accurate, deep learning |
| CADD | Population-aware, less bias |
| REVEL | Better rare variant handling |
| AlphaMissense | Complimentary; can ensemble |
| ClinVar | Scores novel variants |
AI/ML Components
Deep Learning Architecture:
- Transformer for sequence context
- Graph neural network for structure
- Variational autoencoder for evolution
- Gradient boosting for integration
Training Strategy:
- Semi-supervised with ClinVar labels
- Evolutionary likelihood (unsupervised)
- Population frequency calibration
- Cross-validation across genes
Population Modeling:
- Ancestry-specific allele frequencies
- Selection coefficient estimation
- Demographic history modeling
Performance Metrics
| Metric | PopEVE | AlphaMissense | REVEL |
|---|---|---|---|
| AUROC (ClinVar) | 0.95 | 0.94 | 0.92 |
| AUROC (DMS) | 0.89 | 0.90 | 0.85 |
| VUS Resolution | 45% | 40% | 35% |
| Cross-ancestry | 0.93 | 0.91 | 0.88 |
Prerequisites
- Python 3.10+
- PyTorch, transformers
- PopEVE model weights
- Reference genome (GRCh37/38)
- VEP or similar annotator
- gnomAD database access
Related Skills
- AlphaMissense_Agent - For AlphaMissense predictions
- DiagAI_Agent - For clinical diagnosis support
- ACMG_Classifier_Agent - For ACMG classification
- Pharmacogenomics_Agent - For drug-gene variants
Disease Categories
| Category | Example Genes | PopEVE Performance |
|---|---|---|
| Cardiomyopathy | MYH7, MYBPC3 | Excellent |
| Neurological | SCN1A, KCNQ2 | Excellent |
| Cancer Predisposition | BRCA1, TP53 | Good-Excellent |
| Metabolic | PAH, CFTR | Good |
| Immunodeficiency | BTK, WAS | Good |
Special Considerations
- Gene Coverage: Best for well-conserved genes with orthologs
- Protein-Coding Only: Missense variants in coding regions
- Novel Genes: Lower confidence for poorly characterized genes
- Ancestry Calibration: Use appropriate population reference
- Ensemble Approach: Combine with AlphaMissense for best results
Clinical Integration
| Step | Action |
|---|---|
| 1 | Run PopEVE on all coding variants |
| 2 | Filter by phenotype-relevant genes |
| 3 | Rank by PopEVE score |
| 4 | Review top candidates |
| 5 | Apply ACMG criteria with PopEVE as evidence |
| 6 | Validate with functional studies if available |
Author
AI Group - Biomedical AI Platform
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->