OpenClaw-Medical-Skills tooluniverse-gwas-finemapping
Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions. Computes posterior probabilities for causal variants, links variants to genes via L2G predictions, annotates functional consequences, and suggests validation strategies. Use when asked to fine-map GWAS loci, prioritize causal variants, identify credible sets, or link GWAS signals to causal genes.
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tooluniverse-gwas-finemapping" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-tooluniverse-gwas-finemapping && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/tooluniverse-gwas-finemapping" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-tooluniverse-gwas-finemapping && rm -rf "$T"
GWAS Fine-Mapping & Causal Variant Prioritization
Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions.
Overview
Genome-wide association studies (GWAS) identify genomic regions associated with traits, but linkage disequilibrium (LD) makes it difficult to pinpoint the causal variant. Fine-mapping uses Bayesian statistical methods to compute the posterior probability that each variant is causal, given the GWAS summary statistics and the LD structure of the region.
This skill provides tools to:
- Prioritize causal variants using fine-mapping posterior probabilities
- Link variants to genes using locus-to-gene (L2G) predictions
- Annotate variants with functional consequences
- Suggest validation strategies based on fine-mapping results
Key Concepts
Credible Sets
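A credible set is the smallest set of variants whose cumulative posterior probability reaches a coverage threshold (typically 95% or 99%), so that, under the fine-mapping model, it contains the causal variant with that probability. Each variant in the set has a posterior probability of being causal, computed using methods like: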
- SuSiE (Sum of Single Effects)
- FINEMAP (Bayesian fine-mapping)
- PAINTOR (Probabilistic Annotation INtegraTOR)
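The coverage rule can be illustrated directly: rank variants by posterior inclusion probability (PIP) and keep adding them until the cumulative PIP reaches the target. This is a minimal sketch assuming a single causal signal with PIPs already computed; SuSiE and FINEMAP build credible sets inside their own models, and the `credible_set` function and the toy PIP values are hypothetical, not part of `python_implementation`.

```python
from __future__ import annotations


def credible_set(pips: dict[str, float], coverage: float = 0.95) -> list[str]:
    """Return the smallest set of variants whose summed PIP reaches `coverage`.

    Assumes a single causal signal, so the PIPs sum to (at most) 1.
    If the PIPs never reach `coverage`, every variant is returned.
    """
    # Highest-PIP variants first, so the set is as small as possible
    ranked = sorted(pips.items(), key=lambda kv: kv[1], reverse=True)
    selected: list[str] = []
    cumulative = 0.0
    for variant, pip in ranked:
        selected.append(variant)
        cumulative += pip
        if cumulative >= coverage:
            break
    return selected


# Toy posterior probabilities (made-up values, for illustration only)
pips = {"rsA": 0.62, "rsB": 0.25, "rsC": 0.09, "rsD": 0.03, "rsE": 0.01}
print(credible_set(pips, 0.95))
```

Raising `coverage` from 0.95 to 0.99 can only add variants, which is why 99% credible sets are never smaller than 95% ones.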
Posterior Probability
The probability that a specific variant is causal, given the GWAS data and LD structure. Higher posterior probability = more likely to be causal.
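Under the simplest model (exactly one causal variant, equal prior for every variant), posterior probabilities can be computed from summary statistics alone with Wakefield's approximate Bayes factors (ABF): each variant's Bayes factor is normalized by the sum over the locus. The sketch below shows that calculation. It is not how SuSiE or FINEMAP work internally (those allow several causal variants and use the LD matrix), and the default `prior_sd = 0.2` for the effect size is an assumption, as are the illustrative effect sizes.

```python
from __future__ import annotations

import math


def wakefield_pips(betas: list[float], ses: list[float], prior_sd: float = 0.2) -> list[float]:
    """Posterior probabilities under a single-causal-variant model (Wakefield ABF).

    prior_sd is the assumed standard deviation of the true effect size (W = prior_sd**2).
    """
    w = prior_sd ** 2
    log_bfs = []
    for beta, se in zip(betas, ses):
        v = se ** 2          # sampling variance of the effect estimate
        r = w / (v + w)      # shrinkage factor W / (V + W)
        z = beta / se
        # log Bayes factor for association vs. no association:
        # 0.5 * [log(1 - r) + r * z^2]
        log_bfs.append(0.5 * (math.log(1 - r) + r * z * z))
    # Uniform prior over variants: PP_i = BF_i / sum_j BF_j (log-sum-exp for stability)
    m = max(log_bfs)
    weights = [math.exp(x - m) for x in log_bfs]
    total = sum(weights)
    return [wt / total for wt in weights]


# Illustrative summary statistics for three variants in one locus
pips = wakefield_pips([0.30, 0.28, 0.10], [0.05, 0.05, 0.05])
print([round(p, 3) for p in pips])
```

With equal standard errors the ranking simply follows |z|; the prior width matters more when standard errors differ across variants.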
Locus-to-Gene (L2G) Predictions
L2G scores come from a machine-learning model that integrates multiple data types to predict which gene is most likely causal at a fine-mapped GWAS locus:
- Distance to gene (closer = higher score)
- eQTL evidence (expression changes)
- Chromatin interactions (Hi-C, promoter capture)
- Functional annotations (coding variants, regulatory regions)
L2G scores range from 0 to 1, with higher scores indicating stronger gene-variant links.
Use Cases
1. Prioritize Variants at a Known Locus
Question: "Which variant at the TCF7L2 locus is likely causal for type 2 diabetes?"
```python
from python_implementation import prioritize_causal_variants

# Prioritize variants in TCF7L2 for diabetes
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")
print(result.get_summary())
# Output shows:
# - Credible sets containing TCF7L2 variants
# - Posterior probabilities (via fine-mapping methods)
# - Top L2G genes (which genes are likely affected)
# - Associated traits
```
2. Fine-Map a Specific Variant
Question: "What do we know about rs429358 (APOE4) from fine-mapping?"
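```python
from python_implementation import prioritize_causal_variants

# Fine-map a specific variant
result = prioritize_causal_variants("rs429358")

# Check which credible sets contain this variant
for cs in result.credible_sets:
    # l2g_genes is ranked by score, so the first entry is the top gene
    top_gene = cs.l2g_genes[0].gene_symbol if cs.l2g_genes else "N/A"
    print(f"Trait: {cs.trait}")
    print(f"Fine-mapping method: {cs.finemapping_method}")
    print(f"Top gene: {top_gene}")
    print(f"Confidence: {cs.confidence}")
```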
3. Explore All Loci from a GWAS Study
Question: "What are all the causal loci from the recent T2D meta-analysis?"
```python
from python_implementation import get_credible_sets_for_study

# Get all fine-mapped loci from a study
credible_sets = get_credible_sets_for_study("GCST90029024")  # T2D GWAS
print(f"Found {len(credible_sets)} independent loci")

# Examine each locus
for cs in credible_sets:
    print(f"\nRegion: {cs.region}")
    lead = cs.lead_variant
    # Guard against a missing lead variant or one without an rsID
    lead_rsid = lead.rs_ids[0] if lead and lead.rs_ids else "N/A"
    print(f"Lead variant: {lead_rsid}")
    if cs.l2g_genes:
        top_gene = cs.l2g_genes[0]
        print(f"Most likely causal gene: {top_gene.gene_symbol} (L2G: {top_gene.l2g_score:.3f})")
```
4. Find GWAS Studies for a Disease
Question: "What GWAS studies exist for Alzheimer's disease?"
```python
from python_implementation import search_gwas_studies_for_disease

# Search by disease name
studies = search_gwas_studies_for_disease("Alzheimer's disease")
for study in studies[:5]:
    print(f"{study['id']}: {study.get('nSamples', 'N/A')} samples")
    print(f"  Author: {study.get('publicationFirstAuthor', 'N/A')}")
    print(f"  Has summary stats: {study.get('hasSumstats', False)}")

# Or use precise disease ontology IDs
studies = search_gwas_studies_for_disease(
    "Alzheimer's disease",
    disease_id="EFO_0000249",  # EFO ID for Alzheimer's
)
```
5. Get Validation Suggestions
Question: "How should we validate the top causal variant?"
```python
from python_implementation import prioritize_causal_variants

result = prioritize_causal_variants("APOE", "alzheimer")

# Get experimental validation suggestions
suggestions = result.get_validation_suggestions()
for suggestion in suggestions:
    print(suggestion)
# Output includes:
# - CRISPR knock-in experiments
# - Reporter assays
# - eQTL analysis
# - Colocalization studies
```
Workflow Example: Complete Fine-Mapping Analysis
```python
from python_implementation import (
    prioritize_causal_variants,
    search_gwas_studies_for_disease,
    get_credible_sets_for_study,
)

# Step 1: Find relevant GWAS studies
print("Step 1: Finding T2D GWAS studies...")
studies = search_gwas_studies_for_disease("type 2 diabetes", "MONDO_0005148")
if not studies:
    raise SystemExit("No T2D GWAS studies found")
largest_study = max(studies, key=lambda s: s.get('nSamples', 0) or 0)
print(f"Largest study: {largest_study['id']} ({largest_study.get('nSamples', 'N/A')} samples)")

# Step 2: Get all fine-mapped loci from the study
print("\nStep 2: Getting fine-mapped loci...")
credible_sets = get_credible_sets_for_study(largest_study['id'], max_sets=100)
print(f"Found {len(credible_sets)} credible sets")

# Step 3: Find loci where TCF7L2 is among the L2G-predicted genes
print("\nStep 3: Finding TCF7L2 loci...")
tcf7l2_loci = [
    cs for cs in credible_sets
    if any(gene.gene_symbol == "TCF7L2" for gene in cs.l2g_genes)
]
print(f"TCF7L2 appears in {len(tcf7l2_loci)} loci")

# Step 4: Prioritize variants at TCF7L2
print("\nStep 4: Prioritizing TCF7L2 variants...")
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")

# Step 5: Print summary and validation plan
print("\n" + "=" * 60)
print("FINE-MAPPING SUMMARY")
print("=" * 60)
print(result.get_summary())

print("\n" + "=" * 60)
print("VALIDATION STRATEGY")
print("=" * 60)
for suggestion in result.get_validation_suggestions():
    print(suggestion)
```
Data Classes
FineMappingResult
Main result object containing:
- query_variant: Variant annotation
- query_gene: Gene symbol (if queried by gene)
- credible_sets: List of fine-mapped loci
- associated_traits: All associated traits
- top_causal_genes: L2G genes ranked by score

Methods:
- get_summary(): Human-readable summary
- get_validation_suggestions(): Experimental validation strategies
CredibleSet
Represents a fine-mapped locus:
- study_locus_id: Unique identifier
- region: Genomic region (e.g., "10:112861809-113404438")
- lead_variant: Top variant by posterior probability
- finemapping_method: Statistical method used (SuSiE, FINEMAP, etc.)
- l2g_genes: Locus-to-gene predictions
- confidence: Credible set confidence (95%, 99%)
L2GGene
Locus-to-gene prediction:
- gene_symbol: Gene name (e.g., "TCF7L2")
- gene_id: Ensembl gene ID
- l2g_score: Probability score (0-1)
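When a gene is predicted at several loci, its scores need to be combined before ranking. The sketch below uses small stand-in dataclasses that mirror only the fields listed above and ranks genes by their best L2G score across credible sets. Taking the maximum is an assumption chosen for illustration; it is not necessarily how top_causal_genes is built, and `rank_genes`, the gene symbols, and the IDs are hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class L2GGene:  # stand-in mirroring the documented fields
    gene_symbol: str
    gene_id: str
    l2g_score: float


@dataclass
class CredibleSet:  # stand-in with only the fields this sketch needs
    study_locus_id: str
    l2g_genes: list[L2GGene] = field(default_factory=list)


def rank_genes(credible_sets: list[CredibleSet], min_score: float = 0.5) -> list[tuple[str, float]]:
    """Rank genes by their best L2G score across credible sets (assumed rule)."""
    best: dict[str, float] = {}
    for cs in credible_sets:
        for gene in cs.l2g_genes:
            if gene.l2g_score >= min_score:
                best[gene.gene_symbol] = max(best.get(gene.gene_symbol, 0.0), gene.l2g_score)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


# Placeholder data: two loci, one gene predicted at both
sets = [
    CredibleSet("locus1", [L2GGene("GENE1", "ENSG_A", 0.82), L2GGene("GENE2", "ENSG_B", 0.41)]),
    CredibleSet("locus2", [L2GGene("GENE1", "ENSG_A", 0.67), L2GGene("GENE3", "ENSG_C", 0.55)]),
]
print(rank_genes(sets))
```

Counting how many loci point to the same gene is a reasonable alternative tie-breaker when several genes share similar maximum scores.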
VariantAnnotation
Functional annotation for a variant:
- variant_id: Open Targets format (chr_pos_ref_alt)
- rs_ids: dbSNP identifiers
- chromosome, position: Genomic coordinates
- most_severe_consequence: Functional impact
- allele_frequencies: Population-specific MAFs
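Since variant_id follows the chr_pos_ref_alt pattern and CredibleSet.region uses "chrom:start-end", both can be parsed with plain string handling, for example to check whether a variant lies inside a fine-mapped region. This is a sketch assuming those exact formats with inclusive coordinates on the same assembly; the helper names and the example variant ID are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class Variant(NamedTuple):
    chromosome: str
    position: int
    ref: str
    alt: str


def parse_variant_id(variant_id: str) -> Variant:
    """Split an Open Targets-style ID of the form chr_pos_ref_alt."""
    chrom, pos, ref, alt = variant_id.split("_")
    return Variant(chrom, int(pos), ref, alt)


def parse_region(region: str) -> tuple[str, int, int]:
    """Split a region string of the form 'chrom:start-end'."""
    chrom, span = region.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def variant_in_region(variant_id: str, region: str) -> bool:
    """True if the variant's position falls within the region (inclusive bounds assumed)."""
    v = parse_variant_id(variant_id)
    chrom, start, end = parse_region(region)
    return v.chromosome == chrom and start <= v.position <= end


# Hypothetical variant ID tested against the example region above
print(variant_in_region("10_113000000_C_T", "10:112861809-113404438"))
```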
Tools Used
Open Targets Genetics (GraphQL)
- OpenTargets_get_variant_info: Variant details and allele frequencies
- OpenTargets_get_variant_credible_sets: Credible sets containing a variant
- OpenTargets_get_credible_set_detail: Detailed credible set information
- OpenTargets_get_study_credible_sets: All loci from a GWAS study
- OpenTargets_search_gwas_studies_by_disease: Find studies by disease
GWAS Catalog (REST API)
- gwas_search_snps: Find SNPs by gene or rsID
- gwas_get_snp_by_id: Detailed SNP information
- gwas_get_associations_for_snp: All trait associations for a variant
- gwas_search_studies: Find studies by disease/trait
Understanding Fine-Mapping Output
Interpreting Posterior Probabilities
- > 0.5: Very likely causal (strong candidate)
- 0.1 - 0.5: Plausible causal variant
- 0.01 - 0.1: Possible but uncertain
- < 0.01: Unlikely to be causal
Interpreting L2G Scores
- > 0.7: High confidence gene-variant link
- 0.5 - 0.7: Moderate confidence
- 0.3 - 0.5: Weak but possible link
- < 0.3: Low confidence
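Both scales can be applied with one lookup over ordered thresholds. This sketch assumes lower bounds are inclusive, so a value exactly on a boundary (e.g. a posterior probability of 0.5) receives the higher label; the source lists do not settle boundary cases, and the function and table names are hypothetical.

```python
from __future__ import annotations

# (lower bound, label) pairs, highest first; lower bounds treated as inclusive (assumption)
PIP_BINS = [(0.5, "very likely causal"), (0.1, "plausible"), (0.01, "possible but uncertain")]
L2G_BINS = [(0.7, "high confidence"), (0.5, "moderate"), (0.3, "weak but possible")]


def classify(score: float, bins: list[tuple[float, str]], floor_label: str) -> str:
    """Return the label of the first bin whose lower bound the score reaches."""
    for lower, label in bins:
        if score >= lower:
            return label
    return floor_label


print(classify(0.62, PIP_BINS, "unlikely"), "/", classify(0.45, L2G_BINS, "low confidence"))
```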
Fine-Mapping Methods Compared
| Method | Approach | Strengths | Use Case |
|---|---|---|---|
| SuSiE | Sum of Single Effects | Handles multiple causal variants | Multi-signal loci |
| FINEMAP | Bayesian shotgun stochastic search | Fast, scalable | Large studies |
| PAINTOR | Functional annotations as priors | Integrates epigenomics | Regulatory variants |
| eCAVIAR (CAVIAR extension) | Colocalization | Finds shared causal variants between GWAS and eQTL | eQTL overlap |
Common Questions
Q: Why don't all variants have credible sets? A: Fine-mapping requires:
- GWAS summary statistics (not just top hits)
- LD reference panel
- A genome-wide significant signal (conventionally p < 5e-8)
- Computational resources
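The signal-strength requirement can be checked from summary statistics: the Wald z-score beta / se gives a two-sided p-value through the normal tail, which is then compared with 5e-8. This is a sketch assuming a large-sample normal approximation; the helper name and the example numbers are illustrative only.

```python
import math

GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold


def two_sided_p(beta: float, se: float) -> float:
    """Two-sided p-value for a Wald test, z = beta / se, under a normal approximation."""
    z = abs(beta / se)
    # P(|Z| >= z) for standard normal Z equals erfc(z / sqrt(2))
    return math.erfc(z / math.sqrt(2))


p = two_sided_p(0.12, 0.02)  # z = 6 (illustrative numbers)
print(f"p = {p:.2e}, genome-wide significant: {p < GENOME_WIDE}")
```

The threshold corresponds to |z| of roughly 5.45, so a locus can look convincing on a plot yet still fall short of the bar for fine-mapping.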
Q: Can a variant be in multiple credible sets? A: Yes! A variant can be causal for multiple traits (pleiotropy) or appear in different studies for the same trait.
Q: What if the top L2G gene is far from the variant? A: This suggests a long-range regulatory effect, for example an enhancer acting on a distant gene's promoter. Check:
- eQTL evidence in relevant tissues
- Chromatin interaction data (Hi-C)
- Regulatory element annotations (Roadmap, ENCODE)
Q: How do I choose between variants in a credible set? A: Prioritize by:
- Posterior probability (higher = better)
- Functional consequence (coding > regulatory > intergenic)
- eQTL evidence
- Evolutionary conservation
- Experimental feasibility
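The first three criteria can be combined as an ordered sort. This sketch is a hypothetical illustration, not the skill's implementation: it ranks by posterior probability first, then by consequence class (coding > regulatory > intergenic), then by eQTL support. The `Candidate` record, the consequence labels, and the placeholder IDs are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed ordering of consequence classes: lower rank = higher priority
CONSEQUENCE_RANK = {"coding": 0, "regulatory": 1, "intergenic": 2}


@dataclass
class Candidate:  # hypothetical record for one variant in a credible set
    rs_id: str
    pip: float
    consequence: str  # expected to be one of the CONSEQUENCE_RANK keys
    has_eqtl: bool = False


def prioritize(candidates: list[Candidate]) -> list[Candidate]:
    """Sort by PIP (descending), then consequence class, then eQTL support."""
    unknown = len(CONSEQUENCE_RANK)  # unrecognized consequences sort last
    return sorted(
        candidates,
        key=lambda c: (-c.pip, CONSEQUENCE_RANK.get(c.consequence, unknown), not c.has_eqtl),
    )


candidates = [
    Candidate("rsA", 0.30, "intergenic"),
    Candidate("rsB", 0.30, "coding"),
    Candidate("rsC", 0.60, "regulatory", has_eqtl=True),
]
print([c.rs_id for c in prioritize(candidates)])
```

Because the sort is lexicographic, consequence and eQTL evidence only break ties in posterior probability. A weighted score would let them override small PIP differences, but any weights would be arbitrary without calibration data.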
Limitations
- LD-dependent: Fine-mapping accuracy depends on the LD reference panel matching the LD structure of the study population
- Requires summary stats: Not all studies provide full summary statistics
- Computationally intensive: Fine-mapping large studies requires substantial compute resources
- Prior assumptions: Bayesian methods depend on priors (number of causal variants, effect sizes)
- Missing data: Not all GWAS loci have been fine-mapped in Open Targets
Best Practices
- Start with study-level queries when exploring a new disease
- Check multiple studies for replication of signals
- Combine with functional data (eQTLs, chromatin, CRISPR screens)
- Consider ancestry - LD differs across populations
- Validate experimentally - fine-mapping provides candidates, not proof
References
- Wang et al. (2020) "A simple new approach to variable selection in regression, with application to genetic fine mapping." JRSS-B (SuSiE)
- Benner et al. (2016) "FINEMAP: efficient variable selection using summary data from genome-wide association studies." Bioinformatics
- Ghoussaini et al. (2021) "Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics." NAR
- Mountjoy et al. (2021) "An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci." Nat Genet
Related Skills
- tooluniverse-gwas-explorer: Broader GWAS analysis
- tooluniverse-eqtl-colocalization: Link variants to gene expression
- tooluniverse-gene-prioritization: Systematic gene ranking