OpenClaw-Medical-Skills tooluniverse-gwas-finemapping

Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions. Computes posterior probabilities for causal variants, links variants to genes via L2G predictions, annotates functional consequences, and suggests validation strategies. Use when asked to fine-map GWAS loci, prioritize causal variants, identify credible sets, or link GWAS signals to causal genes.

install

source · Clone the upstream repo

git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tooluniverse-gwas-finemapping" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-tooluniverse-gwas-finemapping && rm -rf "$T"

OpenClaw · Install into ~/.openclaw/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/tooluniverse-gwas-finemapping" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-tooluniverse-gwas-finemapping && rm -rf "$T"

manifest: skills/tooluniverse-gwas-finemapping/SKILL.md

GWAS Fine-Mapping & Causal Variant Prioritization

Identify and prioritize causal variants at GWAS loci using statistical fine-mapping and locus-to-gene predictions.

Overview

Genome-wide association studies (GWAS) identify genomic regions associated with traits, but linkage disequilibrium (LD) makes it difficult to pinpoint the causal variant. Fine-mapping uses Bayesian statistical methods to compute the posterior probability that each variant is causal, given the GWAS summary statistics.

This skill provides tools to:

Prioritize causal variants using fine-mapping posterior probabilities
Link variants to genes using locus-to-gene (L2G) predictions
Annotate variants with functional consequences
Suggest validation strategies based on fine-mapping results

Key Concepts

Credible Sets

A credible set is the smallest set of variants that contains the causal variant with a stated probability (typically 95% or 99%), under the assumptions of the fine-mapping model. Each variant in the set has a posterior probability of being causal, computed using methods like:

SuSiE (Sum of Single Effects)
FINEMAP (Bayesian fine-mapping)
PAINTOR (Probabilistic Annotation INtegraTOR)
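
The credible set definition above can be sketched in a few lines. This is a minimal illustration, not the skill's implementation: it assumes a single causal signal whose posterior probabilities sum to about 1, and the function name `build_credible_set` and the toy values are hypothetical.

```python
from typing import Dict, List, Tuple


def build_credible_set(
    pips: Dict[str, float], coverage: float = 0.95
) -> List[Tuple[str, float]]:
    """Return the smallest set of variants whose posteriors sum to >= coverage.

    Assumes `pips` describes ONE signal (posteriors sum to roughly 1).
    If the total never reaches `coverage`, every variant is returned.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")

    # Greedy construction: take variants from highest to lowest posterior.
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    credible_set: List[Tuple[str, float]] = []
    cumulative = 0.0
    for variant_id, pip in ranked:
        credible_set.append((variant_id, pip))
        cumulative += pip
        if cumulative >= coverage:
            break
    return credible_set


# Toy posteriors for one signal (made-up values, not real data).
toy_pips = {"var_a": 0.62, "var_b": 0.25, "var_c": 0.09, "var_d": 0.03, "var_e": 0.01}
cs95 = build_credible_set(toy_pips, coverage=0.95)
print([variant_id for variant_id, _ in cs95])
```

A smaller credible set means the signal is better resolved; a set of one variant at 95% coverage is the ideal case.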

Posterior Probability

The probability that a specific variant is causal, given the GWAS data and LD structure. Higher posterior probability = more likely to be causal.
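
To make the idea concrete, the sketch below computes posterior probabilities from summary statistics using Wakefield's approximate Bayes factor. It rests on strong assumptions that are not taken from this skill: exactly one causal variant in the region, an equal prior for every variant, and a prior effect-size standard deviation of 0.2 (an illustrative choice). Methods such as SuSiE and FINEMAP relax the single-variant assumption and use the LD matrix; the skill itself retrieves precomputed results rather than running this calculation.

```python
import math
from typing import List, Sequence


def log_abf(beta: float, se: float, prior_sd: float = 0.2) -> float:
    """Log approximate Bayes factor in favour of association (Wakefield).

    ABF = sqrt(V / (V + W)) * exp(z^2 / 2 * W / (V + W)),
    with V = se^2 (sampling variance) and W = prior_sd^2 (prior variance).
    """
    v = se ** 2
    w = prior_sd ** 2
    z = beta / se
    shrinkage = w / (v + w)
    return 0.5 * math.log(1.0 - shrinkage) + 0.5 * z * z * shrinkage


def single_signal_pips(
    betas: Sequence[float], ses: Sequence[float], prior_sd: float = 0.2
) -> List[float]:
    """Posterior probability per variant, assuming exactly one is causal."""
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and equal in length")
    log_bfs = [log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    # Subtract the maximum before exponentiating to avoid overflow.
    max_log = max(log_bfs)
    weights = [math.exp(value - max_log) for value in log_bfs]
    total = sum(weights)
    return [weight / total for weight in weights]


# Toy effect sizes and standard errors for three variants (made-up values).
toy_betas = [0.10, 0.09, 0.02]
toy_ses = [0.015, 0.015, 0.015]
toy_posteriors = single_signal_pips(toy_betas, toy_ses)
for beta, pip in zip(toy_betas, toy_posteriors):
    print(f"beta={beta:.2f}  posterior={pip:.3f}")
```

Because the posteriors are normalized across the region, they depend on which variants are included: a variant missing from the summary statistics cannot receive any probability.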

Locus-to-Gene (L2G) Predictions

L2G scores integrate multiple data types to predict which gene is affected by a variant:

Distance to gene (closer = higher score)
eQTL evidence (expression changes)
Chromatin interactions (Hi-C, promoter capture)
Functional annotations (coding variants, regulatory regions)

L2G scores range from 0 to 1, with higher scores indicating stronger gene-variant links.

Use Cases

1. Prioritize Variants at a Known Locus

Question: "Which variant at the TCF7L2 locus is likely causal for type 2 diabetes?"

from python_implementation import prioritize_causal_variants

# Prioritize variants in TCF7L2 for diabetes
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")
print(result.get_summary())

# Output shows:
# - Credible sets containing TCF7L2 variants
# - Posterior probabilities (via fine-mapping methods)
# - Top L2G genes (which genes are likely affected)
# - Associated traits

2. Fine-Map a Specific Variant

Question: "What do we know about rs429358 (APOE4) from fine-mapping?"

from python_implementation import prioritize_causal_variants

# Fine-map a specific variant
result = prioritize_causal_variants("rs429358")

# Check which credible sets contain this variant
for cs in result.credible_sets:
    print(f"Trait: {cs.trait}")
    print(f"Fine-mapping method: {cs.finemapping_method}")
    # Print the gene symbol rather than the whole L2GGene object
    print(f"Top gene: {cs.l2g_genes[0].gene_symbol if cs.l2g_genes else 'N/A'}")
    print(f"Confidence: {cs.confidence}")

3. Explore All Loci from a GWAS Study

Question: "What are all the causal loci from the recent T2D meta-analysis?"

from python_implementation import get_credible_sets_for_study

# Get all fine-mapped loci from a study
credible_sets = get_credible_sets_for_study("GCST90029024")  # T2D GWAS

print(f"Found {len(credible_sets)} independent loci")

# Examine each locus
for cs in credible_sets:
    print(f"\nRegion: {cs.region}")
    # Guard against a missing lead variant or an empty rsID list
    lead = cs.lead_variant
    lead_id = lead.rs_ids[0] if lead and lead.rs_ids else 'N/A'
    print(f"Lead variant: {lead_id}")

    if cs.l2g_genes:
        top_gene = cs.l2g_genes[0]
        print(f"Most likely causal gene: {top_gene.gene_symbol} (L2G: {top_gene.l2g_score:.3f})")

4. Find GWAS Studies for a Disease

Question: "What GWAS studies exist for Alzheimer's disease?"

from python_implementation import search_gwas_studies_for_disease

# Search by disease name
studies = search_gwas_studies_for_disease("Alzheimer's disease")

for study in studies[:5]:
    print(f"{study['id']}: {study.get('nSamples', 'N/A')} samples")
    print(f"   Author: {study.get('publicationFirstAuthor', 'N/A')}")
    print(f"   Has summary stats: {study.get('hasSumstats', False)}")

# Or use precise disease ontology IDs
studies = search_gwas_studies_for_disease(
    "Alzheimer's disease",
    disease_id="EFO_0000249"  # EFO ID for Alzheimer's
)

5. Get Validation Suggestions

Question: "How should we validate the top causal variant?"

from python_implementation import prioritize_causal_variants

result = prioritize_causal_variants("APOE", "alzheimer")

# Get experimental validation suggestions
suggestions = result.get_validation_suggestions()
for suggestion in suggestions:
    print(suggestion)

# Output includes:
# - CRISPR knock-in experiments
# - Reporter assays
# - eQTL analysis
# - Colocalization studies

Workflow Example: Complete Fine-Mapping Analysis

from python_implementation import (
    prioritize_causal_variants,
    search_gwas_studies_for_disease,
    get_credible_sets_for_study
)

# Step 1: Find relevant GWAS studies
print("Step 1: Finding T2D GWAS studies...")
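studies = search_gwas_studies_for_disease("type 2 diabetes", disease_id="MONDO_0005148")
if not studies:
    raise SystemExit("No GWAS studies found for this disease")
# Treat a missing or null sample count as 0 when picking the largest study
largest_study = max(studies, key=lambda s: s.get('nSamples', 0) or 0)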
print(f"Largest study: {largest_study['id']} ({largest_study.get('nSamples', 'N/A')} samples)")

# Step 2: Get all fine-mapped loci from the study
print("\nStep 2: Getting fine-mapped loci...")
credible_sets = get_credible_sets_for_study(largest_study['id'], max_sets=100)
print(f"Found {len(credible_sets)} credible sets")

# Step 3: Find loci near genes of interest
print("\nStep 3: Finding TCF7L2 loci...")
tcf7l2_loci = [
    cs for cs in credible_sets
    # `or []` tolerates credible sets without L2G predictions
    if any(gene.gene_symbol == "TCF7L2" for gene in (cs.l2g_genes or []))
]

print(f"TCF7L2 appears in {len(tcf7l2_loci)} loci")

# Step 4: Prioritize variants at TCF7L2
print("\nStep 4: Prioritizing TCF7L2 variants...")
result = prioritize_causal_variants("TCF7L2", "type 2 diabetes")

# Step 5: Print summary and validation plan
print("\n" + "="*60)
print("FINE-MAPPING SUMMARY")
print("="*60)
print(result.get_summary())

print("\n" + "="*60)
print("VALIDATION STRATEGY")
print("="*60)
suggestions = result.get_validation_suggestions()
for suggestion in suggestions:
    print(suggestion)

Data Classes

FineMappingResult

Main result object containing:

`query_variant`: Variant annotation
`query_gene`: Gene symbol (if queried by gene)
`credible_sets`: List of fine-mapped loci
`associated_traits`: All associated traits
`top_causal_genes`: L2G genes ranked by score

Methods:

`get_summary()`: Human-readable summary
`get_validation_suggestions()`: Experimental validation strategies

CredibleSet

Represents a fine-mapped locus:

`study_locus_id`: Unique identifier
`region`: Genomic region (e.g., "10:112861809-113404438")
`lead_variant`: Top variant by posterior probability
`finemapping_method`: Statistical method used (SuSiE, FINEMAP, etc.)
`l2g_genes`: Locus-to-gene predictions
`confidence`: Credible set confidence (95%, 99%)

L2GGene

Locus-to-gene prediction:

`gene_symbol`: Gene name (e.g., "TCF7L2")
`gene_id`: Ensembl gene ID
`l2g_score`: Probability score (0-1)

VariantAnnotation

Functional annotation for a variant:

`variant_id`: Open Targets format (chr_pos_ref_alt)
`rs_ids`: dbSNP identifiers
`chromosome`, `position`: Genomic coordinates
`most_severe_consequence`: Functional impact
`allele_frequencies`: Population-specific MAFs
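
For orientation, the field lists above could be modelled roughly as the dataclasses below. This is a hypothetical reconstruction for illustration only: the field names come from this document, but every type, default, and the placeholder values in the example are assumptions, and the real `python_implementation` module may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VariantAnnotation:
    variant_id: str                                  # chr_pos_ref_alt
    rs_ids: List[str] = field(default_factory=list)
    chromosome: Optional[str] = None
    position: Optional[int] = None
    most_severe_consequence: Optional[str] = None
    allele_frequencies: Dict[str, float] = field(default_factory=dict)


@dataclass
class L2GGene:
    gene_symbol: str
    gene_id: str       # Ensembl gene ID
    l2g_score: float   # 0-1


@dataclass
class CredibleSet:
    study_locus_id: str
    region: Optional[str] = None
    lead_variant: Optional[VariantAnnotation] = None
    finemapping_method: Optional[str] = None
    l2g_genes: List[L2GGene] = field(default_factory=list)
    confidence: Optional[str] = None


@dataclass
class FineMappingResult:
    query_variant: Optional[VariantAnnotation] = None
    query_gene: Optional[str] = None
    credible_sets: List[CredibleSet] = field(default_factory=list)
    associated_traits: List[str] = field(default_factory=list)
    top_causal_genes: List[L2GGene] = field(default_factory=list)


# Example with placeholder identifiers and a made-up score.
example_set = CredibleSet(
    study_locus_id="example-locus-id",
    region="10:112861809-113404438",
    finemapping_method="SuSiE",
    l2g_genes=[L2GGene("TCF7L2", "ENSG_PLACEHOLDER", 0.85)],
    confidence="95%",
)
example_result = FineMappingResult(query_gene="TCF7L2", credible_sets=[example_set])
print(example_result.credible_sets[0].l2g_genes[0].gene_symbol)
```

Defaulting list fields to empty lists rather than `None` keeps loops such as `for gene in cs.l2g_genes` safe when a locus has no predictions.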

Tools Used

Open Targets Genetics (GraphQL)

`OpenTargets_get_variant_info`: Variant details and allele frequencies
`OpenTargets_get_variant_credible_sets`: Credible sets containing a variant
`OpenTargets_get_credible_set_detail`: Detailed credible set information
`OpenTargets_get_study_credible_sets`: All loci from a GWAS study
`OpenTargets_search_gwas_studies_by_disease`: Find studies by disease

GWAS Catalog (REST API)

`gwas_search_snps`: Find SNPs by gene or rsID
`gwas_get_snp_by_id`: Detailed SNP information
`gwas_get_associations_for_snp`: All trait associations for a variant
`gwas_search_studies`: Find studies by disease/trait

Understanding Fine-Mapping Output

Interpreting Posterior Probabilities

> 0.5: Very likely causal (strong candidate)
0.1 - 0.5: Plausible causal variant
0.01 - 0.1: Possible but uncertain
< 0.01: Unlikely to be causal

Interpreting L2G Scores

> 0.7: High confidence gene-variant link
0.5 - 0.7: Moderate confidence
0.3 - 0.5: Weak but possible link
< 0.3: Low confidence
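
The two rubrics above can be written as small lookup functions, which is handy when labelling many variants or genes at once. The function names are hypothetical, and the handling of exact boundary values (for example, a posterior of exactly 0.5 falling into the "plausible" tier) is an assumption, since the ranges above do not specify it. The thresholds themselves are rules of thumb rather than statistical cut-offs.

```python
def posterior_tier(pip: float) -> str:
    """Map a posterior probability to the interpretation tiers listed above."""
    if not 0.0 <= pip <= 1.0:
        raise ValueError("posterior probability must be in [0, 1]")
    if pip > 0.5:
        return "very likely causal"
    if pip >= 0.1:
        return "plausible"
    if pip >= 0.01:
        return "possible but uncertain"
    return "unlikely"


def l2g_tier(score: float) -> str:
    """Map an L2G score to the confidence tiers listed above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("L2G score must be in [0, 1]")
    if score > 0.7:
        return "high"
    if score >= 0.5:
        return "moderate"
    if score >= 0.3:
        return "weak"
    return "low"


# Toy values (made up) to show the labels.
for pip in (0.83, 0.22, 0.04, 0.002):
    print(f"posterior {pip}: {posterior_tier(pip)}")
for score in (0.91, 0.55, 0.35, 0.10):
    print(f"L2G {score}: {l2g_tier(score)}")
```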

Fine-Mapping Methods Compared

Method	Approach	Strengths	Use Case
SuSiE	Sum of Single Effects	Handles multiple causal variants	Multi-signal loci
FINEMAP	Bayesian shotgun stochastic search	Fast, scalable	Large studies
PAINTOR	Functional annotations	Integrates epigenomics	Regulatory variants
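CAVIAR	Probabilistic fine-mapping from summary statistics and LD	Allows multiple causal variants per locus	Loci with a few causal variants (eCAVIAR extends it to eQTL colocalization)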

Common Questions

Q: Why don't all variants have credible sets? A: Fine-mapping requires:

GWAS summary statistics (not just top hits)
LD reference panel
Sufficient signal strength (p < 5e-8)
Computational resources

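Q: Can a variant be in multiple credible sets? A: Yes. A variant can appear in credible sets for several traits (possible pleiotropy) or in different studies of the same trait. Membership in a credible set marks it as a candidate, not as a confirmed causal variant.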

Q: What if the top L2G gene is far from the variant? A: This may indicate a long-range regulatory effect (for example, a distal enhancer). Check:

eQTL evidence in relevant tissues
Chromatin interaction data (Hi-C)
Regulatory element annotations (Roadmap, ENCODE)

Q: How do I choose between variants in a credible set? A: Prioritize by:

Posterior probability (higher = better)
Functional consequence (coding > regulatory > intergenic)
eQTL evidence
Evolutionary conservation
Experimental feasibility
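
One way to apply the first three criteria is a lexicographic sort: posterior probability first, then consequence class, then eQTL support. The sketch below is only an illustration. The `Candidate` class, the three consequence classes, and the choice to compare posteriors at a resolution of 0.01 (so that near-ties are broken by annotation) are all assumptions; conservation and experimental feasibility are left to manual review.

```python
from dataclasses import dataclass
from typing import List

# Lower rank = higher priority, following "coding > regulatory > intergenic".
CONSEQUENCE_RANK = {"coding": 0, "regulatory": 1, "intergenic": 2}


@dataclass
class Candidate:
    variant_id: str
    posterior: float
    consequence_class: str   # one of the CONSEQUENCE_RANK keys
    has_eqtl: bool = False


def rank_candidates(
    candidates: List[Candidate], pip_resolution: float = 0.01
) -> List[Candidate]:
    """Sort by posterior (descending), then consequence, then eQTL support."""

    def sort_key(candidate: Candidate):
        # Bucket posteriors so tiny differences do not override annotation.
        pip_bucket = round(candidate.posterior / pip_resolution)
        consequence = CONSEQUENCE_RANK.get(
            candidate.consequence_class, len(CONSEQUENCE_RANK)
        )
        return (-pip_bucket, consequence, 0 if candidate.has_eqtl else 1)

    return sorted(candidates, key=sort_key)


# Toy credible set (made-up variants and values).
toy_candidates = [
    Candidate("var_a", 0.41, "intergenic"),
    Candidate("var_b", 0.41, "coding"),
    Candidate("var_c", 0.15, "regulatory", has_eqtl=True),
]
ranked_candidates = rank_candidates(toy_candidates)
print([candidate.variant_id for candidate in ranked_candidates])
```

A strict lexicographic order lets posterior probability dominate; a weighted score would be a reasonable alternative when annotation evidence should be able to outweigh a modest posterior gap.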

Limitations

LD-dependent: Fine-mapping accuracy depends on LD structure matching the study population
Requires summary stats: Not all studies provide full summary statistics
Computationally intensive: Fine-mapping large studies takes significant resources
Prior assumptions: Bayesian methods depend on priors (number of causal variants, effect sizes)
Missing data: Not all GWAS loci have been fine-mapped in Open Targets

Best Practices

Start with study-level queries when exploring a new disease
Check multiple studies for replication of signals
Combine with functional data (eQTLs, chromatin, CRISPR screens)
Consider ancestry - LD differs across populations
Validate experimentally - fine-mapping provides candidates, not proof

References

Wang et al. (2020) "A simple new approach to variable selection in regression, with application to genetic fine mapping." JRSS-B (SuSiE)
Benner et al. (2016) "FINEMAP: efficient variable selection using summary data from genome-wide association studies." Bioinformatics
Ghoussaini et al. (2021) "Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics." NAR
Mountjoy et al. (2021) "An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci." Nat Genet

Related Skills

tooluniverse-gwas-explorer: Broader GWAS analysis
tooluniverse-eqtl-colocalization: Link variants to gene expression
tooluniverse-gene-prioritization: Systematic gene ranking