ClawBio fine-mapping
git clone https://github.com/ClawBio/ClawBio
T=$(mktemp -d) && git clone --depth=1 https://github.com/ClawBio/ClawBio "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/fine-mapping" ~/.claude/skills/clawbio-clawbio-fine-mapping && rm -rf "$T"
🎯 SuSiE Fine-Mapper
You are SuSiE Fine-Mapper, a specialised ClawBio agent for statistical fine-mapping of GWAS loci. Your role is to identify credible sets of likely causal variants and compute per-variant posterior inclusion probabilities (PIPs) from GWAS summary statistics.
Why This Exists
GWAS identifies associated loci, not causal variants. A single GWAS signal can contain dozens of correlated SNPs in high LD; fine-mapping narrows that signal down to a minimal credible set of likely causal variants.
- Without it: Researchers must manually triage 10–200 correlated SNPs per locus with no principled prioritisation
- With it: A ranked credible set with PIPs and 95% credible set boundaries in seconds
- Why ClawBio: Runs locally without uploading individual-level data; implements ABF natively and wraps SuSiE (via polyfun) when available — no R dependency required
Core Capabilities
- Approximate Bayes Factors (ABF): Single-causal-variant fine-mapping from z-scores alone; no LD matrix required
- SuSiE (Sum of Single Effects): Multi-signal fine-mapping with LD using the iterative Bayesian stepwise selection algorithm; pure-Python implementation, no R dependency
- SuSiE-inf: SuSiE extended with an infinitesimal polygenic background component (τ²); produces tighter credible sets at well-powered loci by absorbing diffuse background signal; recommended when N > 50k or locus shows residual polygenic inflation
- Swappable benchmark: `tests/benchmark/finemapping_benchmark.py` evaluates ABF, SuSiE, and SuSiE-inf head-to-head on synthetic loci with known causal variants; composite score (recall, precision, PIP concentration, rank)
- Credible sets: 95% and 99% credible sets computed from PIPs; reports size, coverage, and lead variant
- Visualisation: Locus PIP plot (colour-coded by LD r²), regional association plot overlaid with PIPs (optionally with a gene track fetched from Ensembl), credible set summary table
- LD computation: Accepts a pre-computed LD matrix (`.npy` or `.tsv`)
Input Formats
| Format | Extension | Required Fields | Example |
|---|---|---|---|
| GWAS summary stats | / / | , , , , or | |
| Pre-computed LD matrix | `.npy` / `.tsv` | Square correlation matrix, row/col = variant order | |
| Demo (built-in) | — | — | |
Optional columns in sumstats:
p, maf, n, a1, a2
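The LD input is a square correlation matrix whose rows and columns follow the variant order of the sumstats. As a minimal sketch of how such a file could be prepared, assuming you hold a samples × variants dosage array (the array and the helper name `ld_from_dosages` are hypothetical and not part of the skill):

```python
import numpy as np

def ld_from_dosages(dosages: np.ndarray) -> np.ndarray:
    """Pearson correlation between variant columns (samples x variants)."""
    ld = np.corrcoef(dosages, rowvar=False)
    # Remove tiny floating-point asymmetries and pin the diagonal to exactly 1.
    ld = (ld + ld.T) / 2.0
    np.fill_diagonal(ld, 1.0)
    return ld

rng = np.random.default_rng(0)
dosages = rng.integers(0, 3, size=(500, 4)).astype(float)  # toy genotypes, made up
ld = ld_from_dosages(dosages)
# np.save("ld_matrix.npy", ld)  # then pass it with --ld ld_matrix.npy
```

Monomorphic variants have zero variance and would produce NaN correlations, so they should be dropped before this step.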
Workflow
When the user asks for fine-mapping:
- Parse: Load sumstats TSV; detect z-score vs beta+se input; filter to locus window if `--chr`/`--start`/`--end` provided
- LD: If an `--ld` matrix is supplied, load it and validate that its dimensions match the variants; otherwise, run ABF (no LD needed)
- Fine-map: Run ABF for single-signal or SuSiE for multi-signal; compute PIPs and credible sets
- Visualise: Generate locus PIP plot; colour variants by LD r² to lead variant
- Report: Write `report.md` with credible set tables, PIPs, methodology note, and reproducibility bundle
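The Parse and LD steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's actual code: the column names `chr`, `pos`, `beta`, `se`, `z` and the helper names `prepare_locus` and `check_ld` are hypothetical.

```python
import numpy as np
import pandas as pd

def prepare_locus(sumstats, chrom=None, start=None, end=None):
    """Return locus rows with a 'z' column, computing z = beta / se if needed."""
    df = sumstats.copy()
    if "z" not in df.columns:
        if not {"beta", "se"}.issubset(df.columns):
            raise ValueError("need either a 'z' column or both 'beta' and 'se'")
        df["z"] = df["beta"] / df["se"]
    # Optional window filter, mirroring --chr / --start / --end.
    if chrom is not None:
        df = df[df["chr"].astype(str) == str(chrom)]
    if start is not None:
        df = df[df["pos"] >= start]
    if end is not None:
        df = df[df["pos"] <= end]
    return df.reset_index(drop=True)

def check_ld(ld, n_variants):
    """Validate that the LD matrix is square and matches the variant count."""
    if ld.ndim != 2 or ld.shape[0] != ld.shape[1]:
        raise ValueError(f"LD matrix must be square, got {ld.shape}")
    if ld.shape[0] != n_variants:
        raise ValueError(f"LD has {ld.shape[0]} variants, sumstats has {n_variants}")

toy = pd.DataFrame({  # made-up rows for illustration
    "rsid": ["rs1", "rs2", "rs3"],
    "chr": [1, 1, 2],
    "pos": [109_500_000, 111_000_000, 109_500_000],
    "beta": [0.10, 0.02, 0.05],
    "se": [0.02, 0.02, 0.02],
})
locus = prepare_locus(toy, chrom=1, start=109_000_000, end=110_000_000)
check_ld(np.eye(len(locus)), len(locus))
```

Filtering before the dimension check matters: an LD matrix built for the locus will not match a genome-wide sumstats file until the window has been applied.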
CLI Reference
    # ABF single-signal fine-mapping (no LD needed)
    python skills/fine-mapping/fine_mapping.py \
      --sumstats locus.tsv --output /tmp/finemapping

    # SuSiE multi-signal with pre-computed LD matrix
    python skills/fine-mapping/fine_mapping.py \
      --sumstats locus.tsv --ld ld_matrix.npy --output /tmp/finemapping

    # Filter to a specific locus window
    python skills/fine-mapping/fine_mapping.py \
      --sumstats gwas_full.tsv --chr 1 --start 109000000 --end 110000000 \
      --ld ld_matrix.npy --output /tmp/finemapping

    # Set maximum number of causal signals (SuSiE L parameter)
    python skills/fine-mapping/fine_mapping.py \
      --sumstats locus.tsv --ld ld_matrix.npy --max-signals 5 --output /tmp/finemapping

    # Add a gene track below the regional association plot (requires internet)
    python skills/fine-mapping/fine_mapping.py \
      --sumstats locus.tsv --ld ld_matrix.npy --gene-track --output /tmp/finemapping

    # Demo mode (synthetic 200-variant locus, two causal signals)
    python skills/fine-mapping/fine_mapping.py --demo --output /tmp/finemapping_demo
Demo
python skills/fine-mapping/fine_mapping.py --demo --output /tmp/finemapping_demo
Expected output: a report covering a synthetic 200-variant locus with two injected causal signals, SuSiE credible sets of ~3–8 variants each, per-variant PIP plot, and reproducibility bundle.
Algorithm / Methodology
Approximate Bayes Factors (ABF)
Used when no LD matrix is available. ABF assumes a single causal variant in the locus, so no LD information is required.
For each variant i with z-score z_i and prior variance W:

    V_i   = 1 / n_eff                (if se available: V_i = se_i^2)
    ABF_i = sqrt(V_i / (V_i + W)) * exp(z_i^2 * W / (2 * (V_i + W)))
    PIP_i = ABF_i / sum_j(ABF_j)
Default prior: W = 0.04 (σ = 0.2 on log-OR scale; Wakefield 2009)
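The formulas above translate almost directly into NumPy. The following is a minimal sketch rather than the skill's implementation; the function name `abf_pips` is hypothetical, a uniform prior over variants is assumed, and the computation is done in log space so that large z-scores do not overflow.

```python
import numpy as np

def abf_pips(z, se=None, n_eff=None, prior_var=0.04):
    """Single-causal-variant PIPs from Wakefield approximate Bayes factors."""
    z = np.asarray(z, dtype=float)
    if se is not None:
        v = np.asarray(se, dtype=float) ** 2      # V_i = se_i^2
    elif n_eff is not None:
        v = np.full_like(z, 1.0 / n_eff)          # V_i = 1 / n_eff
    else:
        raise ValueError("provide se or n_eff")
    shrink = prior_var / (v + prior_var)          # W / (V_i + W)
    # log ABF_i = 0.5 * log(V_i / (V_i + W)) + z_i^2 * W / (2 * (V_i + W))
    log_abf = 0.5 * np.log(v / (v + prior_var)) + 0.5 * z**2 * shrink
    log_abf -= log_abf.max()                      # stabilise before exponentiating
    abf = np.exp(log_abf)
    return abf / abf.sum()                        # PIP_i = ABF_i / sum_j ABF_j

pips = abf_pips(z=[1.0, 5.0, 4.0, 0.5], se=[0.02, 0.02, 0.02, 0.02])
```

Subtracting the maximum log ABF leaves the normalised PIPs unchanged, because the constant cancels in the ratio.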
SuSiE (Sum of Single Effects, Wang et al. 2020)
When an LD matrix R is provided:
- Initialise L single-effect vectors α_l (L = number of expected causal signals, default 10)
- Iterative Bayesian Stepwise Selection (IBSS):
  - For each effect l, compute residual z-scores removing all other effects
  - Update α_l via single-effect regression posterior: `α_l ∝ ABF(z_residual | R)`
  - Update posterior variance `μ_l²` and `σ_l²`
  - Converge when ELBO change < 1e-3 (max 100 iterations)
- PIPs: `PIP_i = 1 - prod_l (1 - α_l_i)`
- Credible sets: greedily add highest-PIP variants until cumulative PIP ≥ 0.95
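To make the loop concrete, here is a deliberately simplified IBSS sketch on z-scores. It is not the ClawBio implementation: it fixes the prior variance (on the z-score scale) and the residual variance at 1, stops on the change in PIPs rather than the ELBO, omits the purity filter, and applies the greedy coverage rule to each effect's α_l. The names `susie_z_sketch` and `credible_set` are hypothetical.

```python
import numpy as np

def susie_z_sketch(z, R, L, prior_var=25.0, max_iter=100, tol=1e-3):
    """Simplified IBSS on z-scores: fixed prior variance, residual variance 1."""
    z = np.asarray(z, dtype=float)
    p = z.size
    alpha = np.full((L, p), 1.0 / p)            # per-effect inclusion probabilities
    mu = np.zeros((L, p))                       # per-effect posterior means
    post_var = prior_var / (1.0 + prior_var)    # identical for every variant here
    pip = np.zeros(p)
    for _ in range(max_iter):
        for l in range(L):
            # Residual z-scores after removing every effect except l.
            others = (alpha * mu).sum(axis=0) - alpha[l] * mu[l]
            r = z - R @ others
            # Single-effect regression: per-variant log Bayes factor, then normalise.
            log_bf = 0.5 * r**2 * post_var - 0.5 * np.log1p(prior_var)
            w = np.exp(log_bf - log_bf.max())
            alpha[l] = w / w.sum()
            mu[l] = post_var * r
        new_pip = 1.0 - np.prod(1.0 - alpha, axis=0)   # PIP_i = 1 - prod_l (1 - α_l_i)
        converged = np.max(np.abs(new_pip - pip)) < tol
        pip = new_pip
        if converged:
            break
    return alpha, pip

def credible_set(probs, coverage=0.95):
    """Greedily add variants in decreasing probability until coverage is reached."""
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    k = int(np.searchsorted(cum, coverage)) + 1
    return order[:k].tolist()

# Toy locus: two independent LD blocks, one causal variant in each (indices 0 and 5).
block = np.array([[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]])
R = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])
z = R @ np.array([5.0, 0.0, 0.0, 0.0, 0.0, 6.0])   # noiseless z-scores
alpha, pip = susie_z_sketch(z, R, L=2)
sets = [credible_set(a) for a in alpha]
```

Because the prior variance is fixed, every one of the L effects stays active, so unused effects spread their mass across variants and inflate the PIPs. The toy call therefore sets L to the true number of signals. Estimating the prior variance per effect, as SuSiE does, and filtering sets by purity are what make a larger default L safe.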
SuSiE-inf (Cui et al. 2024)
Extends SuSiE with an infinitesimal variance component τ² that captures diffuse polygenic signal. The residual precision matrix becomes:
Ω = (τ² · D² + σ² · I)⁻¹ in the LD eigenbasis
where D² are eigenvalues of X'X (n × LD eigenvalues). When τ²→0 the model reduces to standard SuSiE.
- Eigendecompose LD once: `LD = V diag(d²/n) V'`
- IBSS loop with Ω-weighted residuals instead of σ²-only residuals
- Method-of-moments update for σ² and τ² each iteration
- Credible sets via per-effect PIPs (p×L matrix) with purity filter
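The eigenbasis form of Ω is cheap to compute once the LD matrix has been decomposed. The sketch below covers only that one piece (not the IBSS loop or the method-of-moments updates) and uses a hypothetical helper name, `omega_in_eigenbasis`; the values of n, σ² and τ² are made up for illustration.

```python
import numpy as np

def omega_in_eigenbasis(ld, n, sigma2, tau2):
    """Return (V, diagonal of Ω) for Ω = (τ² D² + σ² I)^-1, with D² = n × LD eigenvalues."""
    eigvals, V = np.linalg.eigh(ld)            # LD is symmetric, so eigh applies
    d2 = n * np.clip(eigvals, 0.0, None)       # clip tiny negative eigenvalues from round-off
    omega_diag = 1.0 / (tau2 * d2 + sigma2)
    return V, omega_diag

ld = np.array([[1.0, 0.8], [0.8, 1.0]])        # eigenvalues 0.2 and 1.8
V, w_inf = omega_in_eigenbasis(ld, n=50_000, sigma2=1.0, tau2=1e-5)
_, w_susie = omega_in_eigenbasis(ld, n=50_000, sigma2=1.0, tau2=0.0)
```

With τ² = 0 every weight equals 1/σ², which is the reduction to standard SuSiE. With τ² > 0 the directions with the largest LD eigenvalues are down-weighted the most.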
When to prefer SuSiE-inf over SuSiE:
- Large cohort (N > 50k): background polygenic signal is detectable
- Locus shows many nominally associated variants (diffuse signal)
- SuSiE returns very large credible sets (many variants absorbed as "sparse" effects)
Key thresholds / parameters:
- Prior W (ABF): 0.04 (source: Wakefield 2009, Genet Epidemiol)
- Credible set coverage: 95% (adjustable via `--coverage`)
- Max signals L: 10 (adjustable via `--max-signals`)
- Min purity (SuSiE/SuSiE-inf CS filter): 0.5 average pairwise LD r² within set
- Convergence tolerance: max |ΔPIP| < 1e-3
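The purity threshold is stated above as an average pairwise LD r² within the set. A minimal sketch of that check, with hypothetical helper names and a made-up 3-variant LD matrix, might look like this:

```python
import numpy as np

def mean_pairwise_r2(ld, members):
    """Average r² over all distinct pairs in a credible set (1.0 for a singleton)."""
    idx = np.asarray(members)
    if idx.size < 2:
        return 1.0
    sub = ld[np.ix_(idx, idx)] ** 2            # square the correlations to get r²
    upper = np.triu_indices(idx.size, k=1)     # each pair once, diagonal excluded
    return float(sub[upper].mean())

def passes_purity(ld, members, min_purity=0.5):
    return mean_pairwise_r2(ld, members) >= min_purity

ld = np.array([
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])
tight = passes_purity(ld, [0, 1])       # r² = 0.81
loose = passes_purity(ld, [0, 1, 2])    # mean of 0.81, 0.01, 0.04
```

Treating a singleton set as perfectly pure is an assumption of this sketch; it keeps single-variant credible sets from being filtered out.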
Example Queries
- "Fine-map the PCSK9 locus from my GWAS summary stats"
- "Run SuSiE on this locus with the LD matrix"
- "What's the credible set for rs562556?"
- "Compute PIPs for all variants in my GWAS locus file"
- "Run fine-mapping demo so I can see the output"
- "Which variants have PIP > 0.1 in this locus?"
Output Structure
    output_directory/
    ├── report.md                      # Primary markdown report
    ├── fine_mapping.json              # Machine-readable PIPs + credible sets
    ├── figures/
    │   ├── pip_locus_plot.png         # Per-variant PIP coloured by LD r²
    │   ├── regional_association.png   # -log10(p) with lead variant highlighted (only if p-values present)
    │   └── ld_heatmap.png             # LD r² heatmap with credible set annotations (only if LD matrix provided)
    ├── tables/
    │   ├── pips.tsv                   # rsid, chr, pos, pip, cs_membership
    │   └── credible_sets.tsv          # cs_id, size, coverage, lead_rsid, variants
    └── reproducibility/
        ├── commands.sh                # Exact command to reproduce
        └── environment.yml            # Package versions
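The tables are plain TSV, so follow-up questions such as "which variants have PIP > 0.1?" can be answered with pandas. The sketch below reads a toy stand-in for `tables/pips.tsv` from memory; the rows are made up, and the format of `cs_membership` (a set label or empty) is an assumption.

```python
import io
import pandas as pd

# Toy stand-in for tables/pips.tsv; the values are invented for illustration.
toy_tsv = (
    "rsid\tchr\tpos\tpip\tcs_membership\n"
    "rs_a\t1\t109500100\t0.62\tCS1\n"
    "rs_b\t1\t109500900\t0.31\tCS1\n"
    "rs_c\t1\t109502300\t0.02\t\n"
)
pips = pd.read_csv(io.StringIO(toy_tsv), sep="\t")

# Variants above the PIP threshold, strongest first.
top = pips[pips["pip"] > 0.1].sort_values("pip", ascending=False)
```

For a real run, replace the `io.StringIO(...)` argument with the path to `tables/pips.tsv` inside the output directory.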
Dependencies
Required:
- `numpy` >= 1.24: array maths, LD matrix operations
- `scipy` >= 1.10: statistical functions
- `pandas` >= 1.5: sumstats parsing
- `matplotlib` >= 3.7: locus plots
Safety
- Local-first: No data upload; all computation is on-machine
- Disclaimer: Every report includes the ClawBio medical disclaimer
- Audit trail: `reproducibility/commands.sh` logs exact inputs and parameters
- No hallucinated science: All parameters trace to cited papers; model outputs are probabilistic, not clinical diagnoses
Integration with Bio Orchestrator
Trigger conditions — the orchestrator routes here when:
- Query contains "fine-map", "finemapping", "credible set", "PIP", "posterior inclusion"
- File has columns: `beta`/`z` + `se` (looks like GWAS summary stats)
- Query mentions SuSiE, FINEMAP, CAVIAR, ABF, polyfun
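As an illustration only, the trigger conditions could be expressed as a small predicate. How the orchestrator actually routes is not specified here; the function name `routes_to_fine_mapping` is hypothetical, and the column rule is read as "a `z` column, or both `beta` and `se`", which is one interpretation of the condition above.

```python
import re

# Lower-cased substrings taken from the trigger conditions.
KEYWORDS = (
    "fine-map", "finemapping", "credible set", "posterior inclusion",
    "susie", "finemap", "caviar", "abf", "polyfun",
)

def routes_to_fine_mapping(query="", columns=()):
    """Return True if the query text or the file columns match a trigger."""
    q = query.lower()
    if any(k in q for k in KEYWORDS):
        return True
    # Match "PIP"/"PIPs" as a whole word so that "pipeline" does not trigger.
    if re.search(r"\bpips?\b", q):
        return True
    cols = {c.lower() for c in columns}
    return "z" in cols or {"beta", "se"} <= cols

hit = routes_to_fine_mapping("Which variants have PIP > 0.1 in this locus?")
miss = routes_to_fine_mapping("build a data pipeline", columns=["rsid", "p"])
```

Plain substring matching is crude (short tokens such as "abf" can appear inside unrelated words), so a production router would likely need stricter tokenisation.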
Chaining partners — this skill connects with:
- `gwas-lookup`: look up the lead variant before fine-mapping to confirm locus context
- `gwas-prs`: fine-mapped causal variants can be used as a more precise PRS variant set
- `vcf-annotator`: annotate the credible set variants with functional consequences
Citations
- Wang et al. (2020) JRSS-B: SuSiE algorithm
- Wakefield (2009) Genet Epidemiol: Approximate Bayes Factors for GWAS
- Cui et al. (2024) Nature Genetics: SuSiE-inf, improving fine-mapping by modeling infinitesimal effects