SciAgent-Skills gnomad-database

Query gnomAD v4 population variant frequencies via GraphQL API. Retrieve allele counts and frequencies stratified by ancestry group (AFR, AMR, EAS, NFE, SAS, FIN, ASJ, MID), gene-level constraint metrics (pLI, LOEUF, missense z-score), and read depth coverage. Identify variants with low population frequency or under evolutionary constraint. For clinical pathogenicity classifications use clinvar-database; for GWAS associations use gwas-database.

install
source · Clone the upstream repo
git clone https://github.com/jaechang-hits/SciAgent-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/genomics-bioinformatics/gnomad-database" ~/.claude/skills/jaechang-hits-sciagent-skills-gnomad-database && rm -rf "$T"
manifest: skills/genomics-bioinformatics/gnomad-database/SKILL.md

gnomAD Database

Overview

The Genome Aggregation Database (gnomAD) is a resource of aggregated exome and genome sequencing data from 730,000+ individuals. It provides population variant frequencies stratified by 9 ancestry groups, gene-level constraint scores (pLI, LOEUF), and read coverage information. Access is free via a GraphQL API at https://gnomad.broadinstitute.org/api; no authentication is required and there is no official SDK.

When to Use

  • Checking whether a candidate variant is rare enough to be clinically relevant (AF < 0.1% in all populations)
  • Retrieving allele frequencies stratified by ancestry group (AFR, AMR, EAS, NFE, SAS, FIN, ASJ, MID) for a variant
  • Identifying all rare loss-of-function variants in a gene for burden testing or candidate prioritization
  • Getting gene constraint metrics (pLI, LOEUF) to assess tolerance to loss-of-function variants
  • Checking read depth coverage for a region to evaluate if low variant frequency reflects low sequencing coverage
  • Filtering a VCF by population frequency: query gnomAD AF to discard common variants before clinical interpretation
  • For clinical pathogenicity classifications use clinvar-database; gnomAD provides frequency evidence but does not classify pathogenicity
  • For GWAS associations at the study level use gwas-database; gnomAD is for population frequency lookups

Prerequisites

  • Python packages: requests, pandas, matplotlib
  • Data requirements: gene symbols (e.g., BRCA1), variant IDs (1-69511-A-G format, or rsIDs)
  • Environment: internet connection; no API key required
  • Rate limits: no official published limits; use time.sleep(0.5) between requests for polite access; avoid bursts over 10 requests/second

pip install requests pandas matplotlib
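
Variant IDs copied from a VCF often carry a "chr" prefix or use ":" as a separator, which does not fit the CHROM-POS-REF-ALT format listed above. A minimal sketch of a normalizer follows. The function name `normalize_variant_id`, the accepted separators, and the accepted chromosome names are assumptions made for this example and are not part of the gnomAD API.

```python
import re

# Assumption for this sketch: chromosomes are 1-2 digit numbers, X, Y, M or MT,
# optionally prefixed with "chr", and fields are separated by "-" or ":".
_VARIANT_RE = re.compile(
    r"^(?:chr)?([0-9]{1,2}|X|Y|MT|M)[-:]([0-9]+)[-:]([ACGT]+)[-:]([ACGT]+)$",
    re.IGNORECASE,
)

def normalize_variant_id(raw: str) -> str:
    """Return a CHROM-POS-REF-ALT string, or raise ValueError if the input does not fit."""
    match = _VARIANT_RE.match(raw.strip())
    if match is None:
        raise ValueError(f"Not a CHROM-POS-REF-ALT variant ID: {raw!r}")
    chrom, pos, ref, alt = match.groups()
    # int() strips leading zeros from the position; alleles are upper-cased
    return f"{chrom.upper()}-{int(pos)}-{ref.upper()}-{alt.upper()}"
```

For example, `normalize_variant_id("chr1:69511:A:G")` returns `"1-69511-A-G"`, while an rsID such as `"rs12345"` raises `ValueError` and needs a separate lookup path.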

Quick Start

import requests
import time

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query: str, variables: dict = None) -> dict:
    """Execute a gnomAD GraphQL query and return the data payload."""
    payload = {"query": query, "variables": variables or {}}
    r = requests.post(GNOMAD_API, json=payload, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(f"GraphQL errors: {result['errors']}")
    return result["data"]

# Quick check: get pLI for BRCA1
query = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gnomad_constraint { pLI lof { oe_ci { upper } } }
  }
}
"""
data = gnomad_query(query, {"gene_symbol": "BRCA1", "reference_genome": "GRCh38"})
constraint = data["gene"]["gnomad_constraint"]
print(f"BRCA1 pLI: {constraint['pLI']:.3f}")
print(f"BRCA1 LOEUF: {constraint['lof']['oe_ci']['upper']:.3f}")
# Output has the form below (values are illustrative and depend on the gnomAD release):
# BRCA1 pLI: 0.999
# BRCA1 LOEUF: 0.127

Core API

Query 1: Gene Variant Query

Fetch all variants in a gene with population allele frequencies. Returns a list of variants with their genome-level frequencies.

import requests, time

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(f"GraphQL errors: {result['errors']}")
    return result["data"]

GENE_VARIANTS_QUERY = """
query GeneVariants($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id
    gene_name
    variants(dataset: $dataset) {
      variant_id
      rsids
      chrom
      pos
      ref
      alt
      consequence
      lof
      genome {
        an
        ac
        af
        faf95 { popmax popmax_population }
      }
    }
  }
}
"""

data = gnomad_query(GENE_VARIANTS_QUERY, {
    "gene_symbol": "PCSK9",
    "reference_genome": "GRCh38",
    "dataset": "gnomad_r4"
})
variants = data["gene"]["variants"]
print(f"Gene: {data['gene']['gene_name']} ({data['gene']['gene_id']})")
print(f"Total variants: {len(variants)}")
# Filter to rare variants (AF < 0.001)
rare = [v for v in variants if v["genome"] and v["genome"]["af"] is not None and v["genome"]["af"] < 0.001]
print(f"Rare variants (AF < 0.1%): {len(rare)}")
for v in rare[:3]:
    print(f"  {v['variant_id']} | {v['consequence']} | AF={v['genome']['af']:.2e}")

Query 2: Variant Lookup

Fetch detailed information for a single variant by its gnomAD variant ID (CHROM-POS-REF-ALT format) or search by rsID.

VARIANT_QUERY = """
query VariantDetails($variant_id: String!, $dataset: DatasetId!) {
  variant(variant_id: $variant_id, dataset: $dataset) {
    variant_id
    rsids
    chrom
    pos
    ref
    alt
    consequence
    lof
    lof_filter
    lof_flags
    genome {
      an
      ac
      af
      faf95 { popmax popmax_population }
      populations {
        id
        ac
        an
        af
      }
    }
  }
}
"""

data = gnomad_query(VARIANT_QUERY, {
    "variant_id": "1-55039974-G-T",   # PCSK9 p.Tyr142Ter (LoF)
    "dataset": "gnomad_r4"
})
v = data["variant"]
print(f"Variant: {v['variant_id']}")
print(f"rsIDs: {v['rsids']}")
print(f"Consequence: {v['consequence']}  |  LoF: {v['lof']}")
g = v["genome"]
print(f"Genome AF: {g['af']:.2e}  (AC={g['ac']}, AN={g['an']})")
print(f"FAF95 popmax: {g['faf95']['popmax']:.2e} in {g['faf95']['popmax_population']}")

Query 3: Population Frequencies

Retrieve allele frequency broken down by ancestry group for a specific variant.

import pandas as pd

POPULATION_FREQ_QUERY = """
query PopFreqs($variant_id: String!, $dataset: DatasetId!) {
  variant(variant_id: $variant_id, dataset: $dataset) {
    variant_id
    genome {
      populations {
        id
        ac
        an
        af
        homozygote_count
      }
    }
  }
}
"""

ANCESTRY_LABELS = {
    "afr": "African/African American",
    "amr": "Admixed American",
    "eas": "East Asian",
    "fin": "Finnish",
    "nfe": "Non-Finnish European",
    "sas": "South Asian",
    "asj": "Ashkenazi Jewish",
    "mid": "Middle Eastern",
    "oth": "Other",
}

data = gnomad_query(POPULATION_FREQ_QUERY, {
    "variant_id": "1-55039974-G-T",
    "dataset": "gnomad_r4"
})
pops = data["variant"]["genome"]["populations"]

# Filter to top-level ancestry groups (exclude sex-specific)
main_pops = [p for p in pops if p["id"] in ANCESTRY_LABELS and p["an"] > 0]
df = pd.DataFrame(main_pops)
df["label"] = df["id"].map(ANCESTRY_LABELS)
df = df.sort_values("af", ascending=False)
print(df[["label", "ac", "an", "af", "homozygote_count"]].to_string(index=False))
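
Once the per-ancestry rows are in hand, a common next step is to find the group in which the variant is most frequent. The sketch below does this on a plain list of population dicts shaped like the `populations` field queried above (`id`, `ac`, `an`). The function name `popmax_af` and the default group list are choices made for this example; it computes a raw maximum AF, which is not the same quantity as gnomAD's `faf95.popmax`.

```python
def popmax_af(populations, groups=("afr", "amr", "eas", "fin", "nfe", "sas", "asj", "mid")):
    """Return (population_id, af) for the listed group with the highest AF.

    Rows whose id is not in `groups` (for example sex-specific subsets) and
    rows with AN of 0 or None are skipped. Returns (None, None) if nothing is left.
    """
    best_id, best_af = None, None
    for pop in populations:
        if pop["id"] not in groups or not pop.get("an"):
            continue
        af = (pop.get("ac") or 0) / pop["an"]  # recompute from counts rather than trusting a null af
        if best_af is None or af > best_af:
            best_id, best_af = pop["id"], af
    return best_id, best_af
```

Recomputing AF from `ac` and `an` avoids special-casing rows where the API returns a null `af`.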

Query 4: Coverage Query

Retrieve per-base read depth coverage for a gene region to assess data completeness.

COVERAGE_QUERY = """
query Coverage($chrom: String!, $start: Int!, $stop: Int!, $dataset: DatasetId!) {
  coverage(dataset: $dataset, chrom: $chrom, start: $start, stop: $stop) {
    pos
    mean
    median
    over_1
    over_10
    over_20
    over_30
    over_100
  }
}
"""

data = gnomad_query(COVERAGE_QUERY, {
    "chrom": "1",
    "start": 55039700,
    "stop": 55040200,
    "dataset": "gnomad_r4"
})
cov = data["coverage"]
print(f"Coverage positions retrieved: {len(cov)}")
if cov:
    avg_mean = sum(c["mean"] for c in cov) / len(cov)
    pct_20x = sum(1 for c in cov if c["over_20"] > 0.9) / len(cov) * 100
    print(f"Average mean depth: {avg_mean:.1f}x")
    print(f"Positions with >90% samples at >=20x: {pct_20x:.1f}%")
    # Example single position
    c = cov[0]
    print(f"\nPosition {c['pos']}: mean={c['mean']:.1f}x, median={c['median']}x")
    print(f"  Fraction >=10x: {c['over_10']:.3f}, >=20x: {c['over_20']:.3f}, >=30x: {c['over_30']:.3f}")
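
The per-base records above are easier to act on when consecutive poorly covered positions are merged into intervals. A minimal sketch, assuming a list of dicts with `pos` and one of the `over_N` fractions from the coverage query; the function name `low_coverage_intervals` and the 0.9 default threshold are illustrative.

```python
def low_coverage_intervals(coverage, field="over_20", min_fraction=0.9):
    """Merge adjacent positions where coverage[field] < min_fraction into (start, stop) tuples."""
    intervals = []
    start = prev = None
    for rec in sorted(coverage, key=lambda r: r["pos"]):
        value = rec.get(field)
        low = value is None or value < min_fraction  # missing values count as low coverage
        if low and start is not None and rec["pos"] == prev + 1:
            prev = rec["pos"]                        # extend the open interval
        else:
            if start is not None:
                intervals.append((start, prev))      # close the open interval
            start = prev = rec["pos"] if low else None
    if start is not None:
        intervals.append((start, prev))
    return intervals
```

Intervals are inclusive on both ends, so a single low position at 104 is reported as `(104, 104)`.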

Query 5: Gene Constraint

Retrieve gene-level constraint scores: pLI (probability of loss-of-function intolerance), LOEUF (LoF observed/expected upper bound fraction), and missense z-score.

CONSTRAINT_QUERY = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id
    gene_name
    gnomad_constraint {
      pLI
      pNull
      pRec
      mis_z
      lof {
        obs
        exp
        oe
        oe_ci { lower upper }
      }
    }
  }
}
"""

genes = ["PCSK9", "BRCA1", "TP53", "TTN"]
print(f"{'Gene':<10} {'pLI':>6} {'LOEUF':>7} {'mis_z':>7}")
print("-" * 35)
for gene in genes:
    data = gnomad_query(CONSTRAINT_QUERY, {"gene_symbol": gene, "reference_genome": "GRCh38"})
    c = data["gene"]["gnomad_constraint"]
    loeuf = c["lof"]["oe_ci"]["upper"]
    print(f"{gene:<10} {c['pLI']:>6.3f} {loeuf:>7.3f} {c['mis_z']:>7.2f}")
    time.sleep(0.5)
# Output has the form below (values are illustrative and depend on the gnomAD release):
# Gene        pLI   LOEUF   mis_z
# PCSK9     0.855   0.543    2.11
# BRCA1     0.999   0.127    3.84
# TP53      0.993   0.191    5.21
# TTN       0.001   0.993   -2.40

Query 6: Variant Search by Region

Fetch all variants in a chromosomal region, useful for targeted panels and regional analyses.

REGION_VARIANTS_QUERY = """
query RegionVariants($chrom: String!, $start: Int!, $stop: Int!,
                     $dataset: DatasetId!, $reference_genome: ReferenceGenomeId!) {
  region(chrom: $chrom, start: $start, stop: $stop,
         reference_genome: $reference_genome) {
    variants(dataset: $dataset) {
      variant_id
      rsids
      pos
      consequence
      lof
      genome {
        af
        ac
        an
        faf95 { popmax }
      }
    }
  }
}
"""

data = gnomad_query(REGION_VARIANTS_QUERY, {
    "chrom": "1",
    "start": 55039974,
    "stop": 55064852,   # PCSK9 coding region
    "dataset": "gnomad_r4",
    "reference_genome": "GRCh38"
})
variants = data["region"]["variants"]
print(f"Variants in region: {len(variants)}")

# Summarize by consequence
from collections import Counter
conseq_counts = Counter(v["consequence"] for v in variants if v["consequence"])
for c, n in conseq_counts.most_common(5):
    print(f"  {c}: {n}")

# Loss-of-function variants
lof_vars = [v for v in variants if v["lof"] == "HC"]
print(f"\nHigh-confidence LoF variants: {len(lof_vars)}")
for v in lof_vars[:3]:
    af = v["genome"]["af"] if v["genome"] else None
    # Compare against None so that a true AF of 0.0 is not reported as missing
    af_text = f"{af:.2e}" if af is not None else "NA"
    print(f"  {v['variant_id']} | AF={af_text}")
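
Large regions can be slow to fetch in a single request. One way to keep each request small is to split the span into fixed-size windows and run the region query once per window. The sketch below only does the coordinate arithmetic; the function name `split_region` and the 10 kb default window are assumptions for this example, not limits documented by gnomAD.

```python
def split_region(start, stop, max_size=10_000):
    """Yield inclusive, non-overlapping (start, stop) windows that cover [start, stop]."""
    if stop < start:
        raise ValueError("stop must be >= start")
    if max_size < 1:
        raise ValueError("max_size must be >= 1")
    pos = start
    while pos <= stop:
        end = min(pos + max_size - 1, stop)  # the last window may be shorter
        yield pos, end
        pos = end + 1
```

Each window can then be passed as the `start` and `stop` variables of the region query, with a short sleep between requests. Variants that span a window boundary may be returned twice, so deduplicate on `variant_id` when combining the results.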

Key Concepts

gnomAD Data Model

The API exposes several dataset IDs. The two used most in this document are gnomad_r4 (gnomAD v4: exomes + genomes, GRCh38, 730K+ individuals) and gnomad_r2_1 (gnomAD v2.1: GRCh37, 141K individuals). The API uses a GraphQL schema in which variants are accessed through gene(), region(), or direct variant() lookups. Each variant has separate exome and genome frequency objects, and either one can be null when the variant was not observed in that data type; the genome object is preferred for population frequency comparisons.
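
Because the exome and genome objects are reported separately, a variant seen only in exomes has a null genome object. One way to avoid losing such variants is to pool the counts from both objects. The sketch below assumes the `exome` object carries `ac` and `an` fields like the `genome` object used in the queries in this document; the function name `combined_af` is illustrative.

```python
def combined_af(variant):
    """Pool exome and genome counts: (ac_exome + ac_genome) / (an_exome + an_genome).

    Returns None when neither data type has any called alleles.
    """
    ac_total = 0
    an_total = 0
    for key in ("exome", "genome"):
        freq = variant.get(key) or {}   # a null object contributes nothing
        ac_total += freq.get("ac") or 0
        an_total += freq.get("an") or 0
    return ac_total / an_total if an_total else None
```

Pooling counts weights each data type by its sample size, which is usually preferable to averaging the two AF values directly.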

Ancestry Groups

gnomAD v4 reports frequencies for 9 top-level ancestry groups identified by genetic ancestry (not self-reported):

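| Code | Population | Dataset size (approx) |
|------|------------|-----------------------|
| afr | African/African American | 76,000+ |
| amr | Admixed American | 45,000+ |
| eas | East Asian | 50,000+ |
| fin | Finnish | 24,000+ |
| nfe | Non-Finnish European | 400,000+ |
| sas | South Asian | 80,000+ |
| asj | Ashkenazi Jewish | 10,000+ |
| mid | Middle Eastern | 5,000+ |
| oth | Other/Unknown | varies |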

Filtering Allele Frequency (FAF95)

The faf95 field provides a one-sided 95% confidence interval lower bound on the allele frequency in the population where the variant is most common. Use this for conservative variant filtering in clinical pipelines: a variant with faf95.popmax < 0.001 is likely rare enough to warrant clinical investigation.
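
To make the idea of a lower confidence bound concrete, the sketch below treats the allele count as a Poisson observation and finds the smallest expected count that is still compatible with it at the 95% level, then divides by AN. This is a textbook one-sided Poisson bound written for illustration. It is not gnomAD's pipeline code, and its values may differ from the `faf95` field returned by the API. The function names are hypothetical.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam); terms are computed in log space to limit underflow."""
    if k < 0:
        return 0.0
    if lam == 0:
        return 1.0
    log_lam = math.log(lam)
    total = sum(math.exp(i * log_lam - lam - math.lgamma(i + 1)) for i in range(k + 1))
    return min(1.0, total)

def af_lower_bound(ac, an, confidence=0.95):
    """One-sided lower confidence bound on AF, treating AC as a Poisson count."""
    if an <= 0:
        raise ValueError("an must be positive")
    if ac <= 0:
        return 0.0
    # Solve P(X <= ac - 1 | lam) = confidence for lam by bisection.
    # The CDF decreases as lam grows, and the solution lies in (0, ac).
    lo, hi = 0.0, float(ac)
    for _ in range(100):
        mid = (lo + hi) / 2
        if poisson_cdf(ac - 1, mid) > confidence:
            lo = mid
        else:
            hi = mid
    return lo / an
```

With AC=10 and AN=1000 the raw AF is 0.01 but the bound is about 0.0054, which shows why a filter on the lower bound is less likely to discard a variant on the strength of a handful of observations.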

Constraint Scores

| Score | Interpretation |
|-------|----------------|
| pLI > 0.9 | Gene is intolerant to LoF; likely essential |
| LOEUF < 0.35 | Strong LoF constraint (upper CI of oe ratio) |
| mis_z > 3.09 | Gene shows significant missense constraint |
| pLI < 0.1 | Gene tolerates LoF; homozygous LoF variants exist |
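
The thresholds in the table can be applied mechanically once the scores have been retrieved. A minimal sketch that takes the three scores as plain numbers and returns the matching labels; the function name `classify_constraint` and the label wording are choices made for this example.

```python
def classify_constraint(pli, loeuf, mis_z):
    """Return the interpretation labels from the table that apply to one gene.

    Any score may be None (not reported), in which case its rules are skipped.
    """
    labels = []
    if pli is not None and pli > 0.9:
        labels.append("LoF-intolerant (pLI > 0.9)")
    if pli is not None and pli < 0.1:
        labels.append("LoF-tolerant (pLI < 0.1)")
    if loeuf is not None and loeuf < 0.35:
        labels.append("strong LoF constraint (LOEUF < 0.35)")
    if mis_z is not None and mis_z > 3.09:
        labels.append("missense-constrained (mis_z > 3.09)")
    return labels
```

A gene with pLI between 0.1 and 0.9 and no other flag returns an empty list, which is best read as "no strong evidence either way" rather than as tolerance.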

Common Workflows

Workflow 1: Rare Variant Frequency Report for a Gene

Goal: Retrieve all rare (AF < 1%) variants in a gene, stratified by consequence, exported to CSV.

import requests, time, pandas as pd

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(result["errors"])
    return result["data"]

GENE_VARIANTS_QUERY = """
query GeneVariants($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id gene_name
    variants(dataset: $dataset) {
      variant_id rsids chrom pos ref alt consequence lof lof_filter
      genome {
        an ac af
        faf95 { popmax popmax_population }
        populations { id ac an af }
      }
    }
  }
}
"""

gene = "LDLR"
data = gnomad_query(GENE_VARIANTS_QUERY, {
    "gene_symbol": gene,
    "reference_genome": "GRCh38",
    "dataset": "gnomad_r4"
})

variants = data["gene"]["variants"]
rows = []
for v in variants:
    g = v.get("genome") or {}
    af = g.get("af")
    if af is None or af >= 0.01:   # keep only rare variants
        continue
    rows.append({
        "variant_id": v["variant_id"],
        "rsids": ";".join(v.get("rsids") or []),
        "consequence": v.get("consequence"),
        "lof": v.get("lof"),
        "af_genome": af,
        "ac": g.get("ac"),
        "an": g.get("an"),
        # faf95 can be null in the response, so fall back to {} before reading it
        "faf95_popmax": (g.get("faf95") or {}).get("popmax"),
        "faf95_pop": (g.get("faf95") or {}).get("popmax_population"),
    })

df = pd.DataFrame(rows)
df = df.sort_values("af_genome")
df.to_csv(f"{gene}_rare_variants.csv", index=False)
print(f"{gene}: {len(variants)} total variants, {len(df)} rare (AF<1%)")
print(df.groupby("consequence")["variant_id"].count().sort_values(ascending=False).head(6))
# Output has the form below (counts are illustrative and depend on the gnomAD release):
# LDLR: 2847 total variants, 2631 rare (AF<1%)
# consequence
# missense_variant              1423
# synonymous_variant             512
# splice_region_variant          231
# stop_gained                    198

Workflow 2: Ancestry-Stratified Frequency Visualization

Goal: Query a list of variants and produce a barplot of allele frequencies by ancestry group.

import requests, time
import pandas as pd
import matplotlib.pyplot as plt

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(result["errors"])
    return result["data"]

POPULATION_FREQ_QUERY = """
query PopFreqs($variant_id: String!, $dataset: DatasetId!) {
  variant(variant_id: $variant_id, dataset: $dataset) {
    variant_id
    genome {
      populations { id ac an af }
    }
  }
}
"""

ANCESTRY_LABELS = {
    "afr": "AFR", "amr": "AMR", "eas": "EAS", "fin": "FIN",
    "nfe": "NFE", "sas": "SAS", "asj": "ASJ", "mid": "MID",
}

variant_id = "1-55039974-G-T"   # PCSK9 p.Tyr142Ter
data = gnomad_query(POPULATION_FREQ_QUERY, {
    "variant_id": variant_id,
    "dataset": "gnomad_r4"
})

pops = data["variant"]["genome"]["populations"]
rows = [{"code": p["id"], "af": p["af"], "ac": p["ac"], "an": p["an"]}
        for p in pops if p["id"] in ANCESTRY_LABELS and p["an"] > 0]
df = pd.DataFrame(rows)
df["label"] = df["code"].map(ANCESTRY_LABELS)
df = df.sort_values("af", ascending=False)

fig, ax = plt.subplots(figsize=(9, 4))
bars = ax.bar(df["label"], df["af"] * 100, color="#4472C4", edgecolor="white")
ax.bar_label(bars, fmt="%.3f%%", fontsize=8, padding=2)
ax.set_xlabel("Ancestry Group")
ax.set_ylabel("Allele Frequency (%)")
ax.set_title(f"gnomAD v4 Population Frequencies\n{variant_id}")
ax.set_ylim(0, df["af"].max() * 150)
plt.tight_layout()
plt.savefig("gnomad_pop_frequencies.png", dpi=150, bbox_inches="tight")
print(f"Saved gnomad_pop_frequencies.png  (n={len(df)} ancestry groups)")
print(df[["label", "af", "ac", "an"]].to_string(index=False))

Workflow 3: Constraint-Guided Gene Prioritization

Goal: Score a gene list by constraint metrics and flag LoF-intolerant genes.

import requests, time, pandas as pd

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(result["errors"])
    return result["data"]

CONSTRAINT_QUERY = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id gene_name
    gnomad_constraint {
      pLI pNull pRec mis_z
      lof { obs exp oe oe_ci { lower upper } }
    }
  }
}
"""

gene_list = ["BRCA1", "BRCA2", "PCSK9", "LDLR", "TTN", "CFTR", "HTT"]
records = []
for gene in gene_list:
    try:
        data = gnomad_query(CONSTRAINT_QUERY, {"gene_symbol": gene, "reference_genome": "GRCh38"})
        c = data["gene"]["gnomad_constraint"]
        records.append({
            "gene": gene,
            "pLI": c["pLI"],
            "LOEUF": c["lof"]["oe_ci"]["upper"],
            "mis_z": c["mis_z"],
            "lof_obs": c["lof"]["obs"],
            "lof_exp": c["lof"]["exp"],
            "lof_oe": c["lof"]["oe"],
        })
    except Exception as e:
        print(f"Warning: {gene} failed — {e}")
    time.sleep(0.5)

df = pd.DataFrame(records).sort_values("LOEUF")
df["lof_intolerant"] = df["pLI"] > 0.9
print(df[["gene", "pLI", "LOEUF", "mis_z", "lof_intolerant"]].to_string(index=False))
df.to_csv("constraint_scores.csv", index=False)
print(f"\nLoF-intolerant genes: {df['lof_intolerant'].sum()}/{len(df)}")

Key Parameters

| Parameter | Function/Endpoint | Default | Range / Options | Effect |
|-----------|-------------------|---------|-----------------|--------|
| dataset | All variant queries | none | gnomad_r4, gnomad_r2_1, gnomad_r3 | Dataset version (GRCh38 for r4/r3, GRCh37 for r2_1) |
| reference_genome | gene(), region() | none | GRCh38, GRCh37 | Coordinate system; must match dataset |
| variant_id | variant() | none | CHROM-POS-REF-ALT string | Identifies the specific variant to query |
| gene_symbol | gene() | none | HGNC symbol string | Gene to retrieve; case-insensitive |
| chrom, start, stop | region() | none | valid genomic coordinates | Region boundaries for region queries |
| faf95.popmax | variant() genome | none | float 0–1 | Filtering allele frequency (95% CI lower bound); use < 0.001 for rare |
| lof filter field | gene() variants | none | "HC" (high-confidence), "LC" | LoF confidence level |
| populations.id | genome.populations | none | afr, amr, eas, fin, nfe, sas, asj, mid, oth | Per-ancestry frequency |

Best Practices

  1. Use gnomad_r4 for GRCh38 analyses: gnomAD v4 is the most current dataset with 730K+ individuals. Use gnomad_r2_1 only when comparing to GRCh37-based variant calls.

  2. Use faf95.popmax for clinical filtering, not overall AF: The filtering allele frequency is taken from the population where the variant is most common and provides a more conservative rarity estimate than the global AF.

  3. Add time.sleep(0.5) in batch loops: gnomAD has no published rate limits but the API is shared infrastructure. Polite delays reduce the chance of server-side throttling.

  4. Filter lof == "HC" for LoF burden analyses: The confidence labels are assigned by LOFTEE. Low-confidence ("LC") annotations failed at least one of its filters and may be annotation or sequencing artifacts; high-confidence ("HC") calls passed them.

  5. Check AN before interpreting AF: Low allele number (AN) means poor coverage in that population. A zero or near-zero AF may reflect absent data, not true rarity. Cross-reference with the coverage query when AN is unexpectedly low.
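
The AN check in the last item can be folded into a small helper so that missing data is never printed as a frequency of zero. A minimal sketch; the function name `describe_frequency` and the `min_an=1000` cutoff are arbitrary choices for illustration and should be tuned to the cohort sizes involved.

```python
def describe_frequency(ac, an, min_an=1000):
    """Label one population's counts so that missing data is not mistaken for rarity."""
    if not an:
        return "no data (AN=0)"
    af = (ac or 0) / an
    if an < min_an:
        # Too few called alleles to trust the estimate
        return f"low confidence: AF={af:.2e} from only {an} alleles"
    if not ac:
        return f"not observed in {an} alleles"
    return f"AF={af:.2e} (AC={ac}, AN={an})"
```

Calling it on each row of a `populations` list gives a readable per-ancestry summary, for example `describe_frequency(0, 50000)` returns `"not observed in 50000 alleles"`.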

Common Recipes

Recipe: Check if a Variant Is Common in Any Population

When to use: Quick check before clinical interpretation — confirm no ancestry group has AF > 1%.

import requests

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def is_common_in_any_population(variant_id, threshold=0.01, dataset="gnomad_r4"):
    query = """
    query($variant_id: String!, $dataset: DatasetId!) {
      variant(variant_id: $variant_id, dataset: $dataset) {
        genome { faf95 { popmax popmax_population } af }
      }
    }
    """
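    r = requests.post(GNOMAD_API, json={"query": query,
                                         "variables": {"variant_id": variant_id, "dataset": dataset}},
                      timeout=15)
    r.raise_for_status()
    # "data" is null when the query itself failed, so guard each level
    data = (r.json().get("data") or {}).get("variant")
    if not data or not data["genome"]:
        return None, "Variant not found in gnomAD"
    af = data["genome"]["af"]
    faf95 = data["genome"]["faf95"] or {}
    popmax = faf95.get("popmax")
    pop = faf95.get("popmax_population")
    is_common = (popmax or 0) >= threshold
    # Format defensively: af and popmax can both be null
    af_text = f"{af:.2e}" if af is not None else "NA"
    popmax_text = f"{popmax:.2e} in {pop}" if popmax is not None else "NA"
    return is_common, f"overall AF={af_text}, FAF95 popmax={popmax_text}"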

common, info = is_common_in_any_population("1-55039974-G-T")
print(f"Common: {common}  |  {info}")
# Output has the form below (values are illustrative):
# Common: False  |  overall AF=3.20e-05, FAF95 popmax=6.40e-05 in nfe

Recipe: Batch Constraint Lookup

When to use: Score multiple genes from a differential expression or GWAS gene list.

import requests, time, pandas as pd

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def get_constraint(gene_symbol, reference_genome="GRCh38"):
    query = """
    query($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
      gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
        gnomad_constraint { pLI mis_z lof { oe_ci { upper } } }
      }
    }
    """
    r = requests.post(GNOMAD_API, json={"query": query,
                      "variables": {"gene_symbol": gene_symbol, "reference_genome": reference_genome}},
                      timeout=15)
    r.raise_for_status()
    # "data" or "gene" is null when the query failed or the symbol is unknown
    data = (r.json().get("data") or {}).get("gene") or {}
    if not data.get("gnomad_constraint"):
        return None
    c = data["gnomad_constraint"]
    return {"gene": gene_symbol, "pLI": c["pLI"], "LOEUF": c["lof"]["oe_ci"]["upper"], "mis_z": c["mis_z"]}

genes = ["BRCA1", "BRCA2", "ATM", "CHEK2", "PALB2"]
rows = []
for g in genes:
    row = get_constraint(g)
    if row:
        rows.append(row)
    time.sleep(0.5)   # polite delay between requests

df = pd.DataFrame(rows)
print(df.to_string(index=False))
# Output has the form below (first rows only; values are illustrative):
# gene     pLI   LOEUF  mis_z
# BRCA1  0.999   0.127   3.84
# BRCA2  1.000   0.176   3.21

Recipe: Export LoF Variants for CADD/ClinVar Cross-Reference

When to use: Get high-confidence LoF variants from gnomAD for downstream annotation.

import requests, pandas as pd

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

def get_lof_variants(gene_symbol, dataset="gnomad_r4", max_af=0.001, reference_genome="GRCh38"):
    # reference_genome must match the dataset (use "GRCh37" with gnomad_r2_1)
    query = """
    query($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
      gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
        variants(dataset: $dataset) {
          variant_id rsids chrom pos ref alt consequence lof
          genome { af ac an }
        }
      }
    }
    """
    r = requests.post(GNOMAD_API, json={"query": query,
                      "variables": {"gene_symbol": gene_symbol, "reference_genome": reference_genome, "dataset": dataset}},
                      timeout=60)
    r.raise_for_status()
    # "data" or "gene" is null when the query failed or the symbol is unknown
    gene = (r.json().get("data") or {}).get("gene") or {}
    variants = gene.get("variants") or []
    lof = [v for v in variants
           if v.get("lof") == "HC"
           and v.get("genome") and v["genome"].get("af") is not None
           and v["genome"]["af"] < max_af]
    return pd.DataFrame([{
        "variant_id": v["variant_id"],
        "rsids": ";".join(v.get("rsids") or []),
        "consequence": v["consequence"],
        "af": v["genome"]["af"],
        "ac": v["genome"]["ac"],
    } for v in lof])

df = get_lof_variants("CFTR", max_af=0.001)
print(f"High-confidence LoF variants in CFTR (AF<0.1%): {len(df)}")
print(df.head(5).to_string(index=False))
df.to_csv("CFTR_HC_lof_variants.csv", index=False)

Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| {"errors": [...]} from GraphQL | Invalid field name, wrong dataset ID, or null gene | Check field names match gnomAD v4 schema; use gnomad_r4 not gnomad_v4 |
| Variant returns None genome object | Variant only in exome data, not genome | Try accessing exome field instead of genome; genome is absent for exome-only variants |
| Gene query returns empty variants list | Gene symbol not found or mismatch | Verify the HGNC symbol (check spelling and outdated aliases); use gene_id (ENSG ID) as fallback |
| faf95 returns null | Variant is absent or monomorphic in all populations | Check ac and an; variant may have AC=0 or be filtered |
| requests.exceptions.Timeout | Large gene (e.g., TTN) takes >30s | Increase the timeout (e.g., timeout=120); for very large genes use region queries instead |
| Population AF is None for some groups | Variant not observed in that ancestry | Treat None AF as 0 for filtering; check an to confirm the group was sequenced |
| reference_genome mismatch error | Using GRCh37 coords with gnomad_r4 | Use GRCh38 for gnomad_r4 / gnomad_r3; use GRCh37 only for gnomad_r2_1 |
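
When a query fails with an unknown-field error, the field names can be checked against the live schema using standard GraphQL introspection. The sketch below assumes the gnomAD endpoint has introspection enabled, which this document does not confirm. It takes an `execute(query, variables)` callable that returns the response's data dict, such as the gnomad_query helper defined in the sections above. The function name `list_type_fields` and the type name in the usage note are illustrative.

```python
INTROSPECT_TYPE = """
query TypeFields($name: String!) {
  __type(name: $name) {
    fields { name }
  }
}
"""

def list_type_fields(execute, type_name):
    """Return the sorted field names of a GraphQL type, or [] if the type is unknown.

    `execute(query, variables)` must return the "data" part of the response.
    """
    data = execute(INTROSPECT_TYPE, {"name": type_name}) or {}
    type_info = data.get("__type") or {}          # __type is null for unknown names
    return sorted(f["name"] for f in type_info.get("fields") or [])
```

A call such as `list_type_fields(gnomad_query, "Gene")` would then list the fields available on that type, assuming a type with that name exists in the schema. Passing the transport in as an argument keeps the helper testable without network access.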

Related Skills

  • clinvar-database: ClinVar pathogenicity classifications (complement to gnomAD population frequency data)
  • gwas-database: GWAS Catalog for SNP-trait associations from published GWAS studies
  • ensembl-database: Ensembl VEP for variant consequence prediction and gene annotation
  • dbsnp-database: dbSNP for rsID lookup, variant classes, and cross-database ID mapping
