SciAgent-Skills gnomad-database
Query gnomAD v4 population variant frequencies via GraphQL API. Retrieve allele counts and frequencies stratified by ancestry group (AFR, AMR, EAS, NFE, SAS, FIN, ASJ, MID), gene-level constraint metrics (pLI, LOEUF, missense z-score), and read depth coverage. Identify variants with low population frequency or under evolutionary constraint. For clinical pathogenicity classifications use clinvar-database; for GWAS associations use gwas-database.
git clone https://github.com/jaechang-hits/SciAgent-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/genomics-bioinformatics/gnomad-database" ~/.claude/skills/jaechang-hits-sciagent-skills-gnomad-database && rm -rf "$T"
gnomAD Database
Overview
The Genome Aggregation Database (gnomAD) is a resource of aggregated exome and genome sequencing data from 730,000+ individuals. It provides population variant frequencies stratified by 9 ancestry groups, gene-level constraint scores (pLI, LOEUF), and read coverage information. Access is free via a GraphQL API at https://gnomad.broadinstitute.org/api; no authentication is required and there is no official SDK.
When to Use
- Checking whether a candidate variant is rare enough to be clinically relevant (AF < 0.1% in all populations)
- Retrieving allele frequencies stratified by ancestry group (AFR, AMR, EAS, NFE, SAS, FIN, ASJ, MID) for a variant
- Identifying all rare loss-of-function variants in a gene for burden testing or candidate prioritization
- Getting gene constraint metrics (pLI, LOEUF) to assess tolerance to loss-of-function variants
- Checking read depth coverage for a region to evaluate if low variant frequency reflects low sequencing coverage
- Filtering a VCF by population frequency — query gnomAD AF to discard common variants before clinical interpretation
- For clinical pathogenicity classifications use `clinvar-database`; gnomAD provides frequency evidence but does not classify pathogenicity
- For GWAS associations at the study level use `gwas-database`; gnomAD is for population frequency lookups
Prerequisites
- Python packages: `requests`, `pandas`, `matplotlib`
- Data requirements: gene symbols (e.g., `BRCA1`), variant IDs (`1-69511-A-G` format, or rsIDs)
- Environment: internet connection; no API key required
- Rate limits: no official published limits; use `time.sleep(0.5)` between requests for polite access; avoid bursts over 10 requests/second
pip install requests pandas matplotlib
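The polite-access advice above can be wrapped in a small helper so that every request path shares one delay policy. This is a minimal sketch: the `Throttle` class is a hypothetical name introduced here, not part of gnomAD or `requests`, and the 0.5 s default simply mirrors the suggested `time.sleep(0.5)`.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between successive calls (hypothetical helper)."""

    def __init__(self, min_interval: float = 0.5,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock      # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep just long enough to respect min_interval; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` immediately before each `requests.post(...)` only sleeps when the previous request was less than `min_interval` ago, so slow queries are not penalized twice.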
Quick Start
```python
from typing import Optional

import requests

GNOMAD_API = "https://gnomad.broadinstitute.org/api"


def gnomad_query(query: str, variables: Optional[dict] = None) -> dict:
    """Execute a gnomAD GraphQL query and return the data payload."""
    payload = {"query": query, "variables": variables or {}}
    r = requests.post(GNOMAD_API, json=payload, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(f"GraphQL errors: {result['errors']}")
    return result["data"]


# Quick check: get pLI and LOEUF for BRCA1
query = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gnomad_constraint {
      pLI
      lof { oe_ci { upper } }
    }
  }
}
"""
data = gnomad_query(query, {"gene_symbol": "BRCA1", "reference_genome": "GRCh38"})
constraint = data["gene"]["gnomad_constraint"]
print(f"BRCA1 pLI: {constraint['pLI']:.3f}")
print(f"BRCA1 LOEUF: {constraint['lof']['oe_ci']['upper']:.3f}")
# Output format (values are illustrative; actual numbers come from the live API):
# BRCA1 pLI: 0.999
# BRCA1 LOEUF: 0.127
```
Core API
Query 1: Gene Variant Query
Fetch all variants in a gene with population allele frequencies. Returns a list of variants with their genome-level frequencies.
```python
import time

import requests

GNOMAD_API = "https://gnomad.broadinstitute.org/api"


def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(f"GraphQL errors: {result['errors']}")
    return result["data"]


GENE_VARIANTS_QUERY = """
query GeneVariants($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id
    gene_name
    variants(dataset: $dataset) {
      variant_id
      rsids
      chrom
      pos
      ref
      alt
      consequence
      lof
      genome {
        an
        ac
        af
        faf95 { popmax popmax_population }
      }
    }
  }
}
"""

data = gnomad_query(GENE_VARIANTS_QUERY, {
    "gene_symbol": "PCSK9",
    "reference_genome": "GRCh38",
    "dataset": "gnomad_r4",
})
variants = data["gene"]["variants"]
print(f"Gene: {data['gene']['gene_name']} ({data['gene']['gene_id']})")
print(f"Total variants: {len(variants)}")

# Filter to rare variants (AF < 0.001); skip variants with no genome data
rare = [v for v in variants
        if v["genome"] and v["genome"]["af"] is not None and v["genome"]["af"] < 0.001]
print(f"Rare variants (AF < 0.1%): {len(rare)}")
for v in rare[:3]:
    print(f"  {v['variant_id']} | {v['consequence']} | AF={v['genome']['af']:.2e}")
```
Query 2: Variant Lookup
Fetch detailed information for a single variant by its gnomAD variant ID (CHROM-POS-REF-ALT format) or search by rsID.
```python
VARIANT_QUERY = """
query VariantDetails($variant_id: String!, $dataset: DatasetId!) {
  variant(variant_id: $variant_id, dataset: $dataset) {
    variant_id
    rsids
    chrom
    pos
    ref
    alt
    consequence
    lof
    lof_filter
    lof_flags
    genome {
      an
      ac
      af
      faf95 { popmax popmax_population }
      populations { id ac an af }
    }
  }
}
"""

data = gnomad_query(VARIANT_QUERY, {
    "variant_id": "1-55039974-G-T",  # PCSK9 variant; this GRCh38 position corresponds to rs11591147 (p.Arg46Leu)
    "dataset": "gnomad_r4",
})
v = data["variant"]
print(f"Variant: {v['variant_id']}")
print(f"rsIDs: {v['rsids']}")
print(f"Consequence: {v['consequence']} | LoF: {v['lof']}")

g = v["genome"]
if g is None:
    # Exome-only variants have no genome frequency object
    print("No genome frequency data for this variant")
else:
    print(f"Genome AF: {g['af']:.2e} (AC={g['ac']}, AN={g['an']})")
    faf = g["faf95"] or {}
    if faf.get("popmax") is not None:
        print(f"FAF95 popmax: {faf['popmax']:.2e} in {faf['popmax_population']}")
```
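Because the lookup depends on the `CHROM-POS-REF-ALT` string being well formed, it can help to validate IDs locally before sending them. The sketch below is an assumption-laden illustration: `parse_variant_id` and `to_variant_id` are hypothetical helper names, and the regular expression only covers simple A/C/G/T alleles on chromosomes 1-22, X, Y, and M/MT.

```python
import re
from typing import NamedTuple

# Accepts an optional "chr" prefix; alleles restricted to A/C/G/T (assumption).
_VARIANT_ID = re.compile(
    r"^(?:chr)?([0-9]{1,2}|X|Y|M|MT)-([1-9][0-9]*)-([ACGT]+)-([ACGT]+)$",
    re.IGNORECASE,
)


class VariantId(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant_id(text: str) -> VariantId:
    """Split a CHROM-POS-REF-ALT string into typed fields, or raise ValueError."""
    m = _VARIANT_ID.match(text.strip())
    if m is None:
        raise ValueError(f"Not a CHROM-POS-REF-ALT variant ID: {text!r}")
    chrom, pos, ref, alt = m.groups()
    return VariantId(chrom.upper(), int(pos), ref.upper(), alt.upper())


def to_variant_id(chrom, pos, ref: str, alt: str) -> str:
    """Build a CHROM-POS-REF-ALT string from VCF-style fields, dropping any 'chr' prefix."""
    chrom = str(chrom)
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return f"{chrom}-{int(pos)}-{ref.upper()}-{alt.upper()}"
```

For example, `to_variant_id("chr1", 55039974, "G", "T")` yields the ID used in the query above, so VCF records can be converted before lookup.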
Query 3: Population Frequencies
Retrieve allele frequency broken down by ancestry group for a specific variant.
```python
import pandas as pd

POPULATION_FREQ_QUERY = """
query PopFreqs($variant_id: String!, $dataset: DatasetId!) {
  variant(variant_id: $variant_id, dataset: $dataset) {
    variant_id
    genome {
      populations { id ac an af homozygote_count }
    }
  }
}
"""

ANCESTRY_LABELS = {
    "afr": "African/African American",
    "amr": "Admixed American",
    "eas": "East Asian",
    "fin": "Finnish",
    "nfe": "Non-Finnish European",
    "sas": "South Asian",
    "asj": "Ashkenazi Jewish",
    "mid": "Middle Eastern",
    "oth": "Other",
}

data = gnomad_query(POPULATION_FREQ_QUERY, {
    "variant_id": "1-55039974-G-T",
    "dataset": "gnomad_r4",
})
pops = data["variant"]["genome"]["populations"]

# Keep top-level ancestry groups only (drops sex-specific entries) with data present
main_pops = [p for p in pops if p["id"] in ANCESTRY_LABELS and p["an"] > 0]
df = pd.DataFrame(main_pops)
df["label"] = df["id"].map(ANCESTRY_LABELS)
df = df.sort_values("af", ascending=False)
print(df[["label", "ac", "an", "af", "homozygote_count"]].to_string(index=False))
```
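When only the single highest per-ancestry frequency matters, the `populations` list can be reduced without pandas. The following is a sketch under stated assumptions: `popmax_from_populations` is a hypothetical helper, the default group set is the eight named ancestry codes from this document, and the result is the raw maximum AF, which is not the same quantity as `faf95.popmax`.

```python
from typing import Iterable, Optional, Tuple

MAIN_GROUPS = frozenset({"afr", "amr", "eas", "fin", "nfe", "sas", "asj", "mid"})


def popmax_from_populations(populations: Iterable[dict],
                            groups=MAIN_GROUPS) -> Optional[Tuple[str, float]]:
    """Return (population id, AF) for the group with the highest raw allele frequency."""
    best = None
    for p in populations:
        if p.get("id") not in groups:
            continue                      # skip sex-specific or other sub-entries
        an = p.get("an") or 0
        if an <= 0:
            continue                      # no called alleles: AF is undefined
        af = p.get("af")
        if af is None:
            af = (p.get("ac") or 0) / an  # fall back to AC/AN when AF is null
        if best is None or af > best[1]:
            best = (p["id"], af)
    return best
```

A return value of `None` means no listed group had usable data, which is different from an AF of zero.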
Query 4: Coverage Query
Retrieve per-base read depth coverage for a gene region to assess data completeness.
```python
COVERAGE_QUERY = """
query Coverage($chrom: String!, $start: Int!, $stop: Int!, $dataset: DatasetId!) {
  coverage(dataset: $dataset, chrom: $chrom, start: $start, stop: $stop) {
    pos
    mean
    median
    over_1
    over_10
    over_20
    over_30
    over_100
  }
}
"""

data = gnomad_query(COVERAGE_QUERY, {
    "chrom": "1",
    "start": 55039700,
    "stop": 55040200,
    "dataset": "gnomad_r4",
})
cov = data["coverage"]
print(f"Coverage positions retrieved: {len(cov)}")

if cov:
    avg_mean = sum(c["mean"] for c in cov) / len(cov)
    # over_20 is the fraction of samples with depth >= 20x at that position
    pct_20x = sum(1 for c in cov if c["over_20"] > 0.9) / len(cov) * 100
    print(f"Average mean depth: {avg_mean:.1f}x")
    print(f"Positions with >90% samples at >=20x: {pct_20x:.1f}%")

    # Example single position
    c = cov[0]
    print(f"\nPosition {c['pos']}: mean={c['mean']:.1f}x, median={c['median']}x")
    print(f"  Fraction >=10x: {c['over_10']:.3f}, >=20x: {c['over_20']:.3f}, >=30x: {c['over_30']:.3f}")
```
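To turn the per-position records into something reviewable, poorly covered positions can be merged into contiguous intervals. This is a sketch, assuming each record carries `pos` and an `over_20` fraction as in the query above; `low_coverage_intervals` and the 0.9 cutoff are illustrative choices, and a gap between returned positions is treated as an interval break.

```python
from typing import Iterable, List, Tuple


def low_coverage_intervals(cov: Iterable[dict], field: str = "over_20",
                           min_fraction: float = 0.9) -> List[Tuple[int, int]]:
    """Merge consecutive positions whose `field` fraction is below min_fraction."""
    intervals: List[Tuple[int, int]] = []
    start = prev = None
    for c in sorted(cov, key=lambda rec: rec["pos"]):
        pos = c["pos"]
        low = (c.get(field) or 0.0) < min_fraction   # null counts as uncovered
        if low:
            if start is None:
                start = pos
            elif pos != prev + 1:                    # non-adjacent: close the open run
                intervals.append((start, prev))
                start = pos
            prev = pos
        elif start is not None:                      # well-covered position ends the run
            intervals.append((start, prev))
            start = prev = None
    if start is not None:
        intervals.append((start, prev))
    return intervals
```

An empty result suggests that low allele counts in the region are unlikely to be explained by sequencing depth alone.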
Query 5: Gene Constraint
Retrieve gene-level constraint scores: pLI (probability of loss-of-function intolerance), LOEUF (LoF observed/expected upper bound fraction), and missense z-score.
```python
import time

CONSTRAINT_QUERY = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id
    gene_name
    gnomad_constraint {
      pLI
      pNull
      pRec
      mis_z
      lof {
        obs
        exp
        oe
        oe_ci { lower upper }
      }
    }
  }
}
"""

genes = ["PCSK9", "BRCA1", "TP53", "TTN"]
print(f"{'Gene':<10} {'pLI':>6} {'LOEUF':>7} {'mis_z':>7}")
print("-" * 35)
for gene in genes:
    data = gnomad_query(CONSTRAINT_QUERY, {"gene_symbol": gene, "reference_genome": "GRCh38"})
    c = data["gene"]["gnomad_constraint"]
    loeuf = c["lof"]["oe_ci"]["upper"]  # LOEUF is the upper bound of the LoF oe confidence interval
    print(f"{gene:<10} {c['pLI']:>6.3f} {loeuf:>7.3f} {c['mis_z']:>7.2f}")
    time.sleep(0.5)  # polite delay between requests
# Output format (values are illustrative; actual numbers come from the live API):
# Gene          pLI   LOEUF   mis_z
# PCSK9       0.855   0.543    2.11
# BRCA1       0.999   0.127    3.84
# TP53        0.993   0.191    5.21
# TTN         0.001   0.993   -2.40
```
Query 6: Variant Search by Region
Fetch all variants in a chromosomal region, useful for targeted panels and regional analyses.
```python
from collections import Counter

REGION_VARIANTS_QUERY = """
query RegionVariants($chrom: String!, $start: Int!, $stop: Int!,
                     $dataset: DatasetId!, $reference_genome: ReferenceGenomeId!) {
  region(chrom: $chrom, start: $start, stop: $stop, reference_genome: $reference_genome) {
    variants(dataset: $dataset) {
      variant_id
      rsids
      pos
      consequence
      lof
      genome {
        af
        ac
        an
        faf95 { popmax }
      }
    }
  }
}
"""

data = gnomad_query(REGION_VARIANTS_QUERY, {
    "chrom": "1",
    "start": 55039974,
    "stop": 55064852,  # PCSK9 coding region
    "dataset": "gnomad_r4",
    "reference_genome": "GRCh38",
})
variants = data["region"]["variants"]
print(f"Variants in region: {len(variants)}")

# Summarize by consequence
conseq_counts = Counter(v["consequence"] for v in variants if v["consequence"])
for c, n in conseq_counts.most_common(5):
    print(f"  {c}: {n}")

# Loss-of-function variants
lof_vars = [v for v in variants if v["lof"] == "HC"]
print(f"\nHigh-confidence LoF variants: {len(lof_vars)}")
for v in lof_vars[:3]:
    af = v["genome"]["af"] if v["genome"] else None
    # Compare against None so that a true AF of 0.0 is not reported as missing
    af_text = f"{af:.2e}" if af is not None else "NA"
    print(f"  {v['variant_id']} | AF={af_text}")
```
Key Concepts
gnomAD Data Model
The examples in this document use two dataset IDs: `gnomad_r4` (gnomAD v4: exomes + genomes, GRCh38, 730K+ individuals) and `gnomad_r2_1` (gnomAD v2.1: GRCh37, 141K individuals). The API uses a GraphQL schema in which variants are reached through `gene()`, `region()`, or direct `variant()` lookups. Each variant has separate exome and genome frequency objects; the genome object is preferred for population frequency comparisons.
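Since exome and genome frequencies arrive as separate objects and either one can be null, downstream code needs an explicit policy for combining them. The sketch below pools allele counts and allele numbers; this simple pooling is an assumption made for illustration, `combined_af` is a hypothetical helper, and the `ac`/`an` keys follow the `genome` object used in the queries in this document.

```python
from typing import Optional


def combined_af(exome: Optional[dict], genome: Optional[dict]) -> dict:
    """Pool AC and AN across whichever of the two frequency objects are present."""
    ac = an = 0
    for block in (exome, genome):
        if block:                          # a null object contributes nothing
            ac += block.get("ac") or 0
            an += block.get("an") or 0
    return {"ac": ac, "an": an, "af": (ac / an) if an else None}
```

An `af` of `None` signals that neither data type had called alleles at the site, which should be kept distinct from an observed frequency of zero.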
Ancestry Groups
gnomAD v4 reports frequencies for 9 top-level ancestry groups identified by genetic ancestry (not self-reported):
| Code | Population | Dataset size (approx) |
|---|---|---|
| `afr` | African/African American | 76,000+ |
| `amr` | Admixed American | 45,000+ |
| `eas` | East Asian | 50,000+ |
| `fin` | Finnish | 24,000+ |
| `nfe` | Non-Finnish European | 400,000+ |
| `sas` | South Asian | 80,000+ |
| `asj` | Ashkenazi Jewish | 10,000+ |
| `mid` | Middle Eastern | 5,000+ |
| `oth` | Other/Unknown | varies |
Filtering Allele Frequency (FAF95)
The `faf95` field gives the filtering allele frequency: the lower bound of a one-sided 95% confidence interval on the allele frequency, reported for the population where the variant is most common (`popmax`). Use it for conservative variant filtering in clinical pipelines; a variant with `faf95.popmax < 0.001` is likely rare enough to warrant clinical investigation.
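To make the "95% lower bound" idea concrete, the sketch below computes a Poisson-based lower bound from an allele count and allele number using only the standard library. It is an illustration of the concept under a Poisson assumption, not gnomAD's own implementation; `filtering_af` and `poisson_cdf` are hypothetical names, and the API's `faf95` value should be preferred whenever it is available.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed in log space for stability."""
    if k < 0:
        return 0.0
    if lam <= 0:
        return 1.0
    log_lam = math.log(lam)
    total = sum(math.exp(-lam + i * log_lam - math.lgamma(i + 1)) for i in range(k + 1))
    return min(1.0, total)


def filtering_af(ac: int, an: int, ci: float = 0.95) -> float:
    """Largest AF whose expected count still makes observing >= ac alleles
    unlikely (probability <= 1 - ci); found by bisection on the Poisson mean."""
    if ac <= 0 or an <= 0:
        return 0.0
    lo, hi = 0.0, float(ac)     # the bound always lies below the observed count
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(ac - 1, mid) >= ci:
            lo = mid            # mid is still compatible: move the bound up
        else:
            hi = mid
    return lo / an
```

For a singleton (`ac=1`) the bound reduces to `-ln(0.95) / an`, which shows why a single observation in a small population still yields a very low filtering frequency.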
Constraint Scores
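| Score | Interpretation |
|---|---|
| High pLI (e.g., `pLI > 0.9`, the cutoff used in Workflow 3) | Gene is intolerant to LoF; likely essential |
| Low LOEUF | Strong LoF constraint (LOEUF is the upper CI bound of the oe ratio) |
| High `mis_z` | Gene shows significant missense constraint |
| Low pLI / high LOEUF | Gene tolerates LoF; homozygous LoF variants exist |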
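The interpretations in the table can be applied programmatically once the scores are retrieved. In this sketch only the `pLI > 0.9` cutoff comes from this document (Workflow 3); the LOEUF and missense z-score cutoffs are commonly cited values that are assumptions here and should be adjusted to the dataset version in use. `classify_constraint` is a hypothetical helper.

```python
from typing import List, Optional


def classify_constraint(pli: Optional[float], loeuf: Optional[float],
                        mis_z: Optional[float],
                        pli_cutoff: float = 0.9,      # from Workflow 3 in this document
                        loeuf_cutoff: float = 0.35,   # assumption: adjust per dataset
                        mis_z_cutoff: float = 3.09    # assumption: adjust per dataset
                        ) -> List[str]:
    """Return constraint flags for one gene; null scores produce no flag."""
    flags = []
    if pli is not None and pli > pli_cutoff:
        flags.append("lof_intolerant")
    if loeuf is not None and loeuf < loeuf_cutoff:
        flags.append("low_loeuf")
    if mis_z is not None and mis_z > mis_z_cutoff:
        flags.append("missense_constrained")
    return flags
```

Keeping the cutoffs as parameters makes it easy to report which thresholds were used alongside any prioritized gene list.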
Common Workflows
Workflow 1: Rare Variant Frequency Report for a Gene
Goal: Retrieve all rare (AF < 1%) variants in a gene, stratified by consequence, exported to CSV.
```python
import pandas as pd
import requests

GNOMAD_API = "https://gnomad.broadinstitute.org/api"


def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(result["errors"])
    return result["data"]


GENE_VARIANTS_QUERY = """
query GeneVariants($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id
    gene_name
    variants(dataset: $dataset) {
      variant_id
      rsids
      chrom
      pos
      ref
      alt
      consequence
      lof
      lof_filter
      genome {
        an
        ac
        af
        faf95 { popmax popmax_population }
        populations { id ac an af }
      }
    }
  }
}
"""

gene = "LDLR"
data = gnomad_query(GENE_VARIANTS_QUERY, {
    "gene_symbol": gene,
    "reference_genome": "GRCh38",
    "dataset": "gnomad_r4",
})
variants = data["gene"]["variants"]

rows = []
for v in variants:
    g = v.get("genome") or {}
    af = g.get("af")
    if af is None or af >= 0.01:  # keep only rare variants
        continue
    faf95 = g.get("faf95") or {}  # the key can be present with a null value
    rows.append({
        "variant_id": v["variant_id"],
        "rsids": ";".join(v.get("rsids") or []),
        "consequence": v.get("consequence"),
        "lof": v.get("lof"),
        "af_genome": af,
        "ac": g.get("ac"),
        "an": g.get("an"),
        "faf95_popmax": faf95.get("popmax"),
        "faf95_pop": faf95.get("popmax_population"),
    })

df = pd.DataFrame(rows)
df = df.sort_values("af_genome")
df.to_csv(f"{gene}_rare_variants.csv", index=False)

print(f"{gene}: {len(variants)} total variants, {len(df)} rare (AF<1%)")
print(df.groupby("consequence")["variant_id"].count().sort_values(ascending=False).head(6))
# Output format (values are illustrative; actual numbers come from the live API):
# LDLR: 2847 total variants, 2631 rare (AF<1%)
# consequence
# missense_variant         1423
# synonymous_variant        512
# splice_region_variant     231
# stop_gained               198
```
Workflow 2: Ancestry-Stratified Frequency Visualization
Goal: Query a list of variants and produce a barplot of allele frequencies by ancestry group.
```python
import matplotlib.pyplot as plt
import pandas as pd
import requests

GNOMAD_API = "https://gnomad.broadinstitute.org/api"


def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(result["errors"])
    return result["data"]


POPULATION_FREQ_QUERY = """
query PopFreqs($variant_id: String!, $dataset: DatasetId!) {
  variant(variant_id: $variant_id, dataset: $dataset) {
    variant_id
    genome {
      populations { id ac an af }
    }
  }
}
"""

ANCESTRY_LABELS = {
    "afr": "AFR", "amr": "AMR", "eas": "EAS", "fin": "FIN",
    "nfe": "NFE", "sas": "SAS", "asj": "ASJ", "mid": "MID",
}

variant_id = "1-55039974-G-T"  # PCSK9 variant
data = gnomad_query(POPULATION_FREQ_QUERY, {
    "variant_id": variant_id,
    "dataset": "gnomad_r4",
})
pops = data["variant"]["genome"]["populations"]

rows = [{"code": p["id"], "af": p["af"], "ac": p["ac"], "an": p["an"]}
        for p in pops if p["id"] in ANCESTRY_LABELS and p["an"] > 0]
df = pd.DataFrame(rows)
df["label"] = df["code"].map(ANCESTRY_LABELS)
df = df.sort_values("af", ascending=False)

fig, ax = plt.subplots(figsize=(9, 4))
bars = ax.bar(df["label"], df["af"] * 100, color="#4472C4", edgecolor="white")
ax.bar_label(bars, fmt="%.3f%%", fontsize=8, padding=2)
ax.set_xlabel("Ancestry Group")
ax.set_ylabel("Allele Frequency (%)")
ax.set_title(f"gnomAD v4 Population Frequencies\n{variant_id}")
# af is a fraction: x100 converts to percent, x1.5 leaves headroom above the tallest bar
ax.set_ylim(0, df["af"].max() * 150)
plt.tight_layout()
plt.savefig("gnomad_pop_frequencies.png", dpi=150, bbox_inches="tight")
print(f"Saved gnomad_pop_frequencies.png (n={len(df)} ancestry groups)")
print(df[["label", "af", "ac", "an"]].to_string(index=False))
```
Workflow 3: Constraint-Guided Gene Prioritization
Goal: Score a gene list by constraint metrics and flag LoF-intolerant genes.
```python
import time

import pandas as pd
import requests

GNOMAD_API = "https://gnomad.broadinstitute.org/api"


def gnomad_query(query, variables=None):
    r = requests.post(GNOMAD_API, json={"query": query, "variables": variables or {}}, timeout=30)
    r.raise_for_status()
    result = r.json()
    if "errors" in result:
        raise ValueError(result["errors"])
    return result["data"]


CONSTRAINT_QUERY = """
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id
    gene_name
    gnomad_constraint {
      pLI
      pNull
      pRec
      mis_z
      lof { obs exp oe oe_ci { lower upper } }
    }
  }
}
"""

gene_list = ["BRCA1", "BRCA2", "PCSK9", "LDLR", "TTN", "CFTR", "HTT"]
records = []
for gene in gene_list:
    try:
        data = gnomad_query(CONSTRAINT_QUERY, {"gene_symbol": gene, "reference_genome": "GRCh38"})
        c = data["gene"]["gnomad_constraint"]
        records.append({
            "gene": gene,
            "pLI": c["pLI"],
            "LOEUF": c["lof"]["oe_ci"]["upper"],
            "mis_z": c["mis_z"],
            "lof_obs": c["lof"]["obs"],
            "lof_exp": c["lof"]["exp"],
            "lof_oe": c["lof"]["oe"],
        })
    except Exception as e:
        # A missing gene or null constraint block should not abort the whole batch
        print(f"Warning: {gene} failed: {e}")
    time.sleep(0.5)

df = pd.DataFrame(records).sort_values("LOEUF")
df["lof_intolerant"] = df["pLI"] > 0.9
print(df[["gene", "pLI", "LOEUF", "mis_z", "lof_intolerant"]].to_string(index=False))
df.to_csv("constraint_scores.csv", index=False)
print(f"\nLoF-intolerant genes: {df['lof_intolerant'].sum()}/{len(df)}")
```
Key Parameters
| Parameter | Function/Endpoint | Default | Range / Options | Effect |
|---|---|---|---|---|
| `dataset` | All variant queries | none | `gnomad_r4`, `gnomad_r3`, `gnomad_r2_1` | Dataset version (GRCh38 for r4/r3, GRCh37 for r2_1) |
| `reference_genome` | gene(), region() | none | `GRCh38`, `GRCh37` | Coordinate system; must match dataset |
| `variant_id` | variant() | none | `CHROM-POS-REF-ALT` string | Identifies the specific variant to query |
| `gene_symbol` | gene() | none | HGNC symbol string | Gene to retrieve; pass the official uppercase HGNC symbol |
| `chrom`, `start`, `stop` | region() | none | valid genomic coordinates | Region boundaries for region queries |
| `faf95.popmax` | variant() genome | none | float 0–1 | Filtering allele frequency (95% CI lower bound); use < 0.001 for rare |
| `lof` filter field | gene() variants | none | `HC` (high-confidence), `LC` | LoF confidence level |
| `populations.id` | genome.populations | none | `afr`, `amr`, `eas`, `fin`, `nfe`, `sas`, `asj`, `mid`, `oth` | Per-ancestry frequency |
Best Practices
- Use `gnomad_r4` for GRCh38 analyses: gnomAD v4 is the most current dataset with 730K+ individuals. Use `gnomad_r2_1` only when comparing to GRCh37-based variant calls.
- Use `faf95.popmax` for clinical filtering, not overall AF: The filtering allele frequency is taken from the population where the variant is most frequent, so it gives a more conservative rarity estimate than the global AF.
- Add `time.sleep(0.5)` in batch loops: gnomAD has no published rate limits but the API is shared infrastructure. Polite delays prevent server-side throttling.
- Filter `lof == "HC"` for LoF burden analyses: Low-confidence LoF (`"LC"`) annotations are often in repetitive regions or may be sequencing artifacts. High-confidence (`"HC"`) calls have passed the LOFTEE filters.
- Check AN before interpreting AF: Low allele number (AN) means poor coverage in that population. A zero or near-zero AF may reflect absent data, not true rarity. Cross-reference with the coverage query when AN is unexpectedly low.
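The AN check in the last practice is easy to automate before any AF threshold is applied. This is a minimal sketch: `an_status` is a hypothetical helper, and the `min_an=2000` cutoff is an arbitrary illustrative value, not a gnomAD recommendation.

```python
from typing import Dict, Iterable


def an_status(populations: Iterable[dict], min_an: int = 2000) -> Dict[str, str]:
    """Label each population as 'no_data', 'low_an', or 'ok' based on allele number."""
    status = {}
    for p in populations:
        an = p.get("an") or 0
        if an == 0:
            status[p["id"]] = "no_data"   # AF is undefined, not zero
        elif an < min_an:
            status[p["id"]] = "low_an"    # AF is a noisy estimate
        else:
            status[p["id"]] = "ok"
    return status
```

Populations labeled `no_data` or `low_an` are candidates for the coverage cross-check rather than for a rarity conclusion.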
Common Recipes
Recipe: Check if a Variant Is Common in Any Population
When to use: Quick check before clinical interpretation — confirm no ancestry group has AF > 1%.
```python
import requests

GNOMAD_API = "https://gnomad.broadinstitute.org/api"


def is_common_in_any_population(variant_id, threshold=0.01, dataset="gnomad_r4"):
    query = """
    query($variant_id: String!, $dataset: DatasetId!) {
      variant(variant_id: $variant_id, dataset: $dataset) {
        genome {
          faf95 { popmax popmax_population }
          af
        }
      }
    }
    """
    r = requests.post(
        GNOMAD_API,
        json={"query": query, "variables": {"variant_id": variant_id, "dataset": dataset}},
        timeout=15,
    )
    r.raise_for_status()
    data = (r.json().get("data") or {}).get("variant")
    if not data or not data["genome"]:
        return None, "Variant not found in gnomAD"
    af = data["genome"]["af"]
    faf95 = data["genome"]["faf95"] or {}
    popmax = faf95.get("popmax")
    pop = faf95.get("popmax_population")
    # A null popmax is treated as 0, i.e. not common
    is_common = (popmax or 0) >= threshold
    af_text = f"{af:.2e}" if af is not None else "NA"
    popmax_text = f"{popmax:.2e}" if popmax is not None else "NA"
    return is_common, f"overall AF={af_text}, FAF95 popmax={popmax_text} in {pop}"


common, info = is_common_in_any_population("1-55039974-G-T")
print(f"Common: {common} | {info}")
# Output format (values are illustrative; actual numbers come from the live API):
# Common: False | overall AF=3.20e-05, FAF95 popmax=6.40e-05 in nfe
```
Recipe: Batch Constraint Lookup
When to use: Score multiple genes from a differential expression or GWAS gene list.
```python
import time

import pandas as pd
import requests

GNOMAD_API = "https://gnomad.broadinstitute.org/api"


def get_constraint(gene_symbol, reference_genome="GRCh38"):
    query = """
    query($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
      gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
        gnomad_constraint { pLI mis_z lof { oe_ci { upper } } }
      }
    }
    """
    r = requests.post(
        GNOMAD_API,
        json={"query": query,
              "variables": {"gene_symbol": gene_symbol, "reference_genome": reference_genome}},
        timeout=15,
    )
    data = (r.json().get("data") or {}).get("gene") or {}
    if not data.get("gnomad_constraint"):
        return None
    c = data["gnomad_constraint"]
    return {"gene": gene_symbol, "pLI": c["pLI"],
            "LOEUF": c["lof"]["oe_ci"]["upper"], "mis_z": c["mis_z"]}


genes = ["BRCA1", "BRCA2", "ATM", "CHEK2", "PALB2"]
rows = []
for g in genes:
    row = get_constraint(g)
    if row:
        rows.append(row)
    time.sleep(0.5)  # polite delay after each gene

df = pd.DataFrame(rows)
print(df.to_string(index=False))
# Output format (first rows only; values are illustrative, actual numbers come from the live API):
#  gene   pLI  LOEUF  mis_z
# BRCA1 0.999  0.127   3.84
# BRCA2 1.000  0.176   3.21
```
Recipe: Export LoF Variants for CADD/ClinVar Cross-Reference
When to use: Get high-confidence LoF variants from gnomAD for downstream annotation.
```python
import pandas as pd
import requests

GNOMAD_API = "https://gnomad.broadinstitute.org/api"


def get_lof_variants(gene_symbol, dataset="gnomad_r4", max_af=0.001):
    query = """
    query($gene_symbol: String!, $reference_genome: ReferenceGenomeId!, $dataset: DatasetId!) {
      gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
        variants(dataset: $dataset) {
          variant_id
          rsids
          chrom
          pos
          ref
          alt
          consequence
          lof
          genome { af ac an }
        }
      }
    }
    """
    r = requests.post(
        GNOMAD_API,
        json={"query": query,
              "variables": {"gene_symbol": gene_symbol,
                            "reference_genome": "GRCh38",
                            "dataset": dataset}},
        timeout=60,
    )
    variants = r.json()["data"]["gene"]["variants"]
    # Keep high-confidence LoF calls with a known genome AF below the cutoff
    lof = [v for v in variants
           if v.get("lof") == "HC"
           and v.get("genome")
           and v["genome"].get("af") is not None
           and v["genome"]["af"] < max_af]
    return pd.DataFrame([{
        "variant_id": v["variant_id"],
        "rsids": ";".join(v.get("rsids") or []),
        "consequence": v["consequence"],
        "af": v["genome"]["af"],
        "ac": v["genome"]["ac"],
    } for v in lof])


df = get_lof_variants("CFTR", max_af=0.001)
print(f"High-confidence LoF variants in CFTR (AF<0.1%): {len(df)}")
print(df.head(5).to_string(index=False))
df.to_csv("CFTR_HC_lof_variants.csv", index=False)
```
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| `errors` from GraphQL | Invalid field name, wrong dataset ID, or null gene | Check field names match gnomAD v4 schema; use the exact dataset ID (e.g., `gnomad_r4`) |
| Variant returns `null` genome object | Variant only in exome data, not genome | Try accessing the `exome` field instead of `genome`; genome is absent for exome-only variants |
| Gene query returns empty variants list | Gene symbol not found or reference genome mismatch | Verify the HGNC symbol (use the official uppercase form); use `gene_id` (ENSG ID) as fallback |
| `faf95.popmax` returns `null` | Variant is absent or monomorphic in all populations | Check `ac` and `an`; variant may have AC=0 or be filtered |
| Request timeout | Large gene (e.g., TTN) takes >30s | Increase `timeout`; for very large genes use region queries instead |
| Population AF is `null` for some groups | Variant not observed in that ancestry | Treat AF as 0 for filtering; check `an` to confirm the group was sequenced |
| `reference_genome` mismatch error | Using GRCh37 coords with `gnomad_r4` | Use `GRCh38` for `gnomad_r4`/`gnomad_r3`; use `GRCh37` only for `gnomad_r2_1` |
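For the large-gene timeout case, the suggested switch to region queries usually means splitting the gene span into windows and merging the results. The sketch below covers only the local bookkeeping: `split_region` and `merge_variants` are hypothetical helpers, the 10 kb window is an arbitrary illustrative size, and each window would be passed to a region query such as the one in Query 6.

```python
from typing import Iterable, List, Tuple


def split_region(start: int, stop: int, window: int = 10_000) -> List[Tuple[int, int]]:
    """Split an inclusive [start, stop] span into consecutive inclusive windows."""
    if window <= 0 or start > stop:
        raise ValueError("need window > 0 and start <= stop")
    windows = []
    s = start
    while s <= stop:
        e = min(s + window - 1, stop)
        windows.append((s, e))
        s = e + 1
    return windows


def merge_variants(chunks: Iterable[List[dict]]) -> List[dict]:
    """Concatenate per-window variant lists, keeping the first copy of each variant_id."""
    seen = set()
    merged = []
    for chunk in chunks:
        for v in chunk:
            if v["variant_id"] not in seen:
                seen.add(v["variant_id"])
                merged.append(v)
    return merged
```

Deduplicating by `variant_id` guards against a variant being returned by two adjacent windows, for example an indel that spans a window boundary.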
Related Skills
- `clinvar-database`: ClinVar pathogenicity classifications (complement to gnomAD population frequency data)
- `gwas-database`: GWAS Catalog for SNP-trait associations from published GWAS studies
- `ensembl-database`: Ensembl VEP for variant consequence prediction and gene annotation
- `dbsnp-database`: dbSNP for rsID lookup, variant classes, and cross-database ID mapping
References
- gnomAD GraphQL API — Interactive GraphQL explorer and endpoint documentation
- Karczewski et al., Nature 2020 — gnomAD v2.1 flagship paper (constraint metrics, LoF analysis)
- gnomAD Help & FAQ — Data model, ancestry definitions, FAF95 explanation
- gnomAD v4 blog post — gnomAD v4 release notes and dataset composition