SciAgent-Skills emdb-database
Search and retrieve cryo-EM density maps, fitted atomic models, and metadata from the Electron Microscopy Data Bank (EMDB) REST API. Query by keyword, resolution, method, or organism; fetch entry details, map download URLs, associated PDB models, and publications. No authentication required. For experimental atomic coordinates use pdb-database; for AlphaFold predicted structures use alphafold-database-access.
git clone https://github.com/jaechang-hits/SciAgent-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/structural-biology-drug-discovery/emdb-database" ~/.claude/skills/jaechang-hits-sciagent-skills-emdb-database && rm -rf "$T"
EMDB Database
Overview
The Electron Microscopy Data Bank (EMDB) at EBI archives 3D electron microscopy density maps (primarily cryo-EM and cryo-ET maps) for macromolecular assemblies. It holds 30,000+ entries including ribosomes, membrane proteins, viruses, and large complexes not tractable by X-ray crystallography. The EMDB REST API at `https://www.ebi.ac.uk/emdb/api/` provides JSON responses for entry metadata, map download info, fitted atomic models (PDB IDs), and publications. No authentication or API key is required.
When to Use
- Finding cryo-EM density maps for a protein or complex by keyword (e.g., "spike protein", "ribosome 70S")
- Fetching the download URL for a `.map.gz` density file to use in local structure visualization (UCSF ChimeraX, PyMOL)
- Identifying which PDB atomic models have been fitted into an EMDB map (and vice versa)
- Retrieving EMDB entry metadata — resolution, reconstruction method, fitted model count, and organism — for a literature search or database survey
- Searching for cryo-EM structures of a specific organism or filtered by resolution cutoff (e.g., < 3 Å)
- Linking EMDB maps to their primary publications for citation retrieval
- Use `pdb-database` instead when you need experimentally determined atomic coordinates (X-ray, NMR, or cryo-EM deposited with coordinates); EMDB provides the raw density map, PDB provides the atom positions
- For AlphaFold AI-predicted structures use `alphafold-database-access`; EMDB is for experimental EM maps only
Prerequisites
- Python packages: `requests`, `pandas`, `matplotlib`
- Data requirements: EMDB entry IDs (format: `EMD-XXXX`), keyword search strings, or PDB IDs for cross-referencing
- Environment: internet connection; no API key required
- Rate limits: no official published limits; add `time.sleep(0.2)` between requests in batch loops for polite access
pip install requests pandas matplotlib
Quick Start
```python
import requests

EMDB_API = "https://www.ebi.ac.uk/emdb/api"

# Search for cryo-EM maps of the SARS-CoV-2 spike protein
response = requests.get(f"{EMDB_API}/search/",
                        params={"q": "spike protein SARS-CoV-2"},
                        timeout=30)
response.raise_for_status()
results = response.json()

hits = results.get("results", [])
print(f"Total hits: {results.get('numFound', 0)}")
for entry in hits[:5]:
    emdb_id = entry.get("emdbId", "")
    title = entry.get("title", "")
    resol = entry.get("resolution", "?")
    print(f"  {emdb_id}: {title[:60]} ({resol} Å)")
# EMD-30210: SARS-CoV-2 spike protein in the prefusion conf... (3.46 Å)
# EMD-22221: SARS-CoV-2 spike protein glycoprotein structure... (2.8 Å)
```
Core API
Query 1: Full-Text Search
Search EMDB entries by keyword. Returns a paginated result list with basic metadata for each hit.
```python
import requests
import pandas as pd

EMDB_API = "https://www.ebi.ac.uk/emdb/api"

def emdb_search(query: str, rows: int = 20, start: int = 0) -> dict:
    """Full-text search of EMDB entries. Returns JSON response."""
    params = {"q": query, "rows": rows, "start": start}
    r = requests.get(f"{EMDB_API}/search/", params=params, timeout=30)
    r.raise_for_status()
    return r.json()

data = emdb_search("ribosome 70S bacterial", rows=10)
print(f"Total entries found: {data.get('numFound', 0)}")

rows = []
for entry in data.get("results", []):
    rows.append({
        "emdb_id": entry.get("emdbId"),
        "title": entry.get("title", "")[:80],
        "resolution": entry.get("resolution"),
        "method": entry.get("imageAcquisition", {}).get("imagingMethod", ""),
        "organism": entry.get("organism", ""),
    })

df = pd.DataFrame(rows)
print(df.to_string(index=False))
```
```python
# Search with resolution filter applied locally after the fetch
# (reuses emdb_search and pd from the previous block)
data = emdb_search("membrane protein", rows=50)

rows = []
for entry in data.get("results", []):
    resol = entry.get("resolution")
    if resol is not None and resol <= 3.0:
        rows.append({
            "emdb_id": entry.get("emdbId"),
            "title": entry.get("title", "")[:70],
            "resolution": resol,
        })

df_highres = pd.DataFrame(rows).sort_values("resolution") if rows else pd.DataFrame()
print(f"High-resolution membrane protein maps (≤3.0 Å): {len(df_highres)}")
if not df_highres.empty:
    print(df_highres.head(5).to_string(index=False))
```
Query 2: Entry Details
Retrieve full metadata for a single EMDB entry by its ID (e.g., `EMD-1234`).

```python
import requests

EMDB_API = "https://www.ebi.ac.uk/emdb/api"

def get_entry(emdb_id: str) -> dict:
    """Fetch full metadata for a single EMDB entry. emdb_id e.g. 'EMD-1234'."""
    r = requests.get(f"{EMDB_API}/entry/{emdb_id}", timeout=30)
    r.raise_for_status()
    return r.json()

entry = get_entry("EMD-30210")  # SARS-CoV-2 spike

# Navigate the nested JSON
header = entry.get("map", {}).get("header", {})
title = header.get("title", "")
deposited = header.get("depositionDate", "")
print("Entry: EMD-30210")
print(f"Title: {title}")
print(f"Deposited: {deposited}")

# Resolution
resol_block = entry.get("processing", {}).get("reconstruction", {}).get("resolutionByAuthor", "")
print(f"Resolution: {resol_block}")

# Sample organism
sample = entry.get("sample", {})
name_block = sample.get("name", "")
print(f"Sample: {name_block}")
```
Query 3: Map Download Information
Retrieve the download URL and file format for the associated `.map.gz` density file.

```python
import requests

EMDB_API = "https://www.ebi.ac.uk/emdb/api"

def get_map_info(emdb_id: str) -> dict:
    """Retrieve map file download metadata for an EMDB entry."""
    r = requests.get(f"{EMDB_API}/entry/{emdb_id}/map", timeout=30)
    r.raise_for_status()
    return r.json()

map_info = get_map_info("EMD-30210")
print("Map download info:")
# The response may be a single object or a list of objects
items = map_info if isinstance(map_info, list) else [map_info]
for item in items:
    file_url = item.get("url", "")
    file_size = item.get("size", "")
    format_ = item.get("format", "")
    print(f"  URL: {file_url}")
    print(f"  Format: {format_} | Size: {file_size}")

# Construct standard download URL manually (always available)
num = "30210"  # numeric part of EMD-30210
standard_url = f"https://ftp.ebi.ac.uk/pub/databases/emdb/structures/EMD-{num}/map/emd_{num}.map.gz"
print(f"\nFTP map URL: {standard_url}")
```
Query 4: Fitted Atomic Models (PDB Cross-Reference)
List the PDB IDs of atomic models that have been fitted into this EM map.
```python
import requests

EMDB_API = "https://www.ebi.ac.uk/emdb/api"

def get_fitted_models(emdb_id: str) -> list:
    """Return list of PDB IDs fitted to the EMDB map."""
    r = requests.get(f"{EMDB_API}/entry/{emdb_id}/fitted", timeout=30)
    r.raise_for_status()
    data = r.json()
    if isinstance(data, list):
        return data
    return data.get("fittedModels", [])

models = get_fitted_models("EMD-30210")
print(f"Fitted PDB models for EMD-30210: {len(models)}")
for m in models:
    pdb_id = m.get("pdbId") if isinstance(m, dict) else m
    print(f"  PDB: {pdb_id}")
```
```python
# Reverse lookup: given a PDB ID, find associated EMDB entries via search
import requests

EMDB_API = "https://www.ebi.ac.uk/emdb/api"

def find_emdb_for_pdb(pdb_id: str) -> list:
    """Search EMDB for entries associated with a PDB ID."""
    r = requests.get(f"{EMDB_API}/search/",
                     params={"q": pdb_id, "rows": 10},
                     timeout=30)
    r.raise_for_status()
    results = r.json().get("results", [])
    return [e.get("emdbId") for e in results if e.get("emdbId")]

pdb_id = "7BNM"  # SARS-CoV-2 spike structure
associated = find_emdb_for_pdb(pdb_id)
print(f"EMDB entries associated with PDB {pdb_id}: {associated}")
```
Query 5: Publications
Retrieve primary publications (citations) linked to an EMDB entry.
```python
import requests

EMDB_API = "https://www.ebi.ac.uk/emdb/api"

def get_publications(emdb_id: str) -> list:
    """Retrieve publications associated with an EMDB entry."""
    r = requests.get(f"{EMDB_API}/entry/{emdb_id}/publications", timeout=30)
    r.raise_for_status()
    data = r.json()
    if isinstance(data, list):
        return data
    return data.get("publications", [])

pubs = get_publications("EMD-30210")
print(f"Publications for EMD-30210: {len(pubs)}")
for pub in pubs:
    title = pub.get("title", "")
    doi = pub.get("doi", "")
    year = pub.get("year", "")
    print(f"  [{year}] {title[:70]}")
    if doi:
        print(f"    DOI: {doi}")
```
Query 6: Overall Statistics
Retrieve aggregate EMDB database statistics — total entry count, resolution distribution, method breakdown.
```python
import requests

EMDB_API = "https://www.ebi.ac.uk/emdb/api"

def get_statistics() -> dict:
    """Retrieve overall EMDB database statistics."""
    r = requests.get(f"{EMDB_API}/statistics/", timeout=30)
    r.raise_for_status()
    return r.json()

stats = get_statistics()
print("EMDB Database Statistics:")
total = stats.get("totalEntries", stats.get("total", "n/a"))
print(f"  Total entries: {total}")

# Method breakdown if available
methods = stats.get("methods", stats.get("imagingMethods", {}))
if methods:
    print("  By method:")
    # Sort by descending count; non-integer values keep a neutral sort key
    for method, count in sorted(methods.items(),
                                key=lambda x: -x[1] if isinstance(x[1], int) else 0):
        print(f"    {method}: {count}")
```
Query 7: Visualization — Resolution Distribution
Plot the resolution distribution of a set of EMDB search results.
```python
import requests
import matplotlib.pyplot as plt

EMDB_API = "https://www.ebi.ac.uk/emdb/api"

# Fetch 200 entries for visualization
r = requests.get(f"{EMDB_API}/search/", params={"q": "cryo-EM", "rows": 200}, timeout=60)
r.raise_for_status()
results = r.json().get("results", [])

# Keep only entries with a numeric resolution in a plottable range
resolutions = [
    entry["resolution"] for entry in results
    if entry.get("resolution") is not None and 1.0 <= entry["resolution"] <= 10.0
]

fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(resolutions, bins=30, color="#2c7fb8", edgecolor="white", alpha=0.85)
ax.axvline(x=3.0, color="#d62728", lw=1.5, ls="--", label="3 Å threshold")
ax.set_xlabel("Resolution (Å)")
ax.set_ylabel("Number of entries")
ax.set_title(f"EMDB Resolution Distribution (n={len(resolutions)})")
ax.legend()
plt.tight_layout()
plt.savefig("emdb_resolution_distribution.png", dpi=150, bbox_inches="tight")
print(f"Saved emdb_resolution_distribution.png ({len(resolutions)} entries)")

# Guard against an empty result set before computing the percentage
at_or_below_3 = sum(1 for res in resolutions if res <= 3.0)
if resolutions:
    share = at_or_below_3 / len(resolutions) * 100
    print(f"Entries at ≤3.0 Å: {at_or_below_3}/{len(resolutions)} ({share:.1f}%)")
```
Key Concepts
EMDB ID Format
EMDB IDs follow the pattern `EMD-XXXX` (e.g., `EMD-1234`, `EMD-30210`). The numeric part is used in FTP paths. The API accepts both `EMD-1234` and `1234` in most endpoints. FTP download paths use zero-padded 4-digit numbers for older entries.
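The ID conventions above can be captured in a small helper. This is a minimal sketch, not part of the EMDB API: `normalize_emdb_id` and `map_url` are hypothetical names, and the accepted input spellings (`emd-1234`, `EMD_1234`, bare digits) are an assumption about what callers might pass.

```python
import re

FTP_BASE = "https://ftp.ebi.ac.uk/pub/databases/emdb/structures"

def normalize_emdb_id(raw: str) -> str:
    """Return the 'EMD-NNNN' form for inputs like 'emd-1234', '1234', or 'EMD-30210'."""
    match = re.fullmatch(r"(?:EMD[-_]?)?(\d+)", raw.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Not a recognizable EMDB ID: {raw!r}")
    # Drop any existing padding, then pad back to at least 4 digits
    digits = match.group(1).lstrip("0") or "0"
    return f"EMD-{digits.zfill(4)}"

def map_url(raw: str) -> str:
    """Build the map download URL following the path pattern used in this document."""
    emdb_id = normalize_emdb_id(raw)
    num = emdb_id.split("-")[1]
    return f"{FTP_BASE}/{emdb_id}/map/emd_{num}.map.gz"

print(normalize_emdb_id("emd-30210"))
print(map_url("1234"))
```

Normalizing once at the edge of a script keeps the REST calls and the FTP paths consistent with each other.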
Map vs. Atomic Model
An EMDB entry holds the raw EM density map (`.map` or `.map.gz`, in MRC/CCP4 format): a 3D voxel grid of scattering density (strictly, electrostatic potential, though it is often loosely called electron density). The fitted atomic model (PDB entry) is a separate record with ATOM/HETATM coordinates interpreted from the map. Many maps have multiple fitted models from different groups; some maps have none (primary data without model deposition).
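To make the "3D voxel grid" concrete, the sketch below reads the grid dimensions and voxel spacing from the start of a downloaded map. It relies on the MRC2014 header layout (NX, NY, NZ, MODE at bytes 0-15; MX, MY, MZ at bytes 28-39; cell lengths in Å at bytes 40-51). The function name `read_map_grid` is hypothetical, and little-endian byte order is an assumption that holds for most but not necessarily all deposited files.

```python
import gzip
import struct

def read_map_grid(path: str) -> dict:
    """Read grid size, data mode, and voxel spacing from a gzipped MRC/CCP4 map."""
    with gzip.open(path, "rb") as fh:
        head = fh.read(52)  # only the first 52 header bytes are needed
    if len(head) < 52:
        raise ValueError("File too short to contain an MRC header")
    nx, ny, nz, mode = struct.unpack("<4i", head[0:16])    # columns, rows, sections, data type
    mx, my, mz = struct.unpack("<3i", head[28:40])         # sampling along each cell axis
    cell = struct.unpack("<3f", head[40:52])               # cell lengths in Å
    voxel = None
    if mx and my and mz:
        voxel = (cell[0] / mx, cell[1] / my, cell[2] / mz)
    return {"grid": (nx, ny, nz), "mode": mode, "voxel_size": voxel}
```

For real analysis work a dedicated reader such as the `mrcfile` package is a better fit; this sketch only shows that the map is a plain binary grid with a fixed-size header.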
Resolution and Quality
| Resolution | Typical interpretability |
|---|---|
| < 2.5 Å | Near-atomic: side-chain positions visible |
| 2.5–3.5 Å | High-res: backbone well-resolved, some side chains |
| 3.5–5.0 Å | Medium: secondary structure clear, limited side-chain detail |
| > 5.0 Å | Low-res: domain arrangement only |
Use the `resolution` field from search results to filter for structures appropriate for your analysis task.
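The bands in the table can be applied to search results with a small classifier. This is a sketch: the function name `interpretability` and the short labels are hypothetical, and assigning the shared boundaries (2.5, 3.5, 5.0 Å) to the band shown in the comments is an assumption, since the table leaves them ambiguous.

```python
from typing import Optional

def interpretability(resolution: Optional[float]) -> str:
    """Map a resolution in Å to the interpretability band from the table above."""
    if resolution is None:
        return "unreported"   # e.g. entries without a numeric resolution estimate
    if resolution < 2.5:
        return "near-atomic"  # side-chain positions visible
    if resolution <= 3.5:
        return "high"         # backbone well-resolved, some side chains
    if resolution <= 5.0:
        return "medium"       # secondary structure clear
    return "low"              # domain arrangement only

for value in (2.1, 3.46, 4.2, 8.0, None):
    print(value, interpretability(value))
```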
Common Workflows
Workflow 1: Survey All High-Resolution Entries for a Target
Goal: Find all EMDB maps for a protein target with resolution ≤ 3.5 Å, export to CSV with PDB model cross-references.
```python
import requests
import time
import pandas as pd

EMDB_API = "https://www.ebi.ac.uk/emdb/api"

def emdb_search_all(query: str, rows_per_page: int = 50) -> list:
    """Paginate through all search results."""
    all_results = []
    start = 0
    while True:
        r = requests.get(f"{EMDB_API}/search/",
                         params={"q": query, "rows": rows_per_page, "start": start},
                         timeout=30)
        r.raise_for_status()
        data = r.json()
        batch = data.get("results", [])
        all_results.extend(batch)
        if start + rows_per_page >= data.get("numFound", 0) or not batch:
            break
        start += rows_per_page
        time.sleep(0.2)
    return all_results

target = "ACE2"
print(f"Searching EMDB for: {target}")
entries = emdb_search_all(target)
print(f"Total entries: {len(entries)}")

rows = []
for entry in entries:
    resol = entry.get("resolution")
    if resol is None or resol > 3.5:
        continue
    emdb_id = entry.get("emdbId", "")
    # Fetch fitted PDB models
    try:
        r2 = requests.get(f"{EMDB_API}/entry/{emdb_id}/fitted", timeout=15)
        models = r2.json() if r2.status_code == 200 else []
        pdb_ids = [m.get("pdbId") if isinstance(m, dict) else str(m)
                   for m in (models if isinstance(models, list) else [])]
        pdb_ids = [p for p in pdb_ids if p]  # drop missing IDs so join() cannot fail
    except Exception:
        pdb_ids = []
    time.sleep(0.2)

    rows.append({
        "emdb_id": emdb_id,
        "title": entry.get("title", "")[:80],
        "resolution": resol,
        "pdb_models": ";".join(pdb_ids) if pdb_ids else "",
        "organism": entry.get("organism", ""),
    })

# sort_values raises KeyError on an empty frame, so only sort when rows exist
columns = ["emdb_id", "title", "resolution", "pdb_models", "organism"]
df = pd.DataFrame(rows, columns=columns)
if not df.empty:
    df = df.sort_values("resolution")
df.to_csv(f"{target}_emdb_highres.csv", index=False)
print(f"High-res entries (≤3.5 Å): {len(df)}")
print(df[["emdb_id", "resolution", "pdb_models", "title"]].head(8).to_string(index=False))
```
Workflow 2: Batch Metadata Collection from Entry ID List
Goal: Given a list of EMDB IDs from a literature search, fetch structured metadata and build a summary table.
```python
import requests
import time
import pandas as pd

EMDB_API = "https://www.ebi.ac.uk/emdb/api"

emdb_ids = ["EMD-30210", "EMD-22221", "EMD-23970", "EMD-13731", "EMD-14127"]

records = []
for emdb_id in emdb_ids:
    try:
        r = requests.get(f"{EMDB_API}/entry/{emdb_id}", timeout=20)
        r.raise_for_status()
        entry = r.json()
        header = entry.get("map", {}).get("header", {})
        processing = entry.get("processing", {})
        recon = processing.get("reconstruction", {})
        software = recon.get("software")
        records.append({
            "emdb_id": emdb_id,
            "title": header.get("title", "")[:80],
            "deposited": header.get("depositionDate", ""),
            "resolution": recon.get("resolutionByAuthor", ""),
            "software": software.get("name", "") if isinstance(software, dict) else "",
        })
    except Exception as e:
        print(f"Warning: {emdb_id} failed: {e}")
    time.sleep(0.2)

df = pd.DataFrame(records)
print(df.to_string(index=False))
df.to_csv("emdb_batch_metadata.csv", index=False)
print(f"\nSaved emdb_batch_metadata.csv ({len(df)} entries)")
```
Workflow 3: Download a Density Map File
Goal: Download an EMDB `.map.gz` file programmatically for use in ChimeraX or PyMOL.

```python
import requests
from pathlib import Path

def download_emdb_map(emdb_id: str, output_dir: str = ".") -> str:
    """
    Download an EMDB map file (.map.gz) from the EBI FTP host over HTTPS.
    Returns the path to the downloaded file.
    """
    # Numeric part of the ID, zero-padded to at least 4 digits
    num = emdb_id.upper().replace("EMD-", "").lstrip("0") or "0"
    num_padded = num.zfill(4)
    url = (f"https://ftp.ebi.ac.uk/pub/databases/emdb/structures/"
           f"EMD-{num_padded}/map/emd_{num_padded}.map.gz")

    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the target directory if missing
    out_path = out_dir / f"emd_{num_padded}.map.gz"

    print(f"Downloading {emdb_id} map from EBI FTP...")
    print(f"  URL: {url}")
    r = requests.get(url, stream=True, timeout=120)
    r.raise_for_status()

    # Stream to disk in 1 MB chunks to avoid holding the whole map in memory
    downloaded = 0
    with open(out_path, "wb") as f:
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)
            downloaded += len(chunk)

    print(f"  Saved: {out_path} ({downloaded/1e6:.1f} MB)")
    return str(out_path)

# Example: download spike protein map
path = download_emdb_map("EMD-30210", output_dir="/tmp")
print(f"Map file: {path}")
print(f"Open in ChimeraX with: open {path}")
```
Key Parameters
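| Parameter | Function/Endpoint | Default | Range / Options | Effect |
|---|---|---|---|---|
| `q` | `/search/` | n/a | Any keyword string | Full-text search query; supports boolean and phrase matching |
| `rows` | `/search/` | `20` (in `emdb_search`) | positive integer | Number of results per page |
| `start` | `/search/` | `0` | non-negative integer | Pagination offset for large result sets |
| `emdb_id` | `/entry/{emdb_id}` | n/a | `EMD-XXXX` format | Specific entry identifier |
| `resolution` | Result field | n/a | float (Å) | Filter post-query by threshold |
| `imagingMethod` | Result field | n/a | method name string | Method filter; applies via pandas post-fetch |
| `organism` | Result field | n/a | organism name string | Organism filter; match against the string locally after fetching |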
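The `rows` and `start` parameters combine into a simple paging scheme. The sketch below generates one parameter dictionary per page; `page_params` is a hypothetical helper, and it assumes the total hit count (`numFound` in this document's examples) is already known from a first request.

```python
from typing import Dict, Iterator, Union

def page_params(query: str, num_found: int, rows: int = 50) -> Iterator[Dict[str, Union[str, int]]]:
    """Yield the query parameters for each page needed to cover num_found hits."""
    if rows <= 0:
        raise ValueError("rows must be a positive integer")
    for start in range(0, num_found, rows):
        yield {"q": query, "rows": rows, "start": start}

for params in page_params("ACE2", num_found=120, rows=50):
    print(params)
```

Each dictionary can be passed as `params=` to the search request; the last page simply returns fewer than `rows` results.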
Best Practices
- Use the FTP endpoint for large map files: The REST API provides metadata; the actual `.map.gz` files are served via the EBI FTP. Construct the FTP URL as `https://ftp.ebi.ac.uk/pub/databases/emdb/structures/EMD-XXXX/map/emd_XXXX.map.gz`.

  ```python
  num = "30210"
  url = f"https://ftp.ebi.ac.uk/pub/databases/emdb/structures/EMD-{num}/map/emd_{num}.map.gz"
  ```

- Add `time.sleep(0.2)` in batch loops: The EMDB REST API is shared infrastructure with no published rate limits. Polite delays prevent throttling.
- Filter by resolution post-query: The `/search/` endpoint does not support server-side numeric range filtering. Fetch a larger `rows` value and filter the `resolution` field with pandas locally.
- Cross-reference via both directions: An EMDB entry can have 0–10+ fitted PDB models. Always check `/entry/{emdb_id}/fitted` for the definitive PDB list; keyword search alone may miss older depositions.
- Check for `None` resolution: Some cryo-ET and subtomogram averages lack a numeric resolution estimate. Guard with `if entry.get("resolution") is not None` before numeric comparisons.
Common Recipes
Recipe: Get All PDB Models Fitted to a Map
When to use: You have an EMDB ID and want to load all associated atomic coordinates.
```python
import requests

def get_pdb_ids_for_emdb(emdb_id: str) -> list:
    """Return list of PDB IDs fitted into the given EMDB map."""
    r = requests.get(f"https://www.ebi.ac.uk/emdb/api/entry/{emdb_id}/fitted", timeout=15)
    if r.status_code != 200:
        return []
    data = r.json()
    models = data if isinstance(data, list) else data.get("fittedModels", [])
    return [m.get("pdbId") if isinstance(m, dict) else str(m) for m in models]

pdb_ids = get_pdb_ids_for_emdb("EMD-30210")
print(f"PDB models for EMD-30210: {pdb_ids}")
# PDB models for EMD-30210: ['7BNM', '7BNN']
```
Recipe: Batch Resolution Summary for a Gene List
When to use: Survey EMDB coverage and resolution for a list of protein targets.
```python
import requests
import time
import pandas as pd

EMDB_API = "https://www.ebi.ac.uk/emdb/api"
targets = ["KRAS", "EGFR", "ACE2", "p53", "mTOR"]

rows = []
for target in targets:
    r = requests.get(f"{EMDB_API}/search/", params={"q": target, "rows": 100}, timeout=30)
    r.raise_for_status()
    entries = r.json().get("results", [])
    resolutions = [e["resolution"] for e in entries if e.get("resolution") is not None]
    rows.append({
        "target": target,
        "n_entries": len(entries),
        "best_resol": min(resolutions) if resolutions else None,
        "mean_resol": round(sum(resolutions)/len(resolutions), 2) if resolutions else None,
    })
    time.sleep(0.3)

df = pd.DataFrame(rows)
print(df.to_string(index=False))
# target  n_entries  best_resol  mean_resol
#   KRAS         12        2.19        3.84
#   EGFR         31        2.60        4.12
#   ACE2         45        2.05        3.27
```
Recipe: Find All Entries Below a Resolution Cutoff
When to use: Build a benchmark set of high-resolution cryo-EM structures for a specific system.
```python
import requests
import pandas as pd

EMDB_API = "https://www.ebi.ac.uk/emdb/api"

def find_highres_entries(query: str, resolution_cutoff: float = 3.0,
                         max_results: int = 200) -> pd.DataFrame:
    r = requests.get(f"{EMDB_API}/search/",
                     params={"q": query, "rows": max_results},
                     timeout=60)
    r.raise_for_status()
    entries = r.json().get("results", [])
    rows = []
    for e in entries:
        resol = e.get("resolution")
        if resol is not None and resol <= resolution_cutoff:
            rows.append({
                "emdb_id": e.get("emdbId"),
                "resolution": resol,
                "title": e.get("title", "")[:70],
                "organism": e.get("organism", ""),
            })
    return pd.DataFrame(rows).sort_values("resolution") if rows else pd.DataFrame()

df = find_highres_entries("ion channel", resolution_cutoff=3.0)
print(f"Ion channel maps at ≤3.0 Å: {len(df)}")
print(df.head(5).to_string(index=False))
df.to_csv("ion_channel_highres_emdb.csv", index=False)
```
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| HTTP `404` for an entry request | Entry ID not found or wrong format | Verify format is `EMD-XXXX` with correct numeric suffix; confirm entry exists on https://www.ebi.ac.uk/emdb/ |
| Empty `results` list | Query too specific or misspelled | Broaden the search term; try the gene name alone without qualifiers |
| `resolution` field is `None` | Cryo-ET or subtomogram averaging entries without reported resolution | Skip with `if resol is not None`; these are valid entries |
| FTP download returns `404` | Wrong numeric padding in FTP path | Zero-pad the numeric part to at least 4 digits in both the directory and file name, as `download_emdb_map` does; verify the path at https://ftp.ebi.ac.uk/pub/databases/emdb/structures/ |
| Fitted models list is empty | Map has no associated deposited PDB model | Some authors deposit maps without atomic models; cross-reference by keyword search with the EMDB title |
| Slow search for common terms | Large result sets | Limit `rows` to a manageable number (50–200) and filter post-fetch; avoid open-ended queries |
| `Timeout` / `ConnectionError` | Network issue or server overload | Retry with exponential backoff; increase `timeout` to 60s for large requests |
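The "retry with exponential backoff" advice in the last row can be sketched as a small wrapper. `with_backoff` is a hypothetical helper, not part of `requests` or the EMDB API; the retry count and delays are illustrative assumptions. It is written against a generic callable so the retry logic stays independent of the HTTP library.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def with_backoff(fetch: Callable[[], T],
                 retries: int = 4,
                 base_delay: float = 0.5,
                 retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(), retrying on the given exceptions with delays of base_delay * 2**attempt."""
    for attempt in range(retries):
        try:
            return fetch()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("retries must be at least 1")
```

With `requests`, pass `retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout)`, since those classes do not derive from the built-in `ConnectionError` and `TimeoutError`, and wrap the call as `with_backoff(lambda: requests.get(url, timeout=60))`.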
Related Skills
- `pdb-database`: RCSB PDB REST API for experimental atomic coordinates; complement to EMDB maps
- `alphafold-database-access`: AlphaFold predicted structures (200M+ proteins), no EM map
- `pubmed-database`: Retrieve publications by DOI or PMID obtained from the EMDB publications endpoint
- `mdanalysis-trajectory`: Analyze MD trajectories of structures initially determined by cryo-EM
References
- EMDB website: Browse entries, access documentation, and download maps via web interface
- EMDB REST API documentation: Endpoint reference and JSON schema for all API routes
- Lawson et al., Nucleic Acids Res. 2016: EMDB database description and content overview
- EMDB FTP archive: Direct download of `.map.gz` density files and XML metadata