Medical-research-skills uniprot-database
Direct REST API access to UniProt for protein search, entry retrieval, and identifier mapping; use when you need programmatic UniProtKB queries or cross-database ID conversion.
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Evidence Insight/uniprot-database" ~/.claude/skills/aipoch-medical-research-skills-uniprot-database && rm -rf "$T"
manifest:
scientific-skills/Evidence Insight/uniprot-database/SKILL.mdsource content
When to Use
- You need to search UniProtKB with Lucene-style queries (e.g., by gene name, organism, reviewed status).
- You want to fetch the full details of a specific protein entry by UniProt accession (e.g.,
).P12345 - You need to map identifiers between databases (e.g., gene names, Ensembl IDs, RefSeq IDs ↔ UniProt accessions).
- You are building pipelines that require automated protein annotation retrieval in JSON/TSV/FASTA formats.
- You need a lightweight client that talks directly to UniProt’s REST API without additional SDKs.
Key Features
- Protein search via UniProtKB REST endpoint using Lucene query syntax.
- Entry retrieval by accession with selectable output formats.
- Identifier mapping between supported source/target databases using UniProt ID mapping service.
- Format control (default
) for consistent downstream parsing.json - Reference docs for query syntax and available API fields:
references/query_syntax.mdreferences/api_fields.md
Dependencies
- Python
>=3.8 requests >=2.31.0
Example Usage
import time import requests BASE = "https://rest.uniprot.org" def search_protein(query: str, fmt: str = "json", size: int = 5): """ Search UniProtKB using Lucene-style query syntax. """ url = f"{BASE}/uniprotkb/search" params = {"query": query, "format": fmt, "size": size} r = requests.get(url, params=params, timeout=30) r.raise_for_status() return r.json() if fmt == "json" else r.text def retrieve_entry(accession: str, fmt: str = "json"): """ Retrieve a UniProtKB entry by accession. """ url = f"{BASE}/uniprotkb/{accession}" params = {"format": fmt} r = requests.get(url, params=params, timeout=30) r.raise_for_status() return r.json() if fmt == "json" else r.text def id_mapping(from_db: str, to_db: str, ids, poll_interval_s: float = 1.0): """ Map identifiers using UniProt ID Mapping. ids can be a list of strings or a comma-separated string. """ if isinstance(ids, (list, tuple)): ids = ",".join(ids) # 1) Submit mapping job submit_url = f"{BASE}/idmapping/run" r = requests.post( submit_url, data={"from": from_db, "to": to_db, "ids": ids}, timeout=30, ) r.raise_for_status() job_id = r.json()["jobId"] # 2) Poll job status status_url = f"{BASE}/idmapping/status/{job_id}" while True: s = requests.get(status_url, timeout=30) s.raise_for_status() payload = s.json() if payload.get("jobStatus") in (None, "FINISHED"): break if payload.get("jobStatus") == "FAILED": raise RuntimeError(f"ID mapping failed: {payload}") time.sleep(poll_interval_s) # 3) Fetch results (JSON) results_url = f"{BASE}/idmapping/results/{job_id}" res = requests.get(results_url, params={"format": "json"}, timeout=30) res.raise_for_status() return res.json() if __name__ == "__main__": # Search example: human BRCA1 search = search_protein("gene:BRCA1 AND organism_id:9606", size=3) print("Search results (first accessions):", [item["primaryAccession"] for item in search.get("results", [])]) # Retrieve entry example entry = retrieve_entry("P38398") # UniProt accession for human BRCA1 (example) print("Entry primaryAccession:", entry.get("primaryAccession")) print("Protein name:", entry.get("proteinDescription", {}).get("recommendedName", {}).get("fullName", {}).get("value")) # ID mapping example: gene name -> UniProtKB mapping = id_mapping(from_db="Gene_Name", to_db="UniProtKB", ids=["BRCA1"]) print("Mapping results keys:", mapping.keys())
Implementation Details
-
Search Protein
- Uses
GET /uniprotkb/search - Key parameters:
: Lucene-style query string (seequery
)references/query_syntax.md
: output format (defaultformat
)json- Optional common parameters:
,size
,fieldssort
- Returns parsed JSON when
, otherwise raw text.format=json
- Uses
-
Retrieve Entry
- Uses
GET /uniprotkb/{accession} - Key parameters:
: UniProt accession (e.g.,accession
)P12345
: output format (defaultformat
)json
- Suitable for fetching full record details for a known accession.
- Uses
-
ID Mapping
- Uses UniProt asynchronous mapping workflow:
withPOST /idmapping/run
,from
,toids- Poll
until finishedGET /idmapping/status/{jobId} - Fetch
GET /idmapping/results/{jobId}?format=json
accepts either a list or a comma-separated string.ids- Recommended parameters:
: controls polling frequency to avoid excessive requests.poll_interval_s
/from_db
must match UniProt-supported database identifiers (consult UniProt mapping documentation as needed).to_db
- Uses UniProt asynchronous mapping workflow: