BioSkills bio-uniprot-access
Access UniProt protein database for sequences, annotations, and functional information. Use when retrieving protein data, GO terms, domain annotations, or protein-protein interactions.
git clone https://github.com/GPTomics/bioSkills
T=$(mktemp -d) && git clone --depth=1 https://github.com/GPTomics/bioSkills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/database-access/uniprot-access" ~/.claude/skills/gptomics-bioskills-bio-uniprot-access && rm -rf "$T"
database-access/uniprot-access/SKILL.md
Version Compatibility
Reference examples tested with: BioPython 1.83+, pandas 2.2+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
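The version and signature checks described above can also be scripted. This is a minimal sketch using only the standard library; `installed_version` and `describe_callable` are hypothetical helper names, not part of any package named in this skill.

```python
import inspect
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe_callable(func):
    """Return the name and call signature of `func` as one string."""
    return f"{func.__name__}{inspect.signature(func)}"


# Report the packages this skill relies on (distribution names, not import names)
for name in ("biopython", "pandas", "requests"):
    print(name, installed_version(name) or "not installed")
```

If a reported version is older than the one listed above, `describe_callable` gives a quick way to compare the installed signature with the one an example assumes.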
UniProt Access
Query UniProt for protein sequences, functional annotations, and cross-references.
"Get protein information from UniProt" → Fetch sequences, GO terms, domains, and cross-references for a protein accession.
- Python: `requests.get(f'https://rest.uniprot.org/uniprotkb/{acc}.json')` (UniProt REST API)
- Python: `ExPASy.get_sprot_raw(acc)` + `SwissProt.read()` (BioPython)
UniProt REST API
Fetch Single Entry
    import requests

    def fetch_uniprot(accession, format='fasta'):
        '''Fetch UniProt entry. Formats: fasta, json, txt, xml, gff'''
        url = f'https://rest.uniprot.org/uniprotkb/{accession}.{format}'
        response = requests.get(url)
        response.raise_for_status()
        return response.text

    # Use the primary accession (P04637); entry names such as P53_HUMAN can change between releases
    sequence = fetch_uniprot('P04637', 'fasta')
    entry_json = fetch_uniprot('P04637', 'json')
Search UniProt
Goal: Find UniProt protein entries matching gene name, organism, or functional criteria.
Approach: Query the UniProt REST search endpoint with structured query syntax and parse the JSON results for accessions and protein descriptions.
    def search_uniprot(query, format='json', size=25):
        '''Search UniProt with query syntax'''
        url = 'https://rest.uniprot.org/uniprotkb/search'
        params = {'query': query, 'format': format, 'size': size}
        response = requests.get(url, params=params)
        response.raise_for_status()
        return response.json() if format == 'json' else response.text

    results = search_uniprot('gene:BRCA1 AND organism_id:9606')
    for entry in results['results']:
        description = entry['proteinDescription']
        # Unreviewed (TrEMBL) entries carry submissionNames instead of recommendedName
        names = description.get('recommendedName') or (description.get('submissionNames') or [{}])[0]
        print(entry['primaryAccession'], names.get('fullName', {}).get('value', 'n/a'))
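The search endpoint returns one page at a time and advertises the following page in the `Link` response header, which `requests` exposes as `response.links`. The sketch below follows those links until none remain. `iter_search_results` and its `session` parameter are hypothetical names introduced for illustration; the page size of 500 is assumed to be acceptable to the server.

```python
def iter_search_results(query, size=500, session=None):
    """Yield every entry matching `query`, following cursor pagination."""
    if session is None:
        import requests  # imported lazily so a stub session can be passed in
        session = requests.Session()

    url = 'https://rest.uniprot.org/uniprotkb/search'
    params = {'query': query, 'format': 'json', 'size': size}
    while url:
        response = session.get(url, params=params)
        response.raise_for_status()
        yield from response.json()['results']
        # The next-page URL already carries the query, size, and cursor,
        # so no extra parameters are sent on follow-up requests.
        url = response.links.get('next', {}).get('url')
        params = None
```

Because this is a generator, callers can stop early without fetching the remaining pages, for example `next(iter_search_results('gene:BRCA1 AND organism_id:9606'))`.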
Query Syntax
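| Query | Description |
|---|---|
| `gene:BRCA1` | Gene name |
| `organism_id:9606` | Human (NCBI taxonomy) |
| `reviewed:true` | Swiss-Prot only |
| `length:[MIN TO MAX]` | Sequence length range |
| `go:0006915` | GO term (apoptosis) |
| `keyword:kinase` | Keyword |
| `ec:<EC number>` | Enzyme classification |
| `database:pdb` | Has PDB structure |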
Combine Queries
    # Human kinases with structures
    query = 'organism_id:9606 AND keyword:kinase AND database:pdb AND reviewed:true'
    results = search_uniprot(query, size=100)
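When clauses are assembled programmatically, a small helper keeps the boolean grouping explicit. This is a sketch only: `build_query` is a hypothetical name, and wrapping compound clauses in parentheses is a design choice made here to preserve their grouping.

```python
def build_query(*clauses, operator='AND'):
    """Join non-empty query clauses with a boolean operator.

    Clauses that already contain AND/OR are wrapped in parentheses
    so their internal grouping survives the join.
    """
    parts = []
    for clause in clauses:
        clause = (clause or '').strip()
        if not clause:
            continue
        if ' OR ' in clause or ' AND ' in clause:
            clause = f'({clause})'
        parts.append(clause)
    return f' {operator} '.join(parts)


query = build_query('organism_id:9606', 'keyword:kinase', 'database:pdb', 'reviewed:true')
print(query)
```

Empty or `None` clauses are skipped, which makes optional filters easy to pass through.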
Batch Retrieval
Multiple Accessions
    def batch_fetch(accessions, format='fasta'):
        '''Fetch multiple entries'''
        url = 'https://rest.uniprot.org/uniprotkb/accessions'
        params = {'accessions': ','.join(accessions), 'format': format}
        response = requests.get(url, params=params)
        response.raise_for_status()
        return response.text

    # This endpoint expects accessions, not entry names such as P53_HUMAN
    accessions = ['P04637', 'P38398', 'Q9Y6K9']
    sequences = batch_fetch(accessions)
Stream Large Results
    def search_all(query, format='tsv', fields=None):
        '''Download all results for large queries via the stream endpoint'''
        url = 'https://rest.uniprot.org/uniprotkb/stream'
        params = {'query': query, 'format': format}
        if fields:
            params['fields'] = ','.join(fields)
        # response.text loads the whole body into memory, so stream=True would add nothing here
        response = requests.get(url, params=params)
        response.raise_for_status()
        return response.text

    # Get all reviewed human proteins as TSV
    all_human = search_all('organism_id:9606 AND reviewed:true',
                           fields=['accession', 'gene_names', 'protein_name'])
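For result sets too large to hold in memory, the response body can be written to disk in chunks instead. A minimal sketch, assuming the same stream endpoint; `stream_to_file`, its `session` parameter, and the 64 KiB chunk size are illustrative choices rather than anything prescribed by UniProt.

```python
def stream_to_file(query, path, fields=None, fmt='tsv', session=None):
    """Write all results for `query` to `path` chunk by chunk; return bytes written."""
    if session is None:
        import requests  # imported lazily so a stub session can be passed in
        session = requests

    url = 'https://rest.uniprot.org/uniprotkb/stream'
    params = {'query': query, 'format': fmt}
    if fields:
        params['fields'] = ','.join(fields)

    written = 0
    # stream=True defers the download until the body is iterated
    with session.get(url, params=params, stream=True) as response:
        response.raise_for_status()
        with open(path, 'wb') as handle:
            for chunk in response.iter_content(chunk_size=1 << 16):
                handle.write(chunk)
                written += len(chunk)
    return written
```

A call such as `stream_to_file('organism_id:9606 AND reviewed:true', 'human.tsv', fields=['accession', 'length'])` would then leave the table on disk for `pandas.read_csv` to read lazily.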
ID Mapping
Map Between Databases
Goal: Convert identifiers between databases (e.g., Ensembl gene IDs to UniProt accessions) in batch.
Approach: Submit an asynchronous ID mapping job to the UniProt API, poll for completion, then retrieve the mapped results.
    import time

    def map_ids(ids, from_db, to_db, poll_interval=1):
        '''Map IDs between databases'''
        url = 'https://rest.uniprot.org/idmapping/run'
        response = requests.post(url, data={'ids': ','.join(ids), 'from': from_db, 'to': to_db})
        response.raise_for_status()
        job_id = response.json()['jobId']

        # Poll until the job reports results or failed IDs
        while True:
            status = requests.get(f'https://rest.uniprot.org/idmapping/status/{job_id}')
            status.raise_for_status()
            payload = status.json()
            if 'results' in payload or 'failedIds' in payload:
                break
            time.sleep(poll_interval)

        results = requests.get(f'https://rest.uniprot.org/idmapping/results/{job_id}')
        results.raise_for_status()
        return results.json()

    # Ensembl gene IDs to UniProt
    mapping = map_ids(['ENSG00000141510', 'ENSG00000171862'], 'Ensembl', 'UniProtKB')
    for result in mapping.get('results', []):
        target = result['to']
        # 'to' may be a plain accession string or a full entry object, depending on the results endpoint
        accession = target['primaryAccession'] if isinstance(target, dict) else target
        print(result['from'], '->', accession)
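A polling loop with no upper bound can hang if a job never finishes. The sketch below separates the waiting logic from the HTTP call and adds a timeout with exponential backoff. `wait_for_job` and its `check_status` argument are hypothetical: `check_status` stands for any zero-argument callable that returns the decoded status JSON. The completion test mirrors the one used above.

```python
import time


def wait_for_job(check_status, timeout=300.0, interval=1.0,
                 max_interval=10.0, sleep=time.sleep):
    """Poll `check_status()` until the payload is finished or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    delay = interval
    while True:
        payload = check_status()
        if 'results' in payload or 'failedIds' in payload:
            return payload
        # Give up if the next wait would run past the deadline
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f'job not finished after {timeout} seconds')
        sleep(delay)
        delay = min(delay * 2, max_interval)  # back off: 1s, 2s, 4s, ... capped
```

With `requests`, the callable could be `lambda: requests.get(status_url).json()`, where `status_url` is the job's status URL.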
Common Database Codes
| Code | Database |
|---|---|
| `UniProtKB` | UniProt accessions |
| `UniProtKB_AC-ID` | UniProt AC or ID |
| `Ensembl` | Ensembl gene ID |
| `RefSeq_Protein` | RefSeq protein |
| `PDB` | PDB ID |
| `GeneID` | NCBI Gene ID |
| `Gene_Name` | Gene symbols |
Extract Specific Data
Parse JSON Entry
Goal: Extract structured annotations (GO terms, domains, PDB structures) from a UniProt JSON entry.
Approach: Fetch the entry in JSON format and navigate the nested structure to pull accession, gene name, sequence, and cross-reference lists filtered by database type.
    import json

    entry = json.loads(fetch_uniprot('P04637', 'json'))

    accession = entry['primaryAccession']
    gene_name = entry['genes'][0]['geneName']['value']
    protein_name = entry['proteinDescription']['recommendedName']['fullName']['value']
    sequence = entry['sequence']['value']
    length = entry['sequence']['length']

    # GO terms
    go_terms = [ref for ref in entry.get('uniProtKBCrossReferences', []) if ref['database'] == 'GO']

    # Domains (InterPro)
    domains = [ref for ref in entry.get('uniProtKBCrossReferences', []) if ref['database'] == 'InterPro']

    # PDB structures
    pdb_refs = [ref for ref in entry.get('uniProtKBCrossReferences', []) if ref['database'] == 'PDB']
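Each GO cross-reference stores its term and evidence in a `properties` list of key/value pairs, with the aspect encoded as a one-letter prefix (`P`, `F`, or `C`) on the `GoTerm` value. The sketch below flattens those into plain dictionaries. `go_annotations` is a hypothetical helper, and `sample_entry` is a hand-written fragment for illustration, not data retrieved from UniProt.

```python
ASPECTS = {
    'P': 'biological_process',
    'F': 'molecular_function',
    'C': 'cellular_component',
}


def go_annotations(entry):
    """Return one dict per GO cross-reference with id, aspect, term, and evidence."""
    rows = []
    for ref in entry.get('uniProtKBCrossReferences', []):
        if ref.get('database') != 'GO':
            continue
        props = {p['key']: p['value'] for p in ref.get('properties', [])}
        # GoTerm looks like 'C:nucleus': aspect code, colon, term name
        code, _, term = props.get('GoTerm', '').partition(':')
        rows.append({
            'id': ref['id'],
            'aspect': ASPECTS.get(code, code),
            'term': term,
            'evidence': props.get('GoEvidenceType'),
        })
    return rows


# Hand-written illustrative fragment (assumed shape, not real retrieved data)
sample_entry = {
    'uniProtKBCrossReferences': [
        {'database': 'GO', 'id': 'GO:0005634',
         'properties': [{'key': 'GoTerm', 'value': 'C:nucleus'},
                        {'key': 'GoEvidenceType', 'value': 'IDA:UniProtKB'}]},
        {'database': 'PDB', 'id': '1TUP', 'properties': []},
    ]
}
print(go_annotations(sample_entry))
```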
Get Specific Fields (TSV)
    from io import StringIO

    import pandas as pd

    def get_fields(query, fields):
        '''Get specific fields as DataFrame'''
        url = 'https://rest.uniprot.org/uniprotkb/search'
        # Returns the first page only; 500 is the largest page size the search endpoint accepts
        params = {'query': query, 'format': 'tsv', 'fields': ','.join(fields), 'size': 500}
        response = requests.get(url, params=params)
        response.raise_for_status()
        return pd.read_csv(StringIO(response.text), sep='\t')

    df = get_fields('organism_id:9606 AND keyword:kinase AND reviewed:true',
                    ['accession', 'gene_names', 'protein_name', 'length', 'go_p'])
Available Fields
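| Field | Description |
|---|---|
| `accession` | UniProt accession |
| `gene_names` | Gene names |
| `protein_name` | Protein name |
| `organism_name` | Species |
| `length` | Sequence length |
| `mass` | Molecular mass |
| `go_p` | GO biological process |
| `go_c` | GO cellular component |
| `go_f` | GO molecular function |
| `xref_pdb` | PDB cross-references |
| `ft_domain` | Domain features |
| `ft_binding` | Binding sites |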
Biopython Integration
    from Bio import SeqIO
    from io import StringIO

    fasta_text = fetch_uniprot('P04637', 'fasta')
    record = SeqIO.read(StringIO(fasta_text), 'fasta')
    print(record.id, len(record.seq))
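For UniProt FASTA files, `record.id` holds the pipe-delimited header token, for example `sp|P04637|P53_HUMAN`, rather than the bare accession. A small sketch for splitting it, assuming the three-part `db|accession|entry_name` layout; `split_uniprot_id` is a hypothetical helper name.

```python
def split_uniprot_id(record_id):
    """Split a 'db|accession|entry_name' FASTA identifier into its parts."""
    parts = record_id.split('|')
    if len(parts) != 3:
        raise ValueError(f'unexpected UniProt FASTA id: {record_id!r}')
    database, accession, entry_name = parts
    return {'database': database, 'accession': accession, 'entry_name': entry_name}


print(split_uniprot_id('sp|P04637|P53_HUMAN')['accession'])
```

The `database` part distinguishes reviewed Swiss-Prot records (`sp`) from unreviewed TrEMBL records (`tr`).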
Related Skills
- database-access/entrez-fetch - NCBI protein access
- database-access/blast-searches - BLAST against UniProt
- structural-biology/structure-io - Download PDB structures
- structural-biology/alphafold-predictions - AlphaFold structures