Claude-skill-registry bio-entrez-link

Find cross-references between NCBI databases using Biopython Bio.Entrez. Use when navigating from genes to proteins, sequences to publications, finding related records, or discovering database relationships.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/entrez-link" ~/.claude/skills/majiayu000-claude-skill-registry-bio-entrez-link && rm -rf "$T"
manifest: skills/data/entrez-link/SKILL.md

Entrez Link

Navigate between NCBI databases using Biopython's Entrez module (ELink utility).

Required Setup

from Bio import Entrez

Entrez.email = 'your.email@example.com'  # Required by NCBI
Entrez.api_key = 'your_api_key'          # Optional; raises the limit from 3 to 10 requests/second

Core Function

Entrez.elink() - Cross-Database Links

Find related records in the same or different databases.

# Find proteins linked to a gene
handle = Entrez.elink(dbfrom='gene', db='protein', id='672')
record = Entrez.read(handle)
handle.close()

# Extract linked IDs
linkset = record[0]
if linkset['LinkSetDb']:
    links = linkset['LinkSetDb'][0]['Link']
    protein_ids = [link['Id'] for link in links]
    print(f"Found {len(protein_ids)} linked proteins")

Key Parameters:

| Parameter | Description | Example |
| --- | --- | --- |
| dbfrom | Source database | 'gene' |
| db | Target database | 'protein' |
| id | Source record ID(s) | '672' or '672,675' |
| linkname | Specific link type | 'gene_protein_refseq' |
| cmd | Link command | 'neighbor', 'neighbor_score' |
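
With cmd='neighbor_score', each entry in Link carries a Score next to its Id. The following is a minimal sketch of ranking links by that score, assuming the score parses as an integer. top_scored is a hypothetical helper, and it works on an already-parsed record so it can be tried without a network call:

```python
def top_scored(record, n=5):
    """Return up to n (id, score) pairs from the first linkset, highest score first."""
    pairs = []
    for linksetdb in record[0].get('LinkSetDb', []):
        for link in linksetdb.get('Link', []):
            if 'Score' in link:  # only present when cmd='neighbor_score'
                pairs.append((str(link['Id']), int(link['Score'])))
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:n]


# Hand-written stand-in for the parsed result of
# Entrez.elink(dbfrom='pubmed', db='pubmed', id=..., cmd='neighbor_score').
# The IDs and scores are placeholders, not real NCBI data.
scored_record = [{
    'DbFrom': 'pubmed',
    'LinkSetDb': [{
        'DbTo': 'pubmed',
        'LinkName': 'pubmed_pubmed',
        'Link': [{'Id': '1', 'Score': '40'},
                 {'Id': '2', 'Score': '95'},
                 {'Id': '3', 'Score': '60'}],
    }],
}]

print(top_scored(scored_record, n=2))
```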

ELink Result Structure

record[0]                          # First linkset
record[0]['DbFrom']                # Source database
record[0]['IdList']                # Input IDs
record[0]['LinkSetDb']             # List of link results
record[0]['LinkSetDb'][0]['DbTo']  # Target database
record[0]['LinkSetDb'][0]['LinkName']  # Link name
record[0]['LinkSetDb'][0]['Link']  # List of linked records
record[0]['LinkSetDb'][0]['Link'][0]['Id']  # Linked ID
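
When no linkname is given, one response can carry several LinkSetDb entries (for example both gene_protein and gene_protein_refseq), so indexing [0] simply takes whichever comes first. A minimal sketch of grouping linked IDs by link name instead; group_links_by_name is a hypothetical helper that takes the parsed record, and the sample record is hand-written:

```python
def group_links_by_name(record):
    """Collect linked IDs from every linkset, keyed by link name."""
    grouped = {}
    for linkset in record:
        for linksetdb in linkset.get('LinkSetDb', []):
            ids = [str(link['Id']) for link in linksetdb.get('Link', [])]
            grouped.setdefault(str(linksetdb['LinkName']), []).extend(ids)
    return grouped


# Hand-written stand-in for Entrez.read() output; the linked IDs are placeholders.
example_record = [{
    'DbFrom': 'gene',
    'IdList': ['672'],
    'LinkSetDb': [
        {'DbTo': 'protein', 'LinkName': 'gene_protein',
         'Link': [{'Id': '111'}, {'Id': '222'}]},
        {'DbTo': 'protein', 'LinkName': 'gene_protein_refseq',
         'Link': [{'Id': '222'}]},
    ],
}]

grouped = group_links_by_name(example_record)
for name, ids in grouped.items():
    print(f"{name}: {len(ids)} linked IDs")
```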

Common Link Paths

Gene to Other Databases

| From | To | Link Name | Description |
| --- | --- | --- | --- |
| gene | protein | gene_protein | All proteins |
| gene | protein | gene_protein_refseq | RefSeq proteins only |
| gene | nucleotide | gene_nuccore | Nucleotide sequences |
| gene | nucleotide | gene_nuccore_refseqrna | RefSeq mRNA |
| gene | pubmed | gene_pubmed | Related publications |
| gene | homologene | gene_homologene | Homologs |
| gene | snp | gene_snp | SNPs in gene |
| gene | clinvar | gene_clinvar | Clinical variants |

Nucleotide to Other Databases

| From | To | Link Name | Description |
| --- | --- | --- | --- |
| nucleotide | protein | nuccore_protein | Encoded proteins |
| nucleotide | gene | nuccore_gene | Gene records |
| nucleotide | pubmed | nuccore_pubmed | Publications |
| nucleotide | taxonomy | nuccore_taxonomy | Organism taxonomy |
| nucleotide | biosample | nuccore_biosample | Sample info |
| nucleotide | sra | nuccore_sra | Related SRA data |

Protein to Other Databases

| From | To | Link Name | Description |
| --- | --- | --- | --- |
| protein | nucleotide | protein_nuccore | Coding sequences |
| protein | gene | protein_gene | Gene records |
| protein | pubmed | protein_pubmed | Publications |
| protein | structure | protein_structure | 3D structures |
| protein | cdd | protein_cdd | Conserved domains |

PubMed Links

| From | To | Link Name | Description |
| --- | --- | --- | --- |
| pubmed | pubmed | pubmed_pubmed | Related articles |
| pubmed | gene | pubmed_gene | Mentioned genes |
| pubmed | protein | pubmed_protein | Mentioned proteins |
| pubmed | nucleotide | pubmed_nuccore | Mentioned sequences |

Code Patterns

Gene to Protein

from Bio import Entrez

Entrez.email = 'your.email@example.com'

def get_proteins_for_gene(gene_id):
    handle = Entrez.elink(dbfrom='gene', db='protein', id=gene_id, linkname='gene_protein_refseq')
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    return [link['Id'] for link in record[0]['LinkSetDb'][0]['Link']]

protein_ids = get_proteins_for_gene('672')  # BRCA1
print(f"RefSeq proteins: {protein_ids[:5]}")

Nucleotide to Gene

def get_gene_for_nucleotide(nuc_id):
    handle = Entrez.elink(dbfrom='nucleotide', db='gene', id=nuc_id)
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return None
    return record[0]['LinkSetDb'][0]['Link'][0]['Id']

gene_id = get_gene_for_nucleotide('NM_007294')
print(f"Gene ID: {gene_id}")

Find Related PubMed Articles

def get_related_articles(pmid, max_results=10):
    handle = Entrez.elink(dbfrom='pubmed', db='pubmed', id=pmid, linkname='pubmed_pubmed')
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    links = record[0]['LinkSetDb'][0]['Link']
    return [link['Id'] for link in links[:max_results]]

related = get_related_articles('35412348')
print(f"Related articles: {related}")

Get All Available Links

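def discover_links(db, record_id):
    handle = Entrez.elink(dbfrom=db, id=record_id, cmd='acheck')
    record = Entrez.read(handle)
    handle.close()

    # cmd='acheck' reports links under IdCheckList -> IdLinkSet -> LinkInfo,
    # not under LinkSetDb
    links = {}
    for id_linkset in record[0].get('IdCheckList', {}).get('IdLinkSet', []):
        for info in id_linkset.get('LinkInfo', []):
            links[info['LinkName']] = info['DbTo']
    return links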

available = discover_links('gene', '672')
for name, target in available.items():
    print(f"{name} -> {target}")

Navigate Gene -> Protein -> Structure

def gene_to_structures(gene_id):
    # Gene to protein
    handle = Entrez.elink(dbfrom='gene', db='protein', id=gene_id, linkname='gene_protein_refseq')
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    protein_ids = [link['Id'] for link in record[0]['LinkSetDb'][0]['Link'][:5]]

    # Protein to structure
    handle = Entrez.elink(dbfrom='protein', db='structure', id=','.join(protein_ids))
    record = Entrez.read(handle)
    handle.close()

    structure_ids = []
    for linkset in record:
        if linkset['LinkSetDb']:
            structure_ids.extend([link['Id'] for link in linkset['LinkSetDb'][0]['Link']])
    return structure_ids

structures = gene_to_structures('672')
print(f"Structure IDs: {structures[:5]}")

Link Multiple IDs at Once

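def batch_link(dbfrom, db, ids):
    # Keep the IDs as a list: Biopython then sends one id= parameter per ID
    # and NCBI returns one linkset per input ID. A comma-joined string would
    # instead return a single linkset with all links merged.
    if isinstance(ids, str):
        ids = ids.split(',')

    handle = Entrez.elink(dbfrom=dbfrom, db=db, id=ids)
    record = Entrez.read(handle)
    handle.close()

    # One linkset per input ID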
    results = {}
    for linkset in record:
        source_id = linkset['IdList'][0]
        linked_ids = []
        if linkset['LinkSetDb']:
            linked_ids = [link['Id'] for link in linkset['LinkSetDb'][0]['Link']]
        results[source_id] = linked_ids
    return results

results = batch_link('gene', 'protein', ['672', '675', '7157'])
for gene, proteins in results.items():
    print(f"Gene {gene}: {len(proteins)} proteins")

Get Publications for a Sequence

def get_sequence_publications(accession):
    # First get the GI/UID
    handle = Entrez.esearch(db='nucleotide', term=f'{accession}[accn]')
    search = Entrez.read(handle)
    handle.close()

    if not search['IdList']:
        return []
    uid = search['IdList'][0]

    # Link to PubMed
    handle = Entrez.elink(dbfrom='nucleotide', db='pubmed', id=uid)
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    return [link['Id'] for link in record[0]['LinkSetDb'][0]['Link']]

pmids = get_sequence_publications('NM_007294')
print(f"PubMed IDs: {pmids[:5]}")

Link Commands

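| Command | Description |
| --- | --- |
| neighbor | Default; get linked records |
| neighbor_score | Get linked records in the same database with similarity scores |
| neighbor_history | Store results on the History server (returns a WebEnv and query key) |
| acheck | List all links available for each input ID |
| ncheck | Check whether links exist within the same database |
| lcheck | Check whether external links (LinkOuts) exist |
| llinks | Get URLs and attributes of LinkOut providers (libraries excluded) |
| prlinks | Get the primary LinkOut provider for each ID |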
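
For large link sets, neighbor_history avoids passing long ID lists back and forth: the result carries a WebEnv and a QueryKey that a later efetch can reuse. A minimal sketch, assuming the parsed result exposes them as record[0]['WebEnv'] and record[0]['LinkSetDbHistory'][0]['QueryKey']; history_params and fetch_linked_via_history are hypothetical names:

```python
def history_params(record):
    """Pull WebEnv and the first QueryKey out of a neighbor_history result."""
    linkset = record[0]
    history = linkset.get('LinkSetDbHistory', [])
    if not history or 'WebEnv' not in linkset:
        return None
    return {'webenv': str(linkset['WebEnv']),
            'query_key': str(history[0]['QueryKey'])}


def fetch_linked_via_history(dbfrom, db, ids, rettype='fasta'):
    """Link on the History server, then fetch the linked records as text."""
    # Imported here so history_params() can be used without Biopython or
    # network access. Assumes Entrez.email has already been set.
    from Bio import Entrez

    handle = Entrez.elink(dbfrom=dbfrom, db=db, id=ids, cmd='neighbor_history')
    record = Entrez.read(handle)
    handle.close()

    params = history_params(record)
    if params is None:  # no links were stored
        return ''
    handle = Entrez.efetch(db=db, rettype=rettype, retmode='text', **params)
    text = handle.read()
    handle.close()
    return text
```

Calling fetch_linked_via_history('gene', 'protein', '672') would make two requests to NCBI; history_params on its own is pure and can be checked against a hand-built record.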

Common Errors

| Error | Cause | Solution |
| --- | --- | --- |
| Empty LinkSetDb | No links exist | Check whether the record has linked data |
| HTTPError 400 | Invalid ID or database | Verify the ID exists in the source database |
| KeyError or IndexError | Missing field or empty LinkSetDb | Check that LinkSetDb is non-empty before indexing |
| Single linkset expected, got list | Multiple input IDs | Iterate through the record list |
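
The errors above can be handled in one place. This is a minimal sketch under stated assumptions, not a definitive pattern: run_elink is a hypothetical callable that performs the request and returns the parsed record (for example a small wrapper around Entrez.elink and Entrez.read), and treating HTTP 400 as "no links" is a design choice that may not suit every pipeline.

```python
from urllib.error import HTTPError


def linked_ids(record):
    """Flatten linked IDs from all linksets; returns [] when there are no links."""
    ids = []
    for linkset in record:  # works for one linkset or many
        for linksetdb in linkset.get('LinkSetDb', []):
            ids.extend(str(link['Id']) for link in linksetdb.get('Link', []))
    return ids


def safe_links(run_elink, **params):
    """Call run_elink(**params) and return linked IDs, or [] on HTTP 400."""
    try:
        record = run_elink(**params)
    except HTTPError as err:
        if err.code == 400:  # invalid ID or database name
            return []
        raise  # other HTTP errors are left for the caller
    return linked_ids(record)
```

Because linked_ids loops instead of indexing [0], an empty LinkSetDb or a multi-linkset response does not raise.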

Decision Tree

Need to find related records?
├── Know what link you want?
│   └── Use elink with specific linkname
├── Discover what links exist?
│   └── Use elink with cmd='acheck'
├── Navigate to target database?
│   └── Use elink(dbfrom=X, db=Y, id=Z)
├── Find related records in same database?
│   └── Use elink(dbfrom=X, db=X) with neighbor
├── Chain multiple databases?
│   └── Call elink multiple times
└── Need the actual records?
    └── Use elink first, then efetch with IDs
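
The last branch of the tree (elink first, then efetch) might look like the sketch below. chunked and fetch_linked_records are hypothetical helpers, and the batch size of 200 is an arbitrary assumption rather than an NCBI requirement:

```python
def chunked(ids, size=200):
    """Yield successive slices of ids, each at most size items long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_linked_records(dbfrom, db, source_id, rettype='fasta', batch_size=200):
    """Find records linked to source_id, then download them in batches."""
    # Imported here so chunked() can be used without Biopython or network
    # access. Assumes Entrez.email has already been set.
    from Bio import Entrez

    handle = Entrez.elink(dbfrom=dbfrom, db=db, id=source_id)
    record = Entrez.read(handle)
    handle.close()
    if not record[0]['LinkSetDb']:
        return []
    ids = [link['Id'] for link in record[0]['LinkSetDb'][0]['Link']]

    texts = []
    for batch in chunked(ids, batch_size):
        handle = Entrez.efetch(db=db, id=batch, rettype=rettype, retmode='text')
        texts.append(handle.read())
        handle.close()
    return texts
```

For example, fetch_linked_records('gene', 'protein', '672') would return a list of FASTA text blocks, one per batch.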

Related Skills

  • entrez-search - Search databases before linking
  • entrez-fetch - Retrieve records after finding linked IDs
  • batch-downloads - Download many linked records efficiently