install
source · Clone the upstream repo
git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-entrez-fetch" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-entrez-fetch && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-entrez-fetch" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-entrez-fetch && rm -rf "$T"
manifest:
skills/bio-entrez-fetch/SKILL.mdsafety · automated scan (low risk)
This is a pattern-based risk scan, not a security review. Our crawler flagged:
- references API keys
Always read a skill's source content before installing. Patterns alone don't mean the skill is malicious — but they warrant attention.
source content
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-entrez-fetch description: Retrieve records from NCBI databases using Biopython Bio.Entrez. Use when downloading sequences, fetching GenBank records, getting document summaries, or parsing NCBI data into Biopython objects. tool_type: python primary_tool: Bio.Entrez measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
Entrez Fetch
Retrieve records from NCBI databases using Biopython's Entrez module (EFetch, ESummary utilities).
Required Setup
from Bio import Entrez Entrez.email = 'your.email@example.com' # Required by NCBI Entrez.api_key = 'your_api_key' # Optional, raises rate limit 3->10 req/sec
Core Functions
Entrez.efetch() - Retrieve Full Records
Fetch complete records in various formats from any NCBI database.
# Fetch GenBank record by ID handle = Entrez.efetch(db='nucleotide', id='NM_007294', rettype='gb', retmode='text') genbank_text = handle.read() handle.close() # Fetch FASTA sequence handle = Entrez.efetch(db='nucleotide', id='NM_007294', rettype='fasta', retmode='text') fasta_text = handle.read() handle.close() # Fetch multiple records handle = Entrez.efetch(db='nucleotide', id='NM_007294,NM_000059', rettype='fasta', retmode='text')
Key Parameters:
| Parameter | Description | Example |
|---|---|---|
| Database name | , , |
| Record ID(s) | or |
| Return type | , , |
| Return mode | , |
| Start index | |
| Max records | |
| History server session | From esearch |
| History server query | From esearch |
Common Return Types by Database
Nucleotide/Protein:
| rettype | retmode | Description |
|---|---|---|
| | FASTA sequence |
| | GenBank flat file |
| | GenPept flat file (protein) |
| | GenBank with contig sequences |
| | Seq-id only |
| | Accession only |
PubMed:
| rettype | retmode | Description |
|---|---|---|
| | Abstract text |
| | MEDLINE format |
| | Full PubMed XML |
Gene:
| rettype | retmode | Description |
|---|---|---|
| | Gene table format |
| | Full gene XML |
Entrez.esummary() - Document Summaries
Get brief summaries without downloading full records. Faster than efetch.
# Get summary for nucleotide record handle = Entrez.esummary(db='nucleotide', id='NM_007294') record = Entrez.read(handle) handle.close() summary = record[0] # First (only) record print(f"Title: {summary['Title']}") print(f"Length: {summary['Length']}") print(f"Organism: {summary['Organism']}")
Common Summary Fields:
# Nucleotide/Protein summary['Title'] # Record title/description summary['Caption'] # Short identifier summary['Length'] # Sequence length summary['Organism'] # Source organism summary['TaxId'] # Taxonomy ID summary['AccessionVersion'] # Full accession.version # PubMed summary['Title'] # Article title summary['AuthorList'] # Authors summary['Source'] # Journal summary['PubDate'] # Publication date summary['DOI'] # Digital Object Identifier
Parsing with Biopython
Parse into SeqRecord Objects
from Bio import Entrez, SeqIO Entrez.email = 'your.email@example.com' # Parse GenBank into SeqRecord handle = Entrez.efetch(db='nucleotide', id='NM_007294', rettype='gb', retmode='text') record = SeqIO.read(handle, 'genbank') handle.close() print(f"ID: {record.id}") print(f"Length: {len(record.seq)}") print(f"Features: {len(record.features)}") # Parse FASTA into SeqRecord handle = Entrez.efetch(db='nucleotide', id='NM_007294', rettype='fasta', retmode='text') record = SeqIO.read(handle, 'fasta') handle.close()
Parse Multiple Records
# Fetch multiple as FASTA handle = Entrez.efetch(db='nucleotide', id='NM_007294,NM_000059,NM_000546', rettype='fasta', retmode='text') records = list(SeqIO.parse(handle, 'fasta')) handle.close() for record in records: print(f"{record.id}: {len(record.seq)} bp")
Parse XML with Entrez.read()
# For structured data, use XML mode handle = Entrez.efetch(db='gene', id='672', retmode='xml') records = Entrez.read(handle) handle.close() # Navigate nested structure gene = records[0] print(f"Gene: {gene['Entrezgene_gene']['Gene-ref']['Gene-ref_locus']}")
Code Patterns
Fetch Sequence by Accession
from Bio import Entrez, SeqIO Entrez.email = 'your.email@example.com' def fetch_sequence(accession, db='nucleotide'): handle = Entrez.efetch(db=db, id=accession, rettype='fasta', retmode='text') record = SeqIO.read(handle, 'fasta') handle.close() return record seq = fetch_sequence('NM_007294') print(f"{seq.id}: {seq.seq[:50]}...")
Fetch GenBank with Features
def fetch_genbank(accession): handle = Entrez.efetch(db='nucleotide', id=accession, rettype='gb', retmode='text') record = SeqIO.read(handle, 'genbank') handle.close() return record gb = fetch_genbank('NM_007294') for feature in gb.features: if feature.type == 'CDS': print(f"CDS: {feature.location}") print(f"Product: {feature.qualifiers.get('product', ['?'])[0]}")
Fetch PubMed Abstract
def fetch_abstract(pmid): handle = Entrez.efetch(db='pubmed', id=pmid, rettype='abstract', retmode='text') abstract = handle.read() handle.close() return abstract abstract = fetch_abstract('35412348') print(abstract)
Get Record Summaries
def get_summaries(db, ids): if isinstance(ids, list): ids = ','.join(ids) handle = Entrez.esummary(db=db, id=ids) records = Entrez.read(handle) handle.close() return records summaries = get_summaries('nucleotide', ['NM_007294', 'NM_000059']) for s in summaries: print(f"{s['Caption']}: {s['Title'][:50]}... ({s['Length']} bp)")
Search Then Fetch
# Search for records handle = Entrez.esearch(db='nucleotide', term='human[orgn] AND insulin[gene] AND mRNA[fkey]', retmax=5) search_results = Entrez.read(handle) handle.close() ids = search_results['IdList'] # Fetch the sequences handle = Entrez.efetch(db='nucleotide', id=','.join(ids), rettype='fasta', retmode='text') records = list(SeqIO.parse(handle, 'fasta')) handle.close() for record in records: print(f"{record.id}: {len(record.seq)} bp")
Fetch Protein by Gene ID
# Search gene database handle = Entrez.esearch(db='gene', term='BRCA1[sym] AND human[orgn]') result = Entrez.read(handle) handle.close() gene_id = result['IdList'][0] # Get linked protein IDs handle = Entrez.elink(dbfrom='gene', db='protein', id=gene_id) links = Entrez.read(handle) handle.close() protein_ids = [link['Id'] for link in links[0]['LinkSetDb'][0]['Link'][:3]] # Fetch proteins handle = Entrez.efetch(db='protein', id=','.join(protein_ids), rettype='fasta', retmode='text') proteins = list(SeqIO.parse(handle, 'fasta')) handle.close()
Save Fetched Records to File
def download_sequences(ids, output_file, db='nucleotide', format='fasta'): handle = Entrez.efetch(db=db, id=','.join(ids), rettype=format, retmode='text') with open(output_file, 'w') as out: out.write(handle.read()) handle.close() download_sequences(['NM_007294', 'NM_000059'], 'brca_genes.fasta')
Common Errors
| Error | Cause | Solution |
|---|---|---|
| Invalid ID or parameters | Verify ID exists, check rettype |
| Rate limit exceeded | Add delays or use API key |
| Empty result | Record doesn't exist | Verify accession in web browser |
in SeqIO | Wrong format specified | Match rettype with SeqIO format |
| XML parsing error | Use instead |
Decision Tree
Need to retrieve NCBI records? ├── Need full sequence? │ └── Use efetch with rettype='fasta' ├── Need sequence + annotations? │ └── Use efetch with rettype='gb' (GenBank) ├── Just need metadata (length, organism)? │ └── Use esummary (faster) ├── Need PubMed abstract? │ └── Use efetch with rettype='abstract' ├── Need structured data for parsing? │ └── Use efetch with retmode='xml' + Entrez.read() ├── Downloading many records? │ └── See batch-downloads skill └── Need records from multiple databases? └── See entrez-link skill first
Related Skills
- entrez-search - Find record IDs before fetching
- entrez-link - Find related records in other databases
- batch-downloads - Download large numbers of records efficiently
- sequence-io/read-sequences - Parse downloaded sequences with SeqIO