Awesome-Agent-Skills-for-Empirical-Research ena-sequence-api
Access nucleotide sequence data from the European Nucleotide Archive
Install
Clone the upstream repo (source):
```bash
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
```
Install into ~/.claude/skills/ for Claude Code:
```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/domains/biomedical/ena-sequence-api" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-ena-sequence-api && rm -rf "$T"
```
Manifest: `skills/43-wentorai-research-plugins/skills/domains/biomedical/ena-sequence-api/SKILL.md`
European Nucleotide Archive (ENA) API
Overview
The European Nucleotide Archive (ENA) at EMBL-EBI is one of the three global nucleotide sequence databases of the International Nucleotide Sequence Database Collaboration (INSDC), alongside NCBI GenBank and DDBJ. It holds raw sequencing reads, assembled sequences, and functional annotations for organisms across the tree of life. Its APIs support accession lookup, field-based search, and bulk data retrieval. They are free to use and require no authentication.
API Endpoints
Portal API (Search)
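The `query` parameter takes a field-based expression, such as `tax_tree(9606)` or `library_strategy="RNA-Seq"`, rather than free text. Quotes and spaces must therefore be URL-encoded, which `curl -G --data-urlencode` does for you.
```bash
# Search for studies whose title mentions CRISPR
curl -G "https://www.ebi.ac.uk/ena/portal/api/search" \
  --data-urlencode 'query=study_title="*CRISPR*"' \
  --data-urlencode "result=study" \
  --data-urlencode "limit=20" \
  --data-urlencode "format=json"

# Search for samples: human gut metagenome (NCBI taxon 408170)
curl "https://www.ebi.ac.uk/ena/portal/api/search?query=tax_eq(408170)&result=sample&limit=20&format=json"

# Search for runs (sequencing data): human RNA-seq
curl -G "https://www.ebi.ac.uk/ena/portal/api/search" \
  --data-urlencode 'query=library_strategy="RNA-Seq" AND tax_eq(9606)' \
  --data-urlencode "result=read_run" \
  --data-urlencode "limit=20" \
  --data-urlencode "format=json"
```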
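Portal API query expressions can contain quotes, spaces, and parentheses, so it can help to build search URLs programmatically instead of by hand. Below is a minimal standard-library sketch. `build_search_url` is a hypothetical helper name, and it only sets the parameters used in the curl examples above.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

PORTAL_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_search_url(query: str, result: str, limit: int = 20,
                     fmt: str = "json",
                     fields: Optional[Sequence[str]] = None) -> str:
    """Build a Portal API search URL with every parameter percent-encoded."""
    params = {"query": query, "result": result, "limit": limit, "format": fmt}
    if fields:
        # Fields are sent as a single comma-separated list.
        params["fields"] = ",".join(fields)
    # urlencode turns spaces into "+" and escapes quotes, "=", and parentheses.
    return f"{PORTAL_SEARCH}?{urlencode(params)}"


print(build_search_url('library_strategy="RNA-Seq" AND tax_eq(9606)', "read_run"))
```

The resulting URL can be passed directly to `curl` or any HTTP client, with no further escaping needed.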
Browser API (Accession Lookup)
```bash
# Get a record by accession (XML)
curl "https://www.ebi.ac.uk/ena/browser/api/xml/PRJEB12345"
# Get a summary in JSON format
curl "https://www.ebi.ac.uk/ena/browser/api/summary/PRJEB12345"
# Get a sequence in FASTA format
curl "https://www.ebi.ac.uk/ena/browser/api/fasta/AF123456"
# Get a sequence in EMBL flat-file format
curl "https://www.ebi.ac.uk/ena/browser/api/embl/AF123456"
```
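The `/fasta/` endpoint returns plain text that may contain more than one record. Below is a small parser sketch (`parse_fasta` is a hypothetical helper). It keys each sequence by the first whitespace-delimited token of its header line, so it assumes nothing about ENA's exact header layout.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Split FASTA text into {header_id: sequence}."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            chunks = []
        else:
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records
```

For example, `parse_fasta(get_fasta("AF123456"))` would return a dict with one entry per record in the response.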
Taxonomy Search
```bash
# Search studies for an organism and its subtaxa (9606 = Homo sapiens)
curl "https://www.ebi.ac.uk/ena/portal/api/search?query=tax_tree(9606)&result=study&limit=20&format=json"
# Get taxonomy details
curl "https://www.ebi.ac.uk/ena/taxonomy/rest/tax-id/9606"
```
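`tax_tree(...)` matches a taxon together with all of its descendants, while `tax_eq(...)` matches that exact taxon only. The sketch below uses two hypothetical helpers, `taxon_query` and `taxonomy_url`. The first produces either expression and the second builds the taxonomy REST URL shown above.

```python
TAXONOMY_REST = "https://www.ebi.ac.uk/ena/taxonomy/rest"


def taxon_query(tax_id: int, include_subtaxa: bool = True) -> str:
    """Return a Portal API query expression for an NCBI taxonomy ID."""
    if tax_id <= 0:
        raise ValueError("tax_id must be a positive integer")
    # tax_tree includes descendant taxa; tax_eq is an exact match.
    func = "tax_tree" if include_subtaxa else "tax_eq"
    return f"{func}({tax_id})"


def taxonomy_url(tax_id: int) -> str:
    """URL of the taxonomy REST record for a taxon ID."""
    return f"{TAXONOMY_REST}/tax-id/{tax_id}"
```

Use `include_subtaxa=False` when strains or subspecies should be excluded from the results.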
Result Types
| Type | Description | Example accession |
|---|---|---|
| `study` | Research project | PRJEB12345 |
| `sample` | Biological sample | SAMEA12345 |
| `read_experiment` | Library/protocol | ERX12345 |
| `read_run` | Sequencing run | ERR12345 |
| `analysis` | Computed analysis | ERZ12345 |
| `sequence` | Assembled sequence | AF123456 |
| `wgs_set` | Whole genome shotgun | AABR00000000 |
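Accession prefixes often reveal which kind of record an identifier points to. The sketch below is a heuristic and is not an official ENA rule. It only recognizes accession forms like the examples listed here and maps them to Portal API `result` names. `guess_result_type` and its patterns are hypothetical; real accession formats vary more than this.

```python
import re
from typing import List, Optional, Tuple

# Heuristic patterns covering only common forms of each accession type.
_PATTERNS: List[Tuple[str, str]] = [
    (r"PRJ(EB|NA|DB)\d+", "study"),
    (r"SAM(EA|N|D)\d+", "sample"),
    (r"[EDS]RX\d+", "read_experiment"),
    (r"[EDS]RR\d+", "read_run"),
    (r"ERZ\d+", "analysis"),
    (r"[A-Z]{4}\d{8}", "wgs_set"),
    (r"[A-Z]{1,2}\d{5,6}(\.\d+)?", "sequence"),
]


def guess_result_type(accession: str) -> Optional[str]:
    """Guess the Portal API result type for an accession, or None."""
    acc = accession.strip().upper()
    for pattern, result in _PATTERNS:
        # fullmatch prevents e.g. "ERR12345" from matching a shorter prefix rule.
        if re.fullmatch(pattern, acc):
            return result
    return None
```

Returning `None` for unknown forms lets a caller fall back to the Browser API instead of guessing.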
Query Parameters
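| Parameter | Description | Example |
|---|---|---|
| `query` | Query expression (field conditions or taxonomy filter) | `tax_tree(9606)` |
| `result` | Result type | `read_run` |
| `limit` | Max results (default 100,000) | `20` |
| `offset` | Pagination offset | |
| `format` | Response format | `json`, `tsv` |
| `fields` | Comma-separated fields to return | `run_accession,fastq_ftp` |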
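`limit` and `offset` can be combined to page through a large result set. Below is a standard-library sketch that assumes `offset` is a zero-based row offset. `paginate` and `fetch_page` are hypothetical helpers, and `fetch` is injectable so the paging logic can be tested without network access.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterator, List
from urllib.parse import urlencode

PORTAL_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"

Record = Dict[str, str]
Fetcher = Callable[[Dict[str, object]], List[Record]]


def fetch_page(params: Dict[str, object]) -> List[Record]:
    """Fetch one page of JSON results from the Portal API."""
    url = f"{PORTAL_SEARCH}?{urlencode(params)}"
    with urllib.request.urlopen(url, timeout=60) as resp:
        body = resp.read().decode("utf-8")
    # Treat an empty body as an empty page.
    return json.loads(body) if body.strip() else []


def paginate(query: str, result: str, page_size: int = 500,
             fetch: Fetcher = fetch_page) -> Iterator[Record]:
    """Yield every matching record, requesting page_size rows at a time."""
    offset = 0
    while True:
        page = fetch({"query": query, "result": result, "format": "json",
                      "limit": page_size, "offset": offset})
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means nothing is left
        offset += page_size
```

For example, `for run in paginate('study_accession="PRJEB12345"', "read_run"): ...` streams the records without holding every page in memory at once.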
Python Usage
```python
import requests

PORTAL_URL = "https://www.ebi.ac.uk/ena/portal/api"
BROWSER_URL = "https://www.ebi.ac.uk/ena/browser/api"
TIMEOUT = 60  # seconds; large searches can be slow


def _portal_search(params: dict) -> list:
    """Run a Portal API search and return JSON records (empty list if none)."""
    resp = requests.get(f"{PORTAL_URL}/search", params=params, timeout=TIMEOUT)
    resp.raise_for_status()
    # An empty response body means no matching records.
    return resp.json() if resp.text.strip() else []


def search_studies(query: str, limit: int = 20) -> list:
    """Search ENA studies. `query` is a Portal API expression, not free text."""
    return _portal_search({
        "query": query,
        "result": "study",
        "limit": limit,
        "format": "json",
        "fields": "study_accession,study_title,study_description,"
                  "tax_id,scientific_name,center_name",
    })


def search_runs(query: str, limit: int = 20) -> list:
    """Search for sequencing runs."""
    return _portal_search({
        "query": query,
        "result": "read_run",
        "limit": limit,
        "format": "json",
        "fields": "run_accession,experiment_title,instrument_platform,"
                  "library_strategy,read_count,base_count",
    })


def get_fasta(accession: str) -> str:
    """Retrieve a sequence in FASTA format."""
    resp = requests.get(f"{BROWSER_URL}/fasta/{accession}", timeout=TIMEOUT)
    resp.raise_for_status()
    return resp.text


def get_study_runs(study_accession: str, limit: int = 1000) -> list:
    """Get sequencing runs for a study (up to `limit` records)."""
    return _portal_search({
        "query": f'study_accession="{study_accession}"',
        "result": "read_run",
        "format": "json",
        "fields": "run_accession,fastq_ftp,read_count,base_count",
        "limit": limit,
    })


# Example: find SARS-CoV-2 studies (NCBI taxon 2697049 and its subtaxa)
studies = search_studies("tax_tree(2697049)", limit=5)
for s in studies:
    print(f"{s['study_accession']}: {s.get('study_title', '')}")
    print(f"  Organism: {s.get('scientific_name')}")

# Example: find human RNA-seq runs
runs = search_runs('library_strategy="RNA-Seq" AND tax_eq(9606)', limit=5)
for r in runs:
    reads = int(r.get("read_count") or 0)  # counts may be missing or empty
    print(f"{r['run_accession']}: {r.get('experiment_title', '')}")
    print(f"  Platform: {r.get('instrument_platform')} | Reads: {reads:,}")
```
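Run records from `search_runs` can be aggregated before deciding what to download. Below is a sketch (`summarize_runs` is a hypothetical helper) that totals reads and bases per sequencing platform. It assumes counts may arrive as strings, possibly empty, and treats an empty value as zero.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple


def summarize_runs(runs: Iterable[Dict[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Total (read_count, base_count) per instrument_platform."""
    totals: DefaultDict[str, List[int]] = defaultdict(lambda: [0, 0])
    for run in runs:
        platform = run.get("instrument_platform") or "UNKNOWN"
        # int() accepts numeric strings; "or 0" covers missing/empty values.
        totals[platform][0] += int(run.get("read_count") or 0)
        totals[platform][1] += int(run.get("base_count") or 0)
    return {p: (reads, bases) for p, (reads, bases) in totals.items()}
```

For example, `summarize_runs(get_study_runs("PRJEB12345"))` gives a quick size estimate for a study before any FASTQ files are fetched.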
Data Access
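```bash
# Download FASTQ files. The fastq_ftp field in run metadata gives the paths
# (without the ftp:// prefix; paired-end files are separated by ";"):
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR123/ERR123456/ERR123456_1.fastq.gz

# Bulk download via Aspera (faster). ascp needs the Aspera key file passed
# with -i; its location depends on your Aspera installation.
ascp -QT -l 300m -P33001 -i "$HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh" \
  era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/ERR123/ERR123456/ ./
```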
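A `fastq_ftp` value (as returned by `get_study_runs`) can hold several semicolon-separated paths without a URL scheme. Below is a sketch (`fastq_urls` is a hypothetical helper) that turns one value into full URLs suitable for `wget` or any FTP-capable client. The `ftp` default scheme is an assumption that matches the `wget` example above.

```python
from typing import List


def fastq_urls(fastq_ftp: str, scheme: str = "ftp") -> List[str]:
    """Split a fastq_ftp field into full download URLs."""
    urls: List[str] = []
    for path in fastq_ftp.split(";"):
        path = path.strip()
        if not path:
            continue  # runs without FASTQ files have an empty field
        # Prepend a scheme only if the path does not already carry one.
        if "://" not in path:
            path = f"{scheme}://{path}"
        urls.append(path)
    return urls
```

Paired-end runs usually yield two URLs (`_1` and `_2`), and single-end runs yield one.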