Medical-research-skills bioservices
Unified Python access to 40+ bioinformatics web services; use when you need to query multiple databases (e.g., UniProt/KEGG/ChEMBL/Reactome) with one consistent API in a single workflow, especially for cross-database analysis and identifier mapping.
Install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Data Analysis/bioservices" ~/.claude/skills/aipoch-medical-research-skills-bioservices && rm -rf "$T"
manifest: scientific-skills/Data Analysis/bioservices/SKILL.md
When to Use
- You need to retrieve and combine biological data from multiple databases (e.g., UniProt + KEGG + GO) in one Python workflow.
- You need cross-database identifier mapping (e.g., UniProt ↔ KEGG, KEGG compound ↔ ChEMBL) as part of downstream analysis.
- You want to programmatically explore pathways and networks (e.g., KEGG pathway parsing, exporting interactions to SIF).
- You need service-agnostic access across many providers (REST and SOAP/WSDL) without writing custom clients per service.
- You are building integrated bioinformatics pipelines (protein → sequence → BLAST → pathways → interactions) that span multiple resources.
Key Features
- Unified API for 40+ bioinformatics services (single Python package, consistent patterns).
- Transparent protocol handling (REST and SOAP/WSDL).
- Protein-centric workflows via UniProt (search, retrieve, ID mapping).
- Pathway discovery and parsing via KEGG (KGML parsing, relations extraction, SIF export).
- Compound lookup and cross-referencing (e.g., KEGG compounds + UniChem mapping to ChEMBL).
- Sequence analysis integrations (e.g., NCBI BLAST asynchronous jobs).
- Ontology and annotation queries (e.g., QuickGO).
- Protein–protein interaction queries via PSICQUIC-compatible services.
Dependencies
- python >= 3.9
- bioservices (install via pip/uv; version depends on your environment)
Optional (commonly used alongside returned formats):
- pandas >= 1.5 (TSV/tabular outputs)
- beautifulsoup4 >= 4.11 (XML parsing)
- lxml >= 4.9 (faster XML parsing)
- networkx >= 2.8 (network analysis of interactions)
- biopython >= 1.81 (sequence handling for FASTA outputs)
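Since the optional packages are only needed for particular output formats, a script can check which ones are importable before it starts. The following is a minimal sketch using only the standard library; the helper name `missing_optional_packages` and the mapping of import names to purposes are illustrative assumptions, not part of bioservices.

```python
import importlib.util

# Import names for the optional packages listed above, with their purpose.
# Note that the import name can differ from the pip name (bs4, Bio).
OPTIONAL_PACKAGES = {
    "pandas": "TSV/tabular outputs",
    "bs4": "XML parsing (beautifulsoup4)",
    "lxml": "faster XML parsing",
    "networkx": "network analysis of interactions",
    "Bio": "sequence handling for FASTA outputs (biopython)",
}


def missing_optional_packages(packages=OPTIONAL_PACKAGES):
    """Return {import_name: purpose} for packages that cannot be imported."""
    return {
        name: purpose
        for name, purpose in packages.items()
        if importlib.util.find_spec(name) is None
    }


for name, purpose in missing_optional_packages().items():
    print(f"optional package not installed: {name} ({purpose})")
```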
Example Usage
A single runnable script that demonstrates a cross-service workflow:
- UniProt search + FASTA retrieval
- UniProt → KEGG ID mapping
- KEGG pathway lookup and KGML relation extraction
- QuickGO annotation query
- PSICQUIC interaction query
- KEGG compound lookup + UniChem mapping to ChEMBL
```python
"""
Run:
    uv pip install bioservices pandas
    python bioservices_example.py

Notes:
- Some services may rate-limit or be temporarily unavailable.
- NCBI BLAST requires an email; this example does not run BLAST to stay lightweight.
- Method signatures and return types vary between bioservices releases;
  adjust the calls below to the version you have installed.
"""
from bioservices import UniProt, KEGG, QuickGO, PSICQUIC, UniChem


def preview(result, n=3):
    """Return the first n lines (str results) or rows (list results) as text."""
    rows = result.splitlines() if isinstance(result, str) else list(result)
    return "\n".join(str(row) for row in rows[:n])


def main():
    # --- UniProt: search + retrieve ---
    u = UniProt(verbose=False)

    # Search by entry name (example: ZAP70 human).
    # Format and column names follow the current UniProt REST API.
    tab = u.search("ZAP70_HUMAN", frmt="tsv",
                   columns="accession,id,gene_names,organism_name")
    print("UniProt search (tsv):")
    print(preview(tab), "\n")  # header + first rows

    uniprot_ac = "P43403"  # ZAP70_HUMAN accession
    fasta = u.retrieve(uniprot_ac, "fasta")
    print("UniProt FASTA header:")
    print(fasta.splitlines()[0], "\n")

    # --- UniProt: identifier mapping (UniProt -> KEGG) ---
    mapping = u.mapping(fr="UniProtKB_AC-ID", to="KEGG", query=uniprot_ac)
    print("UniProt -> KEGG mapping:")
    print(mapping, "\n")

    # --- KEGG: pathway discovery + parsing ---
    k = KEGG(verbose=False)
    k.organism = "hsa"

    # Example gene: ZAP70 is KEGG gene hsa:7535
    pathways = k.get_pathway_by_gene("7535", "hsa")
    print("KEGG pathways containing hsa:7535:")
    print(pathways, "\n")

    pathway_id = "hsa04660"  # T cell receptor signaling pathway (example)
    kgml_relations = k.parse_kgml_pathway(pathway_id).get("relations", [])
    print(f"KEGG KGML relations count for {pathway_id}: {len(kgml_relations)}\n")

    # Export to SIF (useful for network tools); may be a list of rows, not text
    sif = k.pathway2sif(pathway_id)
    print(f"KEGG SIF preview for {pathway_id}:")
    print(preview(sif, 5), "\n")

    # --- QuickGO: GO annotations for a UniProt protein ---
    g = QuickGO(verbose=False)
    # Keyword names assume a recent bioservices release (QuickGO REST API);
    # older releases used different keywords. Check help(g.Annotation).
    ann = g.Annotation(geneProductId=uniprot_ac, limit=5)
    print("QuickGO annotations:")
    print(ann, "\n")

    # --- PSICQUIC: interaction query (database name may vary by availability) ---
    p = PSICQUIC(verbose=False)
    # Example query: ZAP70 interactions in human.
    # Choose a database that is active in your environment; "intact" is
    # commonly available.
    interactions = p.query("intact", "ZAP70 AND species:9606")
    print("PSICQUIC query result preview:")
    print(preview(interactions), "\n")

    # --- Compound workflow: KEGG compound -> UniChem -> ChEMBL ---
    # Example: Geldanamycin
    cpd_hits = k.find("compound", "Geldanamycin")
    print("KEGG compound find('Geldanamycin'):")
    print(cpd_hits, "\n")

    # If you already know the KEGG compound ID:
    kegg_compound_id = "C11222"
    uc = UniChem(verbose=False)
    # Availability of this helper depends on the installed bioservices release.
    chembl_id = uc.get_compound_id_from_kegg(kegg_compound_id)
    print(f"UniChem KEGG {kegg_compound_id} -> ChEMBL:")
    print(chembl_id, "\n")


if __name__ == "__main__":
    main()
```
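The script only prints the FASTA header. If the sequence itself is needed for a later step (for example as BLAST input), the returned text can be split into records without extra dependencies. This is a minimal standard-library sketch; `parse_fasta` is a hypothetical helper, the sample record is made up, and Biopython's `SeqIO` is the more robust choice when it is installed.

```python
def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up record for illustration only (not real UniProt output).
example = ">sp|X00000|DEMO_HUMAN Demo protein\nMPDPAAH\nLPFFYG\n"
for header, sequence in parse_fasta(example):
    print(header.split("|")[1], len(sequence))
```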
Implementation Details
- Service objects: Each remote resource is exposed as a Python class (e.g., UniProt, KEGG, QuickGO, PSICQUIC, NCBIblast). You instantiate a client and call methods that wrap the underlying endpoints.
- Protocols: BioServices abstracts REST and SOAP/WSDL services behind similar method calls; returned payloads may be text (TSV), XML, JSON-like dicts, or FASTA.
- Common parameters
  - verbose: toggles HTTP/request logging (verbose=False is recommended for scripts).
  - TIMEOUT: per-service timeout control (useful for slow networks or large responses).
- Service-specific parameters (examples):
  - UniProt: search(query, frmt=..., columns=...), retrieve(accession, format), mapping(fr=..., to=..., query=...)
  - KEGG: find(db, query), get(entry_id), parse(raw), parse_kgml_pathway(pathway_id), pathway2sif(pathway_id)
  - NCBI BLAST: asynchronous job model (run(...) → getStatus(jobid) → getResult(jobid, ...))
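The asynchronous job model (submit, poll the status, then fetch the result) can be sketched as a small polling loop. The helper below is hypothetical and service-agnostic: it takes the status function as an argument, and the set of "still running" status strings is an assumption that should be adjusted to what the service actually reports.

```python
import time


def wait_for_job(get_status, jobid, poll_seconds=5.0, max_polls=120,
                 sleep=time.sleep):
    """Poll get_status(jobid) until the job leaves the running state.

    Returns the final status string; raises TimeoutError if the job is
    still running after max_polls checks.
    """
    # Assumed status strings; adjust to what your service reports.
    running = {"RUNNING", "PENDING", "QUEUED"}
    for _ in range(max_polls):
        status = get_status(jobid)
        if status not in running:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job {jobid} still running after {max_polls} polls")
```

With an NCBIblast client, the status method named above (getStatus; the name may differ between bioservices releases) would be passed as `get_status`, and the result fetched only once the returned status indicates success. Keeping the sleep function injectable makes the loop easy to test without waiting.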
- Data handling guidance
  - TSV/tabular outputs: load into pandas.read_csv(io.StringIO(text), sep="\t")
  - XML outputs: parse with BeautifulSoup or lxml
  - Network exports (SIF): import into NetworkX/Cytoscape-compatible tooling
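For quick inspection without pandas or NetworkX, tabular text and SIF lines can also be handled with the standard library alone. The sketch below is a dependency-free alternative to the tools named above, not a replacement for them; the helper names and the sample data are made up for illustration, and the SIF parser assumes whitespace-separated lines of the form "source interaction target [target ...]".

```python
import csv
import io
from collections import defaultdict


def tsv_to_rows(text):
    """Parse tab-separated text (first line is the header) into dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def sif_to_adjacency(lines):
    """Build {source: {(interaction, target), ...}} from SIF lines."""
    graph = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        source, interaction, *targets = parts
        for target in targets:
            graph[source].add((interaction, target))
    return dict(graph)


# Made-up sample data for illustration only.
tsv_text = "Entry\tEntry Name\nX00001\tDEMO1_HUMAN\nX00002\tDEMO2_HUMAN\n"
sif_lines = ["A activation B", "A inhibition C", "B activation C"]

rows = tsv_to_rows(tsv_text)
graph = sif_to_adjacency(sif_lines)
print(rows[0]["Entry"], sorted(graph["A"]))
```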
- Operational considerations
  - Many endpoints are rate-limited; implement retries/backoff for production pipelines.
  - Some services require contact information (e.g., NCBI BLAST email) and may enforce usage policies.
  - Availability varies by provider; design workflows to degrade gracefully (try/except, fallbacks).
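The retry and fallback advice above can be combined in one wrapper. This is a minimal sketch, assuming exponential backoff and a caller-supplied fallback value; `call_with_retries` is a hypothetical helper, not a bioservices function, and the retry count and delays are placeholders to tune against each provider's usage policy.

```python
import time


def call_with_retries(func, *args, retries=3, base_delay=1.0, fallback=None,
                      sleep=time.sleep, **kwargs):
    """Call func; on any exception retry with exponential backoff.

    Returns fallback once all attempts have failed, so the pipeline can
    continue without the optional result.
    """
    for attempt in range(retries):
        try:
            return func(*args, **kwargs)
        except Exception as exc:  # broad on purpose: provider errors vary
            if attempt == retries - 1:
                print(f"giving up after {retries} attempts: {exc!r}")
                break
            # Wait 1x, 2x, 4x ... base_delay between attempts.
            sleep(base_delay * 2 ** attempt)
    return fallback
```

A call such as the PSICQUIC query from the example script could then be wrapped as `call_with_retries(p.query, "intact", "ZAP70 AND species:9606", fallback=[])`, so that an unavailable provider yields an empty result instead of stopping the whole workflow.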