Medical-research-skills drugbank-database
Programmatic access to DrugBank drug and target data; use when you need to download, parse, and analyze DrugBank XML for properties, interactions, pathways, and pharmacology.
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Evidence Insight/drugbank-database" ~/.claude/skills/aipoch-medical-research-skills-drugbank-database && rm -rf "$T"
manifest:
scientific-skills/Evidence Insight/drugbank-database/SKILL.mdsource content
When to Use
- You need to extract structured drug properties (e.g., identifiers, synonyms, ATC codes) from DrugBank XML for downstream analysis.
- You want to build and analyze drug–drug interaction (DDI) networks from DrugBank interaction records.
- You are mapping drugs to targets (proteins/genes) to support target discovery, mechanism-of-action analysis, or enrichment workflows.
- You need to connect drugs to pathways and pharmacology annotations for systems pharmacology or knowledge graph construction.
- You want to generate tabular datasets (CSV/Parquet) from DrugBank for use in notebooks, dashboards, or ML pipelines.
Key Features
- Programmatic download of DrugBank releases via
(requires DrugBank access).drugbank-downloader - XML parsing and traversal using
for reliable extraction of nested DrugBank entities.lxml - Data wrangling into
DataFrames for filtering, joining, and export.pandas - Network construction and analysis with
(e.g., DDI graphs, drug–target bipartite graphs).networkx - Optional cheminformatics support with
for structure-based processing (e.g., SMILES/InChI handling when present).rdkit
Dependencies
(version varies by your environment)drugbank-downloaderlxml>=4.9pandas>=2.0networkx>=3.0
(optional; required only for structure/chemistry workflows)rdkit>=2022.09
Example Usage
""" End-to-end example: 1) Parse a local DrugBank XML file 2) Extract a minimal drug table 3) Extract drug-drug interactions 4) Build a DDI graph Prerequisites: - You must obtain DrugBank XML via your DrugBank account/license. - Place the XML file at ./drugbank.xml (or update the path). """ from lxml import etree import pandas as pd import networkx as nx DRUGBANK_XML_PATH = "./drugbank.xml" NS = {"db": "http://www.drugbank.ca"} # DrugBank XML namespace # --- Parse XML --- tree = etree.parse(DRUGBANK_XML_PATH) root = tree.getroot() # --- Extract drug records (minimal fields) --- drugs = [] for drug in root.xpath("//db:drug", namespaces=NS): drugbank_id = drug.xpath("string(db:drugbank-id[@primary='true'])", namespaces=NS).strip() name = drug.xpath("string(db:name)", namespaces=NS).strip() drug_type = drug.get("type", "").strip() # Optional: first SMILES if present smiles = drug.xpath( "string(db:calculated-properties/db:property[db:kind='SMILES']/db:value)", namespaces=NS, ).strip() drugs.append( { "drugbank_id": drugbank_id, "name": name, "type": drug_type, "smiles": smiles or None, } ) drugs_df = pd.DataFrame(drugs).dropna(subset=["drugbank_id"]) print("Drugs:", len(drugs_df)) print(drugs_df.head()) # --- Extract drug-drug interactions --- interactions = [] for drug in root.xpath("//db:drug", namespaces=NS): src_id = drug.xpath("string(db:drugbank-id[@primary='true'])", namespaces=NS).strip() src_name = drug.xpath("string(db:name)", namespaces=NS).strip() for ddi in drug.xpath("db:drug-interactions/db:drug-interaction", namespaces=NS): tgt_id = ddi.xpath("string(db:drugbank-id)", namespaces=NS).strip() tgt_name = ddi.xpath("string(db:name)", namespaces=NS).strip() description = ddi.xpath("string(db:description)", namespaces=NS).strip() if src_id and tgt_id: interactions.append( { "source_id": src_id, "source_name": src_name, "target_id": tgt_id, "target_name": tgt_name, "description": description or None, } ) ddi_df = pd.DataFrame(interactions) print("Interactions:", len(ddi_df)) print(ddi_df.head()) # --- Build a DDI graph --- G = nx.from_pandas_edgelist( ddi_df, source="source_id", target="target_id", edge_attr=["description"], create_using=nx.Graph(), ) print("DDI graph nodes:", G.number_of_nodes()) print("DDI graph edges:", G.number_of_edges()) # Example analysis: top 10 drugs by interaction degree top_degree = sorted(G.degree, key=lambda x: x[1], reverse=True)[:10] top_degree_df = pd.DataFrame(top_degree, columns=["drugbank_id", "degree"]).merge( drugs_df[["drugbank_id", "name"]], on="drugbank_id", how="left", ) print(top_degree_df)
Implementation Details
-
Access & authentication
- DrugBank data access requires a free academic account or a paid license depending on your use case.
- The
step is responsible for fetching the release artifacts; ensure you comply with DrugBank terms.drugbank-downloader
-
XML parsing approach
- DrugBank is distributed as a large XML document;
is used for robust XPath-based extraction.lxml.etree - The XML uses a namespace (commonly
); XPath queries must include the namespace mapping (e.g.,http://www.drugbank.ca
).NS = {"db": "http://www.drugbank.ca"}
- DrugBank is distributed as a large XML document;
-
Core extraction patterns
- Primary DrugBank ID:
db:drugbank-id[@primary='true'] - Drug name:
db:name - Calculated properties (e.g., SMILES):
db:calculated-properties/db:property[db:kind='SMILES']/db:value - Drug interactions:
with fieldsdb:drug-interactions/db:drug-interaction
,db:drugbank-id
,db:namedb:description
- Primary DrugBank ID:
-
Data modeling
- Use
DataFrames for normalized tables (drugs, targets, interactions, pathways).pandas - Use
for graph representations:networkx- DDI graph: nodes are drugs, edges are interactions (store
as edge attribute).description - Drug–target graph: bipartite graph with drug nodes and target nodes.
- DDI graph: nodes are drugs, edges are interactions (store
- Use
-
Performance considerations
- DrugBank XML can be large; for memory-sensitive environments, consider iterative parsing (
) and writing intermediate results to disk.etree.iterparse - Normalize identifiers early (e.g., always keep primary DrugBank IDs) to simplify joins across tables.
- DrugBank XML can be large; for memory-sensitive environments, consider iterative parsing (
-
Further references
- See:
references/data-access.md - See:
references/drug-queries.md - See:
references/interactions.md
- See: