Medical-research-skills drugbank-database

Programmatic access to DrugBank drug and target data; use when you need to download, parse, and analyze DrugBank XML for properties, interactions, pathways, and pharmacology.

install

source · Clone the upstream repo

git clone https://github.com/aipoch/medical-research-skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Evidence Insight/drugbank-database" ~/.claude/skills/aipoch-medical-research-skills-drugbank-database && rm -rf "$T"

manifest: scientific-skills/Evidence Insight/drugbank-database/SKILL.md

source content

Source: https://github.com/aipoch/medical-research-skills

When to Use

- You need to extract structured drug properties (e.g., identifiers, synonyms, ATC codes) from DrugBank XML for downstream analysis.
- You want to build and analyze drug–drug interaction (DDI) networks from DrugBank interaction records.
- You are mapping drugs to targets (proteins/genes) to support target discovery, mechanism-of-action analysis, or enrichment workflows.
- You need to connect drugs to pathways and pharmacology annotations for systems pharmacology or knowledge graph construction.
- You want to generate tabular datasets (CSV/Parquet) from DrugBank for use in notebooks, dashboards, or ML pipelines.

Key Features

- Programmatic download of DrugBank releases via `drugbank-downloader` (requires DrugBank access).
- XML parsing and traversal using `lxml` for reliable extraction of nested DrugBank entities.
- Data wrangling into `pandas` DataFrames for filtering, joining, and export.
- Network construction and analysis with `networkx` (e.g., DDI graphs, drug–target bipartite graphs).
- Optional cheminformatics support with `rdkit` for structure-based processing (e.g., SMILES/InChI handling when present).

Dependencies

- `drugbank-downloader` (version varies by your environment)
- `lxml>=4.9`
- `pandas>=2.0`
- `networkx>=3.0`
- `rdkit>=2022.09` (optional; required only for structure/chemistry workflows)
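
Before running the example, it can help to confirm which of these packages are present. The snippet below is a minimal sketch using only the standard library. It assumes the distribution names match the list above (the name your installer registers for `drugbank-downloader` or `rdkit` may differ), and the helper `installed_versions` is a hypothetical name introduced for illustration. It reports versions only and does not enforce the minimums.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the dependency list; treated as assumptions.
REQUIRED = ["drugbank-downloader", "lxml", "pandas", "networkx"]
OPTIONAL = ["rdkit"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(REQUIRED + OPTIONAL)
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")

# Only required packages count as missing; rdkit is optional.
missing = [name for name in REQUIRED if report[name] is None]
if missing:
    print("Missing required packages:", ", ".join(missing))
```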

Example Usage

```python
"""
End-to-end example:
1) Parse a local DrugBank XML file
2) Extract a minimal drug table
3) Extract drug-drug interactions
4) Build a DDI graph

Prerequisites:
- You must obtain DrugBank XML via your DrugBank account/license.
- Place the XML file at ./drugbank.xml (or update the path).
"""

from lxml import etree
import pandas as pd
import networkx as nx

DRUGBANK_XML_PATH = "./drugbank.xml"
NS = {"db": "http://www.drugbank.ca"}  # DrugBank XML namespace

# --- Parse XML ---
tree = etree.parse(DRUGBANK_XML_PATH)
root = tree.getroot()

# Select only top-level <drug> records (direct children of the root).
# "//db:drug" would also match <drug> elements nested inside other records
# (e.g., drug lists under pathways) and inflate the counts.
top_level_drugs = root.xpath("db:drug", namespaces=NS)

# --- Extract drug records (minimal fields) ---
drugs = []
for drug in top_level_drugs:
    drugbank_id = drug.xpath("string(db:drugbank-id[@primary='true'])", namespaces=NS).strip()
    name = drug.xpath("string(db:name)", namespaces=NS).strip()
    drug_type = drug.get("type", "").strip()

    # Optional: first SMILES if present
    smiles = drug.xpath(
        "string(db:calculated-properties/db:property[db:kind='SMILES']/db:value)",
        namespaces=NS,
    ).strip()

    drugs.append(
        {
            # Store None (not "") so that dropna() below removes ID-less rows.
            "drugbank_id": drugbank_id or None,
            "name": name,
            "type": drug_type,
            "smiles": smiles or None,
        }
    )

drugs_df = pd.DataFrame(drugs, columns=["drugbank_id", "name", "type", "smiles"])
drugs_df = drugs_df.dropna(subset=["drugbank_id"])
print("Drugs:", len(drugs_df))
print(drugs_df.head())

# --- Extract drug-drug interactions ---
interactions = []
for drug in top_level_drugs:
    src_id = drug.xpath("string(db:drugbank-id[@primary='true'])", namespaces=NS).strip()
    src_name = drug.xpath("string(db:name)", namespaces=NS).strip()

    for ddi in drug.xpath("db:drug-interactions/db:drug-interaction", namespaces=NS):
        tgt_id = ddi.xpath("string(db:drugbank-id)", namespaces=NS).strip()
        tgt_name = ddi.xpath("string(db:name)", namespaces=NS).strip()
        description = ddi.xpath("string(db:description)", namespaces=NS).strip()

        if src_id and tgt_id:
            interactions.append(
                {
                    "source_id": src_id,
                    "source_name": src_name,
                    "target_id": tgt_id,
                    "target_name": tgt_name,
                    "description": description or None,
                }
            )

# Explicit columns keep the frame usable even when no interactions are found.
ddi_df = pd.DataFrame(
    interactions,
    columns=["source_id", "source_name", "target_id", "target_name", "description"],
)
print("Interactions:", len(ddi_df))
print(ddi_df.head())

# --- Build a DDI graph ---
# An undirected graph stores A-B and B-A as one edge; when both directions
# are listed, the description read last is the one kept.
G = nx.from_pandas_edgelist(
    ddi_df,
    source="source_id",
    target="target_id",
    edge_attr=["description"],
    create_using=nx.Graph(),
)

print("DDI graph nodes:", G.number_of_nodes())
print("DDI graph edges:", G.number_of_edges())

# Example analysis: top 10 drugs by interaction degree
top_degree = sorted(G.degree, key=lambda x: x[1], reverse=True)[:10]
top_degree_df = pd.DataFrame(top_degree, columns=["drugbank_id", "degree"]).merge(
    drugs_df[["drugbank_id", "name"]],
    on="drugbank_id",
    how="left",
)
print(top_degree_df)
```

Implementation Details

Access & authentication
- DrugBank data access requires a free academic account or a paid license, depending on your use case.
- The `drugbank-downloader` step fetches the release artifacts; ensure you comply with the DrugBank terms of use.
XML parsing approach
- DrugBank is distributed as a large XML document; `lxml.etree` is used for robust XPath-based extraction.
- The XML uses a namespace (commonly `http://www.drugbank.ca`); XPath queries must include the namespace mapping (e.g., `NS = {"db": "http://www.drugbank.ca"}`).
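
Because the namespace is only "commonly" that URI, one option is to read it from the root element rather than hard-coding it. The following is a minimal sketch on a hand-written XML fragment (not real DrugBank content). The helper name `namespace_map` is hypothetical, and the sketch assumes the root element carries the document's namespace.

```python
from lxml import etree

# Hand-written stand-in for a DrugBank file; the record is a placeholder.
SAMPLE = b"""<drugbank xmlns="http://www.drugbank.ca">
  <drug type="biotech">
    <drugbank-id primary="true">DB90001</drugbank-id>
    <name>Drug A</name>
  </drug>
</drugbank>"""


def namespace_map(root, prefix="db"):
    """Build an XPath namespace mapping from the root element's own namespace."""
    uri = etree.QName(root).namespace
    if uri is None:
        raise ValueError("root element has no namespace; use unprefixed XPath")
    return {prefix: uri}


root = etree.fromstring(SAMPLE)
NS = namespace_map(root)

# Without the prefix, XPath looks for elements in no namespace and finds nothing.
unqualified = root.xpath("drug")
qualified = root.xpath("db:drug", namespaces=NS)
print(NS, len(unqualified), len(qualified))
```

The empty result from the unprefixed query is the usual symptom of a missing namespace mapping: the query runs without error but matches no elements.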

Core extraction patterns

- Primary DrugBank ID: `db:drugbank-id[@primary='true']`
- Drug name: `db:name`
- Calculated properties (e.g., SMILES): `db:calculated-properties/db:property[db:kind='SMILES']/db:value`
- Drug interactions: `db:drug-interactions/db:drug-interaction`, with fields `db:drugbank-id`, `db:name`, and `db:description`
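
The patterns above can be tried on a small fragment before running them against a full release. This is a sketch on hand-written XML with placeholder IDs and values; the helper `text` is a hypothetical convenience, not part of `lxml`.

```python
from lxml import etree

NS = {"db": "http://www.drugbank.ca"}

# Hand-written stand-in for one DrugBank record; all IDs and values are placeholders.
SAMPLE = b"""<drugbank xmlns="http://www.drugbank.ca">
  <drug type="small molecule">
    <drugbank-id>XX90001</drugbank-id>
    <drugbank-id primary="true">DB90001</drugbank-id>
    <name>Drug A</name>
    <calculated-properties>
      <property><kind>logP</kind><value>1.2</value></property>
      <property><kind>SMILES</kind><value>CCO</value></property>
    </calculated-properties>
    <drug-interactions>
      <drug-interaction>
        <drugbank-id>DB90002</drugbank-id>
        <name>Drug B</name>
        <description>Placeholder interaction text.</description>
      </drug-interaction>
    </drug-interactions>
  </drug>
</drugbank>"""


def text(node, path):
    """Return the stripped string value of an XPath, or None when it is empty."""
    value = node.xpath(f"string({path})", namespaces=NS).strip()
    return value or None


drug = etree.fromstring(SAMPLE).xpath("db:drug", namespaces=NS)[0]

record = {
    # The predicate skips the secondary ID that appears first in the record.
    "drugbank_id": text(drug, "db:drugbank-id[@primary='true']"),
    "name": text(drug, "db:name"),
    "smiles": text(
        drug, "db:calculated-properties/db:property[db:kind='SMILES']/db:value"
    ),
}
interactions = [
    {
        "target_id": text(ddi, "db:drugbank-id"),
        "target_name": text(ddi, "db:name"),
        "description": text(ddi, "db:description"),
    }
    for ddi in drug.xpath("db:drug-interactions/db:drug-interaction", namespaces=NS)
]
print(record)
print(interactions)
```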

Data modeling
- Use `pandas` DataFrames for normalized tables (drugs, targets, interactions, pathways).
- Use `networkx` for graph representations:
  - DDI graph: nodes are drugs, edges are interactions (store `description` as an edge attribute).
  - Drug–target graph: bipartite graph with drug nodes and target nodes.
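
The drug–target graph described above could be built along the following lines. This is a sketch, not the skill's own implementation: it assumes targets are stored under `db:targets/db:target` with `db:id` and `db:name` children (check this against your DrugBank release), and it runs on hand-written XML with placeholder IDs. The function name `build_drug_target_graph` is hypothetical.

```python
from lxml import etree
import networkx as nx

NS = {"db": "http://www.drugbank.ca"}

# Hand-written stand-in; element layout for targets is an assumption.
SAMPLE = b"""<drugbank xmlns="http://www.drugbank.ca">
  <drug>
    <drugbank-id primary="true">DB90001</drugbank-id>
    <name>Drug A</name>
    <targets>
      <target><id>BE90001</id><name>Target X</name></target>
      <target><id>BE90002</id><name>Target Y</name></target>
    </targets>
  </drug>
  <drug>
    <drugbank-id primary="true">DB90002</drugbank-id>
    <name>Drug B</name>
    <targets>
      <target><id>BE90001</id><name>Target X</name></target>
    </targets>
  </drug>
</drugbank>"""


def build_drug_target_graph(root):
    """Build an undirected graph whose edges always join a drug to a target."""
    graph = nx.Graph()
    for drug in root.xpath("db:drug", namespaces=NS):
        drug_id = drug.xpath(
            "string(db:drugbank-id[@primary='true'])", namespaces=NS
        ).strip()
        if not drug_id:
            continue
        drug_name = drug.xpath("string(db:name)", namespaces=NS).strip()
        graph.add_node(drug_id, bipartite="drug", name=drug_name)
        for target in drug.xpath("db:targets/db:target", namespaces=NS):
            target_id = target.xpath("string(db:id)", namespaces=NS).strip()
            if not target_id:
                continue
            target_name = target.xpath("string(db:name)", namespaces=NS).strip()
            graph.add_node(target_id, bipartite="target", name=target_name)
            graph.add_edge(drug_id, target_id)
    return graph


graph = build_drug_target_graph(etree.fromstring(SAMPLE))

# Degree of a target node = number of drugs annotated against it.
targets = {n for n, data in graph.nodes(data=True) if data["bipartite"] == "target"}
drugs_per_target = {t: graph.degree(t) for t in sorted(targets)}
print(drugs_per_target)
```

Tagging each node with its side (`bipartite="drug"` or `"target"`) keeps the two node sets separable later, for example when projecting onto drugs that share a target.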
Performance considerations
- DrugBank XML can be large; for memory-sensitive environments, consider iterative parsing (`etree.iterparse`) and writing intermediate results to disk.
- Normalize identifiers early (e.g., always keep primary DrugBank IDs) to simplify joins across tables.
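
A streaming variant of the extraction might look like the sketch below. It runs on a small in-memory fragment with placeholder IDs; with a real file you would pass the path instead of the `BytesIO` object. It assumes that `<drug>` elements can also appear nested inside a record (shown here under a pathway), so only direct children of the root are treated as records. The generator name `iter_drugs` is hypothetical.

```python
import io

from lxml import etree

NS_URI = "http://www.drugbank.ca"
NS = {"db": NS_URI}
DRUG_TAG = f"{{{NS_URI}}}drug"  # Clark notation, as required by iterparse's tag filter

# Hand-written stand-in; the nested <drug> mimics a drug list inside a pathway.
SAMPLE = b"""<drugbank xmlns="http://www.drugbank.ca">
  <drug>
    <drugbank-id primary="true">DB90001</drugbank-id>
    <name>Drug A</name>
    <pathways><pathway><drugs>
      <drug><drugbank-id>DB90002</drugbank-id><name>Drug B</name></drug>
    </drugs></pathway></pathways>
  </drug>
  <drug>
    <drugbank-id primary="true">DB90002</drugbank-id>
    <name>Drug B</name>
  </drug>
</drugbank>"""


def iter_drugs(source):
    """Yield one dict per top-level drug, freeing each element after use."""
    for _event, elem in etree.iterparse(source, events=("end",), tag=DRUG_TAG):
        parent = elem.getparent()
        # Skip nested <drug> elements: a record's parent is the document root.
        if parent is None or parent.getparent() is not None:
            continue
        yield {
            "drugbank_id": elem.xpath(
                "string(db:drugbank-id[@primary='true'])", namespaces=NS
            ).strip()
            or None,
            "name": elem.xpath("string(db:name)", namespaces=NS).strip(),
        }
        # Release memory: empty this record and drop already-processed siblings.
        elem.clear()
        while elem.getprevious() is not None:
            del parent[0]


rows = list(iter_drugs(io.BytesIO(SAMPLE)))
print(rows)
```

Each yielded row can be appended to a CSV or batched into a DataFrame, so the full tree never has to be held in memory at once.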

Further references

See:
- `references/data-access.md`
- `references/drug-queries.md`
- `references/interactions.md`