AlterLab-Academic-Skills alterlab-chembl
Query ChEMBL bioactive molecules and drug discovery data. Search compounds by structure or properties, retrieve bioactivity data (IC50, Ki), find inhibitors, and perform SAR studies for medicinal chemistry. Part of the AlterLab Academic Skills suite.
git clone https://github.com/AlterLab-IEU/AlterLab-Academic-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/AlterLab-IEU/AlterLab-Academic-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/databases/alterlab-chembl" ~/.claude/skills/alterlab-ieu-alterlab-academic-skills-alterlab-chembl && rm -rf "$T"
ChEMBL Database
Overview
ChEMBL is a manually curated database of bioactive molecules maintained by the European Bioinformatics Institute (EBI), containing over 2 million compounds, 19 million bioactivity measurements, 13,000+ drug targets, and data on approved drugs and clinical candidates. Access and query this data programmatically using the ChEMBL Python client for drug discovery and medicinal chemistry research.
When to Use This Skill
Use this skill for tasks such as:
- Compound searches: Finding molecules by name, structure, or properties
- Target information: Retrieving data about proteins, enzymes, or biological targets
- Bioactivity data: Querying IC50, Ki, EC50, or other activity measurements
- Drug information: Looking up approved drugs, mechanisms, or indications
- Structure searches: Performing similarity or substructure searches
- Cheminformatics: Analyzing molecular properties and drug-likeness
- Target-ligand relationships: Exploring compound-target interactions
- Drug discovery: Identifying inhibitors, agonists, or bioactive molecules
Installation and Setup
Python Client
The ChEMBL Python client is required for programmatic access:
uv pip install chembl_webresource_client
Basic Usage Pattern
    from chembl_webresource_client.new_client import new_client

    # Access different endpoints
    molecule = new_client.molecule
    target = new_client.target
    activity = new_client.activity
    drug = new_client.drug
Core Capabilities
1. Molecule Queries
Retrieve by ChEMBL ID:
    molecule = new_client.molecule
    aspirin = molecule.get('CHEMBL25')
Search by name:
results = molecule.filter(pref_name__icontains='aspirin')
Filter by properties:
    # Find small molecules (MW <= 500) with favorable LogP
    results = molecule.filter(
        molecule_properties__mw_freebase__lte=500,
        molecule_properties__alogp__lte=5
    )
2. Target Queries
Retrieve target information:
    target = new_client.target
    egfr = target.get('CHEMBL203')
Search for specific target types:
    # Find all kinase targets
    kinases = target.filter(
        target_type='SINGLE PROTEIN',
        pref_name__icontains='kinase'
    )
3. Bioactivity Data
Query activities for a target:
    activity = new_client.activity

    # Find potent EGFR inhibitors
    results = activity.filter(
        target_chembl_id='CHEMBL203',
        standard_type='IC50',
        standard_value__lte=100,
        standard_units='nM'
    )
Get all activities for a compound:
    compound_activities = activity.filter(
        molecule_chembl_id='CHEMBL25',
        pchembl_value__isnull=False
    )
4. Structure-Based Searches
Similarity search:
    similarity = new_client.similarity

    # Find compounds similar to aspirin
    similar = similarity.filter(
        smiles='CC(=O)Oc1ccccc1C(=O)O',
        similarity=85  # 85% similarity threshold
    )
Substructure search:
    substructure = new_client.substructure

    # Find compounds containing benzene ring
    results = substructure.filter(smiles='c1ccccc1')
5. Drug Information
Retrieve drug data:
    drug = new_client.drug
    drug_info = drug.get('CHEMBL25')
Get mechanisms of action:
    mechanism = new_client.mechanism
    mechanisms = mechanism.filter(molecule_chembl_id='CHEMBL25')
Query drug indications:
    drug_indication = new_client.drug_indication
    indications = drug_indication.filter(molecule_chembl_id='CHEMBL25')
Query Workflow
Workflow 1: Finding Inhibitors for a Target
- Identify the target by searching by name:
    targets = new_client.target.filter(pref_name__icontains='EGFR')
    target_id = targets[0]['target_chembl_id']
- Query bioactivity data for that target:
    activities = new_client.activity.filter(
        target_chembl_id=target_id,
        standard_type='IC50',
        standard_value__lte=100
    )
- Extract compound IDs and retrieve details:
    compound_ids = [act['molecule_chembl_id'] for act in activities]
    compounds = [new_client.molecule.get(cid) for cid in compound_ids]
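Several activity records often point at the same molecule, so the final step above can end up requesting one compound many times. A minimal sketch of de-duplicating the IDs and fetching them in batches with the `__in` operator follows. The names `unique_in_order`, `chunked`, `fetch_molecules`, and `batch_size` are illustrative, and the code assumes the endpoint passed in behaves like `new_client.molecule`.

```python
def unique_in_order(ids):
    """Drop repeated IDs while keeping first-seen order."""
    return list(dict.fromkeys(ids))


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_molecules(molecule_endpoint, compound_ids, batch_size=50):
    """Fetch each distinct compound once, issuing one query per batch.

    `molecule_endpoint` is assumed to behave like new_client.molecule,
    i.e. to expose filter(molecule_chembl_id__in=[...]).
    """
    molecules = []
    for batch in chunked(unique_in_order(compound_ids), batch_size):
        molecules.extend(molecule_endpoint.filter(molecule_chembl_id__in=batch))
    return molecules
```

With the real client this might be called as `fetch_molecules(new_client.molecule, compound_ids)`; the batch size of 50 is an arbitrary starting point, not a documented limit.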
Workflow 2: Analyzing a Known Drug
- Get drug information:
    drug_info = new_client.drug.get('CHEMBL1234')
- Retrieve mechanisms:
    mechanisms = new_client.mechanism.filter(molecule_chembl_id='CHEMBL1234')
- Find all bioactivities:
    activities = new_client.activity.filter(molecule_chembl_id='CHEMBL1234')
Workflow 3: Structure-Activity Relationship (SAR) Study
- Find similar compounds:
    similar = new_client.similarity.filter(smiles='query_smiles', similarity=80)
- Get activities for each compound:
    for compound in similar:
        activities = new_client.activity.filter(
            molecule_chembl_id=compound['molecule_chembl_id']
        )
- Analyze property-activity relationships using molecular properties from results.
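For the final analysis step, one simple way to look at a property-activity relationship is the correlation between a molecular property and potency. The sketch below computes a Pearson coefficient by hand using only the standard library. The `rows` values are made-up placeholders rather than ChEMBL data, and pairing `alogp` with `pchembl_value` is only one possible choice.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder (alogp, pchembl_value) pairs, NOT real ChEMBL records.
rows = [(1.2, 5.1), (2.0, 5.9), (2.9, 6.4), (3.5, 7.2)]
alogp = [float(a) for a, _ in rows]
potency = [float(p) for _, p in rows]
print(round(pearson(alogp, potency), 3))
```

A coefficient near +1 or -1 on a handful of analogs is a prompt for closer inspection, not evidence of a causal trend.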
Filter Operators
ChEMBL supports Django-style query filters:
- __exact: Exact match
- __iexact: Case-insensitive exact match
- __contains / __icontains: Substring matching
- __startswith / __endswith: Prefix/suffix matching
- __gt, __gte, __lt, __lte: Numeric comparisons
- __range: Value in range
- __in: Value in list
- __isnull: Null/not null check
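To make the suffix semantics concrete, here is a rough local emulation of a few operators applied to plain dictionaries. It is only an illustration: the real client sends these lookups to the ChEMBL web service rather than evaluating them in Python, and `OPERATORS`, `matches`, and the sample records are hypothetical.

```python
# Local stand-ins for a handful of lookup suffixes (illustrative only).
OPERATORS = {
    "exact": lambda value, arg: value == arg,
    "iexact": lambda value, arg: str(value).lower() == str(arg).lower(),
    "icontains": lambda value, arg: str(arg).lower() in str(value).lower(),
    "startswith": lambda value, arg: str(value).startswith(str(arg)),
    "gte": lambda value, arg: value >= arg,
    "lte": lambda value, arg: value <= arg,
    "range": lambda value, arg: arg[0] <= value <= arg[1],
    "in": lambda value, arg: value in arg,
    "isnull": lambda value, arg: (value is None) == arg,
}


def matches(record, **lookups):
    """Return True if a flat dict satisfies every field__operator lookup."""
    for key, arg in lookups.items():
        field, sep, op = key.rpartition("__")
        if not sep or op not in OPERATORS:
            field, op = key, "exact"  # a bare field name means exact match
        value = record.get(field)
        if value is None and op != "isnull":
            return False  # missing values never satisfy a comparison
        if not OPERATORS[op](value, arg):
            return False
    return True


# Hypothetical records shaped loosely like activity results.
records = [
    {"standard_type": "IC50", "standard_value": 12.0, "pchembl_value": 7.92},
    {"standard_type": "Ki", "standard_value": 450.0, "pchembl_value": None},
]
potent = [r for r in records
          if matches(r, standard_type="IC50", standard_value__lte=100)]
```

Multiple keyword arguments combine with a logical AND here, which mirrors how the filters in the examples above are stacked.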
Data Export and Analysis
Convert results to pandas DataFrame for analysis:
    import pandas as pd

    activities = new_client.activity.filter(target_chembl_id='CHEMBL203')
    df = pd.DataFrame(list(activities))

    # Analyze results
    print(df['standard_value'].describe())
    print(df.groupby('standard_type').size())
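Numeric fields such as `standard_value` may arrive as strings in the returned records (an assumption worth checking against your own results), in which case `describe()` reports counts and unique values instead of numeric statistics. A minimal sketch of coercing the column first is shown below; the records are placeholders standing in for `list(activities)`, not real ChEMBL data.

```python
import pandas as pd

# Placeholder records standing in for list(activities); not real ChEMBL data.
records = [
    {"standard_type": "IC50", "standard_value": "12.0"},
    {"standard_type": "IC50", "standard_value": "85.5"},
    {"standard_type": "Ki", "standard_value": None},
]
df = pd.DataFrame(records)

# Coerce to float; unparseable or missing entries become NaN.
df["standard_value"] = pd.to_numeric(df["standard_value"], errors="coerce")

print(df["standard_value"].describe())
print(df.groupby("standard_type")["standard_value"].median())
```

Using `errors="coerce"` keeps the row count intact, so missing measurements stay visible as NaN instead of silently disappearing.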
Performance Optimization
Caching
The client automatically caches results for 24 hours. Configure caching:
    from chembl_webresource_client.settings import Settings

    # Disable caching
    Settings.Instance().CACHING = False

    # Adjust cache expiration (seconds)
    Settings.Instance().CACHE_EXPIRE = 86400
Lazy Evaluation
Queries execute only when data is accessed. Convert to list to force execution:
    # Query is not executed yet
    results = molecule.filter(pref_name__icontains='aspirin')

    # Force execution
    results_list = list(results)
Pagination
Results are paginated automatically. Iterate through all results:
    for activity in new_client.activity.filter(target_chembl_id='CHEMBL203'):
        # Process each activity
        print(activity['molecule_chembl_id'])
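Because results are evaluated lazily and fetched page by page, iterating over a broad query can trigger many requests. A small sketch for peeking at the first few records uses `itertools.islice`, which works on any iterable. Here `take` is an illustrative helper and `fake_query` merely stands in for a client query.

```python
from itertools import islice


def take(iterable, n):
    """Return at most the first n items without consuming the rest."""
    return list(islice(iterable, n))


def fake_query():
    """Stand-in for a lazy result set; yields placeholder records."""
    for i in range(1_000_000):
        yield {"molecule_chembl_id": f"PLACEHOLDER{i}"}


preview = take(fake_query(), 3)
print([rec["molecule_chembl_id"] for rec in preview])
```

With the real client this might look like `take(new_client.activity.filter(target_chembl_id='CHEMBL203'), 3)`; whether the client stops requesting further pages once iteration ends is an assumption to verify.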
Common Use Cases
Find Kinase Inhibitors
    # Identify kinase targets
    kinases = new_client.target.filter(
        target_type='SINGLE PROTEIN',
        pref_name__icontains='kinase'
    )

    # Get potent inhibitors
    for kinase in kinases[:5]:  # First 5 kinases
        activities = new_client.activity.filter(
            target_chembl_id=kinase['target_chembl_id'],
            standard_type='IC50',
            standard_value__lte=50
        )
Explore Drug Repurposing
    # Get approved drugs
    drugs = new_client.drug.filter()

    # For each drug, find all targets
    for drug in drugs[:10]:
        mechanisms = new_client.mechanism.filter(
            molecule_chembl_id=drug['molecule_chembl_id']
        )
Virtual Screening
    # Find compounds with desired properties
    candidates = new_client.molecule.filter(
        molecule_properties__mw_freebase__range=[300, 500],
        molecule_properties__alogp__lte=5,
        molecule_properties__hba__lte=10,
        molecule_properties__hbd__lte=5
    )
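The upper bounds in this filter match the thresholds commonly known as Lipinski's rule of five. When molecules have already been retrieved, the same check can be applied locally, as in the sketch below. `lipinski_violations` is an illustrative helper, the property names are the ones used in the filter above, and the example values are placeholders.

```python
def lipinski_violations(props):
    """Count rule-of-five violations in a molecule_properties-style dict.

    Values are passed through float() because they may arrive as strings;
    a missing property is skipped rather than counted as a violation.
    """
    limits = {"mw_freebase": 500.0, "alogp": 5.0, "hba": 10.0, "hbd": 5.0}
    violations = 0
    for name, limit in limits.items():
        raw = props.get(name)
        if raw is None:
            continue
        if float(raw) > limit:
            violations += 1
    return violations


# Placeholder property values for illustration, not taken from ChEMBL.
example = {"mw_freebase": "612.7", "alogp": "4.1", "hba": 11, "hbd": 2}
print(lipinski_violations(example))
```

Counting violations instead of returning a boolean makes it easy to allow one violation, which is a common relaxation in screening.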
Resources
scripts/example_queries.py
Ready-to-use Python functions demonstrating common ChEMBL query patterns:
- get_molecule_info(): Retrieve molecule details by ID
- search_molecules_by_name(): Name-based molecule search
- find_molecules_by_properties(): Property-based filtering
- get_bioactivity_data(): Query bioactivities for targets
- find_similar_compounds(): Similarity searching
- substructure_search(): Substructure matching
- get_drug_info(): Retrieve drug information
- find_kinase_inhibitors(): Specialized kinase inhibitor search
- export_to_dataframe(): Convert results to pandas DataFrame
Consult this script for implementation details and usage examples.
references/api_reference.md
Comprehensive API documentation including:
- Complete endpoint listing (molecule, target, activity, assay, drug, etc.)
- All filter operators and query patterns
- Molecular properties and bioactivity fields
- Advanced query examples
- Configuration and performance tuning
- Error handling and rate limiting
Refer to this document when detailed API information is needed or when troubleshooting queries.
Important Notes
Data Reliability
- ChEMBL data is manually curated but may contain inconsistencies
- Always check the data_validity_comment field in activity records
- Be aware of potential_duplicate flags
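These two checks can be sketched as a small filter over activity records. The helper name `is_clean` and the sample records are hypothetical, and because the exact type of the duplicate flag is an assumption here, the code accepts several spellings of a true value.

```python
def is_clean(record):
    """Keep records with no validity comment and no duplicate flag."""
    has_comment = bool(record.get("data_validity_comment"))
    # The flag's type is an assumption, so accept 1, True, "1" or "true".
    flag = str(record.get("potential_duplicate")).strip().lower()
    is_duplicate = flag in {"1", "true"}
    return not has_comment and not is_duplicate


# Placeholder records for illustration only.
records = [
    {"molecule_chembl_id": "PLACEHOLDER1", "data_validity_comment": None,
     "potential_duplicate": 0},
    {"molecule_chembl_id": "PLACEHOLDER2",
     "data_validity_comment": "Outside typical range",
     "potential_duplicate": 0},
    {"molecule_chembl_id": "PLACEHOLDER3", "data_validity_comment": None,
     "potential_duplicate": 1},
]
clean = [r for r in records if is_clean(r)]
```

Dropping flagged records is the strictest option; inspecting them before discarding may be preferable when data for a target is sparse.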
Units and Standards
- Bioactivity values use standard units (nM, uM, etc.)
- pchembl_value provides normalized activity (-log scale)
- Check standard_type to understand measurement type (IC50, Ki, EC50, etc.)
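As a worked example of the -log scale: a concentration c in molar units maps to -log10(c), so a value given in nM maps to 9 - log10(value). The sketch below reproduces only that arithmetic; `nm_to_p` is an illustrative helper, and it does not replicate whatever rules ChEMBL applies when deciding which records receive a pchembl_value.

```python
import math


def nm_to_p(value_nm):
    """Convert a concentration in nM to its negative log10 molar value."""
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    return 9.0 - math.log10(value_nm)


for nm in (1, 100, 10_000):
    print(nm, "nM ->", round(nm_to_p(nm), 2))
```

On this scale 1 nM corresponds to 9, 100 nM to 7, and 10 uM to 5, so larger numbers mean more potent compounds.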
Rate Limiting
- Respect ChEMBL's fair usage policies
- Use caching to minimize repeated requests
- Consider bulk downloads for large datasets
- Avoid hammering the API with rapid consecutive requests
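One way to avoid rapid consecutive requests after a failure is to retry with exponentially growing pauses. The sketch below is a generic pattern and not part of the ChEMBL client: `call_with_backoff` and its parameters are illustrative names, and it catches `Exception` broadly only because the specific errors the client raises are not covered here.

```python
import time


def call_with_backoff(fn, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on any exception with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once `retries` retries have been used up.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))
```

A call might look like `call_with_backoff(lambda: list(new_client.activity.filter(target_chembl_id='CHEMBL203')))`. The `sleep` parameter exists so the delay can be replaced in tests; in real use, narrowing the caught exception type is advisable.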
Chemical Structure Formats
- SMILES strings are the primary structure format
- InChI keys available for compounds
- SVG images can be generated via the image endpoint
Additional Resources
- ChEMBL website: https://www.ebi.ac.uk/chembl/
- API documentation: https://www.ebi.ac.uk/chembl/api/data/docs
- Python client GitHub: https://github.com/chembl/chembl_webresource_client
- Interface documentation: https://chembl.gitbook.io/chembl-interface-documentation/
- Example notebooks: https://github.com/chembl/notebooks