SciAgent-Skills deepchem

Deep learning framework for drug discovery and materials science. 60+ models (GCN, GAT, AttentiveFP, MPNN, DMPNN, ChemBERTa, GROVER), 50+ molecular featurizers, MoleculeNet benchmarks, hyperparameter optimization, transfer learning. Unified load-featurize-split-train-evaluate API. For fingerprint-only cheminformatics use rdkit-cheminformatics; for featurization hub without training use molfeat-molecular-featurization.

install
source · Clone the upstream repo
git clone https://github.com/jaechang-hits/SciAgent-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/structural-biology-drug-discovery/deepchem" ~/.claude/skills/jaechang-hits-sciagent-skills-deepchem && rm -rf "$T"
manifest: skills/structural-biology-drug-discovery/deepchem/SKILL.md
source content

DeepChem — Deep Learning for Drug Discovery

Overview

DeepChem is an open-source Python framework providing a unified API for molecular machine learning across drug discovery, materials science, and quantum chemistry. It wraps 60+ model architectures (graph neural networks, transformers, classical ML) with 50+ molecular featurizers and standardized datasets (MoleculeNet), enabling end-to-end workflows from SMILES strings to trained predictive models.

When to Use

  • Predicting molecular properties (solubility, toxicity, binding affinity) from SMILES
  • Benchmarking models on MoleculeNet standardized datasets (BBBP, Tox21, ESOL, FreeSolv, etc.)
  • Training graph neural networks on molecular graphs (GCN, GAT, AttentiveFP, MPNN, DMPNN)
  • Fine-tuning pretrained chemical language models (ChemBERTa, GROVER, MolFormer)
  • Running hyperparameter optimization for molecular ML models
  • Virtual screening and hit prioritization with trained models
  • Materials property prediction from crystal structures (CGCNN, MEGNet)
  • Protein-ligand interaction modeling and binding affinity prediction
  • For fingerprint-based cheminformatics without deep learning, use rdkit-cheminformatics instead
  • For featurization only (no model training), use molfeat-molecular-featurization instead

Prerequisites

  • Python packages: deepchem (core), plus torch or tensorflow for backend-dependent models
  • GPU: Recommended for graph neural networks and transformer models; CPU sufficient for classical ML and fingerprint models
  • Data: SMILES strings with property labels (CSV), or MoleculeNet datasets (auto-downloaded)
# Core installation (includes RDKit, scikit-learn, XGBoost)
pip install deepchem

# With PyTorch backend (GNN models)
pip install deepchem[torch]

# With TensorFlow backend (legacy models)
pip install deepchem[tensorflow]

# Full installation (all backends + extras)
pip install deepchem[all]

Quick Start

import deepchem as dc

# Load MoleculeNet dataset with featurization + scaffold split
tasks, datasets, transformers = dc.molnet.load_delaney(featurizer="ECFP")
train, valid, test = datasets

# Train and evaluate a multitask regressor
model = dc.models.MultitaskRegressor(n_tasks=1, n_features=1024, dropouts=0.2)
model.fit(train, nb_epoch=50)
metric = dc.metrics.Metric(dc.metrics.pearson_r2_score)
print(f"Test R2: {model.evaluate(test, [metric])}")  # {'pearson_r2_score': ~0.7}

Core API

Module 1: Data Loading and Processing

Load molecular data from CSV files or MoleculeNet benchmark datasets.

import deepchem as dc

# Load from CSV (SMILES + property columns)
loader = dc.data.CSVLoader(
    tasks=["measured_log_solubility"],
    feature_field="smiles",
    featurizer=dc.feat.CircularFingerprint(size=2048, radius=3)
)
dataset = loader.create_dataset("solubility_data.csv")
print(f"Samples: {dataset.X.shape[0]}, Features: {dataset.X.shape[1]}")
# Samples: 1128, Features: 2048

# Load from SDF (3D structures)
sdf_loader = dc.data.SDFLoader(
    tasks=["activity"],
    featurizer=dc.feat.CoulombMatrix(max_atoms=50)
)
dataset_3d = sdf_loader.create_dataset("molecules.sdf")

# Load MoleculeNet benchmark datasets (auto-download + featurize + split)
# Available: load_delaney, load_bbbp, load_tox21, load_hiv, load_qm7, load_qm9, etc.
tasks, datasets, transformers = dc.molnet.load_tox21(featurizer="ECFP", splitter="scaffold")
train, valid, test = datasets
print(f"Tasks: {len(tasks)}, Train: {len(train)}, Test: {len(test)}")
# Tasks: 12, Train: ~6264, Test: ~631

# Inverse-transform predictions back to original scale (given a trained model; see Module 3)
y_pred = model.predict(test)
y_original = transformers[0].untransform(y_pred)

Module 2: Molecular Featurization

Convert molecules to numerical representations for ML. DeepChem provides 50+ featurizers spanning fingerprints, descriptors, graph features, and Coulomb matrices.

import deepchem as dc

smiles = ["CCO", "CC(=O)O", "c1ccccc1", "CC(C)O"]

# Fingerprints (most common for classical ML)
ecfp = dc.feat.CircularFingerprint(size=2048, radius=3)
fp_features = ecfp.featurize(smiles)
print(f"ECFP shape: {fp_features.shape}")  # (4, 2048)

# RDKit descriptors (interpretable physicochemical properties)
rdkit_desc = dc.feat.RDKitDescriptors()
desc_features = rdkit_desc.featurize(smiles)
print(f"Descriptor shape: {desc_features.shape}")  # (4, 208)

# Graph features (for GNN models — returns ConvMol objects)
graph_feat = dc.feat.ConvMolFeaturizer()
graphs = graph_feat.featurize(smiles)
print(f"Atoms in first mol: {graphs[0].get_num_atoms()}")  # 3

# Mol2Vec embeddings (pretrained word2vec on molecular substructures)
mol2vec = dc.feat.Mol2VecFingerprint()
embeddings = mol2vec.featurize(smiles)
print(f"Mol2Vec shape: {embeddings.shape}")  # (4, 300)

Module 3: Model Training and Evaluation

DeepChem provides MultitaskRegressor and MultitaskClassifier as general-purpose models, plus specialized architectures for graph and sequence data.

import deepchem as dc
import numpy as np

# Load dataset
tasks, datasets, transformers = dc.molnet.load_delaney(featurizer="ECFP")
train, valid, test = datasets

# Regression model (fingerprint input)
model = dc.models.MultitaskRegressor(
    n_tasks=1,
    n_features=1024,
    layer_sizes=[1000, 500],
    dropouts=0.25,
    learning_rate=0.001,
    batch_size=64,
)
model.fit(train, nb_epoch=100)

# Evaluate with multiple metrics
metrics = [
    dc.metrics.Metric(dc.metrics.pearson_r2_score),
    dc.metrics.Metric(dc.metrics.mean_absolute_error),
    dc.metrics.Metric(dc.metrics.rms_score),
]
results = model.evaluate(test, metrics)
print(f"R2: {results['pearson_r2_score']:.3f}, MAE: {results['mean_absolute_error']:.3f}")

# Classification model (e.g., Tox21 toxicity prediction)
tasks, datasets, transformers = dc.molnet.load_tox21(featurizer="ECFP")
train, valid, test = datasets

clf = dc.models.MultitaskClassifier(
    n_tasks=len(tasks),
    n_features=1024,
    layer_sizes=[1000, 500],
    dropouts=0.5,
    learning_rate=0.001,
)
clf.fit(train, nb_epoch=50)
roc_metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean)
print(f"Mean ROC-AUC: {clf.evaluate(test, [roc_metric])}")

Module 4: Graph Neural Networks

GNNs operate directly on molecular graphs (atoms as nodes, bonds as edges), avoiding information loss from fixed fingerprints.

import deepchem as dc

# Load with graph featurizer
tasks, datasets, transformers = dc.molnet.load_delaney(featurizer="GraphConv")
train, valid, test = datasets

# Graph Convolutional Network (Duvenaud et al.)
gcn_model = dc.models.GraphConvModel(
    n_tasks=1,
    mode="regression",
    graph_conv_layers=[64, 64],
    dense_layer_size=256,
    dropout=0.2,
    learning_rate=0.001,
    batch_size=64,
)
gcn_model.fit(train, nb_epoch=100)
metric = dc.metrics.Metric(dc.metrics.pearson_r2_score)
print(f"GCN R2: {gcn_model.evaluate(test, [metric])}")

# AttentiveFP (Xiong et al.) — attention-based GNN, strong on molecular properties
tasks, datasets, transformers = dc.molnet.load_delaney(
    featurizer=dc.feat.MolGraphConvFeaturizer(use_edges=True)
)
train, valid, test = datasets

attfp_model = dc.models.AttentiveFPModel(
    n_tasks=1,
    mode="regression",
    num_layers=2,
    graph_feat_size=200,
    num_timesteps=2,
    dropout=0.2,
    learning_rate=0.001,
    batch_size=64,
)
attfp_model.fit(train, nb_epoch=100)
print(f"AttentiveFP R2: {attfp_model.evaluate(test, [metric])}")

Module 5: Transfer Learning

Fine-tune pretrained chemical language models for downstream tasks with limited data.

import deepchem as dc
# Note: the ChemBERTa wrapper's class name and the SmilesTokenizer constructor
# arguments vary across DeepChem releases (recent versions name the class
# Chemberta); check your installed version's docs before running.
from deepchem.models.torch_models import ChemBERTaModel

# ChemBERTa — SMILES-based transformer (pretrained on 77M molecules)
tasks, datasets, transformers = dc.molnet.load_bbbp(featurizer=dc.feat.SmilesTokenizer())
train, valid, test = datasets

chemberta = ChemBERTaModel(
    task="classification",
    n_tasks=1,
    model_dir="chemberta_finetuned/",
)
# Fine-tune on downstream task (BBB permeability)
chemberta.fit(train, nb_epoch=10)
metric = dc.metrics.Metric(dc.metrics.roc_auc_score)
print(f"ChemBERTa ROC-AUC: {chemberta.evaluate(test, [metric])}")

Module 6: Predictions on New Molecules

Run inference on new molecules with a trained model.

import deepchem as dc
import numpy as np

# Assume trained model from Module 3
# Featurize new molecules using same featurizer
featurizer = dc.feat.CircularFingerprint(size=1024, radius=2)
new_smiles = ["c1cc(O)ccc1", "CC(=O)Nc1ccc(O)cc1", "OC(=O)c1ccccc1"]
new_features = featurizer.featurize(new_smiles)
new_dataset = dc.data.NumpyDataset(X=new_features)

predictions = model.predict(new_dataset)
for smi, pred in zip(new_smiles, predictions):
    print(f"{smi}: {pred[0]:.2f}")

# Ensemble predictions from multiple models for robustness
models = [model1, model2, model3]  # trained models
all_preds = np.array([m.predict(new_dataset) for m in models])
ensemble_mean = all_preds.mean(axis=0)
ensemble_std = all_preds.std(axis=0)
print(f"Ensemble prediction: {ensemble_mean[0][0]:.2f} +/- {ensemble_std[0][0]:.2f}")

Key Concepts

Unified API Pattern

All DeepChem workflows follow a consistent 5-step pattern:

Load Data → Featurize → Split → Train → Evaluate
  • Load: CSVLoader, SDFLoader, or dc.molnet.load_*() (auto-loads MoleculeNet datasets)
  • Featurize: Pass a featurizer to the loader, or call featurizer.featurize(smiles) directly
  • Split: ScaffoldSplitter (recommended for drug discovery), RandomSplitter, ButinaSplitter
  • Train: model.fit(train_dataset, nb_epoch=N)
  • Evaluate: model.evaluate(test_dataset, metrics_list)

Model Selection Guide

| Data Type | Model | Key Feature | Use When |
|---|---|---|---|
| SMILES + fingerprints | MultitaskRegressor | Fast, baseline | First attempt, small datasets |
| SMILES + fingerprints | MultitaskClassifier | Multi-label | Multi-task classification (Tox21) |
| Molecular graphs | GraphConvModel | Learned fingerprints | Medium datasets, general properties |
| Molecular graphs | GATModel | Attention mechanism | When atom importance matters |
| Molecular graphs | AttentiveFPModel | Graph + timestep attention | State-of-the-art molecular properties |
| Molecular graphs | MPNNModel | Message passing | Complex molecular interactions |
| Molecular graphs | DMPNNModel | Directed MPNN | Bond-level predictions |
| SMILES strings | ChemBERTaModel | Pretrained transformer | Low-data regime, transfer learning |
| SMILES strings | GROVERModel | Graph + transformer | Rich molecular representations |
| Crystal structures | CGCNNModel | Crystal graph CNN | Materials property prediction |
| Crystal structures | MEGNetModel | Graph networks | Materials and molecules |
| Protein sequences | ProteinLigandComplexModel | Complex modeling | Binding affinity prediction |
| Tabular features | XGBoostModel, RandomForestModel | Classical ML | Interpretability, baselines |

Featurizer Selection Guide

| Featurizer | Class | Output | Best For |
|---|---|---|---|
| ECFP/Morgan | CircularFingerprint | Binary vector (1024-2048) | General QSAR, fast baselines |
| MACCS Keys | MACCSKeysFingerprint | 167-bit vector | Substructure filtering |
| RDKit 2D | RDKitDescriptors | 200+ descriptors | Interpretable models |
| Mol2Vec | Mol2VecFingerprint | 300-dim embedding | Similarity, clustering |
| ConvMol | ConvMolFeaturizer | Graph features | GraphConvModel input |
| MolGraph | MolGraphConvFeaturizer | Node + edge features | AttentiveFPModel, MPNNModel |
| Weave | WeaveFeaturizer | Pair features | WeaveModel input |
| Coulomb Matrix | CoulombMatrix | Atom-pair distances | QM property prediction |
| SMILES tokens | SmilesTokenizer | Token IDs | ChemBERTa, transformer models |

Data Splitting Strategies

| Splitter | Use Case | Why |
|---|---|---|
| ScaffoldSplitter | Drug discovery (default) | Tests generalization to new chemotypes |
| RandomSplitter | Quick experiments | Baseline, but overestimates performance |
| ButinaSplitter | Diversity-based | Clusters by Tanimoto similarity |
| FingerprintSplitter | Chemical similarity | Groups structurally similar molecules |
| MaxMinSplitter | Maximum diversity test | Extreme generalization test |
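Why scaffold splitting is stricter than random splitting can be made concrete with a minimal, framework-agnostic sketch: whole scaffold groups are assigned to one side, so no test molecule shares a scaffold with training data. The scaffold_of mapping below is supplied by hand for illustration; DeepChem's ScaffoldSplitter derives real Bemis-Murcko scaffolds via RDKit.

```python
from collections import defaultdict

def scaffold_split(smiles_list, scaffold_of, train_frac=0.8):
    """Assign whole scaffold groups to train or test, never splitting a group.

    scaffold_of: mapping from SMILES to a scaffold key (in DeepChem this is
    the Bemis-Murcko scaffold computed with RDKit; here it is supplied by
    the caller so the sketch stays dependency-free).
    """
    groups = defaultdict(list)
    for smi in smiles_list:
        groups[scaffold_of[smi]].append(smi)
    # Largest scaffold groups are placed first, mimicking DeepChem's ordering
    ordered = sorted(groups.values(), key=len, reverse=True)
    train, test = [], []
    cutoff = train_frac * len(smiles_list)
    for group in ordered:
        (train if len(train) + len(group) <= cutoff else test).extend(group)
    return train, test

# Toy example with two scaffold families: no scaffold appears in both sets
scaffold_of = {"CCO": "A", "CCCO": "A", "c1ccccc1": "B",
               "Cc1ccccc1": "B", "CC(C)O": "A"}
train, test = scaffold_split(list(scaffold_of), scaffold_of, train_frac=0.6)
```

With a random split, molecules from family A would typically land on both sides, leaking structural information into the test set.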

Common Workflows

Workflow 1: QSAR from CSV Data

Goal: Build a property prediction model from a CSV file with SMILES and activity columns.

import deepchem as dc
import pandas as pd

# Step 1: Load and featurize CSV data
loader = dc.data.CSVLoader(
    tasks=["pIC50"],
    feature_field="smiles",
    featurizer=dc.feat.CircularFingerprint(size=2048, radius=3),
)
dataset = loader.create_dataset("bioactivity_data.csv")

# Step 2: Normalize targets
transformer = dc.trans.NormalizationTransformer(
    transform_y=True, dataset=dataset
)
dataset = transformer.transform(dataset)

# Step 3: Scaffold split (realistic for drug discovery)
splitter = dc.splits.ScaffoldSplitter()
train, valid, test = splitter.train_valid_test_split(dataset)
print(f"Train: {len(train)}, Valid: {len(valid)}, Test: {len(test)}")

# Step 4: Train model
model = dc.models.MultitaskRegressor(
    n_tasks=1, n_features=2048,
    layer_sizes=[1000, 500], dropouts=0.25,
    learning_rate=0.001, batch_size=64,
)
model.fit(train, nb_epoch=100)

# Step 5: Evaluate
metrics = [
    dc.metrics.Metric(dc.metrics.pearson_r2_score),
    dc.metrics.Metric(dc.metrics.mean_absolute_error),
]
results = model.evaluate(test, metrics)
print(f"R2: {results['pearson_r2_score']:.3f}, MAE: {results['mean_absolute_error']:.3f}")

Workflow 2: MoleculeNet Benchmark Comparison

Goal: Compare multiple models on a MoleculeNet benchmark dataset.

import deepchem as dc

# Load dataset with graph featurizer (supports both fingerprint and GNN models)
tasks, datasets, transformers = dc.molnet.load_bbbp(
    featurizer="GraphConv", splitter="scaffold"
)
train, valid, test = datasets
metric = dc.metrics.Metric(dc.metrics.roc_auc_score)

# Model 1: Graph Convolutional Network
gcn = dc.models.GraphConvModel(n_tasks=1, mode="classification", dropout=0.2)
gcn.fit(train, nb_epoch=50)
gcn_score = gcn.evaluate(test, [metric])

# Model 2: Random Forest baseline (needs fingerprints)
tasks_fp, datasets_fp, _ = dc.molnet.load_bbbp(featurizer="ECFP", splitter="scaffold")
train_fp, _, test_fp = datasets_fp
from sklearn.ensemble import RandomForestClassifier

rf = dc.models.SklearnModel(
    model=RandomForestClassifier(n_estimators=500),
    model_dir="rf_model/"
)
rf.fit(train_fp)
rf_score = rf.evaluate(test_fp, [metric])

print(f"GCN ROC-AUC: {gcn_score['roc_auc_score']:.3f}")
print(f"RF  ROC-AUC: {rf_score['roc_auc_score']:.3f}")

Workflow 3: Transfer Learning Pipeline

Goal: Fine-tune a pretrained model on a small dataset.

  1. Load pretrained ChemBERTa model (see Module 5 for code)
  2. Prepare the downstream dataset with the SmilesTokenizer featurizer
  3. Fine-tune with a reduced learning rate (1e-5 to 5e-5) for 5-15 epochs
  4. Evaluate on a held-out scaffold split — expect gains over fingerprint baselines when training data < 1000 samples
  5. Save the fine-tuned model: model.save_checkpoint()
  6. See references/workflows_model_catalog.md Workflow 1 for complete hyperparameter optimization code
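The reduced learning rate in step 3 is commonly combined with a short warmup when fine-tuning transformers. A framework-agnostic sketch of a linear warmup-then-decay schedule follows; the peak value and step counts are illustrative choices, not DeepChem defaults.

```python
def warmup_linear_decay(step, peak_lr=2e-5, warmup_steps=100, total_steps=1000):
    """Linear warmup to peak_lr, then linear decay to zero.

    A peak_lr in the 1e-5 to 5e-5 range matches the reduced learning rates
    recommended for fine-tuning pretrained chemical language models.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps                       # ramp up
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)      # decay

# The schedule peaks at warmup_steps and reaches zero at total_steps
lrs = [warmup_linear_decay(s) for s in range(0, 1001, 100)]
```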

Key Parameters

| Parameter | Module | Default | Range / Options | Effect |
|---|---|---|---|---|
| n_features | MultitaskRegressor/Classifier | Required | Matches featurizer output | Input feature dimension |
| layer_sizes | MultitaskRegressor/Classifier | [1000] | [256] to [1000, 500, 250] | Hidden layer dimensions |
| dropouts | All neural models | 0.0 | 0.0 - 0.5 | Regularization strength |
| learning_rate | All neural models | 0.001 | 1e-5 - 0.01 | Training step size |
| batch_size | All neural models | 100 | 16 - 256 | Samples per gradient update |
| nb_epoch | model.fit() | 10 | 10 - 300 | Training iterations |
| size | CircularFingerprint | 2048 | 512 - 4096 | Fingerprint bit length |
| radius | CircularFingerprint | 2 | 2 - 4 | Substructure neighborhood radius |
| graph_conv_layers | GraphConvModel | [64, 64] | [32] to [128, 128, 64] | Graph convolution widths |
| num_layers | AttentiveFPModel | 2 | 1 - 5 | GNN message passing depth |
| graph_feat_size | AttentiveFPModel | 200 | 64 - 512 | Graph feature dimension |
| splitter | dc.molnet.load_*() | "scaffold" | "scaffold", "random", "butina" | Data splitting strategy |

Best Practices

  1. Always use scaffold splitting for drug discovery: Random splits leak structural information and overestimate performance. Scaffold splits test generalization to novel chemotypes.

  2. Normalize regression targets: Apply NormalizationTransformer(transform_y=True) before training. Remember to untransform() predictions to recover interpretable values.

  3. Start with fingerprint baselines: Train MultitaskRegressor + ECFP first. Only move to GNNs if the fingerprint baseline is insufficient — GNNs need more data and compute.

    # Baseline first
    baseline = dc.models.MultitaskRegressor(n_tasks=1, n_features=2048)

  4. Match featurizer to model: GNN models require graph featurizers (ConvMolFeaturizer, MolGraphConvFeaturizer); fingerprint models need CircularFingerprint. A mismatch raises shape errors at best and silently degrades results at worst.

  5. Anti-pattern -- Do not use random split for drug discovery benchmarks: Results with RandomSplitter are not publishable for molecular property prediction. Reviewers expect scaffold or temporal splits.

  6. Handle missing labels in multi-task datasets: Tox21 and many bioactivity datasets have missing values. DeepChem handles NaN labels automatically during training (masked loss), but verify with np.isnan(dataset.y).sum().

  7. Use early stopping via validation set: Monitor validation loss to prevent overfitting, especially with GNN models.
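The early-stopping advice in practice 7 reduces to a patience counter. Below is a minimal framework-agnostic sketch; DeepChem also ships a ValidationCallback that can monitor a validation set during model.fit, though its exact interface depends on your installed version.

```python
def early_stopping_epochs(val_losses, patience=3):
    """Return the number of epochs worth keeping: stop after `patience`
    consecutive epochs without improvement in validation loss."""
    best, best_epoch, bad = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, bad = loss, epoch, 0   # new best: reset counter
        else:
            bad += 1
            if bad >= patience:
                break                                # patience exhausted
    return best_epoch + 1  # train up to (and including) the best epoch

# Validation loss improves until epoch 3 (0-indexed), then overfits
history = [0.9, 0.7, 0.6, 0.55, 0.58, 0.61, 0.65]
n_epochs = early_stopping_epochs(history, patience=3)  # -> 4
```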

Common Recipes

Recipe: Hyperparameter Search

When to use: Optimize model performance before final evaluation.

import deepchem as dc

tasks, datasets, transformers = dc.molnet.load_delaney(featurizer="ECFP")
train, valid, test = datasets

# Define parameter grid
params = {
    "n_features": [1024],
    "layer_sizes": [[500], [1000, 500], [1000, 500, 250]],
    "dropouts": [0.1, 0.25, 0.5],
    "learning_rate": [0.001, 0.0005],
}

optimizer = dc.hyper.GridHyperparamOpt(lambda **p: dc.models.MultitaskRegressor(**p))
metric = dc.metrics.Metric(dc.metrics.pearson_r2_score)
best_model, best_params, all_results = optimizer.hyperparam_search(
    params, train, valid, metric, logdir="hyperparam_logs/"
)
print(f"Best params: {best_params}")
print(f"Best R2: {best_model.evaluate(test, [metric])}")

Recipe: Save and Reload Models

When to use: Deploy trained models or resume training.

import deepchem as dc

# Save model checkpoint
model.save_checkpoint(model_dir="saved_model/")

# Reload model
loaded_model = dc.models.MultitaskRegressor(n_tasks=1, n_features=2048)
loaded_model.restore(model_dir="saved_model/")
predictions = loaded_model.predict(test)

Recipe: Custom Metric

When to use: Evaluate models with domain-specific metrics.

import deepchem as dc
import numpy as np

def enrichment_factor(y_true, y_pred, top_fraction=0.01):
    """Enrichment factor at top X% of ranked predictions."""
    n = len(y_true)
    n_top = max(int(n * top_fraction), 1)
    top_indices = np.argsort(y_pred.flatten())[-n_top:]
    hits_in_top = y_true.flatten()[top_indices].sum()
    expected = y_true.sum() * top_fraction
    return hits_in_top / expected if expected > 0 else 0.0

ef_metric = dc.metrics.Metric(enrichment_factor, mode="regression")
print(f"EF@1%: {model.evaluate(test, [ef_metric])}")

Troubleshooting

| Problem | Cause | Solution |
|---|---|---|
| ModuleNotFoundError: torch | PyTorch not installed | pip install deepchem[torch] for GNN models |
| ValueError: n_features mismatch | Featurizer output size does not match model n_features | Check dataset.X.shape[1] and set n_features accordingly |
| NaN loss during training | Learning rate too high or unnormalized targets | Apply NormalizationTransformer, reduce learning rate to 1e-4 |
| Low scaffold-split performance | Model memorizes scaffolds, not properties | Use more data, try GNN models, or add regularization (dropout 0.3-0.5) |
| RuntimeError: CUDA out of memory | Batch size too large for GPU | Reduce batch_size (32 or 16), or use CPU for small datasets |
| FeaturizationError on some SMILES | Invalid or complex SMILES strings | Pre-filter with RDKit: Chem.MolFromSmiles(smi) is not None |
| Model predicts constant values | Targets not normalized or too few epochs | Apply NormalizationTransformer, increase nb_epoch |
| Slow featurization | Large dataset with expensive featurizer | Use CircularFingerprint (fast) or parallelize with n_jobs parameter |
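The FeaturizationError fix can be applied up front: drop unparsable SMILES before featurization. A short sketch, assuming RDKit is available (the core deepchem install already pulls it in):

```python
from rdkit import Chem

def filter_valid_smiles(smiles_list):
    """Keep only SMILES that RDKit can parse, so featurizers never see
    malformed inputs."""
    valid, rejected = [], []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)  # returns None for unparsable SMILES
        (valid if mol is not None else rejected).append(smi)
    return valid, rejected

candidates = ["CCO", "c1ccccc1", "C1CC", "not_a_smiles"]
valid, rejected = filter_valid_smiles(candidates)
```

Featurize only the valid list, and log the rejected one so curation problems in the source data stay visible.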

Bundled Resources

  • references/workflows_model_catalog.md -- Extended workflows (hyperparameter optimization with full code, MolGAN generative models, materials property prediction with CGCNN/MEGNet, protein-ligand modeling, custom model architecture) plus complete model catalog (60+ models organized by category) and complete featurizer catalog (50+ featurizers). Covers: workflows 4-8 from original, extended model and featurizer inventories, MoleculeNet dataset catalog. Relocated inline: top 3 workflows (QSAR, MoleculeNet benchmark, transfer learning) are in Common Workflows; core model/featurizer tables are in Key Concepts. Omitted: detailed installation troubleshooting for TensorFlow 1.x (deprecated) and Docker-specific setup (covered by official docs).

Related Skills

  • rdkit-cheminformatics -- molecular manipulation, fingerprints, substructure search (upstream featurization)
  • molfeat-molecular-featurization -- 100+ featurizers with scikit-learn API (featurization-only alternative)
  • datamol-cheminformatics -- Pythonic molecular processing (upstream data prep)
  • pytdc-therapeutics-data-commons -- curated ADMET/DTI datasets with standardized splits (complementary data source)
  • torch-geometric-graph-neural-networks -- lower-level PyG for custom GNN architectures (alternative for advanced users)
  • scikit-learn-machine-learning -- classical ML baselines that DeepChem wraps via
    SklearnModel
