Medical-research-skills torchdrug-english
PyTorch-native Graph Neural Network framework for molecules and proteins. Suitable for building custom GNN architectures for drug discovery, protein modeling, or knowledge graph reasoning. Best for custom model development, protein property prediction, and retrosynthesis. If you need pretrained models and diverse feature extractors, use deepchem; if you need benchmark datasets, use pytdc.
git clone https://github.com/aipoch/medical-research-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Data Analysis/TorchDrug-English" ~/.claude/skills/aipoch-medical-research-skills-torchdrug-english && rm -rf "$T"
scientific-skills/Data Analysis/TorchDrug-English/SKILL.md
TorchDrug
When to Use
- Use this skill when you need a PyTorch-native graph neural network framework for molecules and proteins in a reproducible workflow: building custom GNN architectures for drug discovery, protein modeling, or knowledge graph reasoning. It is best for custom model development, protein property prediction, and retrosynthesis. If you need pretrained models and diverse feature extractors, use deepchem; if you need benchmark datasets, use pytdc.
- Use this skill when a data analytics task needs a packaged method instead of ad-hoc freeform output.
- Use this skill when the user expects a concrete deliverable, validation step, or file-based result.
- Use this skill when the documented workflow in this package is the most direct path to complete the request.
- Use this skill when you need the TorchDrug (English) package behavior rather than a generic answer.
Key Features
- Scope-focused workflow aligned to: PyTorch-native Graph Neural Network framework for molecules and proteins. Suitable for building custom GNN architectures for drug discovery, protein modeling, or knowledge graph reasoning. Best for custom model development, protein property prediction, and retrosynthesis. If you need pretrained models and diverse feature extractors, use deepchem; if you need benchmark datasets, use pytdc.
- Documentation-first workflow with no packaged script requirement.
- Reference material available in references/ for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
Dependencies
- Python: 3.10+. Repository baseline for current packaged skills.
- Third-party packages: not explicitly version-pinned in this skill package. Add pinned versions if this skill needs stricter environment control.
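Because the packages are not pinned, it can help to check the environment before starting. The sketch below uses only the standard library; the package list (`torch`, `torchdrug`, `rdkit`) is an assumption based on the libraries this skill mentions, not a manifest shipped with the package.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # baseline stated in this skill's Dependencies section


def check_environment(packages=("torch", "torchdrug", "rdkit")):
    """Return (python_ok, missing) without importing the heavy packages.

    The default package names are an assumption, not an official manifest.
    """
    python_ok = sys.version_info[:2] >= MIN_PYTHON
    # find_spec returns None when a top-level package cannot be located
    missing = [name for name in packages if importlib.util.find_spec(name) is None]
    return python_ok, missing


if __name__ == "__main__":
    python_ok, missing = check_environment()
    print(f"Python >= 3.10: {python_ok}")
    print(f"Missing packages: {missing or 'none'}")
```

Anything reported as missing can then be installed and, if stricter control is needed, pinned.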
Example Usage
Skill directory: 20260316/scientific-skills/Data Analytics/TorchDrug-English
No packaged executable script was detected. Use the documented workflow in SKILL.md together with the references/assets in this folder.
Example run plan:
- Read the skill instructions and collect the required inputs.
- Follow the documented workflow exactly.
- Use packaged references/assets from this folder when the task needs templates or rules.
- Return a structured result tied to the requested deliverable.
Implementation Details
See the Overview section below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: instruction-only workflow in SKILL.md.
- Reference guidance: references/ contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
Overview
TorchDrug is a comprehensive PyTorch-based machine learning toolbox for drug discovery and molecular science. It applies graph neural networks, pretrained models, and task definitions to molecules, proteins, and biological knowledge graphs, covering molecular property prediction, protein modeling, knowledge graph reasoning, molecular generation, and retrosynthesis planning. It includes 40+ curated datasets and 20+ model architectures.
When to Use This Skill
Use this skill when dealing with the following:
Data Types:
- SMILES strings or molecular structures
- Protein sequences or 3D structures (PDB files)
- Reactions and retrosynthesis
- Biomedical knowledge graphs
- Drug discovery datasets
Tasks:
- Predict molecular properties (solubility, toxicity, activity)
- Protein function or structure prediction
- Drug-target binding prediction
- Generate new molecular structures
- Plan chemical synthesis routes
- Link prediction in biomedical knowledge bases
- Train graph neural networks on scientific data
Libraries and Integration:
- TorchDrug as core library
- Often with RDKit for cheminformatics
- PyTorch and PyTorch Lightning compatibility
- Integrated AlphaFold and ESM for proteins
Getting Started
Installation
    pip install torchdrug
Or install the full version with optional dependencies (quote the extra so that shells such as zsh do not expand the brackets):
    pip install "torchdrug[full]"
Quick Example
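    import torch
    from torchdrug import data, datasets, models, tasks

    # Load molecular dataset
    dataset = datasets.BBBP("~/molecule-datasets/")

    # Random 80/10/10 split using the standard PyTorch utility
    lengths = [int(0.8 * len(dataset)), int(0.1 * len(dataset))]
    lengths += [len(dataset) - sum(lengths)]
    train_set, valid_set, test_set = torch.utils.data.random_split(dataset, lengths)

    # Define GNN model
    model = models.GIN(
        input_dim=dataset.node_feature_dim,
        hidden_dims=[256, 256, 256],
        edge_input_dim=dataset.edge_feature_dim,
        batch_norm=True,
        readout="mean",
    )

    # Create property prediction task
    task = tasks.PropertyPrediction(
        model,
        task=dataset.tasks,
        criterion="bce",
        metric=["auroc", "auprc"],
    )
    # Build the prediction head and label statistics (core.Engine normally does this)
    task.preprocess(train_set, valid_set, test_set)

    # Train with PyTorch; TorchDrug's DataLoader collates molecules into batched graphs
    optimizer = torch.optim.Adam(task.parameters(), lr=1e-3)
    train_loader = data.DataLoader(train_set, batch_size=32, shuffle=True)

    for epoch in range(100):
        for batch in train_loader:
            loss, metric = task(batch)  # the task returns (loss, metric dict)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()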
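Splitting a dataset into train, validation, and test parts needs integer lengths that sum exactly to the dataset size. The helper below is a minimal standard-library sketch of that arithmetic; `split_lengths` is a hypothetical name, not a TorchDrug function, and its result could be passed to a utility such as `torch.utils.data.random_split`.

```python
def split_lengths(num_samples, fractions=(0.8, 0.1, 0.1)):
    """Turn split fractions into integer lengths that sum exactly to num_samples."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # Round every split down except the last one
    lengths = [int(fraction * num_samples) for fraction in fractions[:-1]]
    # The remainder goes to the last split so no sample is dropped
    lengths.append(num_samples - sum(lengths))
    return lengths


print(split_lengths(10))  # [8, 1, 1]
```

Assigning the remainder to the last split avoids the off-by-one errors that appear when every fraction is rounded independently.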
Core Capabilities
1. Molecular Property Prediction
Predict chemical, physical, and biological properties from molecular structures.
Use Cases:
- Drug-likeness and ADMET properties
- Toxicity screening
- Quantum chemical properties
- Binding affinity prediction
Core Components:
- 20+ molecular datasets (BBBP, HIV, Tox21, QM9, etc.)
- GNN models (GIN, GAT, SchNet)
- PropertyPrediction and MultipleBinaryClassification tasks
Reference: See molecular_property_prediction.md
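Multi-task toxicity datasets such as Tox21 leave many molecule/assay pairs unlabeled, so a multi-task binary loss has to skip those entries. The following is a plain-Python sketch of that idea, with `None` standing in for a missing label; it illustrates the masking concept and is not TorchDrug's implementation.

```python
import math


def masked_bce(probabilities, targets):
    """Mean binary cross-entropy over labeled entries only.

    probabilities: per-molecule lists of predicted probabilities, one per task.
    targets: same shape, with 0/1 labels and None for missing labels.
    """
    total, count = 0.0, 0
    for prob_row, target_row in zip(probabilities, targets):
        for p, y in zip(prob_row, target_row):
            if y is None:
                continue  # unlabeled entry contributes nothing to the loss
            p = min(max(p, 1e-7), 1 - 1e-7)  # clamp to keep log() finite
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
            count += 1
    if count == 0:
        raise ValueError("no labeled entries to average over")
    return total / count


# One molecule, two tasks; the second task has no label and is ignored
loss = masked_bce([[0.5, 0.9]], [[1, None]])
```

Averaging over labeled entries only keeps sparsely labeled tasks from being diluted by placeholder values.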
2. Protein Modeling
Process protein sequences, structures, and properties.
Use Cases:
- Enzyme function prediction
- Protein stability and solubility
- Subcellular localization
- Protein-protein interactions
- Structure prediction
Core Components:
- 15+ protein datasets (EnzymeCommission, GeneOntology, PDBBind, etc.)
- Sequence models (ESM, ProteinBERT, ProteinLSTM)
- Structure models (GearNet, SchNet)
- Multiple task types for different prediction levels
Reference: See protein_modeling.md
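Sequence models consume residues as integer indices or one-hot vectors. The sketch below shows one way to encode a protein sequence with the standard library. The alphabetical residue ordering and the extra "unknown" slot are assumptions for illustration and do not reproduce TorchDrug's own vocabulary.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
RESIDUE_TO_INDEX = {residue: i for i, residue in enumerate(AMINO_ACIDS)}
UNKNOWN_INDEX = len(AMINO_ACIDS)  # shared slot for non-standard residues


def encode_sequence(sequence):
    """Map a one-letter protein sequence to a list of integer indices."""
    return [RESIDUE_TO_INDEX.get(residue, UNKNOWN_INDEX) for residue in sequence.upper()]


def one_hot(indices, num_classes=len(AMINO_ACIDS) + 1):
    """Expand integer indices into one-hot rows."""
    return [[1 if i == index else 0 for i in range(num_classes)] for index in indices]


indices = encode_sequence("ACX")  # "X" falls into the unknown slot
features = one_hot(indices)
```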
3. Knowledge Graph Reasoning
Predict missing links and relations in biomedical knowledge graphs.
Use Cases:
- Drug repurposing
- Disease mechanism discovery
- Gene-disease associations
- Multi-hop biomedical reasoning
Core Components:
- General KGs (FB15k, WN18) and biomedical KGs (Hetionet)
- Embedding models (TransE, RotatE, ComplEx)
- KnowledgeGraphCompletion task
Reference: See knowledge_graphs.md
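Embedding models score a candidate triple by how well the relation maps the head entity onto the tail. As a minimal illustration, TransE treats a relation as a translation, so a triple is plausible when head + relation lands close to tail. The sketch below is plain Python with toy 2-dimensional embeddings; the function names are hypothetical and this is not TorchDrug's implementation.

```python
import math


def transe_score(head, relation, tail, p=1):
    """Negative L_p distance between (head + relation) and tail; higher is better."""
    diffs = [h + r - t for h, r, t in zip(head, relation, tail)]
    if p == 1:
        distance = sum(abs(d) for d in diffs)
    elif p == 2:
        distance = math.sqrt(sum(d * d for d in diffs))
    else:
        raise ValueError("p must be 1 or 2")
    return -distance


def rank_tails(head, relation, candidates):
    """Sort candidate tail names from most to least plausible.

    candidates: dict mapping entity name -> embedding.
    """
    return sorted(
        candidates,
        key=lambda name: transe_score(head, relation, candidates[name]),
        reverse=True,
    )


# Toy embeddings: "a" sits exactly at head + relation, "b" is far away
ranking = rank_tails([1, 0], [0, 1], {"a": [1, 1], "b": [3, 3]})
```

Link prediction then amounts to ranking every candidate entity for a query such as (head, relation, ?).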
4. Molecular Generation
Generate novel molecular structures with desired properties.
Use Cases:
- De novo drug design
- Lead compound optimization
- Chemical space exploration
- Property-directed generation
Core Components:
- Autoregressive generation
- GCPN (policy-based generation)
- GraphAutoregressiveFlow
- Property optimization workflows
Reference: See molecular_generation.md
5. Retrosynthesis
Predict synthesis routes from target molecules to starting materials.
Use Cases:
- Synthesis planning
- Route optimization
- Synthesizability assessment
- Multi-step planning
Core Components:
- USPTO-50k reaction dataset
- CenterIdentification (reaction center prediction)
- SynthonCompletion (reactant prediction)
- End-to-end retrosynthesis pipeline
Reference: See retrosynthesis.md
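Reaction datasets are commonly stored as reaction SMILES of the form `reactants>agents>products`, with `.` separating molecules. Retrosynthesis inverts that direction: the product is the input and the reactants are the prediction target. The parser below is a standard-library sketch of that data layout; the function names are hypothetical, and it performs no chemical validation.

```python
def parse_reaction_smiles(reaction):
    """Split a reaction SMILES string into reactant, agent, and product lists."""
    parts = reaction.split(">")
    if len(parts) != 3:
        raise ValueError("expected the form reactants>agents>products")
    # "." separates molecules; note it also separates the ions of a salt
    reactants, agents, products = (
        [molecule for molecule in part.split(".") if molecule] for part in parts
    )
    return {"reactants": reactants, "agents": agents, "products": products}


def retrosynthesis_pair(reaction):
    """Return (input, target) for retrosynthesis: products in, reactants out."""
    parsed = parse_reaction_smiles(reaction)
    return parsed["products"], parsed["reactants"]


# Esterification written without agents: acetic acid + ethanol >> ethyl acetate
model_input, target = retrosynthesis_pair("CC(=O)O.OCC>>CC(=O)OCC")
```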
6. Graph Neural Network Models
Comprehensive catalog of GNN architectures for different data types and tasks.
Available Models:
- General GNN: GCN, GAT, GIN, RGCN, MPNN
- 3D-aware: SchNet, GearNet
- Protein-specific: ESM, ProteinBERT, GearNet
- Knowledge graphs: TransE, RotatE, ComplEx, SimplE
- Generative: GraphAutoregressiveFlow
Reference: See models_architectures.md
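To make the message-passing idea behind these architectures concrete, the sketch below implements the aggregation step of a GIN-style layer in plain Python: each node keeps (1 + eps) times its own features and adds the sum of its neighbors' features. The MLP that GIN applies afterwards is omitted, and the function names are illustrative rather than TorchDrug APIs.

```python
def gin_aggregate(features, edges, eps=0.0):
    """One GIN-style aggregation: (1 + eps) * h_v + sum of neighbor features.

    features: list of per-node feature lists.
    edges: list of undirected (u, v) node index pairs.
    """
    dim = len(features[0])
    out = [[(1.0 + eps) * value for value in row] for row in features]
    for u, v in edges:
        for k in range(dim):
            # Undirected edge: messages flow in both directions
            out[v][k] += features[u][k]
            out[u][k] += features[v][k]
    return out


def mean_readout(features):
    """Average node features into one graph-level vector."""
    num_nodes = len(features)
    return [sum(column) / num_nodes for column in zip(*features)]


# Path graph 0 - 1 - 2 with one feature per node
hidden = gin_aggregate([[1.0], [2.0], [4.0]], [(0, 1), (1, 2)])
graph_vector = mean_readout(hidden)
```

Stacking several such layers lets information travel across multiple bonds before the readout summarizes the whole graph.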
7. Datasets
40+ curated datasets covering chemistry, biology, and knowledge graphs.
Categories:
- Molecular properties (drug discovery, quantum chemistry)
- Protein properties (function, structure, interactions)
- Knowledge graphs (general and biomedical)
- Retrosynthesis reactions
Reference: See datasets.md
Common Workflows
Workflow 1: Molecular Property Prediction
Scenario: Predict blood-brain barrier permeability for drug candidates.
Steps:
- Load dataset: datasets.BBBP()
- Choose model: GNN for molecular graphs (e.g., GIN)
- Define task: PropertyPrediction with binary classification
- Train using scaffold split for realistic evaluation
- Evaluate with AUROC and AUPRC
Navigation: references/molecular_property_prediction.md → Dataset Selection → Model Selection → Training
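A scaffold split keeps molecules that share a core structure in the same partition, so the test set contains chemotypes the model has not seen. The sketch below shows the grouping logic in plain Python, assuming the scaffold key of each molecule has already been computed (for example, as a Bemis-Murcko scaffold SMILES from a cheminformatics toolkit). It is a simplified deterministic variant, not TorchDrug's implementation.

```python
from collections import defaultdict


def scaffold_split(scaffolds, fractions=(0.8, 0.1, 0.1)):
    """Split molecule indices so that no scaffold spans two partitions.

    scaffolds: one precomputed scaffold key per molecule (assumed input).
    Returns (train, valid, test) lists of indices.
    """
    groups = defaultdict(list)
    for index, key in enumerate(scaffolds):
        groups[key].append(index)
    # Largest groups first; ties broken by first index to stay deterministic
    ordered = sorted(groups.values(), key=lambda group: (-len(group), group[0]))

    total = len(scaffolds)
    train_cap = fractions[0] * total
    valid_cap = (fractions[0] + fractions[1]) * total
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cap:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Toy keys: six molecules share scaffold "A", two share "B", "C" and "D" are singletons
train, valid, test = scaffold_split(["A"] * 6 + ["B"] * 2 + ["C", "D"])
```

Rare scaffolds end up in validation and test, which is what makes this split a harder and more realistic check than a random one.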
Workflow 2: Protein Function Prediction
Scenario: Predict enzyme function from sequence.
Steps:
- Load dataset: datasets.EnzymeCommission()
- Choose model: pretrained ESM or GearNet with structure
- Define task: MultipleBinaryClassification, since one enzyme can carry several EC labels (PropertyPrediction fits single-label protein tasks)
- Finetune pretrained model or train from scratch
- Evaluate with F1-max and AUPRC
Navigation: references/protein_modeling.md → Model Selection (Sequence vs Structure) → Pretraining Strategies
Workflow 3: Drug Repurposing via Knowledge Graph
Scenario: Find new disease treatments in Hetionet.
Steps:
- Load dataset: datasets.Hetionet()
- Choose model: RotatE or ComplEx
- Define task: KnowledgeGraphCompletion
- Train with negative sampling
- Query predictions for compound-treats-disease
- Filter by plausibility and mechanism
Navigation: references/knowledge_graphs.md → Hetionet Dataset → Model Selection → Biomedical Applications
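Two of the steps above lend themselves to a small illustration: drawing negative samples by corrupting the tail of a known triple, and keeping only predictions that are not already in the graph. The sketch below uses the standard library with toy entity names; the helper names and the score dictionary are hypothetical stand-ins for a trained model's output.

```python
import random


def sample_negative_tails(triple, entities, known_triples, num_negative, rng):
    """Corrupt the tail of a true triple, skipping tails that form known facts."""
    head, relation, tail = triple
    candidates = [
        entity
        for entity in entities
        if entity != tail and (head, relation, entity) not in known_triples
    ]
    return rng.sample(candidates, min(num_negative, len(candidates)))


def novel_predictions(head, relation, scores, known_triples, top_k=3):
    """Rank tails by score (higher is more plausible), dropping known facts.

    scores: dict mapping entity -> model score (assumed to come from a trained model).
    """
    unseen = {
        entity: score
        for entity, score in scores.items()
        if (head, relation, entity) not in known_triples
    }
    return sorted(unseen, key=unseen.get, reverse=True)[:top_k]


known = {("aspirin", "treats", "pain")}
negatives = sample_negative_tails(
    ("aspirin", "treats", "pain"), ["pain", "fever", "migraine"], known, 2, random.Random(0)
)
suggestions = novel_predictions(
    "aspirin", "treats", {"pain": 0.9, "fever": 0.7, "migraine": 0.2}, known, top_k=1
)
```

Filtering out known triples keeps the shortlist focused on hypotheses that are new, which can then be reviewed for plausibility and mechanism.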