Medical-research-skills medchem
Medicinal chemistry screening filters for compound prioritization; use when you need to apply drug-likeness rules, PAINS/structural alerts, and complexity metrics to triage or optimize libraries.
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Evidence Insight/medchem" ~/.claude/skills/aipoch-medical-research-skills-medchem && rm -rf "$T"
manifest:
scientific-skills/Evidence Insight/medchem/SKILL.mdsource content
When to Use
- Screening large compound libraries to quickly triage for drug-like candidates (e.g., Lipinski/Veber + alerts).
- Flagging problematic chemotypes (e.g., PAINS, reactive/toxicophores, curated structural alerts) before follow-up assays.
- Prioritizing lead-optimization candidates with stricter criteria (lead-like rules, demerit systems, complexity caps).
- Enforcing property constraints (MW/logP/TPSA/rotatable bonds) for target-specific design windows (e.g., CNS).
- Identifying molecules containing specific functional groups/scaffolds (e.g., Michael acceptors, hinge binders) for SAR or risk assessment.
Key Features
- Drug-likeness and medchem rule sets: Lipinski (Ro5), Veber, Oprea, CNS, lead-like (soft/strict), Rule of Three, REOS, Golden Triangle, etc.
- PAINS and structural alert filtering: curated alert catalogs and pattern-based screening.
- Curated industrial filter sets: e.g., NIBR filters; Lilly demerit scoring with pass/fail thresholds.
- Functional-group detection via SMARTS-based group matchers (hinge binders, phosphate binders, Michael acceptors, reactive groups, custom patterns).
- Named catalogs of curated structures (functional groups, protecting groups, reagents, fragments) for matching and annotation.
- Molecular complexity metrics (e.g., Bertz/Whitlock/Barone-style) and threshold-based complexity filters.
- Constraint-based filtering for property windows (MW/logP/TPSA/RB, etc.).
- Query language to combine heterogeneous criteria (rules + alerts + numeric thresholds) into a single expression.
Dependencies
(latest)medchem
(latest)datamol
(latest, for tabular workflows)pandas
Example Usage
# End-to-end, runnable example: # 1) load SMILES # 2) apply Ro5 + Veber # 3) apply common structural alerts # 4) compute complexity and filter # 5) export a CSV with decisions import pandas as pd import datamol as dm import medchem as mc smiles_list = [ "CC(=O)OC1=CC=CC=C1C(=O)O", # aspirin "CN1C=NC2=C1C(=O)N(C(=O)N2C)C", # caffeine "c1ccccc1", # benzene ] df = pd.DataFrame({"smiles": smiles_list}) mols = [dm.to_mol(smi) for smi in df["smiles"]] # 1) Drug-likeness rules rule_filter = mc.rules.RuleFilters(rule_list=["rule_of_five", "rule_of_veber"]) rule_res = rule_filter(mols=mols, n_jobs=-1, progress=False) df["passes_rules"] = rule_res["pass"] # 2) Structural alerts alerts = mc.structural.CommonAlertsFilters() alert_res = alerts(mols=mols, n_jobs=-1, progress=False) df["has_alerts"] = alert_res["has_alerts"] # 3) Complexity (example threshold) complex_filter = mc.complexity.ComplexityFilter(max_complexity=500) complex_res = complex_filter(mols=mols, n_jobs=-1, progress=False) df["passes_complexity"] = complex_res["pass"] # 4) Final decision df["keep"] = df["passes_rules"] & (~df["has_alerts"]) & df["passes_complexity"] # 5) Save results df.to_csv("medchem_screening_results.csv", index=False) print(df)
Implementation Details
-
Rule evaluation (
)medchem.rules- Rules are implemented as callable checks over SMILES or RDKit-like molecule objects (commonly via
).datamol
applies multiple rules and returns a structured result (typically including an overallRuleFilters(rule_list=[...])
plus per-rule details).pass- Typical use: start broad (Ro5/Veber), then tighten (CNS/lead-like) as project constraints become clearer.
- Rules are implemented as callable checks over SMILES or RDKit-like molecule objects (commonly via
-
Structural alerts (
)medchem.structural- Alert systems are primarily SMARTS/pattern-based matchers curated from literature/industrial practice.
,CommonAlertsFilters
, andNIBRFilters
provide different philosophies:LillyDemeritsFilters- Common alerts: general-purpose red flags.
- NIBR: curated industrial filter set.
- Lilly demerits: assigns penalties per matched rule; a common convention is reject if total demerits > 100.
-
Complexity (
)medchem.complexity- Complexity scores approximate synthetic difficulty / structural intricacy using established heuristics (e.g., Bertz/Whitlock/Barone-style metrics).
converts a numeric score into a pass/fail gate for library triage.ComplexityFilter(max_complexity=...)
-
Constraints (
)medchem.constraints- Property windows (MW/logP/TPSA/rotatable bonds, etc.) are applied as hard filters.
- Use constraints to encode target-specific design hypotheses (e.g., CNS-like space) rather than universal “good/bad” judgments.
-
Groups and catalogs (
,medchem.groups
)medchem.catalogs- Group detection is SMARTS-driven and returns boolean matches and/or match details (substructure hits).
- Named catalogs provide curated sets for consistent annotation and matching across projects.
-
Parallelization
- Most batch APIs accept
; setn_jobs
to use all available CPU cores for large libraries.n_jobs=-1
- Most batch APIs accept