Computational-chemistry-agent-skills rdkit-conf
Clone the whole repository:

```bash
git clone https://github.com/jinzhezenggroup/computational-chemistry-agent-skills
```

Or copy only this skill into `~/.claude/skills`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/jinzhezenggroup/computational-chemistry-agent-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/molecular-conformer/rdkit-conf" ~/.claude/skills/jinzhezenggroup-computational-chemistry-agent-skills-rdkit-conf && rm -rf "$T"
```

Source file: `molecular-conformer/rdkit-conf/SKILL.md`

RDKit Conformer Generation
This skill provides practical command patterns for RDKit 3D/2D conformer generation using the standardized CLI wrapper `<skill_path>/scripts/rdkit_conf_helper.py`.
Key behaviors (important for agents):
- The script prints environment detection (Python/RDKit/Pandas) by default.
- Multi-conformer sampling: embeds `--num-confs` conformers (default 10) per molecule via `EmbedMultipleConfs`, optimizes each with the chosen force field, and keeps the lowest-energy one. Set `--num-confs 1` to revert to single-conformer behavior.
- 2D fallback: if all 3D embedding attempts fail, `Compute2DCoords` is used instead and a `[WARN]` line is printed to stderr for that molecule.
- Bad/illegal SMILES are skipped entirely and logged to `*.skipped.csv` (no crash).
- Molecules that fell back to 2D are additionally logged to `*.fallback.csv`.
- Each run ends with a summary line and absolute output paths:

```text
[INFO] Done: <N_3d> 3D, <N_2d> 2D-fallback, <N_skip> skipped (total input: <N>)
[RESULT] conf_sdf=/abs/path.sdf
[RESULT] conf_xyz=/abs/path.xyz
[RESULT] fallback_csv=/abs/path.fallback.csv   (only if any 2D fallbacks occurred)
[RESULT] skipped_csv=/abs/path.skipped.csv     (only if any SMILES were skipped)
```
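An agent that drives the script from Python usually needs the summary counts and output paths rather than the full log. The sketch below parses the `[INFO] Done:` and `[RESULT] key=path` lines in the layout shown above. `parse_run_output` is a hypothetical helper, not part of the skill, and it assumes each marker starts its own line. The skill documents the `[RESULT]` lines on stdout but does not say which stream carries `[INFO]`, so pass in everything you captured.

```python
import re

# Summary line in the documented format of rdkit_conf_helper.py.
DONE_RE = re.compile(
    r"\[INFO\] Done: (\d+) 3D, (\d+) 2D-fallback, (\d+) skipped \(total input: (\d+)\)"
)
# "[RESULT] key=/abs/path" lines, e.g. conf_sdf, fallback_csv, skipped_csv.
RESULT_RE = re.compile(r"^\[RESULT\] (\w+)=(.+)$")


def parse_run_output(output: str) -> dict:
    """Hypothetical helper: collect summary counts and output paths from captured text."""
    summary = {}
    paths = {}
    for raw in output.splitlines():
        line = raw.strip()
        done = DONE_RE.search(line)
        if done:
            n3d, n2d, nskip, total = map(int, done.groups())
            summary = {"3d": n3d, "2d_fallback": n2d, "skipped": nskip, "total": total}
            continue
        result = RESULT_RE.match(line)
        if result:
            key, path = result.groups()
            paths[key] = path.strip()
    return {"summary": summary, "paths": paths}


if __name__ == "__main__":
    # Made-up counts and paths, for illustration only.
    example = (
        "[INFO] Done: 8 3D, 1 2D-fallback, 1 skipped (total input: 10)\n"
        "[RESULT] conf_sdf=/tmp/run/data.sdf\n"
        "[RESULT] fallback_csv=/tmp/run/data.fallback.csv\n"
    )
    parsed = parse_run_output(example)
    print(parsed["summary"])
    print(parsed["paths"])
```

A non-zero `2d_fallback` or `skipped` count is the cue to open the corresponding CSV log.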
Quick Start
Check CLI help:
```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py --help
uv run <skill_path>/scripts/rdkit_conf_helper.py conf --help
```
Disable environment printing (optional):
```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py --no-env conf --smiles "CCO" --output out.sdf
```
Core Tasks
1) Generate 3D conformers (SDF output, default)
Single SMILES:
```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
  --smiles "CCO" \
  --output /tmp/CCO.sdf
```
Single SMILES with a custom molecule name:
```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
  --smiles "c1ccccc1" \
  --name benzene \
  --output /tmp/benzene.sdf
```
From CSV (default SMILES column: `smiles`):

```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
  --file data.csv \
  --smiles-col smiles \
  --output data.sdf
```
From CSV with a name column:
```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
  --file data.csv \
  --smiles-col smiles \
  --name-col compound_id \
  --output data.sdf
```
From SMI (second token per line is used as name automatically):
```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
  --file molecules.smi \
  --output molecules.sdf
```
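All three input modes reduce to a list of (SMILES, name) pairs. The sketch below restates the documented rules with Python's `csv` module: a named SMILES column plus an optional name column for CSV, and first/second whitespace-separated tokens for `.smi`, with the auto-generated `mol_<i>` name as a fallback. It is an illustration rather than the script's own reader; the helper names are hypothetical, and numbering `mol_<i>` from 0 is an assumption.

```python
import csv
import io
from typing import Iterable, List, Optional, Tuple

Pair = Tuple[str, str]  # (smiles, name)


def read_csv_inputs(
    lines: Iterable[str], smiles_col: str = "smiles", name_col: Optional[str] = None
) -> List[Pair]:
    """Collect (smiles, name) pairs from CSV text, mirroring --smiles-col/--name-col."""
    pairs = []
    for i, row in enumerate(csv.DictReader(lines)):
        name = (row.get(name_col) or "").strip() if name_col else ""
        # Assumption: an empty or missing name becomes mol_<i> with a 0-based i.
        pairs.append((row[smiles_col].strip(), name or f"mol_{i}"))
    return pairs


def read_smi_inputs(lines: Iterable[str]) -> List[Pair]:
    """First token per line is the SMILES; an optional second token is the name."""
    pairs = []
    for line in lines:
        tokens = line.split()
        if not tokens:  # ignore blank lines
            continue
        name = tokens[1] if len(tokens) > 1 else f"mol_{len(pairs)}"
        pairs.append((tokens[0], name))
    return pairs


if __name__ == "__main__":
    # Inline sample data; with real files, use open(path, newline="").
    csv_text = "smiles,compound_id\nCCO,ethanol\nc1ccccc1,\n"
    print(read_csv_inputs(io.StringIO(csv_text), name_col="compound_id"))
    print(read_smi_inputs(io.StringIO("CCO ethanol\nc1ccccc1\n")))
```

Running a check like this before a long batch catches a misspelled `--smiles-col` or `--name-col` early (a wrong SMILES column raises `KeyError` here).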
2) Control conformer sampling count
Default (10 conformers sampled, lowest-energy kept):
```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
  --file data.csv --output data.sdf
```
Single conformer (fastest, least thorough):
```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
  --file data.csv --num-confs 1 --output data.sdf
```
Increase sampling for flexible or macrocyclic molecules:
```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
  --file data.csv --num-confs 50 --output data.sdf
```
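"Sample many, keep the lowest" amounts to an arg-min over per-conformer energies. RDKit's `MMFFOptimizeMoleculeConfs` and `UFFOptimizeMoleculeConfs` return one `(not_converged, energy)` pair per conformer, so the sketch below takes that shape as input. The function name is hypothetical, and ignoring the convergence flag when ranking is an assumption, not confirmed behavior of the script. With `--ff none` there are no energies, and the documented rule is to keep the first embedded conformer.

```python
from typing import Optional, Sequence, Tuple


def select_lowest_energy(
    conf_ids: Sequence[int],
    results: Optional[Sequence[Tuple[int, float]]],
) -> int:
    """Return the id of the conformer to keep.

    conf_ids: conformer ids from embedding (e.g. EmbedMultipleConfs), in order.
    results:  one (not_converged, energy) pair per conformer, as returned by
              RDKit's *OptimizeMoleculeConfs helpers, or None for --ff none.
    """
    if not conf_ids:
        raise ValueError("no embedded conformers to choose from")
    if results is None:
        # --ff none: no energies to rank, keep the first embedded conformer.
        return conf_ids[0]
    if len(results) != len(conf_ids):
        raise ValueError("expected one optimization result per conformer")
    # Assumption: the convergence flag is ignored; rank purely by energy.
    energies = [energy for _not_converged, energy in results]
    best = min(range(len(energies)), key=lambda i: energies[i])
    return conf_ids[best]


if __name__ == "__main__":
    fake_results = [(0, 12.4), (0, 9.8), (1, 10.1)]  # made-up energies
    print(select_lowest_energy([0, 1, 2], fake_results))
    print(select_lowest_energy([0, 1, 2], None))
```

Raising `--num-confs` only widens the pool this selection draws from; it cannot change which force field scores the candidates.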
3) Choose force-field minimization
MMFF94s (default, falls back to UFF if unavailable):
```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
  --file data.csv --ff mmff94s --output data.mmff.sdf
```
UFF (universal force field):
```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
  --file data.csv --ff uff --output data.uff.sdf
```
Skip force-field optimization (raw ETKDG geometry only):
```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
  --file data.csv --ff none --output data.etkdg_raw.sdf
```
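The MMFF94s-to-UFF fallback is decided per molecule: MMFF94s only has parameters for a limited set of (mainly organic) atom types, while UFF covers a much broader range of elements. In RDKit the check is commonly done with `AllChem.MMFFHasAllMoleculeParams(mol)`, and the MMFF94s variant is selected with `mmffVariant="MMFF94s"` on `MMFFOptimizeMoleculeConfs`. The sketch below takes the check result as a boolean so it runs without RDKit; the function name is hypothetical, and whether the script uses exactly this check is an assumption.

```python
VALID_FF = ("mmff94s", "uff", "none")  # values documented for --ff


def resolve_force_field(requested: str, has_mmff_params: bool) -> str:
    """Return the force field actually applied to one molecule.

    has_mmff_params would come from AllChem.MMFFHasAllMoleculeParams(mol)
    in an RDKit pipeline (assumed check, not taken from the script).
    """
    ff = requested.lower()
    if ff not in VALID_FF:
        raise ValueError(f"unsupported --ff value: {requested!r}")
    if ff == "mmff94s" and not has_mmff_params:
        return "uff"  # documented per-molecule fallback
    return ff
```

Because the fallback is silent, energies in a mixed batch may come from two different force fields and should not be compared across molecules.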
4) XYZ output
```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
  --file data.csv \
  --format xyz \
  --output data.xyz
```
5) Tuning embedding for difficult molecules
Large or macrocyclic molecules sometimes fail standard ETKDG; try random initial coordinates:
```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
  --file macrocycles.csv \
  --use-random-coords \
  --max-attempts 500 \
  --output macrocycles.sdf
```
Set a specific random seed (a fixed seed makes embedding reproducible):

```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
  --file data.csv --seed 123 --output data.seed123.sdf
```
Non-deterministic embedding (seed = -1):
```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
  --file data.csv --seed -1 --output data.sdf
```
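When only a few molecules fall back to 2D, it can be cheaper to rerun just those with the tuning flags above than to redo the whole batch. The sketch below reads a fallback log in its documented column layout (`idx`, `smiles`, `name`, `dim`, `ff`, `note`) and writes a small input CSV with `smiles` and `compound_id` columns. The helper names, the `compound_id` column, and the `idx_<n>` placeholder for unnamed rows are hypothetical choices for this example.

```python
import csv
import io
from typing import Dict, List, TextIO


def retry_rows(fallback_log: TextIO) -> List[Dict[str, str]]:
    """Turn rows of a *.fallback.csv log into input rows for a rerun."""
    rows = []
    for row in csv.DictReader(fallback_log):
        # Hypothetical naming rule: keep the logged name, else tag by input idx.
        name = row.get("name") or f"idx_{row['idx']}"
        rows.append({"smiles": row["smiles"], "compound_id": name})
    return rows


def write_input_csv(rows: List[Dict[str, str]], out: TextIO) -> None:
    """Write rows using columns that --smiles-col/--name-col can point at."""
    writer = csv.DictWriter(out, fieldnames=["smiles", "compound_id"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up log contents in the documented column layout.
    log = io.StringIO(
        "idx,smiles,name,dim,ff,note\n"
        "3,C1CCCCCCCCCCC1,ring12,2,2d_fallback,example\n"
    )
    out = io.StringIO()
    write_input_csv(retry_rows(log), out)
    print(out.getvalue())
```

With real files, open both paths with `newline=""` as the `csv` module documentation recommends, then rerun with `--file retry.csv --smiles-col smiles --name-col compound_id --use-random-coords --max-attempts 500`.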
6) Suppress hydrogen addition
By default, explicit H atoms are added before embedding for more accurate 3D geometry. Use `--no-hs` to keep the molecule as-is (heavy atoms only):

```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
  --file data.csv --no-hs --output data.noh.sdf
```
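To confirm which mode produced a file, count the hydrogen atoms in its XYZ output: with `--no-hs` there should normally be no `H` lines. The sketch below assumes standard XYZ blocks (an atom-count line, a comment line holding the name, then one `element x y z` line per atom) and tolerates blank lines between blocks. `parse_xyz` is a hypothetical helper, and the coordinates in the example are made up.

```python
from collections import Counter
from typing import List, Tuple


def parse_xyz(text: str) -> List[Tuple[str, Counter]]:
    """Split concatenated XYZ blocks into (name, element counts) pairs."""
    lines = text.splitlines()
    blocks = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():  # skip blank separator lines between blocks
            i += 1
            continue
        n_atoms = int(lines[i].split()[0])
        name = lines[i + 1].strip()  # comment line carries the molecule name
        atoms = lines[i + 2 : i + 2 + n_atoms]
        counts = Counter(atom.split()[0] for atom in atoms)
        blocks.append((name, counts))
        i += 2 + n_atoms
    return blocks


if __name__ == "__main__":
    example = "3\nethanol\nC 0.000 0.000 0.000\nC 1.520 0.000 0.000\nO 2.020 1.350 0.000\n"
    for name, counts in parse_xyz(example):
        print(name, "hydrogens:", counts.get("H", 0))
```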
7) Custom log file paths
```bash
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
  --file data.csv \
  --output data.sdf \
  --error-log logs/skipped.csv \
  --fallback-log logs/used_2d.csv
```
3D Embedding Pipeline Details
For each molecule, the script runs the following steps in order:
- Parse the SMILES via `Chem.MolFromSmiles`.
- Add hydrogens (`Chem.AddHs`); skipped with `--no-hs`.
- Multi-conformer 3D embedding (`EmbedMultipleConfs`, `--num-confs` candidates, default 10): tries ETKDGv3, then ETKDGv2, then ETDG, then ETDG + `useRandomCoords` as a fallback chain until at least one conformer is embedded.
- Force-field minimization (if `--ff` is not `none`): each successfully embedded conformer is optimized individually. MMFF94s transparently falls back to UFF if parameters are unavailable for that molecule.
- Lowest-energy selection: the conformer with the minimum post-optimization energy is retained; all others are discarded. With `--ff none`, the first embedded conformer is kept without energy ranking.
- 2D fallback (if all 3D attempts yield zero conformers): generates a flat 2D layout via `Compute2DCoords` (Z = 0 for all atoms), prints a `[WARN]` line to stderr, and records the molecule in the fallback log.
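The embedding stage is an ordered fallback chain that ends in a 2D layout. The sketch below shows only that control flow, with each strategy passed in as a callable that returns conformer ids (empty on failure). In a real RDKit pipeline each callable would presumably wrap `AllChem.EmbedMultipleConfs(mol, numConfs=n, params=p)` with `p` from `AllChem.ETKDGv3()`, `AllChem.ETKDGv2()`, or `AllChem.ETDG()` (setting `p.useRandomCoords = True` for the last stage), and the 2D step would call `AllChem.Compute2DCoords(mol)`. The function names, labels, and warning wording are hypothetical.

```python
import sys
from typing import Callable, List, Sequence, Tuple

# Each strategy is (label, embed_fn); embed_fn returns embedded conformer ids,
# or an empty list when that strategy could not embed anything.
Strategy = Tuple[str, Callable[[], List[int]]]


def _warn_stderr(message: str) -> None:
    print(message, file=sys.stderr)


def embed_with_fallback(
    strategies: Sequence[Strategy],
    fallback_2d: Callable[[], None],
    warn: Callable[[str], None] = _warn_stderr,
) -> Tuple[str, List[int]]:
    """Try 3D strategies in order; fall back to a 2D layout only if all fail.

    Returns (label, conformer_ids). In the 2D case the label is "2d_fallback"
    and the id list is empty, because no 3D conformer exists to rank.
    """
    for label, embed in strategies:
        conf_ids = list(embed())
        if conf_ids:  # at least one conformer embedded: later strategies are skipped
            return label, conf_ids
    fallback_2d()  # e.g. a wrapper around AllChem.Compute2DCoords(mol)
    warn("[WARN] 3D embedding failed; using 2D coordinates")  # illustrative wording
    return "2d_fallback", []


if __name__ == "__main__":
    def fails() -> List[int]:
        return []

    chain = [
        ("ETKDGv3", fails),
        ("ETKDGv2", lambda: [0, 1, 2]),  # pretend ETKDGv2 succeeds
        ("ETDG", fails),
        ("ETDG+useRandomCoords", fails),
    ]
    print(embed_with_fallback(chain, fallback_2d=lambda: None))
```

Stopping at the first stage that yields any conformer keeps the better-informed ETKDG variants in charge whenever they work; the cruder stages only run for molecules that need them.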
Output Format Notes
SDF output (`--format sdf`, default):
- Standard V2000 multi-molecule SDF, one conformer per molecule.
- The molecule name (from `--name`, `--name-col`, or auto-generated `mol_<i>`) is written to the SDF header line.
- Compatible with most cheminformatics tools (RDKit, Open Babel, Schrödinger, etc.).
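Because each record's name sits on the first line of its molfile header and records end with a `$$$$` line, the names in a multi-molecule SDF can be spot-checked without a toolkit. The sketch below is a quick check, not a full SDF parser, and `sdf_record_names` is a hypothetical helper; with RDKit available, `Chem.SDMolSupplier` and `mol.GetProp("_Name")` give the same information.

```python
from typing import List


def sdf_record_names(sdf_text: str) -> List[str]:
    """Return the header (name) line of every $$$$-terminated SDF record."""
    names = []
    current: List[str] = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            if current:
                names.append(current[0].strip())  # first line = molecule name
            current = []
        else:
            current.append(line)
    # Anything after the last $$$$ (e.g. trailing blank lines) is ignored.
    return names
```

For example, `sdf_record_names(Path("data.sdf").read_text())` (with `pathlib.Path`) should list one name per input molecule that was not skipped.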
XYZ output (`--format xyz`):
- Concatenated XYZ blocks (element, x, y, z per atom).
- The molecule name is written as the comment line (second line of each block).
- Coordinates are in Angstroms.
- Note: if `--no-hs` is used, hydrogen atoms are absent from the XYZ.
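To make the block layout concrete, the sketch below writes molecules in the standard XYZ shape: an atom-count line, the molecule name as the comment line, then one `element x y z` line per atom in Angstroms. The six-decimal formatting and the helper names are choices made for this example, not taken from the script.

```python
from typing import Sequence, Tuple

Atom = Tuple[str, float, float, float]  # (element, x, y, z) in Angstroms


def xyz_block(name: str, atoms: Sequence[Atom]) -> str:
    """Format one molecule as an XYZ block: count line, name line, atom lines."""
    lines = [str(len(atoms)), name]
    for element, x, y, z in atoms:
        lines.append(f"{element:<2s} {x:12.6f} {y:12.6f} {z:12.6f}")
    return "\n".join(lines) + "\n"


def concat_xyz(molecules: Sequence[Tuple[str, Sequence[Atom]]]) -> str:
    """Concatenate one block per molecule, as in a multi-molecule .xyz file."""
    return "".join(xyz_block(name, atoms) for name, atoms in molecules)
```

Since the format carries no bonds or charges, XYZ output suits QM input preparation better than downstream cheminformatics; SDF keeps the connectivity.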
Fallback log (`*.fallback.csv`):
- Written only when at least one molecule fell back to 2D.
- Columns: `idx`, `smiles`, `name`, `dim` (always 2), `ff` (always `2d_fallback`), `note`.

Skipped log (`*.skipped.csv`):
- Written only when at least one SMILES was skipped.
- Columns: `idx`, `smiles`, `error`.
Agent Checklist
When using this skill for users:
- Confirm the input format:
  - `.csv` requires a SMILES column (default `smiles`).
  - `.smi` uses the first token per line as the SMILES and the second token (if present) as the name.
- Quote SMILES containing special shell characters (brackets/parentheses):
  - Example: `--smiles "[C@@H](O)(F)Cl"`
- For CSV workflows, verify the column names:
  - `--smiles-col` for the SMILES column
  - `--name-col` (optional) for molecule identifiers to embed in SDF/XYZ headers
- Check the `[INFO] Done:` summary line for the 3D/2D/skip breakdown.
- If 2D fallbacks occurred, inspect `*.fallback.csv`:
  - Consider `--use-random-coords` or `--max-attempts` tuning for the affected SMILES.
  - 2D conformers have Z=0 and are not suitable for 3D-based applications (docking, 3D QSAR).
- Always capture absolute output paths:
  - Look for `[RESULT] ...=/abs/path` in stdout.
- If debugging is needed, enable full tracebacks:
  - `RDKIT_CONF_HELPER_TRACE=1 uv run <skill_path>/scripts/rdkit_conf_helper.py ...`
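When an agent launches the script from Python, passing arguments as a list to `subprocess.run` sidesteps shell quoting entirely, and `shlex.join` (Python 3.8+) produces a safely quoted string when the command must be shown to a user or pasted into a shell. The sketch only builds the command from flags documented above; `build_conf_command` is a hypothetical helper, and `/path/to/rdkit-conf` stands in for the real `<skill_path>`.

```python
import shlex
from typing import List, Optional


def build_conf_command(
    skill_path: str,
    smiles: str,
    output: str,
    name: Optional[str] = None,
    num_confs: Optional[int] = None,
    ff: Optional[str] = None,
    no_env: bool = False,
) -> List[str]:
    """Assemble an argv list for `uv run .../rdkit_conf_helper.py conf ...`."""
    cmd = ["uv", "run", f"{skill_path}/scripts/rdkit_conf_helper.py"]
    if no_env:
        cmd.append("--no-env")  # global flag goes before the subcommand
    cmd += ["conf", "--smiles", smiles, "--output", output]
    if name is not None:
        cmd += ["--name", name]
    if num_confs is not None:
        cmd += ["--num-confs", str(num_confs)]
    if ff is not None:
        cmd += ["--ff", ff]
    return cmd


if __name__ == "__main__":
    argv = build_conf_command("/path/to/rdkit-conf", "[C@@H](O)(F)Cl", "/tmp/chiral.sdf")
    # Pass argv directly to subprocess.run(argv, capture_output=True, text=True);
    # the quoted string below is only for display or copy-paste.
    print(shlex.join(argv))
```

The list form never goes through a shell, so brackets, parentheses, `#`, and `$` in SMILES need no escaping at all.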
References
- RDKit conformer generation guide: https://www.rdkit.org/docs/GettingStartedInPython.html#working-with-3d-molecules
- ETKDG paper: Riniker & Landrum, J. Chem. Inf. Model. 2015, 55, 2562
- ETKDGv3: Wang et al., J. Chem. Inf. Model. 2020, 60, 2044