Computational-chemistry-agent-skills rdkit-conf

install

source · Clone the upstream repo

git clone https://github.com/jinzhezenggroup/computational-chemistry-agent-skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/jinzhezenggroup/computational-chemistry-agent-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/molecular-conformer/rdkit-conf" ~/.claude/skills/jinzhezenggroup-computational-chemistry-agent-skills-rdkit-conf && rm -rf "$T"

manifest: molecular-conformer/rdkit-conf/SKILL.md

RDKit Conformer Generation

This skill provides practical command patterns for RDKit 3D/2D conformer generation using the standardized CLI wrapper:

<skill_path>/scripts/rdkit_conf_helper.py

Key behaviors (important for Agents):

- The script prints environment detection (Python/RDKit/Pandas) by default.
- Multi-conformer sampling: embeds `--num-confs` conformers (default 10) per molecule via `EmbedMultipleConfs`, optimizes each with the chosen force field, and keeps the lowest-energy one. Set `--num-confs 1` to revert to single-conformer behavior.
- 2D fallback: if all 3D embedding attempts fail, `Compute2DCoords` is used instead and a `[WARN]` line is printed to stderr for that molecule.
- Bad/illegal SMILES are skipped entirely and logged to `*.skipped.csv` (no crash).
- Molecules that fell back to 2D are additionally logged to `*.fallback.csv`.

Each run ends with a summary line and absolute output paths:

- `[INFO] Done: <N_3d> 3D, <N_2d> 2D-fallback, <N_skip> skipped (total input: <N>)`
- `[RESULT] conf_sdf=/abs/path.sdf` or `[RESULT] conf_xyz=/abs/path.xyz`
- `[RESULT] fallback_csv=/abs/path.fallback.csv` (only if any 2D fallbacks occurred)
- `[RESULT] skipped_csv=/abs/path.skipped.csv` (only if any SMILES were skipped)
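
An agent that captures stdout can pull the counts and paths out of these lines programmatically. The following is a minimal sketch, assuming the line formats are exactly as documented above; `parse_run_output` and the sample text are hypothetical and not part of the skill.

```python
import re

# Patterns mirror the documented summary and [RESULT] line formats.
DONE_RE = re.compile(
    r"\[INFO\] Done: (\d+) 3D, (\d+) 2D-fallback, (\d+) skipped \(total input: (\d+)\)"
)
RESULT_RE = re.compile(r"^\[RESULT\] (\w+)=(.+)$")


def parse_run_output(stdout_text):
    """Extract the summary counts and the [RESULT] paths from captured stdout."""
    counts = None
    paths = {}
    for line in stdout_text.splitlines():
        line = line.strip()
        done = DONE_RE.search(line)
        if done:
            n_3d, n_2d, n_skip, total = (int(g) for g in done.groups())
            counts = {"3d": n_3d, "2d_fallback": n_2d, "skipped": n_skip, "total": total}
            continue
        result = RESULT_RE.match(line)
        if result:
            paths[result.group(1)] = result.group(2)
    return counts, paths


# Made-up sample output used only to exercise the parser.
sample = """\
[INFO] Done: 8 3D, 1 2D-fallback, 1 skipped (total input: 10)
[RESULT] conf_sdf=/abs/path.sdf
[RESULT] fallback_csv=/abs/path.fallback.csv
[RESULT] skipped_csv=/abs/path.skipped.csv
"""
counts, paths = parse_run_output(sample)
print(counts)
print(paths)
```

Checking for the `fallback_csv` and `skipped_csv` keys is enough to tell whether either log was written, since those lines appear only when needed.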

Quick Start

Check CLI help:

uv run <skill_path>/scripts/rdkit_conf_helper.py --help
uv run <skill_path>/scripts/rdkit_conf_helper.py conf --help

Disable environment printing (optional):

uv run <skill_path>/scripts/rdkit_conf_helper.py --no-env conf --smiles "CCO" --output out.sdf

Core Tasks

1) Generate 3D conformers (SDF output, default)

Single SMILES:

uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
    --smiles "CCO" \
    --output /tmp/CCO.sdf

Single SMILES with a custom molecule name:

uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
    --smiles "c1ccccc1" \
    --name benzene \
    --output /tmp/benzene.sdf

From CSV (default SMILES column: `smiles`):

uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
    --file data.csv \
    --smiles-col smiles \
    --output data.sdf

From CSV with a name column:

uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
    --file data.csv \
    --smiles-col smiles \
    --name-col compound_id \
    --output data.sdf
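
If the input CSV does not exist yet, it can be prepared with the standard library. This is a sketch only: the column names `smiles` and `compound_id` match the flags used above, while `write_input_csv`, the records, and the temporary path are illustrative assumptions.

```python
import csv
import os
import tempfile


def write_input_csv(path, records):
    """Write (compound_id, smiles) pairs to a CSV file with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["compound_id", "smiles"])
        writer.writeheader()
        for compound_id, smiles in records:
            writer.writerow({"compound_id": compound_id, "smiles": smiles})


# Illustrative molecules; replace with real identifiers and SMILES.
records = [("ethanol", "CCO"), ("benzene", "c1ccccc1")]
csv_path = os.path.join(tempfile.mkdtemp(), "data.csv")
write_input_csv(csv_path, records)

# Read the file back to confirm the header and rows.
with open(csv_path, newline="") as handle:
    rows = list(csv.DictReader(handle))
print(rows)
```

The resulting file can then be passed via `--file`, with `--smiles-col smiles --name-col compound_id`.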

From SMI (second token per line is used as name automatically):

uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
    --file molecules.smi \
    --output molecules.sdf
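
The SMI convention (first token is the SMILES, optional second token is the name) can be sketched as below. This is not the helper's actual parser: `parse_smi_lines` is hypothetical, and skipping blank lines and numbering auto-generated `mol_<i>` names from 1 are assumptions.

```python
def parse_smi_lines(lines):
    """Split SMI lines into (smiles, name) pairs."""
    records = []
    for raw in lines:
        tokens = raw.split()
        if not tokens:
            continue  # assumption: blank lines are ignored
        smiles = tokens[0]
        # Assumption: unnamed molecules get mol_<i>, counted from 1 in record order.
        name = tokens[1] if len(tokens) > 1 else f"mol_{len(records) + 1}"
        records.append((smiles, name))
    return records


smi_text = "CCO ethanol\nc1ccccc1 benzene\nCC(=O)O\n"
records = parse_smi_lines(smi_text.splitlines())
print(records)
```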

2) Control conformer sampling count

Default (10 conformers sampled, lowest-energy kept):

uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
    --file data.csv --output data.sdf

Single conformer (fastest, least thorough):

uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
    --file data.csv --num-confs 1 --output data.sdf

Increase sampling for flexible or macrocyclic molecules:

uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
    --file data.csv --num-confs 50 --output data.sdf

3) Choose force-field minimization

MMFF94s (default, falls back to UFF if unavailable):

uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
    --file data.csv --ff mmff94s --output data.mmff.sdf

UFF (universal force field):

uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
    --file data.csv --ff uff --output data.uff.sdf

Skip force-field optimization (raw ETKDG geometry only):

uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
    --file data.csv --ff none --output data.etkdg_raw.sdf

4) XYZ output

uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
    --file data.csv \
    --format xyz \
    --output data.xyz

5) Tuning embedding for difficult molecules

Large or macrocyclic molecules sometimes fail standard ETKDG; try random initial coordinates:

uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
    --file macrocycles.csv \
    --use-random-coords \
    --max-attempts 500 \
    --output macrocycles.sdf

Use a different random seed (reproducibility):

uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
    --file data.csv --seed 123 --output data.seed123.sdf

Non-deterministic embedding (seed = -1):

uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
    --file data.csv --seed -1 --output data.sdf

6) Suppress hydrogen addition

By default, explicit H atoms are added before embedding for more accurate 3D geometry. Use `--no-hs` to keep the molecule as-is (heavy atoms only):

uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
    --file data.csv --no-hs --output data.noh.sdf

7) Custom log file paths

uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
    --file data.csv \
    --output data.sdf \
    --error-log logs/skipped.csv \
    --fallback-log logs/used_2d.csv

3D Embedding Pipeline Details

For each molecule, the script runs the following steps in order:

1. Parse SMILES via `Chem.MolFromSmiles`.
2. Add hydrogens (`Chem.AddHs`); skipped with `--no-hs`.
3. Multi-conformer 3D embedding (`EmbedMultipleConfs`, `--num-confs` candidates, default 10): tries ETKDGv3, then ETKDGv2, then ETDG, then ETDG + `useRandomCoords` as a fallback chain until at least one conformer is embedded.
4. Force-field minimization (if `--ff` is not `none`): each successfully embedded conformer is individually optimized. MMFF94s transparently falls back to UFF if parameters are unavailable for that molecule.
5. Lowest-energy selection: the conformer with the minimum post-optimization energy is retained; all others are discarded. If `--ff none`, the first embedded conformer is kept without energy ranking.
6. 2D fallback (if all 3D attempts yield zero conformers): generates a flat 2D layout via `Compute2DCoords` (Z=0 for all atoms), prints a `[WARN]` to stderr, and records the molecule in the fallback log.
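
The control flow of the embedding fallback chain and the lowest-energy selection can be sketched without RDKit. Everything here is illustrative: `embed_with_fallback`, `pick_lowest_energy`, the stand-in embedders, and the energies are hypothetical, and in a real run each strategy would wrap an RDKit embedding call rather than a lambda.

```python
def embed_with_fallback(strategies, num_confs):
    """Try each (label, embed_fn) in order; return the first non-empty result."""
    for label, embed_fn in strategies:
        conf_ids = list(embed_fn(num_confs))
        if conf_ids:
            return label, conf_ids
    # No strategy produced a conformer: the caller would use the 2D fallback.
    return None, []


def pick_lowest_energy(conf_ids, energies):
    """Return the (conf_id, energy) pair with the minimum energy."""
    return min(zip(conf_ids, energies), key=lambda pair: pair[1])


# Stand-in embedders: the first "fails" (no conformers), the second succeeds.
strategies = [
    ("ETKDGv3", lambda n: []),
    ("ETKDGv2", lambda n: range(n)),
    ("ETDG", lambda n: range(n)),
]
label, conf_ids = embed_with_fallback(strategies, num_confs=3)

# Made-up post-optimization energies, one per embedded conformer.
energies = [12.4, 9.8, 10.1]
best_id, best_energy = pick_lowest_energy(conf_ids, energies)
print(label, best_id, best_energy)
```

Stopping at the first strategy that yields any conformer keeps the cheaper, knowledge-based methods preferred and reserves the later ones for molecules that fail them.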

Output Format Notes

SDF output (`--format sdf`, default):

- Standard V2000 multi-molecule SDF, one conformer per molecule.
- Molecule name (from `--name`, `--name-col`, or auto-generated `mol_<i>`) is written to the SDF header line.
- Compatible with most cheminformatics tools (RDKit, OpenBabel, Schrodinger, etc.).

XYZ output (`--format xyz`):

- Concatenated XYZ blocks (element, x, y, z per atom).
- Molecule name is written as the comment line (second line of each block).
- Coordinates are in Angstroms.
- Note: if `--no-hs` is used, hydrogen atoms are absent from the XYZ.
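
A concatenated XYZ file can be split back into molecules with a few lines of Python. This sketch assumes the conventional XYZ layout, with an atom-count line first, then the comment line carrying the name, then one line per atom; `read_xyz_blocks`, `looks_2d`, and the sample coordinates are hypothetical.

```python
def read_xyz_blocks(text):
    """Parse concatenated XYZ blocks into a list of {name, atoms} dicts."""
    lines = text.splitlines()
    molecules = []
    pos = 0
    while pos < len(lines):
        if not lines[pos].strip():
            pos += 1  # tolerate blank lines between blocks
            continue
        n_atoms = int(lines[pos])       # line 1: atom count (assumed layout)
        name = lines[pos + 1].strip()   # line 2: comment line with the name
        atoms = []
        for atom_line in lines[pos + 2 : pos + 2 + n_atoms]:
            element, x, y, z = atom_line.split()[:4]
            atoms.append((element, float(x), float(y), float(z)))
        molecules.append({"name": name, "atoms": atoms})
        pos += 2 + n_atoms
    return molecules


def looks_2d(molecule, tol=1e-6):
    """Heuristic: every Z coordinate is zero, as in a 2D fallback layout."""
    return all(abs(z) <= tol for _, _, _, z in molecule["atoms"])


# Made-up coordinates for illustration only.
sample = """\
3
water
O 0.000 0.000 0.117
H 0.000 0.757 -0.467
H 0.000 -0.757 -0.467
2
flat_example
C 0.000 0.000 0.000
C 1.500 0.000 0.000
"""
molecules = read_xyz_blocks(sample)
print([(m["name"], len(m["atoms"]), looks_2d(m)) for m in molecules])
```

The `looks_2d` check is only a heuristic; the fallback log remains the authoritative record of which molecules were written with 2D coordinates.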

Fallback log (`*.fallback.csv`):

- Written only when at least one molecule fell back to 2D.
- Columns: `idx`, `smiles`, `name`, `dim` (always 2), `ff` (always `2d_fallback`), `note`.
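
One way to act on the fallback log is to turn it into a retry input. The sketch below assumes only the column names listed above; the log rows, the `note` text, and the choice of `compound_id` as the name column are made up for illustration.

```python
import csv
import io

# Made-up log contents; only the column names come from the notes above.
fallback_log = io.StringIO(
    "idx,smiles,name,dim,ff,note\n"
    "3,C1CCCCCCCCCCCCCC1,macro_1,2,2d_fallback,example note\n"
    "7,C1CCCCCCCCCCCCCCCC1,macro_2,2,2d_fallback,example note\n"
)
rows = list(csv.DictReader(fallback_log))

# Build a retry input with a SMILES column and a name column.
retry_csv = io.StringIO()
writer = csv.DictWriter(
    retry_csv, fieldnames=["compound_id", "smiles"], lineterminator="\n"
)
writer.writeheader()
for row in rows:
    writer.writerow({"compound_id": row["name"], "smiles": row["smiles"]})
print(retry_csv.getvalue())
```

Saved to a file, the retry list can be rerun with `--use-random-coords` and a larger `--max-attempts`, using `--name-col compound_id`.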

Skipped log (`*.skipped.csv`):

- Written only when at least one SMILES was skipped.
- Columns: `idx`, `smiles`, `error`.
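
For large batches it can help to group skipped entries by error message before reporting back to the user. A minimal sketch, assuming the three columns above; the rows and error text are invented placeholders, not messages the script is known to emit.

```python
import csv
import io
from collections import Counter

# Made-up log contents; only the column names come from the notes above.
skipped_log = io.StringIO(
    "idx,smiles,error\n"
    "4,C1CC,example parse error\n"
    "9,not_a_smiles,example parse error\n"
)
rows = list(csv.DictReader(skipped_log))

# Count how often each error message occurs and list the offending inputs.
error_counts = Counter(row["error"] for row in rows)
skipped_smiles = [row["smiles"] for row in rows]
print(error_counts.most_common())
print(skipped_smiles)
```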

Agent Checklist

When using this skill for users:

- Confirm input format:
  - `.csv` requires a SMILES column (default `smiles`)
  - `.smi` uses the first token per line as SMILES, second token (if present) as name
- Quote SMILES containing special shell characters (brackets/parentheses):
  - Example: `--smiles "[C@@H](O)(F)Cl"`
- For CSV workflows, verify column names:
  - `--smiles-col` for the SMILES column
  - `--name-col` (optional) for molecule identifiers to embed in SDF/XYZ headers
- Check the `[INFO] Done:` summary line for the 3D/2D/skip breakdown.
- If 2D fallbacks occurred, inspect `*.fallback.csv`:
  - Consider `--use-random-coords` or `--max-attempts` tuning for the affected SMILES.
  - 2D conformers have Z=0 and are not suitable for 3D-based applications (docking, 3D QSAR).
- Always capture absolute output paths:
  - Look for `[RESULT] ...=/abs/path` in stdout.
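
When the command is launched from Python rather than typed into a shell, building it as an argument list sidesteps the quoting issue from the checklist. This is a sketch under assumptions: `build_conf_command` is a hypothetical helper, the script path keeps the `<skill_path>` placeholder, and only flags documented above are used.

```python
import shlex


def build_conf_command(script, smiles, output, name=None, num_confs=None):
    """Assemble the argv list for a single-SMILES `conf` run."""
    argv = ["uv", "run", script, "conf", "--smiles", smiles, "--output", output]
    if name is not None:
        argv += ["--name", name]
    if num_confs is not None:
        argv += ["--num-confs", str(num_confs)]
    return argv


argv = build_conf_command(
    "<skill_path>/scripts/rdkit_conf_helper.py",  # placeholder path, fill in before use
    "[C@@H](O)(F)Cl",
    "/tmp/chiral.sdf",
    num_confs=20,
)

# shlex.join produces a copy-pasteable, correctly quoted shell command line.
print(shlex.join(argv))
```

Passing `argv` directly to `subprocess.run(argv, capture_output=True, text=True)` avoids the shell entirely, so brackets and parentheses in the SMILES need no escaping; `shlex.join` is only needed when the command is shown to a user or written to a log.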

If debugging is needed, enable full traceback:

RDKIT_CONF_HELPER_TRACE=1 uv run <skill_path>/scripts/rdkit_conf_helper.py ...

References

RDKit conformer generation guide: https://www.rdkit.org/docs/GettingStartedInPython.html#working-with-3d-molecules
ETKDG paper: Riniker & Landrum, J. Chem. Inf. Model. 2015, 55, 2562
ETKDGv3: Wang et al., J. Chem. Inf. Model. 2020, 60, 2044