Computational-chemistry-agent-skills unimol
install
source · Clone the upstream repo
git clone https://github.com/jinzhezenggroup/computational-chemistry-agent-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jinzhezenggroup/computational-chemistry-agent-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/molecular-representation/unimol" ~/.claude/skills/jinzhezenggroup-computational-chemistry-agent-skills-unimol && rm -rf "$T"
manifest: molecular-representation/unimol/SKILL.md
Uni-Mol
This skill provides practical command patterns for Uni-Mol molecular representation, training, and prediction through the standardized CLI wrapper <skill_path>/scripts/unimol_helper.py.
Key behaviors (important for Agents):
- The script prints environment detection (Python/Torch/CUDA) by default.
- Bad/illegal SMILES are skipped and logged to *.skipped.csv (no crash).
- Each run ends by printing absolute output paths like:
  [RESULT] repr_npy=/abs/path.npy
  [RESULT] model_dir=/abs/model_dir
  [RESULT] pred_csv=/abs/pred.csv
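Since each run ends with `[RESULT] key=/abs/path` lines, an agent can collect the output paths from captured stdout. The following is a minimal sketch that assumes the lines look exactly like the examples above; `parse_result_lines` is a hypothetical helper name, not part of the skill.

```python
import re

# Matches lines such as "[RESULT] pred_csv=/abs/pred.csv".
# Assumption: one key=value pair per line, as in the examples above.
RESULT_LINE = re.compile(r"^\[RESULT\]\s+(\w+)=(.+)$")


def parse_result_lines(stdout: str) -> dict:
    """Collect key -> path pairs from the helper's [RESULT] lines."""
    results = {}
    for line in stdout.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            results[match.group(1)] = match.group(2).strip()
    return results


sample = "some log line\n[RESULT] repr_npy=/abs/path.npy\n[RESULT] pred_csv=/abs/pred.csv\n"
print(parse_result_lines(sample))
```

Lines that do not start with `[RESULT]` are ignored, so the full stdout of a run can be passed in unchanged.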
Quick Start
Check CLI help:
uv run python <skill_path>/scripts/unimol_helper.py --help
Check subcommand help:
uv run python <skill_path>/scripts/unimol_helper.py repr --help
uv run python <skill_path>/scripts/unimol_helper.py train --help
uv run python <skill_path>/scripts/unimol_helper.py predict --help
Disable environment printing (optional):
uv run python <skill_path>/scripts/unimol_helper.py --no-env repr --smiles "CCO" --output out.npy
Core Tasks
1) Extract molecular representations (embedding) to .npy
Single SMILES:
uv run python <skill_path>/scripts/unimol_helper.py repr \
  --smiles "CCO" \
  --output /tmp/ccO.repr.npy
From CSV (default SMILES column is smiles):
uv run python <skill_path>/scripts/unimol_helper.py repr \
  --file data.csv \
  --smiles-col smiles \
  --output data.repr.npy
From SMI:
uv run python <skill_path>/scripts/unimol_helper.py repr \
  --file molecules.smi \
  --output molecules.repr.npy
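To preview which strings a .smi file would contribute, the sketch below applies the rule stated in the Agent Checklist of this document: the first token of each line is taken as the SMILES. It is an illustration only; `smiles_from_smi_lines` is a hypothetical name, and the helper's own parsing (for example, how it treats blank lines) may differ.

```python
def smiles_from_smi_lines(lines):
    """Return the first whitespace-separated token of each non-empty line."""
    smiles = []
    for line in lines:
        tokens = line.split()
        if tokens:  # assumption: blank lines are ignored
            smiles.append(tokens[0])
    return smiles


example = ["CCO ethanol\n", "\n", "c1ccccc1\tbenzene\n"]
print(smiles_from_smi_lines(example))
```

Passing an open file handle instead of a list works the same way, since both yield lines.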
Force CPU / GPU:
# Force CPU
uv run python <skill_path>/scripts/unimol_helper.py repr --smiles "CCO" --no-gpu --output out.npy
# Force GPU (will warn & fall back if CUDA is unavailable)
uv run python <skill_path>/scripts/unimol_helper.py repr --smiles "CCO" --use-gpu --output out.npy
2) Train a property model (classification / regression / multilabel_*)
Regression training (CSV must contain smiles and target columns):
uv run python <skill_path>/scripts/unimol_helper.py train \
  --task regression \
  --input train.csv \
  --smiles-col smiles \
  --target-col target \
  --epochs 50 \
  --output ./model_reg
Classification training:
uv run python <skill_path>/scripts/unimol_helper.py train \
  --task classification \
  --input train.csv \
  --smiles-col smiles \
  --target-col target \
  --epochs 50 \
  --output ./model_cls
Multilabel regression training (explicit multi-target columns):
uv run python <skill_path>/scripts/unimol_helper.py train \
  --task multilabel_regression \
  --input train.csv \
  --smiles-col smiles \
  --target-cols target_0,target_1,target_2 \
  --epochs 50 \
  --output ./model_mreg
Multilabel classification training:
uv run python <skill_path>/scripts/unimol_helper.py train \
  --task multilabel_classification \
  --input train.csv \
  --smiles-col smiles \
  --target-cols y_cls_0,y_cls_1,y_cls_2 \
  --epochs 50 \
  --output ./model_mcls
Target recognition for training:
- Single-task (regression/classification): use --target-col (default target).
- Multilabel tasks: prefer --target-cols (comma-separated).
- If --target-cols is omitted for multilabel tasks, the helper auto-detects columns named target or prefixed with target_ (case-insensitive).
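The auto-detection rule for multilabel targets can be checked against a CSV header before training. The sketch below re-implements the rule as described here (a column named target, or prefixed with target_, compared case-insensitively); it is not the helper's actual code, and `detect_target_columns` is a hypothetical name.

```python
def detect_target_columns(header):
    """Return the header names that the described auto-detection rule would match."""
    found = []
    for name in header:
        lowered = name.strip().lower()
        if lowered == "target" or lowered.startswith("target_"):
            found.append(name)
    return found


print(detect_target_columns(["smiles", "Target_0", "target_1", "y_cls_0"]))
```

Under this rule, columns such as y_cls_0 are not picked up, which is why the multilabel classification example passes them explicitly through --target-cols.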
Force CPU:
uv run python <skill_path>/scripts/unimol_helper.py train \
  --task regression \
  --input train.csv \
  --epochs 50 \
  --output ./model_cpu \
  --no-cuda
3) Predict properties to .csv
Predict from CSV:
uv run python <skill_path>/scripts/unimol_helper.py predict \
  --model ./model_reg \
  --input test.csv \
  --smiles-col smiles \
  --output pred.csv
Predict from SMI:
uv run python <skill_path>/scripts/unimol_helper.py predict \
  --model ./model_reg \
  --input test.smi \
  --output pred.csv
Notes:
- Output CSV contains the input rows (for valid SMILES) plus pred/pred_* columns.
- If there are bad SMILES, they are skipped and saved to pred.csv.skipped.csv (or your --error-log path).
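After a prediction run, it can help to check how many SMILES were skipped. The sketch below derives the default log path by appending .skipped.csv to the output path, as in the pred.csv.skipped.csv example above. Two assumptions are not confirmed by this document: that the log has a single header row, and that the file may be absent when nothing was skipped. `count_skipped` is a hypothetical name.

```python
import csv
from pathlib import Path
from typing import Optional


def count_skipped(pred_csv: str, error_log: Optional[str] = None) -> int:
    """Count data rows in the skipped-SMILES log; return 0 if the log is absent."""
    log_path = Path(error_log) if error_log else Path(pred_csv + ".skipped.csv")
    if not log_path.exists():
        return 0
    with log_path.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    # Assumption: the first row is a header; the real layout may differ.
    return max(len(rows) - 1, 0)
```

A non-zero count is the signal to open the log and decide whether the affected SMILES should be fixed or dropped.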
Agent Checklist
When using this skill for users:
- Confirm input format:
  - .csv requires a SMILES column (default smiles)
  - .smi uses the first token of each line as SMILES
- Quote SMILES containing special characters (brackets/parentheses):
  - Example: --smiles "[C]([H])([H])[H]"
- For CSV workflows, verify column names:
  - repr: --smiles-col
  - train: --smiles-col and --target-col / --target-cols
  - predict: --smiles-col
- Watch for skipped SMILES:
  - Check *.skipped.csv and decide whether to fix or permanently drop them
- Always capture absolute output paths:
  - Look for [RESULT] ...=/abs/path in stdout
- If debugging is needed, enable full traceback:
  UNIMOL_HELPER_TRACE=1 uv run python <skill_path>/scripts/unimol_helper.py ...
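One way to sidestep the quoting issue from the checklist is to build the command as an argument list, so no shell interprets the brackets and parentheses. The sketch below uses only the repr flags shown earlier in this document; `build_repr_command` and the `/path/to/skill` placeholder are hypothetical.

```python
import shlex


def build_repr_command(skill_path: str, smiles: str, output: str) -> list:
    """Build the repr invocation as an argument list (no shell quoting needed)."""
    return [
        "uv", "run", "python",
        f"{skill_path}/scripts/unimol_helper.py",
        "repr", "--smiles", smiles, "--output", output,
    ]


cmd = build_repr_command("/path/to/skill", "[C]([H])([H])[H]", "out.npy")
# shlex.join shows a shell-safe rendering, useful for logging the command.
print(shlex.join(cmd))
```

The list can be passed directly to `subprocess.run(cmd, capture_output=True, text=True)`, and the captured stdout then scanned for the `[RESULT]` lines.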
References
- Uni-Mol project: https://github.com/fanxiaoyu0/Uni-Mol
- RDKit: https://www.rdkit.org/