# dpgen-simplify

Prepare, explain, validate, and run DP-GEN simplify workflows for reducing repeated or redundant DeepMD datasets. Use when the user wants to generate or modify `param.json` and `machine.json`, run `dpgen simplify param.json machine.json`, organize repeated simplify experiments, or inspect simplify outputs.

Clone the full repository:

```shell
git clone https://github.com/jinzhezenggroup/computational-chemistry-agent-skills
```

Or install only this skill into `~/.claude/skills`:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/jinzhezenggroup/computational-chemistry-agent-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/machine-learning-potentials/dpgen-simplify" ~/.claude/skills/jinzhezenggroup-computational-chemistry-agent-skills-dpgen-simplify && rm -rf "$T"
```

Skill file: `machine-learning-potentials/dpgen-simplify/SKILL.md`

# DP-GEN Simplify
Use this skill when the user wants to prepare, explain, validate, or execute the `dpgen simplify` workflow.

This skill is for dataset simplification workflows where the user already has candidate data in DeepMD-compatible format and wants to reduce repeated or redundant structures through iterative selection.
## Core Rule (Critical)

DP-GEN simplify always uses two parameter classes and therefore two JSON files:

- Workflow parameters -> `param.json`
- Execution / machine parameters -> `machine.json`

Run exactly:

```shell
dpgen simplify param.json machine.json
```
Environment boundary rule:

- Outer layer: run `dpgen simplify param.json machine.json` in an activated environment where `dpgen --version` works.
- Inner layer: for scheduler stages, explicitly activate the runtime in `resources.source_list` on the server side.
## Agent responsibilities

When using this skill, the agent should:

- confirm that the task is a simplify workflow
- check whether existing configs or templates are already available
- collect only the missing dataset, training, FP, and machine inputs
- generate or patch `param.json`
- generate or patch `machine.json`
- explain important simplify parameters in plain language when asked
- validate the workflow before execution
- provide the exact command for running simplify
- if requested, help structure repeated experiments
- after execution, summarize outputs and next inspection targets
## Working policy

### 1. Ask only for missing inputs

Do not ask the user for everything if part of the configuration is already available.

If the user already provides:

- a partial `param.json`
- a partial `machine.json`
- a known training template
- a known cluster template

then patch those files instead of rebuilding everything from scratch.
### 2. Preserve the user's scientific choices

Do not silently change:

- descriptor family
- fitting net structure
- fp backend
- trust thresholds
- `type_map` ordering

If a value looks scientifically questionable, explain the concern instead of silently replacing it.
### 3. Keep local and scheduler execution explicit

If the user wants local execution, produce local-friendly commands.
If the user wants scheduler execution, produce scheduler-friendly commands and keep queue, partition, and resource requests explicit.

Do not invent scheduler module names or executable paths.
### 4. Do not invent environment activation commands

If the user already has a working activation command such as:

- `conda activate ...`
- `module load ...`
- `source ...`

reuse it exactly.

If execution is requested and the activation method is unknown, ask the user for the precise activation command.
Do not guess conda environment names, module names, or site-specific paths.
### 4.1 Outer launcher policy

Use an activated DP-GEN environment and verify with:

```shell
dpgen --version
```

Do not start simplify from a shell where `dpgen` is unavailable.
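The outer-launcher check above can be wrapped in a small guard. This is a sketch, not part of DP-GEN itself; `require_cmd` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: fail fast if a required launcher is missing from PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing from PATH: $1" >&2; return 1; }
}

# Outer-layer check before launching simplify would look like:
#   require_cmd dpgen && dpgen --version
```

This keeps the failure explicit instead of letting `dpgen simplify` die mid-launch with a cryptic shell error.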
### 4.2 Outer vs inner runtime boundaries (critical)

Treat simplify execution as two separate environment layers:

- Outer layer: the shell that launches `dpgen simplify param.json machine.json` (must have `dpgen` in PATH)
- Inner layer: stage tasks dispatched by DP-GEN (`train` / `model_devi` / `fp`) on the server/runtime side

Even if the outer layer is correct, inner stage tasks still need explicit runtime setup in `machine.json`.
Do not assume the outer shell environment will be inherited by dispatched stage jobs.
For scheduler-style execution, `resources.source_list` must explicitly activate the required runtime environment.
### 5. Prefer reproducible output layout

When generating a simplify workflow, keep files organized and predictable.

Recommended structure:

```
project/
├── param.json
├── machine.json
├── run.sh
├── logs/
└── summary/
```

For repeated experiments:

```
project/
├── base/
├── exp_01/
├── exp_02/
├── exp_03/
└── summary/
```
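Scaffolding the repeated-experiment layout can be sketched in a few shell lines. This assumes `base/` already holds the shared `param.json` and `machine.json`; directory names follow the recommended structure and are otherwise arbitrary:

```shell
# Sketch: derive exp_01..exp_03 from a shared base configuration.
mkdir -p project/base project/summary
for i in 01 02 03; do
  mkdir -p "project/exp_$i"
  # copy the base configs if they exist; each variant is then patched per experiment
  cp project/base/*.json "project/exp_$i/" 2>/dev/null || true
done
```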
## Minimum required inputs

Collect the following information before generating files.

### Dataset information

- `pick_data`
- `sys_configs`
- `init_data_prefix`
- `init_data_sys`
- `sys_batch_size`
- dataset format
- `type_map`
- `mass_map` if needed
- `labeled`

### Simplify controls

- `init_pick_number`
- `iter_pick_number`
- `model_devi_f_trust_lo`
- `model_devi_f_trust_hi`
- `model_devi_e_trust_lo` / `model_devi_e_trust_hi` if energy trust is used
- `numb_models` if not already specified
### Training setup

- `train_backend` if required by environment (for example `pytorch`)
- `default_training_param`
- descriptor settings
- fitting network settings
- learning rate settings
- loss settings
- training step settings
### FP setup

- `fp_style`
- If data is already labeled (energy/force/virial available) and no re-labeling is requested, set `fp_style` to `"none"`.
- If `fp_style != "none"`, collect matching FP runtime settings such as:
  - `fp_task_max`
  - `fp_task_min`
  - `fp_params`
  - pseudopotential or backend file paths if required
### Execution setup

For each stage (`train`, `model_devi`, and `fp`), collect or preserve:

- `command`
- `machine.batch_type`
- `machine.context_type`
- `machine.local_root`
- `machine.remote_root`
- `resources.number_node`
- `resources.cpu_per_node`
- `resources.gpu_per_node`
- `resources.group_size`
- `resources.source_list` (required for scheduler jobs; use it to activate the environment explicitly)
- any explicit queue / partition / custom scheduler flags if the user already uses them

Choose a runtime profile first, then fill the matching template:

- server-local Slurm: `assets/machine.template.server-local-slurm.json`
- local machine -> remote Slurm via SSH: `assets/machine.template.ssh-remote-slurm.json`
- pure local shell testing: `assets/machine.template.local-shell.json`
## How to build `param.json`

Construct `param.json` around these logical blocks:

- element and mass definitions
- data source and batch settings
- model ensemble count
- default DeePMD training parameters
- FP backend settings
- simplify pick settings
- trust thresholds

Key fields usually include:

- `type_map`
- `mass_map`
- `pick_data`
- `init_data_prefix`
- `init_data_sys`
- `sys_batch_size`
- `numb_models`
- `default_training_param`
- `fp_style`
- `shuffle_poscar`
- `fp_task_max`
- `fp_task_min`
- `fp_pp_path`
- `fp_pp_files`
- `fp_params`
- `init_pick_number`
- `iter_pick_number`
- `model_devi_f_trust_lo`
- `model_devi_f_trust_hi`

If the user is doing grid experiments, keep a base template and derive variants from it.

Official reference example (QM7-style, adapted with path placeholders):
`assets/param.example.qm7.from-official-docs.json`
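A minimal skeleton showing how those blocks fit together. Every path, element, and numeric value here is a placeholder illustration, not a recommendation; take real values from the user's dataset and the bundled templates:

```json
{
  "type_map": ["H", "C"],
  "mass_map": [1.008, 12.011],
  "pick_data": "/path/to/candidate_data",
  "init_data_prefix": "/path/to/init_data",
  "init_data_sys": ["sys.000"],
  "sys_batch_size": ["auto"],
  "numb_models": 4,
  "default_training_param": { "comment": "full DeePMD-kit training JSON goes here" },
  "fp_style": "none",
  "init_pick_number": 100,
  "iter_pick_number": 100,
  "model_devi_f_trust_lo": 0.10,
  "model_devi_f_trust_hi": 0.30
}
```

With `fp_style` set to `"none"`, the FP-specific fields (`fp_task_max`, `fp_pp_path`, `fp_params`, ...) are deliberately absent.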
## How to build `machine.json`

Construct `machine.json` with separate stage blocks for:

- `train`
- `model_devi`
- `fp`

For each stage, keep the following explicit:

- `command`
- machine or context configuration
- resources
- queue or partition if needed
- cpu and gpu counts
- custom scheduler flags
- environment activation commands

Do not merge all stages into one vague machine block.
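A sketch of one stage block, assuming the server-local Slurm profile and the DPDispatcher-style field names listed under "Execution setup". All concrete values (queue name, paths, the activation script in `source_list`) are placeholders that must come from the user's site:

```json
{
  "train": [
    {
      "command": "dp",
      "machine": {
        "batch_type": "Slurm",
        "context_type": "LocalContext",
        "local_root": "./",
        "remote_root": "/path/to/work_dir"
      },
      "resources": {
        "number_node": 1,
        "cpu_per_node": 8,
        "gpu_per_node": 1,
        "group_size": 1,
        "queue_name": "your_partition",
        "source_list": ["/path/to/activate_deepmd.sh"]
      }
    }
  ]
}
```

The `model_devi` and `fp` stages get their own blocks of the same shape; per the outer/inner boundary rule, each scheduler stage carries its own explicit `source_list` rather than inheriting the launching shell's environment.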
## Validation before run

Before execution, validate the workflow in this order:

1. Confirm the outer-layer `dpgen` is available:

   ```shell
   dpgen --version
   ```

2. Validate JSON syntax:

   ```shell
   python -m json.tool param.json
   python -m json.tool machine.json
   ```

3. Verify required dataset paths exist.
4. Verify stage commands match the selected software stack.
5. If `fp_style` is `none`, do not require FP-specific backend settings.
6. Only then run:

   ```shell
   dpgen simplify param.json machine.json
   ```
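The syntax and field checks above can be collected into a small pre-run script. This is a sketch: `prerun_check` is a hypothetical helper, and the required-key list is an assumption drawn from the "Minimum required inputs" section, not an official DP-GEN schema:

```python
import json
from pathlib import Path

# Keys this skill treats as required in param.json (assumption, not a schema).
REQUIRED_PARAM_KEYS = ["type_map", "pick_data", "numb_models",
                       "init_pick_number", "iter_pick_number"]

def prerun_check(param_path="param.json", machine_path="machine.json"):
    """Return a list of problems; an empty list means ready to run."""
    problems = []
    for p in (param_path, machine_path):
        if not Path(p).is_file():
            problems.append(f"missing file: {p}")
    if problems:
        return problems
    # json.loads raises ValueError on broken syntax, mirroring `python -m json.tool`
    param = json.loads(Path(param_path).read_text())
    json.loads(Path(machine_path).read_text())
    for key in REQUIRED_PARAM_KEYS:
        if key not in param:
            problems.append(f"param.json missing key: {key}")
    # fp_style == "none": FP backend settings are intentionally not required
    return problems
```

Run it before step 6; a non-empty return value means the unresolved fields should go back to the user instead of being guessed.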
## Output contract

Always provide:

- final absolute paths to `param.json` and `machine.json`
- the exact simplify command to run (`dpgen simplify param.json machine.json`)
- a short pre-run checklist
- any unresolved required fields
- if execution was performed, the main output locations and next files to inspect
## Guardrails

- Never merge workflow and machine parameters into one file.
- Never run `dpgen simplify` before both JSON files are present.
- Never hardcode personal cluster, account, queue, or path settings as universal defaults.
- Never silently change the user's scientific choices.
- Keep `type_map` ordering consistent with dataset typing.
- If required inputs are missing, stop and ask instead of guessing.
- If `fp_style` is `none`, skip FP-specific prompts and keep FP-specific settings disabled or unset.
- If data is already labeled and the user does not request new labels, enforce `fp_style = "none"` and do not require active FP runtime fields.
- Do not assume outer-shell activation is inherited by stage jobs; for scheduler execution, require explicit `source_list` per stage.
- If the user already has working templates, patch them rather than overwriting them blindly.
## References and bundled files

Use these bundled files:

- `assets/param.template.json`
- `assets/param.example.qm7.from-official-docs.json`
- `assets/machine.template.json`
- `assets/machine.template.server-local-slurm.json`
- `assets/machine.template.ssh-remote-slurm.json`
- `assets/machine.template.local-shell.json`
- `references/param-fields.md`
- `references/machine-fields.md`
- `references/workflow-notes.md`
External references:
- DP-GEN simplify overview: https://docs.deepmodeling.com/projects/dpgen/en/latest/simplify/simplify.html
- simplify parameter definitions: https://docs.deepmodeling.com/projects/dpgen/en/latest/simplify/simplify-jdata.html
- simplify machine definitions: https://docs.deepmodeling.com/projects/dpgen/en/latest/simplify/simplify-mdata.html