# DeePMD-kit Training: SE_E2_A

Train a DeePMD-kit model using the SE_E2_A (DeepPot-SE) descriptor with the PyTorch backend. Use this skill when the user wants to train a classical deep potential model for a specific system: prepare the training input JSON, run `dp --pt train`, monitor learning curves, freeze the model, and test it. SE_E2_A is the foundational two-body embedding descriptor suitable for most condensed-phase systems.

To install, either clone the full repository:

```bash
git clone https://github.com/jinzhezenggroup/computational-chemistry-agent-skills
```

or copy just this skill (`machine-learning-potentials/deepmd-train-se-e2-a/SKILL.md`) into `~/.claude/skills`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/jinzhezenggroup/computational-chemistry-agent-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/machine-learning-potentials/deepmd-train-se-e2-a" ~/.claude/skills/jinzhezenggroup-computational-chemistry-agent-skills-deepmd-train-se-e2-a && rm -rf "$T"
```
Train a deep potential model using the SE_E2_A (Smooth Edition, two-body embedding, all information) descriptor. This is the foundational DeepPot-SE architecture suitable for most condensed-phase systems.
## Quick Start

```bash
dp --pt train input.json
```
## Agent Responsibilities

- Confirm the user has a working deepmd-kit environment with the PyTorch backend.
- Collect the minimum required information:
  - Training data paths (deepmd/npy or deepmd/hdf5 format)
  - Validation data paths
  - Element types (`type_map`)
  - Target number of training steps
- Generate a complete `input.json` training configuration.
- Explain key hyperparameters if the user is unfamiliar.
- Run training and monitor the learning curve (`lcurve.out`).
- Freeze the trained model to `.pth` format.
- Optionally test the model with `dp test`.
## Workflow

### Step 1: Prepare Training Data
Training data must be in DeePMD format (deepmd/npy or deepmd/hdf5). Each system directory should contain:
```
system_dir/
├── type.raw          # atom type indices, one integer per atom
├── type_map.raw      # element names, one per line
└── set.000/
    ├── coord.npy     # coordinates (nframes, natoms*3)
    ├── energy.npy    # energies (nframes, 1)
    ├── force.npy     # forces (nframes, natoms*3)
    └── box.npy       # cell vectors (nframes, 9)
```
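A minimal sketch (hypothetical helper, assuming numpy and the single-`set.000` layout above) that sanity-checks array shapes before training:

```python
# Hypothetical shape check for one DeePMD npy system directory
# laid out as above (single set.000).
from pathlib import Path
import numpy as np

def check_system(system_dir):
    d = Path(system_dir)
    natoms = len((d / "type.raw").read_text().split())
    s = d / "set.000"
    coord = np.load(s / "coord.npy")
    nframes = coord.shape[0]
    assert coord.shape == (nframes, natoms * 3)
    assert np.load(s / "energy.npy").shape[0] == nframes
    assert np.load(s / "force.npy").shape == (nframes, natoms * 3)
    assert np.load(s / "box.npy").shape == (nframes, 9)
    print(f"{system_dir}: {nframes} frames, {natoms} atoms -- OK")

check_system("./data/train_system_0")
```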
If the user has DFT output (VASP OUTCAR, etc.), refer to the `dpdata-cli` skill for format conversion.
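If dpdata's Python API is preferred over the CLI, a conversion sketch (assuming a VASP `OUTCAR` in the working directory; the output path is illustrative):

```python
# Convert a labeled VASP OUTCAR trajectory to deepmd/npy with dpdata.
import dpdata

system = dpdata.LabeledSystem("OUTCAR", fmt="vasp/outcar")
print(system)  # summary: elements, natoms, nframes
# set_size controls how many frames go into each set.NNN directory.
system.to_deepmd_npy("./data/train_system_0", set_size=5000)
```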
### Step 2: Write `input.json`
A complete SE_E2_A training configuration:
{ "model": { "type_map": [ "O", "H" ], "descriptor": { "type": "se_e2_a", "sel": [ 46, 92 ], "rcut_smth": 0.5, "rcut": 6.0, "neuron": [ 25, 50, 100 ], "resnet_dt": false, "axis_neuron": 16, "type_one_side": true, "seed": 1 }, "fitting_net": { "neuron": [ 240, 240, 240 ], "resnet_dt": true, "seed": 1 } }, "learning_rate": { "type": "exp", "decay_steps": 5000, "start_lr": 0.001, "stop_lr": 3.51e-08 }, "loss": { "type": "ener", "start_pref_e": 0.02, "limit_pref_e": 1, "start_pref_f": 1000, "limit_pref_f": 1, "start_pref_v": 0.02, "limit_pref_v": 1 }, "training": { "training_data": { "systems": [ "./data/train_system_0", "./data/train_system_1" ], "batch_size": "auto" }, "validation_data": { "systems": [ "./data/valid_system_0" ], "batch_size": 1, "numb_btch": 3 }, "numb_steps": 400000, "seed": 10, "disp_file": "lcurve.out", "disp_freq": 100, "save_freq": 10000 } }
If you do not want to train on the virial, set the virial prefactors (`start_pref_v` and `limit_pref_v`) to 0. Note that SE_E2_A uses different default loss prefactors than DPA3 (e: 0.02→1, f: 1000→1 vs. e: 0.2→20, f: 100→60, v: 0.02→1).
Documentation for every parameter can be generated with `dp doc-train-input`. The RST output is long, so pipe it through `grep` to find a specific parameter:

```bash
dp doc-train-input | grep -A 7 training/numb_steps
dp doc-train-input | grep -A 7 'model\[standard\]/descriptor\[se_e2_a\]/sel'
```
### Step 3: Run Training

```bash
dp --pt train input.json
```
To restart from a checkpoint:

```bash
dp --pt train input.json --restart model.ckpt.pt
```

To initialize from an existing model:

```bash
dp --pt train input.json --init-model model.ckpt.pt
```
### Step 4: Monitor Training

The learning curve is written to `lcurve.out` with columns:

```
# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn rmse_v_val rmse_v_trn lr
```
- `rmse_e_*`: energy RMSE per atom (eV/atom)
- `rmse_f_*`: force RMSE (eV/Å)
- `rmse_v_*`: virial RMSE (eV/atom, only present if virial data is available)
- `lr`: current learning rate
Quick visualization:

```python
import numpy as np
import matplotlib.pyplot as plt

# names=True parses the commented header line of lcurve.out
data = np.genfromtxt("lcurve.out", names=True)
# skip the first column (step) and the last (lr)
for name in data.dtype.names[1:-1]:
    plt.plot(data["step"], data[name], label=name)
plt.legend()
plt.xlabel("Step")
plt.ylabel("Loss")
plt.xscale("symlog")
plt.yscale("log")
plt.grid()
plt.show()
```
### Step 5: Freeze the Model

```bash
dp --pt freeze -o model.pth
```
### Step 6: Test the Model

```bash
dp --pt test -m model.pth -s /path/to/test_system -n 30
```
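Here `-n 30` limits the test to 30 frames, and adding `-d PREFIX` dumps per-frame predictions to detail files. A post-processing sketch, assuming detail files `results.e.out` / `results.f.out` with the reference-then-prediction column layout (check the file headers):

```python
# Post-processing sketch for detail files written by
#   dp test -m model.pth -s /path/to/test_system -d results
# Assumed layouts: results.e.out has [data_e, pred_e];
# results.f.out has [data_fx..fz, pred_fx..fz].
import numpy as np

e = np.genfromtxt("results.e.out")
f = np.genfromtxt("results.f.out")
rmse_e = np.sqrt(np.mean((e[:, 0] - e[:, 1]) ** 2))    # eV (divide by natoms for eV/atom)
rmse_f = np.sqrt(np.mean((f[:, :3] - f[:, 3:]) ** 2))  # eV/Å
print(f"energy RMSE: {rmse_e:.4e} eV, force RMSE: {rmse_f:.4e} eV/Å")
```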
## Key Hyperparameters

### Descriptor

| Parameter | Description | Typical Value |
|---|---|---|
| `rcut` | Cutoff radius (Å) | 6.0 |
| `rcut_smth` | Smooth cutoff start (Å) | 0.5 |
| `sel` | Max neighbors per type | System-dependent |
| `neuron` | Embedding net sizes | [25, 50, 100] |
| `axis_neuron` | Axis matrix dimension | 16 |
| `type_one_side` | Share embedding across center types | true |
### Fitting Net

| Parameter | Description | Typical Value |
|---|---|---|
| `neuron` | Hidden layer sizes | [240, 240, 240] |
| `resnet_dt` | Use timestep in ResNet | true |
### Loss Prefactors

| JSON keys | Description | Start | Limit |
|---|---|---|---|
| `start_pref_e` / `limit_pref_e` | Energy weight | 0.02 | 1 |
| `start_pref_f` / `limit_pref_f` | Force weight | 1000 | 1 |
| `start_pref_v` / `limit_pref_v` | Virial weight (optional) | 0.02 | 1 |

Here, `start_pref_*` and `limit_pref_*` set the initial and final loss weights; the loss shifts from force-dominated early training to balanced energy+force later. Set both virial prefactors to 0 if not training on virial data.
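The schedule behind these numbers (roughly, per the DeePMD-kit documentation; notation mine): each prefactor is interpolated between its start and limit values following the learning-rate decay,

$$
p(t) \;=\; p_{\text{limit}}\left(1 - \frac{\alpha(t)}{\alpha_0}\right) + p_{\text{start}}\,\frac{\alpha(t)}{\alpha_0},
$$

where $\alpha(t)$ is the learning rate at step $t$ and $\alpha_0$ is `start_lr`. Early on $\alpha(t)\approx\alpha_0$, so the force term (prefactor 1000) dominates; as the learning rate decays toward `stop_lr`, the weights approach their limit values.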
### Training

| Parameter | Description | Typical Value |
|---|---|---|
| `numb_steps` | Total training steps | 400000-1000000 |
| `batch_size` | Frames per step | "auto" or "auto:32" |
| `start_lr` | Initial learning rate | 0.001 |
| `stop_lr` | Final learning rate | 3.51e-8 |
| `decay_steps` | LR decay interval | 5000 |
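The `exp` schedule decays the learning rate by a constant factor every `decay_steps` so that it reaches roughly `stop_lr` at `numb_steps`. A small sketch of the schedule implied by the values above (illustration only, not DeePMD-kit internals):

```python
# Exponential learning-rate schedule implied by the settings above.
start_lr, stop_lr = 1.0e-3, 3.51e-8
numb_steps, decay_steps = 400_000, 5_000

# Decay factor per interval, chosen so that lr(numb_steps) ~= stop_lr.
decay_rate = (stop_lr / start_lr) ** (decay_steps / numb_steps)

for step in range(0, numb_steps + 1, 100_000):
    lr = start_lr * decay_rate ** (step / decay_steps)
    print(f"step {step:>7d}: lr = {lr:.3e}")
```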
## Setting `sel`

`sel` is a list with one entry per element type, specifying the maximum number of neighbors of that type within `rcut`. To determine appropriate values:

```bash
dp --pt neighbor-stat -s /path/to/data -r 6.0 -t O H
```
Use values slightly above the reported maximum.
## Agent Checklist

- Training data exists and is in DeePMD format
- `type_map` matches the elements in the data
- `sel` is appropriate for the system (use `dp neighbor-stat` if unsure)
- `rcut` is reasonable for the system (typically 6.0-9.0 Å)
- Training completes without NaN in `lcurve.out`
- Model is frozen to `.pth` after training
- Test RMSE values are reported to the user