Computational-chemistry-agent-skills deepmd-train-se-e2-a

Train a DeePMD-kit model using the SE_E2_A (DeepPot-SE) descriptor with the PyTorch backend. Use when the user wants to train a classical deep potential model for a specific system, prepare training input JSON, run `dp --pt train`, monitor learning curves, freeze the model, and test it. SE_E2_A is the foundational two-body embedding descriptor suitable for most condensed-phase systems.

Install

Clone the upstream repo:
git clone https://github.com/jinzhezenggroup/computational-chemistry-agent-skills
Claude Code: install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jinzhezenggroup/computational-chemistry-agent-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/machine-learning-potentials/deepmd-train-se-e2-a" ~/.claude/skills/jinzhezenggroup-computational-chemistry-agent-skills-deepmd-train-se-e2-a && rm -rf "$T"
manifest: machine-learning-potentials/deepmd-train-se-e2-a/SKILL.md

DeePMD-kit Training: SE_E2_A

Train a deep potential model using the SE_E2_A (Smooth Edition, two-body embedding, all information) descriptor. This is the foundational DeepPot-SE architecture suitable for most condensed-phase systems.

Quick Start

dp --pt train input.json

Agent Responsibilities

  1. Confirm the user has a working deepmd-kit environment with the PyTorch backend.
  2. Collect the minimum required information:
    • Training data paths (deepmd/npy or deepmd/hdf5 format)
    • Validation data paths
    • Element types (type_map)
    • Target number of training steps
  3. Generate a complete `input.json` training configuration.
  4. Explain key hyperparameters if the user is unfamiliar with them.
  5. Run training and monitor the learning curve (`lcurve.out`).
  6. Freeze the trained model to `.pth` format.
  7. Optionally, test the model with `dp test`.

Workflow

Step 1: Prepare Training Data

Training data must be in DeePMD format (deepmd/npy or deepmd/hdf5). In the deepmd/npy layout, each system directory contains:

system_dir/
├── type.raw          # atom type indices, one integer per atom
├── type_map.raw      # element names, one per line
└── set.000/
    ├── coord.npy     # coordinates (nframes, natoms*3)
    ├── energy.npy    # energies (nframes, 1)
    ├── force.npy     # forces (nframes, natoms*3)
    ├── box.npy       # cell vectors (nframes, 9)
    └── virial.npy    # virial (nframes, 9), optional; only needed to train on virial
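The directory layout above can be made concrete with a short NumPy script. This is a minimal sketch. The helper `write_deepmd_npy_system` is hypothetical. The arrays are random placeholders standing in for real DFT labels, so they are not physical data. The script only shows how each file is shaped.

```python
import os
import tempfile

import numpy as np


def write_deepmd_npy_system(system_dir, type_map, atom_types, coords, energies, forces, boxes):
    """Write one deepmd/npy system with a single set.000 directory (hypothetical helper)."""
    nframes, natoms = coords.shape[0], len(atom_types)
    set_dir = os.path.join(system_dir, "set.000")
    os.makedirs(set_dir, exist_ok=True)
    # type.raw: one integer per atom, indexing into type_map.raw
    np.savetxt(os.path.join(system_dir, "type.raw"), np.asarray(atom_types, dtype=int), fmt="%d")
    with open(os.path.join(system_dir, "type_map.raw"), "w") as f:
        f.write("\n".join(type_map) + "\n")
    # Per-frame arrays, flattened to the shapes listed in the directory tree
    np.save(os.path.join(set_dir, "coord.npy"), coords.reshape(nframes, natoms * 3))
    np.save(os.path.join(set_dir, "energy.npy"), energies.reshape(nframes, 1))
    np.save(os.path.join(set_dir, "force.npy"), forces.reshape(nframes, natoms * 3))
    np.save(os.path.join(set_dir, "box.npy"), boxes.reshape(nframes, 9))


# Placeholder data: random numbers standing in for real DFT labels (NOT physical)
rng = np.random.default_rng(0)
nframes, atom_types = 4, [0, 1, 1]  # one water molecule: O, H, H
coords = rng.uniform(0.0, 10.0, size=(nframes, 3, 3))
energies = rng.normal(size=nframes)
forces = rng.normal(size=(nframes, 3, 3))
boxes = np.tile(np.eye(3) * 10.0, (nframes, 1, 1))  # 10 Å cubic cell for every frame

out_dir = os.path.join(tempfile.mkdtemp(), "toy_water")
write_deepmd_npy_system(out_dir, ["O", "H"], atom_types, coords, energies, forces, boxes)
```

For real DFT output, use a dedicated converter rather than hand-written arrays.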

If the user has DFT output (VASP OUTCAR, etc.), refer to the `dpdata-cli` skill for format conversion.

Step 2: Write input.json

A complete SE_E2_A training configuration:

{
  "model": {
    "type_map": [
      "O",
      "H"
    ],
    "descriptor": {
      "type": "se_e2_a",
      "sel": [
        46,
        92
      ],
      "rcut_smth": 0.5,
      "rcut": 6.0,
      "neuron": [
        25,
        50,
        100
      ],
      "resnet_dt": false,
      "axis_neuron": 16,
      "type_one_side": true,
      "seed": 1
    },
    "fitting_net": {
      "neuron": [
        240,
        240,
        240
      ],
      "resnet_dt": true,
      "seed": 1
    }
  },
  "learning_rate": {
    "type": "exp",
    "decay_steps": 5000,
    "start_lr": 0.001,
    "stop_lr": 3.51e-08
  },
  "loss": {
    "type": "ener",
    "start_pref_e": 0.02,
    "limit_pref_e": 1,
    "start_pref_f": 1000,
    "limit_pref_f": 1,
    "start_pref_v": 0.02,
    "limit_pref_v": 1
  },
  "training": {
    "training_data": {
      "systems": [
        "./data/train_system_0",
        "./data/train_system_1"
      ],
      "batch_size": "auto"
    },
    "validation_data": {
      "systems": [
        "./data/valid_system_0"
      ],
      "batch_size": 1,
      "numb_btch": 3
    },
    "numb_steps": 400000,
    "seed": 10,
    "disp_file": "lcurve.out",
    "disp_freq": 100,
    "save_freq": 10000
  }
}
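An agent can also generate this template programmatically so that only the user-specific fields change. This is a sketch under assumptions. `make_se_e2_a_input` is a hypothetical helper. Every value it does not take as an argument is copied from the template above. The `use_virial=False` switch zeroes both virial prefactors.

```python
import json


def make_se_e2_a_input(type_map, sel, train_systems, valid_systems,
                       numb_steps=400000, rcut=6.0, use_virial=True):
    """Build an SE_E2_A config mirroring the template above (hypothetical helper)."""
    if len(sel) != len(type_map):
        raise ValueError("sel needs exactly one entry per element in type_map")
    start_v, limit_v = (0.02, 1) if use_virial else (0, 0)
    return {
        "model": {
            "type_map": list(type_map),
            "descriptor": {
                "type": "se_e2_a", "sel": list(sel), "rcut_smth": 0.5, "rcut": rcut,
                "neuron": [25, 50, 100], "resnet_dt": False, "axis_neuron": 16,
                "type_one_side": True, "seed": 1,
            },
            "fitting_net": {"neuron": [240, 240, 240], "resnet_dt": True, "seed": 1},
        },
        "learning_rate": {"type": "exp", "decay_steps": 5000,
                          "start_lr": 0.001, "stop_lr": 3.51e-8},
        "loss": {
            "type": "ener",
            "start_pref_e": 0.02, "limit_pref_e": 1,
            "start_pref_f": 1000, "limit_pref_f": 1,
            "start_pref_v": start_v, "limit_pref_v": limit_v,
        },
        "training": {
            "training_data": {"systems": list(train_systems), "batch_size": "auto"},
            "validation_data": {"systems": list(valid_systems), "batch_size": 1, "numb_btch": 3},
            "numb_steps": numb_steps, "seed": 10,
            "disp_file": "lcurve.out", "disp_freq": 100, "save_freq": 10000,
        },
    }


# Example: the O/H water setup from the template
config = make_se_e2_a_input(["O", "H"], [46, 92],
                            ["./data/train_system_0", "./data/train_system_1"],
                            ["./data/valid_system_0"])
with open("input.json", "w") as f:
    json.dump(config, f, indent=2)
```

Checking that `sel` has one entry per element catches a common mismatch before `dp --pt train` is ever launched.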

If you do not want to train on virial, set both start_pref_v and limit_pref_v to 0.

SE_E2_A uses different loss prefactors from DPA3 (SE_E2_A: e 0.02→1, f 1000→1; DPA3: e 0.2→20, f 100→60, v 0.02→1).

Documentation for every parameter can be printed with `dp doc-train-input`. Because the RST output is very long, pipe it through `grep` to find a specific parameter:

dp doc-train-input | grep -A 7 training/numb_steps
dp doc-train-input | grep -A 7 'model\[standard\]/descriptor\[se_e2_a\]/sel'

Step 3: Run Training

dp --pt train input.json

To restart from a checkpoint and continue the interrupted run:

dp --pt train input.json --restart model.ckpt.pt

To initialize the network parameters from an existing model and start a new training run:

dp --pt train input.json --init-model model.ckpt.pt

Step 4: Monitor Training

The learning curve is written to `lcurve.out` with the columns below. The suffixes _val and _trn denote the validation and training data, respectively.

#  step  rmse_val  rmse_trn  rmse_e_val  rmse_e_trn  rmse_f_val  rmse_f_trn  rmse_v_val  rmse_v_trn  lr
  • rmse_val / rmse_trn: overall error combining all loss terms
  • rmse_e_*: energy RMSE per atom (eV/atom)
  • rmse_f_*: force RMSE (eV/Å)
  • rmse_v_*: virial RMSE per atom (eV/atom; only present if virial data is available)
  • lr: current learning rate

Quick visualization:

import numpy as np
import matplotlib.pyplot as plt

# The "#"-prefixed header row of lcurve.out becomes the field names
data = np.genfromtxt("lcurve.out", names=True)
# Plot every column except the first ("step") and the last ("lr")
for name in data.dtype.names[1:-1]:
    plt.plot(data["step"], data[name], label=name)
plt.legend()
plt.xlabel("Step")
plt.ylabel("Loss / RMSE")
plt.xscale("symlog")
plt.yscale("log")
plt.grid()
plt.show()
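The Agent Checklist asks for two things: training must finish without NaN in `lcurve.out`, and the final errors must be reported. A small parser can cover both. The sketch below assumes the header layout shown above. `summarize_lcurve` is a hypothetical helper, not part of DeePMD-kit.

```python
import os

import numpy as np


def summarize_lcurve(path="lcurve.out"):
    """Return (last step, whether any rmse_* column contains NaN, last-row RMSEs)."""
    # names=True reads field names from the "#"-prefixed header line;
    # atleast_1d keeps single-row files indexable
    data = np.atleast_1d(np.genfromtxt(path, names=True))
    rmse_cols = [name for name in data.dtype.names if name.startswith("rmse_")]
    has_nan = any(bool(np.isnan(data[name]).any()) for name in rmse_cols)
    last = data[-1]
    return int(last["step"]), has_nan, {name: float(last[name]) for name in rmse_cols}


if os.path.exists("lcurve.out"):
    step, has_nan, rmse = summarize_lcurve("lcurve.out")
    print(f"step {step}: NaN detected = {has_nan}")
    for name, value in rmse.items():
        print(f"  {name} = {value:.4g}")
```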

Step 5: Freeze the Model

dp --pt freeze -o model.pth

Step 6: Test the Model

dp --pt test -m model.pth -s /path/to/test_system -n 30

Key Hyperparameters

Descriptor

| Parameter | Description | Typical value |
|---|---|---|
| rcut | Cutoff radius (Å) | 6.0 |
| rcut_smth | Start of the smooth cutoff (Å) | 0.5 |
| sel | Maximum number of neighbors per type | System-dependent |
| neuron | Embedding net layer sizes | [25, 50, 100] |
| axis_neuron | Axis matrix dimension | 16 |
| type_one_side | Share embedding nets across center-atom types | true |

Fitting Net

| Parameter | Description | Typical value |
|---|---|---|
| neuron | Hidden layer sizes | [240, 240, 240] |
| resnet_dt | Use a timestep in the ResNet skip connections | true |

Loss Prefactors

| JSON keys | Description | Start | Limit |
|---|---|---|---|
| start_pref_e / limit_pref_e | Energy weight | 0.02 | 1 |
| start_pref_f / limit_pref_f | Force weight | 1000 | 1 |
| start_pref_v / limit_pref_v | Virial weight (optional) | 0.02 | 1 |

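Here, `start_pref_*` and `limit_pref_*` set the initial and final loss weights. Each weight moves from its start value toward its limit value as the learning rate decays. The loss is therefore force-dominated early in training and balances energy and force later. For virial, set both to 0 if you are not training on virial data.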
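As documented by DeePMD-kit, each prefactor changes linearly with the learning rate:

p(t) = p_limit · (1 − lr(t)/lr_0) + p_start · lr(t)/lr_0

The sketch below evaluates that rule for the template's energy and force weights at the first and last learning rates. `loss_prefactor` is a hypothetical name used only for illustration.

```python
def loss_prefactor(start_pref, limit_pref, lr, start_lr):
    """Interpolate a loss weight linearly in the learning-rate ratio lr / start_lr."""
    ratio = lr / start_lr  # 1.0 at step 0, stop_lr / start_lr at the end of training
    return limit_pref * (1.0 - ratio) + start_pref * ratio


start_lr, stop_lr = 1e-3, 3.51e-8  # values from the template
for label, lr in [("start", start_lr), ("end", stop_lr)]:
    p_e = loss_prefactor(0.02, 1.0, lr, start_lr)
    p_f = loss_prefactor(1000.0, 1.0, lr, start_lr)
    print(f"{label}: pref_e = {p_e:.4g}, pref_f = {p_f:.4g}, force/energy = {p_f / p_e:.4g}")
```

At step 0 the force term outweighs the energy term by a factor of 1000 / 0.02. By the end of training the two weights are nearly equal. This is the shift from force-dominated to balanced training described above.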

Training

| Parameter | Description | Typical value |
|---|---|---|
| numb_steps | Total training steps | 400000 to 1000000 |
| batch_size | Frames per step | "auto" or "auto:32" |
| start_lr | Initial learning rate | 0.001 |
| stop_lr | Final learning rate | 3.51e-8 |
| decay_steps | Learning-rate decay interval (steps) | 5000 |
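Two of these settings are easier to reason about with numbers: the exponential learning-rate schedule and the "auto" batch size. The sketch below is a model of both, built on two assumptions. First, it assumes a staircase exponential decay whose rate is chosen so that the learning rate reaches `stop_lr` at `numb_steps`. Second, it assumes "auto" picks the smallest batch size with batch_size × natoms ≥ 32, and "auto:N" uses N instead of 32. The helper names are hypothetical. Check both rules against the documentation of your DeePMD-kit version.

```python
def exp_decay_lr(step, start_lr=1e-3, stop_lr=3.51e-8, decay_steps=5000, numb_steps=400000):
    """Staircase exponential decay reaching stop_lr at numb_steps (assumed form)."""
    decay_rate = (stop_lr / start_lr) ** (decay_steps / numb_steps)
    # The rate only changes every decay_steps steps
    return start_lr * decay_rate ** (step // decay_steps)


def auto_batch_size(natoms, rule=32):
    """Smallest batch size with batch_size * natoms >= rule (assumed "auto:rule" behavior)."""
    return -(-rule // natoms)  # integer ceiling division


print(f"decay rate per 5000 steps: {(3.51e-8 / 1e-3) ** (5000 / 400000):.4f}")
for step in (0, 100000, 200000, 400000):
    print(step, f"{exp_decay_lr(step):.3e}")
# A 192-atom box needs only one frame per step; a 3-atom molecule needs eleven
print(auto_batch_size(192), auto_batch_size(3))
```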

Setting sel

`sel` is a list with one entry per element type (in `type_map` order), giving the maximum number of neighbors of that type within `rcut`. To determine appropriate values:

dp --pt neighbor-stat -s /path/to/data -r 6.0 -t O H

Use values slightly above the reported maximum.
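Turning the maxima reported by `dp --pt neighbor-stat` into `sel` values can be written down as a small rule. This is an illustrative sketch, not DeePMD-kit's own logic. It assumes two conventions: a 10% safety margin, and rounding up to a multiple of 4. Both are arbitrary choices for illustration. The input numbers in the example are hypothetical, not real neighbor-stat output.

```python
def suggest_sel(max_neighbors, margin_percent=10, multiple=4):
    """Pad each per-type neighbor maximum by a margin and round up (illustrative rule)."""
    sel = []
    for n in max_neighbors:
        # Integer arithmetic avoids float round-off such as 40 * 1.1 = 44.000000000000004
        padded = n + (n * margin_percent + 99) // 100  # n plus ceil(n * margin / 100)
        sel.append(-(-padded // multiple) * multiple)  # round up to the chosen multiple
    return sel


# Hypothetical maxima for O and H, in type_map order
print(suggest_sel([40, 80]))
```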

Agent Checklist

  • Training data exists and is in deepmd format
  • `type_map` matches the elements in the data
  • `sel` is appropriate for the system (use `dp neighbor-stat` if unsure)
  • `rcut` is reasonable for the system (typically 6.0 to 9.0 Å)
  • Training completes without NaN in `lcurve.out`
  • Model is frozen to `.pth` after training
  • Test RMSE values are reported to the user
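The `type_map` item on this checklist can be automated. The sketch below compares each system's `type_map.raw` with the model's `type_map` in `input.json`. Every element in the data must appear in the model. `check_type_maps` is a hypothetical helper. It resolves relative system paths from the directory containing `input.json`, which assumes training is launched from there. It also treats a plain-string `systems` entry as a single system, which is a simplification.

```python
import json
import os


def check_type_maps(input_json):
    """List systems whose type_map.raw contains elements missing from the model type_map."""
    with open(input_json) as f:
        config = json.load(f)
    model_types = set(config["model"]["type_map"])
    # Assumption: relative paths are resolved from the directory holding input.json
    base_dir = os.path.dirname(os.path.abspath(input_json))
    systems = []
    for key in ("training_data", "validation_data"):
        entry = config["training"].get(key, {}).get("systems", [])
        # Simplification: a plain string is treated as a single system directory
        systems += [entry] if isinstance(entry, str) else list(entry)
    problems = []
    for system in systems:
        path = os.path.join(base_dir, system, "type_map.raw")
        if not os.path.isfile(path):
            problems.append(f"{system}: no type_map.raw found")
            continue
        with open(path) as f:
            missing = [el for el in f.read().split() if el not in model_types]
        if missing:
            problems.append(f"{system}: {missing} not in model type_map")
    return problems


if os.path.exists("input.json"):
    for problem in check_type_maps("input.json") or ["all type maps consistent"]:
        print(problem)
```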

References