Awesome-Agent-Skills-for-Empirical-Research computational-chemistry-guide

DFT, molecular simulation, and reaction prediction tools for chemists

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/domains/chemistry/computational-chemistry-guide" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-computational-che && rm -rf "$T"
manifest: skills/43-wentorai-research-plugins/skills/domains/chemistry/computational-chemistry-guide/SKILL.md
source content

Computational Chemistry Guide

Overview

Computational chemistry bridges quantum mechanics and practical chemistry, enabling researchers to predict molecular properties, reaction mechanisms, and material behaviors without stepping into a wet lab. From drug design to catalyst optimization, computational methods accelerate discovery by screening thousands of candidates before committing to synthesis.

This guide covers the major computational chemistry paradigms: Density Functional Theory (DFT) for electronic structure calculations, molecular dynamics (MD) for simulating atomic motion, machine learning potentials for scaling up simulations, and reaction prediction tools for retrosynthesis and mechanism elucidation. Each section includes tool recommendations, typical workflows, and code examples.

Whether you are a chemistry PhD student running your first Gaussian calculations, a materials scientist exploring new alloys with VASP, or a medicinal chemist using ML-based property prediction, this skill provides the conceptual framework and practical recipes to get productive quickly.

Density Functional Theory (DFT)

When to Use DFT

DFT is the workhorse of quantum chemistry. It provides a good balance of accuracy and computational cost for systems of up to a few hundred atoms.

PropertyDFT SuitabilityTypical Error
Molecular geometryExcellent< 0.02 Angstrom
Vibrational frequenciesGood3-5%
Reaction barriersGood with correction2-5 kcal/mol
Band gapsFair (tends to underestimate)0.5-1.0 eV
Van der Waals interactionsRequires dispersion correctionVaries
Excited statesFair (TD-DFT)0.2-0.5 eV

Software Comparison

SoftwareLicenseStrengthsBasis Sets
GaussianCommercialBroad functionality, well-documentedGaussian-type
ORCAFree (academic)DFT + wavefunction methods, excellent supportGaussian-type
VASPCommercialPeriodic systems, materials sciencePlane-wave
Quantum ESPRESSOOpen sourcePeriodic DFT, phononsPlane-wave
Psi4Open sourceReference implementations, Python APIGaussian-type
CP2KOpen sourceMixed Gaussian/plane-wave, large systemsMixed

ORCA DFT Workflow Example

# geometry_optimization.inp
! B3LYP def2-TZVP D3BJ OPT FREQ
# B3LYP functional, triple-zeta basis, D3 dispersion, optimize + frequencies

%pal
  nprocs 8
end

%maxcore 4000

* xyz 0 1
C  0.000  0.000  0.000
O  1.200  0.000  0.000
H -0.500  0.866  0.000
H -0.500 -0.866  0.000
*

Run with:

orca geometry_optimization.inp > geometry_optimization.out

Analyzing DFT Results with Python

from ase.io import read
from ase.visualize import view

# Read optimized geometry from ORCA output
atoms = read('geometry_optimization.xyz')

# Extract energies from output file
import re

with open('geometry_optimization.out') as f:
    text = f.read()

# Total energy
energy = float(re.search(r'FINAL SINGLE POINT ENERGY\s+([-\d.]+)', text).group(1))
print(f"Total energy: {energy:.6f} Hartree")
print(f"Total energy: {energy * 627.509:.2f} kcal/mol")

# Thermochemistry
gibbs_match = re.search(r'Final Gibbs free energy\s+\.\.\.\s+([-\d.]+)', text)
if gibbs_match:
    gibbs = float(gibbs_match.group(1))
    print(f"Gibbs free energy: {gibbs:.6f} Hartree")

Molecular Dynamics Simulations

MD Pipeline

Initial Structure (.pdb/.mol2)
    |
    v
[Parameterization] --> Force field assignment (AMBER, CHARMM, OPLS)
    |
    v
[Solvation] --> Add solvent box, ions
    |
    v
[Minimization] --> Energy minimization (steepest descent)
    |
    v
[Equilibration] --> NVT then NPT ensemble (100 ps - 1 ns)
    |
    v
[Production] --> NPT ensemble (10 ns - microseconds)
    |
    v
[Analysis] --> RMSD, RMSF, hydrogen bonds, free energy

OpenMM Quick Start

from openmm.app import *
from openmm import *
from openmm.unit import *

# Load structure
pdb = PDBFile('protein.pdb')
forcefield = ForceField('amber14-all.xml', 'amber14/tip3pfb.xml')

# Create system
modeller = Modeller(pdb.topology, pdb.positions)
modeller.addSolvent(forcefield, model='tip3p', padding=1.0*nanometers)

system = forcefield.createSystem(
    modeller.topology,
    nonbondedMethod=PME,
    nonbondedCutoff=1.0*nanometers,
    constraints=HBonds
)

# Set up simulation
integrator = LangevinMiddleIntegrator(300*kelvin, 1/picosecond, 0.004*picoseconds)
simulation = Simulation(modeller.topology, system, integrator)
simulation.context.setPositions(modeller.positions)

# Minimize
simulation.minimizeEnergy()

# Run production (10 ns)
simulation.reporters.append(DCDReporter('trajectory.dcd', 1000))
simulation.reporters.append(
    StateDataReporter('log.csv', 1000, step=True,
                      potentialEnergy=True, temperature=True)
)
simulation.step(2500000)  # 10 ns at 4 fs timestep

Machine Learning in Computational Chemistry

ML Potential Energy Surfaces

Machine learning potentials achieve near-DFT accuracy at a fraction of the cost:

MethodSpeed vs DFTAccuracyTraining Data
ANI1000x faster~1 kcal/molPre-trained
SchNet100-1000x~1 kcal/mol1K-100K configs
MACE100-1000x< 1 kcal/mol1K-100K configs
GemNet100-1000x< 1 kcal/mol1K-100K configs

Property Prediction with RDKit

from rdkit import Chem
from rdkit.Chem import Descriptors, AllChem
import numpy as np

def compute_molecular_features(smiles):
    """Compute molecular descriptors from SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None

    features = {
        'molecular_weight': Descriptors.MolWt(mol),
        'logp': Descriptors.MolLogP(mol),
        'hbd': Descriptors.NumHDonors(mol),
        'hba': Descriptors.NumHAcceptors(mol),
        'tpsa': Descriptors.TPSA(mol),
        'rotatable_bonds': Descriptors.NumRotatableBonds(mol),
        'aromatic_rings': Descriptors.NumAromaticRings(mol),
        'heavy_atoms': mol.GetNumHeavyAtoms(),
    }

    # Morgan fingerprint (ECFP4)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    features['fingerprint'] = np.array(fp)

    return features

# Lipinski's Rule of Five check
def check_druglikeness(smiles):
    feats = compute_molecular_features(smiles)
    if feats is None:
        return False
    return (feats['molecular_weight'] <= 500 and
            feats['logp'] <= 5 and
            feats['hbd'] <= 5 and
            feats['hba'] <= 10)

Reaction Prediction

Retrosynthesis Tools

ToolApproachAccess
ASKCOSTemplate-based + MLMIT, web interface
IBM RXNTransformer-basedFree API
SyntheseusMulti-model frameworkOpen source
RetroTRAETransformerOpen source

Using RDKit for Reaction Processing

from rdkit.Chem import AllChem, Draw

# Define a reaction (Suzuki coupling)
rxn_smarts = '[c:1][B](O)O.[c:2][Cl]>>[c:1][c:2]'
rxn = AllChem.ReactionFromSmarts(rxn_smarts)

# Apply reaction
reactant1 = Chem.MolFromSmiles('c1ccc(B(O)O)cc1')  # Phenylboronic acid
reactant2 = Chem.MolFromSmiles('c1ccc(Cl)cc1')      # Chlorobenzene

products = rxn.RunReactants((reactant1, reactant2))
for product_set in products:
    for product in product_set:
        print(Chem.MolToSmiles(product))  # Biphenyl

Best Practices

  • Benchmark your method. Always validate your computational protocol against known experimental data before applying it to new systems.
  • Use appropriate levels of theory. Do not use MP2 when B3LYP suffices, and do not use B3LYP when you need CCSD(T) accuracy.
  • Include dispersion corrections. D3BJ or D4 corrections are essential for non-covalent interactions.
  • Check convergence. Verify that geometry optimizations, SCF calculations, and MD simulations have properly converged.
  • Report computational details completely. Functional, basis set, dispersion correction, solvent model, and software version should all be stated.
  • Archive your input/output files. Computational chemistry is reproducible only if all parameters are preserved.

References