SciAgent-Skills mdanalysis-trajectory

Python library for analyzing molecular dynamics (MD) trajectories from GROMACS, AMBER, NAMD, CHARMM, and LAMMPS. Reads topology and trajectory files into Universe objects; supports RMSD, RMSF, radius of gyration, contact maps, hydrogen bond analysis, PCA, and custom distance/angle calculations across millions of frames. Use for structural analysis after MD simulations; use OpenMM or GROMACS directly for running simulations.

install
source · Clone the upstream repo
git clone https://github.com/jaechang-hits/SciAgent-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/structural-biology-drug-discovery/mdanalysis-trajectory" ~/.claude/skills/jaechang-hits-sciagent-skills-mdanalysis-trajectory && rm -rf "$T"
manifest: skills/structural-biology-drug-discovery/mdanalysis-trajectory/SKILL.md
source content

MDAnalysis — Molecular Dynamics Trajectory Analysis

Overview

MDAnalysis provides a uniform Python interface for reading and analyzing molecular dynamics trajectories regardless of MD engine (GROMACS, AMBER, NAMD, CHARMM, LAMMPS, OpenMM). It represents molecular systems as

Universe
objects containing an
AtomGroup
with positions, velocities, forces, and topology data. Trajectories are iterated frame-by-frame or analyzed in bulk using analysis modules for RMSD, RMSF, radius of gyration, hydrogen bonds, solvent-accessible surface area, and PCA. MDAnalysis integrates with NumPy, pandas, and matplotlib, making it the standard tool for post-simulation structural analysis in computational chemistry and drug discovery.

When to Use

  • Computing RMSD and RMSF of protein backbone or specific residue groups after MD simulation
  • Analyzing ligand binding stability: pocket RMSD, contact persistence, hydrogen bond occupancy
  • Performing principal component analysis (PCA) on trajectory conformations
  • Computing solvent-accessible surface area (SASA), radius of gyration, and end-to-end distance
  • Extracting representative cluster structures from long MD trajectories for visualization
  • Use GROMACS or AMBER analysis tools (
    gmx rms
    ,
    cpptraj
    ) instead for engine-specific analysis within a HPC pipeline
  • Use OpenMM or GROMACS directly for running MD simulations; MDAnalysis is for post-simulation analysis

Prerequisites

  • Python packages:
    MDAnalysis
    ,
    numpy
    ,
    matplotlib
    ,
    pandas
  • Input: topology file (.psf, .prmtop, .gro, .pdb) + trajectory file (.dcd, .trr, .xtc, .nc, .dms)
# Install MDAnalysis
pip install MDAnalysis

# Install with all analysis extras
pip install "MDAnalysis[analysis]"

# Verify
python -c "import MDAnalysis as mda; print(mda.__version__)"
# 2.7.0

Quick Start

import MDAnalysis as mda
import numpy as np

# Load a GROMACS topology + trajectory
u = mda.Universe("protein.gro", "trajectory.xtc")

print(f"Atoms: {u.atoms.n_atoms}")
print(f"Residues: {u.residues.n_residues}")
print(f"Frames: {u.trajectory.n_frames}")
print(f"First frame positions (first 3 atoms):\n{u.atoms.positions[:3]}")

Core API

Module 1: Universe and AtomGroup — Loading and Selecting Atoms

Load trajectories and select atom subsets.

import MDAnalysis as mda

# Load topology + trajectory (GROMACS xtc format)
u = mda.Universe("system.gro", "md_production.xtc")

# AtomGroup selections (CHARMM-style selection language)
protein = u.select_atoms("protein")
backbone = u.select_atoms("backbone")
ca_atoms = u.select_atoms("name CA")
ligand = u.select_atoms("resname LIG")
binding_site = u.select_atoms("protein and around 5.0 resname LIG")

print(f"Protein atoms: {protein.n_atoms}")
print(f"CA atoms: {ca_atoms.n_atoms}")
print(f"Ligand atoms: {ligand.n_atoms}")
print(f"Binding site residues: {binding_site.residues.n_residues}")

# Access atom properties at current frame
print(f"CA positions shape: {ca_atoms.positions.shape}")  # (N, 3)
print(f"Protein mass: {protein.total_mass():.1f} Da")

Module 2: Trajectory Iteration — Per-Frame Analysis

Iterate over trajectory frames for time-series analysis.

import MDAnalysis as mda
import numpy as np

u = mda.Universe("protein.gro", "trajectory.xtc")
backbone = u.select_atoms("backbone")

times = []
rg_values = []

for ts in u.trajectory:
    times.append(u.trajectory.time)
    rg_values.append(backbone.radius_of_gyration())

import pandas as pd
df = pd.DataFrame({"time_ps": times, "Rg_A": rg_values})
print(f"Frames analyzed: {len(df)}")
print(f"Mean Rg: {df['Rg_A'].mean():.2f} Å")
print(f"Rg std: {df['Rg_A'].std():.2f} Å")
df.to_csv("radius_of_gyration.csv", index=False)

Module 3: RMSD Analysis — Structural Drift Over Time

Compute backbone RMSD relative to a reference structure.

import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe("protein.gro", "trajectory.xtc")

# RMSD of Cα atoms relative to first frame
rmsd = rms.RMSD(u, select="name CA")
rmsd.run()

# Results: frame, time (ps), RMSD (Å)
results = rmsd.results.rmsd
print(f"Mean RMSD: {results[:, 2].mean():.2f} Å")
print(f"Max RMSD: {results[:, 2].max():.2f} Å")

# Plot
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(results[:, 1] / 1000, results[:, 2], color="steelblue", lw=0.8)
ax.set_xlabel("Time (ns)")
ax.set_ylabel("RMSD (Å)")
ax.set_title("Backbone RMSD")
plt.tight_layout()
plt.savefig("rmsd.png", dpi=150)
print("Saved: rmsd.png")

Module 4: RMSF Analysis — Per-Residue Flexibility

Compute root-mean-square fluctuations to identify flexible regions.

import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe("protein.gro", "trajectory.xtc")

# RMSF per Cα atom (after aligning trajectory)
ca_atoms = u.select_atoms("name CA")
rmsf_analysis = rms.RMSF(ca_atoms)
rmsf_analysis.run()

rmsf_values = rmsf_analysis.results.rmsf
resids = ca_atoms.resids

print(f"Most flexible residue: {resids[np.argmax(rmsf_values)]} ({rmsf_values.max():.2f} Å)")
print(f"Most rigid residue:    {resids[np.argmin(rmsf_values)]} ({rmsf_values.min():.2f} Å)")

# Plot B-factor-like profile
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(resids, rmsf_values, color="coral", lw=1)
ax.fill_between(resids, 0, rmsf_values, alpha=0.3, color="coral")
ax.set_xlabel("Residue ID")
ax.set_ylabel("RMSF (Å)")
ax.set_title("Per-residue RMSF")
plt.tight_layout()
plt.savefig("rmsf.png", dpi=150)

Module 5: Hydrogen Bond Analysis

Identify and count hydrogen bonds between protein and ligand over the trajectory.

import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds import HydrogenBondAnalysis
import pandas as pd

u = mda.Universe("complex.gro", "trajectory.xtc")

# Protein-ligand hydrogen bond analysis
hbonds = HydrogenBondAnalysis(
    universe=u,
    donors_sel="protein",
    acceptors_sel="resname LIG",
    d_h_cutoff=1.2,      # donor-hydrogen distance cutoff (Å)
    d_a_cutoff=3.0,      # donor-acceptor distance cutoff (Å)
    d_h_a_angle_cutoff=150.0,  # angle cutoff (degrees)
)
hbonds.run()

# Get occupancy: fraction of frames with each H-bond
hbonds.generate_table()
df = pd.DataFrame(hbonds.table)
print(f"Total unique H-bonds observed: {len(df['donor_resid'].unique())}")

# Count H-bonds per frame
counts = hbonds.count_by_time()
print(f"Mean H-bonds per frame: {counts[:, 1].mean():.2f}")
print(f"Max H-bonds in one frame: {counts[:, 1].max():.0f}")

Module 6: PCA and Conformational Clustering

Extract principal modes of motion and cluster conformations.

import MDAnalysis as mda
from MDAnalysis.analysis import pca, align
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe("protein.gro", "trajectory.xtc")

# Align trajectory to first frame
aligner = align.AlignTraj(u, u, select="backbone", in_memory=True)
aligner.run()

# PCA on backbone Cα atoms
ca = u.select_atoms("name CA")
pc = pca.PCA(u, select="backbone")
pc.run()

# Explained variance
cumvar = np.cumsum(pc.results.variance / pc.results.variance.sum())
n_for_90 = np.searchsorted(cumvar, 0.90) + 1
print(f"PCs to explain 90% variance: {n_for_90}")
print(f"PC1 variance: {pc.results.variance[0] / pc.results.variance.sum() * 100:.1f}%")

# Project trajectory onto PC1-PC2 space
transformed = pc.transform(ca, n_components=2)
plt.scatter(transformed[:, 0], transformed[:, 1], c=range(len(transformed)),
            cmap="viridis", s=2)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.colorbar(label="Frame")
plt.title("PCA Conformational Landscape")
plt.savefig("pca.png", dpi=150)
print("Saved: pca.png")

Key Parameters

ParameterModuleDefaultEffect
select
(Universe)
AtomGroupMDAnalysis selection language string (e.g.,
"protein and name CA"
)
in_memory
align.AlignTraj
False
Load full trajectory into RAM for faster analysis
step
trajectory loop
1
Analyze every Nth frame;
step=10
reduces computation 10×
start/stop
analysis.run()allFrame range;
run(start=100, stop=500)
analyzes frames 100-500
d_a_cutoff
HydrogenBondAnalysis
3.0
Donor-acceptor distance cutoff (Å)
d_h_a_angle_cutoff
HydrogenBondAnalysis
150.0
H-bond angle cutoff (degrees)
n_components
PCA.transformallNumber of PCs to project onto
groupselection
RMSD
None
Secondary selection for fitting; primary used for RMSD calculation
weights
RMSD/align
None
"mass"
for mass-weighted RMSD
verbose
analysis.run()
False
Print progress bar during analysis

Common Workflows

Workflow 1: Complete Protein-Ligand Binding Stability Analysis

import MDAnalysis as mda
from MDAnalysis.analysis import rms, align
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

u = mda.Universe("complex.gro", "md_100ns.xtc")

# Align trajectory to initial frame
align.AlignTraj(u, u, select="protein and backbone", in_memory=True).run()

protein_bb = u.select_atoms("backbone")
ligand = u.select_atoms("resname LIG")

results = {"time_ns": [], "protein_rmsd": [], "ligand_rmsd": [], "pocket_rg": []}
ref_positions = {"protein": protein_bb.positions.copy(), "ligand": ligand.positions.copy()}

for ts in u.trajectory:
    results["time_ns"].append(ts.time / 1000)
    results["protein_rmsd"].append(
        rms.rmsd(protein_bb.positions, ref_positions["protein"], superposition=False))
    results["ligand_rmsd"].append(
        rms.rmsd(ligand.positions, ref_positions["ligand"], superposition=False))
    results["pocket_rg"].append(
        u.select_atoms("protein and around 6.0 resname LIG").radius_of_gyration())

df = pd.DataFrame(results)
df.to_csv("binding_stability.csv", index=False)
print(f"Ligand mean RMSD: {df['ligand_rmsd'].mean():.2f} Å")
print(f"Pocket stable: {'Yes' if df['pocket_rg'].std() < 0.5 else 'No'}")

Workflow 2: Extract Minimum RMSD Representative Structures

import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np

u = mda.Universe("protein.gro", "trajectory.xtc")

# Compute RMSD for all frames
rmsd_analysis = rms.RMSD(u, select="backbone")
rmsd_analysis.run()
rmsd_values = rmsd_analysis.results.rmsd[:, 2]

# Find frame with lowest RMSD (most representative)
min_frame = np.argmin(rmsd_values)
print(f"Most representative frame: {min_frame} (RMSD: {rmsd_values[min_frame]:.2f} Å)")

# Extract and save that frame as PDB
u.trajectory[min_frame]
with mda.Writer("representative_structure.pdb", u.atoms.n_atoms) as writer:
    writer.write(u.atoms)
print("Saved: representative_structure.pdb")

Common Recipes

Recipe 1: Compute Contact Map Between Two Protein Domains

import MDAnalysis as mda
from MDAnalysis.analysis import contacts
import numpy as np

u = mda.Universe("protein.gro", "trajectory.xtc")

# Define two domains
domain_A = u.select_atoms("resid 1-100 and name CA")
domain_B = u.select_atoms("resid 200-300 and name CA")

# Count domain-domain contacts (< 8 Å) over time
contact_counts = []
for ts in u.trajectory[::10]:  # every 10th frame
    dist_matrix = np.sqrt(np.sum(
        (domain_A.positions[:, None] - domain_B.positions[None, :]) ** 2, axis=-1
    ))
    contact_counts.append((dist_matrix < 8.0).sum())

print(f"Mean contacts: {np.mean(contact_counts):.1f}")
print(f"Contact count range: {min(contact_counts)} – {max(contact_counts)}")

Recipe 2: Write Trajectory Subset to New File

import MDAnalysis as mda

u = mda.Universe("system.gro", "long_trajectory.xtc")

# Write only protein atoms, every 10th frame, frames 500-2000
protein = u.select_atoms("protein")
with mda.Writer("protein_subset.xtc", protein.n_atoms) as writer:
    for ts in u.trajectory[500:2000:10]:
        writer.write(protein)

print(f"Subset trajectory written: {(2000-500)//10} frames")

Troubleshooting

ProblemCauseSolution
ValueError: Universe has no file
Missing trajectory argumentProvide both topology AND trajectory:
mda.Universe(top, traj)
RMSD drift despite alignmentWrong selection for fittingUse
"backbone"
for fitting; verify topology matches trajectory
KeyError: 'resname LIG'
Non-standard residue nameCheck
u.residues.resnames
; use exact name from topology
Very slow frame iterationLarge trajectory in memoryUse
step=10
to skip frames; convert xtc to smaller DCD
Hydrogen bond count is 0No explicit hydrogens in topologyUse topology with explicit H atoms; add hydrogens with
psfgen
or
tleap
NoDataError: xtc has no charges
Property not in trajectoryUse topology file that contains charges (.psf, .prmtop, .gro with itp)
Memory error with
in_memory=True
Trajectory too large for RAMRemove
in_memory=True
; increase RAM; or process in chunks
Import error on Apple SiliconBinary not compiled for arm64Install via conda-forge:
conda install -c conda-forge MDAnalysis

References