SciAgent-Skills pymatgen
pymatgen (Python Materials Genomics) is a materials science Python library for structure analysis, thermodynamics, and electronic property calculation. Parse and create crystal structures (CIF, POSCAR, XYZ), query the Materials Project database for DFT-computed properties, analyze phase diagrams and Pourbaix diagrams, compute X-ray diffraction patterns, and generate DFT input files for VASP, Quantum ESPRESSO, and CP2K. Alternatives: ASE (Atomic Simulation Environment) for MD/geometry; AFLOW for high-throughput; OVITO for visualization.
git clone https://github.com/jaechang-hits/SciAgent-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/scientific-computing/pymatgen" ~/.claude/skills/jaechang-hits-sciagent-skills-pymatgen && rm -rf "$T"
Overview
pymatgen is the standard Python library for materials science computation. Its core data model, Structure (periodic crystalline materials) and Molecule (non-periodic systems), provides a unified representation for input/output across 30+ file formats (CIF, POSCAR/CONTCAR, XYZ, PDB, Gaussian, VASP). The library integrates with the Materials Project REST API through the mp-api client (mp_api) to retrieve 150,000+ DFT-computed structures with band gaps, formation energies, and elastic constants. pymatgen is the foundation of the atomate2 and Custodian workflow frameworks for high-throughput DFT.
When to Use
- Parsing and converting crystal structure files between CIF, POSCAR, XYZ, and other formats
- Querying the Materials Project API for computed band gaps, formation energies, and stability data
- Constructing and analyzing phase diagrams and Pourbaix diagrams for thermodynamic stability
- Generating VASP, Quantum ESPRESSO, or CP2K input files from structure objects
- Computing X-ray diffraction (XRD) and neutron diffraction patterns for comparison with experiment
- Analyzing symmetry, space groups, and Wyckoff positions of crystal structures
- Prefer ASE instead when running molecular dynamics or driving multiple MD/DFT codes through a unified calculator interface
Prerequisites
- Python packages: pymatgen, mp-api (Materials Project client)
- Data requirements: structure files (CIF, POSCAR) or a Materials Project API key
- API key: free at materialsproject.org; set the PMG_MAPI_KEY environment variable
```bash
pip install pymatgen mp-api

# Set the API key for the current shell session
export PMG_MAPI_KEY="your_api_key_here"

# Or store it persistently in pymatgen's .pmgrc.yaml config file
pmg config --add PMG_MAPI_KEY your_api_key_here
```
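The snippets in this skill read the key with os.environ.get("PMG_MAPI_KEY", "your_key"), which hands a placeholder string to MPRester when the variable is missing. A minimal sketch of a stricter lookup, assuming you prefer to fail before any request is sent; get_mp_api_key is a hypothetical helper, not part of pymatgen or mp-api:

```python
import os


def get_mp_api_key(env_var: str = "PMG_MAPI_KEY") -> str:
    """Return the Materials Project API key stored in `env_var`.

    Raises RuntimeError instead of falling back to a placeholder, so a
    missing key is reported before any network request is made.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or run "
            f"'pmg config --add {env_var} <key>'"
        )
    return key
```

Usage: api_key = get_mp_api_key(), then pass api_key to MPRester as in the modules below.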
Quick Start
```python
from pymatgen.core import Lattice, Structure

# Diamond-cubic silicon as its primitive cell: FCC lattice vectors + two-atom basis
a = 5.431  # Angstroms
h = a / 2
lattice = Lattice([[0, h, h], [h, 0, h], [h, h, 0]])
silicon = Structure(
    lattice=lattice,
    species=["Si", "Si"],
    coords=[[0, 0, 0], [0.25, 0.25, 0.25]],  # fractional coordinates
)

print(f"Silicon: {silicon.formula}, {silicon.volume:.2f} ų")
print(f"Space group: {silicon.get_space_group_info()}")
# Silicon: Si2, 40.05 ų
# Space group: ('Fd-3m', 227)
```
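A two-site diamond silicon cell must use primitive FCC lattice vectors: the conventional cube of side a holds 8 atoms, and the 2-atom primitive cell has a quarter of its volume. A NumPy-only sketch of that arithmetic (it does not call pymatgen, but uses the same rows-are-lattice-vectors convention as Lattice.matrix), where cell volume is the absolute determinant of the lattice matrix:

```python
import numpy as np

a = 5.431  # silicon lattice constant in Angstroms

# Primitive FCC vectors as rows, the same row convention as a pymatgen Lattice matrix
primitive = 0.5 * a * np.array([[0.0, 1.0, 1.0],
                                [1.0, 0.0, 1.0],
                                [1.0, 1.0, 0.0]])
conventional = a * np.eye(3)  # 8-atom conventional cube

# Cell volume = |det(lattice matrix)|
v_prim = abs(np.linalg.det(primitive))
v_conv = abs(np.linalg.det(conventional))

print(f"primitive: {v_prim:.2f} Å^3, conventional: {v_conv:.2f} Å^3, "
      f"ratio: {v_conv / v_prim:.1f}")
```

The 4:1 ratio matches the 8:2 ratio of atoms per cell, so both cells have the same density.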
Core API
Module 1: Structure and Lattice
Core data structures for periodic crystals.
```python
from pymatgen.core import Lattice, Structure

# Lattice from parameters (a, b, c in Å; angles in degrees)
lattice = Lattice.from_parameters(a=4.05, b=4.05, c=4.05, alpha=90, beta=90, gamma=90)

# Build FCC aluminum: conventional cubic cell with 4 atoms
al_fcc = Structure(
    lattice,
    ["Al", "Al", "Al", "Al"],
    [[0, 0, 0], [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5]],
)
print(f"Formula: {al_fcc.formula}")
print(f"Sites: {len(al_fcc)}")
print(f"Volume: {al_fcc.volume:.3f} ų")
print(f"Density: {al_fcc.density:.3f} g/cm³")

# Access sites
for site in al_fcc:
    print(f"  {site.species_string} at {site.frac_coords}")
```
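Structure.density is the mass of the atoms in the cell divided by the cell volume. A plain-Python sketch of the same arithmetic for the FCC aluminum cell above, useful as a sanity check on the printed value; crystal_density is a hypothetical helper, and 26.9815385 g/mol is the standard atomic weight of Al:

```python
AVOGADRO = 6.02214076e23  # per mol
ANGSTROM3_TO_CM3 = 1e-24  # 1 Å^3 = 1e-24 cm^3


def crystal_density(n_atoms: int, molar_mass: float, cell_volume: float) -> float:
    """Density in g/cm^3 from atoms per cell, molar mass (g/mol), and cell volume (Å^3)."""
    cell_mass_g = n_atoms * molar_mass / AVOGADRO
    return cell_mass_g / (cell_volume * ANGSTROM3_TO_CM3)


# FCC aluminum: 4 atoms in a cubic cell with a = 4.05 Å
a = 4.05
rho_al = crystal_density(4, 26.9815385, a**3)
print(f"Al density: {rho_al:.3f} g/cm^3")
```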
```python
from pymatgen.core import Structure

# Load from file (format is inferred from the file name)
struct = Structure.from_file("material.cif")  # CIF, the most common exchange format
struct_vasp = Structure.from_file("POSCAR")   # VASP POSCAR

# Neighbors of the first site within a 3 Å cutoff (periodic images included)
site = struct[0]
neighbors = struct.get_neighbors(site, r=3.0)
print(f"Neighbors within 3 Å: {len(neighbors)}")
for nn in neighbors[:3]:
    print(f"  {nn.species_string}: {nn.nn_distance:.3f} Å")
```
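Conceptually, get_neighbors returns every periodic image of every site that lies within r of the chosen site. A brute-force NumPy sketch of that idea; count_neighbors is hypothetical, it only counts (pymatgen returns neighbor objects and uses a far more efficient search), and checking the 27 surrounding cells is only enough when the cutoff is smaller than the cell widths:

```python
import itertools

import numpy as np


def count_neighbors(lattice, frac_coords, center_index, cutoff):
    """Count periodic neighbors of one site within `cutoff` (Å) by brute force.

    Checks image offsets in {-1, 0, 1}^3, which suffices only when the cutoff
    is smaller than the cell's perpendicular widths.
    """
    lattice = np.asarray(lattice, dtype=float)
    frac = np.asarray(frac_coords, dtype=float)
    center = frac[center_index] @ lattice  # Cartesian position of the center site
    count = 0
    for offset in itertools.product((-1, 0, 1), repeat=3):
        images = (frac + np.array(offset)) @ lattice  # Cartesian positions of shifted sites
        dists = np.linalg.norm(images - center, axis=1)
        count += int(np.sum((dists > 1e-8) & (dists <= cutoff)))  # exclude the site itself
    return count


a = 4.05
fcc_frac = [[0, 0, 0], [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5]]
n = count_neighbors(a * np.eye(3), fcc_frac, center_index=0, cutoff=3.0)
print(f"Neighbors within 3 Å: {n}")  # FCC nearest-neighbor distance is a/sqrt(2) ≈ 2.86 Å
```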
Module 2: Materials Project API Query
Retrieve DFT-computed properties for 150,000+ materials.
```python
import os

from mp_api.client import MPRester

api_key = os.environ.get("PMG_MAPI_KEY", "your_key")

with MPRester(api_key) as mpr:
    # Search by chemical system
    docs = mpr.materials.summary.search(
        chemsys=["Li-Fe-O"],
        fields=["material_id", "formula_pretty", "energy_above_hull", "band_gap", "is_stable"],
    )
    print(f"Li-Fe-O materials: {len(docs)}")
    for d in docs[:5]:
        print(f"  {d.material_id}: {d.formula_pretty}, "
              f"Eg={d.band_gap:.2f} eV, above_hull={d.energy_above_hull:.3f} eV/atom")
```
```python
# Get a specific material by its MP ID (continues from the block above: MPRester, api_key)
with MPRester(api_key) as mpr:
    docs = mpr.materials.summary.search(
        material_ids=["mp-149"],  # Silicon
        fields=["material_id", "structure", "band_gap", "formation_energy_per_atom",
                "density", "is_stable", "symmetry"],
    )

doc = docs[0]
struct = doc.structure
print(f"Si mp-149: band_gap={doc.band_gap:.3f} eV, "
      f"density={doc.density:.3f} g/cm³")
print(f"Space group: {doc.symmetry.symbol}")
```
Module 3: Symmetry Analysis
```python
from pymatgen.core import Structure
from pymatgen.symmetry.analyzer import SpacegroupAnalyzer

struct = Structure.from_file("material.cif")

# Symmetry analysis (symprec is a distance tolerance in Å; the default is 0.01)
sga = SpacegroupAnalyzer(struct, symprec=0.1)
print(f"Space group: {sga.get_space_group_symbol()} ({sga.get_space_group_number()})")
print(f"Crystal system: {sga.get_crystal_system()}")
print(f"Point group: {sga.get_point_group_symbol()}")

# Get standardized primitive / conventional cells
primitive = sga.get_primitive_standard_structure()
conventional = sga.get_conventional_standard_structure()
print(f"Primitive: {len(primitive)} sites | Conventional: {len(conventional)} sites")

# Wyckoff positions: one symbol (e.g. "4a") per set of symmetry-equivalent sites
sym_struct = sga.get_symmetrized_structure()
print(f"Wyckoff symbols: {sorted(set(sym_struct.wyckoff_symbols))}")
```
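symprec acts as a distance tolerance: an operation counts as a symmetry if it maps every atom onto an equivalent atom within that distance, which is why slightly distorted structures need a looser value. A simplified NumPy sketch of that test for one operation; is_symmetry_op is hypothetical, and spglib (which SpacegroupAnalyzer wraps) searches for all operations and handles tolerances more carefully:

```python
import numpy as np


def is_symmetry_op(lattice, frac_coords, species, rot, trans, tol):
    """Return True if x -> rot @ x + trans (fractional coordinates) maps every
    site onto a site of the same species within `tol` Angstroms.

    Periodic images are handled by rounding the fractional difference, which
    is adequate for near-orthogonal cells such as the cubic one below.
    """
    lattice = np.asarray(lattice, dtype=float)
    frac = np.asarray(frac_coords, dtype=float)
    for f, sp in zip(frac, species):
        mapped = rot @ f + trans
        matched = False
        for g, sp2 in zip(frac, species):
            if sp2 != sp:
                continue
            diff = mapped - g
            diff -= np.round(diff)  # nearest periodic image
            if np.linalg.norm(diff @ lattice) <= tol:  # Cartesian distance in Å
                matched = True
                break
        if not matched:
            return False
    return True


# FCC aluminum with one atom displaced by 0.05 Å along x
a = 4.05
lattice = a * np.eye(3)
species = ["Al"] * 4
frac = np.array([[0, 0, 0], [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5]], dtype=float)
frac[1, 0] += 0.05 / a

# Inversion through the origin sends the displaced atom 0.1 Å from its own site
inversion_rot, inversion_trans = -np.eye(3), np.zeros(3)
for tol in (0.01, 0.15):
    ok = is_symmetry_op(lattice, frac, species, inversion_rot, inversion_trans, tol)
    print(f"tol={tol} Å: inversion is a symmetry -> {ok}")
```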
Module 4: Phase Diagrams
Thermodynamic stability and phase boundary analysis.
```python
import os

from mp_api.client import MPRester
from pymatgen.analysis.phase_diagram import PDPlotter, PhaseDiagram

api_key = os.environ.get("PMG_MAPI_KEY", "your_key")

# Computed entries for all phases in the Li-Fe-O chemical system
with MPRester(api_key) as mpr:
    entries = mpr.get_entries_in_chemsys(["Li", "Fe", "O"])

phase_diagram = PhaseDiagram(entries)
print(f"Stable phases: {len(phase_diagram.stable_entries)}")

# Stability of a specific composition: use its lowest-energy entry
lifeo2_entries = [e for e in entries if e.composition.reduced_formula == "LiFeO2"]
lifeo2 = min(lifeo2_entries, key=lambda e: e.energy_per_atom)
e_hull = phase_diagram.get_e_above_hull(lifeo2)
print(f"LiFeO2 E above hull: {e_hull:.3f} eV/atom")

# Plot (requires matplotlib)
plotter = PDPlotter(phase_diagram, backend="matplotlib")
plotter.show()
```
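For a binary system, the idea behind get_e_above_hull reduces to geometry: plot formation energy per atom against composition, take the lower convex hull, and measure how far each phase sits above it. A self-contained sketch with made-up energies for a hypothetical A-B system; PhaseDiagram does the same in any number of dimensions using qhull:

```python
def lower_hull(points):
    """Lower convex hull of (x, energy) points using Andrew's monotone chain."""
    hull = []
    for p in sorted(set(points)):
        # Drop the last hull point while it lies on or above the segment to p
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross > 0:
                break
            hull.pop()
        hull.append(p)
    return hull


def e_above_hull(x, energy, hull):
    """Energy above the hull (eV/atom) at composition x, by linear interpolation."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return energy - (y1 + (y2 - y1) * (x - x1) / (x2 - x1))
    raise ValueError("composition outside the hull range")


# Toy A-B system: x = fraction of B, formation energies in eV/atom (made-up values)
phases = {"A": (0.0, 0.0), "A3B": (0.25, -0.15), "AB": (0.5, -0.40),
          "AB3": (0.75, -0.10), "B": (1.0, 0.0)}
hull = lower_hull(phases.values())
for name, (x, e) in phases.items():
    print(f"{name:4s} x={x:.2f}  E_above_hull={e_above_hull(x, e, hull):.3f} eV/atom")
```

Here only A, AB, and B lie on the hull; A3B and AB3 sit 0.05 and 0.10 eV/atom above it and would be predicted to decompose into their hull neighbors.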
Module 5: XRD Pattern Simulation
```python
import matplotlib.pyplot as plt
from pymatgen.analysis.diffraction.xrd import XRDCalculator
from pymatgen.core import Structure

struct = Structure.from_file("material.cif")  # or build programmatically

# Calculate XRD pattern; "CuKa" is the weighted Cu Kα1/Kα2 average, λ = 1.54184 Å
calculator = XRDCalculator(wavelength="CuKa")
pattern = calculator.get_pattern(struct, two_theta_range=(10, 80))

print(f"Diffraction peaks: {len(pattern.x)}")
for two_theta, intensity, hkl in zip(pattern.x[:5], pattern.y[:5], pattern.hkls[:5]):
    print(f"  2θ={two_theta:.2f}°, I={intensity:.1f}, hkl={hkl}")

# Stick plot (intensities are scaled so the strongest peak is 100 by default)
fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(pattern.x, pattern.y, width=0.3, color="black")
ax.set_xlabel("2θ (degrees)")
ax.set_ylabel("Intensity (arb. units)")
ax.set_title(f"XRD Pattern: {struct.formula}")
plt.tight_layout()
plt.savefig("xrd_pattern.pdf", bbox_inches="tight")
```
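Peak positions follow from Bragg's law, λ = 2d sin θ, with d = a/√(h²+k²+l²) for a cubic cell; XRDCalculator additionally computes intensities from structure factors, which this sketch omits. A plain-Python sketch listing the reflections allowed in an FCC lattice (h, k, l all even or all odd) for aluminum with a = 4.05 Å; fcc_peak_positions is hypothetical:

```python
import itertools
import math

CU_KA = 1.54184  # Å; the value pymatgen uses for "CuKa" (weighted Kα1/Kα2 average)


def fcc_peak_positions(a, wavelength=CU_KA, max_index=3, two_theta_max=80.0):
    """Return sorted (two_theta, hkl) pairs for reflections allowed in an FCC cell."""
    peaks = {}
    for hkl in itertools.product(range(max_index + 1), repeat=3):
        if hkl == (0, 0, 0):
            continue
        if len({i % 2 for i in hkl}) != 1:  # FCC: h, k, l all even or all odd
            continue
        d = a / math.sqrt(sum(i * i for i in hkl))  # cubic d-spacing
        sin_theta = wavelength / (2 * d)              # Bragg: λ = 2 d sin θ
        if sin_theta > 1:
            continue
        two_theta = 2 * math.degrees(math.asin(sin_theta))
        if two_theta <= two_theta_max:
            # One representative per family, e.g. (0, 0, 2) is reported as (2, 0, 0)
            peaks.setdefault(round(two_theta, 4), tuple(sorted(hkl, reverse=True)))
    return sorted(peaks.items())


peaks = fcc_peak_positions(4.05)  # aluminum
for two_theta, hkl in peaks:
    print(f"2θ = {two_theta:6.2f}°  hkl = {hkl}")
```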
Module 6: DFT Input File Generation
Generate VASP input sets for DFT calculations.
```python
import os

from pymatgen.core import Structure
from pymatgen.io.vasp.sets import MPRelaxSet, MPStaticSet

struct = Structure.from_file("material.cif")

# Generate VASP relaxation input set (Materials Project standard)
relax_set = MPRelaxSet(struct)

# Write to directory; POTCAR generation requires a configured VASP
# pseudopotential directory (PMG_VASP_PSP_DIR)
os.makedirs("vasp_relax", exist_ok=True)
relax_set.write_input("vasp_relax")
print("Generated: POSCAR, INCAR, KPOINTS, POTCAR (requires VASP pseudopotentials)")

# Inspect key INCAR settings
incar = relax_set.incar
print(f"ENCUT: {incar.get('ENCUT')} eV")
print(f"KPOINTS: {relax_set.kpoints}")

# Static calculation: run this only after the relaxation in vasp_relax has
# finished, because from_prev_calc reads its vasprun.xml and OUTCAR
static_set = MPStaticSet.from_prev_calc(prev_calc_dir="vasp_relax")
static_set.write_input("vasp_static")
```
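MPStaticSet.from_prev_calc reads results (vasprun.xml and OUTCAR) from a finished calculation, so pointing it at a directory that only holds freshly written inputs fails. A small pathlib pre-check; has_completed_vasp_outputs is a hypothetical helper and only a rough heuristic (it does not verify that the run converged):

```python
import tempfile
from pathlib import Path


def has_completed_vasp_outputs(calc_dir) -> bool:
    """Rough check that calc_dir holds a vasprun.xml and an OUTCAR,
    plain or compressed (e.g. vasprun.xml.gz)."""
    d = Path(calc_dir)
    return any(d.glob("vasprun.xml*")) and any(d.glob("OUTCAR*"))


# Demo on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    before = has_completed_vasp_outputs(tmp)  # inputs only: nothing to read yet
    (Path(tmp) / "vasprun.xml").touch()
    (Path(tmp) / "OUTCAR").touch()
    after = has_completed_vasp_outputs(tmp)
    print(before, after)
```

In the module above, guard the static step with if has_completed_vasp_outputs("vasp_relax"): before calling MPStaticSet.from_prev_calc.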
Key Concepts
Fractional vs Cartesian Coordinates
A pymatgen Structure stores atomic positions as fractional coordinates (relative to the lattice vectors, typically in the range 0–1); Cartesian positions in Angstroms are available as site.coords. Convert explicitly with struct.lattice.get_cartesian_coords(frac) or struct.lattice.get_fractional_coords(cart). File formats differ (CIF stores fractional coordinates, XYZ stores Cartesian, and POSCAR supports both); pymatgen converts automatically on read/write.
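With the lattice vectors stored as matrix rows (the Lattice.matrix convention), both conversions are one matrix product: Cartesian = fractional · M and fractional = Cartesian · M⁻¹, which is the arithmetic behind get_cartesian_coords and get_fractional_coords. A NumPy-only sketch with an arbitrary, deliberately non-orthogonal cell chosen for illustration:

```python
import numpy as np

# Rows are lattice vectors a, b, c in Å (same convention as pymatgen's Lattice.matrix)
lattice = np.array([[4.0, 0.0, 0.0],
                    [2.0, 3.0, 0.0],
                    [0.0, 0.0, 5.0]])

frac = np.array([[0.5, 0.5, 0.5],
                 [0.25, 0.0, 0.75]])

cart = frac @ lattice                      # fractional -> Cartesian (Å)
frac_back = cart @ np.linalg.inv(lattice)  # Cartesian -> fractional

print(cart)
print(np.allclose(frac_back, frac))
```

Because the b vector is tilted, the Cartesian x of the first site (3.0 Å) is not simply 0.5 times a side length, which is exactly why the full matrix is needed.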
Composition and Oxidation States
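Composition("LiFePO4") parses chemical formulas. Structure.add_oxidation_state_by_guess() assigns formal oxidation states (e.g., Li+, O2-) by searching for charge-balanced combinations with Composition.oxi_state_guesses(); for a bond-valence-based assignment, use BVAnalyzer from pymatgen.analysis.bond_valence. Oxidation states are needed for charge-dependent analyses (such as Ewald electrostatic energies) and some property calculations.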
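The core constraint behind oxidation-state guessing is charge neutrality. A brute-force sketch that enumerates assignments summing to zero; charge_balanced_assignments is hypothetical, the candidate lists are a hand-picked assumption, and pymatgen's Composition.oxi_state_guesses additionally ranks solutions using oxidation-state statistics from known compounds:

```python
import itertools


def charge_balanced_assignments(composition, candidate_states):
    """Enumerate oxidation-state assignments (one state per element) whose
    total charge is zero. composition maps element -> count per formula unit."""
    elements = list(composition)
    solutions = []
    for states in itertools.product(*(candidate_states[el] for el in elements)):
        total = sum(composition[el] * q for el, q in zip(elements, states))
        if total == 0:
            solutions.append(dict(zip(elements, states)))
    return solutions


lifepo4 = {"Li": 1, "Fe": 1, "P": 1, "O": 4}
candidates = {"Li": [1], "Fe": [2, 3], "P": [-3, 3, 5], "O": [-2]}
result = charge_balanced_assignments(lifepo4, candidates)
print(result)
```

Of the six combinations tried, only Li+1, Fe+2, P+5, O-2 balances (1 + 2 + 5 - 8 = 0).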
Common Workflows
Workflow 1: High-Throughput Stability Screen
```python
import os

import pandas as pd
from mp_api.client import MPRester

api_key = os.environ.get("PMG_MAPI_KEY", "your_key")

# Screen lithium-transition-metal oxides for stability
systems = [f"Li-{m}-O" for m in ["Mn", "Co", "Ni", "Fe", "V"]]
results = []

with MPRester(api_key) as mpr:
    for system in systems:
        docs = mpr.materials.summary.search(
            chemsys=[system],
            fields=["material_id", "formula_pretty", "energy_above_hull",
                    "band_gap", "is_stable", "formation_energy_per_atom"],
        )
        for d in docs:
            results.append({
                "system": system,
                "mpid": d.material_id,
                "formula": d.formula_pretty,
                "e_above_hull": d.energy_above_hull,
                "band_gap": d.band_gap,
                "stable": d.is_stable,
            })

df = pd.DataFrame(results)
stable = df[df["stable"]].sort_values("band_gap")
print(f"Stable phases: {len(stable)}/{len(df)}")
print(stable[["formula", "system", "band_gap", "e_above_hull"]].head(10))
stable.to_csv("stability_screen.csv", index=False)
```
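Filtering on is_stable keeps only phases on the hull (energy above hull of zero). Screens often also keep slightly metastable phases within a small window; the 0.025 eV/atom default below (roughly kT at room temperature) is an assumption, not a Materials Project convention. A plain-Python sketch over records shaped like the dictionaries the loop collects; within_hull_window is hypothetical and the records are toy values:

```python
def within_hull_window(records, max_e_above_hull=0.025):
    """Keep records whose energy above hull (eV/atom) is at or below the
    threshold, sorted from most to least stable; missing values are skipped."""
    kept = [r for r in records
            if r.get("e_above_hull") is not None and r["e_above_hull"] <= max_e_above_hull]
    return sorted(kept, key=lambda r: r["e_above_hull"])


# Toy records (made-up values, not Materials Project data)
records = [
    {"formula": "phase-1", "e_above_hull": 0.0},
    {"formula": "phase-2", "e_above_hull": 0.080},
    {"formula": "phase-3", "e_above_hull": 0.012},
    {"formula": "phase-4", "e_above_hull": None},
]
near_hull = within_hull_window(records)
print([r["formula"] for r in near_hull])
```

With the DataFrame from the workflow, the pandas equivalent is df[df["e_above_hull"] <= 0.025].sort_values("e_above_hull").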
Workflow 2: Structure Manipulation and Export
```python
from pymatgen.core import Structure
from pymatgen.transformations.standard_transformations import (
    SubstitutionTransformation,
    SupercellTransformation,
)

# Load and analyze structure
struct = Structure.from_file("material.cif")
print(f"Original: {struct.formula}, {len(struct)} sites")

# Create 2×2×2 supercell
sc_matrix = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]
supercell = SupercellTransformation(sc_matrix).apply_transformation(struct)
print(f"Supercell: {len(supercell)} sites")

# Substitute element (e.g., 10% Fe doping on Mn sites; assumes struct contains Mn).
# The result is a disordered structure with partial occupancies.
sub = SubstitutionTransformation({"Mn": {"Mn": 0.9, "Fe": 0.1}})
doped = sub.apply_transformation(struct)
print(f"Doped composition: {doped.composition.reduced_formula}")

# Export in multiple formats (format is inferred from the file name)
struct.to(filename="output.cif")  # CIF
struct.to(filename="POSCAR")      # VASP POSCAR
struct.to(filename="output.xyz")  # XYZ
print("Exported CIF, POSCAR, XYZ")
```
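For a diagonal scaling matrix, a supercell is easy to write out: multiply each lattice vector by its factor and tile the fractional coordinates, so the site count and volume grow by the determinant (8 for 2×2×2). A NumPy sketch of the diagonal case only; diagonal_supercell is hypothetical, and SupercellTransformation also accepts non-diagonal matrices:

```python
import itertools

import numpy as np


def diagonal_supercell(lattice, frac_coords, species, n):
    """Build an n[0] x n[1] x n[2] supercell from a lattice (rows = vectors),
    fractional coordinates, and a species list."""
    lattice = np.asarray(lattice, dtype=float)
    frac = np.asarray(frac_coords, dtype=float)
    scale = np.asarray(n, dtype=float)
    new_lattice = lattice * scale[:, None]  # stretch each lattice vector by its factor
    new_frac, new_species = [], []
    for shift in itertools.product(*(range(k) for k in n)):
        # Translate the original cell, then express it in the new cell's fractional units
        new_frac.extend((frac + np.array(shift)) / scale)
        new_species.extend(species)
    return new_lattice, np.array(new_frac), new_species


a = 4.05
al_lattice = a * np.eye(3)
al_frac = [[0, 0, 0], [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5]]
sc_lattice, sc_frac, sc_species = diagonal_supercell(al_lattice, al_frac, ["Al"] * 4, (2, 2, 2))
ratio = abs(np.linalg.det(sc_lattice)) / abs(np.linalg.det(al_lattice))
print(f"{len(sc_species)} sites, volume ratio {ratio:.1f}")
```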
Key Parameters
| Parameter | Module/Function | Default | Range / Options | Effect |
|---|---|---|---|---|
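| symprec | SpacegroupAnalyzer | 0.01 | 0.01–0.5 Å | Symmetry detection tolerance; larger = more permissive |
| wavelength | XRDCalculator | "CuKa" | "CuKa", "MoKa", float (Å) | X-ray wavelength for diffraction simulation |
| two_theta_range | XRDCalculator.get_pattern | (0, 90) | tuple of degrees | Angular range for XRD pattern |
| ENCUT | INCAR (MPRelaxSet) | 520 eV | 300–800 eV | Plane-wave energy cutoff for VASP |
| chemsys | mpr.materials.summary.search | None | string or list, e.g. "Li-Fe-O" | Chemical system filter for Materials Project query |
| fields | mpr.materials.summary.search | all | list of strings | Limit returned fields to reduce API transfer |
| r | Structure.get_neighbors | required | 1–8 Å | Neighbor search cutoff radius |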
Common Recipes
Recipe: Batch CIF to POSCAR Conversion
```python
from pathlib import Path

from pymatgen.core import Structure

cif_dir = Path("cif_files")
poscar_dir = Path("poscar_files")
poscar_dir.mkdir(exist_ok=True)

for cif_path in cif_dir.glob("*.cif"):
    try:
        struct = Structure.from_file(str(cif_path))
        # A file name containing "POSCAR" lets pymatgen infer the output format
        out_path = poscar_dir / f"{cif_path.stem}_POSCAR"
        struct.to(filename=str(out_path))
        print(f"Converted: {cif_path.name} → {out_path.name}")
    except Exception as e:
        print(f"FAILED {cif_path.name}: {e}")
```
Recipe: Get Band Gap for List of Materials
```python
import os

import pandas as pd
from mp_api.client import MPRester

mp_ids = ["mp-149", "mp-2815", "mp-1265"]  # mp-149 is Si; formula_pretty identifies the others
api_key = os.environ.get("PMG_MAPI_KEY", "your_key")

with MPRester(api_key) as mpr:
    docs = mpr.materials.summary.search(
        material_ids=mp_ids,
        fields=["material_id", "formula_pretty", "band_gap", "is_gap_direct"],
    )

df = pd.DataFrame([{
    "mpid": d.material_id,
    "formula": d.formula_pretty,
    "band_gap_eV": d.band_gap,
    "direct_gap": d.is_gap_direct,
} for d in docs])
print(df.to_string(index=False))
```
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
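| MPRester authentication error | PMG_MAPI_KEY not set or expired | Set the env var (export PMG_MAPI_KEY="your_key"); regenerate the key at materialsproject.org |
| SpacegroupAnalyzer returns wrong space group | Atom positions have small distortions; symprec too tight | Increase symprec from 0.01 to 0.1 |
| Structure.from_file fails for CIF | Disorder, partial occupancies, or non-standard CIF | Parse with CifParser directly and adjust its occupancy_tolerance |
| summary.search hangs or times out | Large query returning thousands of results | Pass fields=[...] to reduce the payload; add filters to narrow the query |
| POTCAR not found in VASP input set | VASP pseudopotential library not configured | Run pmg config -p <EXTRACTED_VASP_POTCAR> <MY_PSP>, then pmg config --add PMG_VASP_PSP_DIR <MY_PSP> |
| Memory error on large supercell | Supercell has thousands of atoms | Reduce the supercell size |
| XRDCalculator gives zero peaks | two_theta_range excludes all reflections, or the structure did not load as expected | Widen two_theta_range; check the loaded structure with print(struct) |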
Related Skills
- pymoo: multi-objective optimization for materials property screening using pymatgen descriptors
- autodock-vina-docking: structure preparation workflow analogous to pymatgen for molecular docking
- zarr-python: efficient storage of large arrays from MD trajectories or property databases
References
- pymatgen documentation — full API reference, tutorials, and compatibility matrix
- Materials Project API docs — REST API and mp-api client reference
- pymatgen paper: Ong et al. (2013), CMS — original publication and design philosophy
- Materials Project paper: Jain et al. (2013), APL Materials — database description and computed properties
- pymatgen GitHub — source code and examples