# Materials-simulation-skills · slurm-job-script-generator

## Install

Clone the upstream repo:

```sh
git clone https://github.com/HeshamFS/materials-simulation-skills
```

Claude Code: install into `~/.claude/skills/`:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/HeshamFS/materials-simulation-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/hpc-deployment/slurm-job-script-generator" ~/.claude/skills/heshamfs-materials-simulation-skills-slurm-job-script-generator && rm -rf "$T"
```

Manifest: `skills/hpc-deployment/slurm-job-script-generator/SKILL.md`
# SLURM Job Script Generator

## Goal

Generate a correct, copy-pasteable SLURM job script (`.sbatch`) for running a simulation, and surface common configuration mistakes (bad walltime format, conflicting memory flags, oversubscription hints).
## Requirements
- Python 3.8+
- No external dependencies (Python standard library only)
- Works on Linux, macOS, and Windows (script generation only)
## Inputs to Gather

| Input | Description | Example |
|---|---|---|
| Job name | Short identifier for the job | `phasefield` |
| Walltime | SLURM time limit | `02:00:00` |
| Partition | Cluster partition/queue (if required) | `compute` |
| Account | Project/account (if required) | |
| Nodes | Number of nodes to allocate | `2` |
| MPI tasks | Total tasks (`--ntasks`), or tasks per node (`--ntasks-per-node`) | `--ntasks 16` or `--ntasks-per-node 8` |
| Threads | CPUs per task (OpenMP threads) | `--cpus-per-task 2` |
| Memory | `--mem` or `--mem-per-cpu` (cluster policy dependent) | `16G` |
| GPUs | GPUs per node (optional) | |
| Working directory | Where the run should execute | |
| Modules | Environment modules to load (optional) | `gcc/12`, `openmpi/4.1` |
| Run command | The command to launch under SLURM | `./simulate --config config.json` |
## Decision Guidance

### MPI vs MPI+OpenMP layout

```
Does the code use OpenMP / threading?
├── NO  → Use MPI-only: cpus-per-task=1
└── YES → Use hybrid: set cpus-per-task = threads per MPI rank
          and export OMP_NUM_THREADS = cpus-per-task
```

Rule of thumb: if you see diminishing strong-scaling efficiency at high MPI ranks, try fewer ranks with more threads per rank (and measure).
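The layout decision above can be sketched as a small helper. `plan_layout` is a hypothetical name for illustration, not part of the skill's CLI:

```python
# Hypothetical helper illustrating the MPI-only vs hybrid decision tree.
# Not part of slurm_script_generator.py; names are illustrative only.

def plan_layout(cores_per_node: int, threads_per_rank: int = 1) -> dict:
    """Map a per-node core budget onto SLURM task/thread settings."""
    if cores_per_node % threads_per_rank != 0:
        raise ValueError("threads per rank must divide the node core count")
    return {
        "ntasks_per_node": cores_per_node // threads_per_rank,  # MPI ranks per node
        "cpus_per_task": threads_per_rank,                      # --cpus-per-task
        "omp_num_threads": threads_per_rank,                    # exported OMP_NUM_THREADS
    }

print(plan_layout(128, 1))  # MPI-only: 128 ranks per node, 1 CPU each
print(plan_layout(128, 2))  # hybrid: 64 ranks per node, 2 threads each
```

With `threads_per_rank=1` this degenerates to the MPI-only case (`cpus-per-task=1`), matching the NO branch of the tree.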
### Memory flag selection

- Use either `--mem` (per node) or `--mem-per-cpu` (per CPU), not both.
- Follow your cluster's documentation; some sites enforce one style.
- SLURM `--mem` units are integer MB by default, or an integer with a `K`/`M`/`G`/`T` suffix (and `--mem=0` commonly means "all memory on node").
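A minimal sketch of the mutual-exclusion and format check described above, assuming an `<int>[K|M|G|T]` pattern (the skill's actual validation code may differ):

```python
import re
from typing import Optional

# Assumed pattern mirroring SLURM's <int>[K|M|G|T] memory convention.
MEM_RE = re.compile(r"^\d+[KMGT]?$")

def validate_mem(mem: Optional[str], mem_per_cpu: Optional[str]) -> Optional[str]:
    """Reject conflicting flags, then check the chosen value's format."""
    if mem and mem_per_cpu:
        raise ValueError("use either --mem or --mem-per-cpu, not both")
    value = mem or mem_per_cpu
    if value and not MEM_RE.match(value):
        raise ValueError(f"bad memory spec: {value!r}")
    return value

print(validate_mem("16G", None))   # accepted: per-node memory with G suffix
print(validate_mem(None, "2000"))  # accepted: bare integer (MB by default)
```

Passing both flags raises immediately, matching the "not both" rule above.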
## Script Outputs (JSON Fields)

| Script | Key Outputs |
|---|---|
| `slurm_script_generator.py` | … |
Workflow
- Gather cluster constraints (partition/account, GPU policy, memory policy).
- Choose a process layout (MPI-only vs hybrid MPI+OpenMP).
- Generate the script with
.slurm_script_generator.py - Inspect warnings (conflicts, suspicious layouts).
- Save the generated script as
.job.sbatch - Submit with
and monitor withsbatch job.sbatch
.squeue
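The generation step boils down to assembling `#SBATCH` directives into a shell script. The sketch below is illustrative only; the real `slurm_script_generator.py` may emit different directives or a different launcher than `srun`:

```python
# Illustrative sketch of how a generator might assemble an sbatch script.
# NOT the skill's actual implementation.

def render_script(job_name, time, nodes, ntasks_per_node, cpus_per_task, command):
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={time}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={ntasks_per_node}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        "set -euo pipefail",                              # fail fast in the job shell
        f"export OMP_NUM_THREADS={cpus_per_task}",        # hybrid layout: threads = cpus-per-task
        f"srun {command}",                                # launcher choice is site-dependent
    ]
    return "\n".join(lines) + "\n"

print(render_script("phasefield", "00:10:00", 1, 8, 2,
                    "./simulate --config config.json"))
```

The printed text is what you would save as `job.sbatch` and submit with `sbatch`.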
## CLI Examples

```sh
# Preview a job script (prints to stdout)
python3 skills/hpc-deployment/slurm-job-script-generator/scripts/slurm_script_generator.py \
  --job-name phasefield \
  --time 00:10:00 \
  --partition compute \
  --nodes 1 \
  --ntasks-per-node 8 \
  --cpus-per-task 2 \
  --mem 16G \
  --module gcc/12 \
  --module openmpi/4.1 \
  -- \
  ./simulate --config config.json

# Write to a file and also emit structured JSON
python3 skills/hpc-deployment/slurm-job-script-generator/scripts/slurm_script_generator.py \
  --job-name phasefield \
  --time 00:10:00 \
  --nodes 1 \
  --ntasks 16 \
  --cpus-per-task 1 \
  --out job.sbatch \
  --json \
  -- \
  /bin/echo hello
```
## Conversational Workflow Example

User: I need an `sbatch` script for my MPI simulation. I want 2 nodes, 64 ranks per node, 2 OpenMP threads per rank, and 2 hours.

Agent workflow:

- Confirm partition/account and whether GPUs are needed.
- Generate a hybrid job script:

  ```sh
  python3 scripts/slurm_script_generator.py --job-name run --time 02:00:00 --nodes 2 --ntasks-per-node 64 --cpus-per-task 2 -- ./simulate
  ```

- Explain the mapping:
  - Total ranks = 128
  - Threads per rank = 2 (`OMP_NUM_THREADS=2`)
- If the user provides node core counts, sanity-check oversubscription using `--cores-per-node`.
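The oversubscription sanity check amounts to comparing requested CPUs per node against the node's core count; a sketch of the assumed logic behind `--cores-per-node`:

```python
# Assumed oversubscription check; the skill's implementation may differ.

def oversubscribed(ntasks_per_node: int, cpus_per_task: int, cores_per_node: int) -> bool:
    """True if the requested CPUs per node exceed the physical cores."""
    return ntasks_per_node * cpus_per_task > cores_per_node

print(oversubscribed(64, 2, 128))  # False: 64 ranks x 2 threads = 128 CPUs, fits exactly
print(oversubscribed(64, 4, 128))  # True: 256 CPUs requested on 128 cores
```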
## Error Handling

| Error | Cause | Resolution |
|---|---|---|
| Bad walltime format | `--time` does not match the accepted patterns | Use `HH:MM:SS` or `D-HH:MM:SS` |
| Non-positive nodes | `--nodes` is zero or negative | Provide `--nodes` ≥ 1 |
| Conflicting memory directives | Both `--mem` and `--mem-per-cpu` were given | Choose one memory style |
| Missing launch command | No command was supplied after `--` | Add `-- <command>` |
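The walltime check in the first row can be expressed as a single regex. This sketch is consistent with the two formats above but is not necessarily the skill's exact pattern:

```python
import re

# Assumed walltime pattern: HH:MM:SS, optionally prefixed with D- (days).
TIME_RE = re.compile(r"^(\d+-)?\d{2}:\d{2}:\d{2}$")

def valid_time(t: str) -> bool:
    """True if t looks like HH:MM:SS or D-HH:MM:SS."""
    return bool(TIME_RE.match(t))

print(valid_time("02:00:00"))    # True
print(valid_time("1-12:00:00"))  # True
print(valid_time("2h"))          # False
```

Note that real SLURM accepts additional shorthands (e.g. bare minutes); the skill deliberately enforces the stricter forms.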
## Security

### Input Validation

- `--time` is validated against strict `HH:MM:SS` or `D-HH:MM:SS` format via regex
- `--nodes`, `--ntasks`, `--ntasks-per-node`, `--cpus-per-task`, and `--gpus` are validated as positive integers with upper bounds
- `--mem` and `--mem-per-cpu` are validated against SLURM's accepted format (`<int>[K|M|G|T]`); providing both simultaneously is rejected
- `--job-name` is validated against `[a-zA-Z0-9_.-]+` (no shell metacharacters)
- `--partition` and `--account` are validated against safe-character allowlists
- `--module` values are validated to prevent shell injection (no `;`, `|`, `&`, backticks, or `$`)
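The `--module` check can be done with an allowlist match rather than a denylist; in this sketch the exact character class is an assumption, not the skill's actual pattern:

```python
import re

# Assumed allowlist for module names like gcc/12 or openmpi/4.1.
SAFE_MODULE = re.compile(r"^[A-Za-z0-9_./+-]+$")

def safe_module(name: str) -> bool:
    """True if the module name contains only allowlisted characters."""
    return bool(SAFE_MODULE.match(name))

print(safe_module("gcc/12"))         # True
print(safe_module("openmpi/4.1"))    # True
print(safe_module("gcc; rm -rf /"))  # False: ';', space, etc. rejected
```

An allowlist is generally safer than enumerating bad characters, since anything unanticipated is rejected by default.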
### File Access

- The script reads no external files; all inputs are provided via CLI arguments
- `--out` writes the generated sbatch script to a single specified file path
- The generated script is a plain-text shell script with `#SBATCH` directives; it contains no dynamically generated code
### Tool Restrictions

- Read: Used to inspect script source, references, and existing job scripts
- Bash: Used to execute `slurm_script_generator.py` with explicit argument lists; the generated script itself is NOT executed by the agent
- Write: Used to save the generated `.sbatch` file; writes are scoped to the user's working directory
- Grep/Glob: Used to locate existing scripts, configs, and cluster documentation
### Safety Measures

- No `eval()`, `exec()`, or dynamic code generation
- All subprocess calls use explicit argument lists (no `shell=True`)
- The run command (after `--`) is included verbatim in the generated script but is never executed by the skill itself
- Module names are sanitized to prevent injection into `module load` directives
- Generated scripts use `set -euo pipefail` for safe shell execution on the cluster
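The explicit-argument-list pattern looks like this with Python's stdlib `subprocess`; the `-c` program below is a stand-in for invoking the generator, not the skill's actual call:

```python
import subprocess
import sys

# Explicit argument list: no shell parses this command line, so shell
# metacharacters in any argument cannot trigger injection.
result = subprocess.run(
    [sys.executable, "-c", "print('ok')"],  # stand-in for slurm_script_generator.py
    capture_output=True,
    text=True,
    check=True,  # raise CalledProcessError on a nonzero exit code
)
print(result.stdout.strip())  # ok
```

Because the command is a list, an argument like `"gcc; rm -rf /"` would be passed to the child process as a literal string, never interpreted by a shell.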
## Limitations
- Does not query cluster hardware or site policies; it can only validate internal consistency.
- SLURM installations vary (GPU directives, QoS rules, partitions). Adjust directives for your site.
## References

- `references/slurm_directives.md`: Common `#SBATCH` directives and mapping tips
## Version History

- v1.0.0 (2026-02-25): Initial SLURM job script generator