Materials-simulation-skills performance-profiling

install
source · Clone the upstream repo
git clone https://github.com/HeshamFS/materials-simulation-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/HeshamFS/materials-simulation-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/simulation-workflow/performance-profiling" ~/.claude/skills/heshamfs-materials-simulation-skills-performance-profiling && rm -rf "$T"
manifest: skills/simulation-workflow/performance-profiling/SKILL.md

Performance Profiling

Goal

Provide tools to analyze simulation performance, identify bottlenecks, and recommend optimization strategies for computational materials science simulations.

Requirements

  • Python 3.8+
  • No external dependencies (uses Python standard library only)
  • Works on Linux, macOS, and Windows

Inputs to Gather

Before running profiling scripts, collect from the user:

| Input | Description | Example |
|---|---|---|
| Simulation log | Log file with timing information | simulation.log |
| Scaling data | JSON with multi-run performance data | scaling_data.json |
| Simulation parameters | JSON with mesh, fields, solver config | params.json |
| Available memory | System memory in GB (optional) | 16.0 |

Decision Guidance

When to Use Each Script

Need to identify slow phases?
├── YES → Use timing_analyzer.py
│         └── Parse simulation logs for timing data
│
Need to understand parallel performance?
├── YES → Use scaling_analyzer.py
│         └── Analyze strong or weak scaling efficiency
│
Need to estimate memory requirements?
├── YES → Use memory_profiler.py
│         └── Estimate memory from problem parameters
│
Need optimization recommendations?
└── YES → Use bottleneck_detector.py
          └── Combine analyses and get actionable advice

Choosing Analysis Thresholds

| Metric | Good | Acceptable | Poor |
|---|---|---|---|
| Phase dominance | <30% | 30-50% | >50% |
| Parallel efficiency | >0.80 | 0.70-0.80 | <0.70 |
| Memory usage | <60% | 60-80% | >80% |
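The thresholds above can be applied mechanically. The following is a minimal sketch, not the scripts' own logic: the function name `classify` and the metric keys are illustrative assumptions, and values exactly on a boundary are treated as "acceptable".

```python
def classify(metric: str, value: float) -> str:
    """Return 'good', 'acceptable', or 'poor' using the thresholds table.

    phase_dominance and memory_usage are fractions (0.0-1.0); lower is better.
    parallel_efficiency is a fraction; higher is better.
    The metric names are hypothetical and chosen only for this example.
    """
    if metric == "parallel_efficiency":
        if value > 0.80:
            return "good"
        return "acceptable" if value >= 0.70 else "poor"

    limits = {"phase_dominance": (0.30, 0.50), "memory_usage": (0.60, 0.80)}
    good_below, poor_above = limits[metric]
    if value < good_below:
        return "good"
    return "acceptable" if value <= poor_above else "poor"


print(classify("parallel_efficiency", 0.75))  # acceptable
print(classify("memory_usage", 0.85))         # poor
```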

Script Outputs (JSON Fields)

| Script | Key Outputs |
|---|---|
| timing_analyzer.py | timing_data.phases, timing_data.slowest_phase, timing_data.total_time |
| scaling_analyzer.py | scaling_analysis.results, scaling_analysis.efficiency_threshold_processors |
| memory_profiler.py | memory_profile.total_memory_gb, memory_profile.per_process_gb, memory_profile.warnings |
| bottleneck_detector.py | bottlenecks, recommendations |

Workflow

Complete Profiling Workflow

  1. Analyze timing from simulation logs
  2. Analyze scaling from multi-run data (if available)
  3. Profile memory from simulation parameters
  4. Detect bottlenecks and get recommendations
  5. Implement optimizations based on recommendations
  6. Re-profile to verify improvements

Quick Profiling (Timing Only)

  1. Run timing analyzer on simulation log
  2. Identify dominant phases (>50% of runtime)
  3. Apply targeted optimizations to dominant phases
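Step 2 of the quick workflow amounts to computing each phase's share of total runtime. A minimal sketch follows, assuming per-phase times are already available as a plain name-to-seconds mapping; that input shape and the helper name `dominant_phases` are illustrative, not the output format of timing_analyzer.py.

```python
from typing import Dict, List, Tuple


def dominant_phases(
    phase_times: Dict[str, float], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (phase, share) pairs whose share of total time exceeds threshold."""
    total = sum(phase_times.values())
    if total <= 0:
        return []
    shares = [(name, t / total) for name, t in phase_times.items()]
    dominant = [item for item in shares if item[1] > threshold]
    # Largest share first, so the first entry is the best optimization target.
    return sorted(dominant, key=lambda item: item[1], reverse=True)


# Made-up timings in seconds, for illustration only.
phases = {"assembly": 12.0, "solver": 66.0, "io": 22.0}
print(dominant_phases(phases))  # [('solver', 0.66)]
```

An empty result means no single phase exceeds the threshold, which corresponds to the "balanced" case in the interpretation guidance.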

CLI Examples

Timing Analysis

# Basic timing analysis
python3 scripts/timing_analyzer.py \
    --log simulation.log \
    --json

# Custom timing pattern
python3 scripts/timing_analyzer.py \
    --log simulation.log \
    --pattern 'Step\s+(\w+)\s+took\s+([\d.]+)s' \
    --json
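
Before passing a custom pattern to the script, it can help to try it on a few log lines. The sketch below checks the example pattern against a made-up log excerpt. That the first capture group is the phase name and the second is the duration is inferred from the example pattern, and summing repeated phases is an assumption about how timings are aggregated.

```python
import re

# Same pattern as in the CLI example above.
PATTERN = re.compile(r"Step\s+(\w+)\s+took\s+([\d.]+)s")

# Made-up log excerpt for illustration.
sample_log = """\
Step assembly took 1.25s
Step solver took 4.50s
Step assembly took 1.75s
checkpoint written
"""

totals = {}
for line in sample_log.splitlines():
    match = PATTERN.search(line)
    if match:
        phase, seconds = match.group(1), float(match.group(2))
        totals[phase] = totals.get(phase, 0.0) + seconds

print(totals)  # {'assembly': 3.0, 'solver': 4.5}
```

If `totals` comes back empty for a real log, the pattern does not match the log format, which is the situation behind the "No timing data found" error.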

Scaling Analysis

# Strong scaling (fixed problem size)
python3 scripts/scaling_analyzer.py \
    --data scaling_data.json \
    --type strong \
    --json

# Weak scaling (constant work per processor)
python3 scripts/scaling_analyzer.py \
    --data scaling_data.json \
    --type weak \
    --json
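
For reference, the two efficiency measures are conventionally defined relative to the smallest run: strong-scaling efficiency is (p0 · T(p0)) / (p · T(p)) and weak-scaling efficiency is T(p0) / T(p). The sketch below implements these textbook definitions. Whether scaling_analyzer.py uses exactly these formulas is an assumption, and the processor-count-to-seconds mapping is an illustrative input shape, not the schema of scaling_data.json.

```python
from typing import Dict


def strong_efficiency(times: Dict[int, float]) -> Dict[int, float]:
    """Strong scaling (fixed total problem size): E(p) = p0*T(p0) / (p*T(p))."""
    p0 = min(times)
    return {p: (p0 * times[p0]) / (p * t) for p, t in sorted(times.items())}


def weak_efficiency(times: Dict[int, float]) -> Dict[int, float]:
    """Weak scaling (fixed work per processor): E(p) = T(p0) / T(p)."""
    p0 = min(times)
    return {p: times[p0] / t for p, t in sorted(times.items())}


# Made-up wall-clock times in seconds, keyed by processor count.
runs = {1: 100.0, 2: 52.0, 4: 28.0, 8: 18.0}
for procs, eff in strong_efficiency(runs).items():
    print(f"{procs:>2} procs: efficiency {eff:.2f}")
```

With these numbers the 8-processor run falls just below 0.70, which the thresholds table classifies as poor.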

Memory Profiling

# Estimate memory requirements
python3 scripts/memory_profiler.py \
    --params simulation_params.json \
    --available-gb 16.0 \
    --json
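
As a rough sanity check on the reported numbers, field storage can be estimated by hand as grid points × fields × bytes per value. The sketch below does only that. It is not the estimation model used by memory_profiler.py, it ignores solver matrices and other overheads, and all parameter names are hypothetical.

```python
def estimate_field_memory_gb(nx, ny, nz, n_fields, bytes_per_value=8, n_procs=1):
    """Back-of-the-envelope field storage estimate (1 GB taken as 1024**3 bytes).

    Counts only the raw field arrays; matrices, halos, and solver
    workspace are deliberately left out of this sketch.
    """
    total_bytes = nx * ny * nz * n_fields * bytes_per_value
    total_gb = total_bytes / 1024**3
    return {"total_memory_gb": total_gb, "per_process_gb": total_gb / n_procs}


# 512^3 grid, 4 double-precision fields: 2**32 bytes, i.e. 4.0 GB.
estimate = estimate_field_memory_gb(512, 512, 512, n_fields=4)
available_gb = 16.0
print(estimate["total_memory_gb"])                             # 4.0
print(f"usage: {estimate['total_memory_gb'] / available_gb:.0%}")  # usage: 25%
```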

Bottleneck Detection

# Detect bottlenecks from timing only
python3 scripts/bottleneck_detector.py \
    --timing timing_results.json \
    --json

# Comprehensive analysis with all inputs
python3 scripts/bottleneck_detector.py \
    --timing timing_results.json \
    --scaling scaling_results.json \
    --memory memory_results.json \
    --json

Conversational Workflow Example

User: My simulation is taking too long. Can you help me identify what's slow?

Agent workflow:

  1. Ask for simulation log file
  2. Run timing analyzer:
    python3 scripts/timing_analyzer.py --log simulation.log --json
    
  3. Interpret results:
    • If solver dominates (>50%): Recommend preconditioner tuning
    • If assembly dominates: Recommend caching or vectorization
    • If I/O dominates: Recommend reducing output frequency
  4. If user has multi-run data, analyze scaling:
    python3 scripts/scaling_analyzer.py --data scaling.json --type strong --json
    
  5. Generate comprehensive recommendations:
    python3 scripts/bottleneck_detector.py --timing timing.json --scaling scaling.json --json
    

Interpretation Guidance

Timing Analysis

| Scenario | Meaning | Action |
|---|---|---|
| Solver >70% | Solver-dominated | Tune preconditioner, check tolerance |
| Assembly >50% | Assembly-dominated | Cache matrices, vectorize, parallelize |
| I/O >30% | I/O-dominated | Reduce frequency, use parallel I/O |
| Balanced (<30% each) | Well-balanced | Look for algorithmic improvements |

Scaling Analysis

| Efficiency | Meaning | Action |
|---|---|---|
| >0.80 | Excellent scaling | Continue scaling up |
| 0.70-0.80 | Good scaling | Monitor at larger scales |
| 0.50-0.70 | Poor scaling | Investigate communication/load balance |
| <0.50 | Very poor scaling | Reduce processor count or redesign |

Memory Profile

| Usage (% of available memory) | Meaning | Action |
|---|---|---|
| <60% | Safe | No action needed |
| 60-80% | Moderate | Monitor, consider optimization |
| 80-100% | High | Reduce resolution or increase processors |
| >100% | Exceeds capacity | Must reduce problem size |

Error Handling

| Error | Cause | Resolution |
|---|---|---|
| Log file not found | Invalid path | Verify log file path |
| No timing data found | Pattern mismatch | Provide custom pattern with --pattern |
| At least 2 runs required | Insufficient data | Provide more scaling runs |
| Missing required parameters | Incomplete params | Add mesh and fields to params file |

Optimization Strategies by Bottleneck Type

Solver Bottlenecks

  • Use algebraic multigrid (AMG) preconditioner
  • Loosen solver tolerance if over-solving
  • Consider direct solver for small problems
  • Profile matrix assembly vs solve time

Assembly Bottlenecks

  • Cache element matrices if geometry is static
  • Use vectorized assembly routines
  • Consider matrix-free methods
  • Parallelize assembly with coloring

I/O Bottlenecks

  • Reduce output frequency
  • Use parallel I/O (HDF5, MPI-IO)
  • Write to fast scratch storage
  • Compress output data

Scaling Bottlenecks

  • Investigate communication overhead
  • Check for load imbalance
  • Reduce synchronization points
  • Use asynchronous communication
  • Consider hybrid MPI+OpenMP

Memory Bottlenecks

  • Reduce mesh resolution
  • Use iterative solver (lower memory than direct)
  • Enable out-of-core computation
  • Increase number of processors
  • Use single precision where appropriate

Security

Input Validation

  • User-supplied --pattern regex values are validated for length (500 chars max) and rejected if they contain constructs prone to catastrophic backtracking (ReDoS)
  • Scaling data entries are validated for finite time values, integer processor counts, and bounded run count (10,000 max)
  • available_gb is validated as a positive finite number; mesh dimensions and field parameters are validated as positive integers
  • --type (scaling type) is validated against a fixed allowlist (strong, weak)
  • All loaded JSON files must have an object (dict) as root element
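
The checks listed above could look roughly like the following. This is a sketch under stated assumptions, not the scripts' actual code: the helper names are hypothetical, and the nested-quantifier test is a simple heuristic that catches patterns such as (a+)+ but is not a complete ReDoS detector.

```python
import json
import math
import re

MAX_PATTERN_LEN = 500
# Heuristic only: a quantified group whose body also contains a quantifier.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*{]")


def validate_pattern(pattern):
    """Reject over-long or backtracking-prone patterns, then compile."""
    if len(pattern) > MAX_PATTERN_LEN:
        raise ValueError("pattern exceeds 500 characters")
    if NESTED_QUANTIFIER.search(pattern):
        raise ValueError("pattern contains a nested quantifier")
    try:
        return re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"invalid regex: {exc}") from exc


def load_json_object(text):
    """Parse JSON text and require an object (dict) at the root."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("JSON root must be an object")
    return data


def validate_available_gb(value):
    """Require a positive, finite number (bool is rejected explicitly)."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("available_gb must be a number")
    if not math.isfinite(value) or value <= 0:
        raise ValueError("available_gb must be positive and finite")
    return float(value)
```

The pattern from the CLI examples passes `validate_pattern`, while a pattern such as `(a+)+$` is rejected before it is ever compiled.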

File Access

  • timing_analyzer.py reads a single log file specified by --log; log files larger than 500 MB are rejected before parsing
  • scaling_analyzer.py, memory_profiler.py, and bottleneck_detector.py read JSON files capped at 100 MB
  • Phase names extracted from log files are truncated to 200 characters and stripped of control characters to prevent prompt-injection payloads from propagating into agent context
  • No scripts write to the filesystem; all output goes to stdout
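
A minimal sketch of the size cap and phase-name sanitization described above follows. The helper names are hypothetical, "control characters" is taken to mean Unicode category Cc, 1 MB is taken as 1024 × 1024 bytes, and stripping before truncating is an assumption about ordering.

```python
import os
import unicodedata

MAX_PHASE_NAME_LEN = 200
MAX_LOG_BYTES = 500 * 1024 * 1024  # assumes the 500 MB cap is counted in MiB


def sanitize_phase_name(name: str) -> str:
    """Drop control characters (category Cc), then truncate to 200 chars."""
    cleaned = "".join(ch for ch in name if unicodedata.category(ch) != "Cc")
    return cleaned[:MAX_PHASE_NAME_LEN]


def check_log_size(path: str, limit: int = MAX_LOG_BYTES) -> int:
    """Return the file size, or raise before any parsing if it exceeds limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes; limit is {limit}")
    return size


# An escape sequence and a newline embedded in a phase name are removed.
print(sanitize_phase_name("sol\x1b[31mver\n"))  # sol[31mver
```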

Tool Restrictions

  • Read: Used to inspect script source, references, simulation logs, and result files
  • Write: Used to save profiling reports or optimization recommendations; writes are scoped to the user's working directory
  • Grep/Glob: Used to locate log files, result files, and search references
  • The skill's allowed-tools excludes Bash to prevent the agent from executing arbitrary commands when processing untrusted simulation logs or result files

Safety Measures

  • No eval(), exec(), or dynamic code generation
  • All subprocess calls use explicit argument lists (no shell=True)
  • Reduced tool surface (no Bash) limits the agent to read/write operations only
  • Phase names and diagnostic strings are sanitized before inclusion in output to prevent injection
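
To illustrate the argument-list style mentioned above, here is a hedged sketch of a wrapper that runs a command without a shell and parses its stdout as JSON. The function `run_json_tool` is hypothetical and not part of the skill. The point is that each argument is passed as its own list element, so shell metacharacters in a file name are never interpreted.

```python
import json
import subprocess
import sys
from typing import List


def run_json_tool(argv: List[str], timeout: float = 300.0) -> dict:
    """Run argv as an explicit argument list (no shell) and parse JSON stdout."""
    completed = subprocess.run(
        argv,
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit code
        timeout=timeout,
    )
    result = json.loads(completed.stdout)
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object on stdout")
    return result


# Self-contained demonstration using the current interpreter.
demo = run_json_tool(
    [sys.executable, "-c", "import json; print(json.dumps({'ok': True}))"]
)
print(demo)  # {'ok': True}

# Hypothetical use with one of the profiling scripts:
# timing = run_json_tool(
#     [sys.executable, "scripts/timing_analyzer.py", "--log", "simulation.log", "--json"]
# )
```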

Limitations

  • Log parsing: Depends on pattern matching; may miss unusual formats
  • Scaling analysis: Requires at least 2 runs for meaningful results
  • Memory estimation: Approximate; actual usage may vary
  • Recommendations: General guidance; may need domain-specific tuning

References

  • references/profiling_guide.md - Profiling concepts and interpretation
  • references/optimization_strategies.md - Detailed optimization approaches

Version History

  • v1.0.0 (2025-01-22): Initial release with 4 profiling scripts