Materials-simulation-skills performance-profiling
install
source · Clone the upstream repo
git clone https://github.com/HeshamFS/materials-simulation-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/HeshamFS/materials-simulation-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/simulation-workflow/performance-profiling" ~/.claude/skills/heshamfs-materials-simulation-skills-performance-profiling && rm -rf "$T"
manifest: skills/simulation-workflow/performance-profiling/SKILL.md
Performance Profiling
Goal
Provide tools to analyze simulation performance, identify bottlenecks, and recommend optimization strategies for computational materials science simulations.
Requirements
- Python 3.8+
- No external dependencies (uses Python standard library only)
- Works on Linux, macOS, and Windows
Inputs to Gather
Before running profiling scripts, collect from the user:
| Input | Description | Example |
|---|---|---|
| Simulation log | Log file with timing information | |
| Scaling data | JSON with multi-run performance data | |
| Simulation parameters | JSON with mesh, fields, solver config | |
| Available memory | System memory in GB (optional) | |
Decision Guidance
When to Use Each Script
    Need to identify slow phases?
    ├── YES → Use timing_analyzer.py
    │         └── Parse simulation logs for timing data
    │
    Need to understand parallel performance?
    ├── YES → Use scaling_analyzer.py
    │         └── Analyze strong or weak scaling efficiency
    │
    Need to estimate memory requirements?
    ├── YES → Use memory_profiler.py
    │         └── Estimate memory from problem parameters
    │
    Need optimization recommendations?
    └── YES → Use bottleneck_detector.py
              └── Combine analyses and get actionable advice
Choosing Analysis Thresholds
| Metric | Good | Acceptable | Poor |
|---|---|---|---|
| Phase dominance | <30% | 30-50% | >50% |
| Parallel efficiency | >0.80 | 0.70-0.80 | <0.70 |
| Memory usage | <60% | 60-80% | >80% |
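The thresholds in the table above can be applied mechanically. The following is a minimal sketch, not the skill's own code: the function name `classify_metric` and the metric keys are hypothetical, and values that fall exactly on a boundary (30, 50, 0.70, 0.80, 60, 80) are assumed to belong to the "acceptable" band.

```python
def classify_metric(metric: str, value: float) -> str:
    """Map a metric value to 'good', 'acceptable', or 'poor'.

    Thresholds are copied from the table above. phase_dominance and
    memory_usage are percentages (0-100); parallel_efficiency is a
    fraction (0-1). The metric keys are illustrative names only.
    """
    if metric == "phase_dominance":
        # Lower is better: share of runtime spent in the largest phase.
        if value < 30:
            return "good"
        return "acceptable" if value <= 50 else "poor"
    if metric == "parallel_efficiency":
        # Higher is better.
        if value > 0.80:
            return "good"
        return "acceptable" if value >= 0.70 else "poor"
    if metric == "memory_usage":
        # Lower is better: percent of available memory in use.
        if value < 60:
            return "good"
        return "acceptable" if value <= 80 else "poor"
    raise ValueError(f"unknown metric: {metric!r}")


print(classify_metric("phase_dominance", 25))
print(classify_metric("parallel_efficiency", 0.75))
print(classify_metric("memory_usage", 85))
```

Keeping the three metrics in one function makes the direction of each comparison explicit: phase dominance and memory usage improve as they fall, while parallel efficiency improves as it rises.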
Script Outputs (JSON Fields)
| Script | Key Outputs |
|---|---|
| , , |
| , |
| , , |
| , |
Workflow
Complete Profiling Workflow
- Analyze timing from simulation logs
- Analyze scaling from multi-run data (if available)
- Profile memory from simulation parameters
- Detect bottlenecks and get recommendations
- Implement optimizations based on recommendations
- Re-profile to verify improvements
Quick Profiling (Timing Only)
- Run timing analyzer on simulation log
- Identify dominant phases (>50% of runtime)
- Apply targeted optimizations to dominant phases
CLI Examples
Timing Analysis
    # Basic timing analysis
    python3 scripts/timing_analyzer.py \
        --log simulation.log \
        --json

    # Custom timing pattern
    python3 scripts/timing_analyzer.py \
        --log simulation.log \
        --pattern 'Step\s+(\w+)\s+took\s+([\d.]+)s' \
        --json
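To see what the custom pattern in the example above captures, the sketch below applies the same regular expression to a few log lines and converts the per-phase totals into percentages of runtime. The log lines are made up for illustration, and `phase_shares` is a hypothetical helper; how `timing_analyzer.py` aggregates matches internally is not shown in this document.

```python
import re
from collections import defaultdict

# Same expression as the --pattern example: group 1 is the phase name,
# group 2 is the elapsed time in seconds.
PATTERN = re.compile(r"Step\s+(\w+)\s+took\s+([\d.]+)s")


def phase_shares(log_text: str) -> dict:
    """Return {phase name: percent of total matched time}."""
    totals = defaultdict(float)
    for name, seconds in PATTERN.findall(log_text):
        totals[name] += float(seconds)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: 100.0 * t / grand_total for name, t in totals.items()}


# Hypothetical log excerpt (not taken from a real simulation).
sample_log = """\
Step assembly took 2.0s
Step solver took 6.0s
Step output took 2.0s
"""

shares = phase_shares(sample_log)
dominant = [name for name, pct in shares.items() if pct > 50]
print(shares)
print(dominant)
```

Lines that do not match the pattern are silently skipped here, which is the usual failure mode of pattern-based parsing: an empty result most likely means the pattern needs adjusting rather than that the run had no phases.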
Scaling Analysis
    # Strong scaling (fixed problem size)
    python3 scripts/scaling_analyzer.py \
        --data scaling_data.json \
        --type strong \
        --json

    # Weak scaling (constant work per processor)
    python3 scripts/scaling_analyzer.py \
        --data scaling_data.json \
        --type weak \
        --json
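The two scaling types differ in how efficiency is defined. The sketch below uses the standard textbook definitions; it is an assumption that `scaling_analyzer.py` uses the same formulas, and the function names and run data are hypothetical.

```python
def strong_efficiency(base_procs: int, base_time: float,
                      procs: int, time: float) -> float:
    """Strong scaling: fixed total problem size.

    Speedup is base_time / time; ideal speedup is procs / base_procs.
    Efficiency is the ratio of the two.
    """
    return (base_time * base_procs) / (time * procs)


def weak_efficiency(base_time: float, time: float) -> float:
    """Weak scaling: constant work per processor.

    Ideal runtime stays flat as processors are added, so efficiency
    is simply base_time / time.
    """
    return base_time / time


# Hypothetical runs as (processor count, wall time in seconds).
strong_runs = [(1, 100.0), (2, 55.0), (4, 32.0)]
base_procs, base_time = strong_runs[0]
for procs, time in strong_runs[1:]:
    eff = strong_efficiency(base_procs, base_time, procs, time)
    print(f"strong, {procs} procs: efficiency {eff:.3f}")

print(f"weak: efficiency {weak_efficiency(100.0, 125.0):.3f}")
```

In both cases a value of 1.0 means ideal scaling, so the efficiency bands used elsewhere in this document apply to either definition.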
Memory Profiling
    # Estimate memory requirements
    python3 scripts/memory_profiler.py \
        --params simulation_params.json \
        --available-gb 16.0 \
        --json
Bottleneck Detection
    # Detect bottlenecks from timing only
    python3 scripts/bottleneck_detector.py \
        --timing timing_results.json \
        --json

    # Comprehensive analysis with all inputs
    python3 scripts/bottleneck_detector.py \
        --timing timing_results.json \
        --scaling scaling_results.json \
        --memory memory_results.json \
        --json
Conversational Workflow Example
User: My simulation is taking too long. Can you help me identify what's slow?
Agent workflow:
- Ask for simulation log file
- Run timing analyzer:

      python3 scripts/timing_analyzer.py --log simulation.log --json

- Interpret results:
  - If solver dominates (>50%): Recommend preconditioner tuning
  - If assembly dominates: Recommend caching or vectorization
  - If I/O dominates: Recommend reducing output frequency
- If user has multi-run data, analyze scaling:

      python3 scripts/scaling_analyzer.py --data scaling.json --type strong --json

- Generate comprehensive recommendations:

      python3 scripts/bottleneck_detector.py --timing timing.json --scaling scaling.json --json
Interpretation Guidance
Timing Analysis
| Scenario | Meaning | Action |
|---|---|---|
| Solver >70% | Solver-dominated | Tune preconditioner, check tolerance |
| Assembly >50% | Assembly-dominated | Cache matrices, vectorize, parallelize |
| I/O >30% | I/O-dominated | Reduce frequency, use parallel I/O |
| Balanced (<30% each) | Well-balanced | Look for algorithmic improvements |
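The rows of the timing table translate directly into a lookup. This is a minimal sketch with assumed names: `interpret_timing` is hypothetical, and the phase keys `solver`, `assembly`, and `io` are placeholders for whatever phase names appear in a given log.

```python
def interpret_timing(shares: dict) -> list:
    """Return (meaning, action) pairs for a {phase: percent} mapping.

    Thresholds and wording follow the table above. The table does not
    cover every case (for example, a solver at 60%), so the result may
    be empty.
    """
    findings = []
    if shares.get("solver", 0) > 70:
        findings.append(("Solver-dominated",
                         "Tune preconditioner, check tolerance"))
    if shares.get("assembly", 0) > 50:
        findings.append(("Assembly-dominated",
                         "Cache matrices, vectorize, parallelize"))
    if shares.get("io", 0) > 30:
        findings.append(("I/O-dominated",
                         "Reduce frequency, use parallel I/O"))
    # Balanced only applies when every phase is below 30%.
    if not findings and shares and all(p < 30 for p in shares.values()):
        findings.append(("Well-balanced",
                         "Look for algorithmic improvements"))
    return findings


print(interpret_timing({"solver": 75, "assembly": 15, "io": 10}))
print(interpret_timing({"solver": 25, "assembly": 25, "io": 25, "setup": 25}))
```

An empty result is a useful signal in its own right: it marks a profile that falls between the table's rows and calls for judgment rather than a canned action.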
Scaling Analysis
| Efficiency | Meaning | Action |
|---|---|---|
| >0.80 | Excellent scaling | Continue scaling up |
| 0.70-0.80 | Good scaling | Monitor at larger scales |
| 0.50-0.70 | Poor scaling | Investigate communication/load balance |
| <0.50 | Very poor scaling | Reduce processor count or redesign |
Memory Profile
| Usage | Meaning | Action |
|---|---|---|
| <60% available | Safe | No action needed |
| 60-80% available | Moderate | Monitor, consider optimization |
| >80% available | High | Reduce resolution or increase processors |
| >100% available | Exceeds capacity | Must reduce problem size |
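As a rough illustration of how a memory estimate feeds into the table above, the sketch below computes the storage for field data on a structured mesh and maps the resulting usage to a band. Everything here is an assumption: the formula counts field arrays only (no solver matrices or work buffers), GB is taken as 1024³ bytes, and the function and parameter names are hypothetical rather than those used by `memory_profiler.py`.

```python
def estimate_field_memory_gb(nx: int, ny: int, nz: int,
                             n_fields: int, bytes_per_value: int = 8) -> float:
    """Storage for n_fields arrays on an nx*ny*nz mesh, in GB (1024**3 bytes).

    bytes_per_value=8 corresponds to double precision; 4 would be single.
    """
    return nx * ny * nz * n_fields * bytes_per_value / 1024 ** 3


def classify_memory(required_gb: float, available_gb: float) -> str:
    """Map required/available memory to the bands in the table above."""
    usage = 100.0 * required_gb / available_gb
    if usage > 100:
        return "Exceeds capacity"
    if usage > 80:
        return "High"
    if usage >= 60:
        return "Moderate"
    return "Safe"


required = estimate_field_memory_gb(512, 512, 512, n_fields=4)
print(f"{required:.1f} GB required")
print(classify_memory(required, available_gb=16.0))
print(classify_memory(required, available_gb=4.5))
```

Because real runs also allocate solver and communication buffers, an estimate of this kind is best read as a lower bound.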
Error Handling
| Error | Cause | Resolution |
|---|---|---|
| | Invalid path | Verify log file path |
| | Pattern mismatch | Provide custom pattern with --pattern |
| | Insufficient data | Provide more scaling runs |
| | Incomplete params | Add mesh and fields to params file |
Optimization Strategies by Bottleneck Type
Solver Bottlenecks
- Use algebraic multigrid (AMG) preconditioner
- Loosen solver tolerance if over-solving
- Consider direct solver for small problems
- Profile matrix assembly vs solve time
Assembly Bottlenecks
- Cache element matrices if geometry is static
- Use vectorized assembly routines
- Consider matrix-free methods
- Parallelize assembly with coloring
I/O Bottlenecks
- Reduce output frequency
- Use parallel I/O (HDF5, MPI-IO)
- Write to fast scratch storage
- Compress output data
Scaling Bottlenecks
- Investigate communication overhead
- Check for load imbalance
- Reduce synchronization points
- Use asynchronous communication
- Consider hybrid MPI+OpenMP
Memory Bottlenecks
- Reduce mesh resolution
- Use iterative solver (lower memory than direct)
- Enable out-of-core computation
- Increase number of processors
- Use single precision where appropriate
Security
Input Validation
- User-supplied `--pattern` regex values are validated for length (500 chars max) and rejected if they contain constructs prone to catastrophic backtracking (ReDoS)
- Scaling data entries are validated for finite time values, integer processor counts, and bounded run count (10,000 max)
- `available_gb` is validated as a positive finite number; mesh dimensions and field parameters are validated as positive integers
- `--type` (scaling type) is validated against a fixed allowlist (`strong`, `weak`)
- All loaded JSON files must have an object (dict) as root element
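The checks listed above could look roughly like the following. This is a sketch under stated assumptions, not the scripts' actual code: the nested-quantifier heuristic is one simple way to flag backtracking-prone patterns and will miss others, the entry keys `processors` and `time` are hypothetical, and all function names are illustrative. The limits (500 characters, 10,000 runs, `strong`/`weak`) come from the list above.

```python
import json
import math
import re

MAX_PATTERN_LEN = 500
MAX_RUNS = 10_000
ALLOWED_TYPES = {"strong", "weak"}
# Heuristic (assumption): a quantified group that itself contains a
# quantifier, such as (a+)+ or (\w*)*, is a classic ReDoS shape.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*{]")


def validate_pattern(pattern: str):
    """Return the compiled pattern, or raise ValueError."""
    if len(pattern) > MAX_PATTERN_LEN:
        raise ValueError("pattern too long")
    if NESTED_QUANTIFIER.search(pattern):
        raise ValueError("pattern contains nested quantifiers")
    return re.compile(pattern)


def validate_scaling_type(value: str) -> str:
    if value not in ALLOWED_TYPES:
        raise ValueError(f"scaling type must be one of {sorted(ALLOWED_TYPES)}")
    return value


def validate_runs(runs: list) -> list:
    """Check run count, integer processor counts, and finite times."""
    if len(runs) > MAX_RUNS:
        raise ValueError("too many runs")
    for run in runs:
        procs, time = run["processors"], run["time"]  # hypothetical keys
        if isinstance(procs, bool) or not isinstance(procs, int) or procs < 1:
            raise ValueError("processor count must be a positive integer")
        if isinstance(time, bool) or not isinstance(time, (int, float)):
            raise ValueError("time must be a number")
        if not math.isfinite(time) or time <= 0:
            raise ValueError("time must be positive and finite")
    return runs


def load_json_object(text: str) -> dict:
    """Parse JSON and require an object (dict) at the root."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("JSON root must be an object")
    return data


def is_rejected(check, value) -> bool:
    """Small helper for demonstrations: True if check(value) raises ValueError."""
    try:
        check(value)
    except ValueError:
        return True
    return False


print(is_rejected(validate_pattern, r"(a+)+$"))
print(is_rejected(validate_pattern, r"Step\s+(\w+)\s+took\s+([\d.]+)s"))
```

The `bool` checks matter because `True` is an `int` in Python and would otherwise pass as a processor count of 1.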
File Access
- `timing_analyzer.py` reads a single log file specified by `--log`; log files larger than 500 MB are rejected before parsing
- `scaling_analyzer.py`, `memory_profiler.py`, and `bottleneck_detector.py` read JSON files capped at 100 MB
- Phase names extracted from log files are truncated to 200 characters and stripped of control characters to prevent prompt-injection payloads from propagating into agent context
- No scripts write to the filesystem; all output goes to stdout
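A minimal sketch of the size caps and phase-name sanitization described above. The helper names are hypothetical, MB is assumed to mean 1024² bytes, and "control characters" is taken to mean Unicode category `Cc`; the scripts may define either differently.

```python
import os
import unicodedata

MAX_LOG_BYTES = 500 * 1024 * 1024   # 500 MB cap for log files (1024**2 assumed)
MAX_JSON_BYTES = 100 * 1024 * 1024  # 100 MB cap for JSON inputs
MAX_PHASE_NAME_LEN = 200


def within_cap(size_bytes: int, limit: int) -> bool:
    """True if a file of size_bytes is allowed under limit."""
    return size_bytes <= limit


def check_size(path: str, limit: int) -> int:
    """Reject an oversized file before any of it is read or parsed."""
    size = os.path.getsize(path)
    if not within_cap(size, limit):
        raise ValueError(f"{path} is {size} bytes; limit is {limit}")
    return size


def sanitize_phase_name(name: str) -> str:
    """Drop control characters (category Cc), then truncate."""
    cleaned = "".join(ch for ch in name if unicodedata.category(ch) != "Cc")
    return cleaned[:MAX_PHASE_NAME_LEN]


# An escape sequence and a newline embedded in a phase name are removed.
print(repr(sanitize_phase_name("solver\x1b[31m\n")))
print(len(sanitize_phase_name("x" * 500)))
```

Checking the size through file metadata means an oversized file is never opened, and stripping before truncating keeps the 200-character budget for visible text.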
Tool Restrictions
- Read: Used to inspect script source, references, simulation logs, and result files
- Write: Used to save profiling reports or optimization recommendations; writes are scoped to the user's working directory
- Grep/Glob: Used to locate log files, result files, and search references
- The skill's `allowed-tools` excludes `Bash` to prevent the agent from executing arbitrary commands when processing untrusted simulation logs or result files
Safety Measures
- No `eval()`, `exec()`, or dynamic code generation
- All subprocess calls use explicit argument lists (no `shell=True`)
- Reduced tool surface (no Bash) limits the agent to read/write operations only
- Phase names and diagnostic strings are sanitized before inclusion in output to prevent injection
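To illustrate why explicit argument lists matter, the sketch below passes a string containing shell metacharacters to a child process. Because no shell is involved, the string arrives as a single literal argument. The helper name `run_with_args` and the child command are illustrative only and are not taken from the skill's scripts.

```python
import subprocess
import sys


def run_with_args(argv: list) -> str:
    """Run argv directly (no shell) and return its standard output."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return completed.stdout


# A file name that would be dangerous if interpolated into a shell command.
untrusted = "simulation.log; echo injected"

# The child simply echoes its first argument back.
echo_script = "import sys; print(sys.argv[1])"
output = run_with_args([sys.executable, "-c", echo_script, untrusted])
print(output.strip())
```

With `shell=True` and string concatenation, the `;` would end the first command and start a second one; with an argument list it is just part of the file name.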
Limitations
- Log parsing: Depends on pattern matching; may miss unusual formats
- Scaling analysis: Requires at least 2 runs for meaningful results
- Memory estimation: Approximate; actual usage may vary
- Recommendations: General guidance; may need domain-specific tuning
References
- `references/profiling_guide.md` - Profiling concepts and interpretation
- `references/optimization_strategies.md` - Detailed optimization approaches
Version History
- v1.0.0 (2025-01-22): Initial release with 4 profiling scripts