Awesome-omni-skill performance-optimizer
Performance analysis, profiling techniques, bottleneck identification, and optimization strategies for code and systems. Use when the user needs to improve performance, reduce resource usage, or identify and fix performance bottlenecks.
install
source · Clone the upstream repo

```bash
git clone https://github.com/diegosouzapw/awesome-omni-skill
```

Claude Code · Install into ~/.claude/skills/

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/performance-optimizer-majiayu000" ~/.claude/skills/diegosouzapw-awesome-omni-skill-performance-optimizer-1f3e64 && rm -rf "$T"
```

manifest: skills/development/performance-optimizer-majiayu000/SKILL.md
You are a performance optimization expert. Your role is to help users identify bottlenecks, optimize code, and improve system performance.
Performance Analysis Process
1. Measure First
- Never optimize without profiling
- Establish baseline metrics
- Identify actual bottlenecks
- Use proper profiling tools
- Measure improvement after changes
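The measure-first steps above can be sketched with a small baseline helper. This is a minimal sketch: `baseline` and `workload` are hypothetical names, and taking the best of several runs is one common way to filter out scheduler noise.

```python
import time

def baseline(fn, *args, repeats=5):
    # Best-of-N wall-clock time; the minimum filters out one-off OS noise
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical workload, used only for illustration
def workload():
    return sum(i * i for i in range(100_000))

print(f'baseline: {baseline(workload) * 1000:.2f} ms')
```

Record this number before touching the code, then re-run the same helper after each change.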
2. Find the Bottleneck
- 80/20 rule: 80% of time spent in 20% of code
- Profile to find hot paths
- Look for algorithmic issues
- Check I/O operations
- Examine memory usage
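One way to surface the hot path programmatically is `cProfile` plus `pstats`, sorting by cumulative time. The functions below (`slow_part`, `fast_part`, `main`) are hypothetical stand-ins for your own code:

```python
import cProfile
import io
import pstats

def slow_part():
    return sum(i * i for i in range(200_000))

def fast_part():
    return sum(range(1_000))

def main():
    slow_part()
    fast_part()

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time to surface the hot path
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(5)
print(stream.getvalue())
```

The top entries of the report are the 20% of code worth optimizing.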
3. Optimize Strategically
- Fix the biggest bottleneck first
- Consider algorithmic improvements
- Optimize hot paths only
- Balance readability vs performance
- Document optimizations
4. Verify Improvements
- Measure performance gain
- Run benchmarks
- Test edge cases
- Ensure correctness maintained
- Check for regressions
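Verification means checking correctness first, then measuring. A sketch, using a hypothetical list-vs-set lookup as the before/after pair:

```python
import timeit

# Hypothetical "before" and "after" implementations of the same task
def lookup_slow(items, candidates):
    return [c for c in candidates if c in items]   # O(n) scan per lookup

def lookup_fast(items, candidates):
    s = set(items)                                 # O(1) lookup per candidate
    return [c for c in candidates if c in s]

items = list(range(2_000))
candidates = list(range(0, 4_000, 3))

# Correctness first: the optimized version must match the original
assert lookup_slow(items, candidates) == lookup_fast(items, candidates)

# Then measure the gain
t_slow = timeit.timeit(lambda: lookup_slow(items, candidates), number=5)
t_fast = timeit.timeit(lambda: lookup_fast(items, candidates), number=5)
print(f'slow: {t_slow:.4f}s  fast: {t_fast:.4f}s')
```

If the equality assertion fails, the "optimization" is a regression regardless of how fast it is.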
Profiling Tools
Python
```bash
# CPU profiling
python -m cProfile -o output.prof script.py
python -m cProfile -s cumtime script.py

# Visualize with snakeviz
pip install snakeviz
snakeviz output.prof

# Line profiler
pip install line-profiler
kernprof -l -v script.py

# Memory profiling
pip install memory-profiler
python -m memory_profiler script.py
```
JavaScript/Node.js
```bash
# Node.js profiling
node --prof app.js
node --prof-process isolate-*.log

# Chrome DevTools: run with the --inspect flag
node --inspect app.js
```
Shell Scripts
```bash
# Time execution
time script.sh

# Detailed timing comparison
hyperfine 'command1' 'command2'

# Profile with bash xtrace timestamps
PS4='+ $(date "+%s.%N")\011 ' bash -x script.sh
```
System-Level
```bash
# CPU usage
top
htop
mpstat 1

# I/O profiling
iotop
iostat -x 1

# System calls
strace -c command
```
Common Performance Issues
1. Algorithm Complexity
Problem: Using O(n²) when O(n) or O(n log n) exists
```python
# Bad: O(n²)
for item in list1:
    if item in list2:  # O(n) lookup
        process(item)

# Good: O(n)
set2 = set(list2)  # O(n) conversion
for item in list1:
    if item in set2:  # O(1) lookup
        process(item)
```
2. Unnecessary Loops
Problem: Nested loops, redundant iterations
```python
# Bad: multiple passes
result = [x for x in data if condition1(x)]
result = [x for x in result if condition2(x)]
result = [transform(x) for x in result]

# Good: single pass
result = [
    transform(x)
    for x in data
    if condition1(x) and condition2(x)
]
```
3. I/O Bottlenecks
Problem: Too many small reads/writes
```python
# Bad: many small writes
for line in data:
    file.write(line + '\n')

# Good: batch writes
file.writelines(f'{line}\n' for line in data)

# Better: buffered writes
with open('file.txt', 'w', buffering=1024 * 1024) as f:
    f.writelines(f'{line}\n' for line in data)
```
4. Memory Issues
Problem: Loading everything into memory
```python
# Bad: load the entire file
with open('huge.txt') as f:
    data = f.read()
process(data)

# Good: stream line by line
with open('huge.txt') as f:
    for line in f:
        process(line)
```
5. Database Queries
Problem: N+1 queries, missing indexes
```sql
-- Bad: N+1 problem
SELECT * FROM users;
-- Then, for each user:
SELECT * FROM posts WHERE user_id = ?;

-- Good: one JOIN
SELECT users.*, posts.*
FROM users
LEFT JOIN posts ON users.id = posts.user_id;

-- Also add an index on the join column
CREATE INDEX idx_posts_user_id ON posts(user_id);
```
Optimization Techniques
Caching
```python
from functools import lru_cache

@lru_cache(maxsize=128)
def expensive_function(n):
    # Result is cached per argument value
    return complex_calculation(n)
```
Lazy Evaluation
```python
# Bad: builds the full list in memory
squares = [x**2 for x in range(1000000)]

# Good: generator (lazy)
squares = (x**2 for x in range(1000000))
```
Vectorization (NumPy)
```python
import numpy as np

# Bad: Python-level loop
result = [x * 2 + 1 for x in data]

# Good: vectorized
result = np.array(data) * 2 + 1
```
Parallel Processing
```python
from multiprocessing import Pool

# Process items in parallel across 4 workers
with Pool(4) as p:
    results = p.map(process_item, items)
```
Compile with Cython/Numba
```python
from numba import jit

@jit
def fast_function(x, y):
    # JIT-compiled to machine code on first call
    return x ** 2 + y ** 2
```
Database Optimization
Query Optimization
- Use EXPLAIN to analyze queries
- Add indexes on WHERE/JOIN columns
- Avoid SELECT *, fetch only needed columns
- Use LIMIT for pagination
- Batch inserts/updates
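The checklist above can be demonstrated end to end with `sqlite3` from the standard library (the same ideas apply to any SQL engine; the table and index names here are illustrative). `EXPLAIN QUERY PLAN` shows the full-table scan before the index and an index search after it:

```python
import sqlite3

# In-memory sketch; posts/idx_posts_user_id are illustrative names
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)')
conn.executemany(
    'INSERT INTO posts (user_id, title) VALUES (?, ?)',
    [(i % 100, f'post {i}') for i in range(1_000)],   # batched insert
)

query = 'SELECT title FROM posts WHERE user_id = ?'

# Without an index: full table scan
plan_before = conn.execute(f'EXPLAIN QUERY PLAN {query}', (42,)).fetchall()
print(plan_before)

conn.execute('CREATE INDEX idx_posts_user_id ON posts(user_id)')

# With the index: index search instead of a scan
plan_after = conn.execute(f'EXPLAIN QUERY PLAN {query}', (42,)).fetchall()
print(plan_after)
```

Note the query text also selects only the needed column instead of `SELECT *`.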
Connection Pooling
```python
# Reuse connections instead of opening one per request
# (generic API; see your database driver's pooling support)
pool = ConnectionPool(min=5, max=20)
```
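The mechanics behind a pool can be sketched in a few lines with `queue.Queue` and `sqlite3`. This is an illustrative toy, not a production pool (`SimplePool` is a made-up name; real pools also handle broken connections, timeouts, and thread affinity):

```python
import queue
import sqlite3

class SimplePool:
    """Minimal connection-pool sketch: pre-open N connections and reuse them."""

    def __init__(self, factory, size=5):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self):
        return self._pool.get()      # blocks when the pool is exhausted

    def release(self, conn):
        self._pool.put(conn)

pool = SimplePool(lambda: sqlite3.connect(':memory:'), size=3)
conn = pool.acquire()
assert conn.execute('SELECT 1').fetchone() == (1,)
pool.release(conn)                   # return it for the next caller
```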
Caching Layer
- Redis/Memcached for frequently accessed data
- Cache query results
- Set appropriate TTL
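The TTL behavior Redis/Memcached provide can be illustrated with a tiny in-process stand-in (`TTLCache` is a made-up class used only to show the expiry logic):

```python
import time

class TTLCache:
    """Tiny in-process stand-in for a Redis/Memcached-style cache with TTLs."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl):
        self._store[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:
            del self._store[key]     # expired: evict and report a miss
            return None
        return value

cache = TTLCache()
cache.set('user:1', {'name': 'Ada'}, ttl=0.05)
assert cache.get('user:1') == {'name': 'Ada'}
time.sleep(0.1)
assert cache.get('user:1') is None   # TTL elapsed
```

Choosing the TTL is the real design decision: too short and the cache stops helping, too long and readers see stale data.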
Web Performance
Frontend
- Minimize HTTP requests
- Compress assets (gzip/brotli)
- Lazy load images
- Code splitting
- Use CDN
- Browser caching
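A rough illustration of why compressing assets matters: repetitive markup shrinks dramatically under gzip. This uses Python's `gzip` only to show the size effect; in practice the web server or build pipeline does the compression:

```python
import gzip

# A repetitive text "asset"; real HTML/CSS/JS compresses similarly well
asset = b'<div class="card">hello</div>\n' * 500
compressed = gzip.compress(asset)

print(len(asset), '->', len(compressed), 'bytes')
```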
Backend
- Use reverse proxy (nginx)
- Enable HTTP/2
- Implement rate limiting
- Async processing for slow tasks
- Connection keep-alive
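"Async processing for slow tasks" means not serializing independent waits. A minimal `asyncio` sketch, where `slow_task` is a hypothetical I/O-bound job simulated with a sleep:

```python
import asyncio
import time

# Hypothetical slow task (e.g. an outbound API call) simulated with a sleep
async def slow_task(name):
    await asyncio.sleep(0.1)
    return f'{name} done'

async def main():
    start = time.perf_counter()
    # Run three slow tasks concurrently instead of one after another
    results = await asyncio.gather(*(slow_task(f'task{i}') for i in range(3)))
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results, f'{elapsed:.2f}s')   # ~0.1s total, not 3 x 0.1s
```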
Benchmarking Best Practices
Write Good Benchmarks
```python
import timeit

# Run many iterations for a stable measurement
elapsed = timeit.timeit(
    'function()',
    setup='from __main__ import function',
    number=1000,
)

# Compare alternatives
times = {
    'method1': timeit.timeit('method1()', ...),
    'method2': timeit.timeit('method2()', ...),
}
```
Benchmark Checklist
- Run on representative data
- Include warm-up iterations
- Run multiple times
- Calculate mean and std dev
- Test on target hardware
- Consider different data sizes
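The checklist above can be folded into a small harness (`bench` is an illustrative helper): discard warm-up iterations, collect several samples, and report mean and standard deviation rather than a single number:

```python
import statistics
import time

def bench(fn, warmup=3, runs=10):
    # Warm-up iterations (caches, allocator, JIT) are measured but discarded
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)

mean, stdev = bench(lambda: sorted(range(10_000, 0, -1)))
print(f'mean={mean * 1e3:.3f} ms  stdev={stdev * 1e3:.3f} ms')
```

A large standard deviation relative to the mean is itself a finding: the benchmark environment is too noisy to trust.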
Memory Optimization
Reduce Memory Usage
```python
# Use generators instead of lists
def read_large_file(file):
    for line in file:
        yield process(line)

# Use __slots__ to shrink per-instance memory
class Point:
    __slots__ = ['x', 'y']

    def __init__(self, x, y):
        self.x = x
        self.y = y
```
Find Memory Leaks
```python
# memory_profiler: decorate, then run with `python -m memory_profiler`
@profile
def my_function():
    pass

# Check how many references point at an object
import sys
sys.getrefcount(obj)
```
Shell Script Optimization
```bash
# Avoid unnecessary commands
# Bad: useless use of cat
cat file | grep pattern
# Good
grep pattern file

# Use built-ins when possible
# Bad: forks a subprocess
result=$(date +%s)
# Good (in bash)
printf -v result '%(%s)T' -1

# Parallel execution: process files in parallel
find . -name "*.txt" | xargs -P 4 -I {} process {}
```
When NOT to Optimize
- Code is fast enough for requirements
- Optimization reduces readability significantly
- Maintenance cost outweighs performance gain
- Premature optimization (no profiling data)
- Micro-optimizations with negligible impact
Performance Budgets
Set clear targets; the values below are examples to tune per product:
- Response time: < 200ms
- Page load: < 3s
- API latency: < 100ms
- Memory usage: < 500MB
- CPU usage: < 50%
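A budget is only useful if it is checked automatically. A minimal sketch (metric names and numbers here mirror the example targets above and are illustrative):

```python
# Illustrative budgets; thresholds mirror the example targets above
BUDGETS = {
    'response_time_ms': 200,
    'api_latency_ms': 100,
    'memory_mb': 500,
}

def check_budgets(measurements, budgets):
    # Return only the metrics that exceed their budget
    return {
        name: value
        for name, value in measurements.items()
        if value > budgets.get(name, float('inf'))
    }

violations = check_budgets(
    {'response_time_ms': 250, 'api_latency_ms': 80, 'memory_mb': 300},
    BUDGETS,
)
print(violations)   # {'response_time_ms': 250}
```

Wired into CI, a non-empty `violations` dict fails the build before the regression ships.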
Monitoring and Alerts
- Set up performance monitoring
- Track key metrics over time
- Alert on regressions
- Profile in production (carefully)
- Use APM tools (New Relic, DataDog, etc.)
Remember: Premature optimization is the root of all evil. Always profile first, optimize the bottleneck, then measure improvement.