Marketplace · when-profiling-performance-use-performance-profiler
Comprehensive performance profiling, bottleneck detection, and optimization system
Install
source · Clone the upstream repo
git clone https://github.com/aiskillstore/marketplace
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aiskillstore/marketplace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/dnyoussef/when-profiling-performance-use-performance-profiler" ~/.claude/skills/aiskillstore-marketplace-when-profiling-performance-use-performance-profiler && rm -rf "$T"
manifest:
skills/dnyoussef/when-profiling-performance-use-performance-profiler/SKILL.md
Performance Profiler Skill
Overview
When profiling performance, use performance-profiler to measure, analyze, and optimize application performance across CPU, memory, I/O, and network dimensions.
MECE Breakdown
Mutually Exclusive Components:
- Baseline Phase: Establish current performance metrics
- Detection Phase: Identify bottlenecks and hot paths
- Analysis Phase: Root cause analysis and impact assessment
- Optimization Phase: Generate and prioritize recommendations
- Implementation Phase: Apply optimizations with agent assistance
- Validation Phase: Benchmark improvements and verify gains
Collectively Exhaustive Coverage:
- CPU Profiling: Function execution time, hot paths, call graphs
- Memory Profiling: Heap usage, allocations, leaks, garbage collection
- I/O Profiling: File system, database, network latency
- Network Profiling: Request timing, bandwidth, connection pooling
- Concurrency: Thread utilization, lock contention, async operations
- Algorithm Analysis: Time complexity, space complexity
- Cache Analysis: Hit rates, cache misses, invalidation patterns
- Database: Query performance, N+1 problems, index usage
Features
Core Capabilities:
- Multi-dimensional performance profiling (CPU, memory, I/O, network)
- Automated bottleneck detection with prioritization
- Real-time profiling and historical analysis
- Flame graph generation for visual analysis
- Memory leak detection and heap snapshots
- Database query optimization
- Algorithmic complexity analysis
- A/B comparison of before/after optimizations
- Production-safe profiling with minimal overhead
- Integration with APM tools (New Relic, DataDog, etc.)
Profiling Modes:
- Quick Scan: 30-second lightweight profiling
- Standard: 5-minute comprehensive analysis
- Deep: 30-minute detailed investigation
- Continuous: Long-running production monitoring
- Stress Test: Load-based profiling under high traffic
Usage
Slash Command:
/profile [path] [--mode quick|standard|deep] [--target cpu|memory|io|network|all]
Subagent Invocation:
Task("Performance Profiler", "Profile ./app with deep CPU and memory analysis", "performance-analyzer")
MCP Tool:
mcp__performance-profiler__analyze({
  project_path: "./app",
  profiling_mode: "standard",
  targets: ["cpu", "memory", "io"],
  generate_optimizations: true
})
Architecture
Phase 1: Baseline Measurement
- Establish current performance metrics
- Define performance budgets
- Set up monitoring infrastructure
- Capture baseline snapshots
Phase 2: Bottleneck Detection
- CPU profiling (sampling or instrumentation)
- Memory profiling (heap analysis)
- I/O profiling (syscall tracing)
- Network profiling (packet analysis)
- Database profiling (query logs)
Phase 3: Root Cause Analysis
- Correlate metrics across dimensions
- Identify causal relationships
- Calculate performance impact
- Prioritize issues by severity
Phase 4: Optimization Generation
- Algorithmic improvements
- Caching strategies
- Parallelization opportunities
- Database query optimization
- Memory optimization
- Network optimization
Phase 5: Implementation
- Generate optimized code with coder agent
- Apply database optimizations
- Configure caching layers
- Implement parallelization
Phase 6: Validation
- Run benchmark suite
- Compare before/after metrics
- Verify no regressions
- Generate performance report
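The before/after comparison in the validation phase can be illustrated with a minimal Python sketch (not part of the skill itself) using the stdlib `timeit` module. The two implementations here are hypothetical stand-ins for a pre- and post-optimization version of the same function; note that correctness is asserted before speed is compared, matching the "verify no regressions" step:

```python
import timeit

def baseline_impl(items):
    # O(n^2): membership test against a list
    seen = []
    out = []
    for x in items:
        if x not in seen:
            seen.append(x)
            out.append(x)
    return out

def optimized_impl(items):
    # O(n): membership test against a set
    seen = set()
    out = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

data = list(range(2000)) * 2

# Verify no regression: both versions must agree before comparing speed
assert baseline_impl(data) == optimized_impl(data)

# Take the best of several runs to reduce scheduler noise
before = min(timeit.repeat(lambda: baseline_impl(data), number=5, repeat=3))
after = min(timeit.repeat(lambda: optimized_impl(data), number=5, repeat=3))
print(f"speedup: {before / after:.1f}x")
```

Taking the minimum over repeated runs, rather than the mean, is the conventional way to reduce the influence of background load on micro-benchmarks.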
Output Formats
Performance Report:
{
  "project": "my-app",
  "profiling_mode": "standard",
  "duration_seconds": 300,
  "baseline": {
    "requests_per_second": 1247,
    "avg_response_time_ms": 123,
    "p95_response_time_ms": 456,
    "p99_response_time_ms": 789,
    "cpu_usage_percent": 67,
    "memory_usage_mb": 512,
    "error_rate_percent": 0.1
  },
  "bottlenecks": [
    {
      "type": "cpu",
      "severity": "high",
      "function": "processData",
      "time_percent": 34.5,
      "calls": 123456,
      "avg_time_ms": 2.3,
      "recommendation": "Optimize algorithm complexity from O(n²) to O(n log n)"
    }
  ],
  "optimizations": [...],
  "estimated_improvement": {
    "throughput_increase": "3.2x",
    "latency_reduction": "68%",
    "memory_reduction": "45%"
  }
}
Flame Graph:
Interactive SVG flame graph showing call stack with time proportions
Heap Snapshot:
Memory allocation breakdown with retention paths
Optimization Report:
Prioritized list of actionable improvements with code examples
Examples
Example 1: Quick CPU Profiling
/profile ./my-app --mode quick --target cpu
Example 2: Deep Memory Analysis
/profile ./my-app --mode deep --target memory --detect-leaks
Example 3: Full Stack Optimization
/profile ./my-app --mode standard --target all --optimize --benchmark
Example 4: Database Query Optimization
/profile ./my-app --mode standard --target io --database --explain-queries
Integration with Claude-Flow
Coordination Pattern:
// Step 1: Initialize profiling swarm
mcp__claude-flow__swarm_init({ topology: "star", maxAgents: 5 })

// Step 2: Spawn specialized agents
[Parallel Execution]:
Task("CPU Profiler", "Profile CPU usage and identify hot paths in ./app", "performance-analyzer")
Task("Memory Profiler", "Analyze heap usage and detect memory leaks", "performance-analyzer")
Task("I/O Profiler", "Profile file system and database operations", "performance-analyzer")
Task("Network Profiler", "Analyze network requests and identify slow endpoints", "performance-analyzer")
Task("Optimizer", "Generate optimization recommendations based on profiling data", "optimizer")

// Step 3: Implementation agent applies optimizations
[Sequential Execution]:
Task("Coder", "Implement recommended optimizations from profiling analysis", "coder")
Task("Benchmarker", "Run benchmark suite and validate improvements", "performance-benchmarker")
Configuration
Default Settings:
{
  "profiling": {
    "sampling_rate_hz": 99,
    "stack_depth": 128,
    "include_native_code": false,
    "track_allocations": true
  },
  "thresholds": {
    "cpu_hot_path_percent": 10,
    "memory_leak_growth_mb": 10,
    "slow_query_ms": 100,
    "slow_request_ms": 1000
  },
  "optimization": {
    "auto_apply": false,
    "require_approval": true,
    "run_tests_before": true,
    "run_benchmarks_after": true
  },
  "output": {
    "flame_graph": true,
    "heap_snapshot": true,
    "call_tree": true,
    "recommendations": true
  }
}
Profiling Techniques
CPU Profiling:
- Sampling: Periodic stack sampling (low overhead)
- Instrumentation: Function entry/exit hooks (accurate but higher overhead)
- Tracing: Event-based profiling
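As a concrete illustration of the instrumentation approach (in Python, though the skill is language-agnostic), the stdlib `cProfile` module hooks function entry/exit and can rank functions by cumulative time, which is essentially a textual hot-path report. The `hot_path` function below is a deliberately quadratic toy workload:

```python
import cProfile
import io
import pstats

def hot_path(n):
    # Deliberately quadratic work so it dominates the profile
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total

def cold_path(n):
    return sum(range(n))

def workload():
    hot_path(300)
    cold_path(300)

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Rank functions by cumulative time, mirroring a hot-path report
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

For lower overhead on production workloads, a sampling profiler (e.g. one that periodically captures stacks, as described above) is usually preferred over this kind of instrumentation.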
Memory Profiling:
- Heap Snapshots: Point-in-time memory state
- Allocation Tracking: Record all allocations
- Leak Detection: Compare snapshots over time
- GC Analysis: Garbage collection patterns
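The snapshot-comparison technique for leak detection can be sketched with Python's stdlib `tracemalloc`: take a snapshot, run the suspect operation repeatedly, take a second snapshot, and diff them to find where the growth was allocated. The "leak" here is a contrived module-level list:

```python
import tracemalloc

leaky_cache = []

def leaky_operation():
    # Simulates a leak: grows an unbounded module-level list
    leaky_cache.append(bytearray(100_000))

tracemalloc.start()
before = tracemalloc.take_snapshot()
for _ in range(50):
    leaky_operation()
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Diff the snapshots to find the allocation sites with the most growth
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)
```

In a real investigation the operation between snapshots would be a request handler or job under suspicion, and steady growth across repeated diffs is the leak signal.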
I/O Profiling:
- Syscall Tracing: Track system calls (strace, dtrace)
- File System: Monitor read/write operations
- Database: Query logging and EXPLAIN ANALYZE
- Network: Packet capture and request timing
Concurrency Profiling:
- Thread Analysis: CPU utilization per thread
- Lock Contention: Identify blocking operations
- Async Operations: Promise/callback timing
Performance Optimization Strategies
Algorithmic:
- Reduce time complexity (O(n²) → O(n log n))
- Use appropriate data structures
- Eliminate unnecessary work
- Memoization and dynamic programming
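The memoization strategy can be shown in a few lines of Python: the naive recursive Fibonacci recomputes subproblems exponentially, while caching each result once collapses it to linear time. This is a textbook illustration, not code from the skill:

```python
from functools import lru_cache

def fib_naive(n):
    # O(2^n): recomputes the same subproblems repeatedly
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    # O(n): each subproblem is computed exactly once, then cached
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

assert fib_naive(20) == fib_memo(20) == 6765
```

The same pattern applies to any pure function with overlapping subproblems; for impure or argument-heavy functions, an explicit cache with an eviction policy is safer than an unbounded `lru_cache`.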
Caching:
- In-memory caching (Redis, Memcached)
- CDN for static assets
- HTTP caching headers
- Query result caching
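Query-result caching usually needs an expiry policy so stale data ages out. A minimal sketch of a TTL cache in Python (the `expensive_query` function is a hypothetical stand-in for a real database call):

```python
import time

class TTLCache:
    """Query-result cache: entries expire after ttl seconds."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._store = {}

    def get(self, key, compute):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]          # cache hit
        value = compute()          # cache miss: run the expensive query
        self._store[key] = (value, now)
        return value

calls = 0

def expensive_query():
    global calls
    calls += 1
    return {"rows": 42}

cache = TTLCache(ttl=60)
cache.get("q1", expensive_query)
cache.get("q1", expensive_query)  # served from cache
print(calls)  # the underlying query ran once
```

A production system would use Redis or Memcached for the same pattern, gaining shared state across processes plus built-in eviction.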
Parallelization:
- Multi-threading
- Worker pools
- Async I/O
- Batching operations
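For I/O-bound work, the worker-pool strategy overlaps waits rather than serializing them. A Python sketch using the stdlib `concurrent.futures`, with `time.sleep` standing in for a network round trip:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    time.sleep(0.1)  # stand-in for a network round trip
    return f"ok:{url}"

urls = [f"https://example.com/{i}" for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

# 8 sequential calls would take ~0.8s; the pool overlaps the waits
print(f"{len(results)} responses in {elapsed:.2f}s")
```

Threads suit I/O-bound work like this; for CPU-bound work in Python, `ProcessPoolExecutor` avoids the GIL, and in other runtimes native threads parallelize CPU work directly.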
Database:
- Add missing indexes
- Optimize queries
- Reduce N+1 queries
- Connection pooling
- Read replicas
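The effect of adding a missing index can be demonstrated end-to-end with stdlib `sqlite3`: the query plan flips from a full table scan to an index lookup. The schema here is invented for illustration; the EXPLAIN technique carries over to other databases (e.g. PostgreSQL's `EXPLAIN ANALYZE`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(10_000)],
)

query = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index the planner falls back to a full table scan
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchone()
print("before:", plan_before[-1])

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# With the index the same query becomes an index lookup
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchone()
print("after:", plan_after[-1])
```

Checking query plans like this in CI is a cheap guard against regressions when a schema or query changes.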
Memory:
- Object pooling
- Reduce allocations
- Stream processing
- Compression
Network:
- Connection keep-alive
- HTTP/2 or HTTP/3
- Compression
- Request batching
- Rate limiting
Performance Budgets
Frontend:
- Time to First Byte (TTFB): < 200ms
- First Contentful Paint (FCP): < 1.8s
- Largest Contentful Paint (LCP): < 2.5s
- Time to Interactive (TTI): < 3.8s
- Total Blocking Time (TBT): < 200ms
- Cumulative Layout Shift (CLS): < 0.1
Backend:
- API Response Time (p50): < 100ms
- API Response Time (p95): < 500ms
- API Response Time (p99): < 1000ms
- Throughput: > 1000 req/s
- Error Rate: < 0.1%
- CPU Usage: < 70%
- Memory Usage: < 80%
Database:
- Query Time (p50): < 10ms
- Query Time (p95): < 50ms
- Query Time (p99): < 100ms
- Connection Pool Utilization: < 80%
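Budgets like those above are only useful if something enforces them. A minimal sketch of a budget check suitable for a CI gate, using the Backend thresholds from this section (the metric names are hypothetical):

```python
# Thresholds from the Backend budget above; metric names are illustrative
BACKEND_BUDGET = {
    "p50_ms": 100,
    "p95_ms": 500,
    "p99_ms": 1000,
    "error_rate_percent": 0.1,
}

def check_budget(measured, budget):
    """Return the list of metrics that exceed their budget."""
    return [name for name, limit in budget.items()
            if measured.get(name, 0) > limit]

measured = {"p50_ms": 85, "p95_ms": 620, "p99_ms": 950, "error_rate_percent": 0.05}
violations = check_budget(measured, BACKEND_BUDGET)
print(violations)  # ['p95_ms']
```

Failing the build when `violations` is non-empty turns the budget table into an automated regression gate.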
Best Practices
- Profile production workloads when possible
- Use production-like data volumes
- Profile under realistic load
- Measure multiple times for consistency
- Focus on p95/p99, not just averages
- Optimize bottlenecks in order of impact
- Always benchmark before and after
- Monitor for regressions in CI/CD
- Set up continuous profiling
- Track performance over time
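The "focus on p95/p99, not just averages" advice is easy to demonstrate: latency distributions are typically heavy-tailed, so the mean hides the experience of the slowest requests. A Python sketch with simulated exponentially distributed response times:

```python
import random
import statistics

random.seed(0)
# Simulated response times: mostly fast, with a heavy tail (mean ~50ms)
latencies_ms = [random.expovariate(1 / 50) for _ in range(10_000)]

mean = statistics.mean(latencies_ms)
# quantiles(n=100) yields 99 cut points: index 49 -> p50, 94 -> p95, 98 -> p99
q = statistics.quantiles(latencies_ms, n=100)
p50, p95, p99 = q[49], q[94], q[98]

# The tail is far worse than the average suggests
print(f"mean={mean:.0f}ms p50={p50:.0f}ms p95={p95:.0f}ms p99={p99:.0f}ms")
```

For an exponential distribution the p99 is roughly 4-5x the mean, which is why a service can look healthy on average while one request in a hundred is painfully slow.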
Troubleshooting
Issue: High CPU usage but no obvious hot path
Solution: Check for excessive small function calls, increase sampling rate, or use instrumentation
Issue: Memory grows continuously
Solution: Run heap snapshot comparison to identify leak sources
Issue: Slow database queries
Solution: Use EXPLAIN ANALYZE, check for missing indexes, analyze query plans
Issue: High latency but low CPU
Solution: Profile I/O operations, check for blocking synchronous calls
See Also
- PROCESS.md - Detailed step-by-step profiling workflow
- README.md - Quick start guide
- subagent-performance-profiler.md - Agent implementation details
- slash-command-profile.sh - Command-line interface
- mcp-performance-profiler.json - MCP tool schema