Claude-skill-registry cc-performance-tuning
Enforce measure-first discipline for code optimization using a 7-step gated decision tree and 40-item checklist. Use when code is too slow, has performance issues, timeouts, OOM errors, high CPU/memory, or doesn't scale. Triggers on: profiler hot spots, latency complaints, unresponsive UI, memory allocation slow, needs optimization. Produce violation/warning/pass table with evidence.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/cc-performance-tuning" ~/.claude/skills/majiayu000-claude-skill-registry-cc-performance-tuning && rm -rf "$T"
skills/data/cc-performance-tuning/SKILL.mdSkill: cc-performance-tuning
STOP - Measure First
- Don't optimize based on intuition—profile first
- Correctness before speed - Make it work, then make it fast
- <4% of code causes >50% of runtime - Find the hot spot before touching anything
CRITICAL: Even In Emergencies
Production down? Losing money? User panicking? Especially then:
- Guessing wrong costs MORE time than 60-second profiling
- Multiple shotgun changes make rollback impossible
- Wrong "fix" can mask the real problem for days
Minimum crisis protocol (non-negotiable):
- 60-second profiler check OR recent deployment/config check
- ONE change at a time
- Revert immediately if no improvement within 5 minutes
Scope & Limitations
This skill covers: Single-threaded, single-process code tuning for general-purpose computing.
NOT covered (need specialized guidance):
- Concurrency: Lock contention often dominates; profile thread states, not just CPU
- Distributed systems: Network latency ~10,000x memory; optimize RPC/serialization first
- Real-time systems: Need worst-case latency, not average; caching adds variance
- Embedded/constrained: Memory/power budgets require different tradeoffs
Quick Reference
| Threshold/Rule | Value | Source |
|---|---|---|
| Hot spot concentration | <4% causes >50% runtime | Knuth 1971 |
| Failed optimization rate | >50% produce negligible or negative results | p.607 |
| Compiler optimization gains | 40-59% improvement possible | p.596 |
| Interpreted vs compiled | PHP/Python >100x slower than C++ | Table 25-1 |
| I/O vs memory | ~1000x difference | p.591 |
Key Principles:
- Make it correct first, then make it fast
- Measure before AND after every optimization
- Profile to find hot spots; never guess
- Compiler optimization often beats manual tuning
- Code tuning is the LAST resort, not first
Core Patterns
PREREQUISITE: Only apply these patterns AFTER profiling confirms the specific code is in the <4% hot path. Applying without measurement is a skill violation.
Page Fault Loop Ordering
// BEFORE: Causes page fault on every access [ANTI-PATTERN] for (column = 0; column < MAX_COLUMNS; column++) { for (row = 0; row < MAX_ROWS; row++) { table[row][column] = BlankTableElement(); } } // AFTER: Page fault only when switching rows (up to 1000x faster) for (row = 0; row < MAX_ROWS; row++) { for (column = 0; column < MAX_COLUMNS; column++) { table[row][column] = BlankTableElement(); } }
Sentinel Value in Search Loop
// BEFORE: Compound test every iteration [ANTI-PATTERN] found = false; i = 0; while (!found && i < count) { if (item[i] == target) found = true; i++; } // AFTER: Single test per iteration (23-65% faster) // Place sentinel past end of search range item[count] = target; // sentinel i = 0; while (item[i] != target) { i++; } if (i < count) { // found at position i }
Loop Unswitching
// BEFORE: Testing invariant condition every iteration [ANTI-PATTERN] for (i = 0; i < count; i++) { if (type == TYPE_A) { processTypeA(item[i]); } else { processTypeB(item[i]); } } // AFTER: Test once outside loop (19-28% faster) if (type == TYPE_A) { for (i = 0; i < count; i++) { processTypeA(item[i]); } } else { for (i = 0; i < count; i++) { processTypeB(item[i]); } }
Strength Reduction in Expression
// BEFORE: Expensive operation [ANTI-PATTERN] if (Math.sqrt(x) < Math.sqrt(y)) { // ... } // AFTER: Algebraically equivalent, 90-99.9% faster if (x < y) { // when x,y >= 0 // ... }
APPLIER: When to Optimize
Decision Tree (STRICT ORDER - Do NOT skip steps):
Each step is a gate. Skipping steps = wasted effort or masked problems.
1. Is the program correct and complete? NO -> Make it correct first. STOP optimization. YES -> Continue 2. Have you measured to find the actual bottleneck? NO -> Profile/measure first. Do NOT guess. YES -> Continue 3. Can requirements be relaxed? YES -> Relax requirements. Done. NO -> Continue 4. Can design/architecture solve it? YES -> Fix design. Done. NO -> Continue 5. Can algorithm/data structure solve it? YES -> Change algorithm. Done. NO -> Continue 6. Can compiler flags help? YES -> Enable optimizations. Measure. NO -> Continue 7. Is it in the <4% that causes >50% of runtime? NO -> Do NOT optimize this code. Find actual hot spot. YES -> PROCEED with code tuning
Code Tuning Procedure (STRICT ORDER)
1. Save working version (cannot revert without backup) 2. Make ONE change (multiple changes = unmeasurable) 3. Measure improvement (same workload, before/after) 4. Keep if faster, revert if not (no "close enough") 5. Repeat
Technique Priority (by category)
Logic:
- Stop testing when answer known (use break, short-circuit)
- Order tests by frequency (most common first)
- Substitute table lookups for complex logic
- Use lazy evaluation
Loops:
- Unswitch (move invariant tests outside)
- Jam/fuse loops operating on same range
- Put busiest loop on inside
- Minimize work inside loops
- Use sentinel values for search loops
- Unroll ONLY if measured (can be -27% in Python!)
Data:
- Use integers instead of floating-point when possible
- Use fewest array dimensions
- Cache frequently computed values
- Precompute results where practical
Expressions:
- Initialize at compile time
- Exploit algebraic identities
- Use strength reduction (multiplication -> addition)
- Eliminate common subexpressions
CHECKER: Review for Anti-Patterns
Checklist: checklists.md Output Format:
| Item | Status | Evidence | Location |
|---|---|---|---|
| Measured before tuning? | VIOLATION | No profiler/measurement found | N/A |
| Loop unswitching opportunity | WARNING | Invariant inside loop | app.py:142 |
| Severity: |
- VIOLATION: Clear anti-pattern present
- WARNING: Potential issue (needs measurement)
- PASS: No obvious performance issues
Rationalization Counters
| Excuse | Reality |
|---|---|
| "Fewer lines of code is faster" | No predictable relationship; unrolled loops often 60-74% faster [p.603] |
| "I know this operation is slow" | You must measure; rules change with every environment change |
| "I'll optimize as I go" | You'll spend 96% of time on code that doesn't matter [Pareto] |
| "Experience tells me where bottlenecks are" | No programmer has ever predicted bottlenecks without data [Newcomer] |
| "This clever trick will be faster" | Compilers optimize straightforward code better than tricky code [p.596] |
| "We need to rewrite in assembler now" | Usually <4% of code causes >50% of runtime; find it first [Knuth 1971] |
| "Fast code is as important as correct code" | Correct first, fast second. Always. |
| "I already optimized this; it will stay optimized" | Re-profile after any compiler/library/environment change |
| "This optimization always works" | Results vary wildly by language; Python -27% for loop unrolling [p.623] |
| "The theory says this should be faster" | Theory doesn't always hold; Visual Basic -94% on polynomial strength reduction [p.636] |
| "I don't need to profile small changes" | If not worth profiling, not worth degrading readability for [p.609] |
| Crisis: "We're losing $X/minute!" | Guessing wrong = paying that $X until you find real cause. 60-sec profile saves hours. |
| Crisis: "No time to profile!" | Wrong guess costs more time than profiling. Panic causes cascading errors. |
| Sunk cost: "I already spent 4 hours optimizing" | Time invested doesn't validate method. Revert all, apply with measurement. |
| Sunk cost: "It seems faster now" | "Seems faster" is not data. You may have made some faster, others slower. |
| Success streak: "I've been right 5 times" | Past success doesn't change physics. Calibration illusion: 5 wins don't predict win 6. |
Chain
| After | Next |
|---|---|
| Optimization complete | Verify design not degraded |
| Structure degraded | cc-refactoring-guidance |