Claude-skill-registry Infrastructure Sizing and Capacity Planning
Methods for determining the optimal resource allocation for compute, database, and network systems to balance cost and performance.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/infra-sizing" ~/.claude/skills/majiayu000-claude-skill-registry-infrastructure-sizing-and-capacity-planning && rm -rf "$T"
skills/data/infra-sizing/SKILL.md

Infrastructure Sizing and Capacity Planning
Overview
Infrastructure sizing is the process of determining the exact amount of CPU, Memory, Storage, and Network capacity required for a workload. Effective sizing avoids both Over-provisioning (wasted money) and Under-provisioning (poor performance/outages).
Core Principle: "Sizing is not a one-time event; it is a continuous feedback loop based on real utilization metrics."
1. Right-Sizing Principles
Traditional sizing used the "Peak + Buffer" model, leading to massive waste. Modern sizing uses Demand-Driven Allocation.
| Principle | Description |
|---|---|
| Utilization Thresholds | Target 40-70% CPU utilization. Below 40% is over-provisioned; above 80% is risky. |
| Vertical First | Increase resource limits for single-threaded or monolithic apps. |
| Horizontal Usually | Spread load across multiple small instances for resilience and elasticity. |
| Metric-Based | Use P95 or P99 metrics for latency, but Average for base capacity sizing. |
2. Compute Sizing (EC2, VMs, GCE)
Step 1: Resource Profiling
Run your app in a staging environment and measure:
- CPU: Is the app CPU-bound (mathematical calculations, compression)?
- Memory: Is it memory-bound (caching, large payloads, in-memory DBs)?
- Thread Usage: How many concurrent requests can one CPU core handle?
Step 2: Instance Family Selection
| Family | Best For | AWS Example | GCP Example |
|---|---|---|---|
| General Purpose | Balanced workloads, small DBs | m5, t3 | e2, n2 |
| Compute Optimized | Batch processing, high-traffic APIs | c5, c6g | c2, c2d |
| Memory Optimized | Redis, high-RAM DBs, Analytics | r5, x1 | m1, n2-highmem |
Sizing Formula (Basic)
Target Instances = (Total Peak Concurrent Requests * Avg Service Time per Req) / (Target Utilization per Core * Core Count)
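As a sanity check, the formula can be sketched as a small Python helper (the request rate, service time, and core count below are illustrative, not from a real workload):

```python
import math

def target_instances(peak_rps: float, avg_service_time_s: float,
                     target_utilization: float, cores_per_instance: int) -> int:
    """Estimate instance count from a Little's-law-style concurrency demand."""
    # Concurrent work in flight = arrival rate * service time
    busy_cores_needed = peak_rps * avg_service_time_s
    # Each core should only run at the target utilization, not 100%
    effective_cores = target_utilization * cores_per_instance
    return math.ceil(busy_cores_needed / effective_cores)

# Example: 1,000 req/s, 50 ms per request, 60% target utilization, 4-core instances
print(target_instances(1000, 0.05, 0.60, 4))  # -> 21
```

Rounding up is deliberate: a fractional instance means the target utilization would otherwise be exceeded at peak.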
3. Database Sizing (RDS, Cloud SQL, Azure SQL)
IOPS (Input/Output Operations Per Second)
Disk performance is often the bottleneck, not CPU.
- GP3 (AWS): Baseline 3,000 IOPS included. Provision more for heavy writes.
- Provisioned IOPS (io2): For high-performance transactional DBs.
Storage Growth Calculation
Required Storage = (Initial Data Size + Daily Ingest * Retention Period) * (1 + Overhead Buffer)
- Buffer: Always keep 20% free to allow for indexing and temp file creation.
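A minimal Python sketch of the growth calculation, applying the buffer to the full projected size (the 500 GB / 3 GB-per-day / 90-day figures are illustrative):

```python
def required_storage_gb(initial_gb: float, daily_ingest_gb: float,
                        retention_days: int, buffer: float = 0.20) -> float:
    """Storage needed, with a safety buffer for indexes and temp files."""
    raw = initial_gb + daily_ingest_gb * retention_days
    return raw * (1 + buffer)

# Example: 500 GB now, 3 GB/day ingest, 90-day retention, 20% buffer
print(round(required_storage_gb(500, 3, 90)))  # -> 924
```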
Connection Pool Sizing
Max Connections = (Instance RAM / 10MB) - (System Reserve)
- Too many connections lead to high "Context Switching" and performance degradation.
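A rough sketch of this rule of thumb in Python, assuming ~10 MB per connection and an illustrative 50-connection system reserve:

```python
def max_connections(instance_ram_mb: int, per_conn_mb: int = 10,
                    system_reserve: int = 50) -> int:
    """Rule-of-thumb connection ceiling: ~10 MB per backend, minus headroom."""
    return instance_ram_mb // per_conn_mb - system_reserve

# Example: a 16 GB database instance
print(max_connections(16 * 1024))  # -> 1588
```

In practice the application should use a pool far smaller than this ceiling; the formula bounds what the instance can survive, not what is optimal.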
4. Cache Sizing (Redis/Memcached)
Caching is a trade-off between Memory Cost and Latency Benefits.
Formula: Working Set Size
Not all data needs to be in cache. Only store the Working Set (frequently accessed data).
- Measure Total Data Size.
- Analyze Access Distribution (Pareto Principle: 80% of accesses hit 20% of the data).
- Cache Size = 20-30% of Total Data Size.
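The working-set estimate can be sketched as follows; the 25% working-set ratio and 10% per-key memory overhead are assumptions, not measurements:

```python
def cache_size_gb(total_data_gb: float, working_set_ratio: float = 0.25,
                  overhead: float = 0.10) -> float:
    """Size cache to the hot working set plus per-key metadata overhead."""
    return total_data_gb * working_set_ratio * (1 + overhead)

# Example: 100 GB total data, 25% working set, 10% overhead
print(round(cache_size_gb(100), 1))  # -> 27.5
```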
Eviction Policy Impact
- allkeys-lru: Best for general caching.
- noeviction: Returns errors when full (dangerous).
5. Container Sizing (Kubernetes)
Understanding the difference between Requests and Limits is critical for both stability and cost.
| Metric | Purpose | Cost Impact |
|---|---|---|
| Requests | Kubernetes guarantees this capacity. Used for scheduling. | High: requests determine how many nodes you must provision and pay for. |
| Limits | The maximum a pod can burst to. | Low: Generally doesn't impact cost unless using serverless K8s. |
The "OOMKill" Trap
If Memory Requests < Actual Usage, the pod might be scheduled on a node that runs out of RAM, leading to an OOMKill (Out Of Memory kill).
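A hypothetical helper illustrating how observed usage relates to requests and limits (the thresholds are illustrative, not Kubernetes defaults):

```python
def sizing_verdict(request_mb: int, limit_mb: int, observed_p99_mb: int) -> str:
    """Flag risky Kubernetes memory settings given observed P99 usage.

    A pod whose request is below real usage can land on a node without
    enough free RAM and be OOMKilled; a request far above usage wastes money.
    """
    if observed_p99_mb > limit_mb:
        return "OOMKill certain: raise the limit"
    if observed_p99_mb > request_mb:
        return "OOMKill risk: request below actual usage"
    if request_mb > observed_p99_mb * 2:
        return "Over-provisioned: lower the request"
    return "Right-sized"

print(sizing_verdict(request_mb=256, limit_mb=512, observed_p99_mb=400))
# -> OOMKill risk: request below actual usage
```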
6. Serverless Sizing (Lambda / Cloud Functions)
Serverless "scaling" is handled by the provider, but "sizing" (Memory allocation) is handled by you.
- Power Tuning: In AWS Lambda, increasing Memory also increases CPU proportionally.
- Strategy: Use AWS Lambda Power Tuning to find the "Sweet Spot" where performance and cost intersect.
| Memory (MB) | Duration (ms) | Cost ($) | Result |
|---|---|---|---|
| 128 | 1000 | 0.0000021 | Slow |
| 512 | 200 | 0.0000016 | Winner (Faster & Cheaper) |
| 1024 | 150 | 0.0000025 | Diminishing returns |
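The table can be reproduced with a short script. The price constant assumes AWS's published x86 rate of roughly $0.0000166667 per GB-second and ignores the per-request fee:

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumed Lambda x86 rate, us-east-1

def invocation_cost(memory_mb: int, duration_ms: float) -> float:
    """Per-invocation compute cost (request fee excluded)."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND

# Reproduce the table above and pick the cheapest configuration
configs = {128: 1000, 512: 200, 1024: 150}
costs = {mb: invocation_cost(mb, ms) for mb, ms in configs.items()}
winner = min(costs, key=costs.get)
print(winner)  # -> 512
```

Note the non-obvious result: the 512 MB configuration is both faster and cheaper than 128 MB, because billed duration drops faster than the per-GB-second price rises.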
7. Network and CDN Sizing
- Throughput: Measure P99 payload size * Peak requests per second.
- CDN Coverage: What % of your traffic can be served from the edge?
- Goal: > 80% Cache Hit Ratio for static assets.
- Impact: CDN bandwidth is 50-70% cheaper than origin egress.
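A sketch of the throughput and origin-egress arithmetic (the payload size, request rate, and hit ratio are illustrative):

```python
def peak_throughput_mbps(p99_payload_kb: float, peak_rps: float) -> float:
    """Bandwidth needed at peak: P99 payload size x peak request rate."""
    return p99_payload_kb * 8 * peak_rps / 1000  # KB -> kilobits -> Mbps

def origin_egress_share(cache_hit_ratio: float) -> float:
    """Fraction of traffic that still hits the (expensive) origin."""
    return 1 - cache_hit_ratio

# Example: 64 KB P99 payloads at 2,000 req/s, 85% CDN hit ratio
print(peak_throughput_mbps(64, 2000))  # -> 1024.0
print(origin_egress_share(0.85))
```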
8. Load Testing for Capacity Planning
Never size based on assumptions. Use tools like k6, Locust, or JMeter.
- Stepping Test: Gradually increase users until latency spikes (The "Knee" of the curve).
- Soak Test: Run at 80% load for 24 hours to find memory leaks.
- Stress Test: Find the "Breaking Point" to configure failover/auto-scaling.
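The "knee" from a stepping test can be located mechanically. The results list below is made-up stepping-test output, and the 200 ms SLO is an assumption:

```python
def find_knee(results: list[tuple[int, float]], slo_ms: float = 200) -> int:
    """Given (concurrent_users, p95_latency_ms) pairs from a stepping test,
    return the highest user count that still meets the latency SLO."""
    passing = [users for users, p95 in results if p95 <= slo_ms]
    return max(passing) if passing else 0

# Illustrative stepping-test output: latency spikes past 400 users
steps = [(100, 80), (200, 95), (300, 120), (400, 180), (500, 650)]
print(find_knee(steps))  # -> 400
```

Capacity at the knee, not the breaking point, is what should feed the sizing formulas earlier in this document.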
9. Monitoring for Right-Sizing
The Dashboard Template (Grafana/Datadog)
- CPU Heatmap: Identify idle periods (e.g., weekends).
- RAM Saturation: Identify "Memory Bloat".
- Disk Queue Depth: Identify IOPS bottlenecks.
- Network In/Out: Identify efficient vs inefficient regions.
Automated Right-Sizing Tools
- AWS Compute Optimizer: Provides JSON recommendations for instance types.
- VPA (Vertical Pod Autoscaler): Automatically adjusts K8s requests/limits.
- Goldilocks: A K8s dashboard that visualizes VPA recommendations.
10. Capacity Planning Template
| Component | Metric | Current Load | Growth (6mo) | Buffer | Target Spec |
|---|---|---|---|---|---|
| Web Tier | Peak Req/sec | 500 | 2x (1000) | 20% | 4x c6g.large |
| Database | Storage | 500GB | +100GB/mo | 30% | 1.5TB GP3 |
| Cache | Working Set | 8GB | 12GB | 10% | 16GB Node |
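Each row of the template follows the same arithmetic, sketched here for the Web Tier row:

```python
def target_capacity(current: float, growth_factor: float, buffer: float) -> float:
    """Target spec = current load x expected growth x (1 + safety buffer)."""
    return current * growth_factor * (1 + buffer)

# Web Tier row from the table: 500 req/s today, 2x growth, 20% buffer
print(round(target_capacity(500, 2.0, 0.20)))  # -> 1200
```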
11. Real Sizing Scenario: SaaS API
- Initial Setup: 10 nodes of m5.xlarge (4 vCPU, 16GB RAM). Monthly cost: $1,400.
- Observation: CPU average 12%, RAM average 40%.
- Analysis: The app is memory-bound, but CPU is idle.
- Action: Switched to 5 nodes of t3.large ($350/mo) + enabled Auto-scaling.
- Result: 75% cost reduction while maintaining the same performance metrics.
Related Skills
- 40-system-resilience/graceful-degradation
- 42-cost-engineering/cloud-cost-models
- 42-cost-engineering/budget-guardrails