Claude-skill-registry Infrastructure Sizing and Capacity Planning

Methods for determining the optimal resource allocation for compute, database, and network systems to balance cost and performance.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/infra-sizing" ~/.claude/skills/majiayu000-claude-skill-registry-infrastructure-sizing-and-capacity-planning && rm -rf "$T"
manifest: skills/data/infra-sizing/SKILL.md
source content

Infrastructure Sizing and Capacity Planning

Overview

Infrastructure sizing is the process of determining the amount of CPU, memory, storage, and network capacity a workload requires. Effective sizing avoids both over-provisioning (wasted money) and under-provisioning (poor performance and outages).

Core Principle: "Sizing is not a one-time event; it is a continuous feedback loop based on real utilization metrics."


1. Right-Sizing Principles

Traditional sizing used the "Peak + Buffer" model, leading to massive waste. Modern sizing uses Demand-Driven Allocation.

| Principle | Description |
|---|---|
| Utilization Thresholds | Target 40-70% CPU utilization. Below 40% is over-provisioned; above 80% is risky. |
| Vertical first… | Increase resource limits for single-threaded or monolithic apps. |
| …Horizontal usually | Spread load across multiple small instances for resilience and elasticity. |
| Metric-Based | Use P95 or P99 metrics for latency, but Average for base capacity sizing. |
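The utilization thresholds above can be expressed as a small classifier; the band boundaries (40%, 70%, 80%) come straight from the table, while the function name and labels are illustrative:

```python
def classify_cpu_utilization(avg_cpu_pct: float) -> str:
    """Map average CPU utilization to a right-sizing verdict.

    Bands follow the 40-70% target above: below 40% is
    over-provisioned, above 80% is risky.
    """
    if avg_cpu_pct < 40:
        return "over-provisioned"   # paying for idle capacity
    if avg_cpu_pct <= 70:
        return "right-sized"        # inside the target band
    if avg_cpu_pct <= 80:
        return "watch"              # nearing the risk threshold
    return "under-provisioned"      # risk of saturation or outage

print(classify_cpu_utilization(12))  # → over-provisioned
```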

2. Compute Sizing (EC2, VMs, GCE)

Step 1: Resource Profiling

Run your app in a staging environment and measure:

  • CPU: Is the app CPU-bound (mathematical calculations, compression)?
  • Memory: Is it memory-bound (caching, large payloads, in-memory DBs)?
  • Thread Usage: How many concurrent requests can one CPU core handle?

Step 2: Instance Family Selection

| Family | Best For | AWS Example | GCP Example |
|---|---|---|---|
| General Purpose | Balanced workloads, small DBs | t3, m6g | n2, e2 |
| Compute Optimized | Batch processing, high-traffic APIs | c6g, c7i | c2, c3 |
| Memory Optimized | Redis, high-RAM DBs, Analytics | r6g, x2 | m1, m2 |

Sizing Formula (Basic)

Target Instances = (Total Peak Concurrent Requests * Avg Service Time per Req) / (Target Utilization per Core * Core Count)
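The basic formula can be sketched as a function; the example figures (1,000 req/s, 50 ms service time, 60% target utilization, 4-core nodes) are hypothetical:

```python
import math

def target_instances(peak_rps: float, avg_service_time_s: float,
                     target_util: float, cores_per_instance: int) -> int:
    """Apply the basic sizing formula above.

    peak_rps * avg_service_time_s gives peak concurrent requests
    (Little's law); dividing by usable core capacity gives the
    instance count, rounded up.
    """
    demand = peak_rps * avg_service_time_s          # concurrent requests in flight
    capacity = target_util * cores_per_instance     # usable concurrency per instance
    return math.ceil(demand / capacity)

# Hypothetical: 1,000 req/s at 50 ms each, 60% target on 4-core nodes
print(target_instances(1000, 0.05, 0.60, 4))  # → 21
```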


3. Database Sizing (RDS, Cloud SQL, Azure SQL)

IOPS (Input/Output Operations Per Second)

Disk performance is often the bottleneck, not CPU.

  • GP3 (AWS): Baseline 3,000 IOPS included. Provision more for heavy writes.
  • Provisioned IOPS (io2): For high-performance transactional DBs.

Storage Growth Calculation

Required Storage = ((Initial Data Size) + (Daily Ingest * Retention Period)) * (1 + Overhead Buffer)

  • Buffer: Always keep 20% free to allow for indexing and temp file creation.
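As a sketch, assuming the overhead buffer applies to the whole data set (consistent with the 20% free-space rule above); the sample figures are hypothetical:

```python
def required_storage_gb(initial_gb: float, daily_ingest_gb: float,
                        retention_days: int, overhead_buffer: float = 0.20) -> float:
    """Storage growth formula above: retained data plus an overhead
    buffer for indexing and temp files."""
    data = initial_gb + daily_ingest_gb * retention_days
    return data * (1 + overhead_buffer)

# Hypothetical: 500 GB today, 3 GB/day kept for 365 days, 20% buffer
print(round(required_storage_gb(500, 3, 365)))  # → 1914
```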

Connection Pool Sizing

Max Connections = (Instance RAM / 10MB) - (System Reserve)

  • Too many connections lead to high "Context Switching" and performance degradation.
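The pool formula above as code; the 10 MB-per-connection figure is a rough rule of thumb (PostgreSQL-style), not a measured value, and the 100-connection system reserve is an assumption:

```python
def max_connections(instance_ram_mb: int, system_reserve: int = 100,
                    mb_per_connection: int = 10) -> int:
    """Connection pool formula above: RAM divided by a per-connection
    memory estimate, minus a reserve for system processes."""
    return instance_ram_mb // mb_per_connection - system_reserve

# Hypothetical 16 GB instance, reserving 100 connections for the system
print(max_connections(16 * 1024))  # → 1538
```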

4. Cache Sizing (Redis/Memcached)

Caching is a trade-off between Memory Cost and Latency Benefits.

Formula: Working Set Size

Not all data needs to be in cache. Only store the Working Set (frequently accessed data).

  1. Measure Total Data Size.
  2. Analyze Access Distribution (Pareto Principle: ~80% of accesses hit ~20% of the data).
  3. Cache Size = 20-30% of Total Data Size.
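The three steps above reduce to a one-liner; 25% is used here as a midpoint of the 20-30% guideline:

```python
def cache_size_gb(total_data_gb: float, working_set_fraction: float = 0.25) -> float:
    """Cache only the working set: 20-30% of total data per the
    Pareto-based guideline above (25% used as a midpoint)."""
    assert 0 < working_set_fraction <= 1
    return total_data_gb * working_set_fraction

# Hypothetical 32 GB total data set
print(cache_size_gb(32))  # → 8.0
```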

Eviction Policy Impact

  • allkeys-lru: Best for general caching.
  • noeviction: Returns errors when full (dangerous).

5. Container Sizing (Kubernetes)

Understanding the difference between Requests and Limits is critical for both stability and cost.

| Metric | Purpose | Cost Impact |
|---|---|---|
| Requests | Kubernetes guarantees this capacity. Used for scheduling. | High: cloud providers charge based on requests. |
| Limits | The maximum a pod can burst to. | Low: generally doesn't impact cost unless using serverless K8s. |

The "OOMKill" Trap

If Memory Requests < Actual Usage, the pod might be scheduled on a node that runs out of RAM, leading to an OOMKill (Out Of Memory kill).
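A minimal sketch of an audit for this trap, flagging pods whose memory request sits below observed usage; the pod names and metric values are hypothetical, and in practice the data would come from your metrics backend:

```python
def flag_oomkill_risk(pods: dict) -> list:
    """Return pods whose memory request (MiB) is below observed P95
    usage (MiB) -- the OOMKill condition described above.

    `pods` maps pod name -> (request_mib, observed_p95_mib).
    """
    return [name for name, (request, usage) in pods.items() if request < usage]

# Hypothetical observations
pods = {"api-7f9c": (256, 410), "worker-1a2b": (512, 300)}
print(flag_oomkill_risk(pods))  # → ['api-7f9c']
```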


6. Serverless Sizing (Lambda / Cloud Functions)

Serverless "scaling" is handled by the provider, but "sizing" (Memory allocation) is handled by you.

  • Power Tuning: In AWS Lambda, increasing Memory also increases CPU proportionally.
  • Strategy: Use AWS Lambda Power Tuning to find the "Sweet Spot" where performance and cost intersect.
| Memory (MB) | Duration (ms) | Cost ($) | Result |
|---|---|---|---|
| 128 | 1000 | 0.0000021 | Slow |
| 512 | 200 | 0.0000016 | Winner (Faster & Cheaper) |
| 1024 | 150 | 0.0000025 | Diminishing returns |
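The sweet-spot selection can be sketched as "cheapest configuration that still meets a latency target"; the 500 ms target is an assumption, and the measurements are the figures from the table above:

```python
def sweet_spot(measurements: list, max_duration_ms: float = 500):
    """Pick the cheapest configuration that meets the latency target.

    Each entry is (memory_mb, duration_ms, cost_usd_per_invocation),
    as produced by a power-tuning run.
    """
    eligible = [m for m in measurements if m[1] <= max_duration_ms]
    return min(eligible, key=lambda m: m[2])

# Figures from the table above
runs = [(128, 1000, 0.0000021), (512, 200, 0.0000016), (1024, 150, 0.0000025)]
print(sweet_spot(runs))  # → (512, 200, 1.6e-06)
```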

7. Network and CDN Sizing

  • Throughput: Measure P99 payload size * Peak requests per second.
  • CDN Coverage: What % of your traffic can be served from the edge?
    • Goal: > 80% Cache Hit Ratio for static assets.
    • Impact: CDN bandwidth is 50-70% cheaper than origin egress.
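The throughput rule above, converted to a bandwidth figure; the payload size and request rate in the example are hypothetical:

```python
def peak_throughput_gbps(p99_payload_kb: float, peak_rps: float) -> float:
    """Network sizing rule above: P99 payload size * peak requests/sec,
    converted from KB/s to gigabits per second."""
    bytes_per_sec = p99_payload_kb * 1024 * peak_rps
    return bytes_per_sec * 8 / 1e9

# Hypothetical: 200 KB P99 responses at 5,000 req/s
print(round(peak_throughput_gbps(200, 5000), 2))  # → 8.19
```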

8. Load Testing for Capacity Planning

Never size based on assumptions. Use tools like k6, Locust, or JMeter.

  1. Stepping Test: Gradually increase users until latency spikes (The "Knee" of the curve).
  2. Soak Test: Run at 80% load for 24 hours to find memory leaks.
  3. Stress Test: Find the "Breaking Point" to configure failover/auto-scaling.
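Finding the "knee" from stepping-test output can be automated; this is a sketch assuming your tool (k6, Locust, JMeter) exports per-step P95 latencies, and the 2x spike factor and sample numbers are assumptions:

```python
def find_knee(steps: list, spike_factor: float = 2.0):
    """Locate the 'knee' of the curve: the first user count where P95
    latency jumps by more than `spike_factor` over the previous step.

    `steps` is [(concurrent_users, p95_latency_ms), ...] in ramp order.
    Returns None if latency never spikes.
    """
    for (_, prev_lat), (users, lat) in zip(steps, steps[1:]):
        if lat > prev_lat * spike_factor:
            return users
    return None

# Hypothetical stepping-test results
steps = [(100, 80), (200, 85), (400, 95), (800, 310)]
print(find_knee(steps))  # → 800
```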

9. Monitoring for Right-Sizing

The Dashboard Template (Grafana/Datadog)

  • CPU Heatmap: Identify idle periods (e.g., weekends).
  • RAM Saturation: Identify "Memory Bloat".
  • Disk Queue Depth: Identify IOPS bottlenecks.
  • Network In/Out: Identify efficient vs inefficient regions.

Automated Right-Sizing Tools

  • AWS Compute Optimizer: Provides JSON recommendations for instance types.
  • VPA (Vertical Pod Autoscaler): Automatically adjusts K8s requests/limits.
  • Goldilocks: A K8s dashboard that visualizes VPA recommendations.

10. Capacity Planning Template

| Component | Metric | Current Load | Growth (6mo) | Buffer | Target Spec |
|---|---|---|---|---|---|
| Web Tier | Peak Req/sec | 500 | 2x (1000) | 20% | 4x c6g.large |
| Database | Storage | 500GB | +100GB/mo | 30% | 1.5TB GP3 |
| Cache | Working Set | 8GB | 12GB | 10% | 16GB Node |
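The "Target Spec" column follows from the other columns; as a sketch, the capacity you size for is current load times expected growth times a safety buffer (the web-tier row is used as the example):

```python
def target_capacity(current_load: float, growth_multiplier: float,
                    buffer: float) -> float:
    """Capacity to size for, per the template above:
    current load * expected growth * (1 + safety buffer)."""
    return current_load * growth_multiplier * (1 + buffer)

# Web tier row: 500 req/s, 2x growth, 20% buffer → size for 1200 req/s
print(target_capacity(500, 2, 0.20))  # → 1200.0
```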

11. Real Sizing Scenario: SaaS API

  • Initial Setup: 10 nodes of m5.xlarge (4 vCPU, 16GB RAM). Monthly cost: $1,400.
  • Observation: CPU average 12%, RAM average 40%.
  • Analysis: The app is memory-bound, but CPU is idle.
  • Action: Switched to 5 nodes of t3.large ($350/mo) + enabled Auto-scaling.
  • Result: 75% cost reduction while maintaining the same performance metrics.

Related Skills

  • 40-system-resilience/graceful-degradation
  • 42-cost-engineering/cloud-cost-models
  • 42-cost-engineering/budget-guardrails