Claude-skill-registry Infrastructure Sizing and Capacity Planning

Methods for determining the optimal resource allocation for compute, database, and network systems to balance cost and performance.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/infra-sizing" ~/.claude/skills/majiayu000-claude-skill-registry-infrastructure-sizing-and-capacity-planning && rm -rf "$T"
manifest: skills/data/infra-sizing/SKILL.md
source content

Infrastructure Sizing and Capacity Planning

Overview

Infrastructure sizing is the process of determining the amount of CPU, memory, storage, and network capacity a workload requires. Effective sizing avoids both over-provisioning (wasted money) and under-provisioning (poor performance and outages).

Core Principle: "Sizing is not a one-time event; it is a continuous feedback loop based on real utilization metrics."


1. Right-Sizing Principles

Traditional sizing used the "Peak + Buffer" model, leading to massive waste. Modern sizing uses Demand-Driven Allocation.

| Principle | Description |
|---|---|
| Utilization Thresholds | Target 40-70% CPU utilization. Below 40% is over-provisioned; above 80% is risky. |
| Vertical first… | Increase resource limits for single-threaded or monolithic apps. |
| …Horizontal usually | Spread load across multiple small instances for resilience and elasticity. |
| Metric-Based | Use P95 or P99 metrics for latency, but Average for base capacity sizing. |
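The utilization thresholds above can be expressed as a small classifier; the band boundaries (40%, 70%, 80%) come straight from the table, while the function name and labels are illustrative:

```python
def classify_cpu_utilization(avg_cpu_pct: float) -> str:
    """Map average CPU utilization to a right-sizing verdict.

    Bands follow the 40-70% target above: below 40% is
    over-provisioned, above 80% is risky.
    """
    if avg_cpu_pct < 40:
        return "over-provisioned"   # paying for idle capacity
    if avg_cpu_pct <= 70:
        return "right-sized"        # inside the target band
    if avg_cpu_pct <= 80:
        return "watch"              # nearing the risk threshold
    return "under-provisioned"      # risk of saturation or outage

print(classify_cpu_utilization(12))  # → over-provisioned
```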

2. Compute Sizing (EC2, VMs, GCE)

Step 1: Resource Profiling

Run your app in a staging environment and measure:

  • CPU: Is the app CPU-bound (mathematical calculations, compression)?
  • Memory: Is it memory-bound (caching, large payloads, in-memory DBs)?
  • Thread Usage: How many concurrent requests can one CPU core handle?

Step 2: Instance Family Selection

| Family | Best For | AWS Example | GCP Example |
|---|---|---|---|
| General Purpose | Balanced workloads, small DBs | t3, m6g | n2, e2 |
| Compute Optimized | Batch processing, high-traffic APIs | c6g, c7i | c2, c3 |
| Memory Optimized | Redis, high-RAM DBs, Analytics | r6g, x2 | m1, m2 |

Sizing Formula (Basic)

Target Instances = (Total Peak Concurrent Requests * Avg Service Time per Req) / (Target Utilization per Core * Core Count)
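The basic formula can be sketched as a function; the example figures (1,000 req/s, 50 ms service time, 60% target utilization, 4-core nodes) are hypothetical:

```python
import math

def target_instances(peak_rps: float, avg_service_time_s: float,
                     target_util: float, cores_per_instance: int) -> int:
    """Apply the basic sizing formula above.

    peak_rps * avg_service_time_s gives peak concurrent requests
    (Little's law); dividing by usable core capacity gives the
    instance count, rounded up.
    """
    demand = peak_rps * avg_service_time_s          # concurrent requests in flight
    capacity = target_util * cores_per_instance     # usable concurrency per instance
    return math.ceil(demand / capacity)

# Hypothetical: 1,000 req/s at 50 ms each, 60% target on 4-core nodes
print(target_instances(1000, 0.05, 0.60, 4))  # → 21
```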


3. Database Sizing (RDS, Cloud SQL, Azure SQL)

IOPS (Input/Output Operations Per Second)

Disk performance is often the bottleneck, not CPU.

  • GP3 (AWS): Baseline 3,000 IOPS included. Provision more for heavy writes.
  • Provisioned IOPS (io2): For high-performance transactional DBs.

Storage Growth Calculation

Required Storage = ((Initial Data Size) + (Daily Ingest * Retention Period)) * (1 + Overhead Buffer)

  • Buffer: Always keep 20% free to allow for indexing and temp file creation.
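As a sketch, assuming the overhead buffer applies to the whole data set (consistent with the 20% free-space rule above); the sample figures are hypothetical:

```python
def required_storage_gb(initial_gb: float, daily_ingest_gb: float,
                        retention_days: int, overhead_buffer: float = 0.20) -> float:
    """Storage growth formula above: retained data plus an overhead
    buffer for indexing and temp files."""
    data = initial_gb + daily_ingest_gb * retention_days
    return data * (1 + overhead_buffer)

# Hypothetical: 500 GB today, 3 GB/day kept for 365 days, 20% buffer
print(round(required_storage_gb(500, 3, 365)))  # → 1914
```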

Connection Pool Sizing

Max Connections = (Instance RAM / 10MB) - (System Reserve)

  • Too many connections lead to high "Context Switching" and performance degradation.
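The pool formula above as code; the 10 MB-per-connection figure is a rough rule of thumb (PostgreSQL-style), not a measured value, and the 100-connection system reserve is an assumption:

```python
def max_connections(instance_ram_mb: int, system_reserve: int = 100,
                    mb_per_connection: int = 10) -> int:
    """Connection pool formula above: RAM divided by a per-connection
    memory estimate, minus a reserve for system processes."""
    return instance_ram_mb // mb_per_connection - system_reserve

# Hypothetical 16 GB instance, reserving 100 connections for the system
print(max_connections(16 * 1024))  # → 1538
```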

4. Cache Sizing (Redis/Memcached)

Caching is a trade-off between Memory Cost and Latency Benefits.

Formula: Working Set Size

Not all data needs to be in cache. Only store the Working Set (frequently accessed data).

  1. Measure Total Data Size.
  2. Analyze Access Distribution (Pareto Principle: ~80% of accesses hit ~20% of the data).
  3. Cache Size = 20-30% of Total Data Size.
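The three steps above reduce to a one-liner; 25% is used here as a midpoint of the 20-30% guideline:

```python
def cache_size_gb(total_data_gb: float, working_set_fraction: float = 0.25) -> float:
    """Cache only the working set: 20-30% of total data per the
    Pareto-based guideline above (25% used as a midpoint)."""
    assert 0 < working_set_fraction <= 1
    return total_data_gb * working_set_fraction

# Hypothetical 32 GB total data set
print(cache_size_gb(32))  # → 8.0
```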

Eviction Policy Impact

  • allkeys-lru: Best for general caching.
  • noeviction: Returns errors when full (dangerous).

5. Container Sizing (Kubernetes)

Understanding the difference between Requests and Limits is critical for both stability and cost.

| Metric | Purpose | Cost Impact |
|---|---|---|
| Requests | Kubernetes guarantees this capacity. Used for scheduling. | High: cloud providers charge based on requests. |
| Limits | The maximum a pod can burst to. | Low: generally doesn't impact cost unless using serverless K8s. |

The "OOMKill" Trap

If Memory Requests < Actual Usage, the pod might be scheduled on a node that runs out of RAM, leading to an OOMKill (Out Of Memory kill).
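A minimal sketch of an audit for this trap, flagging pods whose memory request sits below observed usage; the pod names and metric values are hypothetical, and in practice the data would come from your metrics backend:

```python
def flag_oomkill_risk(pods: dict) -> list:
    """Return pods whose memory request (MiB) is below observed P95
    usage (MiB) -- the OOMKill condition described above.

    `pods` maps pod name -> (request_mib, observed_p95_mib).
    """
    return [name for name, (request, usage) in pods.items() if request < usage]

# Hypothetical observations
pods = {"api-7f9c": (256, 410), "worker-1a2b": (512, 300)}
print(flag_oomkill_risk(pods))  # → ['api-7f9c']
```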


6. Serverless Sizing (Lambda / Cloud Functions)

Serverless "scaling" is handled by the provider, but "sizing" (Memory allocation) is handled by you.

  • Power Tuning: In AWS Lambda, increasing Memory also increases CPU proportionally.
  • Strategy: Use AWS Lambda Power Tuning to find the "Sweet Spot" where performance and cost intersect.
| Memory (MB) | Duration (ms) | Cost ($) | Result |
|---|---|---|---|
| 128 | 1000 | 0.0000021 | Slow |
| 512 | 200 | 0.0000016 | Winner (Faster & Cheaper) |
| 1024 | 150 | 0.0000025 | Diminishing returns |
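The sweet-spot selection can be sketched as "cheapest configuration that still meets a latency target"; the 500 ms target is an assumption, and the measurements are the figures from the table above:

```python
def sweet_spot(measurements: list, max_duration_ms: float = 500):
    """Pick the cheapest configuration that meets the latency target.

    Each entry is (memory_mb, duration_ms, cost_usd_per_invocation),
    as produced by a power-tuning run.
    """
    eligible = [m for m in measurements if m[1] <= max_duration_ms]
    return min(eligible, key=lambda m: m[2])

# Figures from the table above
runs = [(128, 1000, 0.0000021), (512, 200, 0.0000016), (1024, 150, 0.0000025)]
print(sweet_spot(runs))  # → (512, 200, 1.6e-06)
```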

7. Network and CDN Sizing

  • Throughput: Measure P99 payload size * Peak requests per second.
  • CDN Coverage: What % of your traffic can be served from the edge?
    • Goal: > 80% Cache Hit Ratio for static assets.
    • Impact: CDN bandwidth is 50-70% cheaper than origin egress.
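The throughput rule above, converted to a bandwidth figure; the payload size and request rate in the example are hypothetical:

```python
def peak_throughput_gbps(p99_payload_kb: float, peak_rps: float) -> float:
    """Network sizing rule above: P99 payload size * peak requests/sec,
    converted from KB/s to gigabits per second."""
    bytes_per_sec = p99_payload_kb * 1024 * peak_rps
    return bytes_per_sec * 8 / 1e9

# Hypothetical: 200 KB P99 responses at 5,000 req/s
print(round(peak_throughput_gbps(200, 5000), 2))  # → 8.19
```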

8. Load Testing for Capacity Planning

Never size based on assumptions. Use tools like k6, Locust, or JMeter.

  1. Stepping Test: Gradually increase users until latency spikes (The "Knee" of the curve).
  2. Soak Test: Run at 80% load for 24 hours to find memory leaks.
  3. Stress Test: Find the "Breaking Point" to configure failover/auto-scaling.
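Finding the "knee" from stepping-test output can be automated; this is a sketch assuming your tool (k6, Locust, JMeter) exports per-step P95 latencies, and the 2x spike factor and sample numbers are assumptions:

```python
def find_knee(steps: list, spike_factor: float = 2.0):
    """Locate the 'knee' of the curve: the first user count where P95
    latency jumps by more than `spike_factor` over the previous step.

    `steps` is [(concurrent_users, p95_latency_ms), ...] in ramp order.
    Returns None if latency never spikes.
    """
    for (_, prev_lat), (users, lat) in zip(steps, steps[1:]):
        if lat > prev_lat * spike_factor:
            return users
    return None

# Hypothetical stepping-test results
steps = [(100, 80), (200, 85), (400, 95), (800, 310)]
print(find_knee(steps))  # → 800
```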

9. Monitoring for Right-Sizing

The Dashboard Template (Grafana/Datadog)

  • CPU Heatmap: Identify idle periods (e.g., weekends).
  • RAM Saturation: Identify "Memory Bloat".
  • Disk Queue Depth: Identify IOPS bottlenecks.
  • Network In/Out: Identify efficient vs inefficient regions.

Automated Right-Sizing Tools

  • AWS Compute Optimizer: Provides JSON recommendations for instance types.
  • VPA (Vertical Pod Autoscaler): Automatically adjusts K8s requests/limits.
  • Goldilocks: A K8s dashboard that visualizes VPA recommendations.

10. Capacity Planning Template

| Component | Metric | Current Load | Growth (6mo) | Buffer | Target Spec |
|---|---|---|---|---|---|
| Web Tier | Peak Req/sec | 500 | 2x (1000) | 20% | 4x c6g.large |
| Database | Storage | 500GB | +100GB/mo | 30% | 1.5TB GP3 |
| Cache | Working Set | 8GB | 12GB | 10% | 16GB Node |
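The "Target Spec" column follows from the other columns; as a sketch, the capacity you size for is current load times expected growth times a safety buffer (the web-tier row is used as the example):

```python
def target_capacity(current_load: float, growth_multiplier: float,
                    buffer: float) -> float:
    """Capacity to size for, per the template above:
    current load * expected growth * (1 + safety buffer)."""
    return current_load * growth_multiplier * (1 + buffer)

# Web tier row: 500 req/s, 2x growth, 20% buffer → size for 1200 req/s
print(target_capacity(500, 2, 0.20))  # → 1200.0
```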

11. Real Sizing Scenario: SaaS API

  • Initial Setup: 10 nodes of m5.xlarge (4 vCPU, 16GB RAM). Monthly cost: $1,400.
  • Observation: CPU average 12%, RAM average 40%.
  • Analysis: The app is memory-bound, but CPU is idle.
  • Action: Switched to 5 nodes of t3.large ($350/mo) + enabled Auto-scaling.
  • Result: 75% cost reduction while maintaining the same performance metrics.

Related Skills

  • 40-system-resilience/graceful-degradation
  • 42-cost-engineering/cloud-cost-models
  • 42-cost-engineering/budget-guardrails