Claude-code-plugins replit-load-scale
Install
Clone the upstream repo:

```bash
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
```

Or install the skill directly into Claude Code (`~/.claude/skills/`):

```bash
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/plugins/saas-packs/replit-pack/skills/replit-load-scale" \
    ~/.claude/skills/jeremylongshore-claude-code-plugins-replit-load-scale \
  && rm -rf "$T"
```
Manifest: `plugins/saas-packs/replit-pack/skills/replit-load-scale/SKILL.md`
Replit Load & Scale
Overview
Load testing, scaling strategies, and capacity planning for Replit deployments. Covers Autoscale behavior tuning, Reserved VM right-sizing, cold start optimization, database connection scaling, and capacity benchmarking.
Prerequisites
- Replit app deployed (Autoscale or Reserved VM)
- Load testing tool: k6, autocannon, or curl
- Health endpoint implemented
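If you don't have a health endpoint yet, a minimal sketch using plain Node (no framework; the payload fields are illustrative, adapt the route to your own server):

```typescript
// health.ts — minimal health endpoint sketch (Node 18+, no dependencies)
import http from "node:http";

export function healthPayload() {
  return {
    status: "ok",
    uptimeSeconds: Math.round(process.uptime()),
    memoryMB: Math.round(process.memoryUsage().rss / 1024 / 1024),
  };
}

export const server = http.createServer((req, res) => {
  if (req.url === "/health") {
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify(healthPayload()));
  } else {
    res.writeHead(404);
    res.end();
  }
});

// Replit injects PORT in deployments; port 0 picks a free port locally
server.listen(Number(process.env.PORT ?? 0));
```

Keep this handler dependency-free and fast; the load tests below hit it repeatedly.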
Replit Scaling Model
| Deployment Type | Scaling Behavior | Cold Start | Best For |
|---|---|---|---|
| Autoscale | 0 to N instances based on traffic | Yes (5-30s) | Variable traffic |
| Reserved VM | Fixed resources, always-on | No | Consistent traffic |
| Static | CDN-backed, infinite scale | No | Frontend assets |
Instructions
Step 1: Baseline Benchmark
```bash
# Quick benchmark with autocannon (from the Node.js ecosystem)
npx autocannon -c 10 -d 30 https://your-app.replit.app/health
# -c 10: 10 concurrent connections
# -d 30: 30 seconds duration

# Output shows:
# - Requests/sec
# - Latency (p50, p95, p99)
# - Throughput (bytes/sec)
# - Error count
```
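A quick sanity check on benchmark output: by Little's law, requests in flight ≈ RPS × average latency in seconds. A sketch (the function name is ours, not part of autocannon):

```typescript
// Little's law: concurrency ≈ throughput × latency
function inflightRequests(rps: number, avgLatencyMs: number): number {
  return rps * (avgLatencyMs / 1000);
}

// e.g. 200 req/s at 50 ms average latency keeps ~10 requests in flight —
// a way to check whether -c 10 actually saturated the server
```

If the computed in-flight count is well below your `-c` value, the server (not the client) is the bottleneck.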
Step 2: Load Test with k6
```javascript
// load-test.js — comprehensive Replit load test
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

const errorRate = new Rate('errors');
const coldStartTrend = new Trend('cold_start_duration');

export const options = {
  stages: [
    { duration: '1m', target: 5 },  // Warm up
    { duration: '3m', target: 20 }, // Normal load
    { duration: '2m', target: 50 }, // Peak load
    { duration: '1m', target: 0 },  // Cool down
  ],
  thresholds: {
    http_req_duration: ['p(95)<2000'], // 95% of requests under 2s
    errors: ['rate<0.05'],             // Error rate under 5%
  },
};

const BASE_URL = __ENV.DEPLOY_URL || 'https://your-app.replit.app';

export default function () {
  // Health check
  const healthRes = http.get(`${BASE_URL}/health`);
  check(healthRes, {
    'health returns 200': (r) => r.status === 200,
    'health under 1s': (r) => r.timings.duration < 1000,
  });
  errorRate.add(healthRes.status !== 200);

  // Detect cold start
  if (healthRes.timings.duration > 5000) {
    coldStartTrend.add(healthRes.timings.duration);
  }

  // API endpoint
  const apiRes = http.get(`${BASE_URL}/api/status`);
  check(apiRes, {
    'api returns 200': (r) => r.status === 200,
  });

  sleep(1);
}
```
```bash
# Run k6 load test
k6 run --env DEPLOY_URL=https://your-app.replit.app load-test.js

# With JSON output
k6 run --out json=results.json load-test.js
```
Step 3: Cold Start Optimization (Autoscale)
Autoscale cold starts happen when:
- The first request arrives after a period of no traffic
- Replit needs to start a new container instance
- Typical duration: 5-30 seconds depending on app size

Reduction strategies:
1. Minimize startup imports (lazy-load heavy modules)
2. Use a smaller Nix dependency set
3. Pre-connect the database in the background (don't block startup)
4. Keep the package count low
5. Run compiled JavaScript (not tsx at runtime)

Before (slow cold start):

```toml
run = "npx tsx src/index.ts"   # compiles TypeScript at startup
```

After (fast cold start):

```toml
build = "npm run build"        # compiles during deploy
run = "node dist/index.js"     # runs pre-compiled JS
```
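Lazy-loading heavy modules can be sketched as a small memoized loader (a hypothetical helper of ours, not a Replit API) so the `import()` runs on the first request instead of at boot:

```typescript
// lazy.ts — memoize an async loader so it runs once, on first use
export function lazy<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= loader());
}

// Usage sketch: "node:zlib" stands in for a genuinely heavy dependency
const getZlib = lazy(() => import("node:zlib"));

export async function handleCompressRoute(data: string): Promise<Buffer> {
  const zlib = await getZlib(); // loaded on first call, not at startup
  return zlib.gzipSync(data);
}
```

Routes that never run during a cold start then cost nothing at boot; only the first request to a lazy route pays the import price.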
```toml
# .replit — optimized for fast cold start
[deployment]
build = ["sh", "-c", "npm ci --production && npm run build"]
run = ["sh", "-c", "node dist/index.js"]
deploymentTarget = "autoscale"
```
Step 4: Reserved VM Sizing
Choose a VM size based on load test results:
- Peak CPU < 30% → downsize (save money)
- Peak CPU > 70% → upsize (prevent throttling)
- Peak memory > 80% → upsize (prevent OOM)

Machine sizes:

| Size | Best For |
|---|---|
| 0.25 vCPU / 512 MB | Simple APIs, < 50 req/s |
| 0.5 vCPU / 1 GB | Standard apps, < 200 req/s |
| 1 vCPU / 2 GB | Moderate traffic, < 500 req/s |
| 2 vCPU / 4 GB | High traffic, < 1000 req/s |
| 4 vCPU / 8-16 GB | Compute-heavy, > 1000 req/s |

To change: Deployment Settings > Machine Size > select a new tier. A redeployment is required to apply the change.
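The sizing thresholds above can be encoded as a small decision helper, useful in a post-load-test report script (the function and type names are ours, not a Replit API):

```typescript
// Encode the sizing rules: memory pressure first (OOM kills the app),
// then CPU headroom, then cost savings
type PeakMetrics = { peakCpuPct: number; peakMemPct: number };

function sizingAction(m: PeakMetrics): "downsize" | "upsize" | "keep" {
  if (m.peakMemPct > 80) return "upsize";   // prevent OOM
  if (m.peakCpuPct > 70) return "upsize";   // prevent CPU throttling
  if (m.peakCpuPct < 30) return "downsize"; // reclaim unused capacity
  return "keep";
}
```

Memory is checked first because exceeding it kills the process, while high CPU merely slows it down.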
Step 5: Database Connection Scaling
```typescript
// Tune the PostgreSQL pool for Replit container limits
import { Pool } from 'pg';

// Small container (0.25 vCPU / 512 MB)
const smallPool = new Pool({
  connectionString: process.env.DATABASE_URL,
  ssl: { rejectUnauthorized: false },
  max: 3,                   // Few connections
  idleTimeoutMillis: 10000, // Release quickly
});

// Medium container (1 vCPU / 2 GB)
const mediumPool = new Pool({
  connectionString: process.env.DATABASE_URL,
  ssl: { rejectUnauthorized: false },
  max: 10,                  // More headroom
  idleTimeoutMillis: 30000,
});

// Large container (4 vCPU / 8 GB)
const largePool = new Pool({
  connectionString: process.env.DATABASE_URL,
  ssl: { rejectUnauthorized: false },
  max: 20,
  idleTimeoutMillis: 60000,
});

// Dynamic pool sizing based on container resources
function createOptimalPool(): Pool {
  const memMB = Math.round(process.memoryUsage().rss / 1024 / 1024);
  const maxConns = memMB < 256 ? 3 : memMB < 1024 ? 10 : 20;
  return new Pool({
    connectionString: process.env.DATABASE_URL,
    ssl: { rejectUnauthorized: false },
    max: maxConns,
    idleTimeoutMillis: 30000,
    connectionTimeoutMillis: 5000,
  });
}
```
Step 6: Capacity Planning Template
```markdown
## Capacity Assessment

### Current State
- Deployment type: [Autoscale / Reserved VM]
- Machine size: [vCPU / RAM]
- Peak RPS: [from load test]
- P95 latency: [from load test]
- Cold start time: [Autoscale only]

### Load Test Results
| Metric | Idle | Normal (20 VU) | Peak (50 VU) |
|--------|------|----------------|--------------|
| RPS | 0 | X | Y |
| P50 latency | - | X ms | Y ms |
| P95 latency | - | X ms | Y ms |
| Error rate | - | X% | Y% |
| Memory | X MB | X MB | X MB |

### Recommendations
1. [Scale action based on results]
2. [Database pool adjustment]
3. [Cold start mitigation]
4. [Cost optimization]

### Scaling Triggers
- CPU > 70% sustained: upgrade VM
- Memory > 80%: upgrade VM or fix leak
- P95 > 2s: add caching or optimize queries
- Error rate > 1%: investigate root cause
```
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Cold start > 15s | Heavy startup | Pre-compile, lazy imports |
| Connection pool exhausted | Too many concurrent requests | Increase pool.max or add queueing |
| OOM during load test | Memory leak under load | Profile with /debug/memory |
| Inconsistent results | Autoscale scaling up | Warm up before measuring |
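The `/debug/memory` route referenced above is not built into Replit; you add it yourself. A sketch of the data such a route could serve:

```typescript
// memorySnapshot(): payload for a hypothetical /debug/memory route
function memorySnapshot() {
  const m = process.memoryUsage();
  const toMB = (bytes: number) => Math.round(bytes / 1024 / 1024);
  return {
    rssMB: toMB(m.rss),             // total resident memory (what OOM sees)
    heapUsedMB: toMB(m.heapUsed),   // live JS objects
    heapTotalMB: toMB(m.heapTotal), // heap reserved by V8
    externalMB: toMB(m.external),   // buffers outside the V8 heap
  };
}
```

Polling this during a load test shows whether RSS returns to baseline after traffic stops; steady growth across runs suggests a leak.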
Next Steps
For reliability patterns, see replit-reliability-patterns.