Harness-engineering perf-connection-pooling

Connection Pooling

install

source · Clone the upstream repo

git clone https://github.com/Intense-Visions/harness-engineering

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/Intense-Visions/harness-engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/skills/claude-code/perf-connection-pooling" ~/.claude/skills/intense-visions-harness-engineering-perf-connection-pooling-b20127 && rm -rf "$T"

manifest: agents/skills/claude-code/perf-connection-pooling/SKILL.md

source content

Connection Pooling

Master database connection pooling — pool sizing formulas, connection lifecycle overhead, PgBouncer transaction-mode pooling, serverless connection management, pool monitoring and diagnostics, and configuration for PostgreSQL, MySQL, and managed database services.

When to Use

Database connections are exhausted under load ("too many connections" errors)
Request latency includes 20-50ms for new connection establishment per query
Serverless functions create excessive database connections on cold starts
Connection pool is sized too large, consuming database memory and OS resources
Idle connections accumulate and hit the database's max_connections limit
A connection leak causes pool exhaustion over hours of uptime
Multiple application instances share a database with limited connection capacity
Database failover causes all pooled connections to become stale
Horizontal scaling of application servers multiplies connection count beyond limits
Query latency has high variance with spikes correlating to new connection creation

Instructions

Understand connection overhead. Each PostgreSQL connection consumes ~5-10MB of RAM on the server and requires a TCP handshake plus TLS negotiation:

Connection establishment cost:
- TCP handshake:      ~0.5ms (same datacenter), ~30ms (cross-region)
- TLS handshake:      ~2ms (same datacenter), ~60ms (cross-region)
- PostgreSQL auth:    ~5ms (password), ~10ms (SCRAM-SHA-256)
- Process creation:   ~2ms (PostgreSQL forks per connection)
- Total:              ~10ms (local) to ~100ms (cross-region)

Per-connection memory (PostgreSQL):
- work_mem:           4MB default (per operation, can be higher)
- Backend memory:     ~5-10MB base
- OS resources:       file descriptors, kernel buffers

Size the pool correctly. The optimal pool size depends on workload type, not application concurrency:

Formula (Brian Brazeal / PostgreSQL wiki):
pool_size = (core_count * 2) + effective_spindle_count

For SSD-backed servers:
pool_size = core_count * 2 + 1

Example: 4-core database server with SSD
pool_size = (4 * 2) + 1 = 9 connections

For I/O-bound workloads (many external API calls during transactions):
pool_size = core_count * 4

The counterintuitive rule: smaller pools perform better. A pool of 10 connections on a 4-core server outperforms a pool of 200 because fewer connections mean less context switching, less lock contention, and less memory pressure.

Configure application-level pooling. Set min, max, idle timeout, and connection lifetime:

// Node.js with pg (node-postgres)
import { Pool } from 'pg';

const pool = new Pool({
  host: process.env.DB_HOST,
  max: 10, // maximum connections in pool
  min: 2, // minimum idle connections
  idleTimeoutMillis: 30000, // close idle connections after 30s
  connectionTimeoutMillis: 5000, // fail if connection not acquired in 5s
  maxLifetimeMillis: 1800000, // recycle connections every 30 min
});

// Always return connections to the pool
async function query(sql: string, params?: any[]) {
  const client = await pool.connect();
  try {
    return await client.query(sql, params);
  } finally {
    client.release(); // CRITICAL: always release back to pool
  }
}

// Prisma connection pool configuration
// In DATABASE_URL:
// postgresql://user:pass@host:5432/db?connection_limit=10&pool_timeout=5

// Or in schema.prisma for more control:
const prisma = new PrismaClient({
  datasources: {
    db: {
      url: process.env.DATABASE_URL + '?connection_limit=10&pool_timeout=5',
    },
  },
});

Deploy PgBouncer for external pooling. PgBouncer sits between the application and PostgreSQL, multiplexing many client connections onto fewer database connections:

; pgbouncer.ini
[databases]
myapp = host=pg-primary port=5432 dbname=myapp

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt

; Pool mode: transaction (recommended), session, or statement
pool_mode = transaction

; Pool sizing
default_pool_size = 20        ; connections per user/database pair
max_client_conn = 1000        ; total client connections accepted
max_db_connections = 50       ; max connections to actual PostgreSQL
reserve_pool_size = 5         ; extra connections for burst handling

; Timeouts
server_idle_timeout = 60      ; close idle server connections after 60s
client_idle_timeout = 300     ; close idle client connections after 5min
query_timeout = 30            ; kill queries running longer than 30s

Architecture:
[App Server 1] ──┐                     ┌── [PostgreSQL]
[App Server 2] ──┤── [PgBouncer] ──────┤   max_connections = 100
[App Server 3] ──┘   1000 clients      └── 50 actual connections
                     → 50 DB conns

Handle serverless connection challenges. Serverless functions create new connections on every cold start. Use external pooling:

// Neon serverless driver — uses WebSocket, handles pooling externally
import { neon } from '@neondatabase/serverless';
const sql = neon(process.env.DATABASE_URL);

// Supabase with connection pooling (uses PgBouncer internally)
// Use port 6543 for pooled connections instead of 5432
// postgresql://user:pass@host:6543/db?pgbouncer=true

// Prisma with Accelerate (managed connection pooling)
// DATABASE_URL="prisma://accelerate.prisma-data.net/?api_key=..."

// AWS RDS Proxy — managed connection pooling for Lambda
// Configure in AWS console, use proxy endpoint instead of RDS endpoint

Monitor pool health. Track pool metrics to diagnose issues before they cause outages:

// node-postgres pool events
pool.on('connect', () => {
  metrics.increment('db.pool.connect');
});
pool.on('acquire', () => {
  metrics.increment('db.pool.acquire');
});
pool.on('remove', () => {
  metrics.increment('db.pool.remove');
});
pool.on('error', (err) => {
  metrics.increment('db.pool.error');
  console.error('Pool error:', err.message);
});

// Periodic health check
setInterval(() => {
  metrics.gauge('db.pool.total', pool.totalCount);
  metrics.gauge('db.pool.idle', pool.idleCount);
  metrics.gauge('db.pool.waiting', pool.waitingCount);

  if (pool.waitingCount > 0) {
    console.warn(`${pool.waitingCount} queries waiting for connections`);
  }
}, 5000);

Handle connection failures gracefully. Connections break due to network issues, database restarts, or failovers:

// Retry with exponential backoff for transient connection failures
async function resilientQuery(sql: string, params?: any[], retries = 3) {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await pool.query(sql, params);
    } catch (err: any) {
      const isTransient = [
        '57P01', // admin_shutdown
        '57P03', // cannot_connect_now
        '08006', // connection_failure
        '08001', // sqlclient_unable_to_establish_sqlconnection
      ].includes(err.code);

      if (!isTransient || attempt === retries - 1) throw err;
      await new Promise((r) => setTimeout(r, 100 * Math.pow(2, attempt)));
    }
  }
}

Details

Transaction Mode vs Session Mode (PgBouncer)

In transaction mode, PgBouncer assigns a server connection for the duration of a transaction, then returns it to the pool. This achieves the highest multiplexing ratio (1000 clients on 20 connections). Limitation: session-level features (SET, LISTEN/NOTIFY, prepared statements, temporary tables) do not work across transactions. In session mode, a server connection is held for the entire client session. Lower multiplexing but full feature compatibility. Most applications should use transaction mode.

Worked Example: Discord Serverless Migration

Discord migrated their message history API from dedicated servers to serverless functions, which caused connection storms — 10,000 concurrent Lambda invocations each opening a database connection. They deployed PgBouncer on a dedicated instance, configured with

pool_mode = transaction

default_pool_size = 25

, and

max_client_conn = 15000

. The PgBouncer instance multiplexes 15,000 Lambda connections onto 25 actual PostgreSQL connections. Result: PostgreSQL CPU dropped from 90% to 35%, and p99 latency decreased from 200ms to 45ms because the database was no longer thrashing with 10,000 concurrent connections.

Worked Example: Vercel Edge Functions + Neon

Vercel's edge functions run in globally distributed isolates that cannot maintain persistent TCP connections. Neon's serverless driver uses HTTP/WebSocket transport instead of raw TCP, with connection pooling handled by Neon's proxy layer. Each edge function invocation sends a query over HTTP — no connection setup, no pool management. The proxy maintains a warm pool of PostgreSQL connections. Result: cold-start database queries complete in 5-10ms (HTTP overhead) versus 50-100ms (TCP connection establishment), and the PostgreSQL server sees a stable 50 connections regardless of edge function concurrency.

Anti-Patterns

Setting pool size equal to max_connections. If pool_size matches max_connections, one application instance monopolizes all database connections, leaving zero for other services, migration scripts, or admin access. Always reserve 20-30% of max_connections for non-application use.

Not releasing connections back to the pool. A missing

client.release()

(or failed

finally

block) causes a connection leak. The pool gradually exhausts. Always release in a finally block or use pool.query() which handles acquisition and release automatically.

Opening a new pool per request. Creating a

new Pool()

in a request handler establishes a fresh pool for each request instead of reusing connections. Create the pool once at application startup and reuse it.

Oversizing the pool. A pool of 100 connections on a 4-core database causes context-switching overhead and lock contention that degrades throughput. Start with

(cores * 2) + 1

and increase only with load testing evidence.

Source

PostgreSQL wiki: Number Of Database Connections — https://wiki.postgresql.org/wiki/Number_Of_Database_Connections
PgBouncer documentation — https://www.pgbouncer.org/config.html
node-postgres: Pooling — https://node-postgres.com/features/pooling
Neon serverless driver — https://neon.tech/docs/serverless/serverless-driver

Process

Read the instructions and examples in this document.
Apply the patterns to your implementation, adapting to your specific context.
Verify your implementation against the details and edge cases listed above.

Harness Integration

Type: knowledge — this skill is a reference document, not a procedural workflow.
No tools or state — consumed as context by other skills and agents.

Success Criteria

Connection pool size follows the (cores * 2) + 1 formula as a starting point.
All database interactions use pooled connections with proper release handling.
External pooling (PgBouncer or managed proxy) is deployed for multi-instance or serverless architectures.
Pool metrics (total, idle, waiting) are monitored with alerts on exhaustion.
No connection leaks exist (verified by monitoring idle count stability over time).