Learn-skills.dev api-database-cockroachdb

CockroachDB distributed SQL -- transaction retries, multi-region, online schema changes, follower reads, PostgreSQL compatibility gaps

install

source · Clone the upstream repo

git clone https://github.com/NeverSight/learn-skills.dev

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/agents-inc/skills/api-database-cockroachdb" ~/.claude/skills/neversight-learn-skills-dev-api-database-cockroachdb && rm -rf "$T"

manifest: data/skills-md/agents-inc/skills/api-database-cockroachdb/SKILL.md

source content

CockroachDB Patterns

Quick Guide: CockroachDB connects via the standard
pg
driver (PostgreSQL wire protocol). The single most important difference from PostgreSQL: transaction retries are mandatory. CockroachDB's serializable isolation means any transaction can fail with SQLSTATE
40001
-- your application MUST catch this and retry the entire transaction. Use
UUID
with
gen_random_uuid()
for primary keys (never
SERIAL
-- sequential IDs cause distributed hotspots). DDL runs as online schema changes in background jobs and cannot be inside explicit transactions. Use
AS OF SYSTEM TIME
for follower reads to reduce latency in multi-region deployments.

<critical_requirements>

CRITICAL: Before Using This Skill

All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type
, named constants)

(You MUST implement transaction retry logic for SQLSTATE

errors -- CockroachDB WILL return serialization errors under normal operation, unlike PostgreSQL where they are rare)

(You MUST use

UUID

with
gen_random_uuid()
for primary keys -- NEVER use
SERIAL
or sequential IDs, which cause distributed write hotspots)

(You MUST NOT put DDL statements inside explicit transactions -- most DDL runs as background jobs and can fail at COMMIT time with a partially applied state.

CREATE TABLE

/
CREATE INDEX
are exceptions but the safest practice is always: one DDL statement per implicit transaction)

(You MUST use

Pool

from
pg
for all database access -- same as PostgreSQL, but be aware that each node in the cluster is a valid connection target)

</critical_requirements>

Examples

Core Patterns -- Pool setup, parameterized queries, transaction retry logic, error handling
Multi-Region & Performance -- Locality, survival goals, follower reads, AS OF SYSTEM TIME
Schema & Operations -- Online schema changes, IMPORT INTO, CHANGEFEED, cockroach CLI

Additional resources:

reference.md -- PostgreSQL compatibility gaps, error codes, type differences, production checklist

Auto-detection: CockroachDB, cockroachdb, cockroach, CRDB, crdb, cockroach_restart, SAVEPOINT cockroach_restart, 40001, serialization_failure, retry transaction, restart transaction, gen_random_uuid, unique_rowid, AS OF SYSTEM TIME, follower_read_timestamp, CHANGEFEED, CREATE CHANGEFEED, IMPORT INTO, cockroach sql, cockroach start, multi-region, survival goal, zone survival, region survival, locality, REGIONAL BY ROW

When to use:

Direct SQL queries against CockroachDB via the
```
pg
```
driver
Distributed transactions requiring serializable isolation
Multi-region database deployments with locality-aware reads/writes
Applications migrating from PostgreSQL to CockroachDB
Change data capture with CHANGEFEED
Bulk data loading with IMPORT INTO

Key patterns covered:

Transaction retry logic (SQLSTATE 40001 handling with exponential backoff)
UUID primary keys with gen_random_uuid() (hotspot avoidance)
AS OF SYSTEM TIME for follower reads and historical queries
Multi-region configuration (locality, survival goals, regional tables)
Online schema changes (DDL behavior differences from PostgreSQL)
PostgreSQL compatibility gaps (what does NOT work)

When NOT to use:

You need an ORM or query builder -- use your ORM/query builder skill instead
You are targeting standard PostgreSQL without CockroachDB -- use the PostgreSQL skill
You need features CockroachDB lacks (advisory locks, full stored procedure support, CREATE DOMAIN)

Philosophy

CockroachDB is a distributed SQL database that uses the PostgreSQL wire protocol. The core principle: write PostgreSQL-compatible SQL, but design for distribution.

Core principles:

Retry everything -- Serializable isolation means any transaction can be aborted by CockroachDB to resolve conflicts. Your code MUST handle SQLSTATE
```
40001
```
and retry the full transaction. This is not an edge case -- it happens under normal load.
Distribute evenly -- Sequential primary keys (
```
SERIAL
```
, auto-increment) create write hotspots because CockroachDB sorts data by primary key across ranges. Use
```
UUID
```
with
```
gen_random_uuid()
```
to scatter writes across the cluster.
DDL is async -- Schema changes run as background jobs. They cannot be wrapped in explicit transactions. Plan migrations accordingly -- one DDL statement at a time in production.
Read from followers -- Use
```
AS OF SYSTEM TIME
```
to read slightly stale data from the nearest replica instead of always hitting the leaseholder. This is the single biggest latency optimization in multi-region deployments.
PostgreSQL, mostly -- CockroachDB supports most PostgreSQL syntax and the
```
pg
```
driver works directly. But certain features are missing or behave differently. Know the gaps before you hit them in production.

</philosophy>

Core Patterns

Pattern 1: Connection Pool Setup

CockroachDB uses the standard

pg

driver. Pool setup is nearly identical to PostgreSQL, but the connection string points to a CockroachDB node (or load balancer). See examples/core.md for full configuration.

// Good Example - CockroachDB pool with error handling
import pg from "pg";

const POOL_MAX_CLIENTS = 20;
const IDLE_TIMEOUT_MS = 30_000;
const CONNECTION_TIMEOUT_MS = 5_000;

function createPool(): pg.Pool {
  const pool = new pg.Pool({
    connectionString: process.env.DATABASE_URL,
    // Example: postgresql://user:pass@crdb-lb:26257/mydb?sslmode=verify-full
    max: POOL_MAX_CLIENTS,
    idleTimeoutMillis: IDLE_TIMEOUT_MS,
    connectionTimeoutMillis: CONNECTION_TIMEOUT_MS,
  });

  pool.on("error", (err) => {
    console.error("Unexpected idle client error:", err.message);
  });

  return pool;
}

export { createPool };

Why good: Standard pg Pool works unmodified, named constants, error handler prevents process crash, CockroachDB default port is 26257 (not 5432)

// Bad Example - SERIAL primary key
await pool.query(`
  CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL
  )
`);
// SERIAL creates sequential IDs via unique_rowid()
// which causes write hotspots on a single range

Why bad: Sequential IDs from SERIAL/unique_rowid() cluster writes on one range, creating a hotspot that defeats CockroachDB's distributed architecture

Pattern 2: Transaction Retry Logic (MANDATORY)

CockroachDB's serializable isolation means transactions can fail with SQLSTATE

under normal operation. You MUST catch this and retry. See examples/core.md for the full retry helper.

// Good Example - Transaction with retry logic
const CRDB_SERIALIZATION_FAILURE = "40001";
const MAX_RETRIES = 5;
const BASE_DELAY_MS = 50;

async function withRetry<T>(
  pool: pg.Pool,
  operation: (client: pg.PoolClient) => Promise<T>,
): Promise<T> {
  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    const client = await pool.connect();
    try {
      await client.query("BEGIN");
      const result = await operation(client);
      await client.query("COMMIT");
      return result;
    } catch (err) {
      await client.query("ROLLBACK");
      if (isCrdbRetryError(err) && attempt < MAX_RETRIES) {
        const delay =
          BASE_DELAY_MS * Math.pow(2, attempt) + Math.random() * BASE_DELAY_MS;
        await new Promise((resolve) => setTimeout(resolve, delay));
        continue;
      }
      throw err;
    } finally {
      client.release();
    }
  }
  throw new Error("Retry loop exited unexpectedly");
}

Why good: Catches 40001 errors specifically, exponential backoff with jitter prevents thundering herd, fresh client per attempt, bounded retries, releases client in finally

// Bad Example - No retry logic (WILL fail in production)
const client = await pool.connect();
try {
  await client.query("BEGIN");
  await client.query(
    "UPDATE accounts SET balance = balance - $1 WHERE id = $2",
    [100, fromId],
  );
  await client.query(
    "UPDATE accounts SET balance = balance + $1 WHERE id = $2",
    [100, toId],
  );
  await client.query("COMMIT");
} catch (err) {
  await client.query("ROLLBACK");
  throw err; // 40001 errors bubble up as application failures!
} finally {
  client.release();
}

Why bad: No retry logic -- serialization errors (40001) propagate as unhandled application failures. In CockroachDB, these are EXPECTED under normal concurrent load, not exceptional conditions.

Pattern 3: UUID Primary Keys

CockroachDB distributes data across ranges sorted by primary key. Sequential IDs create hotspots. See examples/core.md for table design patterns.

-- Good Example - UUID primary key with gen_random_uuid()
CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email TEXT NOT NULL UNIQUE,
  name TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

Why good: UUIDs distribute writes evenly across all ranges in the cluster,

gen_random_uuid()

is built-in and generates UUIDv4

-- Bad Example - SERIAL primary key
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email TEXT NOT NULL UNIQUE,
  name TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- SERIAL uses unique_rowid() which generates time-ordered IDs
-- All recent inserts land on the same range -> hotspot

Why bad: SERIAL/unique_rowid() generates roughly time-ordered values, causing all concurrent inserts to target the same range, which bottlenecks on a single node

Pattern 4: AS OF SYSTEM TIME (Follower Reads)

Read slightly stale data from the nearest replica for dramatically lower latency in multi-region setups. See examples/multi-region.md for full patterns.

// Good Example - Follower read with built-in function
const result = await pool.query<ProductRow>(
  "SELECT id, name, price FROM products WHERE category = $1 AS OF SYSTEM TIME follower_read_timestamp()",
  [category],
);

Why good:

follower_read_timestamp()

automatically picks a safe staleness window, query can be served by any replica (nearest to the client), no leaseholder round-trip

When to use: Read-heavy dashboards, product catalogs, search results -- anywhere slightly stale data (at least 4.2 seconds) is acceptable.

When not to use: Reads that must reflect the latest write (e.g., reading immediately after an INSERT to confirm it succeeded).

Pattern 5: Online Schema Changes

CockroachDB DDL runs as background jobs -- NOT inside transactions. See examples/schema-ops.md for migration patterns.

// Good Example - DDL executed as individual statements
await pool.query("ALTER TABLE users ADD COLUMN phone TEXT");
// Runs as a background schema change job
// Table remains fully available for reads and writes during the change

Why good: DDL runs without table locks, no downtime, table available throughout

// Bad Example - DDL inside a transaction
const client = await pool.connect();
try {
  await client.query("BEGIN");
  await client.query("ALTER TABLE users ADD COLUMN phone TEXT");
  await client.query("ALTER TABLE users ADD COLUMN address TEXT");
  await client.query("COMMIT");
  // Most DDL can fail at COMMIT time with a partially applied state
} finally {
  client.release();
}

Why bad: Most DDL in explicit transactions can fail at COMMIT time with a partially applied state.

CREATE TABLE

CREATE INDEX

are exceptions, but the safest practice is always one DDL statement per implicit transaction.

Pattern 6: CHANGEFEED (Change Data Capture)

Stream row-level changes to external sinks. See examples/schema-ops.md for full CHANGEFEED patterns.

-- Good Example - CHANGEFEED to Kafka
CREATE CHANGEFEED FOR TABLE orders
  INTO 'kafka://broker:9092'
  WITH updated, resolved = '10s';

-- Sinkless changefeed (streams to SQL client)
CREATE CHANGEFEED FOR TABLE orders WITH updated;

Why good: Real-time CDC without polling, supports Kafka/webhook/cloud storage sinks,

resolved

timestamps enable downstream consumers to know data completeness

When to use: Event-driven architectures, data replication to analytics systems, audit logging, cache invalidation.

</patterns>

<decision_framework>

Decision Framework

Primary Key Strategy

What type of primary key?
+-- Need human-readable IDs? -> UUID with gen_random_uuid() + separate readable slug column
+-- Need globally unique IDs? -> UUID with gen_random_uuid() (recommended default)
+-- Migrating from PostgreSQL SERIAL? -> Switch to UUID, backfill existing data
+-- Need monotonically increasing? -> DO NOT -- use UUID. If you absolutely must, use
|                                      SERIAL but understand the hotspot tradeoff.

Isolation Level Choice

Which isolation level?
+-- Need strongest guarantees? -> SERIALIZABLE (default, recommended)
|   +-- Your app handles 40001 retries? -> Yes, use SERIALIZABLE
|   +-- Cannot implement retry logic? -> Consider READ COMMITTED
+-- Analytics / read-heavy workload? -> READ COMMITTED (no retry needed)
+-- Background jobs with loose consistency? -> READ COMMITTED

Read Strategy

How fresh must the data be?
+-- Must see latest writes? -> Normal read (hits leaseholder)
+-- Stale by a few seconds is fine? -> AS OF SYSTEM TIME follower_read_timestamp()
+-- Need a specific historical snapshot? -> AS OF SYSTEM TIME '<timestamp>'
+-- Exporting data for analytics? -> AS OF SYSTEM TIME with follower reads

Schema Change Strategy

How to run DDL?
+-- Single column add/drop? -> Run as individual statement (no transaction)
+-- Multiple related changes? -> Run sequentially, one statement at a time
+-- Need to roll back DDL? -> You cannot -- DDL is not transactional. Plan carefully.
+-- Index creation on large table? -> All indexes are created online by default (do NOT use CONCURRENTLY -- it errors)

</decision_framework>

<red_flags>

RED FLAGS

High Priority Issues:

No transaction retry logic for 40001 errors -- CockroachDB WILL return these under normal concurrent load. Without retries, your application randomly fails under traffic.
Using SERIAL or sequential primary keys -- creates a write hotspot on a single range, bottlenecking the entire cluster on one node.
DDL inside explicit transactions -- most DDL can fail at COMMIT time with a partially applied state.
```
CREATE TABLE
```
/
```
CREATE INDEX
```
are exceptions, but the safest practice is one DDL per implicit transaction.
Using advisory locks (
```
pg_advisory_lock
```
,
```
pg_try_advisory_lock
```
) -- CockroachDB does NOT implement them. They are defined as no-op stubs that silently do nothing.

Medium Priority Issues:

Not using
```
AS OF SYSTEM TIME
```
for read-heavy workloads in multi-region -- forces all reads to hit the leaseholder, adding cross-region latency.
Running multiple DDL statements simultaneously in production -- each schema change consumes resources. Run them sequentially.
Assuming PostgreSQL
```
LISTEN
```
/
```
NOTIFY
```
works -- CockroachDB does NOT support
```
LISTEN
```
/
```
NOTIFY
```
. Use
```
CHANGEFEED
```
for real-time change streaming.
Using
```
CREATE DOMAIN
```
-- not supported in CockroachDB. Use
```
CHECK
```
constraints or application-level validation.

Common Mistakes:

Connecting to port 5432 instead of 26257 -- CockroachDB default port is 26257.
Expecting
```
SERIAL
```
to produce gapless sequential IDs -- CockroachDB's
```
unique_rowid()
```
produces time-ordered but non-sequential values with gaps.
Forgetting that
```
numeric
```
/
```
decimal
```
types return as strings in the
```
pg
```
driver (same behavior as PostgreSQL).
Wrapping retry logic around individual statements instead of the entire transaction -- you must retry the FULL transaction, not just the failed statement.
Using
```
SELECT ... FOR UPDATE
```
without understanding it acquires locks across the cluster -- it works but has higher latency than in PostgreSQL.

Gotchas & Edge Cases:

```
40001
```
errors can occur on
```
COMMIT
```
, not just on individual statements. Your retry loop must catch errors from
```
COMMIT
```
too.
CockroachDB's
```
SAVEPOINT cockroach_restart
```
is a special savepoint name that enables the advanced retry protocol. Regular savepoints (
```
SAVEPOINT my_savepoint
```
) work normally for nested rollback.
Temporary tables exist but are experimental (
```
SET experimental_enable_temp_tables = 'on'
```
). Creating many temp objects degrades DDL performance.

READ COMMITTED

isolation is GA and enabled by default (

sql.txn.read_committed_isolation.enabled = true

), but transactions still default to

SERIALIZABLE

. Set per-transaction with

BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED

, per-session with

SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED

, or per-database with

ALTER DATABASE db SET default_transaction_isolation = 'read committed'

CockroachDB's
```
pg_catalog
```
and
```
information_schema
```
are populated but may have differences from PostgreSQL -- some system tables have extra columns, some are missing columns.
```
IMPORT INTO
```
takes the target table offline during the import. The table cannot serve reads or writes until the import completes.
Changefeed payload is limited. Complex JOINs or aggregations cannot be expressed directly in changefeed queries -- one table per changefeed.
Float overflow returns
```
Infinity
```
in CockroachDB (PostgreSQL returns an error).
Bitwise operator precedence differs from PostgreSQL. Use explicit parentheses.

</red_flags>

<critical_reminders>

CRITICAL REMINDERS

All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type
, named constants)

(You MUST implement transaction retry logic for SQLSTATE

errors -- CockroachDB WILL return serialization errors under normal operation, unlike PostgreSQL where they are rare)

(You MUST use

UUID

with
gen_random_uuid()
for primary keys -- NEVER use
SERIAL
or sequential IDs, which cause distributed write hotspots)

(You MUST NOT put DDL statements inside explicit transactions -- most DDL runs as background jobs and can fail at COMMIT time with a partially applied state.

CREATE TABLE

/
CREATE INDEX
are exceptions but the safest practice is always: one DDL statement per implicit transaction)

(You MUST use

Pool

from
pg
for all database access -- same as PostgreSQL, but be aware that each node in the cluster is a valid connection target)

Failure to follow these rules will cause transaction failures under load, write hotspots that defeat distribution, DDL errors, and application crashes.

</critical_reminders>