```bash
# Clone the full repository
git clone https://github.com/Intense-Visions/harness-engineering

# Or install just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/Intense-Visions/harness-engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/skills/claude-code/db-acid-in-practice" ~/.claude/skills/intense-visions-harness-engineering-db-acid-in-practice-291dcd && rm -rf "$T"
```
# ACID in Practice
The mechanisms that make ACID guarantees real: Write-Ahead Logging ensures atomicity and durability, fsync ensures persistence to physical media, and crash recovery replays the WAL to restore a consistent state.
## When to Use
- Tuning the durability vs. performance tradeoff for your workload
- Understanding what happens during crash recovery and why it takes time
- Debugging data loss or corruption after an unclean shutdown
- Configuring WAL settings for replication, backup, or performance
- Evaluating whether relaxed durability is acceptable for specific data types
## Instructions
### Write-Ahead Logging (WAL)
The core invariant: changes are written to the WAL before they are written to data files. This is what makes crash recovery possible.
The write path in PostgreSQL:
- Client issues `COMMIT`
- PostgreSQL writes the transaction's changes to WAL buffers
- WAL buffers are flushed to disk (`fsync` on the WAL segment file)
- `COMMIT` returns success to the client
- Later, during a checkpoint, dirty pages are written from shared buffers to data files
The data files may be out of date at any moment -- that is fine, because the WAL contains everything needed to reconstruct the current state.
Inspecting WAL state:
```sql
-- Current WAL position
SELECT pg_current_wal_lsn();

-- WAL level (minimal, replica, logical)
SHOW wal_level;

-- Current WAL segment file
SELECT pg_walfile_name(pg_current_wal_lsn());
```
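To see the invariant in action, here is a minimal sketch (it assumes the `accounts` table used in the worked example below): the WAL position advances past the commit record before `COMMIT` returns.

```sql
SELECT pg_current_wal_lsn();   -- note the position, e.g. 0/16B3D50

BEGIN;
UPDATE accounts SET balance = balance + 1 WHERE id = 1;
COMMIT;                        -- waits for the WAL flush before returning success

SELECT pg_current_wal_lsn();   -- now past the transaction's commit record
```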
### fsync and Durability
`fsync` forces the operating system to flush file data from kernel buffers to physical storage. Without `fsync`, a power failure can lose data that the OS reported as "written."
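The settings that govern this flush are visible from any session; a quick check (both are standard PostgreSQL GUCs):

```sql
-- Master switch: whether PostgreSQL requests fsync at all.
-- Leave on in production; see Anti-Patterns below.
SHOW fsync;

-- Which syscall forces the flush (e.g. fdatasync on Linux)
SHOW wal_sync_method;
```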
The `synchronous_commit` tradeoff:
```sql
-- Default: full durability, every COMMIT waits for WAL fsync
SET synchronous_commit = on;

-- Relaxed: COMMIT returns before WAL fsync, ~5x throughput gain
-- Risk: last few milliseconds of committed transactions may be lost on crash
SET synchronous_commit = off;
```
When `synchronous_commit = off` is acceptable:
- Session tracking data (losing a few seconds of session updates is tolerable)
- Analytics event ingestion (events can be re-sent)
- Non-critical logging
When it is NOT acceptable:
- Financial transactions (payments, transfers, account balances)
- Inventory management (stock counts must be exact)
- Authentication state (password changes, token revocations)
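One way to keep the relaxed setting scoped to the non-critical paths is to attach it to the database role that the non-critical workload connects as, rather than editing postgresql.conf. A sketch, assuming a hypothetical `analytics_ingest` role:

```sql
-- New sessions for this role default to relaxed durability;
-- all other roles keep the system-wide synchronous_commit = on.
ALTER ROLE analytics_ingest SET synchronous_commit = off;

-- Verify from a session connected as analytics_ingest
SHOW synchronous_commit;
```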
### Worked Example: Per-Table Durability Tuning
```sql
-- Critical table: full durability (default)
BEGIN;
SET LOCAL synchronous_commit = on;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;

-- Non-critical table: relaxed durability for throughput
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO page_views (user_id, path, viewed_at)
VALUES (42, '/dashboard', NOW());
COMMIT;
```
`SET LOCAL` scopes the setting to the current transaction only -- it does not affect other connections.
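A quick way to confirm that scoping (assuming the session default is `on`):

```sql
BEGIN;
SET LOCAL synchronous_commit = off;
SHOW synchronous_commit;  -- off, but only inside this transaction
COMMIT;
SHOW synchronous_commit;  -- back to the session default (on)
```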
### Crash Recovery
When PostgreSQL starts after a crash, it replays WAL records from the last completed checkpoint:
- Read the control file to find the last checkpoint location
- Read WAL from that checkpoint forward
- Apply (redo) all WAL records to bring data files up to date
- Any transaction that never committed is effectively rolled back -- its changes may be replayed into data files during redo, but without a commit record they remain invisible to all queries
Inspecting checkpoint state:
```bash
# Show last checkpoint location and recovery info
pg_controldata /var/lib/postgresql/data | grep -i checkpoint
```
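Once the server accepts connections, recovery state is also visible from SQL. On a hot standby (or any server still replaying WAL), for example:

```sql
-- True while the server is replaying WAL
SELECT pg_is_in_recovery();

-- Last WAL position replayed (NULL on a primary that is not in recovery)
SELECT pg_last_wal_replay_lsn();
```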
Recovery time is proportional to the amount of WAL generated since the last checkpoint. Two settings control this:
```sql
-- Maximum time between checkpoints (default: 5 minutes)
SHOW checkpoint_timeout;

-- Maximum WAL size before forcing a checkpoint (default: 1GB)
SHOW max_wal_size;
```
Shorter checkpoint intervals mean faster recovery but more I/O during normal operation.
### Checkpoints
A checkpoint writes all dirty buffers to data files and advances the recovery start point:
```sql
-- Manual checkpoint (rarely needed)
CHECKPOINT;

-- Enable checkpoint logging for monitoring
ALTER SYSTEM SET log_checkpoints = on;
SELECT pg_reload_conf();
```
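To judge whether checkpoints are being forced by WAL pressure rather than the timer, the cumulative statistics help. A sketch for PostgreSQL 16 and earlier (PostgreSQL 17 moves these counters to `pg_stat_checkpointer`):

```sql
-- Timed checkpoints come from checkpoint_timeout; requested ones from WAL volume.
-- A high requested-to-timed ratio suggests max_wal_size is too small.
SELECT checkpoints_timed, checkpoints_req, checkpoint_write_time
FROM pg_stat_bgwriter;
```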
Checkpoint tuning for production:
```sql
-- Spread checkpoint I/O over time (0.0-1.0, default 0.9)
SHOW checkpoint_completion_target;
```
A `checkpoint_completion_target` of 0.9 means PostgreSQL tries to complete the checkpoint I/O within 90% of the checkpoint interval, avoiding I/O spikes. With `checkpoint_timeout = 5min`, for example, the writes are spread over roughly 4.5 minutes.
### Worked Example: Diagnosing Slow Recovery
A production server crashes and takes 15 minutes to recover. Investigation:
```sql
-- Check settings
SHOW checkpoint_timeout;  -- 30min (too long!)
SHOW max_wal_size;        -- 8GB (too large!)
```
With a 30-minute checkpoint interval and 8GB of WAL, recovery must replay up to 8GB of changes. Reducing to `checkpoint_timeout = 5min` and `max_wal_size = 1GB` brings recovery time under 2 minutes at the cost of more frequent (but smaller) checkpoint I/O, as in the sketch below.
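A minimal sketch of applying that fix; both settings can be changed without a restart:

```sql
ALTER SYSTEM SET checkpoint_timeout = '5min';
ALTER SYSTEM SET max_wal_size = '1GB';
SELECT pg_reload_conf();  -- both take effect on reload
```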
## Anti-Patterns
- **Disabling `fsync` for benchmarks and forgetting to re-enable.** `fsync = off` provides dramatic benchmark improvements but guarantees data loss on crash. Never run production with `fsync = off`.
- **Setting `synchronous_commit = off` globally.** This should be per-transaction or per-session, not a system-wide default. Critical transactions must have durability guarantees.
- **Assuming cloud storage means durability is free.** EBS volumes can still lose in-flight writes during instance failures. Cloud databases (RDS, Aurora) add replication on top of local fsync, but the application must still use proper transactions.
- **Ignoring checkpoint warnings in logs.** PostgreSQL logs warnings when checkpoints happen too frequently (`checkpoints are occurring too frequently`). This means `max_wal_size` is too small for the write workload.
## Details
### PostgreSQL `full_page_writes`
After a checkpoint, the first modification to any page writes the entire page to WAL (not just the change). This protects against partial page writes -- a crash mid-write could leave a page half-old, half-new. The full page image in WAL ensures recovery can restore the complete page.
```sql
-- Default: on (do not disable in production)
SHOW full_page_writes;
```
Disabling `full_page_writes` reduces WAL volume but risks unrecoverable data corruption if a crash occurs during a partial page write.
### MySQL Callout: InnoDB Redo Log and Doublewrite Buffer
MySQL's InnoDB uses a redo log (analogous to PostgreSQL's WAL) and a doublewrite buffer:
- **Redo log:** Circular log files (`ib_logfile0`, `ib_logfile1`). Same write-ahead principle as PostgreSQL WAL.
- **Doublewrite buffer:** Before writing a dirty page to the tablespace, InnoDB writes it to a sequential doublewrite area. If a crash occurs during the page write, InnoDB recovers the page from the doublewrite buffer. PostgreSQL uses `full_page_writes` for the same purpose.
Key differences: InnoDB redo logs are fixed-size circular buffers (must be sized correctly or writes stall). PostgreSQL WAL segments are created as needed and archived or recycled.
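For reference, the InnoDB side can be inspected with standard server variables (names as of MySQL 8.0; `innodb_redo_log_capacity` replaced `innodb_log_file_size` in 8.0.30):

```sql
-- MySQL: redo log sizing and doublewrite status
SHOW VARIABLES LIKE 'innodb_redo_log_capacity';
SHOW VARIABLES LIKE 'innodb_doublewrite';
```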
### Cloud Durability
AWS Aurora replicates WAL across 3 Availability Zones before acknowledging a commit (RDS Multi-AZ similarly replicates synchronously to a standby in a second AZ). This provides durability beyond what single-node `fsync` offers -- the data survives even if an entire AZ is lost. However, the replication adds latency (~1-3ms per commit compared to local-only fsync).
Aurora specifically decouples storage from compute: 6 copies of data across 3 AZs, with a quorum write of 4/6. This means Aurora can lose an entire AZ plus one additional storage node and still serve reads and writes.
### Real-World Case Study: Analytics Ingest Pipeline
A real-time analytics platform ingested 50K events/second. With the default `synchronous_commit = on`, the database sustained only 12K inserts/second -- a 4x gap. Analysis showed that events were committed individually (auto-commit mode), so each insert waited for a WAL fsync.
Two changes closed the gap: (1) batching inserts in groups of 100 within a single transaction (cutting fsync calls 100x), and (2) `SET synchronous_commit = off` for the analytics connection pool (acceptable because missed events could be re-sent from the message queue). Throughput reached 60K inserts/second with a sub-second crash-recovery window, as sketched below.
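A sketch of the batched ingest pattern, reusing the `page_views` table from the earlier example (batch size and rows are illustrative):

```sql
BEGIN;
SET LOCAL synchronous_commit = off;  -- relaxed durability, this transaction only
-- One multi-row INSERT per batch of ~100 events: one commit (and at most one
-- WAL flush) instead of one per event.
INSERT INTO page_views (user_id, path, viewed_at) VALUES
  (42, '/dashboard', NOW()),
  (43, '/reports',   NOW());
COMMIT;
```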
## Source
- PostgreSQL WAL Introduction
- PostgreSQL WAL Configuration
- Rogov, E. "PostgreSQL 14 Internals" (2023), Chapters 9-10
## Process
- Read the WAL, fsync, and checkpoint mechanisms described in this document.
- Evaluate your workload to determine the appropriate durability vs. performance tradeoff for each data type.
- Configure WAL and checkpoint settings based on your recovery time objectives, and verify with controlled crash testing.
## Harness Integration
- Type: knowledge -- this skill is a reference document, not a procedural workflow.
- No tools or state -- consumed as context by other skills and agents.
- related_skills: `db-acid-properties`, `db-cap-theorem`
## Success Criteria
- WAL and checkpoint settings are configured appropriately for the workload's durability requirements.
- Relaxed durability (`synchronous_commit = off`) is applied only to non-critical data paths, never globally.