```bash
# Clone the full repository
git clone https://github.com/Intense-Visions/harness-engineering

# Or install just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/Intense-Visions/harness-engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/skills/claude-code/db-acid-in-practice" ~/.claude/skills/intense-visions-harness-engineering-db-acid-in-practice-291dcd && rm -rf "$T"
```
# ACID in Practice
The mechanisms that make ACID guarantees real: Write-Ahead Logging ensures atomicity and durability, fsync ensures persistence to physical media, and crash recovery replays the WAL to restore a consistent state.
## When to Use
- Tuning the durability vs. performance tradeoff for your workload
- Understanding what happens during crash recovery and why it takes time
- Debugging data loss or corruption after an unclean shutdown
- Configuring WAL settings for replication, backup, or performance
- Evaluating whether relaxed durability is acceptable for specific data types
## Instructions
### Write-Ahead Logging (WAL)
The core invariant: changes are written to the WAL before they are written to data files. This is what makes crash recovery possible.
The write path in PostgreSQL:
- Client issues `COMMIT`
- PostgreSQL writes the transaction's changes to WAL buffers
- WAL buffers are flushed to disk (`fsync` on the WAL segment file)
- `COMMIT` returns success to the client
- Later, during a checkpoint, dirty pages are written from shared buffers to data files
The data files may be out of date at any moment -- that is fine, because the WAL contains everything needed to reconstruct the current state.
Inspecting WAL state:
```sql
-- Current WAL position
SELECT pg_current_wal_lsn();

-- WAL level (minimal, replica, logical)
SHOW wal_level;

-- Current WAL segment file
SELECT pg_walfile_name(pg_current_wal_lsn());
```
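To see the invariant in action, here is a minimal sketch (it assumes the `accounts` table used in the worked example below): the WAL position advances past the commit record before `COMMIT` returns.

```sql
SELECT pg_current_wal_lsn();   -- note the position, e.g. 0/16B3D50

BEGIN;
UPDATE accounts SET balance = balance + 1 WHERE id = 1;
COMMIT;                        -- waits for the WAL flush before returning success

SELECT pg_current_wal_lsn();   -- now past the transaction's commit record
```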
### fsync and Durability
`fsync` forces the operating system to flush file data from kernel buffers to physical storage. Without `fsync`, a power failure can lose data that the OS reported as "written."
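The settings that govern this flush are visible from any session; a quick check (both are standard PostgreSQL GUCs):

```sql
-- Master switch: whether PostgreSQL requests fsync at all.
-- Leave on in production; see Anti-Patterns below.
SHOW fsync;

-- Which syscall forces the flush (e.g. fdatasync on Linux)
SHOW wal_sync_method;
```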
The `synchronous_commit` tradeoff:
```sql
-- Default: full durability, every COMMIT waits for WAL fsync
SET synchronous_commit = on;

-- Relaxed: COMMIT returns before WAL fsync, ~5x throughput gain
-- Risk: last few milliseconds of committed transactions may be lost on crash
SET synchronous_commit = off;
```
When `synchronous_commit = off` is acceptable:
- Session tracking data (losing a few seconds of session updates is tolerable)
- Analytics event ingestion (events can be re-sent)
- Non-critical logging
When it is NOT acceptable:
- Financial transactions (payments, transfers, account balances)
- Inventory management (stock counts must be exact)
- Authentication state (password changes, token revocations)
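One way to keep the relaxed setting scoped to the non-critical paths is to attach it to the database role that the non-critical workload connects as, rather than editing postgresql.conf. A sketch, assuming a hypothetical `analytics_ingest` role:

```sql
-- New sessions for this role default to relaxed durability;
-- all other roles keep the system-wide synchronous_commit = on.
ALTER ROLE analytics_ingest SET synchronous_commit = off;

-- Verify from a session connected as analytics_ingest
SHOW synchronous_commit;
```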
### Worked Example: Per-Table Durability Tuning
```sql
-- Critical table: full durability (default)
BEGIN;
SET LOCAL synchronous_commit = on;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;

-- Non-critical table: relaxed durability for throughput
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO page_views (user_id, path, viewed_at)
VALUES (42, '/dashboard', NOW());
COMMIT;
```
`SET LOCAL` scopes the setting to the current transaction only -- it does not affect other connections.
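A quick way to confirm that scoping (assuming the session default is `on`):

```sql
BEGIN;
SET LOCAL synchronous_commit = off;
SHOW synchronous_commit;  -- off, but only inside this transaction
COMMIT;
SHOW synchronous_commit;  -- back to the session default (on)
```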
### Crash Recovery
When PostgreSQL starts after a crash, it replays WAL records from the last completed checkpoint:
- Read the control file to find the last checkpoint location
- Read WAL from that checkpoint forward
- Apply (redo) all WAL records to bring data files up to date
- Any transaction that never committed is effectively rolled back -- its changes may be replayed into data files during redo, but without a commit record they remain invisible to all queries
Inspecting checkpoint state:
```bash
# Show last checkpoint location and recovery info
pg_controldata /var/lib/postgresql/data | grep -i checkpoint
```
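Once the server accepts connections, recovery state is also visible from SQL. On a hot standby (or any server still replaying WAL), for example:

```sql
-- True while the server is replaying WAL
SELECT pg_is_in_recovery();

-- Last WAL position replayed (NULL on a primary that is not in recovery)
SELECT pg_last_wal_replay_lsn();
```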
Recovery time is proportional to the amount of WAL generated since the last checkpoint. Two settings control this:
```sql
-- Maximum time between checkpoints (default: 5 minutes)
SHOW checkpoint_timeout;

-- Maximum WAL size before forcing a checkpoint (default: 1GB)
SHOW max_wal_size;
```
Shorter checkpoint intervals mean faster recovery but more I/O during normal operation.
### Checkpoints
A checkpoint writes all dirty buffers to data files and advances the recovery start point:
```sql
-- Manual checkpoint (rarely needed)
CHECKPOINT;

-- Enable checkpoint logging for monitoring
ALTER SYSTEM SET log_checkpoints = on;
SELECT pg_reload_conf();
```
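To judge whether checkpoints are being forced by WAL pressure rather than the timer, the cumulative statistics help. A sketch for PostgreSQL 16 and earlier (PostgreSQL 17 moves these counters to `pg_stat_checkpointer`):

```sql
-- Timed checkpoints come from checkpoint_timeout; requested ones from WAL volume.
-- A high requested-to-timed ratio suggests max_wal_size is too small.
SELECT checkpoints_timed, checkpoints_req, checkpoint_write_time
FROM pg_stat_bgwriter;
```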
Checkpoint tuning for production:
```sql
-- Spread checkpoint I/O over time (0.0-1.0, default 0.9)
SHOW checkpoint_completion_target;
```
A `checkpoint_completion_target` of 0.9 means PostgreSQL tries to complete the checkpoint I/O within 90% of the checkpoint interval, avoiding I/O spikes. With `checkpoint_timeout = 5min`, for example, the writes are spread over roughly 4.5 minutes.
### Worked Example: Diagnosing Slow Recovery
A production server crashes and takes 15 minutes to recover. Investigation:
```sql
-- Check settings
SHOW checkpoint_timeout;  -- 30min (too long!)
SHOW max_wal_size;        -- 8GB (too large!)
```
With a 30-minute checkpoint interval and 8GB of WAL, recovery must replay up to 8GB of changes. Reducing to `checkpoint_timeout = 5min` and `max_wal_size = 1GB` brings recovery time under 2 minutes at the cost of more frequent (but smaller) checkpoint I/O, as in the sketch below.
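A minimal sketch of applying that fix; both settings can be changed without a restart:

```sql
ALTER SYSTEM SET checkpoint_timeout = '5min';
ALTER SYSTEM SET max_wal_size = '1GB';
SELECT pg_reload_conf();  -- both take effect on reload
```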
## Anti-Patterns
- **Disabling `fsync` for benchmarks and forgetting to re-enable.** `fsync = off` provides dramatic benchmark improvements but guarantees data loss on crash. Never run production with `fsync = off`.
- **Setting `synchronous_commit = off` globally.** This should be per-transaction or per-session, not a system-wide default. Critical transactions must have durability guarantees.
- **Assuming cloud storage means durability is free.** EBS volumes can still lose in-flight writes during instance failures. Cloud databases (RDS, Aurora) add replication on top of local fsync, but the application must still use proper transactions.
- **Ignoring checkpoint warnings in logs.** PostgreSQL logs warnings when checkpoints happen too frequently (`checkpoints are occurring too frequently`). This means `max_wal_size` is too small for the write workload.
## Details
### PostgreSQL `full_page_writes`
After a checkpoint, the first modification to any page writes the entire page to WAL (not just the change). This protects against partial page writes -- a crash mid-write could leave a page half-old, half-new. The full page image in WAL ensures recovery can restore the complete page.
```sql
-- Default: on (do not disable in production)
SHOW full_page_writes;
```
Disabling `full_page_writes` reduces WAL volume but risks unrecoverable data corruption if a crash occurs during a partial page write.
### MySQL Callout: InnoDB Redo Log and Doublewrite Buffer
MySQL's InnoDB uses a redo log (analogous to PostgreSQL's WAL) and a doublewrite buffer:
- **Redo log:** Circular log files (`ib_logfile0`, `ib_logfile1`). Same write-ahead principle as PostgreSQL WAL.
- **Doublewrite buffer:** Before writing a dirty page to the tablespace, InnoDB writes it to a sequential doublewrite area. If a crash occurs during the page write, InnoDB recovers the page from the doublewrite buffer. PostgreSQL uses `full_page_writes` for the same purpose.
Key differences: InnoDB redo logs are fixed-size circular buffers (must be sized correctly or writes stall). PostgreSQL WAL segments are created as needed and archived or recycled.
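For reference, the InnoDB side can be inspected with standard server variables (names as of MySQL 8.0; `innodb_redo_log_capacity` replaced `innodb_log_file_size` in 8.0.30):

```sql
-- MySQL: redo log sizing and doublewrite status
SHOW VARIABLES LIKE 'innodb_redo_log_capacity';
SHOW VARIABLES LIKE 'innodb_doublewrite';
```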
### Cloud Durability
AWS Aurora replicates WAL across 3 Availability Zones before acknowledging a commit (RDS Multi-AZ similarly replicates synchronously to a standby in a second AZ). This provides durability beyond what single-node `fsync` offers -- the data survives even if an entire AZ is lost. However, the replication adds latency (~1-3ms per commit compared to local-only fsync).
Aurora specifically decouples storage from compute: 6 copies of data across 3 AZs, with a quorum write of 4/6. This means Aurora can lose an entire AZ plus one additional storage node and still serve reads and writes.
### Real-World Case Study: Analytics Ingest Pipeline
A real-time analytics platform ingested 50K events/second. With the default `synchronous_commit = on`, the database sustained only 12K inserts/second -- a 4x gap. Analysis showed that events were committed individually (auto-commit mode), so each insert waited for a WAL fsync.
Two changes closed the gap: (1) batching inserts in groups of 100 within a single transaction (cutting fsync calls 100x), and (2) `SET synchronous_commit = off` for the analytics connection pool (acceptable because missed events could be re-sent from the message queue). Throughput reached 60K inserts/second with a sub-second crash-recovery window, as sketched below.
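A sketch of the batched ingest pattern, reusing the `page_views` table from the earlier example (batch size and rows are illustrative):

```sql
BEGIN;
SET LOCAL synchronous_commit = off;  -- relaxed durability, this transaction only
-- One multi-row INSERT per batch of ~100 events: one commit (and at most one
-- WAL flush) instead of one per event.
INSERT INTO page_views (user_id, path, viewed_at) VALUES
  (42, '/dashboard', NOW()),
  (43, '/reports',   NOW());
COMMIT;
```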
## Source
- PostgreSQL WAL Introduction
- PostgreSQL WAL Configuration
- Rogov, E. "PostgreSQL 14 Internals" (2023), Chapters 9-10
## Process
- Read the WAL, fsync, and checkpoint mechanisms described in this document.
- Evaluate your workload to determine the appropriate durability vs. performance tradeoff for each data type.
- Configure WAL and checkpoint settings based on your recovery time objectives, and verify with controlled crash testing.
## Harness Integration
- Type: knowledge -- this skill is a reference document, not a procedural workflow.
- No tools or state -- consumed as context by other skills and agents.
- related_skills: `db-acid-properties`, `db-cap-theorem`
## Success Criteria
- WAL and checkpoint settings are configured appropriately for the workload's durability requirements.
- Relaxed durability (`synchronous_commit = off`) is applied only to non-critical data paths, never globally.