Awesome-omni-skill sql-and-database

SQL writing and database engineering patterns, standards, and procedures. Use for designing database schemas, writing performant SQL queries, normalisation strategies, indexing, joins optimisation, locking mechanics, transactions, query debugging with EXPLAIN, and ORM integration. Applies to PostgreSQL, MySQL, MariaDB, SQL Server, and Oracle. Covers ORM usage with TypeORM, Prisma, Doctrine, Eloquent, Entity Framework, Hibernate, and GORM.

install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/sql-and-database" ~/.claude/skills/diegosouzapw-awesome-omni-skill-sql-and-database && rm -rf "$T"
manifest: skills/development/sql-and-database/SKILL.md
source content

SQL & Database Engineering

This skill provides standards, patterns, and procedures for database schema design, writing performant SQL, query debugging, and database engineering best practices. It is database-engine-agnostic with notes on engine-specific behavior where critical.

Guiding Principles

| Principle | Application |
| --- | --- |
| Data Integrity First | Enforce constraints at the database level (NOT NULL, UNIQUE, FK, CHECK). Never rely solely on application-level validation. |
| Least Privilege | Database users/roles should have only the permissions they need. Application connections should never use the superuser account. |
| Explicit Over Implicit | Always specify column lists in SELECT, INSERT, and JOIN clauses. Avoid `SELECT *` in production code. |
| Measure Before Optimising | Use `EXPLAIN ANALYZE` to identify actual bottlenecks before adding indexes or restructuring queries. |
| Schema as Code | All schema changes go through versioned migration files. Never modify production schemas manually. |

1. Database Schema Architecture Design

Naming Conventions

| Element | Convention | Example |
| --- | --- | --- |
| Tables | `snake_case`, plural nouns | `order_items`, `user_addresses` |
| Columns | `snake_case` | `first_name`, `created_at` |
| Primary keys | `id` (preferred) or `<table_singular>_id` | `id`, `user_id` |
| Foreign keys | `<referenced_table_singular>_id` | `user_id`, `order_id` |
| Indexes | `idx_<table>_<columns>` | `idx_orders_user_id_status` |
| Unique constraints | `uq_<table>_<columns>` | `uq_users_email` |
| Check constraints | `chk_<table>_<description>` | `chk_orders_total_positive` |
| Junction/pivot tables | `<table1>_<table2>` (alphabetical) | `products_tags`, `roles_users` |

Standard Columns

Every table MUST include:

CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),  -- or BIGSERIAL for auto-increment
    -- ... domain columns ...
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);

When soft deletes are required by business rules:

deleted_at TIMESTAMP WITH TIME ZONE DEFAULT NULL

Primary Key Strategy

| Strategy | When to Use | Trade-offs |
| --- | --- | --- |
| `UUID v4` / `UUID v7` | Distributed systems, microservices, public-facing IDs | Larger storage; random UUIDs cause index fragmentation (prefer UUID v7 for ordered inserts) |
| `ULID` | When you need sortable unique IDs with good index locality | Not natively supported in all databases |
| `BIGSERIAL` / `IDENTITY` | Single-database monoliths, internal IDs | Sequential, predictable (security concern if exposed), not portable across DB instances |

Rule: Never expose auto-increment IDs in public APIs. Use UUIDs or ULIDs for external identifiers.
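One way to follow this rule is sketched below: pair an internal sequential key with a UUID that is the only identifier exposed externally. This assumes PostgreSQL 13+, where `gen_random_uuid()` is built in; the `public_id` column name is illustrative.

-- Internal BIGSERIAL for joins and FKs, UUID for anything exposed in APIs/URLs
CREATE TABLE customers (
    id BIGSERIAL PRIMARY KEY,
    public_id UUID NOT NULL DEFAULT gen_random_uuid() UNIQUE,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);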

Data Types — Choose Precisely

| Need | Use | Avoid |
| --- | --- | --- |
| Monetary values | `NUMERIC(19,4)` or `DECIMAL(19,4)` | `FLOAT`, `DOUBLE` (precision loss) |
| Timestamps | `TIMESTAMP WITH TIME ZONE` | `TIMESTAMP` without timezone |
| Boolean flags | `BOOLEAN` | `TINYINT`, `CHAR(1)` |
| Short text (name, email) | `VARCHAR(n)` with appropriate limit | Unbounded `TEXT` for structured fields |
| Long text (descriptions) | `TEXT` | `VARCHAR(10000)` |
| Enums | `VARCHAR` with CHECK constraint or native ENUM | Magic integers |
| JSON/semi-structured | `JSONB` (PostgreSQL) or `JSON` | Storing relational data as JSON |
| IP addresses | `INET` (PostgreSQL) or `VARCHAR(45)` | `VARCHAR(15)` (IPv6 won't fit) |

Enum Handling

Prefer CHECK constraints over native ENUM types for portability and ease of migration:

CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    status VARCHAR(20) NOT NULL DEFAULT 'pending'
        CONSTRAINT chk_orders_status CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);

Adding a new enum value is a simple `ALTER TABLE ... DROP CONSTRAINT` followed by `ADD CONSTRAINT`; no table rewrite is required.
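For example, extending the constraint above with a hypothetical `refunded` status:

ALTER TABLE orders DROP CONSTRAINT chk_orders_status;
ALTER TABLE orders ADD CONSTRAINT chk_orders_status
    CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled', 'refunded'));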

Schema Design Checklist

Schema design checklist:
- [ ] Every table has a primary key (UUID or BIGSERIAL)
- [ ] Every table has created_at and updated_at columns
- [ ] All foreign keys have corresponding indexes
- [ ] All columns have appropriate NOT NULL constraints
- [ ] Unique constraints are defined where business rules require uniqueness
- [ ] CHECK constraints enforce valid value ranges and enums
- [ ] Naming follows snake_case conventions consistently
- [ ] Data types are chosen precisely (no FLOAT for money, no TEXT for emails)
- [ ] Soft delete (deleted_at) is used only where business rules require it
- [ ] No business logic is embedded in database triggers or stored procedures (keep logic in the application layer)

2. Normalisation Strategies

Normal Forms Reference

| Normal Form | Rule | Violation Example | Fix |
| --- | --- | --- | --- |
| 1NF | Every column holds atomic (indivisible) values. No repeating groups. | `tags: "php,sql,go"` in a single column | Create a separate `tags` table with a junction table |
| 2NF | 1NF + every non-key column depends on the entire primary key (relevant for composite keys) | `order_items(order_id, product_id, product_name)`; `product_name` depends only on `product_id` | Move `product_name` to the `products` table |
| 3NF | 2NF + no transitive dependencies (non-key column depends on another non-key column) | `employees(id, department_id, department_name)`; `department_name` depends on `department_id`, not on `id` | Move `department_name` to a `departments` table |
| BCNF | 3NF + every determinant is a candidate key | Rare in practice; address when composite keys create functional dependency issues | Decompose the table so every determinant is a key |

When to Use Each Level

Target 3NF by default. This eliminates redundancy while keeping the schema manageable.

Use 2NF only as an intermediate step when refactoring legacy schemas — never as a design target.

Use BCNF when you have composite primary keys with overlapping candidate keys (uncommon in application databases).

Practical Normalisation Example

Unnormalised (0NF):

orders:
| order_id | customer_name | customer_email      | items                          |
|----------|---------------|---------------------|--------------------------------|
| 1        | John Doe      | john@example.com    | Widget x2, Gadget x1           |

1NF — Atomic values, no repeating groups:

CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    customer_name VARCHAR(100) NOT NULL,
    customer_email VARCHAR(255) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);

CREATE TABLE order_items (
    id BIGSERIAL PRIMARY KEY,
    order_id BIGINT NOT NULL REFERENCES orders(id),
    product_name VARCHAR(100) NOT NULL,
    quantity INT NOT NULL CHECK (quantity > 0)
);

2NF — Remove partial dependencies (product details depend on product, not order):

CREATE TABLE products (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price NUMERIC(19,4) NOT NULL
);

CREATE TABLE order_items (
    id BIGSERIAL PRIMARY KEY,
    order_id BIGINT NOT NULL REFERENCES orders(id),
    product_id BIGINT NOT NULL REFERENCES products(id),
    quantity INT NOT NULL CHECK (quantity > 0),
    unit_price NUMERIC(19,4) NOT NULL  -- snapshot of price at order time
);

3NF — Remove transitive dependencies (customer depends on customer, not order):

CREATE TABLE customers (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) NOT NULL UNIQUE
);

CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customers(id),
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);

Strategic Denormalisation

Denormalise only when you have measured performance data (via `EXPLAIN ANALYZE`) proving that joins are the bottleneck. Common valid cases:

| Pattern | When to Use | Example |
| --- | --- | --- |
| Materialised/cached columns | Frequently read aggregate that is expensive to compute on every query | `orders.total_amount` computed from `order_items` and stored on the order row |
| Snapshot columns | Values that must be preserved at a point in time even if the source changes | `order_items.unit_price` (price at time of purchase, not current product price) |
| Read-optimised views | Reporting or analytics queries that span many tables | Materialised views refreshed periodically (see the sketch after the rules below) |
| Search/filter columns | Columns derived from related tables used heavily in WHERE clauses | `orders.customer_country` duplicated from `customers.country` for filtering |

Rules for denormalisation:

  • Always document why the denormalisation exists (code comment + migration description).
  • Ensure the denormalised data is kept in sync (application-level updates, triggers as last resort, or materialised view refresh).
  • Treat denormalised data as a cache — the normalised source remains the source of truth.
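A minimal sketch of the read-optimised view pattern (PostgreSQL syntax; the view name is illustrative, and the `orders.total` and `status` columns follow the examples used elsewhere in this skill):

-- Precompute an expensive reporting aggregate
CREATE MATERIALIZED VIEW customer_order_totals AS
SELECT customer_id,
       COUNT(*)   AS order_count,
       SUM(total) AS total_spent
FROM orders
WHERE status = 'completed'
GROUP BY customer_id;

-- A unique index is required for non-blocking refreshes
CREATE UNIQUE INDEX uq_customer_order_totals_customer_id
    ON customer_order_totals (customer_id);

-- Refresh periodically (e.g. from a scheduled job)
REFRESH MATERIALIZED VIEW CONCURRENTLY customer_order_totals;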

3. Relationships & Foreign Keys

Relationship Types

One-to-Many (most common):

-- A customer has many orders
CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customers(id) ON DELETE RESTRICT,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_orders_customer_id ON orders(customer_id);

Many-to-Many (junction table):

-- Products can have many tags, tags can belong to many products
CREATE TABLE products_tags (
    product_id BIGINT NOT NULL REFERENCES products(id) ON DELETE CASCADE,
    tag_id BIGINT NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
    PRIMARY KEY (product_id, tag_id)
);

CREATE INDEX idx_products_tags_tag_id ON products_tags(tag_id);

One-to-One:

-- Each user has exactly one profile
CREATE TABLE user_profiles (
    user_id BIGINT PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
    bio TEXT,
    avatar_url VARCHAR(500)
);

ON DELETE / ON UPDATE Actions

| Action | Meaning | When to Use |
| --- | --- | --- |
| `RESTRICT` (default) | Prevent deletion if referenced rows exist | Default for most business entities (prevent accidental data loss) |
| `CASCADE` | Delete/update child rows when parent is deleted/updated | Junction tables, dependent child records that have no meaning without parent |
| `SET NULL` | Set FK to NULL when parent is deleted | Optional relationships where the child can exist independently |
| `SET DEFAULT` | Set FK to its default value | Rare — use when a fallback reference makes business sense |
| `NO ACTION` | Same as RESTRICT but checked at end of transaction | When deferred constraint checking is needed |

Rule: Default to `RESTRICT`. Use `CASCADE` only on junction tables and tightly coupled child tables. Always explicitly specify the action — never rely on implicit defaults.

Self-Referencing Relationships

-- Categories with parent-child hierarchy
CREATE TABLE categories (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    parent_id BIGINT REFERENCES categories(id) ON DELETE SET NULL
);

CREATE INDEX idx_categories_parent_id ON categories(parent_id);

For deep hierarchies (trees), consider one of the following (a minimal closure-table sketch follows this list):

  • Adjacency list (above) — simple, good for shallow trees
  • Materialised path: a `path VARCHAR(500)` column storing `/1/5/12/` — good for read-heavy trees
  • Nested sets — complex to maintain but fast for subtree queries
  • Closure table — separate table storing all ancestor-descendant pairs — balanced read/write performance
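A sketch of the closure-table approach for the `categories` table above (the `category_closure` name and `depth` column are illustrative):

CREATE TABLE category_closure (
    ancestor_id   BIGINT NOT NULL REFERENCES categories(id) ON DELETE CASCADE,
    descendant_id BIGINT NOT NULL REFERENCES categories(id) ON DELETE CASCADE,
    depth         INT NOT NULL CHECK (depth >= 0),
    PRIMARY KEY (ancestor_id, descendant_id)
);

-- All descendants of category 5 (its subtree), excluding the node itself
SELECT c.*
FROM categories c
JOIN category_closure cc ON cc.descendant_id = c.id
WHERE cc.ancestor_id = 5
  AND cc.depth > 0;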

4. Indexes

Index Types

| Type | Syntax (PostgreSQL) | Use Case |
| --- | --- | --- |
| B-tree (default) | `CREATE INDEX idx ON table(column)` | Equality and range queries (`=`, `<`, `>`, `BETWEEN`, `ORDER BY`) |
| Hash | `CREATE INDEX idx ON table USING hash(column)` | Equality-only lookups (rare — B-tree covers this equally well) |
| GIN | `CREATE INDEX idx ON table USING gin(column)` | Full-text search, JSONB containment, array operations |
| GiST | `CREATE INDEX idx ON table USING gist(column)` | Geometric/spatial data, range types, nearest-neighbor queries |
| BRIN | `CREATE INDEX idx ON table USING brin(column)` | Very large tables with naturally ordered data (timestamps, sequential IDs) |

Index Strategy Rules

Always index:

  • Foreign key columns (not auto-indexed in PostgreSQL; MySQL InnoDB creates one automatically when the FK is defined)
  • Columns frequently used in `WHERE` clauses
  • Columns used in `ORDER BY` on large tables
  • Columns used in `JOIN` conditions

Consider composite indexes for:

  • Queries that filter on multiple columns together
  • Covering indexes that satisfy a query entirely from the index

Avoid over-indexing:

  • Each index adds overhead to `INSERT`, `UPDATE`, and `DELETE` operations
  • Indexes consume storage
  • Review and remove unused indexes periodically

Composite Index Column Order

The order of columns in a composite index matters. Place columns in this priority:

  1. Equality conditions first (`WHERE status = 'active'`)
  2. Range conditions second (`WHERE created_at > '2025-01-01'`)
  3. Sort columns last (`ORDER BY created_at DESC`)

-- Query: WHERE status = 'active' AND created_at > '2025-01-01' ORDER BY created_at DESC
-- Optimal index:
CREATE INDEX idx_orders_status_created_at ON orders(status, created_at DESC);

Partial Indexes

Index only the rows you actually query:

-- Only index non-deleted orders (if most queries exclude soft-deleted rows)
CREATE INDEX idx_orders_active ON orders(customer_id, created_at)
    WHERE deleted_at IS NULL;

Unique Indexes

Use unique indexes to enforce business constraints:

-- Email must be unique, but only for non-deleted users
CREATE UNIQUE INDEX uq_users_email_active ON users(email)
    WHERE deleted_at IS NULL;

Index Monitoring

Periodically check for unused indexes:

-- PostgreSQL: find unused indexes
SELECT
    schemaname, relname AS table_name, indexrelname AS index_name,
    idx_scan AS times_used, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

Non-Blocking Index Operations (CONCURRENTLY)

Standard `CREATE INDEX` and `DROP INDEX` acquire locks that block writes on the table for the duration of the operation. On large or heavily used tables this can cause downtime.

Always use `CONCURRENTLY` when creating or dropping indexes on tables that serve live traffic:

-- Creating an index without blocking writes
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders(customer_id);

-- Dropping an index without blocking writes
DROP INDEX CONCURRENTLY idx_orders_customer_id;

Key constraints and caveats:

| Aspect | Detail |
| --- | --- |
| Database support | `CONCURRENTLY` is PostgreSQL-specific. Other engines have their own online indexing alternatives: MySQL/MariaDB use `ALGORITHM=INPLACE, LOCK=NONE`; SQL Server uses `WITH (ONLINE = ON)`; Oracle uses the `ONLINE` keyword. Consult engine-specific docs for constraints and limitations |
| Transaction blocks | `CREATE INDEX CONCURRENTLY` and `DROP INDEX CONCURRENTLY` cannot run inside a transaction block — avoid wrapping them in `BEGIN`/`COMMIT` |
| Migration frameworks | Most migration tools wrap statements in a transaction by default. Disable this for concurrent index operations (e.g., `disable_ddl_transaction!` in Rails, `atomic = False` in Django, a separate migration step in Flyway) |
| Failed builds | If `CREATE INDEX CONCURRENTLY` fails, it leaves behind an invalid index. Check with `\d table_name` and drop the invalid index before retrying (catalog query below) |
| Unique indexes | `CREATE UNIQUE INDEX CONCURRENTLY` performs an extra table scan — it takes longer but still avoids blocking writes |
| Build time | Concurrent builds are slower than regular ones because they require multiple table passes |
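To find invalid indexes left behind by a failed concurrent build, one option is to query the PostgreSQL catalogs directly:

-- List invalid indexes (candidates to drop and rebuild)
SELECT c.relname AS index_name
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE NOT i.indisvalid;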

When to skip `CONCURRENTLY`:

  • Initial schema setup or empty tables (no live traffic to block)
  • During a maintenance window with no active connections
  • Test/development environments

5. Writing Performant SQL Queries

Query Writing Rules

| Rule | Do | Don't |
| --- | --- | --- |
| Specify columns | `SELECT id, name, email FROM users` | `SELECT * FROM users` |
| Use parameterised queries | `WHERE id = $1` / `WHERE id = ?` | String concatenation such as `"WHERE id = '" + userId + "'"` (SQL injection risk) |
| Limit results | `LIMIT 100` or paginate | Unbounded `SELECT` on large tables |
| Filter early | `WHERE` clause narrows rows before joins | Joining full tables then filtering |
| Use EXISTS over IN for subqueries | `WHERE EXISTS (SELECT 1 FROM ...)` (example below) | `WHERE id IN (SELECT id FROM ...)` on large sets |
| Avoid functions on indexed columns | `WHERE created_at >= '2025-01-01'` | `WHERE DATE(created_at) = '2025-01-01'` (kills index) |
| Use UNION ALL over UNION | `UNION ALL` when duplicates are acceptable | `UNION` (forces sort + dedup) |
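As an illustration of the EXISTS-over-IN rule, using the `customers`/`orders` schema from earlier examples:

-- Customers that have at least one completed order
SELECT c.id, c.name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.id
      AND o.status = 'completed'
);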

JOIN Optimisation

Join Types and When to Use

| Join Type | Returns | Use When |
| --- | --- | --- |
| `INNER JOIN` | Only matching rows from both tables | You need data that exists in both tables |
| `LEFT JOIN` | All rows from the left table + matches from the right (NULL if no match) | You need all records from the primary table regardless of match |
| `RIGHT JOIN` | All rows from the right table + matches from the left | Rarely used — rewrite as LEFT JOIN for readability |
| `CROSS JOIN` | Cartesian product of both tables | Generating combinations (e.g., all products × all regions) |
| `LATERAL JOIN` | Correlated subquery evaluated for each row | Top-N per group, complex per-row calculations (sketch below) |
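A sketch of a LATERAL join for a top-N-per-group query (the latest three orders per customer, assuming the `customers`/`orders` schema used elsewhere in this skill):

SELECT c.id AS customer_id, c.name, recent.id AS order_id, recent.created_at
FROM customers c
CROSS JOIN LATERAL (
    SELECT o.id, o.created_at
    FROM orders o
    WHERE o.customer_id = c.id
    ORDER BY o.created_at DESC
    LIMIT 3
) AS recent;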

Join Performance Guidelines

-- GOOD: Join on indexed columns, filter early
SELECT o.id, o.created_at, c.name
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'active'
  AND o.created_at >= '2025-01-01'
ORDER BY o.created_at DESC
LIMIT 50;

-- BAD: Joining on expressions, filtering late
SELECT *
FROM orders o
INNER JOIN customers c ON LOWER(c.email) = LOWER(o.customer_email)
WHERE YEAR(o.created_at) = 2025;

Rules:

  • Always join on indexed columns (typically primary keys and foreign keys).
  • Place the most restrictive `WHERE` conditions on the driving table to reduce the row set early.
  • Avoid joining on computed expressions — create a persisted computed column or a functional index if needed.
  • When joining many tables, consider whether a subquery or CTE might be clearer and equally performant.

Pagination Strategies

Offset-based pagination (simple but degrades on large offsets):

SELECT id, name, email
FROM users
WHERE deleted_at IS NULL
ORDER BY created_at DESC
LIMIT 20 OFFSET 1000;
-- Database must scan and discard 1000 rows

Keyset/cursor-based pagination (performant at any depth):

-- First page
SELECT id, name, email, created_at
FROM users
WHERE deleted_at IS NULL
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Next page (use last row's values as cursor)
SELECT id, name, email, created_at
FROM users
WHERE deleted_at IS NULL
  AND (created_at, id) < ('2025-02-01T10:00:00Z', '550e8400-e29b-41d4-a716-446655440000')
ORDER BY created_at DESC, id DESC
LIMIT 20;

Rule: Use offset-based pagination for admin/back-office UIs with moderate data volumes (< 100k rows). Use keyset pagination for APIs, infinite scrolling, and large datasets.

Aggregation Best Practices

-- GOOD: Filter before aggregating
SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS total_spent
FROM orders
WHERE status = 'completed'
  AND created_at >= '2025-01-01'
GROUP BY customer_id
HAVING COUNT(*) > 5
ORDER BY total_spent DESC;

-- BAD: Aggregating everything then filtering in application code
SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS total_spent
FROM orders
GROUP BY customer_id;
-- Then filtering in PHP/Node/Java... wasted database work

Common Table Expressions (CTEs)

Use CTEs for readability. Be aware that in some databases (PostgreSQL < 12), CTEs act as optimisation fences:

-- Readable multi-step query
WITH active_customers AS (
    SELECT id, name
    FROM customers
    WHERE status = 'active'
      AND deleted_at IS NULL
),
recent_orders AS (
    SELECT customer_id, COUNT(*) AS order_count
    FROM orders
    WHERE created_at >= NOW() - INTERVAL '30 days'
    GROUP BY customer_id
)
SELECT ac.id, ac.name, COALESCE(ro.order_count, 0) AS recent_orders
FROM active_customers ac
LEFT JOIN recent_orders ro ON ro.customer_id = ac.id
ORDER BY recent_orders DESC;

Window Functions

Use window functions for ranking, running totals, and per-group calculations without collapsing rows:

-- Rank orders by total within each customer
SELECT
    customer_id,
    id AS order_id,
    total,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY total DESC) AS rank,
    SUM(total) OVER (PARTITION BY customer_id) AS customer_total
FROM orders
WHERE status = 'completed';

Batch Operations

For large data modifications, batch to avoid long locks and transaction log bloat:

-- Instead of one massive DELETE:
-- DELETE FROM logs WHERE created_at < '2024-01-01';  -- locks millions of rows

-- Batch delete:
DO $$
DECLARE
    rows_deleted INT;
BEGIN
    LOOP
        DELETE FROM logs
        WHERE id IN (
            SELECT id FROM logs
            WHERE created_at < '2024-01-01'
            LIMIT 5000
        );
        GET DIAGNOSTICS rows_deleted = ROW_COUNT;
        EXIT WHEN rows_deleted = 0;
        COMMIT;
    END LOOP;
END $$;

6. Transactions

ACID Properties

| Property | Meaning | Practical Impact |
| --- | --- | --- |
| Atomicity | All operations in a transaction succeed or all are rolled back | Partial updates never reach the database |
| Consistency | Transaction moves the database from one valid state to another | Constraints are enforced at commit time |
| Isolation | Concurrent transactions don't interfere with each other | Determined by the isolation level |
| Durability | Committed data survives system failures | Data is persisted to disk on commit |

Transaction Isolation Levels

| Level | Dirty Reads | Non-Repeatable Reads | Phantom Reads | Use Case |
| --- | --- | --- | --- | --- |
| `READ UNCOMMITTED` | Yes | Yes | Yes | Almost never — only for dirty analytics on non-critical data |
| `READ COMMITTED` | No | Yes | Yes | Default for PostgreSQL. Good for most OLTP workloads |
| `REPEATABLE READ` | No | No | Yes* | Financial calculations, inventory checks within a transaction |
| `SERIALIZABLE` | No | No | No | Critical financial operations, booking systems with strict consistency |

*PostgreSQL's `REPEATABLE READ` also prevents phantom reads (it uses snapshot isolation internally).

Rule: Use `READ COMMITTED` by default. Escalate to `REPEATABLE READ` or `SERIALIZABLE` only for specific operations that require stronger guarantees, and handle serialisation failures with retry logic.
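A minimal sketch of escalating a single operation to `SERIALIZABLE` in PostgreSQL (the retry loop itself lives in application code):

BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT quantity FROM inventory WHERE product_id = 42;
    -- ... business logic decides the new quantity ...
    UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 42;
COMMIT;
-- Any statement (often the COMMIT) may fail with SQLSTATE 40001 (serialization_failure);
-- roll back and retry the whole transaction from the beginning.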

Transaction Best Practices

-- GOOD: Short, focused transaction
BEGIN;
    UPDATE accounts SET balance = balance - 100.00 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100.00 WHERE id = 2;
    INSERT INTO transactions (from_id, to_id, amount) VALUES (1, 2, 100.00);
COMMIT;

Rules:

  • Keep transactions as short as possible. Long transactions hold locks and block other operations.
  • Never perform external I/O (HTTP calls, file operations) inside a transaction.
  • Always handle transaction failures: catch exceptions, rollback, and optionally retry.
  • Use `SAVEPOINT` for partial rollbacks within a larger transaction when needed.
  • Acquire locks in a consistent order across all transactions to prevent deadlocks (e.g., always lock by ascending ID), as sketched below.
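A sketch of consistent lock ordering, reusing the `accounts` transfer example above (both rows are locked in ascending id order before either is updated):

BEGIN;
    -- Lock both rows up front, in ascending id order
    SELECT id, balance
    FROM accounts
    WHERE id IN (1, 2)
    ORDER BY id
    FOR UPDATE;

    UPDATE accounts SET balance = balance - 100.00 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100.00 WHERE id = 2;
COMMIT;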

Savepoints

BEGIN;
    INSERT INTO orders (customer_id, total) VALUES (1, 250.00);
    SAVEPOINT before_items;

    INSERT INTO order_items (order_id, product_id, quantity) VALUES (1, 99, 1);
    -- If this fails:
    ROLLBACK TO SAVEPOINT before_items;
    -- Order is still inserted, items are rolled back

COMMIT;

7. Locking Mechanics

Lock Types

| Lock Type | SQL | Scope | Behaviour |
| --- | --- | --- | --- |
| Row-level shared (FOR SHARE) | `SELECT ... FOR SHARE` | Row | Other transactions can read but not modify the row |
| Row-level exclusive (FOR UPDATE) | `SELECT ... FOR UPDATE` | Row | Other transactions cannot lock (with FOR UPDATE/FOR SHARE) or modify the row; plain MVCC reads are not blocked |
| Table-level | `LOCK TABLE ... IN <mode> MODE` | Table | Applies to the entire table — use sparingly |
| Advisory locks | `pg_advisory_lock(key)` | Application-defined | Application-level coordination, not tied to specific rows |

Optimistic Locking

Optimistic locking assumes conflicts are rare. It checks for conflicts at write time using a version column:

-- Schema
ALTER TABLE products ADD COLUMN version INT NOT NULL DEFAULT 1;

-- Read: fetch current version
SELECT id, name, price, stock, version FROM products WHERE id = 42;
-- Application receives: version = 3

-- Update: include version check
UPDATE products
SET stock = stock - 1, version = version + 1, updated_at = NOW()
WHERE id = 42 AND version = 3;

-- If 0 rows affected → another transaction modified the row → handle conflict (retry or error)

When to use:

  • Low contention (concurrent writes to the same row are rare)
  • Read-heavy workloads
  • User-facing forms where edits happen seconds/minutes apart
  • Distributed systems where holding database locks across requests is impractical

ORM support:

| ORM | Implementation |
| --- | --- |
| TypeORM | `@VersionColumn()` decorator |
| Prisma | Manual implementation via a `version` field and conditional update |
| Doctrine (PHP) | `@ORM\Version` annotation on an integer or datetime column |
| Eloquent (Laravel) | Manual implementation or the `laravel-optimistic-locking` package |
| Entity Framework (.NET) | `[Timestamp]` attribute or `IsRowVersion()` in the Fluent API |
| Hibernate (Java) | `@Version` annotation |
| GORM (Go) | No core support; use a manual version-column check or the `optimisticlock` plugin |

Pessimistic Locking

Pessimistic locking assumes conflicts are likely. It acquires locks at read time to prevent concurrent modification:

-- Lock the row immediately — other transactions block until this transaction commits
BEGIN;
    SELECT * FROM inventory WHERE product_id = 42 FOR UPDATE;
    -- ... perform business logic ...
    UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 42;
COMMIT;

Variants:

-- FOR UPDATE SKIP LOCKED — skip already-locked rows (useful for job queues)
SELECT id, payload
FROM job_queue
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- FOR UPDATE NOWAIT — fail immediately if the row is locked (instead of blocking)
SELECT * FROM inventory WHERE product_id = 42 FOR UPDATE NOWAIT;
-- Throws an error immediately if the row is locked by another transaction

When to use:

  • High contention (concurrent writes to the same row are likely)
  • Critical sections that must not have conflicts (inventory decrements, booking seats)
  • Short-lived transactions where holding a lock briefly is acceptable
  • Job queues and task processing (`SKIP LOCKED`)

Optimistic vs Pessimistic — Decision Guide

| Factor | Optimistic | Pessimistic |
| --- | --- | --- |
| Conflict frequency | Low (< 1% of operations) | High or unpredictable |
| Read/write ratio | Read-heavy | Write-heavy on contested resources |
| Transaction duration | Can be long (no locks held) | Must be short (locks block others) |
| Failure handling | Retry on conflict | Block until lock is released |
| Distributed systems | Preferred (no lock coordination needed) | Difficult (requires sticky sessions or distributed locks) |
| User experience | May show "conflict, please retry" | May show loading/waiting if contention is high |

Deadlock Prevention

  1. Consistent lock ordering: Always acquire locks in the same order (e.g., by ascending primary key).
  2. Short transactions: Minimise the time locks are held.
  3. Lock timeout: Set `lock_timeout` to avoid indefinite blocking.
  4. Detect and retry: Handle deadlock exceptions and retry the transaction.

-- PostgreSQL: set a lock timeout
SET lock_timeout = '5s';

-- MySQL: set innodb_lock_wait_timeout per session
SET innodb_lock_wait_timeout = 5;

8. Query Debugging with EXPLAIN

EXPLAIN Basics

-- Show the query plan (does not execute the query)
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Show the query plan AND execute the query (shows actual timings)
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- Full diagnostic output (PostgreSQL)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'active'
ORDER BY o.created_at DESC
LIMIT 20;

Reading EXPLAIN Output

Key fields to examine:

| Field | What It Tells You |
| --- | --- |
| Seq Scan | Full table scan — often a sign of a missing index |
| Index Scan | Using an index to find rows — this is what you want |
| Index Only Scan | Data served entirely from the index (covering index) — best case |
| Bitmap Index Scan | Index used to build a bitmap, then the table is scanned — common for OR conditions / multiple indexes |
| Nested Loop | For each row in the outer table, scan the inner table — efficient for small outer sets |
| Hash Join | Build a hash table from one side, probe with the other — efficient for large equijoins |
| Merge Join | Both sides sorted, then merged — efficient when both inputs are pre-sorted |
| Sort | Explicit sort operation — check whether an index could eliminate it |
| Rows (estimated) | Planner's estimate of rows processed at this step |
| Actual Rows (ANALYZE only) | Real number of rows processed — compare with the estimate to find planner misestimates |
| Buffers (ANALYZE + BUFFERS) | Shared/local buffers hit (cache) vs read (disk) — high reads mean cache misses; consider prewarming or more memory |
| Cost | Arbitrary units comparing the relative cost of plan steps — lower is better |

Common Performance Problems and Fixes

| EXPLAIN Symptom | Likely Cause | Fix |
| --- | --- | --- |
| `Seq Scan` on a large table | Missing index | Add an appropriate index |
| `Seq Scan` despite an existing index | Query uses a function on the indexed column | Rewrite the query or create a functional index (sketch below) |
| Estimated rows ≠ actual rows (off by 10x+) | Stale statistics | Run `ANALYZE table_name;` |
| `Sort` with high cost | No index matching `ORDER BY` | Add an index covering the sort columns |
| `Nested Loop` with a large inner table | Planner chose the wrong join strategy | Verify indexes on join columns; consider increasing `work_mem` |
| `Hash Join` spilling to disk | `work_mem` too small for the hash table | Increase `work_mem` for the session or globally |
| High `Buffers: read` count | Data not in cache | Increase `shared_buffers` or ensure the table fits in memory; check for bloated tables |
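For the "function on an indexed column" symptom, a functional (expression) index is a common fix. A sketch for the case-insensitive email lookup from the earlier join example (the index name is illustrative):

-- Expression index so predicates on LOWER(email) can use an index
CREATE INDEX idx_users_lower_email ON users (LOWER(email));

-- The query must use the same expression to match the index
SELECT id, email FROM users WHERE LOWER(email) = LOWER('John@Example.com');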

Iterative Query Optimisation Process

Query optimisation process:
1. [ ] Identify the slow query (application logs, pg_stat_statements, slow query log)
2. [ ] Run EXPLAIN ANALYZE on the query
3. [ ] Identify the most expensive node in the plan
4. [ ] Determine root cause (missing index, bad statistics, inefficient join, sort)
5. [ ] Apply ONE fix at a time
6. [ ] Re-run EXPLAIN ANALYZE and compare
7. [ ] Repeat until acceptable performance is reached
8. [ ] Verify the fix doesn't degrade other queries

MySQL-Specific EXPLAIN Notes

-- MySQL uses a different EXPLAIN format
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Key MySQL EXPLAIN columns:
-- type: ALL (full scan) → index → range → ref → eq_ref → const → system (worst to best)
-- possible_keys: indexes the optimiser considered
-- key: index actually used
-- rows: estimated rows scanned
-- Extra: "Using filesort", "Using temporary" are red flags

-- MySQL 8.0+ supports EXPLAIN ANALYZE:
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

9. SQL Writing Best Practices

Formatting Standards

-- GOOD: Readable, consistent formatting
SELECT
    u.id,
    u.email,
    u.first_name,
    u.last_name,
    COUNT(o.id) AS order_count,
    COALESCE(SUM(o.total), 0) AS lifetime_value
FROM users u
LEFT JOIN orders o ON o.customer_id = u.id
    AND o.status = 'completed'
WHERE u.deleted_at IS NULL
  AND u.created_at >= '2025-01-01'
GROUP BY u.id, u.email, u.first_name, u.last_name
HAVING COUNT(o.id) > 0
ORDER BY lifetime_value DESC
LIMIT 50;

Formatting rules:

  • Keywords in UPPERCASE: `SELECT`, `FROM`, `WHERE`, `JOIN`, `ORDER BY`, `GROUP BY`
  • One column per line in the `SELECT` list for queries with more than 3 columns
  • `AND` / `OR` at the beginning of the line, indented
  • Table aliases: short but meaningful (`u` for `users`, `o` for `orders`)
  • Align `JOIN ... ON` conditions consistently
  • Always terminate statements with `;`

Security — Preventing SQL Injection

Never concatenate user input into SQL strings. Always use parameterised queries:

-- GOOD: Parameterised (prepared statement)
-- PostgreSQL / Node.js
SELECT * FROM users WHERE email = $1;

-- MySQL / PHP
SELECT * FROM users WHERE email = ?;

ORM equivalents (all safe by default):

// TypeORM — safe
const user = await userRepository.findOne({ where: { email: userInput } });

// Prisma — safe
const user = await prisma.user.findUnique({ where: { email: userInput } });
// Doctrine — safe
$user = $repository->findOneBy(['email' => $userInput]);

// Eloquent — safe
$user = User::where('email', $userInput)->first();
// Entity Framework — safe
var user = await context.Users.FirstOrDefaultAsync(u => u.Email == userInput);
// Hibernate — safe
User user = session.createQuery("FROM User u WHERE u.email = :email", User.class)
    .setParameter("email", userInput)
    .uniqueResult();
// GORM — safe
var user User
db.Where("email = ?", userInput).First(&user)

DANGER: Raw queries with string concatenation:

// TypeORM — DANGEROUS if not parameterised
const users = await dataSource.query(`SELECT * FROM users WHERE email = '${userInput}'`); // SQL INJECTION!

// TypeORM — SAFE raw query
const users = await dataSource.query('SELECT * FROM users WHERE email = $1', [userInput]);

NULL Handling

-- WRONG: = NULL never matches
SELECT * FROM users WHERE deleted_at = NULL;

-- CORRECT: Use IS NULL / IS NOT NULL
SELECT * FROM users WHERE deleted_at IS NULL;

-- Use COALESCE for default values
SELECT COALESCE(u.nickname, u.first_name, 'Anonymous') AS display_name
FROM users u;

-- NULL-safe comparison (when comparing two nullable columns)
-- PostgreSQL:
SELECT * FROM t1 JOIN t2 ON t1.val IS NOT DISTINCT FROM t2.val;
-- MySQL:
SELECT * FROM t1 JOIN t2 ON t1.val <=> t2.val;

UPSERT Patterns

-- PostgreSQL: INSERT ... ON CONFLICT
INSERT INTO products (sku, name, price, updated_at)
VALUES ('WIDGET-001', 'Widget', 19.99, NOW())
ON CONFLICT (sku)
DO UPDATE SET
    name = EXCLUDED.name,
    price = EXCLUDED.price,
    updated_at = NOW();

-- MySQL: INSERT ... ON DUPLICATE KEY UPDATE
INSERT INTO products (sku, name, price, updated_at)
VALUES ('WIDGET-001', 'Widget', 19.99, NOW())
ON DUPLICATE KEY UPDATE
    name = VALUES(name),
    price = VALUES(price),
    updated_at = NOW();

Avoiding Common Anti-Patterns

| Anti-Pattern | Problem | Fix |
| --- | --- | --- |
| `SELECT *` | Fetches unnecessary columns, breaks when the schema changes | Explicitly list the needed columns |
| N+1 queries | 1 query to fetch parents + N queries for children | Use a `JOIN` or batch loading (`WHERE id IN (...)`) |
| `WHERE column != value` with index | `!=` / `<>` cannot use an index efficiently | Restructure as a positive condition or check whether a full scan is acceptable |
| `LIKE '%value%'` | Leading wildcard prevents index use | Use full-text search (`tsvector` in PostgreSQL, `FULLTEXT` in MySQL); see the sketch below |
| `ORDER BY RAND()` | Scans and sorts the entire table | Use a sampled approach: `WHERE id >= (SELECT FLOOR(RANDOM() * MAX(id)) FROM table) LIMIT 1` |
| Implicit type conversion | `WHERE varchar_col = 123` may skip the index | Match types explicitly: `WHERE varchar_col = '123'` |
| Correlated subquery in SELECT | Executes the subquery for every row | Rewrite as a `JOIN` or window function |
| Missing `LIMIT` on exploratory queries | Accidentally fetching millions of rows | Always add `LIMIT` when exploring data |
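A sketch of the full-text-search fix for the `LIKE '%value%'` anti-pattern (PostgreSQL; the index name is illustrative and a `products.name` column is assumed):

-- GIN index over a tsvector expression
CREATE INDEX idx_products_name_fts ON products USING gin (to_tsvector('english', name));

-- Uses the index because the predicate repeats the same expression
SELECT id, name
FROM products
WHERE to_tsvector('english', name) @@ plainto_tsquery('english', 'widget');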

10. ORM Integration Guidelines

ORM Selection by Stack

| Stack | Primary ORM | Alternative |
| --- | --- | --- |
| PHP (Symfony) | Doctrine ORM | — |
| PHP (Laravel) | Eloquent ORM | — |
| Node.js (NestJS) | TypeORM or MikroORM | Prisma |
| Node.js (General) | Prisma | TypeORM, Drizzle |
| .NET | Entity Framework Core | Dapper (micro-ORM for raw SQL) |
| Java (Spring) | Hibernate / Spring Data JPA | jOOQ (for a SQL-first approach) |
| Go | GORM | sqlx (for raw SQL), Ent |

ORM Best Practices

1. Always review generated SQL. Enable query logging in development to inspect what the ORM generates. Inefficient ORM usage can produce catastrophic queries.

// TypeORM: enable query logging
const dataSource = new DataSource({
    logging: ['query', 'error'],
    // ...
});

// Prisma: enable query logging
const prisma = new PrismaClient({
    log: ['query', 'warn', 'error'],
});
// Doctrine: enable SQL logger
$configuration->setSQLLogger(new \Doctrine\DBAL\Logging\EchoSQLLogger());

// Laravel: enable query log
DB::enableQueryLog();
// ... run queries ...
dd(DB::getQueryLog());

2. Solve N+1 queries with eager loading.

// TypeORM — eager loading with relations
const orders = await orderRepository.find({
    relations: ['customer', 'items', 'items.product'],
    where: { status: 'active' },
});

// Prisma — include related data
const orders = await prisma.order.findMany({
    where: { status: 'active' },
    include: {
        customer: true,
        items: { include: { product: true } },
    },
});
// Eloquent — eager loading
$orders = Order::with(['customer', 'items.product'])
    ->where('status', 'active')
    ->get();

// Doctrine — DQL with joins
$orders = $em->createQuery('
    SELECT o, c, i, p
    FROM App\Entity\Order o
    JOIN o.customer c
    JOIN o.items i
    JOIN i.product p
    WHERE o.status = :status
')->setParameter('status', 'active')
  ->getResult();
// Entity Framework — eager loading
var orders = await context.Orders
    .Include(o => o.Customer)
    .Include(o => o.Items)
        .ThenInclude(i => i.Product)
    .Where(o => o.Status == "active")
    .ToListAsync();
// Hibernate / Spring Data JPA — fetch join
@Query("SELECT o FROM Order o " +
       "JOIN FETCH o.customer " +
       "JOIN FETCH o.items i " +
       "JOIN FETCH i.product " +
       "WHERE o.status = :status")
List<Order> findActiveWithDetails(@Param("status") String status);
// GORM — preload
var orders []Order
db.Preload("Customer").Preload("Items.Product").
    Where("status = ?", "active").
    Find(&orders)

3. Use raw SQL for complex queries. When ORM abstractions produce inefficient SQL or cannot express the query you need, drop to raw SQL:

// TypeORM — raw query
const results = await dataSource.query(`
    SELECT u.id, u.email, COUNT(o.id) AS order_count
    FROM users u
    LEFT JOIN orders o ON o.customer_id = u.id AND o.status = 'completed'
    WHERE u.deleted_at IS NULL
    GROUP BY u.id, u.email
    HAVING COUNT(o.id) > 5
    ORDER BY order_count DESC
    LIMIT $1
`, [50]);

// Prisma — raw query
const results = await prisma.$queryRaw`
    SELECT u.id, u.email, COUNT(o.id) AS order_count
    FROM users u
    LEFT JOIN orders o ON o.customer_id = u.id AND o.status = 'completed'
    WHERE u.deleted_at IS NULL
    GROUP BY u.id, u.email
    HAVING COUNT(o.id) > 5
    ORDER BY order_count DESC
    LIMIT ${50}
`;

4. ORM Migration Workflow.

  • Generate migrations from entity/model changes, never edit the database directly.
  • Review generated migration SQL before applying — ORMs sometimes generate suboptimal DDL.
  • Test migrations against a copy of production data to catch issues (data truncation, long-running locks).

5. Repository Pattern with ORMs. Keep all database queries in repository classes. Services should never directly use the ORM's query builder or entity manager:

Service → Repository Interface → Repository Implementation (uses ORM)

This keeps business logic decoupled from the data access mechanism and makes testing easier (mock the repository interface).


11. Database Maintenance & Monitoring

Statistics and Vacuuming (PostgreSQL)

-- Update table statistics for the planner
ANALYZE orders;

-- Update statistics for all tables
ANALYZE;

-- Check for table bloat (dead tuples)
SELECT relname, n_dead_tup, n_live_tup,
       ROUND(n_dead_tup::NUMERIC / NULLIF(n_live_tup, 0) * 100, 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;

Slow Query Identification

-- PostgreSQL: enable pg_stat_statements extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find the slowest queries by mean execution time
SELECT
    calls,
    ROUND(total_exec_time::NUMERIC, 2) AS total_ms,
    ROUND(mean_exec_time::NUMERIC, 2) AS mean_ms,
    ROUND(max_exec_time::NUMERIC, 2) AS max_ms,
    LEFT(query, 200) AS query_preview
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
-- MySQL: enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- queries taking > 1 second

Connection Pooling

  • Always use connection pooling in production. Direct connections are expensive to create.
  • PostgreSQL: use PgBouncer or built-in pooling in the ORM/driver.
  • MySQL: use ProxySQL or driver-level pooling.
  • Set the pool size based on `pool_size = (core_count * 2) + effective_spindle_count` (formula from the PostgreSQL wiki).
  • Monitor for connection exhaustion — set `max_connections` appropriately and alert when pool utilisation exceeds 80% (see the query below).
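A quick check of current utilisation against the configured limit (PostgreSQL):

-- Current connections by state
SELECT state, COUNT(*) AS connections
FROM pg_stat_activity
GROUP BY state
ORDER BY connections DESC;

-- Configured limit
SHOW max_connections;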

Implementation Procedure

When writing SQL or designing a database schema, follow this workflow:

Implementation progress:
- [ ] Step 1: Understand the data requirements
- [ ] Step 2: Design the schema (entities, relationships, constraints)
- [ ] Step 3: Choose normalisation level and document any denormalisation decisions
- [ ] Step 4: Define indexes based on expected query patterns
- [ ] Step 5: Write the migration(s)
- [ ] Step 6: Write the queries (raw SQL or ORM)
- [ ] Step 7: Run EXPLAIN ANALYZE on critical queries
- [ ] Step 8: Implement appropriate locking and transaction strategy
- [ ] Step 9: Review ORM-generated SQL (if applicable)
- [ ] Step 10: Test with realistic data volumes

Step 1: Understand the data requirements. Read the task description and acceptance criteria. Identify the entities, their attributes, and the relationships between them. Clarify cardinality (one-to-many, many-to-many).

Step 2: Design the schema. Create tables following the naming conventions and standard columns defined in this skill. Define relationships with appropriate foreign keys and ON DELETE actions.

Step 3: Choose the normalisation level. Target 3NF by default. Document any intentional denormalisation with a clear justification tied to measured performance needs or snapshot requirements.

Step 4: Define indexes. Identify the primary query patterns and create indexes to support them. Index all foreign keys. Create composite indexes for multi-column filters and sorts.

Step 5: Write the migration(s). Generate migration files with both `up` and `down` methods. Review the generated DDL before applying.

Step 6: Write the queries. Write queries following the performance and formatting rules in this skill. Use parameterised queries. Avoid `SELECT *`.

Step 7: Run EXPLAIN ANALYZE. Verify critical query plans. Look for sequential scans on large tables, missing indexes, and planner misestimates.

Step 8: Implement the locking and transaction strategy. Choose optimistic or pessimistic locking based on the contention profile. Define transaction boundaries and appropriate isolation levels.

Step 9: Review ORM-generated SQL. Enable query logging and inspect generated queries for N+1 problems, unnecessary joins, or missing eager loading.

Step 10: Test with realistic data volumes. Seed the development database with production-scale data. Verify that queries perform within acceptable time limits under load.

Connected Skills

  • backend-api-development — for API layer patterns, repository pattern, and database handling guidance
  • architecture-design — for designing data models as part of broader system architecture
  • code-review — for validating SQL quality, index coverage, and query performance
  • technical-context-discovery — for establishing database conventions and existing patterns before designing