Awesome-omni-skill sql-and-database
SQL writing and database engineering patterns, standards, and procedures. Use for designing database schemas, writing performant SQL queries, normalisation strategies, indexing, joins optimisation, locking mechanics, transactions, query debugging with EXPLAIN, and ORM integration. Applies to PostgreSQL, MySQL, MariaDB, SQL Server, and Oracle. Covers ORM usage with TypeORM, Prisma, Doctrine, Eloquent, Entity Framework, Hibernate, and GORM.
git clone https://github.com/diegosouzapw/awesome-omni-skill
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/sql-and-database" ~/.claude/skills/diegosouzapw-awesome-omni-skill-sql-and-database && rm -rf "$T"
skills/development/sql-and-database/SKILL.mdSQL & Database Engineering
This skill provides standards, patterns, and procedures for database schema design, writing performant SQL, query debugging, and database engineering best practices. It is database-engine-agnostic with notes on engine-specific behavior where critical.
Guiding Principles
| Principle | Application |
|---|---|
| Data Integrity First | Enforce constraints at the database level (NOT NULL, UNIQUE, FK, CHECK). Never rely solely on application-level validation. |
| Least Privilege | Database users/roles should have only the permissions they need. Application connections should never use the superuser account. |
| Explicit Over Implicit | Always specify column lists in SELECT, INSERT, and JOIN clauses. Avoid in production code. |
| Measure Before Optimising | Use to identify actual bottlenecks before adding indexes or restructuring queries. |
| Schema as Code | All schema changes go through versioned migration files. Never modify production schemas manually. |
1. Database Schema Architecture Design
Naming Conventions
| Element | Convention | Example |
|---|---|---|
| Tables | , plural nouns | , |
| Columns | | , |
| Primary keys | (preferred) or | , |
| Foreign keys | | , |
| Indexes | | |
| Unique constraints | | |
| Check constraints | | |
| Junction/pivot tables | (alphabetical) | , |
Standard Columns
Every table MUST include:
CREATE TABLE orders ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), -- or BIGSERIAL for auto-increment -- ... domain columns ... created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(), updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW() );
When soft deletes are required by business rules:
deleted_at TIMESTAMP WITH TIME ZONE DEFAULT NULL
Primary Key Strategy
| Strategy | When to Use | Trade-offs |
|---|---|---|
/ | Distributed systems, microservices, public-facing IDs | Larger storage, random UUIDs cause index fragmentation (prefer UUID v7 for ordered inserts) |
| When you need sortable unique IDs with good index locality | Not natively supported in all databases |
/ | Single-database monoliths, internal IDs | Sequential, predictable (security concern if exposed), not portable across DB instances |
Rule: Never expose auto-increment IDs in public APIs. Use UUIDs or ULIDs for external identifiers.
Data Types — Choose Precisely
| Need | Use | Avoid |
|---|---|---|
| Monetary values | or | , (precision loss) |
| Timestamps | | without timezone |
| Boolean flags | | , |
| Short text (name, email) | with appropriate limit | Unbounded for structured fields |
| Long text (descriptions) | | |
| Enums | with CHECK constraint or native ENUM | Magic integers |
| JSON/semi-structured | (PostgreSQL) or | Storing relational data as JSON |
| IP addresses | (PostgreSQL) or | (IPv6 won't fit) |
Enum Handling
Prefer CHECK constraints over native ENUM types for portability and ease of migration:
CREATE TABLE orders ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), status VARCHAR(20) NOT NULL DEFAULT 'pending' CONSTRAINT chk_orders_status CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')), created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(), updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW() );
Adding a new enum value is a simple
ALTER TABLE ... DROP CONSTRAINT + ADD CONSTRAINT, no table rewrite required.
Schema Design Checklist
Schema design checklist: - [ ] Every table has a primary key (UUID or BIGSERIAL) - [ ] Every table has created_at and updated_at columns - [ ] All foreign keys have corresponding indexes - [ ] All columns have appropriate NOT NULL constraints - [ ] Unique constraints are defined where business rules require uniqueness - [ ] CHECK constraints enforce valid value ranges and enums - [ ] Naming follows snake_case conventions consistently - [ ] Data types are chosen precisely (no FLOAT for money, no TEXT for emails) - [ ] Soft delete (deleted_at) is used only where business rules require it - [ ] No business logic is embedded in database triggers or stored procedures (keep logic in the application layer)
2. Normalisation Strategies
Normal Forms Reference
| Normal Form | Rule | Violation Example | Fix |
|---|---|---|---|
| 1NF | Every column holds atomic (indivisible) values. No repeating groups. | in a single column | Create a separate table with a junction table |
| 2NF | 1NF + every non-key column depends on the entire primary key (relevant for composite keys) | — depends only on | Move to the table |
| 3NF | 2NF + no transitive dependencies (non-key column depends on another non-key column) | — depends on , not on | Move to a table |
| BCNF | 3NF + every determinant is a candidate key | Rare in practice; address when composite keys create functional dependency issues | Decompose the table so every determinant is a key |
When to Use Each Level
Target 3NF by default. This eliminates redundancy while keeping the schema manageable.
Use 2NF only as an intermediate step when refactoring legacy schemas — never as a design target.
Use BCNF when you have composite primary keys with overlapping candidate keys (uncommon in application databases).
Practical Normalisation Example
Unnormalised (0NF):
orders: | order_id | customer_name | customer_email | items | |----------|---------------|---------------------|--------------------------------| | 1 | John Doe | john@example.com | Widget x2, Gadget x1 |
1NF — Atomic values, no repeating groups:
CREATE TABLE orders ( id BIGSERIAL PRIMARY KEY, customer_name VARCHAR(100) NOT NULL, customer_email VARCHAR(255) NOT NULL, created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW() ); CREATE TABLE order_items ( id BIGSERIAL PRIMARY KEY, order_id BIGINT NOT NULL REFERENCES orders(id), product_name VARCHAR(100) NOT NULL, quantity INT NOT NULL CHECK (quantity > 0) );
2NF — Remove partial dependencies (product details depend on product, not order):
CREATE TABLE products ( id BIGSERIAL PRIMARY KEY, name VARCHAR(100) NOT NULL, price NUMERIC(19,4) NOT NULL ); CREATE TABLE order_items ( id BIGSERIAL PRIMARY KEY, order_id BIGINT NOT NULL REFERENCES orders(id), product_id BIGINT NOT NULL REFERENCES products(id), quantity INT NOT NULL CHECK (quantity > 0), unit_price NUMERIC(19,4) NOT NULL -- snapshot of price at order time );
3NF — Remove transitive dependencies (customer depends on customer, not order):
CREATE TABLE customers ( id BIGSERIAL PRIMARY KEY, name VARCHAR(100) NOT NULL, email VARCHAR(255) NOT NULL UNIQUE ); CREATE TABLE orders ( id BIGSERIAL PRIMARY KEY, customer_id BIGINT NOT NULL REFERENCES customers(id), created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW() );
Strategic Denormalisation
Denormalise only when you have measured performance data (via
EXPLAIN ANALYZE) proving that joins are the bottleneck. Common valid cases:
| Pattern | When to Use | Example |
|---|---|---|
| Materialised/cached columns | Frequently read aggregate that is expensive to compute on every query | computed from and stored on the order row |
| Snapshot columns | Values that must be preserved at a point in time even if the source changes | (price at time of purchase, not current product price) |
| Read-optimised views | Reporting or analytics queries that span many tables | Materialised views refreshed periodically |
| Search/filter columns | Columns derived from related tables used heavily in WHERE clauses | duplicated from for filtering |
Rules for denormalisation:
- Always document why the denormalisation exists (code comment + migration description).
- Ensure the denormalised data is kept in sync (application-level updates, triggers as last resort, or materialised view refresh).
- Treat denormalised data as a cache — the normalised source remains the source of truth.
3. Relationships & Foreign Keys
Relationship Types
One-to-Many (most common):
-- A customer has many orders CREATE TABLE orders ( id BIGSERIAL PRIMARY KEY, customer_id BIGINT NOT NULL REFERENCES customers(id) ON DELETE RESTRICT, created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW() ); CREATE INDEX idx_orders_customer_id ON orders(customer_id);
Many-to-Many (junction table):
-- Products can have many tags, tags can belong to many products CREATE TABLE products_tags ( product_id BIGINT NOT NULL REFERENCES products(id) ON DELETE CASCADE, tag_id BIGINT NOT NULL REFERENCES tags(id) ON DELETE CASCADE, PRIMARY KEY (product_id, tag_id) ); CREATE INDEX idx_products_tags_tag_id ON products_tags(tag_id);
One-to-One:
-- Each user has exactly one profile CREATE TABLE user_profiles ( user_id BIGINT PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE, bio TEXT, avatar_url VARCHAR(500) );
ON DELETE / ON UPDATE Actions
| Action | Meaning | When to Use |
|---|---|---|
(default) | Prevent deletion if referenced rows exist | Default for most business entities (prevent accidental data loss) |
| Delete/update child rows when parent is deleted/updated | Junction tables, dependent child records that have no meaning without parent |
| Set FK to NULL when parent is deleted | Optional relationships where the child can exist independently |
| Set FK to its default value | Rare — use when a fallback reference makes business sense |
| Same as RESTRICT but checked at end of transaction | When deferred constraint checking is needed |
Rule: Default to
RESTRICT. Use CASCADE only on junction tables and tightly coupled child tables. Always explicitly specify the action — never rely on implicit defaults.
Self-Referencing Relationships
-- Categories with parent-child hierarchy CREATE TABLE categories ( id BIGSERIAL PRIMARY KEY, name VARCHAR(100) NOT NULL, parent_id BIGINT REFERENCES categories(id) ON DELETE SET NULL ); CREATE INDEX idx_categories_parent_id ON categories(parent_id);
For deep hierarchies (trees), consider:
- Adjacency list (above) — simple, good for shallow trees
- Materialised path —
storingpath VARCHAR(500)
— good for read-heavy trees/1/5/12/ - Nested sets — complex to maintain but fast for subtree queries
- Closure table — separate table storing all ancestor-descendant pairs — balanced read/write performance
4. Indexes
Index Types
| Type | Syntax (PostgreSQL) | Use Case |
|---|---|---|
| B-tree (default) | | Equality and range queries (, , , , ) |
| Hash | | Equality-only lookups (rare — B-tree covers this equally well) |
| GIN | | Full-text search, JSONB containment, array operations |
| GiST | | Geometric/spatial data, range types, nearest-neighbor queries |
| BRIN | | Very large tables with naturally ordered data (timestamps, sequential IDs) |
Index Strategy Rules
Always index:
- Foreign key columns (not auto-indexed in PostgreSQL/MySQL InnoDB)
- Columns frequently used in
clausesWHERE - Columns used in
on large tablesORDER BY - Columns used in
conditionsJOIN
Consider composite indexes for:
- Queries that filter on multiple columns together
- Covering indexes that satisfy a query entirely from the index
Avoid over-indexing:
- Each index adds overhead to
,INSERT
, andUPDATE
operationsDELETE - Indexes consume storage
- Review and remove unused indexes periodically
Composite Index Column Order
The order of columns in a composite index matters. Place columns in this priority:
- Equality conditions first (
)WHERE status = 'active' - Range conditions second (
)WHERE created_at > '2025-01-01' - Sort columns last (
)ORDER BY created_at DESC
-- Query: WHERE status = 'active' AND created_at > '2025-01-01' ORDER BY created_at DESC -- Optimal index: CREATE INDEX idx_orders_status_created_at ON orders(status, created_at DESC);
Partial Indexes
Index only the rows you actually query:
-- Only index active orders (if most queries filter for active) CREATE INDEX idx_orders_active ON orders(customer_id, created_at) WHERE deleted_at IS NULL;
Unique Indexes
Use unique indexes to enforce business constraints:
-- Email must be unique, but only for non-deleted users CREATE UNIQUE INDEX uq_users_email_active ON users(email) WHERE deleted_at IS NULL;
Index Monitoring
Periodically check for unused indexes:
-- PostgreSQL: find unused indexes SELECT schemaname, relname AS table_name, indexrelname AS index_name, idx_scan AS times_used, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size FROM pg_stat_user_indexes WHERE idx_scan = 0 ORDER BY pg_relation_size(indexrelid) DESC;
Non-Blocking Index Operations (CONCURRENTLY)
Standard
CREATE INDEX and DROP INDEX acquire locks that block writes on the table for the duration of the operation. On large or heavily-used tables this can cause downtime.
Always use
when creating or dropping indexes on tables that serve live traffic:CONCURRENTLY
-- Creating an index without blocking writes CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders(customer_id); -- Dropping an index without blocking writes DROP INDEX CONCURRENTLY idx_orders_customer_id;
Key constraints and caveats:
| Aspect | Detail |
|---|---|
| Database support | is PostgreSQL-specific. Other engines have their own online indexing alternatives: MySQL/MariaDB use ; SQL Server uses ; Oracle uses keyword. Consult engine-specific docs for constraints and limitations |
| Transaction blocks | and cannot run inside a transaction block — avoid wrapping them in / |
| Migration frameworks | Most migration tools wrap statements in a transaction by default. Disable this for concurrent index operations (e.g., in Rails, in Django, separate migration step in Flyway) |
| Failed builds | If fails, it leaves behind an invalid index. Check with and drop the invalid index before retrying |
| Unique indexes | performs an extra table scan — takes longer but still avoids blocking writes |
| Build time | Concurrent builds are slower than regular ones because they require multiple table passes |
When to skip
:CONCURRENTLY
- Initial schema setup or empty tables (no live traffic to block)
- During a maintenance window with no active connections
- Test/development environments
5. Writing Performant SQL Queries
Query Writing Rules
| Rule | Do | Don't |
|---|---|---|
| Specify columns | | |
| Use parameterised queries | / | + userId + (SQL injection risk) |
| Limit results | or paginate | Unbounded on large tables |
| Filter early | clause narrows rows before joins | Joining full tables then filtering |
| Use EXISTS over IN for subqueries | | on large sets |
| Avoid functions on indexed columns | | (kills index) |
| Use UNION ALL over UNION | when duplicates are acceptable | (forces sort + dedup) |
JOIN Optimisation
Join Types and When to Use
| Join Type | Returns | Use When |
|---|---|---|
| Only matching rows from both tables | You need data that exists in both tables |
| All rows from left table + matches from right (NULL if no match) | You need all records from the primary table regardless of match |
| All rows from right table + matches from left | Rarely used — rewrite as LEFT JOIN for readability |
| Cartesian product of both tables | Generating combinations (e.g., all products × all regions) |
| Correlated subquery for each row | Top-N per group, complex per-row calculations |
Join Performance Guidelines
-- GOOD: Join on indexed columns, filter early SELECT o.id, o.created_at, c.name FROM orders o INNER JOIN customers c ON c.id = o.customer_id WHERE o.status = 'active' AND o.created_at >= '2025-01-01' ORDER BY o.created_at DESC LIMIT 50; -- BAD: Joining on expressions, filtering late SELECT * FROM orders o INNER JOIN customers c ON LOWER(c.email) = LOWER(o.customer_email) WHERE YEAR(o.created_at) = 2025;
Rules:
- Always join on indexed columns (typically primary keys and foreign keys).
- Place the most restrictive
conditions on the driving table to reduce the row set early.WHERE - Avoid joining on computed expressions — create a persisted computed column or a functional index if needed.
- When joining many tables, consider whether a subquery or CTE might be clearer and equally performant.
Pagination Strategies
Offset-based pagination (simple but degrades on large offsets):
SELECT id, name, email FROM users WHERE deleted_at IS NULL ORDER BY created_at DESC LIMIT 20 OFFSET 1000; -- Database must scan and discard 1000 rows
Keyset/cursor-based pagination (performant at any depth):
-- First page SELECT id, name, email, created_at FROM users WHERE deleted_at IS NULL ORDER BY created_at DESC, id DESC LIMIT 20; -- Next page (use last row's values as cursor) SELECT id, name, email, created_at FROM users WHERE deleted_at IS NULL AND (created_at, id) < ('2025-02-01T10:00:00Z', '550e8400-e29b-41d4-a716-446655440000') ORDER BY created_at DESC, id DESC LIMIT 20;
Rule: Use offset-based pagination for admin/back-office UIs with moderate data volumes (< 100k rows). Use keyset pagination for APIs, infinite scrolling, and large datasets.
Aggregation Best Practices
-- GOOD: Filter before aggregating SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS total_spent FROM orders WHERE status = 'completed' AND created_at >= '2025-01-01' GROUP BY customer_id HAVING COUNT(*) > 5 ORDER BY total_spent DESC; -- BAD: Aggregating everything then filtering in application code SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS total_spent FROM orders GROUP BY customer_id; -- Then filtering in PHP/Node/Java... wasted database work
Common Table Expressions (CTEs)
Use CTEs for readability. Be aware that in some databases (PostgreSQL < 12), CTEs act as optimisation fences:
-- Readable multi-step query WITH active_customers AS ( SELECT id, name FROM customers WHERE status = 'active' AND deleted_at IS NULL ), recent_orders AS ( SELECT customer_id, COUNT(*) AS order_count FROM orders WHERE created_at >= NOW() - INTERVAL '30 days' GROUP BY customer_id ) SELECT ac.id, ac.name, COALESCE(ro.order_count, 0) AS recent_orders FROM active_customers ac LEFT JOIN recent_orders ro ON ro.customer_id = ac.id ORDER BY recent_orders DESC;
Window Functions
Use window functions for ranking, running totals, and per-group calculations without collapsing rows:
-- Rank orders by total within each customer SELECT customer_id, id AS order_id, total, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY total DESC) AS rank, SUM(total) OVER (PARTITION BY customer_id) AS customer_total FROM orders WHERE status = 'completed';
Batch Operations
For large data modifications, batch to avoid long locks and transaction log bloat:
-- Instead of one massive DELETE: -- DELETE FROM logs WHERE created_at < '2024-01-01'; -- locks millions of rows -- Batch delete: DO $$ DECLARE rows_deleted INT; BEGIN LOOP DELETE FROM logs WHERE id IN ( SELECT id FROM logs WHERE created_at < '2024-01-01' LIMIT 5000 ); GET DIAGNOSTICS rows_deleted = ROW_COUNT; EXIT WHEN rows_deleted = 0; COMMIT; END LOOP; END $$;
6. Transactions
ACID Properties
| Property | Meaning | Practical Impact |
|---|---|---|
| Atomicity | All operations in a transaction succeed or all are rolled back | Partial updates never reach the database |
| Consistency | Transaction moves the database from one valid state to another | Constraints are enforced at commit time |
| Isolation | Concurrent transactions don't interfere with each other | Determined by the isolation level |
| Durability | Committed data survives system failures | Data is persisted to disk on commit |
Transaction Isolation Levels
| Level | Dirty Reads | Non-Repeatable Reads | Phantom Reads | Use Case |
|---|---|---|---|---|
| Yes | Yes | Yes | Almost never — only for dirty analytics on non-critical data |
| No | Yes | Yes | Default for PostgreSQL. Good for most OLTP workloads |
| No | No | Yes* | Financial calculations, inventory checks within a transaction |
| No | No | No | Critical financial operations, booking systems with strict consistency |
*PostgreSQL's
REPEATABLE READ also prevents phantom reads (it uses snapshot isolation internally).
Rule: Use
READ COMMITTED by default. Escalate to REPEATABLE READ or SERIALIZABLE only for specific operations that require stronger guarantees, and handle serialisation failures with retry logic.
Transaction Best Practices
-- GOOD: Short, focused transaction BEGIN; UPDATE accounts SET balance = balance - 100.00 WHERE id = 1; UPDATE accounts SET balance = balance + 100.00 WHERE id = 2; INSERT INTO transactions (from_id, to_id, amount) VALUES (1, 2, 100.00); COMMIT;
Rules:
- Keep transactions as short as possible. Long transactions hold locks and block other operations.
- Never perform external I/O (HTTP calls, file operations) inside a transaction.
- Always handle transaction failures: catch exceptions, rollback, and optionally retry.
- Use
for partial rollbacks within a larger transaction when needed.SAVEPOINT - Acquire locks in a consistent order across all transactions to prevent deadlocks (e.g., always lock by ascending ID).
Savepoints
BEGIN; INSERT INTO orders (customer_id, total) VALUES (1, 250.00); SAVEPOINT before_items; INSERT INTO order_items (order_id, product_id, quantity) VALUES (1, 99, 1); -- If this fails: ROLLBACK TO SAVEPOINT before_items; -- Order is still inserted, items are rolled back COMMIT;
7. Locking Mechanics
Lock Types
| Lock Type | SQL | Scope | Behaviour |
|---|---|---|---|
| Row-level shared (FOR SHARE) | | Row | Other transactions can read but not modify the row |
| Row-level exclusive (FOR UPDATE) | | Row | Other transactions cannot read (with FOR UPDATE) or modify the row |
| Table-level | | Table | Applies to the entire table — use sparingly |
| Advisory locks | | Application-defined | Application-level coordination, not tied to specific rows |
Optimistic Locking
Optimistic locking assumes conflicts are rare. It checks for conflicts at write time using a version column:
-- Schema ALTER TABLE products ADD COLUMN version INT NOT NULL DEFAULT 1; -- Read: fetch current version SELECT id, name, price, stock, version FROM products WHERE id = 42; -- Application receives: version = 3 -- Update: include version check UPDATE products SET stock = stock - 1, version = version + 1, updated_at = NOW() WHERE id = 42 AND version = 3; -- If 0 rows affected → another transaction modified the row → handle conflict (retry or error)
When to use:
- Low contention (concurrent writes to the same row are rare)
- Read-heavy workloads
- User-facing forms where edits happen seconds/minutes apart
- Distributed systems where holding database locks across requests is impractical
ORM support:
| ORM | Implementation |
|---|---|
| TypeORM | decorator |
| Prisma | Manual implementation via field and conditional update |
| Doctrine (PHP) | annotation on an integer or datetime column |
| Eloquent (Laravel) | Manual implementation or package |
| Entity Framework (.NET) | attribute or in Fluent API |
| Hibernate (Java) | annotation |
| GORM (Go) | Built-in optimistic locking with tag |
Pessimistic Locking
Pessimistic locking assumes conflicts are likely. It acquires locks at read time to prevent concurrent modification:
-- Lock the row immediately — other transactions block until this transaction commits BEGIN; SELECT * FROM inventory WHERE product_id = 42 FOR UPDATE; -- ... perform business logic ... UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 42; COMMIT;
Variants:
-- FOR UPDATE SKIP LOCKED — skip already-locked rows (useful for job queues) SELECT id, payload FROM job_queue WHERE status = 'pending' ORDER BY created_at LIMIT 1 FOR UPDATE SKIP LOCKED; -- FOR UPDATE NOWAIT — fail immediately if the row is locked (instead of blocking) SELECT * FROM inventory WHERE product_id = 42 FOR UPDATE NOWAIT; -- Throws an error immediately if the row is locked by another transaction
When to use:
- High contention (concurrent writes to the same row are likely)
- Critical sections that must not have conflicts (inventory decrements, booking seats)
- Short-lived transactions where holding a lock briefly is acceptable
- Job queues and task processing (
)SKIP LOCKED
Optimistic vs Pessimistic — Decision Guide
| Factor | Optimistic | Pessimistic |
|---|---|---|
| Conflict frequency | Low (< 1% of operations) | High or unpredictable |
| Read/write ratio | Read-heavy | Write-heavy on contested resources |
| Transaction duration | Can be long (no locks held) | Must be short (locks block others) |
| Failure handling | Retry on conflict | Block until lock is released |
| Distributed systems | Preferred (no lock coordination needed) | Difficult (requires sticky sessions or distributed locks) |
| User experience | May show "conflict, please retry" | May show loading/waiting if contention is high |
Deadlock Prevention
- Consistent lock ordering: Always acquire locks in the same order (e.g., by ascending primary key).
- Short transactions: Minimise the time locks are held.
- Lock timeout: Set
to avoid indefinite blocking.lock_timeout - Detect and retry: Handle deadlock exceptions and retry the transaction.
-- PostgreSQL: set a lock timeout SET lock_timeout = '5s'; -- MySQL: set innodb_lock_wait_timeout per session SET innodb_lock_wait_timeout = 5;
8. Query Debugging with EXPLAIN
EXPLAIN Basics
-- Show the query plan (does not execute the query) EXPLAIN SELECT * FROM orders WHERE customer_id = 42; -- Show the query plan AND execute the query (shows actual timings) EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42; -- Full diagnostic output (PostgreSQL) EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT o.id, c.name FROM orders o JOIN customers c ON c.id = o.customer_id WHERE o.status = 'active' ORDER BY o.created_at DESC LIMIT 20;
Reading EXPLAIN Output
Key fields to examine:
| Field | What It Tells You |
|---|---|
| Seq Scan | Full table scan — often a sign of missing index |
| Index Scan | Using an index to find rows — this is what you want |
| Index Only Scan | Data served entirely from the index (covering index) — best case |
| Bitmap Index Scan | Index used to build a bitmap, then table scanned — common for OR conditions / multiple indexes |
| Nested Loop | For each row in outer table, scan inner table — efficient for small outer sets |
| Hash Join | Build hash table from one side, probe with the other — efficient for large equijoins |
| Merge Join | Both sides sorted, then merged — efficient when both inputs are pre-sorted |
| Sort | Explicit sort operation — check if an index could eliminate this |
| Rows (estimated) | Planner's estimate of rows processed at this step |
| Actual Rows (ANALYZE only) | Real number of rows processed — compare with estimated to find planner misestimates |
| Buffers (ANALYZE + BUFFERS) | Shared/local buffers hit (cache) vs read (disk) — high reads = cache miss, consider prewarming or more memory |
| Cost | Arbitrary units comparing relative cost of plan steps — lower is better |
Common Performance Problems and Fixes
| EXPLAIN Symptom | Likely Cause | Fix |
|---|---|---|
on large table | Missing index | Add appropriate index |
despite index existing | Query uses a function on the indexed column | Rewrite query or create a functional index |
| Estimated rows ≠ actual rows (off by 10x+) | Stale statistics | Run |
with high cost | No index matching | Add index covering sort columns |
with large inner table | Planner chose wrong join strategy | Verify indexes on join columns; consider increasing |
spilling to disk | too small for the hash table | Increase for the session or globally |
High count | Data not in cache | Increase or ensure table fits in memory; check for bloated tables |
Iterative Query Optimisation Process
Query optimisation process: 1. [ ] Identify the slow query (application logs, pg_stat_statements, slow query log) 2. [ ] Run EXPLAIN ANALYZE on the query 3. [ ] Identify the most expensive node in the plan 4. [ ] Determine root cause (missing index, bad statistics, inefficient join, sort) 5. [ ] Apply ONE fix at a time 6. [ ] Re-run EXPLAIN ANALYZE and compare 7. [ ] Repeat until acceptable performance is reached 8. [ ] Verify the fix doesn't degrade other queries
MySQL-Specific EXPLAIN Notes
-- MySQL uses a different EXPLAIN format EXPLAIN SELECT * FROM orders WHERE customer_id = 42; -- Key MySQL EXPLAIN columns: -- type: ALL (full scan) → index → range → ref → eq_ref → const → system (best to worst) -- possible_keys: indexes the optimiser considered -- key: index actually used -- rows: estimated rows scanned -- Extra: "Using filesort", "Using temporary" are red flags -- MySQL 8.0+ supports EXPLAIN ANALYZE: EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
9. SQL Writing Best Practices
Formatting Standards
-- GOOD: Readable, consistent formatting SELECT u.id, u.email, u.first_name, u.last_name, COUNT(o.id) AS order_count, COALESCE(SUM(o.total), 0) AS lifetime_value FROM users u LEFT JOIN orders o ON o.customer_id = u.id AND o.status = 'completed' WHERE u.deleted_at IS NULL AND u.created_at >= '2025-01-01' GROUP BY u.id, u.email, u.first_name, u.last_name HAVING COUNT(o.id) > 0 ORDER BY lifetime_value DESC LIMIT 50;
Formatting rules:
- Keywords in UPPERCASE:
,SELECT
,FROM
,WHERE
,JOIN
,ORDER BYGROUP BY - One column per line in
list for queries with more than 3 columnsSELECT
/AND
at the beginning of the line, indentedOR- Table aliases: short but meaningful (
foru
,users
foro
)orders - Align
conditions consistentlyJOIN ... ON - Always terminate statements with
;
Security — Preventing SQL Injection
Never concatenate user input into SQL strings. Always use parameterised queries:
-- GOOD: Parameterised (prepared statement) -- PostgreSQL / Node.js SELECT * FROM users WHERE email = $1; -- MySQL / PHP SELECT * FROM users WHERE email = ?;
ORM equivalents (all safe by default):
// TypeORM — safe const user = await userRepository.findOne({ where: { email: userInput } }); // Prisma — safe const user = await prisma.user.findUnique({ where: { email: userInput } });
// Doctrine — safe $user = $repository->findOneBy(['email' => $userInput]); // Eloquent — safe $user = User::where('email', $userInput)->first();
// Entity Framework — safe var user = await context.Users.FirstOrDefaultAsync(u => u.Email == userInput);
// Hibernate — safe User user = session.createQuery("FROM User u WHERE u.email = :email", User.class) .setParameter("email", userInput) .uniqueResult();
// GORM — safe var user User db.Where("email = ?", userInput).First(&user)
DANGER: Raw queries with string concatenation:
// TypeORM — DANGEROUS if not parameterised const users = await dataSource.query(`SELECT * FROM users WHERE email = '${userInput}'`); // SQL INJECTION! // TypeORM — SAFE raw query const users = await dataSource.query('SELECT * FROM users WHERE email = $1', [userInput]);
NULL Handling
-- WRONG: = NULL never matches SELECT * FROM users WHERE deleted_at = NULL; -- CORRECT: Use IS NULL / IS NOT NULL SELECT * FROM users WHERE deleted_at IS NULL; -- Use COALESCE for default values SELECT COALESCE(u.nickname, u.first_name, 'Anonymous') AS display_name FROM users u; -- NULL-safe comparison (when comparing two nullable columns) -- PostgreSQL: SELECT * FROM t1 JOIN t2 ON t1.val IS NOT DISTINCT FROM t2.val; -- MySQL: SELECT * FROM t1 JOIN t2 ON t1.val <=> t2.val;
UPSERT Patterns
-- PostgreSQL: INSERT ... ON CONFLICT INSERT INTO products (sku, name, price, updated_at) VALUES ('WIDGET-001', 'Widget', 19.99, NOW()) ON CONFLICT (sku) DO UPDATE SET name = EXCLUDED.name, price = EXCLUDED.price, updated_at = NOW(); -- MySQL: INSERT ... ON DUPLICATE KEY UPDATE INSERT INTO products (sku, name, price, updated_at) VALUES ('WIDGET-001', 'Widget', 19.99, NOW()) ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price), updated_at = NOW();
Avoiding Common Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Fetches unnecessary columns, breaks when schema changes | Explicitly list needed columns |
| 1 query to fetch parents + N queries for children | Use or batch loading () |
with index | / cannot use index efficiently | Restructure as positive condition or check if full scan is acceptable |
| Leading wildcard prevents index use | Use full-text search ( in PostgreSQL, in MySQL) |
| Scans and sorts entire table | Use a sampled approach: |
| Implicit type conversion | may skip index | Match types explicitly: |
| Correlated subquery in SELECT | Executes subquery for every row | Rewrite as or window function |
Missing on exploratory queries | Accidentally fetching millions of rows | Always add when exploring data |
10. ORM Integration Guidelines
ORM Selection by Stack
| Stack | Primary ORM | Alternative |
|---|---|---|
| PHP (Symfony) | Doctrine ORM | — |
| PHP (Laravel) | Eloquent ORM | — |
| Node.js (NestJS) | TypeORM or MikroORM | Prisma |
| Node.js (General) | Prisma | TypeORM, Drizzle |
| .NET | Entity Framework Core | Dapper (micro-ORM for raw SQL) |
| Java (Spring) | Hibernate / Spring Data JPA | jOOQ (for SQL-first approach) |
| Go | GORM | sqlx (for raw SQL), Ent |
ORM Best Practices
1. Always review generated SQL. Enable query logging in development to inspect what the ORM generates. Inefficient ORM usage can produce catastrophic queries.
// TypeORM: enable query logging const dataSource = new DataSource({ logging: ['query', 'error'], // ... }); // Prisma: enable query logging const prisma = new PrismaClient({ log: ['query', 'warn', 'error'], });
// Doctrine: enable SQL logger $configuration->setSQLLogger(new \Doctrine\DBAL\Logging\EchoSQLLogger()); // Laravel: enable query log DB::enableQueryLog(); // ... run queries ... dd(DB::getQueryLog());
2. Solve N+1 queries with eager loading.
// TypeORM — eager loading with relations const orders = await orderRepository.find({ relations: ['customer', 'items', 'items.product'], where: { status: 'active' }, }); // Prisma — include related data const orders = await prisma.order.findMany({ where: { status: 'active' }, include: { customer: true, items: { include: { product: true } }, }, });
// Eloquent — eager loading $orders = Order::with(['customer', 'items.product']) ->where('status', 'active') ->get(); // Doctrine — DQL with joins $orders = $em->createQuery(' SELECT o, c, i, p FROM App\Entity\Order o JOIN o.customer c JOIN o.items i JOIN i.product p WHERE o.status = :status ')->setParameter('status', 'active') ->getResult();
// Entity Framework — eager loading var orders = await context.Orders .Include(o => o.Customer) .Include(o => o.Items) .ThenInclude(i => i.Product) .Where(o => o.Status == "active") .ToListAsync();
// Hibernate / Spring Data JPA — fetch join @Query("SELECT o FROM Order o " + "JOIN FETCH o.customer " + "JOIN FETCH o.items i " + "JOIN FETCH i.product " + "WHERE o.status = :status") List<Order> findActiveWithDetails(@Param("status") String status);
// GORM — preload var orders []Order db.Preload("Customer").Preload("Items.Product"). Where("status = ?", "active"). Find(&orders)
3. Use raw SQL for complex queries. When ORM abstractions produce inefficient SQL or cannot express the query you need, drop to raw SQL:
// TypeORM — raw query const results = await dataSource.query(` SELECT u.id, u.email, COUNT(o.id) AS order_count FROM users u LEFT JOIN orders o ON o.customer_id = u.id AND o.status = 'completed' WHERE u.deleted_at IS NULL GROUP BY u.id, u.email HAVING COUNT(o.id) > 5 ORDER BY order_count DESC LIMIT $1 `, [50]); // Prisma — raw query const results = await prisma.$queryRaw` SELECT u.id, u.email, COUNT(o.id) AS order_count FROM users u LEFT JOIN orders o ON o.customer_id = u.id AND o.status = 'completed' WHERE u.deleted_at IS NULL GROUP BY u.id, u.email HAVING COUNT(o.id) > 5 ORDER BY order_count DESC LIMIT ${50} `;
4. ORM Migration Workflow.
- Generate migrations from entity/model changes, never edit the database directly.
- Review generated migration SQL before applying — ORMs sometimes generate suboptimal DDL.
- Test migrations against a copy of production data to catch issues (data truncation, long-running locks).
5. Repository Pattern with ORMs. Keep all database queries in repository classes. Services should never directly use the ORM's query builder or entity manager:
Service → Repository Interface → Repository Implementation (uses ORM)
This keeps business logic decoupled from the data access mechanism and makes testing easier (mock the repository interface).
11. Database Maintenance & Monitoring
Statistics and Vacuuming (PostgreSQL)
-- Update table statistics for the planner ANALYZE orders; -- Update statistics for all tables ANALYZE; -- Check for table bloat (dead tuples) SELECT relname, n_dead_tup, n_live_tup, ROUND(n_dead_tup::NUMERIC / NULLIF(n_live_tup, 0) * 100, 2) AS dead_pct FROM pg_stat_user_tables WHERE n_dead_tup > 1000 ORDER BY n_dead_tup DESC;
Slow Query Identification
-- PostgreSQL: enable pg_stat_statements extension CREATE EXTENSION IF NOT EXISTS pg_stat_statements; -- Find slowest queries by total time SELECT calls, ROUND(total_exec_time::NUMERIC, 2) AS total_ms, ROUND(mean_exec_time::NUMERIC, 2) AS mean_ms, ROUND(max_exec_time::NUMERIC, 2) AS max_ms, LEFT(query, 200) AS query_preview FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 20;
-- MySQL: enable slow query log SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 1; -- queries taking > 1 second
Connection Pooling
- Always use connection pooling in production. Direct connections are expensive to create.
- PostgreSQL: use PgBouncer or built-in pooling in the ORM/driver.
- MySQL: use ProxySQL or driver-level pooling.
- Set pool size based on:
(formula from PostgreSQL wiki).pool_size = (core_count * 2) + effective_spindle_count - Monitor for connection exhaustion — set
appropriately and alert when pool utilisation exceeds 80%.max_connections
Implementation Procedure
When writing SQL or designing a database schema, follow this workflow:
Implementation progress: - [ ] Step 1: Understand the data requirements - [ ] Step 2: Design the schema (entities, relationships, constraints) - [ ] Step 3: Choose normalisation level and document any denormalisation decisions - [ ] Step 4: Define indexes based on expected query patterns - [ ] Step 5: Write the migration(s) - [ ] Step 6: Write the queries (raw SQL or ORM) - [ ] Step 7: Run EXPLAIN ANALYZE on critical queries - [ ] Step 8: Implement appropriate locking and transaction strategy - [ ] Step 9: Review ORM-generated SQL (if applicable) - [ ] Step 10: Test with realistic data volumes
Step 1: Understand the data requirements Read the task description and acceptance criteria. Identify the entities, their attributes, and the relationships between them. Clarify cardinality (one-to-many, many-to-many).
Step 2: Design the schema Create tables following the naming conventions and standard columns defined in this skill. Define relationships with appropriate foreign keys and ON DELETE actions.
Step 3: Choose normalisation level Target 3NF by default. Document any intentional denormalisation with a clear justification tied to measured performance needs or snapshot requirements.
Step 4: Define indexes Identify the primary query patterns and create indexes to support them. Index all foreign keys. Create composite indexes for multi-column filters and sorts.
Step 5: Write the migration(s) Generate migration files with both
up and down methods. Review the generated DDL before applying.
Step 6: Write the queries Write queries following the performance and formatting rules in this skill. Use parameterised queries. Avoid
SELECT *.
Step 7: Run EXPLAIN ANALYZE Verify critical query plans. Look for sequential scans on large tables, missing indexes, and planner misestimates.
Step 8: Implement locking and transaction strategy Choose optimistic or pessimistic locking based on the contention profile. Define transaction boundaries and appropriate isolation levels.
Step 9: Review ORM-generated SQL Enable query logging and inspect generated queries for N+1 problems, unnecessary joins, or missing eager loading.
Step 10: Test with realistic data volumes Seed the development database with production-scale data. Verify that queries perform within acceptable time limits under load.
Connected Skills
— for API layer patterns, repository pattern, and database handling guidancebackend-api-development
— for designing data models as part of broader system architecturearchitecture-design
— for validating SQL quality, index coverage, and query performancecode-review
— for establishing database conventions and existing patterns before designingtechnical-context-discovery