Database Query Optimization

Comprehensive guide to optimizing database queries, understanding EXPLAIN plans, indexing strategies, and eliminating N+1 queries.

Install:
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/db-query-optimization" ~/.claude/skills/majiayu000-claude-skill-registry-database-query-optimization-736b30 && rm -rf "$T"
Why Query Optimization Matters
The database is often the bottleneck in modern applications. A single slow query can bring down your entire system.
Key Impacts
- Slow Queries = Slow Application: Database latency directly affects user experience
- Database is Often the Bottleneck: CPU and memory are cheap; database I/O is expensive
- Superlinear Cost: Unoptimized queries slow down faster than the data grows; a missing index turns an O(log n) lookup into an O(n) scan
- Resource Exhaustion: Slow queries consume connections, lock tables, and block other queries
- Cascading Failures: One slow query can cause timeouts across your entire system
The Golden Rule
"Optimize for reads, not writes" (unless you're write-heavy)
Most applications are read-heavy (a common rule of thumb is ~90% reads to 10% writes). Optimize accordingly.
Query Analysis Tools
PostgreSQL
1. EXPLAIN
Shows the query execution plan without running the query.
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
Output:
Seq Scan on users  (cost=0.00..35.50 rows=1 width=100)
  Filter: (email = 'test@example.com'::text)
2. EXPLAIN ANALYZE
Runs the query and shows actual execution times.
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';
Output:
Index Scan using users_email_idx on users  (cost=0.29..8.30 rows=1 width=100) (actual time=0.015..0.016 rows=1 loops=1)
  Index Cond: (email = 'test@example.com'::text)
Planning Time: 0.123 ms
Execution Time: 0.045 ms
3. pg_stat_statements
Tracks execution statistics for all SQL statements.
Enable:
-- In postgresql.conf
shared_preload_libraries = 'pg_stat_statements'

-- Create extension
CREATE EXTENSION pg_stat_statements;
Query slow queries:
SELECT query,
       calls,
       total_exec_time,
       mean_exec_time,
       max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
MySQL
1. EXPLAIN
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
Output:
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | users | ALL  | NULL          | NULL | NULL    | NULL | 1000 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
2. Slow Query Log
-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- Log queries > 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
Analyze with mysqldumpslow:
mysqldumpslow -s t -t 10 /var/log/mysql/slow.log
MongoDB
1. explain()
db.users.find({ email: 'test@example.com' }).explain('executionStats');
Output:
{ "executionStats": { "executionTimeMillis": 150, "totalDocsExamined": 10000, "totalKeysExamined": 0, "executionStages": { "stage": "COLLSCAN" // ❌ Bad: Collection scan } } }
2. Profiler
// Enable profiler (level 2 = all operations)
db.setProfilingLevel(2);

// Query slow operations
db.system.profile.find({ millis: { $gt: 100 } }).sort({ ts: -1 }).limit(10);
Redis
SLOWLOG
# Get slow commands
SLOWLOG GET 10

# Configure threshold (microseconds)
CONFIG SET slowlog-log-slower-than 10000  # 10ms
Understanding EXPLAIN
PostgreSQL EXPLAIN Output
Scan Types (Best to Worst)

1. Index Scan ✅ (Best)

   Index Scan using users_email_idx on users

   - Uses the index to find rows
   - Fast for small result sets

2. Index Only Scan ✅ (Best)

   Index Only Scan using users_email_name_idx on users

   - All data is in the index (no table lookup needed)
   - Fastest possible

3. Bitmap Index Scan ⚠️ (OK)

   Bitmap Index Scan on users_created_at_idx

   - Combines multiple indexes
   - Good for medium result sets

4. Seq Scan ❌ (Bad for large tables)

   Seq Scan on users

   - Scans the entire table
   - OK for small tables (<1000 rows)
   - Bad for large tables
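To see these scan types in practice, EXPLAIN the same query before and after adding an index. A minimal sketch, assuming the users table used throughout this guide (planner choices also depend on table size and statistics, so run ANALYZE after loading data):

-- No index on email: the planner can only Seq Scan
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';

-- Add a B-tree index: the same query becomes an Index Scan
CREATE INDEX users_email_idx ON users(email);
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';

-- A covering index enables an Index Only Scan when every selected
-- column lives in the index (INCLUDE requires PostgreSQL 11+)
CREATE INDEX users_email_name_idx ON users(email) INCLUDE (name);
EXPLAIN SELECT email, name FROM users WHERE email = 'test@example.com';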
Join Types

1. Nested Loop (small datasets)

   Nested Loop

   - Good for small result sets
   - O(n × m) complexity

2. Hash Join (medium datasets)

   Hash Join

   - Builds a hash table in memory
   - Good for medium result sets

3. Merge Join (large sorted datasets)

   Merge Join

   - Requires sorted inputs
   - Good for large result sets
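The join algorithm appears at the root of the EXPLAIN output. For diagnosis, PostgreSQL's planner toggles can be switched off per session to compare alternative plans; a sketch (these settings are for experimentation, not production):

EXPLAIN SELECT * FROM orders o JOIN users u ON o.user_id = u.id;
-- Plan root names the algorithm: Nested Loop, Hash Join, or Merge Join

-- Temporarily discourage hash joins to see what the planner picks next
SET enable_hashjoin = off;
EXPLAIN SELECT * FROM orders o JOIN users u ON o.user_id = u.id;
RESET enable_hashjoin;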
Cost Estimates
Seq Scan on users  (cost=0.00..35.50 rows=1 width=100)
                          ↑     ↑       ↑       ↑
                      startup  total estimated  row
                        cost   cost    rows    width
- Startup Cost: Cost to get first row
- Total Cost: Cost to get all rows
- Rows: Estimated number of rows
- Width: Average row size in bytes
Lower cost = better
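EXPLAIN also accepts options that go beyond the bare cost numbers; two particularly useful ones in PostgreSQL:

-- BUFFERS adds shared-buffer hit/read counts (spots I/O-bound plans);
-- it is most useful combined with ANALYZE, which actually runs the query
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE email = 'test@example.com';

-- FORMAT JSON emits machine-readable plans for tools and visualizers
EXPLAIN (FORMAT JSON) SELECT * FROM users WHERE email = 'test@example.com';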
Index Fundamentals
Index Types
1. B-tree Indexes (Default, Most Common)
Use Cases:
- Equality (=)
- Range (<, >, BETWEEN)
- Sorting (ORDER BY)
- Pattern matching (LIKE 'prefix%')
Create:
CREATE INDEX users_email_idx ON users(email);
2. Hash Indexes (Equality Only)
Use Cases:
- Equality (=) only
- Faster than B-tree for equality
Create:
CREATE INDEX users_email_hash_idx ON users USING HASH (email);
Limitations:
- No range queries (see the quick check below)
- No sorting
- PostgreSQL-specific: MySQL's InnoDB has no user-defined hash indexes (only the MEMORY engine supports them)
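A quick EXPLAIN check confirms the range limitation; a sketch assuming the hash index above and no other index on email:

-- Equality can use the hash index
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';

-- A range predicate cannot: expect a Seq Scan here
EXPLAIN SELECT * FROM users WHERE email > 'm';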
3. GiST/GIN (Full-Text, Arrays, JSON)
Use Cases:
- Full-text search
- Array contains (@>)
- JSON queries (@>, ?)
Create:
-- Full-text search
CREATE INDEX posts_content_gin_idx ON posts USING GIN (to_tsvector('english', content));

-- Array contains
CREATE INDEX tags_gin_idx ON posts USING GIN (tags);

-- JSON
CREATE INDEX metadata_gin_idx ON products USING GIN (metadata);
4. Partial Indexes (Filtered)
Use Cases:
- Index only a subset of rows
- Smaller index size
- Faster queries on filtered data
Create:
-- Only index active users
CREATE INDEX users_active_email_idx ON users(email) WHERE status = 'active';
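One subtlety: the planner only considers a partial index when the query's WHERE clause implies the index predicate. A sketch with the index above:

-- Eligible: the query restricts status to 'active', matching the predicate
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com' AND status = 'active';

-- Not eligible: nothing guarantees status = 'active', so the partial
-- index cannot answer this query
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';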
5. Covering Indexes (Include Columns)
Use Cases:
- Avoid table lookups
- Index-only scans
Create:
-- PostgreSQL
CREATE INDEX users_email_name_idx ON users(email) INCLUDE (name);

-- MySQL (composite index)
CREATE INDEX users_email_name_idx ON users(email, name);
When to Add Indexes
✅ Add Indexes For:

1. Frequent WHERE Conditions

   -- Query
   SELECT * FROM users WHERE email = 'test@example.com';

   -- Index
   CREATE INDEX users_email_idx ON users(email);

2. JOIN Columns

   -- Query
   SELECT * FROM orders o JOIN users u ON o.user_id = u.id;

   -- Index
   CREATE INDEX orders_user_id_idx ON orders(user_id);

3. ORDER BY Columns

   -- Query
   SELECT * FROM posts ORDER BY created_at DESC LIMIT 10;

   -- Index
   CREATE INDEX posts_created_at_idx ON posts(created_at DESC);

4. Foreign Keys

   -- Always index foreign keys!
   CREATE INDEX orders_user_id_idx ON orders(user_id);

   (A catalog query for finding unindexed foreign keys follows below.)
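Finding foreign keys that lack an index can be automated. A heuristic sketch for PostgreSQL (it only checks that some index contains the FK columns somewhere, not that they are the index's leading columns, so treat matches as candidates to review):

SELECT c.conrelid::regclass AS table_name,
       c.conname            AS constraint_name
FROM pg_constraint c
WHERE c.contype = 'f'                        -- foreign-key constraints only
  AND NOT EXISTS (
        SELECT 1
        FROM pg_index i
        WHERE i.indrelid = c.conrelid
          AND i.indkey::int2[] @> c.conkey   -- some index covers the FK columns
      );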
❌ Don't Add Indexes For:

1. Write-Heavy Tables

   - Indexes slow down INSERT, UPDATE, and DELETE
   - Trade-off: read speed vs. write speed

2. Low-Cardinality Columns

   -- BAD: Only 2 unique values
   CREATE INDEX users_is_active_idx ON users(is_active);

   Exception: partial indexes on rare values

   -- GOOD: Only index inactive users (rare)
   CREATE INDEX users_inactive_idx ON users(id) WHERE is_active = false;

3. Small Tables (<1000 rows)

   - A seq scan is faster than an index scan
   - The index overhead isn't worth it

4. Columns with Functions

   -- BAD: Index not used
   SELECT * FROM users WHERE LOWER(email) = 'test@example.com';

   -- GOOD: Functional index
   CREATE INDEX users_email_lower_idx ON users(LOWER(email));
Index Maintenance
1. Find Unused Indexes
PostgreSQL:
SELECT schemaname,
       relname AS tablename,
       indexrelname AS indexname,
       idx_scan,
       idx_tup_read,
       idx_tup_fetch,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE 'pg_toast%'
ORDER BY pg_relation_size(indexrelid) DESC;
Action: Drop unused indexes
DROP INDEX users_unused_idx;
2. Find Duplicate Indexes
PostgreSQL:
SELECT pg_size_pretty(SUM(pg_relation_size(idx))::BIGINT) AS size,
       (array_agg(idx))[1] AS idx1,
       (array_agg(idx))[2] AS idx2,
       (array_agg(idx))[3] AS idx3,
       (array_agg(idx))[4] AS idx4
FROM (
    SELECT indexrelid::regclass AS idx,
           (indrelid::text || E'\n' || indclass::text || E'\n' ||
            indkey::text || E'\n' ||
            COALESCE(indexprs::text, '') || E'\n' ||
            COALESCE(indpred::text, '')) AS key
    FROM pg_index
) sub
GROUP BY key
HAVING COUNT(*) > 1
ORDER BY SUM(pg_relation_size(idx)) DESC;
3. Index Bloat (REINDEX)
Check bloat:
SELECT schemaname,
       relname AS tablename,
       indexrelname AS indexname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan,
       idx_tup_read,
       idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
Rebuild bloated indexes:
REINDEX INDEX CONCURRENTLY users_email_idx;  -- CONCURRENTLY requires PostgreSQL 12+
Query Optimization Techniques
1. SELECT Only Needed Columns (Not SELECT *)
Bad:
SELECT * FROM users WHERE id = 1;
Good:
SELECT id, name, email FROM users WHERE id = 1;
Why:
- Less data transferred
- Smaller result set
- Can use covering indexes
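The covering-index point above can be seen directly by comparing plans for a narrow column list versus SELECT * (a sketch; the index name is illustrative, and whether the planner actually chooses an index-only scan also depends on statistics and the visibility map):

-- Covering index for the common lookup
CREATE INDEX users_email_covering_idx ON users(email) INCLUDE (id, name);

-- Narrow select: eligible for an Index Only Scan
EXPLAIN SELECT id, name, email FROM users WHERE email = 'test@example.com';

-- SELECT *: must visit the heap for the remaining columns
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';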
2. Proper WHERE Filtering
Bad:
-- Fetches all rows, filters in application
SELECT * FROM users;
-- app.filter(user => user.status === 'active')
Good:
-- Filters in database
SELECT * FROM users WHERE status = 'active';
3. Avoid Functions on Indexed Columns
Bad:
-- Index not used
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';
Good:
-- Use a functional index
CREATE INDEX users_email_lower_idx ON users(LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';

-- Or store lowercase email
SELECT * FROM users WHERE email = 'test@example.com';
4. Use EXISTS Instead of COUNT
Bad:
-- Counts all rows
SELECT COUNT(*) FROM orders WHERE user_id = 1;
-- if (count > 0) { ... }
Good:
-- Stops at first match
SELECT EXISTS(SELECT 1 FROM orders WHERE user_id = 1);
5. Batch Queries (Avoid N+1)
Bad:
// N+1 queries
const users = await db.users.findMany();
for (const user of users) {
  user.posts = await db.posts.findMany({ where: { userId: user.id } });
}
Good:
// Single query with join
const users = await db.users.findMany({ include: { posts: true } });
6. Connection Pooling
Bad:
// New connection per query
const client = new Client();
await client.connect();
await client.query('SELECT * FROM users');
await client.end();
Good:
// Connection pool
const pool = new Pool({ max: 20 });
await pool.query('SELECT * FROM users');
N+1 Query Problem
The Problem
Example:
// 1 query to get users
const users = await db.query('SELECT * FROM users LIMIT 10');

// N queries to get posts for each user
for (const user of users) {
  const posts = await db.query('SELECT * FROM posts WHERE user_id = ?', [user.id]);
  user.posts = posts;
}

// Total: 1 + 10 = 11 queries
Impact:
- 10 users = 11 queries
- 100 users = 101 queries
- 1000 users = 1001 queries
Solutions
1. JOIN
SQL:
SELECT u.*,
       p.id AS post_id,
       p.title AS post_title,
       p.content AS post_content
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE u.id IN (1, 2, 3, 4, 5);
Prisma:
const users = await prisma.user.findMany({ include: { posts: true } });
SQLAlchemy:
users = db.query(User).options(joinedload(User.posts)).all()
2. IN Clause
SQL:
-- 1. Get users
SELECT * FROM users LIMIT 10;

-- 2. Get all posts in one query
SELECT * FROM posts WHERE user_id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
JavaScript:
const users = await db.query('SELECT * FROM users LIMIT 10');
const userIds = users.map(u => u.id);
const posts = await db.query('SELECT * FROM posts WHERE user_id IN (?)', [userIds]);

// Group posts by user_id
const postsByUserId = posts.reduce((acc, post) => {
  if (!acc[post.user_id]) acc[post.user_id] = [];
  acc[post.user_id].push(post);
  return acc;
}, {});

// Attach posts to users
users.forEach(user => {
  user.posts = postsByUserId[user.id] || [];
});
3. DataLoader (GraphQL)
Usage:
const DataLoader = require('dataloader');

const postLoader = new DataLoader(async (userIds) => {
  const posts = await db.query('SELECT * FROM posts WHERE user_id IN (?)', [userIds]);

  // Group by user_id, preserving the order of userIds
  return userIds.map(id => posts.filter(post => post.user_id === id));
});

// Usage
const posts = await postLoader.load(userId);
Pagination Optimization
1. LIMIT/OFFSET (Simple but Slow)
Query:
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 1000;
Problem:
- Database still scans first 1000 rows
- Slow for large offsets
- Inconsistent results if data changes
When to Use:
- Small datasets
- Low page numbers (< 100)
2. Keyset Pagination (Cursor-Based, Faster)
Query:
-- First page
SELECT * FROM posts ORDER BY created_at DESC, id DESC LIMIT 20;

-- Next page (using last item's created_at and id)
SELECT * FROM posts
WHERE (created_at, id) < ('2024-01-15 10:00:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 20;
Pros:
- Fast at any depth (the database seeks to the cursor instead of scanning skipped rows)
- Consistent results
- Scales to millions of rows
Cons:
- Can't jump to arbitrary page
- Requires unique, sortable column
Implementation:
async function getPosts(cursor = null, limit = 20) {
  let query = 'SELECT * FROM posts';
  let params = [];

  if (cursor) {
    const [created_at, id] = cursor.split('_');
    query += ' WHERE (created_at, id) < (?, ?)';
    params = [created_at, id];
  }

  query += ' ORDER BY created_at DESC, id DESC LIMIT ?';
  params.push(limit);

  const posts = await db.query(query, params);

  const nextCursor = posts.length === limit
    ? `${posts[posts.length - 1].created_at}_${posts[posts.length - 1].id}`
    : null;

  return { posts, nextCursor };
}
3. Avoid COUNT(*) on Every Page
Bad:
-- Runs on every page load
SELECT COUNT(*) FROM posts;  -- Slow!
SELECT * FROM posts LIMIT 20 OFFSET 0;
Good:
-- Cache the count or estimate it
SELECT reltuples::bigint AS estimate FROM pg_class WHERE relname = 'posts';

-- Or don't show total count
SELECT * FROM posts LIMIT 20 OFFSET 0;
Aggregate Optimization
1. Materialized Views
Problem:
-- Slow: Calculates on every request
SELECT user_id, COUNT(*) AS post_count, AVG(views) AS avg_views
FROM posts
GROUP BY user_id;
Solution:
-- Create materialized view
CREATE MATERIALIZED VIEW user_post_stats AS
SELECT user_id, COUNT(*) AS post_count, AVG(views) AS avg_views
FROM posts
GROUP BY user_id;

-- Unique index on the materialized view
-- (REFRESH ... CONCURRENTLY requires one)
CREATE UNIQUE INDEX user_post_stats_user_id_idx ON user_post_stats(user_id);

-- Query materialized view (fast!)
SELECT * FROM user_post_stats WHERE user_id = 1;

-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY user_post_stats;
2. Summary Tables
Problem:
-- Slow: Aggregates millions of rows
SELECT DATE(created_at) AS date, COUNT(*) AS orders, SUM(total) AS revenue
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY DATE(created_at);
Solution:
-- Create summary table
CREATE TABLE daily_order_stats (
    date DATE PRIMARY KEY,
    orders INTEGER,
    revenue DECIMAL(10, 2)
);

-- Populate with a trigger or cron job
INSERT INTO daily_order_stats (date, orders, revenue)
SELECT DATE(created_at), COUNT(*), SUM(total)
FROM orders
WHERE DATE(created_at) = CURRENT_DATE
GROUP BY DATE(created_at)
ON CONFLICT (date) DO UPDATE
SET orders = EXCLUDED.orders,
    revenue = EXCLUDED.revenue;

-- Query summary table (fast!)
SELECT * FROM daily_order_stats WHERE date >= '2024-01-01';
3. Pre-Computed Aggregates
Problem:
-- Slow: Counts on every request
SELECT COUNT(*) FROM posts WHERE user_id = 1;
Solution:
-- Add post_count column to users table
ALTER TABLE users ADD COLUMN post_count INTEGER DEFAULT 0;

-- Update with a trigger
CREATE OR REPLACE FUNCTION update_user_post_count() RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE users SET post_count = post_count + 1 WHERE id = NEW.user_id;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE users SET post_count = post_count - 1 WHERE id = OLD.user_id;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_user_post_count_trigger
AFTER INSERT OR DELETE ON posts
FOR EACH ROW EXECUTE FUNCTION update_user_post_count();

-- Query (instant!)
SELECT post_count FROM users WHERE id = 1;
PostgreSQL Specific
1. Parallel Queries
Enable:
SET max_parallel_workers_per_gather = 4;
Check if query uses parallelism:
EXPLAIN SELECT COUNT(*) FROM large_table;
-- Look for "Parallel Seq Scan" in the plan
2. Partitioning
Range Partitioning:
-- Create partitioned table
CREATE TABLE orders (
    id SERIAL,
    user_id INTEGER,
    created_at TIMESTAMP,
    total DECIMAL(10, 2)
) PARTITION BY RANGE (created_at);

-- Create partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE orders_2024_02 PARTITION OF orders
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Queries automatically use the correct partition
SELECT * FROM orders WHERE created_at >= '2024-01-15';
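Partition pruning can be verified with EXPLAIN; only the partitions that can contain matching rows should appear in the plan:

EXPLAIN SELECT * FROM orders
WHERE created_at >= '2024-01-15' AND created_at < '2024-02-01';
-- Expect a scan of orders_2024_01 only; orders_2024_02 is pruned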
Benefits:
- Faster queries (scans only relevant partitions)
- Easier maintenance (drop old partitions)
- Better vacuum performance
3. Extensions
pg_trgm (Fuzzy Search):
CREATE EXTENSION pg_trgm;

-- Create GIN index for fuzzy search
CREATE INDEX users_name_trgm_idx ON users USING GIN (name gin_trgm_ops);

-- Fuzzy search
SELECT * FROM users WHERE name % 'jhon';  -- Finds "John"
MongoDB Specific
1. Compound Indexes
Query:
db.users.find({ status: 'active', created_at: { $gte: ISODate('2024-01-01') } });
Index:
db.users.createIndex({ status: 1, created_at: -1 });
Index Prefix Rule:
- An index on {a: 1, b: 1, c: 1} supports queries on {a: 1}, {a: 1, b: 1}, and {a: 1, b: 1, c: 1}
- But NOT queries on {b: 1} or {c: 1} alone
2. Covered Queries
Query:
db.users.find(
  { email: 'test@example.com' },
  { _id: 0, name: 1, email: 1 }  // Projection
);
Index:
db.users.createIndex({ email: 1, name: 1 });
Check if covered:
db.users.find(
  { email: 'test@example.com' },
  { _id: 0, name: 1, email: 1 }
).explain('executionStats');

// Look for "totalDocsExamined": 0 (covered query!)
3. Aggregation Pipeline Optimization
Bad:
// Without a supporting index, $match scans the whole collection
// and $sort sorts in memory
db.orders.aggregate([
  { $match: { status: 'completed' } },
  { $sort: { created_at: -1 } },
  { $limit: 10 }
]);
Good:
// Add index
db.orders.createIndex({ status: 1, created_at: -1 });

// Same query, now uses the index for $match and $sort
db.orders.aggregate([
  { $match: { status: 'completed' } },
  { $sort: { created_at: -1 } },
  { $limit: 10 }
]);
Tips:
- Put $match first (filter early)
- Put $limit early (reduce data)
- Use indexes for $match and $sort
Query Caching
1. Application-Level Cache (Redis)
Pattern:
async function getUser(id) {
  // Check cache
  const cached = await redis.get(`user:${id}`);
  if (cached) return JSON.parse(cached);

  // Query database
  const user = await db.query('SELECT * FROM users WHERE id = ?', [id]);

  // Cache result (TTL: 1 hour)
  await redis.setex(`user:${id}`, 3600, JSON.stringify(user));

  return user;
}
2. Database Query Cache (MySQL; deprecated in 5.7, removed in 8.0)
MySQL 5.7:
SET GLOBAL query_cache_type = 1;
SET GLOBAL query_cache_size = 268435456;  -- 256MB
Note: Removed in MySQL 8.0 (use application-level caching instead)
3. Materialized Views (PostgreSQL)
See "Aggregate Optimization" section above
Monitoring Query Performance
1. Slow Query Log
PostgreSQL:
-- In postgresql.conf
log_min_duration_statement = 1000  -- Log queries > 1s

-- Or set at runtime
ALTER DATABASE mydb SET log_min_duration_statement = 1000;
MySQL:
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
2. pg_stat_statements
Enable:
CREATE EXTENSION pg_stat_statements;
Find slow queries:
SELECT query,
       calls,
       total_exec_time,
       mean_exec_time,
       max_exec_time,
       stddev_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
3. APM Tools
Datadog:
- Automatic query tracking
- Slow query detection
- EXPLAIN plan analysis
New Relic:
- Database monitoring
- Slow transaction traces
- Query analysis
Real Optimization Examples
Example 1: Slow JOIN Query
Problem:
EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
GROUP BY u.id, u.name;

-- Execution Time: 5000 ms
EXPLAIN Output:
Hash Join  (cost=1000.00..5000.00 rows=10000 width=100) (actual time=100.000..5000.000 rows=10000 loops=1)
  Hash Cond: (p.user_id = u.id)
  ->  Seq Scan on posts p  (cost=0.00..3000.00 rows=100000 width=4)
  ->  Hash  (cost=500.00..500.00 rows=10000 width=100)
        ->  Seq Scan on users u  (cost=0.00..500.00 rows=10000 width=100)
Fix:
-- Add index on foreign key
CREATE INDEX posts_user_id_idx ON posts(user_id);

-- Re-run query
EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
GROUP BY u.id, u.name;

-- Execution Time: 50 ms (100x faster!)
Example 2: Missing Index Detection
Problem:
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending'; -- Slow: 500ms
EXPLAIN:
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';
Output:
Seq Scan on orders  (cost=0.00..10000.00 rows=100 width=100) (actual time=0.100..500.000 rows=10 loops=1)
  Filter: ((user_id = 123) AND (status = 'pending'::text))
  Rows Removed by Filter: 99990
Fix:
-- Add compound index
CREATE INDEX orders_user_id_status_idx ON orders(user_id, status);

-- Re-run
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

-- Execution Time: 2ms (250x faster!)
Output:
Index Scan using orders_user_id_status_idx on orders  (cost=0.29..8.30 rows=10 width=100) (actual time=0.015..2.000 rows=10 loops=1)
  Index Cond: ((user_id = 123) AND (status = 'pending'::text))
Example 3: N+1 Query Elimination
Problem:
// 1 + N queries
const users = await db.query('SELECT * FROM users LIMIT 10');
for (const user of users) {
  user.posts = await db.query('SELECT * FROM posts WHERE user_id = ?', [user.id]);
}
// Time: 1000ms
Fix:
// 1 query
const users = await db.query(`
  SELECT u.*,
         json_agg(json_build_object('id', p.id, 'title', p.title)) AS posts
  FROM users u
  LEFT JOIN posts p ON p.user_id = u.id
  GROUP BY u.id
  LIMIT 10
`);
// Time: 50ms (20x faster!)
Example 4: Pagination Improvement
Problem:
-- Page 1000 (offset 20000)
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 20000;

-- Execution Time: 2000ms
Fix:
-- Keyset pagination
SELECT * FROM posts
WHERE (created_at, id) < ('2024-01-15 10:00:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Execution Time: 5ms (400x faster!)
Tools
PostgreSQL
- pgAdmin: GUI for database management
- pgBadger: Log analyzer
- pg_stat_statements: Query statistics
- EXPLAIN Visualizer: https://explain.dalibo.com/
MySQL
- MySQL Workbench: GUI for database management
- mysqldumpslow: Slow query log analyzer
- Percona Toolkit: Advanced MySQL tools
General
- DataGrip: Multi-database IDE
- DBeaver: Open-source database tool
- Datadog: APM with database monitoring
- New Relic: APM with database monitoring
Optimization Checklist
Before Optimizing
- Identify slow queries (slow query log, APM)
- Run EXPLAIN ANALYZE
- Measure baseline performance
Optimization Steps
- Add indexes for WHERE, JOIN, ORDER BY columns
- Eliminate N+1 queries
- Use SELECT only needed columns
- Optimize pagination (keyset vs offset)
- Add connection pooling
- Cache frequent queries
- Use materialized views for aggregates
After Optimizing
- Run EXPLAIN ANALYZE again
- Measure new performance
- Document improvement
- Monitor for regressions
Summary
Quick Reference
Query Analysis:
- PostgreSQL: EXPLAIN ANALYZE, pg_stat_statements
- MySQL: EXPLAIN, slow query log
- MongoDB: explain(), profiler
Index Types:
- B-tree: Default, most common
- Hash: Equality only
- GIN/GiST: Full-text, arrays, JSON
- Partial: Filtered indexes
- Covering: Include columns
Common Optimizations:
- Add indexes for WHERE, JOIN, ORDER BY
- Eliminate N+1 queries (use JOINs or IN)
- Use keyset pagination (not LIMIT/OFFSET)
- Avoid SELECT *
- Use connection pooling
- Cache frequent queries
- Use materialized views for aggregates
Red Flags:
- Seq Scan on large tables
- N+1 queries
- LIMIT with large OFFSET
- Functions on indexed columns
- Missing indexes on foreign keys
Tools:
- pgAdmin, DataGrip, DBeaver
- pg_stat_statements, slow query log
- Datadog, New Relic