Claude-skill-registry Database Query Optimization

Comprehensive guide to optimizing database queries, understanding EXPLAIN plans, indexing strategies, and eliminating N+1 queries

Install

Source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/db-query-optimization" ~/.claude/skills/majiayu000-claude-skill-registry-database-query-optimization-736b30 && rm -rf "$T"

Manifest: skills/data/db-query-optimization/SKILL.md

Source content

Database Query Optimization

Why Query Optimization Matters

The database is often the bottleneck in modern applications. A single slow query can bring down your entire system.

Key Impacts

  • Slow Queries = Slow Application: Database latency directly affects user experience
  • Database is Often the Bottleneck: CPU and memory are cheap; database I/O is expensive
  • Growing Cost: Unoptimized queries degrade rapidly as data grows; a full-table scan is linear in table size, while an indexed lookup stays near-constant
  • Resource Exhaustion: Slow queries consume connections, lock tables, and block other queries
  • Cascading Failures: One slow query can cause timeouts across your entire system

The Golden Rule

"Optimize for reads, not writes" (unless you're write-heavy)

Most applications are read-heavy (often on the order of 90% reads to 10% writes). Optimize accordingly.


Query Analysis Tools

PostgreSQL

1. EXPLAIN

Shows the query execution plan without running the query.

EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';

Output:

Seq Scan on users  (cost=0.00..35.50 rows=1 width=100)
  Filter: (email = 'test@example.com'::text)

2. EXPLAIN ANALYZE

Runs the query and shows actual execution times.

EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';

Output:

Index Scan using users_email_idx on users  (cost=0.29..8.30 rows=1 width=100) (actual time=0.015..0.016 rows=1 loops=1)
  Index Cond: (email = 'test@example.com'::text)
Planning Time: 0.123 ms
Execution Time: 0.045 ms

3. pg_stat_statements

Tracks execution statistics for all SQL statements.

Enable:

# In postgresql.conf (requires a server restart)
shared_preload_libraries = 'pg_stat_statements'

-- Then, in the database:
CREATE EXTENSION pg_stat_statements;

Query slow queries:

SELECT 
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
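
When re-measuring after a change, it often helps to clear the accumulated statistics first. pg_stat_statements ships a reset function for exactly this (note it discards all collected history):

-- Reset collected statistics before benchmarking a change
SELECT pg_stat_statements_reset();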

MySQL

1. EXPLAIN

EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';

Output:

+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | users | ALL  | NULL          | NULL | NULL    | NULL | 1000 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+

2. Slow Query Log

-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- Log queries > 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';

Analyze with mysqldumpslow:

mysqldumpslow -s t -t 10 /var/log/mysql/slow.log

MongoDB

1. explain()

db.users.find({ email: 'test@example.com' }).explain('executionStats');

Output:

{
  "executionStats": {
    "executionTimeMillis": 150,
    "totalDocsExamined": 10000,
    "totalKeysExamined": 0,
    "executionStages": {
      "stage": "COLLSCAN"  // ❌ Bad: Collection scan
    }
  }
}

2. Profiler

// Enable profiler (level 2 = all operations)
db.setProfilingLevel(2);

// Query slow operations
db.system.profile.find({ millis: { $gt: 100 } }).sort({ ts: -1 }).limit(10);

Redis

SLOWLOG

# Get slow commands
SLOWLOG GET 10

# Configure threshold (microseconds)
CONFIG SET slowlog-log-slower-than 10000  # 10ms

Understanding EXPLAIN

PostgreSQL EXPLAIN Output

Scan Types (Best to Worst)

  1. Index Only Scan ✅ (Best)

    Index Only Scan using users_email_name_idx on users
    
    • All needed columns are in the index (no table lookup)
    • Fastest possible
  2. Index Scan ✅ (Great)

    Index Scan using users_email_idx on users
    
    • Uses the index to locate rows, then fetches them from the table
    • Fast for small result sets
  3. Bitmap Index Scan ⚠️ (OK)

    Bitmap Index Scan on users_created_at_idx
    
    • Builds a bitmap of matching rows (and can combine multiple indexes), then fetches them with a Bitmap Heap Scan
    • Good for medium result sets
  4. Seq Scan ❌ (Bad for large tables)

    Seq Scan on users
    
    • Scans the entire table
    • Fine for small tables (<1000 rows)
    • Bad for large tables
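
A quick way to test whether an index can be used at all (as opposed to the planner merely preferring a Seq Scan) is to discourage sequential scans for the current session. A diagnostic sketch only; enable_seqscan is a standard planner setting, but it should never be turned off in production config:

-- Discourage Seq Scans for this session only
SET enable_seqscan = off;
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
-- If the plan still shows a Seq Scan, no usable index exists for this predicate
SET enable_seqscan = on;  -- restore the default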

Join Types

  1. Nested Loop (Small datasets)

    Nested Loop
    
    • Good for small result sets, especially with an index on the inner table
    • O(n × m) without an index on the inner relation
  2. Hash Join (Medium datasets)

    Hash Join
    
    • Builds hash table in memory
    • Good for medium result sets
  3. Merge Join (Large sorted datasets)

    Merge Join
    
    • Requires sorted inputs
    • Good for large result sets
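
The planner chooses among these based on row-count estimates, and you normally should not override it. For diagnosis, though, PostgreSQL exposes session-level switches (enable_nestloop, enable_hashjoin, enable_mergejoin) that let you compare alternative plans. A minimal sketch:

-- See which join strategy the planner picks
EXPLAIN SELECT * FROM orders o JOIN users u ON o.user_id = u.id;

-- Temporarily rule out hash joins to compare the next-best plan's cost
SET enable_hashjoin = off;
EXPLAIN SELECT * FROM orders o JOIN users u ON o.user_id = u.id;
RESET enable_hashjoin;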

Cost Estimates

Seq Scan on users  (cost=0.00..35.50 rows=1 width=100)
-- cost = startup cost .. total cost; rows = estimated rows; width = average row bytes
  • Startup Cost: Cost to get first row
  • Total Cost: Cost to get all rows
  • Rows: Estimated number of rows
  • Width: Average row size in bytes

Lower cost = better
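
Note that costs are in arbitrary planner units, not milliseconds: by default one unit is the cost of one sequential page read (seq_page_cost = 1.0), and a random page read costs four units (random_page_cost = 4.0). You can inspect the constants your server uses:

SHOW seq_page_cost;     -- default 1.0
SHOW random_page_cost;  -- default 4.0; commonly lowered (e.g. to 1.1) on SSD storage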


Index Fundamentals

Index Types

1. B-tree Indexes (Default, Most Common)

Use Cases:

  • Equality (=)
  • Range (<, >, BETWEEN)
  • Sorting (ORDER BY)
  • Prefix pattern matching (LIKE 'prefix%')

Create:

CREATE INDEX users_email_idx ON users(email);

2. Hash Indexes (Equality Only)

Use Cases:

  • Equality (=) only
  • Can be slightly faster than a B-tree for pure equality lookups

Create:

CREATE INDEX users_email_hash_idx ON users USING HASH (email);

Limitations:

  • No range queries
  • No sorting
  • Explicit hash indexes are PostgreSQL-specific (MySQL's InnoDB does not support them; the MEMORY engine does)

3. GiST/GIN (Full-Text, Arrays, JSON)

Use Cases:

  • Full-text search
  • Array containment (@>)
  • JSON queries (@>, ?)

Create:

-- Full-text search
CREATE INDEX posts_content_gin_idx ON posts USING GIN (to_tsvector('english', content));

-- Array contains
CREATE INDEX tags_gin_idx ON posts USING GIN (tags);

-- JSON
CREATE INDEX metadata_gin_idx ON products USING GIN (metadata);

4. Partial Indexes (Filtered)

Use Cases:

  • Index only a subset of rows
  • Smaller index size
  • Faster queries on filtered data

Create:

-- Only index active users
CREATE INDEX users_active_email_idx ON users(email) WHERE status = 'active';

5. Covering Indexes (Include Columns)

Use Cases:

  • Avoid table lookups
  • Index-only scans

Create:

-- PostgreSQL
CREATE INDEX users_email_name_idx ON users(email) INCLUDE (name);

-- MySQL (composite index)
CREATE INDEX users_email_name_idx ON users(email, name);
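
One PostgreSQL caveat: an Index Only Scan still has to verify row visibility, and it can only skip the table for pages marked all-visible in the visibility map, which VACUUM maintains. If EXPLAIN ANALYZE shows an Index Only Scan with a high "Heap Fetches" count, a vacuum usually helps. A quick check, assuming the covering index above:

-- Confirm the plan and watch the "Heap Fetches" counter
EXPLAIN ANALYZE SELECT email, name FROM users WHERE email = 'test@example.com';

-- Refresh the visibility map so index-only scans can skip heap fetches
VACUUM ANALYZE users;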

When to Add Indexes

✅ Add Indexes For:

  1. Frequent WHERE Conditions

    -- Query
    SELECT * FROM users WHERE email = 'test@example.com';
    
    -- Index
    CREATE INDEX users_email_idx ON users(email);
    
  2. JOIN Columns

    -- Query
    SELECT * FROM orders o JOIN users u ON o.user_id = u.id;
    
    -- Index
    CREATE INDEX orders_user_id_idx ON orders(user_id);
    
  3. ORDER BY Columns

    -- Query
    SELECT * FROM posts ORDER BY created_at DESC LIMIT 10;
    
    -- Index
    CREATE INDEX posts_created_at_idx ON posts(created_at DESC);
    
  4. Foreign Keys

    -- Always index foreign keys!
    CREATE INDEX orders_user_id_idx ON orders(user_id);
    

❌ Don't Add Indexes For:

  1. Write-Heavy Tables

    • Indexes slow down INSERT, UPDATE, DELETE
    • Trade-off: Read speed vs write speed
  2. Low-Cardinality Columns

    -- BAD: Only 2 unique values
    CREATE INDEX users_is_active_idx ON users(is_active);
    
    • Exception: Partial indexes on rare values
    -- GOOD: Only index inactive users (rare)
    CREATE INDEX users_inactive_idx ON users(id) WHERE is_active = false;
    
  3. Small Tables (<1000 rows)

    • Seq scan is faster than index scan
    • Index overhead not worth it
  4. Columns with Functions

    -- BAD: Index not used
    SELECT * FROM users WHERE LOWER(email) = 'test@example.com';
    
    -- GOOD: Functional index
    CREATE INDEX users_email_lower_idx ON users(LOWER(email));
    

Index Maintenance

1. Find Unused Indexes

PostgreSQL:

SELECT 
  schemaname,
  relname AS tablename,
  indexrelname AS indexname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE 'pg_toast%'
ORDER BY pg_relation_size(indexrelid) DESC;

Action: Drop unused indexes (but keep indexes that back PRIMARY KEY, UNIQUE, or other constraints even if idx_scan = 0; they exist for correctness, not for reads)

DROP INDEX users_unused_idx;
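
On a busy table, a plain DROP INDEX takes a lock that blocks reads and writes. PostgreSQL also supports dropping concurrently, which waits for in-flight queries instead of blocking them (it cannot run inside a transaction block):

DROP INDEX CONCURRENTLY users_unused_idx;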

2. Find Duplicate Indexes

PostgreSQL:

SELECT 
  pg_size_pretty(SUM(pg_relation_size(idx))::BIGINT) AS size,
  (array_agg(idx))[1] AS idx1,
  (array_agg(idx))[2] AS idx2,
  (array_agg(idx))[3] AS idx3,
  (array_agg(idx))[4] AS idx4
FROM (
  SELECT 
    indexrelid::regclass AS idx,
    (indrelid::text ||E'\n'|| indclass::text ||E'\n'|| indkey::text ||E'\n'||
     COALESCE(indexprs::text,'')||E'\n' || COALESCE(indpred::text,'')) AS key
  FROM pg_index
) sub
GROUP BY key
HAVING COUNT(*) > 1
ORDER BY SUM(pg_relation_size(idx)) DESC;

3. Index Bloat (REINDEX)

Check index sizes (for a true bloat estimate, use the pgstattuple extension or a dedicated bloat-estimation query):

SELECT 
  schemaname,
  relname AS tablename,
  indexrelname AS indexname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

Rebuild bloated indexes (REINDEX ... CONCURRENTLY requires PostgreSQL 12+; on older versions, create a replacement with CREATE INDEX CONCURRENTLY and drop the original):

REINDEX INDEX CONCURRENTLY users_email_idx;

Query Optimization Techniques

1. SELECT Only Needed Columns (Not SELECT *)

Bad:

SELECT * FROM users WHERE id = 1;

Good:

SELECT id, name, email FROM users WHERE id = 1;

Why:

  • Less data transferred
  • Smaller result set
  • Can use covering indexes

2. Proper WHERE Filtering

Bad:

-- Fetches all rows, filters in application
SELECT * FROM users;
-- app.filter(user => user.status === 'active')

Good:

-- Filters in database
SELECT * FROM users WHERE status = 'active';

3. Avoid Functions on Indexed Columns

Bad:

-- Index not used
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';

Good:

-- Use functional index
CREATE INDEX users_email_lower_idx ON users(LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';

-- Or store lowercase email
SELECT * FROM users WHERE email = 'test@example.com';

4. Use EXISTS Instead of COUNT

Bad:

-- Counts all rows
SELECT COUNT(*) FROM orders WHERE user_id = 1;
-- if (count > 0) { ... }

Good:

-- Stops at first match
SELECT EXISTS(SELECT 1 FROM orders WHERE user_id = 1);

5. Batch Queries (Avoid N+1)

Bad:

// N+1 queries
const users = await db.users.findMany();
for (const user of users) {
  user.posts = await db.posts.findMany({ where: { userId: user.id } });
}

Good:

// Single query with join
const users = await db.users.findMany({
  include: { posts: true }
});

6. Connection Pooling

Bad:

// New connection per query (TCP + auth handshake every time)
const { Client } = require('pg');

const client = new Client();
await client.connect();
await client.query('SELECT * FROM users');
await client.end();

Good:

// Connection pool: created once at startup, shared across requests
const { Pool } = require('pg');

const pool = new Pool({ max: 20 });
await pool.query('SELECT * FROM users');

N+1 Query Problem

The Problem

Example:

// 1 query to get users
const users = await db.query('SELECT * FROM users LIMIT 10');

// N queries to get posts for each user
for (const user of users) {
  const posts = await db.query('SELECT * FROM posts WHERE user_id = ?', [user.id]);
  user.posts = posts;
}
// Total: 1 + 10 = 11 queries

Impact:

  • 10 users = 11 queries
  • 100 users = 101 queries
  • 1000 users = 1001 queries

Solutions

1. JOIN

SQL:

SELECT 
  u.*,
  p.id AS post_id,
  p.title AS post_title,
  p.content AS post_content
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE u.id IN (1, 2, 3, 4, 5);

Prisma:

const users = await prisma.user.findMany({
  include: { posts: true }
});

SQLAlchemy:

from sqlalchemy.orm import joinedload

users = db.query(User).options(joinedload(User.posts)).all()

2. IN Clause

SQL:

-- 1. Get users
SELECT * FROM users LIMIT 10;

-- 2. Get all posts in one query
SELECT * FROM posts WHERE user_id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

JavaScript:

const users = await db.query('SELECT * FROM users LIMIT 10');
const userIds = users.map(u => u.id);
const posts = await db.query('SELECT * FROM posts WHERE user_id IN (?)', [userIds]);

// Group posts by user_id
const postsByUserId = posts.reduce((acc, post) => {
  if (!acc[post.user_id]) acc[post.user_id] = [];
  acc[post.user_id].push(post);
  return acc;
}, {});

// Attach posts to users
users.forEach(user => {
  user.posts = postsByUserId[user.id] || [];
});

3. DataLoader (GraphQL)

Usage:

const DataLoader = require('dataloader');

const postLoader = new DataLoader(async (userIds) => {
  const posts = await db.query('SELECT * FROM posts WHERE user_id IN (?)', [userIds]);
  
  // Group by user_id
  const postsByUserId = userIds.map(id => 
    posts.filter(post => post.user_id === id)
  );
  
  return postsByUserId;
});

// Usage
const posts = await postLoader.load(userId);

Pagination Optimization

1. LIMIT/OFFSET (Simple but Slow)

Query:

SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 1000;

Problem:

  • Database still scans first 1000 rows
  • Slow for large offsets
  • Inconsistent results if data changes

When to Use:

  • Small datasets
  • Low page numbers (< 100)
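
The cost is visible directly in the plan: the node beneath the Limit still produces offset + limit rows, all but the final page of which are discarded. A hedged illustration (exact numbers depend on your data):

EXPLAIN ANALYZE SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 1000;
-- The scan node reports actual rows=1020: 1000 rows were fetched only to be thrown away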

2. Keyset Pagination (Cursor-Based, Faster)

Query:

-- First page
SELECT * FROM posts ORDER BY created_at DESC, id DESC LIMIT 20;

-- Next page (using last item's created_at and id)
SELECT * FROM posts 
WHERE (created_at, id) < ('2024-01-15 10:00:00', 12345)
ORDER BY created_at DESC, id DESC 
LIMIT 20;

Pros:

  • Constant time (doesn't scan skipped rows)
  • Consistent results
  • Scales to millions of rows

Cons:

  • Can't jump to arbitrary page
  • Requires unique, sortable column

Implementation:

async function getPosts(cursor = null, limit = 20) {
  let query = 'SELECT * FROM posts';
  let params = [];
  
  if (cursor) {
    const [created_at, id] = cursor.split('_');
    query += ' WHERE (created_at, id) < (?, ?)';
    params = [created_at, id];
  }
  
  query += ' ORDER BY created_at DESC, id DESC LIMIT ?';
  params.push(limit);
  
  const posts = await db.query(query, params);
  
  const nextCursor = posts.length === limit
    ? `${posts[posts.length - 1].created_at}_${posts[posts.length - 1].id}`
    : null;
  
  return { posts, nextCursor };
}

3. Avoid COUNT(*) on Every Page

Bad:

-- Runs on every page load
SELECT COUNT(*) FROM posts;  -- Slow!
SELECT * FROM posts LIMIT 20 OFFSET 0;

Good:

-- Cache the count or estimate it
SELECT reltuples::bigint AS estimate FROM pg_class WHERE relname = 'posts';

-- Or don't show total count
SELECT * FROM posts LIMIT 20 OFFSET 0;
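
The reltuples estimate is only as fresh as the last VACUUM or ANALYZE (autovacuum normally keeps it close). If it looks stale, refresh the statistics explicitly:

ANALYZE posts;  -- updates reltuples and other planner statistics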

Aggregate Optimization

1. Materialized Views

Problem:

-- Slow: Calculates on every request
SELECT 
  user_id,
  COUNT(*) AS post_count,
  AVG(views) AS avg_views
FROM posts
GROUP BY user_id;

Solution:

-- Create materialized view
CREATE MATERIALIZED VIEW user_post_stats AS
SELECT 
  user_id,
  COUNT(*) AS post_count,
  AVG(views) AS avg_views
FROM posts
GROUP BY user_id;

-- Create a UNIQUE index (required for REFRESH ... CONCURRENTLY)
CREATE UNIQUE INDEX user_post_stats_user_id_idx ON user_post_stats(user_id);

-- Query materialized view (fast!)
SELECT * FROM user_post_stats WHERE user_id = 1;

-- Refresh periodically (CONCURRENTLY avoids blocking readers, but needs the unique index above)
REFRESH MATERIALIZED VIEW CONCURRENTLY user_post_stats;

2. Summary Tables

Problem:

-- Slow: Aggregates millions of rows
SELECT 
  DATE(created_at) AS date,
  COUNT(*) AS orders,
  SUM(total) AS revenue
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY DATE(created_at);

Solution:

-- Create summary table
CREATE TABLE daily_order_stats (
  date DATE PRIMARY KEY,
  orders INTEGER,
  revenue DECIMAL(10, 2)
);

-- Populate with trigger or cron job
INSERT INTO daily_order_stats (date, orders, revenue)
SELECT 
  DATE(created_at),
  COUNT(*),
  SUM(total)
FROM orders
WHERE DATE(created_at) = CURRENT_DATE
GROUP BY DATE(created_at)
ON CONFLICT (date) DO UPDATE SET
  orders = EXCLUDED.orders,
  revenue = EXCLUDED.revenue;

-- Query summary table (fast!)
SELECT * FROM daily_order_stats WHERE date >= '2024-01-01';

3. Pre-Computed Aggregates

Problem:

-- Slow: Counts on every request
SELECT COUNT(*) FROM posts WHERE user_id = 1;

Solution:

-- Add post_count column to users table
ALTER TABLE users ADD COLUMN post_count INTEGER DEFAULT 0;

-- Update with trigger
CREATE OR REPLACE FUNCTION update_user_post_count()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE users SET post_count = post_count + 1 WHERE id = NEW.user_id;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE users SET post_count = post_count - 1 WHERE id = OLD.user_id;
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_user_post_count_trigger
AFTER INSERT OR DELETE ON posts
FOR EACH ROW EXECUTE FUNCTION update_user_post_count();

-- Query (instant!)
SELECT post_count FROM users WHERE id = 1;

PostgreSQL Specific

1. Parallel Queries

Enable:

SET max_parallel_workers_per_gather = 4;

Check if query uses parallelism:

EXPLAIN SELECT COUNT(*) FROM large_table;
-- Look for "Parallel Seq Scan"
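
Small tables are never scanned in parallel: min_parallel_table_scan_size (8MB by default) sets the floor. Checking the relevant settings:

SHOW min_parallel_table_scan_size;    -- default 8MB; smaller tables get a plain Seq Scan
SHOW max_parallel_workers_per_gather;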

2. Partitioning

Range Partitioning:

-- Create partitioned table
CREATE TABLE orders (
  id SERIAL,
  user_id INTEGER,
  created_at TIMESTAMP,
  total DECIMAL(10, 2)
) PARTITION BY RANGE (created_at);

-- Create partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE orders_2024_02 PARTITION OF orders
  FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Queries automatically use correct partition
SELECT * FROM orders WHERE created_at >= '2024-01-15';

Benefits:

  • Faster queries (scans only relevant partitions)
  • Easier maintenance (drop old partitions)
  • Better vacuum performance
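
You can verify that pruning actually happens: the plan lists only the partitions that will be scanned. A quick check against the example table above:

EXPLAIN SELECT * FROM orders WHERE created_at >= '2024-02-10';
-- Only orders_2024_02 appears in the plan; orders_2024_01 was pruned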

3. Extensions

pg_trgm (Fuzzy Search):

CREATE EXTENSION pg_trgm;

-- Create GIN index for fuzzy search
CREATE INDEX users_name_trgm_idx ON users USING GIN (name gin_trgm_ops);

-- Fuzzy search
SELECT * FROM users WHERE name % 'jhon';  -- Finds "John"
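
The % operator matches when trigram similarity exceeds a threshold (0.3 by default); pg_trgm also exposes the raw score via similarity(). A small sketch:

-- Tune the match threshold for this session (default 0.3)
SET pg_trgm.similarity_threshold = 0.4;

-- Inspect raw similarity scores alongside matches
SELECT name, similarity(name, 'jhon') FROM users WHERE name % 'jhon';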

MongoDB Specific

1. Compound Indexes

Query:

db.users.find({ status: 'active', created_at: { $gte: ISODate('2024-01-01') } });

Index:

db.users.createIndex({ status: 1, created_at: -1 });

Index Prefix Rule:

  • An index on {a: 1, b: 1, c: 1} supports queries on:
    • {a: 1}
    • {a: 1, b: 1}
    • {a: 1, b: 1, c: 1}
  • But NOT {b: 1} or {c: 1} alone

2. Covered Queries

Query:

db.users.find(
  { email: 'test@example.com' },
  { _id: 0, name: 1, email: 1 }  // Projection
);

Index:

db.users.createIndex({ email: 1, name: 1 });

Check if covered:

db.users.find(
  { email: 'test@example.com' },
  { _id: 0, name: 1, email: 1 }
).explain('executionStats');

// Look for "totalDocsExamined": 0 (covered query!)

3. Aggregation Pipeline Optimization

Bad:

db.orders.aggregate([
  { $match: { status: 'completed' } },
  { $sort: { created_at: -1 } },
  { $limit: 10 }
]);

Good (same pipeline, now backed by an index):

// Add index
db.orders.createIndex({ status: 1, created_at: -1 });

// Same query, now uses index
db.orders.aggregate([
  { $match: { status: 'completed' } },
  { $sort: { created_at: -1 } },
  { $limit: 10 }
]);

Tips:

  • Put $match first (filter early)
  • Put $limit early (reduce data)
  • Use indexes for $match and $sort

Query Caching

1. Application-Level Cache (Redis)

Pattern:

async function getUser(id) {
  // Check cache
  const cached = await redis.get(`user:${id}`);
  if (cached) return JSON.parse(cached);
  
  // Query database
  const user = await db.query('SELECT * FROM users WHERE id = ?', [id]);
  
  // Cache result
  await redis.setex(`user:${id}`, 3600, JSON.stringify(user));
  
  return user;
}

2. Database Query Cache (MySQL, Deprecated in 8.0)

MySQL 5.7:

# In my.cnf; the cache cannot be enabled at runtime if query_cache_type was 0 at startup
query_cache_type = 1
query_cache_size = 268435456  # 256MB

Note: Removed in MySQL 8.0 (use application-level caching instead)

3. Materialized Views (PostgreSQL)

See "Aggregate Optimization" section above


Monitoring Query Performance

1. Slow Query Log

PostgreSQL:

# In postgresql.conf
log_min_duration_statement = 1000  # log queries slower than 1 second

-- Or set at runtime
ALTER DATABASE mydb SET log_min_duration_statement = 1000;

MySQL:

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

2. pg_stat_statements

Enable (requires shared_preload_libraries = 'pg_stat_statements' in postgresql.conf; see above):

CREATE EXTENSION pg_stat_statements;

Find slow queries:

SELECT 
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  max_exec_time,
  stddev_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

3. APM Tools

Datadog:

  • Automatic query tracking
  • Slow query detection
  • EXPLAIN plan analysis

New Relic:

  • Database monitoring
  • Slow transaction traces
  • Query analysis

Real Optimization Examples

Example 1: Slow JOIN Query

Problem:

EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
GROUP BY u.id, u.name;

-- Execution Time: 5000 ms

EXPLAIN Output:

Hash Join  (cost=1000.00..5000.00 rows=10000 width=100) (actual time=100.000..5000.000 rows=10000 loops=1)
  Hash Cond: (p.user_id = u.id)
  ->  Seq Scan on posts p  (cost=0.00..3000.00 rows=100000 width=4)
  ->  Hash  (cost=500.00..500.00 rows=10000 width=100)
        ->  Seq Scan on users u  (cost=0.00..500.00 rows=10000 width=100)

Fix:

-- Add index on foreign key
CREATE INDEX posts_user_id_idx ON posts(user_id);

-- Re-run query
EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
GROUP BY u.id, u.name;

-- Execution Time: 50 ms (100x faster!)

Example 2: Missing Index Detection

Problem:

SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

-- Slow: 500ms

EXPLAIN:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

Output:

Seq Scan on orders  (cost=0.00..10000.00 rows=100 width=100) (actual time=0.100..500.000 rows=10 loops=1)
  Filter: ((user_id = 123) AND (status = 'pending'::text))
  Rows Removed by Filter: 99990

Fix:

-- Add compound index
CREATE INDEX orders_user_id_status_idx ON orders(user_id, status);

-- Re-run
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

-- Execution Time: 2ms (250x faster!)

Output:

Index Scan using orders_user_id_status_idx on orders  (cost=0.29..8.30 rows=10 width=100) (actual time=0.015..2.000 rows=10 loops=1)
  Index Cond: ((user_id = 123) AND (status = 'pending'::text))

Example 3: N+1 Query Elimination

Problem:

// 1 + N queries
const users = await db.query('SELECT * FROM users LIMIT 10');
for (const user of users) {
  user.posts = await db.query('SELECT * FROM posts WHERE user_id = ?', [user.id]);
}
// Time: 1000ms

Fix:

// 1 query
const users = await db.query(`
  SELECT 
    u.*,
    COALESCE(
      json_agg(json_build_object('id', p.id, 'title', p.title))
        FILTER (WHERE p.id IS NOT NULL),  -- avoid a [null] array for users with no posts
      '[]'::json
    ) AS posts
  FROM users u
  LEFT JOIN posts p ON p.user_id = u.id
  GROUP BY u.id
  LIMIT 10
`);
// Time: 50ms (20x faster!)

Example 4: Pagination Improvement

Problem:

-- Page 1000 (offset 20000)
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 20000;

-- Execution Time: 2000ms

Fix:

-- Keyset pagination
SELECT * FROM posts 
WHERE (created_at, id) < ('2024-01-15 10:00:00', 12345)
ORDER BY created_at DESC, id DESC 
LIMIT 20;

-- Execution Time: 5ms (400x faster!)

Tools

PostgreSQL

  • pgAdmin: GUI for database management
  • pgBadger: Log analyzer
  • pg_stat_statements: Query statistics
  • EXPLAIN Visualizer: https://explain.dalibo.com/

MySQL

  • MySQL Workbench: GUI for database management
  • mysqldumpslow: Slow query log analyzer
  • Percona Toolkit: Advanced MySQL tools

General

  • DataGrip: Multi-database IDE
  • DBeaver: Open-source database tool
  • Datadog: APM with database monitoring
  • New Relic: APM with database monitoring

Optimization Checklist

Before Optimizing

  • Identify slow queries (slow query log, APM)
  • Run EXPLAIN ANALYZE
  • Measure baseline performance

Optimization Steps

  • Add indexes for WHERE, JOIN, ORDER BY columns
  • Eliminate N+1 queries
  • Use SELECT only needed columns
  • Optimize pagination (keyset vs offset)
  • Add connection pooling
  • Cache frequent queries
  • Use materialized views for aggregates

After Optimizing

  • Run EXPLAIN ANALYZE again
  • Measure new performance
  • Document improvement
  • Monitor for regressions

Summary

Quick Reference

Query Analysis:

  • PostgreSQL: EXPLAIN ANALYZE, pg_stat_statements
  • MySQL: EXPLAIN, slow query log
  • MongoDB: explain(), profiler

Index Types:

  • B-tree: Default, most common
  • Hash: Equality only
  • GIN/GiST: Full-text, arrays, JSON
  • Partial: Filtered indexes
  • Covering: Include columns

Common Optimizations:

  1. Add indexes for WHERE, JOIN, ORDER BY
  2. Eliminate N+1 queries (use JOINs or IN)
  3. Use keyset pagination (not LIMIT/OFFSET)
  4. Avoid SELECT *
  5. Use connection pooling
  6. Cache frequent queries
  7. Use materialized views for aggregates

Red Flags:

  • Seq Scan on large tables
  • N+1 queries
  • LIMIT with large OFFSET
  • Functions on indexed columns
  • Missing indexes on foreign keys

Tools:

  • pgAdmin, DataGrip, DBeaver
  • pg_stat_statements, slow query log
  • Datadog, New Relic