Claude-skill-registry Database Query Optimization

Comprehensive guide to optimizing database queries, understanding EXPLAIN plans, indexing strategies, and eliminating N+1 queries

Install

Source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/db-query-optimization" ~/.claude/skills/majiayu000-claude-skill-registry-database-query-optimization-736b30 && rm -rf "$T"

Manifest: skills/data/db-query-optimization/SKILL.md

Source content

Database Query Optimization

Why Query Optimization Matters

The database is often the bottleneck in modern applications. A single slow query can bring down your entire system.

Key Impacts

  • Slow Queries = Slow Application: Database latency directly affects user experience
  • Database is Often the Bottleneck: CPU and memory are cheap; database I/O is expensive
  • Growing Cost: Unoptimized queries degrade rapidly as data grows; a full-table scan is linear in table size, while an indexed lookup stays near-constant
  • Resource Exhaustion: Slow queries consume connections, lock tables, and block other queries
  • Cascading Failures: One slow query can cause timeouts across your entire system

The Golden Rule

"Optimize for reads, not writes" (unless you're write-heavy)

Most applications are read-heavy (often on the order of 90% reads to 10% writes). Optimize accordingly.


Query Analysis Tools

PostgreSQL

1. EXPLAIN

Shows the query execution plan without running the query.

EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';

Output:

Seq Scan on users  (cost=0.00..35.50 rows=1 width=100)
  Filter: (email = 'test@example.com'::text)

2. EXPLAIN ANALYZE

Runs the query and shows actual execution times.

EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';

Output:

Index Scan using users_email_idx on users  (cost=0.29..8.30 rows=1 width=100) (actual time=0.015..0.016 rows=1 loops=1)
  Index Cond: (email = 'test@example.com'::text)
Planning Time: 0.123 ms
Execution Time: 0.045 ms

3. pg_stat_statements

Tracks execution statistics for all SQL statements.

Enable:

# In postgresql.conf (requires a server restart)
shared_preload_libraries = 'pg_stat_statements'

-- Then, in the database:
CREATE EXTENSION pg_stat_statements;

Query slow queries:

SELECT 
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
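
When re-measuring after a change, it often helps to clear the accumulated statistics first. pg_stat_statements ships a reset function for exactly this (note it discards all collected history):

-- Reset collected statistics before benchmarking a change
SELECT pg_stat_statements_reset();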

MySQL

1. EXPLAIN

EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';

Output:

+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | users | ALL  | NULL          | NULL | NULL    | NULL | 1000 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+

2. Slow Query Log

-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- Log queries > 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';

Analyze with mysqldumpslow:

mysqldumpslow -s t -t 10 /var/log/mysql/slow.log

MongoDB

1. explain()

db.users.find({ email: 'test@example.com' }).explain('executionStats');

Output:

{
  "executionStats": {
    "executionTimeMillis": 150,
    "totalDocsExamined": 10000,
    "totalKeysExamined": 0,
    "executionStages": {
      "stage": "COLLSCAN"  // ❌ Bad: Collection scan
    }
  }
}

2. Profiler

// Enable profiler (level 2 = all operations)
db.setProfilingLevel(2);

// Query slow operations
db.system.profile.find({ millis: { $gt: 100 } }).sort({ ts: -1 }).limit(10);

Redis

SLOWLOG

# Get slow commands
SLOWLOG GET 10

# Configure threshold (microseconds)
CONFIG SET slowlog-log-slower-than 10000  # 10ms

Understanding EXPLAIN

PostgreSQL EXPLAIN Output

Scan Types (Best to Worst)

  1. Index Only Scan ✅ (Best)

    Index Only Scan using users_email_name_idx on users
    
    • All needed columns are in the index (no table lookup)
    • Fastest possible
  2. Index Scan ✅ (Great)

    Index Scan using users_email_idx on users
    
    • Uses the index to locate rows, then fetches them from the table
    • Fast for small result sets
  3. Bitmap Index Scan ⚠️ (OK)

    Bitmap Index Scan on users_created_at_idx
    
    • Builds a bitmap of matching rows (and can combine multiple indexes), then fetches them with a Bitmap Heap Scan
    • Good for medium result sets
  4. Seq Scan ❌ (Bad for large tables)

    Seq Scan on users
    
    • Scans the entire table
    • Fine for small tables (<1000 rows)
    • Bad for large tables
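
A quick way to test whether an index can be used at all (as opposed to the planner merely preferring a Seq Scan) is to discourage sequential scans for the current session. A diagnostic sketch only; enable_seqscan is a standard planner setting, but it should never be turned off in production config:

-- Discourage Seq Scans for this session only
SET enable_seqscan = off;
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
-- If the plan still shows a Seq Scan, no usable index exists for this predicate
SET enable_seqscan = on;  -- restore the default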

Join Types

  1. Nested Loop (Small datasets)

    Nested Loop
    
    • Good for small result sets, especially with an index on the inner table
    • O(n × m) without an index on the inner relation
  2. Hash Join (Medium datasets)

    Hash Join
    
    • Builds hash table in memory
    • Good for medium result sets
  3. Merge Join (Large sorted datasets)

    Merge Join
    
    • Requires sorted inputs
    • Good for large result sets
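
The planner chooses among these based on row-count estimates, and you normally should not override it. For diagnosis, though, PostgreSQL exposes session-level switches (enable_nestloop, enable_hashjoin, enable_mergejoin) that let you compare alternative plans. A minimal sketch:

-- See which join strategy the planner picks
EXPLAIN SELECT * FROM orders o JOIN users u ON o.user_id = u.id;

-- Temporarily rule out hash joins to compare the next-best plan's cost
SET enable_hashjoin = off;
EXPLAIN SELECT * FROM orders o JOIN users u ON o.user_id = u.id;
RESET enable_hashjoin;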

Cost Estimates

Seq Scan on users  (cost=0.00..35.50 rows=1 width=100)
-- cost = startup cost .. total cost; rows = estimated rows; width = average row bytes
  • Startup Cost: Cost to get first row
  • Total Cost: Cost to get all rows
  • Rows: Estimated number of rows
  • Width: Average row size in bytes

Lower cost = better
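
Note that costs are in arbitrary planner units, not milliseconds: by default one unit is the cost of one sequential page read (seq_page_cost = 1.0), and a random page read costs four units (random_page_cost = 4.0). You can inspect the constants your server uses:

SHOW seq_page_cost;     -- default 1.0
SHOW random_page_cost;  -- default 4.0; commonly lowered (e.g. to 1.1) on SSD storage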


Index Fundamentals

Index Types

1. B-tree Indexes (Default, Most Common)

Use Cases:

  • Equality (=)
  • Range (<, >, BETWEEN)
  • Sorting (ORDER BY)
  • Prefix pattern matching (LIKE 'prefix%')

Create:

CREATE INDEX users_email_idx ON users(email);

2. Hash Indexes (Equality Only)

Use Cases:

  • Equality (=) only
  • Can be slightly faster than a B-tree for pure equality lookups

Create:

CREATE INDEX users_email_hash_idx ON users USING HASH (email);

Limitations:

  • No range queries
  • No sorting
  • Explicit hash indexes are PostgreSQL-specific (MySQL's InnoDB does not support them; the MEMORY engine does)

3. GiST/GIN (Full-Text, Arrays, JSON)

Use Cases:

  • Full-text search
  • Array containment (@>)
  • JSON queries (@>, ?)

Create:

-- Full-text search
CREATE INDEX posts_content_gin_idx ON posts USING GIN (to_tsvector('english', content));

-- Array contains
CREATE INDEX tags_gin_idx ON posts USING GIN (tags);

-- JSON
CREATE INDEX metadata_gin_idx ON products USING GIN (metadata);

4. Partial Indexes (Filtered)

Use Cases:

  • Index only a subset of rows
  • Smaller index size
  • Faster queries on filtered data

Create:

-- Only index active users
CREATE INDEX users_active_email_idx ON users(email) WHERE status = 'active';

5. Covering Indexes (Include Columns)

Use Cases:

  • Avoid table lookups
  • Index-only scans

Create:

-- PostgreSQL
CREATE INDEX users_email_name_idx ON users(email) INCLUDE (name);

-- MySQL (composite index)
CREATE INDEX users_email_name_idx ON users(email, name);
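
One PostgreSQL caveat: an Index Only Scan still has to verify row visibility, and it can only skip the table for pages marked all-visible in the visibility map, which VACUUM maintains. If EXPLAIN ANALYZE shows an Index Only Scan with a high "Heap Fetches" count, a vacuum usually helps. A quick check, assuming the covering index above:

-- Confirm the plan and watch the "Heap Fetches" counter
EXPLAIN ANALYZE SELECT email, name FROM users WHERE email = 'test@example.com';

-- Refresh the visibility map so index-only scans can skip heap fetches
VACUUM ANALYZE users;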

When to Add Indexes

✅ Add Indexes For:

  1. Frequent WHERE Conditions

    -- Query
    SELECT * FROM users WHERE email = 'test@example.com';
    
    -- Index
    CREATE INDEX users_email_idx ON users(email);
    
  2. JOIN Columns

    -- Query
    SELECT * FROM orders o JOIN users u ON o.user_id = u.id;
    
    -- Index
    CREATE INDEX orders_user_id_idx ON orders(user_id);
    
  3. ORDER BY Columns

    -- Query
    SELECT * FROM posts ORDER BY created_at DESC LIMIT 10;
    
    -- Index
    CREATE INDEX posts_created_at_idx ON posts(created_at DESC);
    
  4. Foreign Keys

    -- Always index foreign keys!
    CREATE INDEX orders_user_id_idx ON orders(user_id);
    

❌ Don't Add Indexes For:

  1. Write-Heavy Tables

    • Indexes slow down INSERT, UPDATE, DELETE
    • Trade-off: Read speed vs write speed
  2. Low-Cardinality Columns

    -- BAD: Only 2 unique values
    CREATE INDEX users_is_active_idx ON users(is_active);
    
    • Exception: Partial indexes on rare values
    -- GOOD: Only index inactive users (rare)
    CREATE INDEX users_inactive_idx ON users(id) WHERE is_active = false;
    
  3. Small Tables (<1000 rows)

    • Seq scan is faster than index scan
    • Index overhead not worth it
  4. Columns with Functions

    -- BAD: Index not used
    SELECT * FROM users WHERE LOWER(email) = 'test@example.com';
    
    -- GOOD: Functional index
    CREATE INDEX users_email_lower_idx ON users(LOWER(email));
    

Index Maintenance

1. Find Unused Indexes

PostgreSQL:

SELECT 
  schemaname,
  relname AS tablename,
  indexrelname AS indexname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE 'pg_toast%'
ORDER BY pg_relation_size(indexrelid) DESC;

Action: Drop unused indexes (but keep indexes that back PRIMARY KEY, UNIQUE, or other constraints even if idx_scan = 0; they exist for correctness, not for reads)

DROP INDEX users_unused_idx;
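
On a busy table, a plain DROP INDEX takes a lock that blocks reads and writes. PostgreSQL also supports dropping concurrently, which waits for in-flight queries instead of blocking them (it cannot run inside a transaction block):

DROP INDEX CONCURRENTLY users_unused_idx;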

2. Find Duplicate Indexes

PostgreSQL:

SELECT 
  pg_size_pretty(SUM(pg_relation_size(idx))::BIGINT) AS size,
  (array_agg(idx))[1] AS idx1,
  (array_agg(idx))[2] AS idx2,
  (array_agg(idx))[3] AS idx3,
  (array_agg(idx))[4] AS idx4
FROM (
  SELECT 
    indexrelid::regclass AS idx,
    (indrelid::text ||E'\n'|| indclass::text ||E'\n'|| indkey::text ||E'\n'||
     COALESCE(indexprs::text,'')||E'\n' || COALESCE(indpred::text,'')) AS key
  FROM pg_index
) sub
GROUP BY key
HAVING COUNT(*) > 1
ORDER BY SUM(pg_relation_size(idx)) DESC;

3. Index Bloat (REINDEX)

Check index sizes (for a true bloat estimate, use the pgstattuple extension or a dedicated bloat-estimation query):

SELECT 
  schemaname,
  relname AS tablename,
  indexrelname AS indexname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

Rebuild bloated indexes (REINDEX ... CONCURRENTLY requires PostgreSQL 12+; on older versions, create a replacement with CREATE INDEX CONCURRENTLY and drop the original):

REINDEX INDEX CONCURRENTLY users_email_idx;

Query Optimization Techniques

1. SELECT Only Needed Columns (Not SELECT *)

Bad:

SELECT * FROM users WHERE id = 1;

Good:

SELECT id, name, email FROM users WHERE id = 1;

Why:

  • Less data transferred
  • Smaller result set
  • Can use covering indexes

2. Proper WHERE Filtering

Bad:

-- Fetches all rows, filters in application
SELECT * FROM users;
-- app.filter(user => user.status === 'active')

Good:

-- Filters in database
SELECT * FROM users WHERE status = 'active';

3. Avoid Functions on Indexed Columns

Bad:

-- Index not used
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';

Good:

-- Use functional index
CREATE INDEX users_email_lower_idx ON users(LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';

-- Or store lowercase email
SELECT * FROM users WHERE email = 'test@example.com';

4. Use EXISTS Instead of COUNT

Bad:

-- Counts all rows
SELECT COUNT(*) FROM orders WHERE user_id = 1;
-- if (count > 0) { ... }

Good:

-- Stops at first match
SELECT EXISTS(SELECT 1 FROM orders WHERE user_id = 1);

5. Batch Queries (Avoid N+1)

Bad:

// N+1 queries
const users = await db.users.findMany();
for (const user of users) {
  user.posts = await db.posts.findMany({ where: { userId: user.id } });
}

Good:

// Single query with join
const users = await db.users.findMany({
  include: { posts: true }
});

6. Connection Pooling

Bad:

// New connection per query (TCP + auth handshake every time)
const { Client } = require('pg');

const client = new Client();
await client.connect();
await client.query('SELECT * FROM users');
await client.end();

Good:

// Connection pool: created once at startup, shared across requests
const { Pool } = require('pg');

const pool = new Pool({ max: 20 });
await pool.query('SELECT * FROM users');

N+1 Query Problem

The Problem

Example:

// 1 query to get users
const users = await db.query('SELECT * FROM users LIMIT 10');

// N queries to get posts for each user
for (const user of users) {
  const posts = await db.query('SELECT * FROM posts WHERE user_id = ?', [user.id]);
  user.posts = posts;
}
// Total: 1 + 10 = 11 queries

Impact:

  • 10 users = 11 queries
  • 100 users = 101 queries
  • 1000 users = 1001 queries

Solutions

1. JOIN

SQL:

SELECT 
  u.*,
  p.id AS post_id,
  p.title AS post_title,
  p.content AS post_content
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE u.id IN (1, 2, 3, 4, 5);

Prisma:

const users = await prisma.user.findMany({
  include: { posts: true }
});

SQLAlchemy:

from sqlalchemy.orm import joinedload

users = db.query(User).options(joinedload(User.posts)).all()

2. IN Clause

SQL:

-- 1. Get users
SELECT * FROM users LIMIT 10;

-- 2. Get all posts in one query
SELECT * FROM posts WHERE user_id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

JavaScript:

const users = await db.query('SELECT * FROM users LIMIT 10');
const userIds = users.map(u => u.id);
const posts = await db.query('SELECT * FROM posts WHERE user_id IN (?)', [userIds]);

// Group posts by user_id
const postsByUserId = posts.reduce((acc, post) => {
  if (!acc[post.user_id]) acc[post.user_id] = [];
  acc[post.user_id].push(post);
  return acc;
}, {});

// Attach posts to users
users.forEach(user => {
  user.posts = postsByUserId[user.id] || [];
});

3. DataLoader (GraphQL)

Usage:

const DataLoader = require('dataloader');

const postLoader = new DataLoader(async (userIds) => {
  const posts = await db.query('SELECT * FROM posts WHERE user_id IN (?)', [userIds]);
  
  // Group by user_id
  const postsByUserId = userIds.map(id => 
    posts.filter(post => post.user_id === id)
  );
  
  return postsByUserId;
});

// Usage
const posts = await postLoader.load(userId);

Pagination Optimization

1. LIMIT/OFFSET (Simple but Slow)

Query:

SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 1000;

Problem:

  • Database still scans first 1000 rows
  • Slow for large offsets
  • Inconsistent results if data changes

When to Use:

  • Small datasets
  • Low page numbers (< 100)
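
The cost is visible directly in the plan: the node beneath the Limit still produces offset + limit rows, all but the final page of which are discarded. A hedged illustration (exact numbers depend on your data):

EXPLAIN ANALYZE SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 1000;
-- The scan node reports actual rows=1020: 1000 rows were fetched only to be thrown away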

2. Keyset Pagination (Cursor-Based, Faster)

Query:

-- First page
SELECT * FROM posts ORDER BY created_at DESC, id DESC LIMIT 20;

-- Next page (using last item's created_at and id)
SELECT * FROM posts 
WHERE (created_at, id) < ('2024-01-15 10:00:00', 12345)
ORDER BY created_at DESC, id DESC 
LIMIT 20;

Pros:

  • Constant time (doesn't scan skipped rows)
  • Consistent results
  • Scales to millions of rows

Cons:

  • Can't jump to arbitrary page
  • Requires unique, sortable column

Implementation:

async function getPosts(cursor = null, limit = 20) {
  let query = 'SELECT * FROM posts';
  let params = [];
  
  if (cursor) {
    const [created_at, id] = cursor.split('_');
    query += ' WHERE (created_at, id) < (?, ?)';
    params = [created_at, id];
  }
  
  query += ' ORDER BY created_at DESC, id DESC LIMIT ?';
  params.push(limit);
  
  const posts = await db.query(query, params);
  
  const nextCursor = posts.length === limit
    ? `${posts[posts.length - 1].created_at}_${posts[posts.length - 1].id}`
    : null;
  
  return { posts, nextCursor };
}

3. Avoid COUNT(*) on Every Page

Bad:

-- Runs on every page load
SELECT COUNT(*) FROM posts;  -- Slow!
SELECT * FROM posts LIMIT 20 OFFSET 0;

Good:

-- Cache the count or estimate it
SELECT reltuples::bigint AS estimate FROM pg_class WHERE relname = 'posts';

-- Or don't show total count
SELECT * FROM posts LIMIT 20 OFFSET 0;
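
The reltuples estimate is only as fresh as the last VACUUM or ANALYZE (autovacuum normally keeps it close). If it looks stale, refresh the statistics explicitly:

ANALYZE posts;  -- updates reltuples and other planner statistics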

Aggregate Optimization

1. Materialized Views

Problem:

-- Slow: Calculates on every request
SELECT 
  user_id,
  COUNT(*) AS post_count,
  AVG(views) AS avg_views
FROM posts
GROUP BY user_id;

Solution:

-- Create materialized view
CREATE MATERIALIZED VIEW user_post_stats AS
SELECT 
  user_id,
  COUNT(*) AS post_count,
  AVG(views) AS avg_views
FROM posts
GROUP BY user_id;

-- Create a UNIQUE index (required for REFRESH ... CONCURRENTLY)
CREATE UNIQUE INDEX user_post_stats_user_id_idx ON user_post_stats(user_id);

-- Query materialized view (fast!)
SELECT * FROM user_post_stats WHERE user_id = 1;

-- Refresh periodically (CONCURRENTLY avoids blocking readers, but needs the unique index above)
REFRESH MATERIALIZED VIEW CONCURRENTLY user_post_stats;

2. Summary Tables

Problem:

-- Slow: Aggregates millions of rows
SELECT 
  DATE(created_at) AS date,
  COUNT(*) AS orders,
  SUM(total) AS revenue
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY DATE(created_at);

Solution:

-- Create summary table
CREATE TABLE daily_order_stats (
  date DATE PRIMARY KEY,
  orders INTEGER,
  revenue DECIMAL(10, 2)
);

-- Populate with trigger or cron job
INSERT INTO daily_order_stats (date, orders, revenue)
SELECT 
  DATE(created_at),
  COUNT(*),
  SUM(total)
FROM orders
WHERE DATE(created_at) = CURRENT_DATE
GROUP BY DATE(created_at)
ON CONFLICT (date) DO UPDATE SET
  orders = EXCLUDED.orders,
  revenue = EXCLUDED.revenue;

-- Query summary table (fast!)
SELECT * FROM daily_order_stats WHERE date >= '2024-01-01';

3. Pre-Computed Aggregates

Problem:

-- Slow: Counts on every request
SELECT COUNT(*) FROM posts WHERE user_id = 1;

Solution:

-- Add post_count column to users table
ALTER TABLE users ADD COLUMN post_count INTEGER DEFAULT 0;

-- Update with trigger
CREATE OR REPLACE FUNCTION update_user_post_count()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE users SET post_count = post_count + 1 WHERE id = NEW.user_id;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE users SET post_count = post_count - 1 WHERE id = OLD.user_id;
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_user_post_count_trigger
AFTER INSERT OR DELETE ON posts
FOR EACH ROW EXECUTE FUNCTION update_user_post_count();

-- Query (instant!)
SELECT post_count FROM users WHERE id = 1;

PostgreSQL Specific

1. Parallel Queries

Enable:

SET max_parallel_workers_per_gather = 4;

Check if query uses parallelism:

EXPLAIN SELECT COUNT(*) FROM large_table;
-- Look for "Parallel Seq Scan"
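
Small tables are never scanned in parallel: min_parallel_table_scan_size (8MB by default) sets the floor. Checking the relevant settings:

SHOW min_parallel_table_scan_size;    -- default 8MB; smaller tables get a plain Seq Scan
SHOW max_parallel_workers_per_gather;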

2. Partitioning

Range Partitioning:

-- Create partitioned table
CREATE TABLE orders (
  id SERIAL,
  user_id INTEGER,
  created_at TIMESTAMP,
  total DECIMAL(10, 2)
) PARTITION BY RANGE (created_at);

-- Create partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE orders_2024_02 PARTITION OF orders
  FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Queries automatically use correct partition
SELECT * FROM orders WHERE created_at >= '2024-01-15';

Benefits:

  • Faster queries (scans only relevant partitions)
  • Easier maintenance (drop old partitions)
  • Better vacuum performance
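
You can verify that pruning actually happens: the plan lists only the partitions that will be scanned. A quick check against the example table above:

EXPLAIN SELECT * FROM orders WHERE created_at >= '2024-02-10';
-- Only orders_2024_02 appears in the plan; orders_2024_01 was pruned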

3. Extensions

pg_trgm (Fuzzy Search):

CREATE EXTENSION pg_trgm;

-- Create GIN index for fuzzy search
CREATE INDEX users_name_trgm_idx ON users USING GIN (name gin_trgm_ops);

-- Fuzzy search
SELECT * FROM users WHERE name % 'jhon';  -- Finds "John"
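
The % operator matches when trigram similarity exceeds a threshold (0.3 by default); pg_trgm also exposes the raw score via similarity(). A small sketch:

-- Tune the match threshold for this session (default 0.3)
SET pg_trgm.similarity_threshold = 0.4;

-- Inspect raw similarity scores alongside matches
SELECT name, similarity(name, 'jhon') FROM users WHERE name % 'jhon';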

MongoDB Specific

1. Compound Indexes

Query:

db.users.find({ status: 'active', created_at: { $gte: ISODate('2024-01-01') } });

Index:

db.users.createIndex({ status: 1, created_at: -1 });

Index Prefix Rule:

  • An index on {a: 1, b: 1, c: 1} supports queries on:
    • {a: 1}
    • {a: 1, b: 1}
    • {a: 1, b: 1, c: 1}
  • But NOT {b: 1} or {c: 1} alone

2. Covered Queries

Query:

db.users.find(
  { email: 'test@example.com' },
  { _id: 0, name: 1, email: 1 }  // Projection
);

Index:

db.users.createIndex({ email: 1, name: 1 });

Check if covered:

db.users.find(
  { email: 'test@example.com' },
  { _id: 0, name: 1, email: 1 }
).explain('executionStats');

// Look for "totalDocsExamined": 0 (covered query!)

3. Aggregation Pipeline Optimization

Bad:

db.orders.aggregate([
  { $match: { status: 'completed' } },
  { $sort: { created_at: -1 } },
  { $limit: 10 }
]);

Good (same pipeline, now backed by an index):

// Add index
db.orders.createIndex({ status: 1, created_at: -1 });

// Same query, now uses index
db.orders.aggregate([
  { $match: { status: 'completed' } },
  { $sort: { created_at: -1 } },
  { $limit: 10 }
]);

Tips:

  • Put $match first (filter early)
  • Put $limit early (reduce data)
  • Use indexes for $match and $sort

Query Caching

1. Application-Level Cache (Redis)

Pattern:

async function getUser(id) {
  // Check cache
  const cached = await redis.get(`user:${id}`);
  if (cached) return JSON.parse(cached);
  
  // Query database
  const user = await db.query('SELECT * FROM users WHERE id = ?', [id]);
  
  // Cache result
  await redis.setex(`user:${id}`, 3600, JSON.stringify(user));
  
  return user;
}

2. Database Query Cache (MySQL, Deprecated in 8.0)

MySQL 5.7:

# In my.cnf; the cache cannot be enabled at runtime if query_cache_type was 0 at startup
query_cache_type = 1
query_cache_size = 268435456  # 256MB

Note: Removed in MySQL 8.0 (use application-level caching instead)

3. Materialized Views (PostgreSQL)

See "Aggregate Optimization" section above


Monitoring Query Performance

1. Slow Query Log

PostgreSQL:

# In postgresql.conf
log_min_duration_statement = 1000  # log queries slower than 1 second

-- Or set at runtime
ALTER DATABASE mydb SET log_min_duration_statement = 1000;

MySQL:

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

2. pg_stat_statements

Enable (requires shared_preload_libraries = 'pg_stat_statements' in postgresql.conf; see above):

CREATE EXTENSION pg_stat_statements;

Find slow queries:

SELECT 
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  max_exec_time,
  stddev_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

3. APM Tools

Datadog:

  • Automatic query tracking
  • Slow query detection
  • EXPLAIN plan analysis

New Relic:

  • Database monitoring
  • Slow transaction traces
  • Query analysis

Real Optimization Examples

Example 1: Slow JOIN Query

Problem:

EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
GROUP BY u.id, u.name;

-- Execution Time: 5000 ms

EXPLAIN Output:

Hash Join  (cost=1000.00..5000.00 rows=10000 width=100) (actual time=100.000..5000.000 rows=10000 loops=1)
  Hash Cond: (p.user_id = u.id)
  ->  Seq Scan on posts p  (cost=0.00..3000.00 rows=100000 width=4)
  ->  Hash  (cost=500.00..500.00 rows=10000 width=100)
        ->  Seq Scan on users u  (cost=0.00..500.00 rows=10000 width=100)

Fix:

-- Add index on foreign key
CREATE INDEX posts_user_id_idx ON posts(user_id);

-- Re-run query
EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
GROUP BY u.id, u.name;

-- Execution Time: 50 ms (100x faster!)

Example 2: Missing Index Detection

Problem:

SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

-- Slow: 500ms

EXPLAIN:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

Output:

Seq Scan on orders  (cost=0.00..10000.00 rows=100 width=100) (actual time=0.100..500.000 rows=10 loops=1)
  Filter: ((user_id = 123) AND (status = 'pending'::text))
  Rows Removed by Filter: 99990

Fix:

-- Add compound index
CREATE INDEX orders_user_id_status_idx ON orders(user_id, status);

-- Re-run
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

-- Execution Time: 2ms (250x faster!)

Output:

Index Scan using orders_user_id_status_idx on orders  (cost=0.29..8.30 rows=10 width=100) (actual time=0.015..2.000 rows=10 loops=1)
  Index Cond: ((user_id = 123) AND (status = 'pending'::text))

Example 3: N+1 Query Elimination

Problem:

// 1 + N queries
const users = await db.query('SELECT * FROM users LIMIT 10');
for (const user of users) {
  user.posts = await db.query('SELECT * FROM posts WHERE user_id = ?', [user.id]);
}
// Time: 1000ms

Fix:

// 1 query
const users = await db.query(`
  SELECT 
    u.*,
    COALESCE(
      json_agg(json_build_object('id', p.id, 'title', p.title))
        FILTER (WHERE p.id IS NOT NULL),  -- avoid a [null] array for users with no posts
      '[]'::json
    ) AS posts
  FROM users u
  LEFT JOIN posts p ON p.user_id = u.id
  GROUP BY u.id
  LIMIT 10
`);
// Time: 50ms (20x faster!)

Example 4: Pagination Improvement

Problem:

-- Page 1000 (offset 20000)
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 20000;

-- Execution Time: 2000ms

Fix:

-- Keyset pagination
SELECT * FROM posts 
WHERE (created_at, id) < ('2024-01-15 10:00:00', 12345)
ORDER BY created_at DESC, id DESC 
LIMIT 20;

-- Execution Time: 5ms (400x faster!)

Tools

PostgreSQL

  • pgAdmin: GUI for database management
  • pgBadger: Log analyzer
  • pg_stat_statements: Query statistics
  • EXPLAIN Visualizer: https://explain.dalibo.com/

MySQL

  • MySQL Workbench: GUI for database management
  • mysqldumpslow: Slow query log analyzer
  • Percona Toolkit: Advanced MySQL tools

General

  • DataGrip: Multi-database IDE
  • DBeaver: Open-source database tool
  • Datadog: APM with database monitoring
  • New Relic: APM with database monitoring

Optimization Checklist

Before Optimizing

  • Identify slow queries (slow query log, APM)
  • Run EXPLAIN ANALYZE
  • Measure baseline performance

Optimization Steps

  • Add indexes for WHERE, JOIN, ORDER BY columns
  • Eliminate N+1 queries
  • Use SELECT only needed columns
  • Optimize pagination (keyset vs offset)
  • Add connection pooling
  • Cache frequent queries
  • Use materialized views for aggregates

After Optimizing

  • Run EXPLAIN ANALYZE again
  • Measure new performance
  • Document improvement
  • Monitor for regressions

Summary

Quick Reference

Query Analysis:

  • PostgreSQL: EXPLAIN ANALYZE, pg_stat_statements
  • MySQL: EXPLAIN, slow query log
  • MongoDB: explain(), profiler

Index Types:

  • B-tree: Default, most common
  • Hash: Equality only
  • GIN/GiST: Full-text, arrays, JSON
  • Partial: Filtered indexes
  • Covering: Include columns

Common Optimizations:

  1. Add indexes for WHERE, JOIN, ORDER BY
  2. Eliminate N+1 queries (use JOINs or IN)
  3. Use keyset pagination (not LIMIT/OFFSET)
  4. Avoid SELECT *
  5. Use connection pooling
  6. Cache frequent queries
  7. Use materialized views for aggregates

Red Flags:

  • Seq Scan on large tables
  • N+1 queries
  • LIMIT with large OFFSET
  • Functions on indexed columns
  • Missing indexes on foreign keys

Tools:

  • pgAdmin, DataGrip, DBeaver
  • pg_stat_statements, slow query log
  • Datadog, New Relic