Awesome-claude-code-toolkit database-optimization

Query optimization, indexing strategies, and database performance tuning for PostgreSQL and MySQL

install
source · Clone the upstream repo
git clone https://github.com/rohitg00/awesome-claude-code-toolkit
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/rohitg00/awesome-claude-code-toolkit "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/database-optimization" ~/.claude/skills/rohitg00-awesome-claude-code-toolkit-database-optimization && rm -rf "$T"
manifest: skills/database-optimization/SKILL.md
source content

Database Optimization

EXPLAIN Analysis

Always run EXPLAIN ANALYZE before optimizing. Read the output bottom-up: the most deeply indented nodes execute first, and the top line reports the final cost and timing.

-- PostgreSQL
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;

-- MySQL
EXPLAIN ANALYZE SELECT ...;

Key metrics to watch:

  • Seq Scan on large tables = missing index
  • Nested Loop with high row count = consider hash/merge join
  • Sort without index = add index on sort column
  • Estimated vs. actual row counts diverging = stale statistics; run ANALYZE (see the sketch below)
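
When estimates drift, refreshing statistics is cheap. A minimal check-and-refresh sketch (the orders table is illustrative):

-- How stale are the stats? pg_stat_user_tables records the last runs.
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- Recompute planner statistics for the table
ANALYZE orders;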

Index Strategies

B-tree (default, most cases)

CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_orders_user_date ON orders (user_id, created_at DESC);

Use for: equality, range queries, sorting. Column order matters in composite indexes: put equality columns first, then range/sort columns.
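
For example, a query shaped for idx_orders_user_date above: equality on the leading column, range and sort on the second (the user id and 30-day window are illustrative):

-- The index satisfies both the filter and the ORDER BY in one pass.
SELECT id, created_at
FROM orders
WHERE user_id = 42
  AND created_at >= now() - interval '30 days'
ORDER BY created_at DESC;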

Partial Index (PostgreSQL)

CREATE INDEX idx_orders_pending ON orders (created_at)
  WHERE status = 'pending';

Use when queries always filter on a specific condition. Dramatically smaller than full indexes.
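
The planner considers a partial index only when the query's predicate implies the index predicate, so the filter must match. A sketch against the index above:

-- Matches the WHERE status = 'pending' predicate, so the small index applies.
SELECT id, created_at
FROM orders
WHERE status = 'pending'
ORDER BY created_at;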

GIN (PostgreSQL - arrays, JSONB, full-text)

CREATE INDEX idx_products_tags ON products USING GIN (tags);
CREATE INDEX idx_docs_search ON documents USING GIN (to_tsvector('english', content));
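
Illustrative queries these indexes serve (the tag value and search term are placeholders, and tags is assumed to be a text array); note that an expression index is only considered when the query repeats the exact to_tsvector expression:

-- Array containment via the GIN index on tags
SELECT * FROM products WHERE tags @> ARRAY['sale'];

-- Full-text match via the expression index
SELECT * FROM documents
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'optimization');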

GiST (PostgreSQL - spatial, range types)

CREATE INDEX idx_locations_point ON locations USING GiST (coordinates);
CREATE INDEX idx_events_period ON events USING GiST (tsrange(start_at, end_at));
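
For example, an overlap search on the range index (the dates are placeholders; the query must repeat the tsrange(start_at, end_at) expression exactly, and the columns are assumed to be plain timestamps):

-- Find events overlapping January 2025 (&& is the range-overlap operator)
SELECT * FROM events
WHERE tsrange(start_at, end_at) && tsrange('2025-01-01'::timestamp, '2025-02-01'::timestamp);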

Covering Index (index-only scans)

-- PostgreSQL
CREATE INDEX idx_users_email_name ON users (email) INCLUDE (name);

-- MySQL
CREATE INDEX idx_users_email_name ON users (email, name);
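
Both let a query like the one below be answered from the index alone. In PostgreSQL, index-only scans also depend on a reasonably fresh visibility map, i.e. the table gets vacuumed regularly:

-- Every referenced column lives in the index: no heap fetch needed.
SELECT email, name FROM users WHERE email = 'a@example.com';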

N+1 Query Detection

Symptom: 1 query to fetch the parent rows + N follow-up queries, one per row.

# SQLAlchemy (Python)
from sqlalchemy.orm import joinedload

# BAD: N+1
users = db.query(User).all()
for user in users:
    print(user.orders)  # lazy load triggers one query per user

# GOOD: eager load (users and orders in one round trip)
users = db.query(User).options(joinedload(User.orders)).all()

// Sequelize (JavaScript)
// BAD: N+1
const users = await User.findAll();
for (const user of users) {
  const orders = await Order.findAll({ where: { userId: user.id } });
}

// GOOD: batch load
const users = await User.findAll({ include: [Order] });

Detection: enable query logging, count queries per request. More than 10 queries for a single endpoint is a red flag.
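
With the pg_stat_statements extension enabled, an N+1 pattern stands out as one parameterized statement with an outsized call count. A sketch (mean_exec_time is the column name on PostgreSQL 13+):

-- Most frequently executed statements
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;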

Connection Pooling

Rule of thumb: pool_size = (core_count * 2) + disk_count; e.g. 8 cores and one SSD give (8 * 2) + 1 = 17.
Typical web app: 10-20 connections per app instance

PostgreSQL:

  • Use PgBouncer in transaction mode for serverless/high-connection scenarios
  • Set idle_in_transaction_session_timeout = '30s'
  • Monitor with pg_stat_activity (see the query below)
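
A minimal monitoring query; join in query, wait_event, and xact_start as needed:

-- Connections by state; watch for piles of 'idle in transaction'.
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;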

MySQL:

  • Set max_connections based on available RAM (each connection uses ~10MB)
  • Use ProxySQL for connection multiplexing
  • Monitor with SHOW PROCESSLIST (see the sketch below)
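
A quick capacity check, comparing the configured ceiling to current usage:

-- Configured maximum vs. currently connected threads
SHOW VARIABLES LIKE 'max_connections';
SHOW STATUS LIKE 'Threads_connected';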

Read Replicas

  • Route all SELECT queries to replicas
  • Route all writes to primary
  • Account for replication lag (typically 10-100ms)
  • Never read-after-write from a replica; use primary for consistency-critical reads
  • Use connection-level routing, not query-level

# SQLAlchemy read replica routing, adapted from the routing-session
# pattern in the SQLAlchemy docs ("engines" maps names to Engine objects)
from sqlalchemy.sql import Delete, Update
from sqlalchemy.orm import Session

class RoutingSession(Session):
    def get_bind(self, mapper=None, clause=None):
        # Flushes and explicit UPDATE/DELETE statements hit the primary;
        # everything else reads from a replica.
        if self._flushing or isinstance(clause, (Update, Delete)):
            return engines["primary"]
        return engines["replica"]

Partition Strategies

Range Partitioning (time-series data)

-- PostgreSQL
CREATE TABLE events (
    id bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    data jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2025_q1 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
CREATE TABLE events_2025_q2 PARTITION OF events
    FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');

Hash Partitioning (even distribution)

CREATE TABLE sessions (
    id uuid NOT NULL,
    user_id bigint NOT NULL,
    -- a unique constraint on a partitioned table must include the partition key
    PRIMARY KEY (id, user_id)
) PARTITION BY HASH (user_id);

CREATE TABLE sessions_0 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sessions_1 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ...and sessions_2, sessions_3: every remainder needs a partition before inserts succeed

Partition when tables exceed 50-100GB or when you need to drop old data quickly.
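
Dropping old data then becomes a metadata operation instead of a huge DELETE:

-- Retire a quarter instantly: detach keeps the data as a standalone table,
-- drop removes it outright.
ALTER TABLE events DETACH PARTITION events_2025_q1;
DROP TABLE events_2025_q1;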

Query Optimization Checklist

  1. Run EXPLAIN ANALYZE and read the plan
  2. Check for sequential scans on tables with >10K rows
  3. Verify index usage (check idx_scan in pg_stat_user_indexes; see the SQL sketches after this list)
  4. Look for implicit type casts that prevent index use
  5. Replace SELECT * with specific columns
  6. Add LIMIT to queries that only need a subset
  7. Use EXISTS instead of COUNT(*) > 0
  8. Batch INSERT/UPDATE operations (500-1000 rows per batch)
  9. Avoid functions on indexed columns in WHERE clauses
  10. Monitor the slow query log (pg: log_min_duration_statement = 100)
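
A few checklist items in SQL form (table and column names are illustrative):

-- (3) Indexes that have never been scanned are candidates for removal
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0;

-- (7) EXISTS can stop at the first match; COUNT(*) must scan them all
SELECT u.id FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);

-- (9) A function call defeats a plain index on the column; create a
-- matching expression index instead
CREATE INDEX idx_users_email_lower ON users (lower(email));
SELECT id FROM users WHERE lower(email) = 'a@example.com';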

Dangerous Patterns

  • LIKE '%term%' on unindexed columns (use full-text search instead)
  • ORDER BY RANDOM() (use TABLESAMPLE or application-level randomization)
  • SELECT DISTINCT masking a join problem
  • Missing WHERE on UPDATE/DELETE (always verify with a SELECT first)
  • Long-running transactions holding locks
  • Using OFFSET for deep pagination (use keyset/cursor pagination instead; see the sketch below)
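
A minimal keyset-pagination sketch, assuming an index on (created_at DESC, id DESC); :last_created_at and :last_id are the cursor values carried over from the previous page:

-- Each page seeks straight into the index instead of scanning and
-- discarding OFFSET rows.
SELECT id, created_at
FROM orders
WHERE (created_at, id) < (:last_created_at, :last_id)
ORDER BY created_at DESC, id DESC
LIMIT 20;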