# claude-initial-setup · query-optimization
## Install

Clone the upstream repo:

```shell
git clone https://github.com/VersoXBT/claude-initial-setup
```

Claude Code: install into `~/.claude/skills/`:

```shell
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/VersoXBT/claude-initial-setup "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/skills/database/query-optimization" ~/.claude/skills/versoxbt-claude-initial-setup-query-optimization \
  && rm -rf "$T"
```
Manifest: `skills/database/query-optimization/SKILL.md`
# Query Optimization

Diagnose and fix slow SQL queries using `EXPLAIN ANALYZE`, proper indexing, query rewriting, and N+1 detection.
## When to Use
- User reports slow database queries
- User asks about EXPLAIN ANALYZE output
- User needs to optimize query performance
- User has N+1 query problems in an ORM
- User asks about index tuning or covering indexes
## Core Patterns

### EXPLAIN ANALYZE

Always start with `EXPLAIN ANALYZE` to understand what the database is actually doing.
```sql
-- Basic usage
EXPLAIN ANALYZE
SELECT o.id, o.status, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending'
ORDER BY o.ordered_at DESC
LIMIT 20;
```
Key things to look for in the output:
```
-- BAD: Sequential scan on large table
Seq Scan on orders  (cost=0.00..15234.00 rows=500000 width=48)
  Filter: (status = 'pending'::text)
  Rows Removed by Filter: 450000

-- GOOD: Index scan
Index Scan using idx_orders_status_ordered_at on orders  (cost=0.42..85.20 rows=50000 width=48)
  Index Cond: (status = 'pending'::text)
```
```sql
-- Check actual vs estimated rows (large discrepancies mean stale statistics)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 12345;

-- If estimates are wrong, update statistics
ANALYZE orders;
```
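This measure-first loop can be sketched end to end with Python's built-in `sqlite3` module, standing in for a live PostgreSQL session (SQLite's `EXPLAIN QUERY PLAN` plays the role of `EXPLAIN ANALYZE` here; the table and index names mirror the examples above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, ordered_at TEXT)")
con.executemany(
    "INSERT INTO orders (status, ordered_at) VALUES (?, ?)",
    [("pending" if i % 10 == 0 else "shipped", f"2024-01-{i % 28 + 1:02d}")
     for i in range(1000)],
)

def plan(sql: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row describes one plan step.
    return " | ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE status = 'pending' ORDER BY ordered_at"

before = plan(query)  # no usable index yet: the plan falls back to a full scan
con.execute("CREATE INDEX idx_orders_status_ordered_at ON orders(status, ordered_at)")
after = plan(query)   # the composite index serves both the filter and the sort

print(before)
print(after)
```

The point is the workflow, not SQLite itself: run the query through the planner, read the plan, change one thing (here, add an index), and re-read the plan.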
### Index Tuning

Create indexes that match your query patterns. Column order in composite indexes matters.
```sql
-- Query: WHERE status = ? AND ordered_at > ? ORDER BY ordered_at
-- Index must match: equality columns first, then range/sort columns
CREATE INDEX idx_orders_status_ordered_at
ON orders(status, ordered_at DESC);

-- Query: WHERE customer_id = ? AND status = ? ORDER BY ordered_at DESC
CREATE INDEX idx_orders_customer_status_date
ON orders(customer_id, status, ordered_at DESC);

-- Verify the index is being used
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 100 AND status = 'pending'
ORDER BY ordered_at DESC
LIMIT 10;
```
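The column-order rule can be checked empirically. A small `sqlite3` sketch (the `idx_good`/`idx_bad` names are purely illustrative): with the range column first, only the range predicate becomes an index condition, and the equality filter degrades to a per-row check.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, ordered_at TEXT)")
con.executemany(
    "INSERT INTO orders (status, ordered_at) VALUES (?, ?)",
    [("pending" if i % 5 == 0 else "shipped", f"2024-01-{i % 28 + 1:02d}")
     for i in range(500)],
)
con.execute("CREATE INDEX idx_good ON orders(status, ordered_at)")  # equality first
con.execute("CREATE INDEX idx_bad ON orders(ordered_at, status)")   # range first

def plan(sql: str) -> str:
    return " | ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

where = "WHERE status = 'pending' AND ordered_at > '2024-01-15'"
# INDEXED BY forces each index so the two plans can be compared directly.
good = plan(f"SELECT * FROM orders INDEXED BY idx_good {where}")
bad = plan(f"SELECT * FROM orders INDEXED BY idx_bad {where}")

print(good)
print(bad)
```

In the good plan both predicates appear as index conditions (`status=? AND ordered_at>?`); in the bad plan only the range predicate does, so `status` must be filtered row by row.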
### Covering Indexes (Index-Only Scans)

Include all columns the query needs in the index to avoid hitting the table at all.
```sql
-- Query only needs id, status, ordered_at
-- INCLUDE adds columns to the index leaf without affecting sort order
CREATE INDEX idx_orders_status_covering
ON orders(status, ordered_at DESC)
INCLUDE (id, customer_id);

-- This enables an index-only scan (no table lookup)
EXPLAIN ANALYZE
SELECT id, customer_id, ordered_at
FROM orders
WHERE status = 'pending'
ORDER BY ordered_at DESC
LIMIT 20;
-- Look for "Index Only Scan" in the output
```
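SQLite has no `INCLUDE` clause, but the same index-only effect appears when every selected column lives in the index key, which makes it a convenient way to see a covering scan locally (a sketch; the `idx_orders_covering` name is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "status TEXT, ordered_at TEXT)"
)
con.executemany(
    "INSERT INTO orders (customer_id, status, ordered_at) VALUES (?, ?, ?)",
    [(i % 50, "pending" if i % 4 == 0 else "shipped", f"2024-01-{i % 28 + 1:02d}")
     for i in range(400)],
)
# Every column the query touches is in the index key, so no table lookup is needed.
con.execute("CREATE INDEX idx_orders_covering ON orders(status, ordered_at, customer_id)")

def plan(sql: str) -> str:
    return " | ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

detail = plan(
    "SELECT customer_id, ordered_at FROM orders "
    "WHERE status = 'pending' ORDER BY ordered_at"
)
print(detail)
```

SQLite reports this as `USING COVERING INDEX`, the analogue of PostgreSQL's "Index Only Scan".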
### N+1 Query Detection and Fix

An N+1 pattern runs one query for the parent rows, then one additional query per child row (N more queries in total). Fix with JOINs or batch loading.
```sql
-- BAD: N+1 pattern (executed in application code)
-- Query 1:      SELECT * FROM orders WHERE status = 'pending'
-- Query 2..N+1: SELECT * FROM customers WHERE id = ?  (once per order)

-- GOOD: Single query with JOIN
SELECT o.id, o.status, o.ordered_at, c.name, c.email
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending'
ORDER BY o.ordered_at DESC;

-- GOOD: Batch loading (when JOIN produces too many rows)
SELECT * FROM orders WHERE status = 'pending' ORDER BY ordered_at DESC;
SELECT * FROM customers WHERE id IN (1, 2, 3, 4, 5);  -- collected IDs from first query
```
ORM-level fixes:
```python
# SQLAlchemy: Use joinedload or selectinload
from sqlalchemy.orm import joinedload, selectinload

# JOIN strategy (one query)
orders = session.query(Order).options(
    joinedload(Order.customer)
).filter(Order.status == "pending").all()

# Subquery strategy (two queries, better for large result sets)
orders = session.query(Order).options(
    selectinload(Order.items)
).filter(Order.status == "pending").all()
```
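The query-count difference is easy to observe outside any ORM. A `sqlite3` sketch (tiny illustrative dataset) that counts executed statements with a trace callback:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bo'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1, 'pending'), (2, 2, 'pending'),
                              (3, 3, 'pending'), (4, 1, 'shipped');
""")

executed = []
con.set_trace_callback(executed.append)  # records every SQL statement that runs

# BAD: N+1 -- one extra query per order
rows = con.execute("SELECT id, customer_id FROM orders WHERE status = 'pending'").fetchall()
for _, cid in rows:
    con.execute("SELECT name FROM customers WHERE id = ?", (cid,)).fetchone()
n_plus_one_count = len(executed)

executed.clear()

# GOOD: batch load -- exactly two queries regardless of result size
rows = con.execute("SELECT id, customer_id FROM orders WHERE status = 'pending'").fetchall()
ids = sorted({cid for _, cid in rows})
marks = ",".join("?" * len(ids))
names = dict(con.execute(f"SELECT id, name FROM customers WHERE id IN ({marks})", ids))

print(n_plus_one_count, len(executed))
```

With three pending orders, the N+1 path runs four statements; the batch path runs two, and would still run two with three thousand orders.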
### Query Rewriting

Restructure queries to help the optimizer choose better plans.
```sql
-- BAD: Correlated subquery (executes per row)
SELECT c.name,
       (SELECT COUNT(*) FROM orders WHERE customer_id = c.id) AS order_count
FROM customers c;

-- GOOD: JOIN with aggregation (single pass)
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;

-- BAD: OR conditions prevent index use
SELECT * FROM orders WHERE customer_id = 100 OR status = 'pending';

-- GOOD: UNION ALL uses indexes on both conditions
SELECT * FROM orders WHERE customer_id = 100
UNION ALL
SELECT * FROM orders WHERE status = 'pending' AND customer_id != 100;

-- BAD: Function on indexed column prevents index use
SELECT * FROM customers WHERE LOWER(email) = 'user@example.com';

-- GOOD: Expression index
CREATE INDEX idx_customers_email_lower ON customers(LOWER(email));
SELECT * FROM customers WHERE LOWER(email) = 'user@example.com';
```
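The correlated-subquery rewrite is behavior-preserving: both forms return one row per customer with the same counts, including zero-order customers, which is why the rewrite needs a `LEFT JOIN` rather than an inner join. A quick `sqlite3` check (illustrative data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bo'), (3, 'Cy'), (4, 'Dee');
    INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2);  -- Cy and Dee have no orders
""")

correlated = con.execute("""
    SELECT c.name,
           (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS order_count
    FROM customers c
    ORDER BY c.id
""").fetchall()

rewritten = con.execute("""
    SELECT c.name, COUNT(o.id) AS order_count
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

print(correlated)
print(rewritten)
```

Note `COUNT(o.id)` rather than `COUNT(*)` in the rewrite: it counts only non-NULL joined rows, so customers with no orders correctly report 0.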
## Anti-Patterns
- **Not using EXPLAIN ANALYZE**: Guessing at performance problems wastes time. Always measure first with `EXPLAIN ANALYZE`.
- **Indexing every column**: Indexes cost write performance and storage. Index only columns used in WHERE, JOIN, and ORDER BY of frequent queries.
- **`SELECT *`**: Fetching all columns prevents covering index optimizations and wastes network bandwidth. Select only the columns you need.
- **Ignoring stale statistics**: PostgreSQL's planner uses table statistics to choose plans. Run `ANALYZE` after bulk data changes.
- **Using OFFSET for pagination**: `OFFSET 10000` still scans 10000 rows. Use keyset pagination: `WHERE id > last_seen_id ORDER BY id LIMIT 20`.
- **Premature optimization**: Optimize queries that are actually slow in production, not queries that might be slow someday.
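The keyset alternative from the pagination bullet can be sketched as a small helper (illustrative `page_after` function, `sqlite3` standing in for the real database):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
con.executemany(
    "INSERT INTO orders (id, status) VALUES (?, 'pending')",
    [(i,) for i in range(1, 101)],
)

def page_after(last_seen_id: int, size: int = 20) -> list[int]:
    # Keyset pagination: seek directly past the last row of the previous page
    # instead of scanning and discarding OFFSET rows.
    rows = con.execute(
        "SELECT id FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, size),
    )
    return [r[0] for r in rows]

first = page_after(0)           # page 1
second = page_after(first[-1])  # page 2 starts where page 1 ended
print(first[-1], second[0])
```

Each page is an index seek plus a short scan, so page 500 costs the same as page 1; the trade-off is that you can only step forward from a known key, not jump to an arbitrary page number.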
## Quick Reference
```sql
-- Find slow queries (PostgreSQL)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Find unused indexes
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- Find missing indexes (sequential scans on large tables)
SELECT relname, seq_scan, seq_tup_read, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > 100 AND pg_relation_size(relid) > 10000000
ORDER BY seq_tup_read DESC;

-- Table and index sizes
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       pg_size_pretty(pg_relation_size(relid)) AS table_size,
       pg_size_pretty(pg_indexes_size(relid)) AS index_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

-- Keyset pagination (fast, stable)
SELECT * FROM orders
WHERE (ordered_at, id) < ('2024-01-15 10:30:00', 5000)
ORDER BY ordered_at DESC, id DESC
LIMIT 20;
```