Marketplace query-optimizer
Analyze and optimize SQL queries for better performance and efficiency.
```sh
# Clone the marketplace repository
git clone https://github.com/aiskillstore/marketplace
```

```sh
# Or install just this skill into ~/.claude/skills
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/aiskillstore/marketplace "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/skills/curiouslearner/query-optimizer" ~/.claude/skills/aiskillstore-marketplace-query-optimizer \
  && rm -rf "$T"
```
skills/curiouslearner/query-optimizer/SKILL.md

Query Optimizer Skill
Analyze and optimize SQL queries for better performance and efficiency.
Instructions
You are a database performance optimization expert. When invoked:
1. Analyze Query Performance:
   - Use EXPLAIN/EXPLAIN ANALYZE to understand the execution plan
   - Identify slow queries from logs
   - Measure query execution time
   - Detect full table scans and missing indexes

2. Identify Bottlenecks:
   - Find N+1 query problems
   - Detect inefficient JOINs
   - Identify missing or unused indexes
   - Spot suboptimal WHERE clauses

3. Optimize Queries:
   - Add appropriate indexes
   - Rewrite queries for better performance
   - Suggest caching strategies
   - Recommend query restructuring

4. Provide Recommendations:
   - Index creation suggestions
   - Query rewriting alternatives
   - Database configuration tuning
   - Monitoring and alerting setup
Supported Databases
- SQL: PostgreSQL, MySQL, MariaDB, SQL Server, SQLite
- Analysis Tools: EXPLAIN, EXPLAIN ANALYZE, Query Profiler
- Monitoring: pg_stat_statements, slow query log, performance schema
Usage Examples
```
@query-optimizer
@query-optimizer --analyze-slow-queries
@query-optimizer --suggest-indexes
@query-optimizer --explain SELECT * FROM users WHERE email = 'test@example.com'
@query-optimizer --fix-n-plus-one
```
Query Analysis Tools
PostgreSQL - EXPLAIN ANALYZE
```sql
-- Basic EXPLAIN
EXPLAIN
SELECT u.id, u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- EXPLAIN ANALYZE - actually runs the query
EXPLAIN ANALYZE
SELECT u.id, u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- EXPLAIN with all options (PostgreSQL)
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT JSON)
SELECT * FROM orders
WHERE user_id = 123 AND created_at >= '2024-01-01';
```
Reading EXPLAIN Output:
```
Seq Scan on users  (cost=0.00..1234.56 rows=10000 width=32)
  Filter: (active = true)

-- Seq Scan = Sequential Scan (full table scan) - BAD for large tables
-- cost=0.00..1234.56 = startup cost..total cost
-- rows=10000 = estimated rows
-- width=32 = average row size in bytes
```
```
Index Scan using idx_users_email on users  (cost=0.29..8.30 rows=1 width=32)
  Index Cond: (email = 'test@example.com'::text)

-- Index Scan = Using index - GOOD
-- Much lower cost than Seq Scan
-- rows=1 = accurate estimate
```
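The same scan-vs-index distinction can be reproduced end to end with SQLite's `EXPLAIN QUERY PLAN` (used here only as a lightweight, runnable stand-in; its plan wording differs from PostgreSQL's, but the before/after effect of adding an index is the same). The schema is hypothetical:

```python
import sqlite3

# Hypothetical in-memory schema, standing in for the users table above
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, active INTEGER)")

def plan(sql):
    # Each EXPLAIN QUERY PLAN row's last column describes the access path
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan("SELECT * FROM users WHERE email = 'test@example.com'")
conn.execute("CREATE INDEX idx_users_email ON users(email)")
after = plan("SELECT * FROM users WHERE email = 'test@example.com'")

print(before)  # a full-table SCAN of users
print(after)   # a SEARCH using idx_users_email
```

The same two-line experiment (plan, add index, plan again) is the quickest sanity check that a new index is actually being picked up.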
MySQL - EXPLAIN
```sql
-- MySQL EXPLAIN
EXPLAIN
SELECT u.id, u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- EXPLAIN with execution stats (MySQL 8.0+)
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123;

-- Show warnings for optimization info
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
SHOW WARNINGS;
```
MySQL EXPLAIN Output:
```
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | users | ALL  | NULL          | NULL | NULL    | NULL | 1000 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
-- type=ALL means full table scan - BAD
-- key=NULL means no index used

+----+-------------+-------+------+-----------------+-----------------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys   | key             | key_len | ref   | rows | Extra |
+----+-------------+-------+------+-----------------+-----------------+---------+-------+------+-------+
|  1 | SIMPLE      | users | ref  | idx_users_email | idx_users_email | 767     | const |    1 | NULL  |
+----+-------------+-------+------+-----------------+-----------------+---------+-------+------+-------+
-- type=ref means index lookup - GOOD
-- key shows the index being used
```
Common Performance Issues
1. Missing Indexes
Problem:
```sql
-- Slow query - full table scan
SELECT * FROM users WHERE email = 'john@example.com';

-- EXPLAIN shows:
-- Seq Scan on users (cost=0.00..1500.00 rows=1 width=100)
--   Filter: (email = 'john@example.com')
```
Solution:
```sql
-- Add an index on the email column
CREATE INDEX idx_users_email ON users(email);

-- Now EXPLAIN shows:
-- Index Scan using idx_users_email on users (cost=0.29..8.30 rows=1 width=100)
--   Index Cond: (email = 'john@example.com')

-- On large tables this can make the query orders of magnitude faster
```
2. N+1 Query Problem
Problem:
```javascript
// ORM code causing N+1 queries
const users = await User.findAll(); // 1 query
for (const user of users) {
  const orders = await Order.findAll({
    where: { userId: user.id } // N queries (one per user)
  });
  console.log(`${user.name}: ${orders.length} orders`);
}
// Total: 1 + N queries for N users
// For 100 users = 101 queries!
```
Solution:
```javascript
// Use eager loading - single query with JOIN
const users = await User.findAll({
  include: [{
    model: Order,
    attributes: ['id', 'total_amount']
  }]
});

for (const user of users) {
  console.log(`${user.name}: ${user.orders.length} orders`);
}
// Total: 1 query regardless of user count
```
SQL Equivalent:
```sql
-- Instead of multiple queries:
SELECT * FROM users;
SELECT * FROM orders WHERE user_id = 1;
SELECT * FROM orders WHERE user_id = 2;
-- ... (N more queries)

-- Use a single JOIN query:
SELECT u.id, u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
```
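A minimal, runnable sketch of the same before/after, using Python's `sqlite3` and its trace callback to count every statement actually issued (hypothetical tables; any ORM produces the same shape):

```python
import sqlite3

# Hypothetical schema with 3 users and a few orders
conn = sqlite3.connect(":memory:")
conn.executescript("""
  CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
  CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
  INSERT INTO users VALUES (1,'a'),(2,'b'),(3,'c');
  INSERT INTO orders VALUES (1,1),(2,1),(3,2);
""")

queries = []
conn.set_trace_callback(lambda sql: queries.append(sql))  # count issued statements

# N+1 pattern: one query for users, then one per user for orders
users = conn.execute("SELECT id, name FROM users").fetchall()
for uid, _name in users:
    conn.execute("SELECT COUNT(*) FROM orders WHERE user_id = ?", (uid,)).fetchone()
n_plus_one = len(queries)

queries.clear()
# Single JOIN replaces all of the above
conn.execute("""
  SELECT u.id, u.name, COUNT(o.id)
  FROM users u LEFT JOIN orders o ON u.id = o.user_id
  GROUP BY u.id, u.name
""").fetchall()
joined = len(queries)

print(n_plus_one, joined)  # 4 statements vs 1
```

With 3 users the loop issues 4 statements; with 100 it would issue 101, while the JOIN version always issues exactly one.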
3. SELECT * Inefficiency
Problem:
```sql
-- Fetching all columns when only a few are needed
SELECT * FROM products WHERE category_id = 5;

-- Fetches: id, name, description (large text), image_url, specs (json),
-- price, stock, created_at, updated_at, etc.
```
Solution:
```sql
-- Only select the needed columns
SELECT id, name, price, stock
FROM products
WHERE category_id = 5;

-- Benefits:
-- - Less data transferred
-- - Faster query execution
-- - Lower memory usage
-- - Can use covering indexes
```
4. Inefficient Pagination
Problem:
```sql
-- OFFSET becomes slow with large offsets
SELECT * FROM users
ORDER BY created_at DESC
LIMIT 20 OFFSET 10000;

-- Database must:
-- 1. Sort all rows
-- 2. Skip 10,000 rows
-- 3. Return the next 20
-- Gets slower as the offset increases
```
Solution:
```sql
-- Use cursor-based (keyset) pagination
-- Cursor = (created_at, id) of the last row on the previous page
SELECT * FROM users
WHERE created_at < '2024-01-01 12:00:00'
   OR (created_at = '2024-01-01 12:00:00' AND id < 12345)
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Or with a single indexed column:
SELECT * FROM users
WHERE id < 10000
ORDER BY id DESC
LIMIT 20;

-- Benefits:
-- - Consistent performance regardless of page
-- - Uses an index efficiently
-- - No need to skip rows
```
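A runnable sketch of keyset pagination, again using `sqlite3` with a hypothetical `users` table; each page is fetched from the `(created_at, id)` cursor of the previous page's last row:

```python
import sqlite3

# Hypothetical users table; SQLite stands in for any SQL database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"2024-01-{i:02d}") for i in range(1, 11)])

def page(last_created_at, last_id, size=3):
    # Keyset predicate: rows strictly "before" the cursor under the
    # (created_at DESC, id DESC) ordering — no OFFSET, no skipped rows
    return conn.execute(
        """
        SELECT id, created_at FROM users
        WHERE created_at < ? OR (created_at = ? AND id < ?)
        ORDER BY created_at DESC, id DESC
        LIMIT ?
        """,
        (last_created_at, last_created_at, last_id, size),
    ).fetchall()

first = page("9999-12-31", 10**9)          # sentinel cursor => newest page
second = page(first[-1][1], first[-1][0])  # cursor from last row of page 1

print([r[0] for r in first])   # the three newest ids
print([r[0] for r in second])  # the next three
```

The `id` tiebreaker is what makes the cursor unambiguous when several rows share the same `created_at`; without it, rows can be skipped or repeated at page boundaries.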
5. Function on Indexed Column
Problem:
```sql
-- Function prevents index usage
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';

-- EXPLAIN shows Seq Scan (index not used)
```
Solution 1 - Store lowercase:
```sql
-- Add a computed column
ALTER TABLE users ADD COLUMN email_lower VARCHAR(255)
  GENERATED ALWAYS AS (LOWER(email)) STORED;

CREATE INDEX idx_users_email_lower ON users(email_lower);

-- Query:
SELECT * FROM users WHERE email_lower = 'john@example.com';
```
Solution 2 - Functional index (PostgreSQL):
```sql
-- Create an index on the function result
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Now the original query uses the index
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
```
Solution 3 - Case-insensitive collation:
```sql
-- PostgreSQL - use the citext type
CREATE EXTENSION IF NOT EXISTS citext;
ALTER TABLE users ALTER COLUMN email TYPE citext;

-- Query without LOWER:
SELECT * FROM users WHERE email = 'john@example.com';
-- Automatically case-insensitive
```
6. Inefficient JOINs
Problem:
```sql
-- Multiple JOINs without proper indexes
SELECT u.username, o.id as order_id, p.name as product_name
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE u.email = 'john@example.com';

-- Slow if indexes are missing on:
-- - users.email
-- - orders.user_id
-- - order_items.order_id
-- - order_items.product_id
```
Solution:
```sql
-- Add the necessary indexes
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);

-- Now the query uses indexes for all JOINs
-- EXPLAIN will show "Index Scan" for each table
```
7. OR Conditions
Problem:
```sql
-- OR prevents efficient index usage
SELECT * FROM users
WHERE username = 'john' OR email = 'john@example.com';
-- May not use indexes optimally
```
Solution:
```sql
-- Use UNION for better index usage
SELECT * FROM users WHERE username = 'john'
UNION
SELECT * FROM users WHERE email = 'john@example.com';

-- Each subquery uses its own index
-- UNION deduplicates the results automatically
```
8. NOT IN with Subquery
Problem:
```sql
-- Slow subquery execution
SELECT * FROM users
WHERE id NOT IN (
  SELECT user_id FROM banned_users
);
-- Can be very slow with large subquery results
```
Solution:
```sql
-- Use LEFT JOIN with a NULL check
SELECT u.*
FROM users u
LEFT JOIN banned_users bu ON u.id = bu.user_id
WHERE bu.user_id IS NULL;

-- Or use NOT EXISTS (often faster):
SELECT u.*
FROM users u
WHERE NOT EXISTS (
  SELECT 1 FROM banned_users bu WHERE bu.user_id = u.id
);
```
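The rewrite is not only about speed: `NOT IN` has a correctness trap, returning no rows at all if the subquery produces even one NULL, while `NOT EXISTS` behaves as expected. A small `sqlite3` demonstration with hypothetical tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
  CREATE TABLE users (id INTEGER PRIMARY KEY);
  CREATE TABLE banned_users (user_id INTEGER);
  INSERT INTO users VALUES (1),(2),(3);
  INSERT INTO banned_users VALUES (2),(NULL);  -- a NULL sneaks in
""")

not_in = conn.execute(
    "SELECT id FROM users WHERE id NOT IN (SELECT user_id FROM banned_users)"
).fetchall()

not_exists = conn.execute("""
    SELECT u.id FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM banned_users b WHERE b.user_id = u.id)
""").fetchall()

print(not_in)      # empty: `x NOT IN (..., NULL)` is never true
print(not_exists)  # the two non-banned users
```

This is standard three-valued SQL logic, not a SQLite quirk: `id <> NULL` evaluates to unknown, so the `NOT IN` predicate can never be true once a NULL is in the list.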
Index Optimization
When to Add Indexes
Add indexes for:
- Primary keys (automatic in most databases)
- Foreign keys (critical for JOINs)
- Columns in WHERE clauses
- Columns in ORDER BY clauses
- Columns in GROUP BY clauses
- Columns in JOIN conditions
- Columns with high cardinality (many unique values)
Index Types
B-Tree Index (Default):
```sql
-- Best for: equality (=) and range (<, >, BETWEEN) queries
CREATE INDEX idx_users_created_at ON users(created_at);

-- Good for:
SELECT * FROM users WHERE created_at > '2024-01-01';
SELECT * FROM users WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31';
```
Composite Index:
```sql
-- Index on multiple columns
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- Used for queries filtering on both columns
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

-- Also used for queries on the first column only (leftmost prefix)
SELECT * FROM orders WHERE user_id = 123;

-- NOT used for queries on the second column only
SELECT * FROM orders WHERE status = 'pending'; -- Won't use this index

-- Column order matters! Only the leftmost prefix is usable,
-- so lead with the columns your queries filter on by equality
```
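The leftmost-prefix rule above can be checked directly with SQLite's `EXPLAIN QUERY PLAN` (used as a stand-in; PostgreSQL and MySQL B-tree composite indexes behave the same way). The schema mirrors the hypothetical `orders` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, status TEXT)")
conn.execute("CREATE INDEX idx_orders_user_status ON orders(user_id, status)")

def plan(sql):
    # Last column of each EXPLAIN QUERY PLAN row describes the access path
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

both   = plan("SELECT * FROM orders WHERE user_id = 1 AND status = 'pending'")
prefix = plan("SELECT * FROM orders WHERE user_id = 1")
second = plan("SELECT * FROM orders WHERE status = 'pending'")

print(both)    # SEARCH using the composite index (both columns)
print(prefix)  # SEARCH using the composite index (leftmost prefix)
print(second)  # full SCAN: second column alone cannot use the index
```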
Partial Index (PostgreSQL):
```sql
-- Index only a subset of rows
CREATE INDEX idx_active_users ON users(email) WHERE active = true;

-- Smaller index, faster queries for active users
SELECT * FROM users
WHERE email = 'john@example.com' AND active = true;
```
GIN Index (PostgreSQL - for arrays, JSONB, full-text):
```sql
-- For JSONB columns
CREATE INDEX idx_products_metadata ON products USING GIN(metadata);

-- Query JSONB data
SELECT * FROM products WHERE metadata @> '{"brand": "Apple"}';

-- For array columns
CREATE INDEX idx_tags ON posts USING GIN(tags);

-- Query arrays
SELECT * FROM posts WHERE tags @> ARRAY['postgresql'];
```
Full-Text Search Index:
```sql
-- PostgreSQL
CREATE INDEX idx_products_search ON products
USING GIN(to_tsvector('english', name || ' ' || description));

-- Full-text search query
SELECT * FROM products
WHERE to_tsvector('english', name || ' ' || description)
      @@ to_tsquery('english', 'laptop & gaming');
```
Covering Index
Concept:
```sql
-- A covering index includes all columns needed by the query
CREATE INDEX idx_users_email_username ON users(email, username);

-- This query can be answered entirely from the index (no table access)
SELECT username FROM users WHERE email = 'john@example.com';

-- PostgreSQL: Index-Only Scan
-- MySQL: Using index
```
With INCLUDE (PostgreSQL 11+):
```sql
-- Include non-key columns in the index leaf nodes
CREATE INDEX idx_users_email ON users(email) INCLUDE (username, created_at);

-- Query can use the index without table access
SELECT username, created_at FROM users WHERE email = 'john@example.com';
```
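SQLite reports index-only access explicitly as a "COVERING INDEX", which makes the concept easy to verify in a runnable sketch (stand-in example; hypothetical schema — PostgreSQL would report an Index-Only Scan for the same shape):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, username TEXT)")
# Both the filter column and the selected column are in the index
conn.execute("CREATE INDEX idx_users_email_username ON users(email, username)")

plan = " ".join(r[-1] for r in conn.execute(
    "EXPLAIN QUERY PLAN SELECT username FROM users WHERE email = 'a@b.c'"
))
print(plan)  # the plan names a COVERING INDEX: no table access needed
```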
Index Maintenance
Find Unused Indexes (PostgreSQL):
```sql
SELECT
  schemaname,
  relname AS table_name,
  indexrelname AS index_name,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Drop unused indexes to save space and improve write performance
```
Find Duplicate Indexes:
```sql
-- PostgreSQL query to find duplicate indexes
SELECT
  indrelid::regclass AS table_name,
  array_agg(indexrelid::regclass) AS indexes
FROM pg_index
GROUP BY indrelid, indkey
HAVING COUNT(*) > 1;
```
Rebuild Fragmented Indexes:
```sql
-- PostgreSQL
REINDEX INDEX idx_users_email;
REINDEX TABLE users;

-- MySQL
OPTIMIZE TABLE users;
```
Query Rewriting Examples
Example 1: Aggregation Optimization
Before:
```sql
SELECT
  u.id,
  u.username,
  (SELECT COUNT(*) FROM orders WHERE user_id = u.id) as order_count,
  (SELECT SUM(total_amount) FROM orders WHERE user_id = u.id) as total_spent
FROM users u
WHERE u.active = true;

-- Two correlated subqueries, each executed once per user row
```
After:
```sql
SELECT
  u.id,
  u.username,
  COUNT(o.id) as order_count,
  COALESCE(SUM(o.total_amount), 0) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- Single query with a JOIN - much faster!
```
Example 2: EXISTS vs IN
Before:
```sql
SELECT * FROM products
WHERE id IN (
  SELECT product_id FROM order_items
  WHERE created_at >= '2024-01-01'
);
-- Subquery returns all product_ids (potentially a large result set)
```
After:
```sql
SELECT p.*
FROM products p
WHERE EXISTS (
  SELECT 1 FROM order_items oi
  WHERE oi.product_id = p.id
    AND oi.created_at >= '2024-01-01'
);
-- EXISTS stops at the first match (often more efficient)
```
Example 3: Avoid Cartesian Products
Before:
```sql
-- Accidental cartesian product
SELECT *
FROM users u, orders o
WHERE u.active = true AND o.status = 'completed';

-- Returns every user combined with every completed order!
-- Missing JOIN condition
```
After:
```sql
SELECT u.*, o.*
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE u.active = true AND o.status = 'completed';
-- Proper JOIN condition
```
Example 4: Optimize DISTINCT
Before:
```sql
SELECT DISTINCT user_id
FROM orders
WHERE status = 'completed';
-- DISTINCT requires sorting/deduplication
```
After:
```sql
-- GROUP BY is sometimes planned more efficiently than DISTINCT
SELECT user_id
FROM orders
WHERE status = 'completed'
GROUP BY user_id;

-- Or, to keep one full row per user (PostgreSQL only):
SELECT DISTINCT ON (user_id) user_id, created_at
FROM orders
WHERE status = 'completed'
ORDER BY user_id, created_at DESC;
```
Monitoring Slow Queries
PostgreSQL - pg_stat_statements
```sql
-- Enable the extension
-- (pg_stat_statements must also be listed in shared_preload_libraries,
-- which requires a server restart)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find the slowest queries
SELECT
  substring(query, 1, 50) AS short_query,
  round(total_exec_time::numeric, 2) AS total_time,
  calls,
  round(mean_exec_time::numeric, 2) AS mean_time,
  round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS percentage
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Find queries with the most calls
SELECT
  substring(query, 1, 50) AS short_query,
  calls,
  round(mean_exec_time::numeric, 2) AS mean_time
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;

-- Reset statistics
SELECT pg_stat_statements_reset();
```
MySQL - Slow Query Log
```sql
-- Enable the slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- Log queries taking > 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow-query.log';

-- Also log queries not using indexes
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Analyze the slow query log with the mysqldumpslow tool:
-- mysqldumpslow -s t -t 10 /var/log/mysql/slow-query.log
```
Performance Schema (MySQL)
```sql
-- performance_schema is enabled by default in modern MySQL;
-- it is read-only at runtime, so enable it in my.cnf if it is off

-- Find the slowest statements
SELECT
  DIGEST_TEXT,
  COUNT_STAR AS executions,
  ROUND(AVG_TIMER_WAIT / 1000000000, 2) AS avg_ms,
  ROUND(SUM_TIMER_WAIT / 1000000000, 2) AS total_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
```
Best Practices
DO ✓
- Use EXPLAIN before and after optimization
- Add indexes on foreign keys - Critical for JOINs
- Use covering indexes when possible
- Paginate large result sets - Avoid loading all data
- Monitor query performance - Use pg_stat_statements or slow query log
- Test on production-like data - Performance differs with data volume
- Use connection pooling - Reduce connection overhead
- Cache frequently accessed data - Redis, Memcached
- Archive old data - Keep active tables smaller
- Regular VACUUM/ANALYZE (PostgreSQL) - Update statistics
DON'T ✗
- Don't use SELECT * - Fetch only needed columns
- Don't over-index - Each index slows down writes
- Don't ignore EXPLAIN warnings - They indicate problems
- Don't use functions on indexed columns - Prevents index usage
- Don't fetch more data than needed - Use LIMIT
- Don't use OFFSET for deep pagination - Use cursor-based instead
- Don't ignore database logs - Monitor for errors
- Don't optimize prematurely - Profile first, optimize bottlenecks
- Don't forget about write performance - Indexes slow down INSERTs
- Don't skip testing - Verify optimizations actually help
Query Optimization Checklist
- [ ] Run EXPLAIN/EXPLAIN ANALYZE on the query
- [ ] Check whether the query uses indexes (no Seq Scan on large tables)
- [ ] Verify indexes exist on:
  - [ ] Foreign key columns
  - [ ] WHERE clause columns
  - [ ] JOIN condition columns
  - [ ] ORDER BY columns
- [ ] SELECT only needed columns (avoid SELECT *)
- [ ] Use the appropriate JOIN type (INNER vs LEFT)
- [ ] Avoid N+1 queries (use JOINs or eager loading)
- [ ] Use pagination for large result sets
- [ ] Check for unused indexes (they slow down writes)
- [ ] Consider query caching for frequent queries
- [ ] Test with production-like data volumes
- [ ] Monitor query performance over time
Notes
- Always measure before and after optimization
- Index creation can take time on large tables
- Too many indexes slow down INSERT/UPDATE/DELETE
- Keep database statistics up to date (ANALYZE)
- Consider read replicas for read-heavy workloads
- Use database-specific features when beneficial
- Document optimization decisions for team
- Regular performance audits prevent degradation