NWave nw-query-optimization

SQL and NoSQL query optimization techniques, indexing strategies, execution plan analysis, JOIN algorithms, cardinality estimation, and database-specific query patterns

install
source · Clone the upstream repo
git clone https://github.com/nWave-ai/nWave
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/nWave-ai/nWave "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/nw/skills/nw-query-optimization" ~/.claude/skills/nwave-ai-nwave-nw-query-optimization-6738e2 && rm -rf "$T"
manifest: plugins/nw/skills/nw-query-optimization/SKILL.md
source content

Query Optimization

Cost-Based Optimization

Modern relational DBs use cost-based optimizers (CBO): generate plan candidates -> estimate cost via statistics (row counts, distributions, selectivity) -> select lowest I/O/CPU/memory plan. Stale statistics lead to suboptimal plans.

Execution Plan Analysis

Validate optimization with EXPLAIN before and after changes.

-- PostgreSQL (add ANALYZE for actual runtime stats)
EXPLAIN ANALYZE SELECT order_id, total FROM orders WHERE customer_id = 12345;
-- MySQL: EXPLAIN FORMAT=JSON ... | SQL Server: SET STATISTICS IO ON

Key indicators: Seq Scan/Table Scan = missing index | Index Scan/Seek = efficient | Hash Join = large equality joins | Nested Loop = small/indexed inner | Merge Join = pre-sorted inputs | Sort = watch disk spills

Indexing Strategies

B-Tree (Default)

Supports: equality, range, sorting, prefix matching | O(log n) lookup | General-purpose, all major DBs default

Hash

Equality only | O(1) lookup | High-cardinality exact-match | No range/sorting/pattern support

Covering Indexes

Include all query columns in index -> eliminates table access (index-only scan) | Trade-off: larger index, slower writes

-- Covering index for: SELECT name, email FROM users WHERE status = 'active'
CREATE INDEX idx_users_status_covering ON users(status) INCLUDE (name, email);

PostgreSQL Specialized

  • GiST: Geometric data, full-text search, nearest-neighbor
  • GIN: Arrays, full-text search, JSONB queries
  • BRIN: Large tables with physically correlated data (timestamps), minimal storage
  • SP-GiST: Non-balanced structures, point-based geometric queries

Compound Index Design

Order by: 1. Equality conditions first (highest selectivity) | 2. Sort columns second | 3. Range conditions last

MongoDB ESR Rule

Equality-Sort-Range ordering for compound indexes:

// Query: status = "A", qty > 20, sorted by item
// Optimal index:
db.collection.createIndex({ status: 1, item: 1, qty: 1 })
//                          E(quality)  S(ort)   R(ange)

SQL Optimization Patterns

Select Only Needed Columns

-- Bad: SELECT * retrieves unnecessary data, prevents covering indexes
SELECT * FROM orders WHERE customer_id = 12345;

-- Good: Specify columns, enables covering index
SELECT order_id, order_date, total FROM orders WHERE customer_id = 12345;

Other Key Patterns

  • CTEs: Improve readability but not always performance -- PostgreSQL may materialize CTEs (pre-v12), MySQL inlines them
  • Window functions: Use
    SUM() OVER
    ,
    RANK() OVER (PARTITION BY ...)
    for analytics without self-joins
  • Pagination: Prefer keyset (
    WHERE id > last_seen ORDER BY id LIMIT N
    ) over OFFSET for deep pages
  • Parameterized queries: Prevent SQL injection AND enable plan caching (
    cursor.execute("... WHERE id = %s", (id,))
    )

JOIN Algorithm Selection

AlgorithmBest WhenCost
Nested LoopSmall outer table, indexed inner tableO(n * m) worst, O(n * log m) with index
Hash JoinLarge tables, equality joins, no useful indexesO(n + m) build + probe
Merge JoinBoth inputs already sorted (index order)O(n + m) after sort

Cardinality Estimation

Optimizer predicts row counts using: Histograms (value distribution) | Density vectors (non-histogram columns) | Statistics objects via ANALYZE (PostgreSQL) / UPDATE STATISTICS (SQL Server)

When estimation is wrong (correlated columns, skewed data, multi-table joins): 1. Run ANALYZE/UPDATE STATISTICS | 2. Create multi-column statistics | 3. Query hints as last resort

NoSQL Query Optimization

MongoDB

Place

$match
/
$project
early in pipelines | Use
$lookup
sparingly (left outer joins) | Compound indexes following ESR | Validate with
explain("executionStats")

Cassandra

Always include partition key | Design tables around query patterns (query-first) | Use SAI over SASI (43% throughput gain) | Avoid ALLOW FILTERING (full cluster scan) | Materialized views add write overhead

DynamoDB

Use Query not Scan | Design partition keys for even distribution | GSIs for alternative access patterns | Single-table design with composite sort keys

Redis

FT.SEARCH for complex queries (RediSearch module) | Design key naming for efficient SCAN | Use pipelining for batch ops

Anti-Patterns to Detect

  • **SELECT ***: Wastes I/O, prevents covering indexes
  • Missing indexes on WHERE/JOIN/ORDER BY columns: full table scans
  • N+1 queries: Fetch in loops instead of JOINs/batch
  • Implicit type conversions: Prevents index use (WHERE varchar_col = 123)
  • Functions on indexed columns:
    WHERE UPPER(name) = 'JOHN'
    blocks index; use function-based indexes
  • Missing pagination: Unbounded result sets
  • Hot partitions (NoSQL): Low-cardinality partition keys concentrate load
  • ALLOW FILTERING (Cassandra): Expensive full-cluster scans
  • Large partitions (Cassandra): >100MB degrades performance
nw-query-optimization — OpenSkillIndex