Harness-engineering harness-sql-review

Harness SQL Review

install

source · Clone the upstream repo

git clone https://github.com/Intense-Visions/harness-engineering

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/Intense-Visions/harness-engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/skills/claude-code/harness-sql-review" ~/.claude/skills/intense-visions-harness-engineering-harness-sql-review-fa1fca && rm -rf "$T"

manifest: agents/skills/claude-code/harness-sql-review/SKILL.md

source content

Harness SQL Review

Adversarial review of SQL queries for performance anti-patterns, missing indexes, N+1 queries, and unsafe operations. Analyzes raw SQL, ORM-generated queries, and migration scripts to produce optimization recommendations with estimated impact.

When to Use

When reviewing a PR that adds or modifies SQL queries, repository methods, or database access patterns
When investigating slow database performance or timeout issues
When adding new database tables or indexes and validating query access patterns
NOT for schema design or migration safety (use harness-database)
NOT for data pipeline DAG structure or ETL patterns (use harness-data-pipeline)
NOT for ORM selection or configuration decisions (use harness-database)

Process

Phase 1: SCAN -- Locate SQL Queries

Resolve project root. Use provided path or cwd.
Detect SQL dialect. Determine the database from project configuration:
- PostgreSQL:
```
DATABASE_URL
```
  containing
```
postgres
```
  , Prisma
```
provider = "postgresql"
```
  ,
```
pg
```
  package
- MySQL:
```
DATABASE_URL
```
  containing
```
mysql
```
  , Prisma
```
provider = "mysql"
```
  ,
```
mysql2
```
  package
- SQLite:
```
.sqlite
```
  or
```
.db
```
  files, Prisma
```
provider = "sqlite"
```
  ,
```
better-sqlite3
```
  package
- MSSQL:
```
DATABASE_URL
```
  containing
```
sqlserver
```
  ,
```
mssql
```
  or
```
tedious
```
  package
- Override with
```
--dialect
```
  if auto-detection fails
Extract raw SQL queries. Scan source files for:
- SQL string literals: template literals containing
```
SELECT
```
  ,
```
INSERT
```
  ,
```
UPDATE
```
  ,
```
DELETE
```
  ,
```
CREATE
```
- Query builder calls:
```
.query()
```
  ,
```
.raw()
```
  ,
```
.execute()
```
  ,
```
$queryRaw
```
  ,
```
$executeRaw
```
- SQL files:
```
*.sql
```
  in
```
queries/
```
  ,
```
src/**/sql/
```
  , migration directories
- Stored procedures and views referenced in application code
Extract ORM queries. Identify ORM usage and extract the implied SQL:
- Prisma:
```
findMany
```
  ,
```
findUnique
```
  ,
```
include
```
  ,
```
select
```
  with nested relations
- TypeORM:
```
createQueryBuilder
```
  ,
```
find
```
  with
```
relations
```
  ,
```
@OneToMany
```
  eager loading
- Sequelize:
```
findAll
```
  with
```
include
```
  , raw queries
- Knex: query chain analysis (
```
.select()
```
  ,
```
.where()
```
  ,
```
.join()
```
  )
- Drizzle: query builder chains and relational queries
Scope to PR changes. If triggered by a PR, focus on queries in changed files. Also scan unchanged files that import changed models (they may be affected by schema changes).
Build query inventory. For each query, record:
- Source file and line number
- Query type (SELECT, INSERT, UPDATE, DELETE, DDL)
- Tables accessed
- Whether it is in a loop (potential N+1)
- Estimated complexity (simple, join, subquery, CTE, window function)

Phase 2: ANALYZE -- Evaluate Query Patterns

Detect N+1 queries. The most common and impactful anti-pattern:
- Look for queries inside loops:
```
for
```
  /
```
forEach
```
  /
```
map
```
  containing
```
await db.query
```
- Look for ORM eager loading gaps:
```
findMany
```
  followed by individual
```
findUnique
```
  per result
- Look for GraphQL resolvers that query per-field without DataLoader
- Classify: Error if in a known hot path, Warning if in administrative code
Detect missing indexes. For each
```
WHERE
```
,
```
JOIN
```
, and
```
ORDER BY
```
clause:
- Identify the columns being filtered, joined, or sorted
- Check if a supporting index exists (read migration files,
```
CREATE INDEX
```
  statements, Prisma
```
@@index
```
  )
- Flag unindexed columns on tables likely to be large (referenced in multiple queries or with
```
count
```
  operations)
- Recommend composite indexes when queries filter on multiple columns
Detect full table scans. Flag queries that:
- Use
```
SELECT *
```
  on large tables without a
```
WHERE
```
  clause
- Use
```
LIKE '%pattern%'
```
  (leading wildcard prevents index use)
- Use functions on indexed columns in
```
WHERE
```
  (
```
WHERE LOWER(email) = ...
```
  )
- Use
```
OR
```
  across different columns without index support
- Use
```
NOT IN
```
  with large subqueries
Detect unsafe operations. Flag queries that:
- ```
UPDATE
```
  or
```
DELETE
```
  without a
```
WHERE
```
  clause
- Use
```
TRUNCATE
```
  in application code (should be migration/admin only)
- Perform DDL (
```
ALTER TABLE
```
  ,
```
DROP
```
  ) in application code
- Use
```
SELECT ... FOR UPDATE
```
  without a transaction
- Have unbounded result sets (
```
SELECT
```
  without
```
LIMIT
```
  on user-facing endpoints)
Detect join inefficiencies. Evaluate:
- Cartesian products (missing
```
ON
```
  clause or cross join without intent)
- Joining on non-indexed columns
- Joining large tables without filtering first (suggest subquery or CTE)
- Multiple joins that could be replaced with a single denormalized query
- Correlated subqueries that could be rewritten as joins
Detect transaction issues. Check for:
- Long-running transactions (multiple queries without commit)
- Read-after-write without transaction isolation
- Deadlock-prone patterns (acquiring locks in inconsistent order)
- Missing transactions on multi-step writes

Phase 3: OPTIMIZE -- Produce Recommendations

Generate optimized query alternatives. For each finding, provide:
- The original query (with file location)
- The optimized alternative
- Explanation of why the optimization helps
- Estimated impact (order of magnitude: "reduces from O(N) queries to O(1)")

Recommend index additions. For each missing index:

-- Recommended: speeds up user lookup by email in login flow
-- Estimated impact: O(N) full scan -> O(log N) index seek
CREATE INDEX idx_users_email ON users (email);

Include:

Index type recommendation (B-tree, hash, GIN, GiST) based on query pattern
Composite index column order (most selective first)
Partial index suggestions when queries always filter on a condition

Recommend batching strategies for N+1. Provide specific alternatives:
- ORM eager loading: show the
```
include
```
  /
```
relations
```
  /
```
with
```
  syntax
- DataLoader pattern: show the DataLoader implementation for GraphQL resolvers
- ```
WHERE IN
```
  batching: show the batched query for loop-based N+1
- SQL
```
JOIN
```
  rewrite: show how to combine parent + child in one query
Recommend query rewrites. For complex subqueries and inefficient patterns:
- Correlated subquery -> JOIN or CTE
- ```
NOT IN
```
  ->
```
NOT EXISTS
```
  (NULL-safe and often faster)
- Multiple
```
UNION
```
  -> single query with
```
CASE WHEN
```
- ```
DISTINCT
```
  on large result sets ->
```
GROUP BY
```
  or
```
EXISTS
```
- ```
SELECT *
```
  -> explicit column list
Prioritize by impact. Order recommendations:
- N+1 queries in hot paths (highest impact)
- Missing indexes on high-traffic queries
- Unsafe operations (correctness risk)
- Query rewrites for efficiency
- Style and readability improvements

Phase 4: VALIDATE -- Verify Optimization Correctness

Verify semantic equivalence. For every rewritten query:
- Does it return the same result set as the original?
- Does it handle NULL values the same way? (
```
NOT IN
```
  vs
```
NOT EXISTS
```
  differs on NULLs)
- Does it handle empty sets the same way?
- Does it maintain the same ordering?
Verify index recommendations. For each proposed index:
- Does the index cover the query it is intended to speed up?
- Does the index column order match the query filter order?
- Will the index cause write performance degradation? (flag if table has heavy write load)
- Are there existing indexes that already cover this query? (avoid duplicates)
Check for regression risks. Flag:
- Index additions on tables with high write throughput (may slow inserts)
- Query rewrites that increase memory usage (e.g., large
```
IN
```
  lists)
- Eager loading that may fetch too much data (the "N+1 overcorrection" of loading everything)
- Batched queries that exceed the database's parameter limit

Output structured report. Present findings in review format:

SQL Review: [PASS/NEEDS_ATTENTION/FAIL]
Queries analyzed: N
Findings: E errors, W warnings, I info

ERRORS:
[SQL-N1-001] src/repositories/OrderRepository.ts:45
  N+1 query in getOrdersWithItems(): queries items per order in loop
  Current: 1 + N queries (N = number of orders)
  Recommended: Single query with JOIN or Prisma include
  Impact: O(N) -> O(1) database round trips

[SQL-UNSAFE-001] src/services/CleanupService.ts:23
  DELETE without WHERE clause: db.execute("DELETE FROM temp_records")
  Risk: Deletes all records if called outside intended context
  Recommended: Add WHERE clause with date filter or use TRUNCATE in migration

WARNINGS:
[SQL-IDX-001] src/repositories/UserRepository.ts:78
  Missing index: WHERE email = ? on users table (estimated 500K rows)
  Recommended: CREATE INDEX idx_users_email ON users (email);

Verify no new anti-patterns introduced. Check that recommended optimizations do not introduce new problems (e.g., a JOIN recommendation that creates a Cartesian product, or an index recommendation on a column already indexed).

Harness Integration

harness skill run harness-sql-review
-- Primary command for SQL query analysis.
harness validate
-- Run after applying query optimizations to verify project health.
Glob
-- Used to locate SQL files, repository files, DAO classes, and migration scripts.
Grep
-- Used to extract SQL strings, query builder calls, ORM methods, and index definitions.
Read
-- Used to read query files, repository implementations, and schema definitions.
Write
-- Used to generate optimized query files and index migration scripts.
Bash
-- Used to run
```
EXPLAIN
```
queries when database connection is available and to check migration files.
emit_interaction
-- Used to present the review report and confirm optimization approach for complex rewrites.

Success Criteria

All SQL queries in scope are inventoried with source location and type
N+1 patterns are detected in both raw SQL and ORM usage
Missing indexes are identified with specific
```
CREATE INDEX
```
recommendations
Unsafe operations are flagged with risk assessment
Optimized alternatives are semantically equivalent to originals
Recommendations are prioritized by estimated performance impact
Report follows structured format with actionable findings

Examples

Example: Express.js API with Prisma ORM

Phase 1: SCAN
  Dialect: PostgreSQL (detected from DATABASE_URL)
  ORM: Prisma 5.10
  Queries found: 34 (28 Prisma, 4 $queryRaw, 2 $executeRaw)
  Scope: PR diff (3 changed repository files)

Phase 2: ANALYZE
  [SQL-N1-001] src/routes/orders.ts:67
    findMany orders then findUnique for each order's customer
    Pattern: 1 query for orders + N queries for customers
  [SQL-IDX-001] src/routes/products.ts:23
    findMany with where: { category: catId, status: "active" }
    No composite index on (category_id, status)
  [SQL-UNSAFE-001] src/routes/admin.ts:89
    $executeRaw DELETE FROM sessions (no WHERE clause)

Phase 3: OPTIMIZE
  N+1 fix: Use Prisma include: { customer: true }
  Index: CREATE INDEX idx_products_category_status ON products (category_id, status);
  Safety: Add WHERE expired_at < NOW() to session cleanup

Phase 4: VALIDATE
  All rewrites produce equivalent results: YES
  Index does not duplicate existing: YES (checked schema.prisma)
  Report: FAIL (1 N+1 error, 1 unsafe operation, 1 missing index)

Example: Django REST Framework with Raw SQL

Phase 1: SCAN
  Dialect: PostgreSQL (detected from DATABASES setting)
  ORM: Django ORM + 6 raw SQL queries
  Queries found: 52 (46 ORM, 6 raw)
  Scope: full project audit

Phase 2: ANALYZE
  [SQL-N1-001] views/analytics.py:34
    for report in reports: report.author.name (lazy loading author per report)
  [SQL-SCAN-001] queries/search.sql:12
    WHERE description LIKE '%' || search_term || '%' (leading wildcard, full scan)
  [SQL-JOIN-001] views/dashboard.py:78
    3 sequential queries that could be a single JOIN
  [SQL-TXN-001] views/transfer.py:45
    Debit + credit operations without @transaction.atomic

Phase 3: OPTIMIZE
  N+1: Add select_related('author') to queryset
  Search: Recommend PostgreSQL full-text search with GIN index:
    CREATE INDEX idx_items_description_gin ON items USING GIN (to_tsvector('english', description));
  Join: Combine 3 queries into single query with LEFT JOIN
  Transaction: Wrap debit+credit in @transaction.atomic

Phase 4: VALIDATE
  All optimizations preserve correctness: YES
  Full-text search returns equivalent results: YES (for word-boundary matches)
  Report: FAIL (1 N+1, 1 missing transaction) + 2 warnings

Example: Go Service with sqlx and Raw Queries

Phase 1: SCAN
  Dialect: PostgreSQL (detected from connection string)
  Query library: sqlx
  Queries found: 28 (all raw SQL in repository files)
  SQL files: 12 in queries/ directory

Phase 2: ANALYZE
  [SQL-N1-001] internal/repo/order_repo.go:56
    GetOrdersByUser calls GetItemsByOrderID in loop
  [SQL-IDX-001] queries/search_users.sql:3
    WHERE created_at > $1 AND status = $2 ORDER BY created_at
    Missing composite index on (status, created_at)
  [SQL-PERF-001] queries/report_daily.sql:8
    Correlated subquery for calculating running total
    Recommend: window function SUM() OVER (ORDER BY date)

Phase 3: OPTIMIZE
  N+1: Rewrite with single query using LEFT JOIN and GROUP BY
  Index: CREATE INDEX idx_users_status_created ON users (status, created_at);
  Rewrite: Replace correlated subquery with:
    SELECT date, amount, SUM(amount) OVER (ORDER BY date) as running_total
    FROM daily_revenue;

Phase 4: VALIDATE
  Window function produces identical results: YES
  Index column order matches query pattern: YES (equality on status, range on created_at)
  Report: NEEDS_ATTENTION (1 N+1 error, 1 missing index, 1 query rewrite)

Rationalizations to Reject

Rationalization	Reality
"The ORM handles query optimization automatically"	ORMs generate syntactically correct queries but do not detect N+1 patterns, choose optimal join strategies, or add missing indexes. The ORM executes what the code asks for. A `findMany` followed by per-item `findUnique` calls in a loop is an N+1 regardless of which ORM executes it.
"That endpoint is only called by admins so performance doesn't matter"	Admin endpoints frequently become user-facing as products grow. An N+1 query on a 10-row table becomes a crisis when the table grows to 100,000 rows. Query correctness should not be conditional on current data volume.
"We can add indexes later if performance becomes a problem"	Adding indexes to large production tables requires exclusive locks or online rebuild procedures that carry risk. Identifying and adding the correct index during development, before the table grows, costs minutes instead of hours of planned maintenance.
"That DELETE without a WHERE clause is wrapped in application logic that only calls it correctly"	Application logic has bugs. A missing WHERE clause is a single misrouted request away from deleting the entire table. Database safety constraints must not depend on application-layer correctness.
"The query is fast in development — the test database only has 100 rows"	Development databases do not represent production query plans. Full table scans, missing indexes, and N+1 patterns only manifest at production data volumes. Static analysis catches these issues regardless of local data size.

Gates

No approving N+1 queries in user-facing hot paths. An N+1 query in an endpoint called per page load is always an error. It must be fixed with eager loading, batching, or a JOIN before the PR can merge.
No recommending indexes without checking for duplicates. Before recommending a new index, verify no existing index covers the same columns. Duplicate indexes waste write performance and storage.
No rewriting queries without semantic equivalence verification. Every optimized query must produce the same result set as the original, including NULL handling and ordering. If equivalence cannot be confirmed, flag the rewrite as "needs manual verification."
No ignoring unsafe DELETE/UPDATE without WHERE. These are always errors regardless of context. Even if the developer intends to delete all records, it should be an explicit
```
TRUNCATE
```
or have a documented justification.

Escalation

When query optimization requires schema changes: If the best optimization involves adding a column, denormalizing a table, or changing a relationship, flag it: "Optimal fix for SQL-N1-001 requires a denormalized
```
customer_name
```
column on orders. This is a schema change that needs harness-database review."
When EXPLAIN ANALYZE is needed but no database connection is available: Report that static analysis has limits: "SQL-IDX-001 is flagged based on static analysis. Running EXPLAIN ANALYZE on the actual database would confirm whether a sequential scan is occurring. Consider adding
```
--explain
```
with a database connection."
When an N+1 pattern is intentional: If the developer asserts the N+1 is acceptable (e.g., N is always small, capped at 5), require documentation: "Add a comment explaining the bounded N (max 5) and a test asserting the bound."
When ORM-generated SQL cannot be optimized without dropping to raw SQL: Present the tradeoff: "Prisma cannot express this query efficiently. Options: (A) use
```
$queryRaw
```
for this specific query, (B) accept the suboptimal query with a performance budget note, (C) restructure the data access pattern."