Babysitter sql-query-optimizer

Analyzes and optimizes SQL queries across different data warehouse platforms (Snowflake, BigQuery, Redshift, Databricks) with platform-specific recommendations.

install
source · Clone the upstream repo
git clone https://github.com/a5c-ai/babysitter
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/a5c-ai/babysitter "$T" && mkdir -p ~/.claude/skills && cp -r "$T/library/specializations/data-engineering-analytics/skills/sql-query-optimizer" ~/.claude/skills/a5c-ai-babysitter-sql-query-optimizer && rm -rf "$T"
manifest: library/specializations/data-engineering-analytics/skills/sql-query-optimizer/SKILL.md
source content

SQL Query Optimizer

Analyzes and optimizes SQL queries across different data warehouse platforms with platform-specific recommendations.

Overview

This skill examines SQL queries to identify performance bottlenecks, suggest optimizations, and provide platform-specific recommendations for Snowflake, BigQuery, Redshift, and Databricks. It analyzes query execution plans, recommends indexes/clustering keys, and identifies anti-patterns.

Capabilities

  • Query execution plan analysis - Parse and analyze EXPLAIN outputs
  • Index recommendations - Suggest clustering keys, sort keys, partition keys
  • Join optimization - Identify inefficient join patterns and suggest improvements
  • Subquery elimination - Convert correlated subqueries to CTEs or joins
  • CTE optimization - Materialize vs reference optimization
  • Window function optimization - Frame and partition optimization
  • Predicate pushdown validation - Verify filter pushdown effectiveness
  • Clustering key recommendations - Platform-specific clustering strategies
  • Materialized view suggestions - Identify candidates for materialized views
  • Platform-specific optimizations - Snowflake, BigQuery, Redshift, Databricks

Input Schema

{
  "query": {
    "type": "string",
    "description": "The SQL query to analyze",
    "required": true
  },
  "platform": {
    "type": "string",
    "enum": ["snowflake", "bigquery", "redshift", "databricks", "postgres"],
    "required": true,
    "description": "Target data warehouse platform"
  },
  "tableStatistics": {
    "type": "object",
    "description": "Table statistics including row counts, column cardinality",
    "properties": {
      "tables": {
        "type": "array",
        "items": {
          "name": "string",
          "rowCount": "number",
          "sizeGB": "number",
          "columns": "array"
        }
      }
    }
  },
  "executionPlan": {
    "type": "object",
    "description": "Query execution plan (EXPLAIN output)"
  },
  "queryHistory": {
    "type": "object",
    "description": "Historical query performance metrics"
  },
  "optimizationGoals": {
    "type": "array",
    "items": {
      "type": "string",
      "enum": ["latency", "cost", "throughput", "scan_reduction"]
    },
    "default": ["latency", "cost"]
  }
}

Output Schema

{
  "optimizedQuery": {
    "type": "string",
    "description": "The optimized SQL query"
  },
  "improvements": {
    "type": "array",
    "items": {
      "type": {
        "type": "string",
        "enum": ["join", "predicate", "aggregation", "cte", "window", "scan", "index"]
      },
      "description": "string",
      "impact": "high|medium|low",
      "lineNumber": "number",
      "originalCode": "string",
      "optimizedCode": "string"
    }
  },
  "indexRecommendations": {
    "type": "array",
    "items": {
      "table": "string",
      "type": "clustering|sort|partition|index",
      "columns": "array",
      "rationale": "string",
      "ddl": "string"
    }
  },
  "estimatedImprovement": {
    "scanReduction": {
      "type": "number",
      "description": "Percentage reduction in data scanned"
    },
    "timeReduction": {
      "type": "number",
      "description": "Percentage reduction in execution time"
    },
    "costReduction": {
      "type": "number",
      "description": "Percentage reduction in query cost"
    }
  },
  "antiPatterns": {
    "type": "array",
    "items": {
      "pattern": "string",
      "severity": "high|medium|low",
      "location": "string",
      "suggestion": "string"
    }
  },
  "platformSpecificNotes": {
    "type": "array",
    "items": "string"
  }
}

Usage Examples

Basic Query Optimization

{
  "query": "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id WHERE o.created_at > '2024-01-01'",
  "platform": "snowflake"
}

With Execution Plan Analysis

{
  "query": "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id",
  "platform": "bigquery",
  "executionPlan": {
    "stages": [...],
    "totalBytesProcessed": 1073741824
  },
  "optimizationGoals": ["cost", "scan_reduction"]
}

With Table Statistics

{
  "query": "SELECT ... complex query ...",
  "platform": "redshift",
  "tableStatistics": {
    "tables": [
      {
        "name": "orders",
        "rowCount": 10000000,
        "sizeGB": 50,
        "columns": [
          {"name": "order_id", "cardinality": 10000000},
          {"name": "customer_id", "cardinality": 500000}
        ]
      }
    ]
  }
}

Platform-Specific Optimizations

Snowflake

OptimizationDescription
Clustering keysRecommend micro-partition clustering
Result cacheIdentify queries benefiting from caching
Query accelerationSuggest QUERY_ACCELERATION_MAX_SCALE_FACTOR
Warehouse sizingRight-size warehouse recommendations

BigQuery

OptimizationDescription
PartitioningDATE/TIMESTAMP partitioning recommendations
ClusteringUp to 4 clustering columns
BI EngineIdentify BI Engine-eligible queries
SlotsEstimate slot usage optimization

Redshift

OptimizationDescription
Sort keysCOMPOUND vs INTERLEAVED recommendations
DistributionKEY, EVEN, ALL distribution strategies
CompressionColumn encoding recommendations
VacuumVACUUM and ANALYZE recommendations

Databricks

OptimizationDescription
Z-orderingMulti-column Z-order recommendations
Delta cacheCaching strategy recommendations
PhotonPhoton-eligible query patterns
Adaptive executionAQE configuration suggestions

Common Anti-Patterns Detected

Query Structure

Anti-PatternImpactFix
SELECT *HighSpecify columns explicitly
Correlated subqueriesHighConvert to JOIN or CTE
DISTINCT on large datasetsMediumUse GROUP BY or window functions
Non-SARGable predicatesHighRewrite for index usage

Join Issues

Anti-PatternImpactFix
Cartesian productsCriticalAdd join conditions
Implicit joinsMediumUse explicit JOIN syntax
Wrong join orderHighReorder by selectivity
Missing indexes on join keysHighAdd clustering/sort keys

Aggregation Issues

Anti-PatternImpactFix
GROUP BY ordinalLowUse column names
Aggregating before filterHighFilter first, then aggregate
Over-groupingMediumReduce GROUP BY columns

Integration Points

MCP Server Integration

  • Snowflake MCP - Real-time execution plan analysis
  • BigQuery MCP - Cost estimation and slot analysis
  • Redshift MCP - Query execution and statistics

Related Skills

  • Data Quality Profiler (SK-DEA-005) - Table statistics gathering
  • dbt Project Analyzer (SK-DEA-003) - Model query optimization

Applicable Processes

  • Query Optimization (
    query-optimization.js
    )
  • Data Warehouse Setup (
    data-warehouse-setup.js
    )
  • BI Dashboard Development (
    bi-dashboard.js
    )
  • OBT Creation (
    obt-creation.js
    )

References

Version History

  • 1.0.0 - Initial release with multi-platform support