# claude-code-plugins-plus-skills · snowflake-performance-tuning

## Install

Clone the upstream repo:

```shell
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
```

Install into `~/.claude/skills/` for Claude Code:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/snowflake-pack/skills/snowflake-performance-tuning" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-snowflake-performance-tuning && rm -rf "$T"
```

Manifest: `plugins/saas-packs/snowflake-pack/skills/snowflake-performance-tuning/SKILL.md`
# Snowflake Performance Tuning

## Overview
Optimize Snowflake query performance using clustering keys, materialized views, result caching, query profiling, and warehouse tuning.
## Prerequisites

- Access to `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY`
- Understanding of micro-partitions and pruning
- Role with the `MONITOR` privilege on warehouses
## Instructions

### Step 1: Identify Slow Queries

```sql
-- Top 20 slowest queries in the last 24 hours
SELECT
    query_id,
    query_text,
    total_elapsed_time / 1000 AS seconds,
    bytes_scanned / 1e9 AS gb_scanned,
    partitions_scanned,
    partitions_total,
    ROUND(partitions_scanned / NULLIF(partitions_total, 0) * 100, 1) AS pct_scanned,
    warehouse_name,
    warehouse_size
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE execution_status = 'SUCCESS'
  AND start_time >= DATEADD(hours, -24, CURRENT_TIMESTAMP())
  AND query_type = 'SELECT'
ORDER BY total_elapsed_time DESC
LIMIT 20;

-- Queries scanning too many partitions (poor pruning)
SELECT
    query_id,
    query_text,
    partitions_scanned,
    partitions_total,
    bytes_scanned / 1e9 AS gb_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE partitions_scanned > partitions_total * 0.5
  AND partitions_total > 100
  AND start_time >= DATEADD(hours, -24, CURRENT_TIMESTAMP())
ORDER BY partitions_scanned DESC
LIMIT 10;
```
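When post-processing `QUERY_HISTORY` rows in a script, the same pruning heuristic can be applied client-side. A minimal sketch — the function name and parameter defaults are illustrative, but the thresholds match the SQL filter above:

```python
def poor_pruning(partitions_scanned: int, partitions_total: int,
                 min_partitions: int = 100, max_scan_ratio: float = 0.5) -> bool:
    """Flag a query whose scan touched too many micro-partitions.

    Mirrors the SQL filter above: more than 50% of partitions scanned
    on tables with over 100 partitions. Small tables are ignored
    because full scans of them are cheap anyway.
    """
    if partitions_total <= min_partitions:
        return False
    return partitions_scanned > partitions_total * max_scan_ratio

print(poor_pruning(12, 5000))    # False: ~0.2% scanned, pruning works
print(poor_pruning(4200, 5000))  # True: 84% scanned, poor pruning
print(poor_pruning(90, 100))     # False: table too small to matter
```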
### Step 2: Add Clustering Keys

```sql
-- Clustering improves pruning for large tables (> 1 TB).
-- Choose columns used in WHERE and JOIN clauses.

-- Cluster by date (most common filter)
ALTER TABLE orders CLUSTER BY (order_date);

-- Multi-column clustering
ALTER TABLE events CLUSTER BY (event_date, event_type);

-- Check clustering quality (lower depth = better)
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date)');

-- Monitor automatic reclustering
SELECT
    table_name,
    num_rows,
    bytes,
    SYSTEM$CLUSTERING_DEPTH('orders') AS clustering_depth
FROM INFORMATION_SCHEMA.TABLES
WHERE table_name = 'ORDERS';
```
### Step 3: Use Materialized Views

```sql
-- Pre-compute expensive aggregations
CREATE OR REPLACE MATERIALIZED VIEW daily_revenue_mv
  CLUSTER BY (metric_date)
AS
SELECT
    DATE_TRUNC('day', order_date) AS metric_date,
    COUNT(*) AS order_count,
    SUM(amount) AS total_revenue,
    AVG(amount) AS avg_order_value
FROM orders
GROUP BY DATE_TRUNC('day', order_date);

-- Query the MV instead of the base table; the optimizer may also
-- rewrite matching base-table queries to use the MV automatically
SELECT *
FROM daily_revenue_mv
WHERE metric_date >= DATEADD(days, -30, CURRENT_DATE());

-- Check MV freshness (see the refreshed_on and behind_by columns)
SHOW MATERIALIZED VIEWS LIKE 'DAILY_REVENUE_MV';
```
### Step 4: Leverage Result Caching

```sql
-- The result cache is ON by default: repeating an identical query
-- returns instantly if the underlying data hasn't changed.
-- Cached results remain valid for 24 hours.

-- Check whether recent queries used the cache
SELECT
    query_id,
    query_text,
    CASE
        WHEN bytes_scanned = 0 AND rows_produced > 0 THEN 'CACHE HIT'
        ELSE 'CACHE MISS'
    END AS cache_status,
    total_elapsed_time
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(hours, -1, CURRENT_TIMESTAMP())
ORDER BY start_time DESC;

-- Disable the cache for benchmarking
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- Re-enable it
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
```
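The same cache-hit heuristic is handy as a small helper when scanning exported `QUERY_HISTORY` rows; a sketch (the function name is illustrative):

```python
def cache_status(bytes_scanned: int, rows_produced: int) -> str:
    """Mirror of the SQL CASE above: a result-cache hit produces
    rows without scanning any bytes."""
    if bytes_scanned == 0 and rows_produced > 0:
        return "CACHE HIT"
    return "CACHE MISS"

print(cache_status(0, 1500))          # CACHE HIT
print(cache_status(8_000_000, 1500))  # CACHE MISS
```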
### Step 5: Optimize Common Query Patterns

```sql
-- ANTI-PATTERN: SELECT * on wide tables
-- SELECT * FROM events;  -- scans all columns

-- BETTER: select only the needed columns
SELECT event_id, event_type, event_date
FROM events
WHERE event_date = CURRENT_DATE();

-- ANTI-PATTERN: implicit comma-style joins
-- SELECT * FROM a, b WHERE a.id = b.id;

-- BETTER: explicit JOIN with filter pushdown
SELECT a.id, a.name, b.amount
FROM customers a
INNER JOIN orders b ON a.id = b.customer_id
WHERE b.order_date >= '2026-01-01';

-- ANTI-PATTERN: LIKE with a leading wildcard
-- WHERE name LIKE '%smith%'  -- full scan

-- BETTER: enable the search optimization service for such lookups
ALTER TABLE customers ADD SEARCH OPTIMIZATION
  ON EQUALITY(name), SUBSTRING(name);

-- ANTI-PATTERN: many small queries in a loop
-- for row in rows: execute(f"INSERT INTO t VALUES ({row})")

-- BETTER: batch inserts in a single set-based statement
INSERT INTO target_table
SELECT * FROM source_table WHERE condition;
```
### Step 6: Query Profile Analysis

```sql
-- Use EXPLAIN to inspect the execution plan
EXPLAIN
SELECT * FROM orders WHERE order_date = CURRENT_DATE();

-- Get per-operator profile data programmatically
SELECT * FROM TABLE(GET_QUERY_OPERATOR_STATS('<query_id>'));

-- Key metrics to watch:
--   TableScan: partitions_scanned vs partitions_total
--   Filter:    a filter applied AFTER the scan suggests adding a clustering key
--   Sort:      heavy spilling to remote storage means the warehouse is undersized
--   Join:      broadcast vs hash strategy, and row-count skew
```
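Profile output can also be scanned mechanically. A sketch that flags the two most common problems in a list of operator rows; the dict keys are assumptions standing in for values extracted from `GET_QUERY_OPERATOR_STATS` output (which returns operator statistics as JSON):

```python
def profile_warnings(operators: list[dict]) -> list[str]:
    """Flag poor pruning and remote spilling in flattened operator rows.

    Keys like 'partitions_scanned' and 'bytes_spilled_remote' are
    assumed to have been extracted from the operator-statistics JSON.
    """
    warnings = []
    for op in operators:
        if op.get("operator_type") == "TableScan":
            total = op.get("partitions_total", 0)
            scanned = op.get("partitions_scanned", 0)
            if total > 100 and scanned > total * 0.5:
                warnings.append("poor pruning in TableScan")
        if op.get("bytes_spilled_remote", 0) > 0:
            warnings.append(f"remote spilling in {op['operator_type']}")
    return warnings

ops = [
    {"operator_type": "TableScan", "partitions_total": 5000,
     "partitions_scanned": 4800},
    {"operator_type": "Sort", "bytes_spilled_remote": 2_000_000_000},
]
print(profile_warnings(ops))
# ['poor pruning in TableScan', 'remote spilling in Sort']
```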
### Step 7: Warehouse Tuning

```sql
-- Match warehouse size to the workload:
--   Small:  simple queries, < 100 GB scanned
--   Medium: moderate joins, 100 GB - 1 TB
--   Large:  complex analytics, > 1 TB scanned

-- Scale UP for single-query performance
ALTER WAREHOUSE ANALYTICS_WH SET WAREHOUSE_SIZE = 'LARGE';

-- Scale OUT for concurrent queries (multi-cluster)
ALTER WAREHOUSE ANALYTICS_WH SET
    MIN_CLUSTER_COUNT = 1
    MAX_CLUSTER_COUNT = 4
    SCALING_POLICY = 'STANDARD';

-- Monitor warehouse efficiency. Aggregate each view separately before
-- joining, so hourly credit rows aren't multiplied by query rows.
WITH credits AS (
    SELECT warehouse_name, SUM(credits_used) AS total_credits
    FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
    WHERE start_time >= DATEADD(days, -7, CURRENT_TIMESTAMP())
    GROUP BY warehouse_name
),
queries AS (
    SELECT warehouse_name, COUNT(DISTINCT query_id) AS total_queries
    FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
    WHERE start_time >= DATEADD(days, -7, CURRENT_TIMESTAMP())
    GROUP BY warehouse_name
)
SELECT
    c.warehouse_name,
    c.total_credits,
    q.total_queries,
    c.total_credits / NULLIF(q.total_queries, 0) AS credits_per_query
FROM credits c
JOIN queries q ON c.warehouse_name = q.warehouse_name
ORDER BY c.total_credits DESC;
```
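The sizing guideline in the comments above can be captured as a tiny lookup; a sketch where the thresholds come from the rule of thumb in this step, not from any official Snowflake sizing rule:

```python
def suggest_warehouse_size(gb_scanned: float) -> str:
    """Starting-point warehouse size for a typical scan volume,
    per the rule of thumb above (not an official sizing rule)."""
    if gb_scanned < 100:
        return "SMALL"
    if gb_scanned <= 1000:
        return "MEDIUM"
    return "LARGE"

print(suggest_warehouse_size(20))    # SMALL
print(suggest_warehouse_size(500))   # MEDIUM
print(suggest_warehouse_size(5000))  # LARGE
```

Treat the result as a starting point, then adjust based on observed spilling and queue times rather than scan volume alone.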
## Performance Benchmarks
| Optimization | Typical Improvement |
|---|---|
| Clustering key on filter column | 10-100x fewer partitions scanned |
| Materialized view | 10-1000x for aggregation queries |
| Result cache hit | Near-instant (no bytes scanned) |
| Column pruning (SELECT specific cols) | 2-10x less data scanned |
| Search optimization service | 10-100x for point lookups |
## Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Spilling to disk | Warehouse too small for query | Scale up warehouse size |
| High partition scan ratio | No clustering on filter column | Add clustering key |
| MV stale | Background refresh delayed | Check MV refresh status |
| Cache miss on same query | Data changed or session setting | Verify `USE_CACHED_RESULT = TRUE` and that the query text is identical |
## Next Steps

For cost optimization, see `snowflake-cost-tuning`.