claude-code-plugins-plus-skills · snowflake-migration-deep-dive
Install
Clone the upstream repo:
```bash
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
```
Install into ~/.claude/skills/ (Claude Code):
```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/snowflake-pack/skills/snowflake-migration-deep-dive" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-snowflake-migration-deep-dive && rm -rf "$T"
```
Manifest: plugins/saas-packs/snowflake-pack/skills/snowflake-migration-deep-dive/SKILL.md
Snowflake Migration Deep Dive
Overview
Comprehensive guide for migrating to Snowflake from Redshift, BigQuery, on-prem databases, or other data warehouses.
Migration Types
| Source | Complexity | Duration | Key Challenge |
|---|---|---|---|
| Amazon Redshift | Medium | 2-6 weeks | SQL dialect differences |
| Google BigQuery | Medium | 2-6 weeks | Nested/repeated fields |
| On-prem (Oracle/SQL Server) | High | 1-3 months | Data transfer bandwidth |
| Another Snowflake account | Low | Days | Replication or data sharing |
Instructions
Step 1: Schema Conversion
```sql
-- Common SQL differences from Redshift/BigQuery

-- Redshift DISTKEY/SORTKEY → Snowflake clustering (optional, for large tables)
-- Redshift:
--   CREATE TABLE orders (id INT) DISTSTYLE KEY DISTKEY(customer_id) SORTKEY(order_date);
-- Snowflake:
CREATE TABLE orders (
  id INTEGER AUTOINCREMENT,
  customer_id INTEGER,
  order_date TIMESTAMP_NTZ
);
ALTER TABLE orders CLUSTER BY (order_date);  -- Only for tables > 1TB

-- Redshift IDENTITY → Snowflake AUTOINCREMENT
-- Redshift:  id INT IDENTITY(1,1)
-- Snowflake: id INTEGER AUTOINCREMENT START 1 INCREMENT 1

-- BigQuery STRUCT/ARRAY → Snowflake VARIANT/ARRAY
-- BigQuery:  address STRUCT<street STRING, city STRING>
-- Snowflake:
CREATE TABLE customers (
  id INTEGER,
  address VARIANT  -- Store as JSON: {"street": "...", "city": "..."}
);
-- Access: SELECT address:street::VARCHAR FROM customers

-- BigQuery REPEATED fields → Snowflake ARRAY
-- BigQuery:  tags ARRAY<STRING>
-- Snowflake: tags ARRAY

-- Data type mapping
-- Redshift VARCHAR(MAX)  → Snowflake VARCHAR (16MB max)
-- Redshift TIMESTAMPTZ   → Snowflake TIMESTAMP_TZ
-- BigQuery INT64         → Snowflake NUMBER(38,0)
-- BigQuery FLOAT64       → Snowflake FLOAT
-- BigQuery BYTES         → Snowflake BINARY
-- Oracle CLOB            → Snowflake VARCHAR
-- SQL Server DATETIME2   → Snowflake TIMESTAMP_NTZ
```
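When many tables are involved, the type mapping is easier to apply mechanically than by hand. Below is a minimal, hypothetical sketch of that idea: the `TYPE_MAP` entries mirror the mapping comments above, while the helper function, sample column list, and generated DDL are illustrative assumptions, not part of the original skill.

```python
# Hypothetical helper: map source column types to Snowflake equivalents.
# The mappings mirror the comment table above; extend TYPE_MAP as needed.
TYPE_MAP = {
    # Redshift
    "VARCHAR(MAX)": "VARCHAR",      # Snowflake VARCHAR holds up to 16MB
    "TIMESTAMPTZ": "TIMESTAMP_TZ",
    # BigQuery
    "INT64": "NUMBER(38,0)",
    "FLOAT64": "FLOAT",
    "BYTES": "BINARY",
    # Oracle / SQL Server
    "CLOB": "VARCHAR",
    "DATETIME2": "TIMESTAMP_NTZ",
}

def to_snowflake_type(source_type: str) -> str:
    """Return the Snowflake type for a source type, defaulting to pass-through."""
    key = source_type.strip().upper()
    # Parameterized types like VARCHAR(255) or NUMERIC(10,2) pass through as-is.
    return TYPE_MAP.get(key, key)

# Example: translate a (name, type) column list into a CREATE TABLE statement.
columns = [("id", "INT64"), ("amount", "NUMERIC(10,2)"), ("created_at", "TIMESTAMPTZ")]
body = ",\n  ".join(f"{name} {to_snowflake_type(t)}" for name, t in columns)
print(f"CREATE TABLE orders (\n  {body}\n);")
```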
Step 2: Data Transfer Methods
```bash
# Method 1: Through cloud storage (recommended for large datasets)
# From Redshift → S3 → Snowflake

# Step A: Unload from Redshift to S3
psql -h redshift-cluster.xxx.region.redshift.amazonaws.com -d mydb -c "
UNLOAD ('SELECT * FROM orders')
TO 's3://migration-bucket/redshift/orders/'
IAM_ROLE 'arn:aws:iam::123456789:role/RedshiftUnload'
FORMAT PARQUET;
"

# Step B: Load from S3 to Snowflake
snowsql -c prod -q "
CREATE STAGE migration_stage
  STORAGE_INTEGRATION = s3_integration
  URL = 's3://migration-bucket/redshift/';

COPY INTO orders
FROM @migration_stage/orders/
FILE_FORMAT = (TYPE = 'PARQUET')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
"
```
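Writing each UNLOAD/COPY pair by hand gets tedious past a handful of tables. A hypothetical generator is sketched below; the table list, bucket, and IAM role are placeholders, and the statement shapes simply reuse the two commands above.

```python
# Hypothetical generator: emit matching UNLOAD (Redshift) and COPY (Snowflake)
# statements for a list of tables, reusing the statement shapes shown above.
TABLES = ["orders", "customers", "line_items"]           # placeholder table list
BUCKET = "s3://migration-bucket/redshift"                # placeholder bucket
IAM_ROLE = "arn:aws:iam::123456789:role/RedshiftUnload"  # placeholder role

for table in TABLES:
    unload = (
        f"UNLOAD ('SELECT * FROM {table}') "
        f"TO '{BUCKET}/{table}/' "
        f"IAM_ROLE '{IAM_ROLE}' FORMAT PARQUET;"
    )
    copy = (
        f"COPY INTO {table} FROM @migration_stage/{table}/ "
        f"FILE_FORMAT = (TYPE = 'PARQUET') "
        f"MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;"
    )
    print(unload)
    print(copy)
    print()
```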
```python
# Method 2: Direct transfer via Python (for smaller tables)
import os

import pandas as pd
import psycopg2
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Read from source (Redshift example)
source_conn = psycopg2.connect(
    host='redshift-cluster.xxx.redshift.amazonaws.com',
    dbname='source_db',
    user='admin',
    password='***',
    port=5439,
)
df = pd.read_sql('SELECT * FROM orders', source_conn)
print(f"Read {len(df)} rows from Redshift")

# Write to Snowflake
sf_conn = snowflake.connector.connect(
    account=os.environ['SNOWFLAKE_ACCOUNT'],
    user=os.environ['SNOWFLAKE_USER'],
    password=os.environ['SNOWFLAKE_PASSWORD'],
    warehouse='ETL_WH',
    database='PROD_DW',
    schema='SILVER',
)
success, nchunks, nrows, _ = write_pandas(sf_conn, df, 'ORDERS')
print(f"Loaded {nrows} rows to Snowflake in {nchunks} chunks")
```
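The snippet above reads the whole table into one DataFrame, which only works when it fits in memory. For medium-sized tables, chunked reads keep memory bounded; a minimal sketch, assuming the same `source_conn` and `sf_conn` from the snippet above:

```python
# Hypothetical variant: stream the source table in chunks instead of one
# DataFrame. Reuses source_conn and sf_conn from the snippet above.
import pandas as pd
from snowflake.connector.pandas_tools import write_pandas

total_rows = 0
for chunk in pd.read_sql('SELECT * FROM orders', source_conn, chunksize=100_000):
    _, _, nrows, _ = write_pandas(sf_conn, chunk, 'ORDERS')
    total_rows += nrows
print(f"Loaded {total_rows} rows in chunks of 100k")
```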
Step 3: Data Validation
```sql
-- Row count comparison
SELECT
  'orders' AS table_name,
  (SELECT COUNT(*) FROM prod_dw.silver.orders) AS snowflake_count,
  12345678 AS source_count,  -- Replace with actual source count
  (SELECT COUNT(*) FROM prod_dw.silver.orders) - 12345678 AS diff;

-- Checksum validation (aggregate comparison)
SELECT
  COUNT(*) AS row_count,
  SUM(amount) AS total_amount,
  MIN(order_date) AS min_date,
  MAX(order_date) AS max_date,
  COUNT(DISTINCT customer_id) AS unique_customers
FROM prod_dw.silver.orders;
-- Compare these values with the source system

-- Sample-based validation
SELECT *
FROM prod_dw.silver.orders
WHERE order_id IN (1001, 5000, 10000, 50000, 100000)
ORDER BY order_id;
-- Compare row-by-row with source

-- Data type validation
SELECT
  column_name,
  data_type,
  is_nullable,
  character_maximum_length,
  numeric_precision,
  numeric_scale
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'ORDERS'
ORDER BY ordinal_position;
```
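Rather than eyeballing aggregates on each side, the same checks can be run against both systems and diffed programmatically. A minimal sketch, assuming the `source_conn` and `sf_conn` connections from Step 2 and matching column names on both sides:

```python
# Hypothetical validation harness: run the same aggregate query on source and
# Snowflake, then report any metric that differs. Assumes source_conn and
# sf_conn from Step 2 and identical column names on both systems.
CHECK_SQL = """
    SELECT COUNT(*)                    AS row_count,
           SUM(amount)                 AS total_amount,
           COUNT(DISTINCT customer_id) AS unique_customers
    FROM {table}
"""

def fetch_metrics(conn, table):
    cur = conn.cursor()
    cur.execute(CHECK_SQL.format(table=table))
    row = cur.fetchone()
    cur.close()
    return dict(zip(["row_count", "total_amount", "unique_customers"], row))

source = fetch_metrics(source_conn, "orders")
target = fetch_metrics(sf_conn, "prod_dw.silver.orders")
for metric in source:
    status = "OK" if source[metric] == target[metric] else "MISMATCH"
    print(f"{metric}: source={source[metric]} snowflake={target[metric]} [{status}]")
```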
Step 4: Query Migration
```sql
-- Common SQL translation patterns
-- Redshift: GETDATE()                 → Snowflake: CURRENT_TIMESTAMP()
-- Redshift: DATEDIFF(day, a, b)       → Snowflake: DATEDIFF('day', a, b)  (string date part)
-- Redshift: NVL(a, b)                 → Snowflake: COALESCE(a, b) or NVL(a, b)  (both work)
-- Redshift: LISTAGG(col, ',')         → Snowflake: LISTAGG(col, ',')  (same)
-- Redshift: DECODE(a, 1, 'x', 2, 'y') → Snowflake: DECODE(a, 1, 'x', 2, 'y')  (same)

-- BigQuery: SAFE_DIVIDE(a, b)            → Snowflake: DIV0(a, b) or a / NULLIF(b, 0)
-- BigQuery: FORMAT_DATE('%Y-%m', date)   → Snowflake: TO_CHAR(date, 'YYYY-MM')
-- BigQuery: UNNEST(array)                → Snowflake: LATERAL FLATTEN(input => array)
-- BigQuery: STRUCT access a.b.c          → Snowflake: a:b:c  (colon path notation)

-- Example: BigQuery UNNEST → Snowflake FLATTEN
-- BigQuery:
--   SELECT id, tag FROM orders, UNNEST(tags) AS tag
-- Snowflake:
SELECT o.id, f.value::VARCHAR AS tag
FROM orders o,
LATERAL FLATTEN(input => o.tags) f;
```
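The one-to-one function renames above (GETDATE, SAFE_DIVIDE, FORMAT_DATE) can be scripted as a first pass; structural changes like UNNEST → LATERAL FLATTEN still need a human. A rough, hypothetical sketch of that first pass follows; it is a regex rewriter, not a SQL parser, so every output needs review.

```python
# Hypothetical first-pass translator for the simple function renames above.
# Regex rewriting is NOT a SQL parser: review every output, and handle
# structural changes (UNNEST -> LATERAL FLATTEN) manually.
import re

RULES = [
    (r"\bGETDATE\(\)", "CURRENT_TIMESTAMP()"),
    (r"\bDATEDIFF\(day,", "DATEDIFF('day',"),
    (r"\bSAFE_DIVIDE\(([^,]+),\s*([^)]+)\)", r"DIV0(\1, \2)"),
    (r"FORMAT_DATE\('%Y-%m',\s*([^)]+)\)", r"TO_CHAR(\1, 'YYYY-MM')"),
]

def translate(sql: str) -> str:
    for pattern, replacement in RULES:
        sql = re.sub(pattern, replacement, sql, flags=re.IGNORECASE)
    return sql

print(translate("SELECT GETDATE(), SAFE_DIVIDE(revenue, orders) FROM daily"))
# SELECT CURRENT_TIMESTAMP(), DIV0(revenue, orders) FROM daily
```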
Step 5: Cutover Plan
```text
Week 1-2: Setup
├─ Create Snowflake account and configure
├─ Design schema (converted from source)
├─ Set up storage integration for data transfer
└─ Create roles, warehouses, resource monitors

Week 3-4: Data Migration
├─ Full historical load via cloud storage
├─ Validate row counts and checksums
├─ Convert and test critical queries
└─ Set up ongoing CDC (if parallel run needed)

Week 5-6: Parallel Run
├─ Run both systems simultaneously
├─ Compare query results between source and Snowflake
├─ Migrate BI tools to point at Snowflake
└─ Train users on Snowflake SQL differences

Week 7: Cutover
├─ Final delta sync from source
├─ Switch all connections to Snowflake
├─ Decommission source system (after validation period)
└─ Document and postmortem
```
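The "final delta sync" in Week 7 is typically watermark-based: record the newest timestamp already loaded into Snowflake, then copy only rows newer than it. A sketch under the assumption that the source table carries a reliable `updated_at` column (if it does not, CDC tooling is needed instead); it reuses `source_conn` and `sf_conn` from Step 2.

```python
# Hypothetical watermark-based delta sync for cutover week. Assumes a
# trustworthy updated_at column on the source table (non-empty target) and
# reuses the source_conn / sf_conn connections from Step 2.
import pandas as pd
from snowflake.connector.pandas_tools import write_pandas

# 1. Find the high-water mark already present in Snowflake.
cur = sf_conn.cursor()
cur.execute("SELECT MAX(updated_at) FROM prod_dw.silver.orders")
watermark = cur.fetchone()[0]
cur.close()

# 2. Pull only rows newer than the watermark from the source.
delta = pd.read_sql(
    "SELECT * FROM orders WHERE updated_at > %(wm)s",
    source_conn,
    params={"wm": watermark},
)

# 3. Append the delta (dedup/merge separately if rows are updated in place).
if not delta.empty:
    write_pandas(sf_conn, delta, 'ORDERS')
print(f"Synced {len(delta)} delta rows")
```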
Rollback Plan
```sql
-- Keep source system running for rollback period (2-4 weeks)
-- If rollback needed:
--   1. Redirect connections back to source
--   2. Sync any new data from Snowflake back to source (if needed)
--   3. Document what went wrong

-- Snowflake Time Travel as safety net during migration
-- (retention beyond 1 day requires Enterprise edition or higher)
ALTER DATABASE PROD_DW SET DATA_RETENTION_TIME_IN_DAYS = 30;
```
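Beyond Time Travel, a zero-copy clone taken just before cutover gives a named restore point that survives even if someone later lowers the retention setting. A brief sketch; the checkpoint table name is a placeholder and `sf_conn` is the connection from Step 2.

```python
# Hypothetical pre-cutover checkpoint using Snowflake zero-copy cloning.
# A clone consumes no extra storage until the original diverges, and gives
# a named restore point independent of the Time Travel retention window.
cur = sf_conn.cursor()  # sf_conn from Step 2
cur.execute(
    "CREATE TABLE prod_dw.silver.orders_pre_cutover CLONE prod_dw.silver.orders"
)

# If rollback is needed, swap the checkpoint back in a single statement:
# cur.execute(
#     "ALTER TABLE prod_dw.silver.orders "
#     "SWAP WITH prod_dw.silver.orders_pre_cutover"
# )
cur.close()
```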
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Data type mismatch | Schema conversion error | Use TRY_CAST for safe conversion |
| Row count mismatch | Duplicate handling differs | Check dedup logic in source vs target |
| Query results differ | SQL dialect difference | Test each function translation |
| Transfer too slow | Large dataset, small warehouse | Use LARGE warehouse for COPY INTO |
| Parquet schema evolution | Column added mid-migration | Use MATCH_BY_COLUMN_NAME in COPY INTO so columns load by name |
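For the data type mismatch row above, TRY_CAST returns NULL instead of failing the whole load, which makes bad rows easy to surface before conversion. A quick illustration; the staging table and column names are placeholders, and `sf_conn` is the connection from Step 2.

```python
# Hypothetical TRY_CAST check: surface rows whose raw value will not cast
# cleanly, instead of letting a single bad row fail the entire load.
cur = sf_conn.cursor()  # sf_conn from Step 2; table/column names are placeholders
cur.execute("""
    SELECT raw_amount
    FROM prod_dw.silver.orders_staging
    WHERE raw_amount IS NOT NULL
      AND TRY_CAST(raw_amount AS NUMBER(10,2)) IS NULL
    LIMIT 100
""")
for (bad_value,) in cur.fetchall():
    print(f"Will not cast cleanly: {bad_value!r}")
cur.close()
```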
Next Steps
For advanced troubleshooting, see snowflake-advanced-troubleshooting.