Claude-code-plugins-plus · snowflake-architecture-variants

Install

Source: clone the upstream repo

```bash
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
```

Claude Code: install into ~/.claude/skills/

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/snowflake-pack/skills/snowflake-architecture-variants" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-snowflake-architecture-variants && rm -rf "$T"
```

Manifest: plugins/saas-packs/snowflake-pack/skills/snowflake-architecture-variants/SKILL.md
Snowflake Architecture Variants
Overview
Four validated architecture blueprints for Snowflake deployments: a traditional data warehouse, a lakehouse with Iceberg tables, a data mesh built on data sharing, and a Snowpark-native application.
Variant A: Traditional Data Warehouse
Best for: Single team, centralized analytics, < 50 users
```
┌──────────────────────────────┐
│      Snowflake Account       │
│                              │
│  ┌────────┐    ┌────────┐    │
│  │ Bronze │ →  │ Silver │ → Gold
│  └────────┘    └────────┘    │
│                              │
│  ┌──────────────────────┐    │
│  │ Single ETL Warehouse │    │
│  └──────────────────────┘    │
│                              │
│  ┌──────────┐  ┌──────────┐  │
│  │ BI Tools │  │ Analysts │  │
│  └──────────┘  └──────────┘  │
└──────────────────────────────┘
```
```sql
-- Simple single-account setup
CREATE DATABASE DW;
CREATE SCHEMA DW.RAW;
CREATE SCHEMA DW.CURATED;
CREATE SCHEMA DW.ANALYTICS;

CREATE WAREHOUSE ETL_WH
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 120;

CREATE WAREHOUSE QUERY_WH
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND = 60;
```
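Access control in this variant stays centralized as well. Below is a minimal sketch of one analyst role plus a spend guardrail; the ANALYST role, the DW_MONTHLY monitor, and the 100-credit quota are illustrative assumptions, not part of the blueprint above.

```sql
-- Illustrative analyst role limited to the ANALYTICS schema and the query warehouse
CREATE ROLE ANALYST;
GRANT USAGE ON DATABASE DW TO ROLE ANALYST;
GRANT USAGE ON SCHEMA DW.ANALYTICS TO ROLE ANALYST;
GRANT SELECT ON ALL TABLES IN SCHEMA DW.ANALYTICS TO ROLE ANALYST;
GRANT USAGE ON WAREHOUSE QUERY_WH TO ROLE ANALYST;

-- Illustrative monthly credit guardrail on the ETL warehouse
CREATE RESOURCE MONITOR DW_MONTHLY
  WITH CREDIT_QUOTA = 100
  TRIGGERS ON 90 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;
ALTER WAREHOUSE ETL_WH SET RESOURCE_MONITOR = DW_MONTHLY;
```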
Variant B: Lakehouse with Iceberg Tables
Best for: Hybrid cloud/on-prem, existing data lake, open table format requirement
```
┌──────────────────────┐       ┌──────────────────────┐
│   External Storage   │       │  Snowflake Account   │
│   (S3/GCS/Azure)     │       │                      │
│                      │       │  ┌────────────────┐  │
│  ┌─────────────┐     │ ←───→ │  │ Iceberg Tables │  │
│  │ Parquet/    │     │       │  │ (managed)      │  │
│  │ Iceberg     │     │       │  └────────────────┘  │
│  │ files       │     │       │  ┌────────────────┐  │
│  └─────────────┘     │       │  │ Native Tables  │  │
│                      │       │  │ (hot data)     │  │
│  ┌─────────────┐     │       │  └────────────────┘  │
│  │ Spark/Flink │     │       │  ┌────────────────┐  │
│  │ (external)  │     │       │  │ Dynamic Tables │  │
│  └─────────────┘     │       │  │ (transforms)   │  │
└──────────────────────┘       │  └────────────────┘  │
                               └──────────────────────┘
```
```sql
-- Iceberg table backed by external storage
CREATE ICEBERG TABLE events_iceberg (
  event_id STRING,
  event_type STRING,
  event_data VARIANT,
  event_timestamp TIMESTAMP_NTZ
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'my_s3_volume'
  BASE_LOCATION = 'iceberg/events/';

-- External volume for S3
CREATE EXTERNAL VOLUME my_s3_volume
  STORAGE_LOCATIONS = (
    (NAME = 'primary'
     STORAGE_BASE_URL = 's3://my-data-lake/'
     STORAGE_PROVIDER = 'S3'
     STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789:role/snowflake-iceberg')
  );

-- Dynamic Iceberg table for transforms (writes back to your storage)
CREATE DYNAMIC ICEBERG TABLE curated_events
  TARGET_LAG = '30 minutes'
  WAREHOUSE = ETL_WH
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'my_s3_volume'
  BASE_LOCATION = 'iceberg/curated_events/'
  AS
    SELECT
      event_id,
      event_type,
      event_data,
      event_timestamp,
      CURRENT_TIMESTAMP() AS processed_at
    FROM events_iceberg
    WHERE event_type IS NOT NULL;
```
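Because Snowflake-managed Iceberg tables are queried like any other table, lake-resident Iceberg data and hot native data can be joined in one statement. A small sketch, assuming a hypothetical native customers table with customer_id and segment columns:

```sql
-- Join lake-resident Iceberg events with a hot native table (customers is hypothetical)
SELECT
  e.event_type,
  c.segment,
  COUNT(*) AS events_last_7d
FROM events_iceberg e
JOIN customers c
  ON c.customer_id = e.event_data:customer_id::STRING
WHERE e.event_timestamp >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY e.event_type, c.segment;
```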
Variant C: Data Mesh with Data Sharing
Best for: Multi-team, multi-account, decentralized ownership
```
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│ Finance Account │   │ Marketing Acct  │   │ Engineering     │
│                 │   │                 │   │ Account         │
│ ┌─────────────┐ │   │ ┌─────────────┐ │   │ ┌─────────────┐ │
│ │ Finance DB  │─┼──→│ │ Marketing   │─┼──→│ │ Product DB  │ │
│ │ (owner)     │ │   │ │ DB (owner)  │ │   │ │ (owner)     │ │
│ └─────────────┘ │   │ └─────────────┘ │   │ └─────────────┘ │
│                 │   │                 │   │                 │
│ ┌─────────────┐ │   │ ┌─────────────┐ │   │ ┌─────────────┐ │
│ │ Shared:     │←┼───┼─│ Shared:     │←┼───┼─│ Shared:     │ │
│ │ Product,    │ │   │ │ Finance,    │ │   │ │ Marketing,  │ │
│ │ Marketing   │ │   │ │ Product     │ │   │ │ Finance     │ │
│ └─────────────┘ │   │ └─────────────┘ │   │ └─────────────┘ │
└─────────────────┘   └─────────────────┘   └─────────────────┘
```
```sql
-- PROVIDER: Create a share from Finance account
CREATE SHARE finance_share;
GRANT USAGE ON DATABASE FINANCE_DW TO SHARE finance_share;
GRANT USAGE ON SCHEMA FINANCE_DW.GOLD TO SHARE finance_share;

-- Only share secure views (hides underlying SQL)
CREATE SECURE VIEW FINANCE_DW.GOLD.REVENUE_SUMMARY AS
SELECT
  region,
  product_line,
  SUM(revenue) AS total_revenue,
  COUNT(DISTINCT customer_id) AS customer_count
FROM FINANCE_DW.SILVER.TRANSACTIONS
GROUP BY region, product_line;

GRANT SELECT ON VIEW FINANCE_DW.GOLD.REVENUE_SUMMARY TO SHARE finance_share;

-- Add consumer accounts
ALTER SHARE finance_share ADD ACCOUNTS = myorg.marketing_account, myorg.engineering_account;

-- CONSUMER: Create database from share
CREATE DATABASE FINANCE_SHARED FROM SHARE myorg.finance_account.finance_share;
-- Zero-copy, real-time, no data movement

-- Query shared data as if it's local
SELECT *
FROM FINANCE_SHARED.GOLD.REVENUE_SUMMARY
WHERE region = 'North America';
```
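On the consumer side, the mounted database is exposed to local users through IMPORTED PRIVILEGES; the MARKETING_ANALYST role below is a hypothetical example, not part of the blueprint.

```sql
-- CONSUMER: confirm the inbound share, then expose it to a local role (MARKETING_ANALYST is hypothetical)
SHOW SHARES;
GRANT IMPORTED PRIVILEGES ON DATABASE FINANCE_SHARED TO ROLE MARKETING_ANALYST;
```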
Variant D: Snowpark-Native Application
Best for: ML/AI workloads, Python-heavy teams, stored procedure logic
```python
# Snowpark Python: run Python natively inside Snowflake
import os

from snowflake.snowpark import Session
from snowflake.snowpark.functions import avg, col, sproc, sum as sf_sum, udf

# Create session
session = Session.builder.configs({
    "account": os.environ['SNOWFLAKE_ACCOUNT'],
    "user": os.environ['SNOWFLAKE_USER'],
    "password": os.environ['SNOWFLAKE_PASSWORD'],
    "warehouse": "ML_WH",
    "database": "PROD_DW",
    "schema": "GOLD",
}).create()

# DataFrame API (lazy evaluation, pushdown to Snowflake)
orders_df = session.table("orders")
revenue = (
    orders_df
    .filter(col("order_date") >= "2026-01-01")
    .group_by("customer_id")
    .agg(
        sf_sum("amount").alias("total_spend"),
        avg("amount").alias("avg_order"),
    )
    .filter(col("total_spend") > 1000)
    .sort(col("total_spend").desc())
)
revenue.show()  # Executes in Snowflake, not locally

# Register as stored procedure (runs inside Snowflake)
@sproc(name="train_model", replace=True, is_permanent=True,
       stage_location="@ML_STAGE", packages=["scikit-learn"], session=session)
def train_model(session: Session, table_name: str) -> str:
    df = session.table(table_name).to_pandas()
    from sklearn.ensemble import RandomForestClassifier
    model = RandomForestClassifier()
    model.fit(df[['feature1', 'feature2']], df['label'])
    score = model.score(df[['feature1', 'feature2']], df['label'])
    return f"Trained on {len(df)} rows, score: {score}"

# Register UDF
@udf(name="predict_churn", replace=True, is_permanent=True,
     stage_location="@ML_STAGE", session=session)
def predict_churn(tenure: int, monthly_charge: float) -> float:
    # Illustrative only: assumes a trained model object loaded from the stage at runtime
    return model.predict_proba([[tenure, monthly_charge]])[0][1]
```
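Once registered as permanent objects, the procedure and UDF are callable from plain SQL like any other Snowflake object. The table and column names in this sketch (GOLD.CUSTOMER_FEATURES, customers, tenure, monthly_charge) are hypothetical:

```sql
-- Invoke the registered training procedure (GOLD.CUSTOMER_FEATURES is a hypothetical table)
CALL train_model('GOLD.CUSTOMER_FEATURES');

-- Score rows with the registered UDF (customers and its columns are hypothetical)
SELECT
  customer_id,
  predict_churn(tenure, monthly_charge) AS churn_probability
FROM customers
ORDER BY churn_probability DESC;
```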
Decision Matrix
| Factor | Traditional DW | Lakehouse | Data Mesh | Snowpark |
|---|---|---|---|---|
| Team Size | 1-10 | 5-30 | 10+ (multi-team) | 3-20 |
| Data Volume | Any | Large (10TB+) | Any | Any |
| External Tools | BI only | Spark, Flink, Presto | BI per domain | Python/ML |
| Governance | Centralized | Centralized | Federated | Centralized |
| Complexity | Low | Medium | High | Medium |
| Cost Model | Compute + storage | Reduced storage | Per-domain | Compute-heavy |
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Share access denied | Consumer account not added to the share | Run ALTER SHARE ... ADD ACCOUNTS on the provider account |
| Iceberg sync delay | External catalog lag | Check external volume config |
| Snowpark OOM | Large DataFrame pulled into local memory | Avoid to_pandas() on large data; keep work in Snowpark DataFrames |
| Cross-account query slow | Network latency | Deploy in same region |
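A few diagnostics that map to the rows above, reusing the share, external volume, and warehouse names from the examples on this page:

```sql
-- Share access denied: verify what the share exposes and which accounts can consume it
SHOW GRANTS TO SHARE finance_share;
SHOW SHARES LIKE 'finance_share';

-- Iceberg sync delay: inspect the external volume configuration
DESCRIBE EXTERNAL VOLUME my_s3_volume;

-- Snowpark OOM: give the warehouse more headroom instead of pulling data locally
ALTER WAREHOUSE ML_WH SET WAREHOUSE_SIZE = 'LARGE';
```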
Next Steps
For common anti-patterns, see snowflake-known-pitfalls.