Skills motherduck
Expert guidance for MotherDuck, the serverless analytics platform built on DuckDB that combines local and cloud query execution. Helps developers run SQL analytics on cloud-hosted data, share datasets, and build hybrid local-cloud data pipelines using DuckDB's familiar interface.
git clone https://github.com/TerminalSkills/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/TerminalSkills/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/motherduck" ~/.claude/skills/terminalskills-skills-motherduck && rm -rf "$T"
skills/motherduck/SKILL.md- pip install
- references .env files
MotherDuck — Serverless DuckDB in the Cloud
Overview
MotherDuck, the serverless analytics platform built on DuckDB that combines local and cloud query execution. Helps developers run SQL analytics on cloud-hosted data, share datasets, and build hybrid local-cloud data pipelines using DuckDB's familiar interface.
Instructions
Connection and Setup
# Connect to MotherDuck from Python import duckdb # Authenticate (opens browser for first-time auth) conn = duckdb.connect("md:") # Or with explicit token conn = duckdb.connect("md:?motherduck_token=your_token_here") # Create a cloud database conn.sql("CREATE DATABASE IF NOT EXISTS analytics") conn.sql("USE analytics") # Cloud databases persist — accessible from any machine
Hybrid Queries (Local + Cloud)
# Query that combines local files with cloud tables import duckdb conn = duckdb.connect("md:analytics") # Upload a local Parquet file to MotherDuck conn.sql(""" CREATE OR REPLACE TABLE events AS SELECT * FROM read_parquet('local_data/events_2026_q1.parquet') """) # Query cloud data alongside local files result = conn.sql(""" SELECT e.event_type, COUNT(*) AS event_count, AVG(e.duration_ms) AS avg_duration, u.plan AS user_plan FROM analytics.events e -- Cloud table JOIN read_parquet('local_data/users.parquet') u -- Local file ON e.user_id = u.id WHERE e.created_at >= '2026-01-01' GROUP BY 1, 4 ORDER BY 2 DESC """).df() print(result)
Data Loading
# Load data from various sources into MotherDuck conn = duckdb.connect("md:analytics") # From CSV files (local or URL) conn.sql(""" CREATE OR REPLACE TABLE sales AS SELECT * FROM read_csv('https://data.example.com/sales_2026.csv', header=true, auto_detect=true) """) # From Parquet files on S3 conn.sql(""" CREATE OR REPLACE TABLE events AS SELECT * FROM read_parquet('s3://my-bucket/events/*.parquet', hive_partitioning=true) """) # From JSON files conn.sql(""" CREATE OR REPLACE TABLE logs AS SELECT * FROM read_json_auto('logs/*.json') """) # Incremental loading — append new data conn.sql(""" INSERT INTO events SELECT * FROM read_parquet('s3://my-bucket/events/2026-03/*.parquet') WHERE event_id NOT IN (SELECT event_id FROM events) """)
Sharing Databases
# Share a database with team members conn = duckdb.connect("md:") # Create a shared database conn.sql("CREATE DATABASE team_analytics") conn.sql("USE team_analytics") # Load data conn.sql(""" CREATE TABLE monthly_metrics AS SELECT date_trunc('month', created_at) AS month, COUNT(DISTINCT user_id) AS mau, SUM(revenue) AS total_revenue, COUNT(*) AS total_events FROM analytics.events GROUP BY 1 ORDER BY 1 """) # Share with specific users (via MotherDuck UI or SQL) conn.sql("GRANT SELECT ON DATABASE team_analytics TO 'colleague@company.com'") # Recipients connect and query immediately: # conn = duckdb.connect("md:team_analytics") # conn.sql("SELECT * FROM monthly_metrics").df()
Node.js / TypeScript Integration
// src/analytics/motherduck.ts — MotherDuck in a Node.js application import duckdb from "duckdb-async"; const db = await duckdb.Database.create("md:analytics", { motherduck_token: process.env.MOTHERDUCK_TOKEN!, }); // Run analytics queries from your API async function getMonthlyRevenue(months: number = 12) { const result = await db.all(` SELECT date_trunc('month', created_at) AS month, SUM(amount) AS revenue, COUNT(DISTINCT user_id) AS paying_users FROM orders WHERE created_at >= current_date - interval '${months} months' AND status = 'completed' GROUP BY 1 ORDER BY 1 `); return result; } // Use in an API endpoint app.get("/api/analytics/revenue", async (req, res) => { const data = await getMonthlyRevenue(parseInt(req.query.months) || 12); res.json(data); });
Scheduled Transformations
# scripts/daily_transform.py — Run as a cron job to update materialized tables import duckdb conn = duckdb.connect("md:analytics") # Daily aggregation — rebuild summary tables conn.sql(""" CREATE OR REPLACE TABLE daily_kpis AS SELECT DATE(created_at) AS date, COUNT(DISTINCT user_id) AS dau, COUNT(DISTINCT CASE WHEN event_type = 'signup' THEN user_id END) AS signups, COUNT(DISTINCT CASE WHEN event_type = 'purchase' THEN user_id END) AS purchasers, SUM(CASE WHEN event_type = 'purchase' THEN amount ELSE 0 END) AS revenue, COUNT(*) AS total_events FROM events WHERE created_at >= current_date - interval '90 days' GROUP BY 1 ORDER BY 1 """) # Cohort analysis table conn.sql(""" CREATE OR REPLACE TABLE cohort_retention AS WITH first_seen AS ( SELECT user_id, MIN(DATE(created_at)) AS cohort_date FROM events GROUP BY 1 ) SELECT f.cohort_date, DATE_DIFF('week', f.cohort_date, DATE(e.created_at)) AS week_number, COUNT(DISTINCT e.user_id) AS active_users FROM events e JOIN first_seen f ON e.user_id = f.user_id WHERE f.cohort_date >= current_date - interval '12 weeks' GROUP BY 1, 2 """) print("✅ Daily transformations complete")
Installation
# Python pip install duckdb # Node.js npm install duckdb-async # CLI (DuckDB CLI with MotherDuck support) # Download from https://motherduck.com/docs/getting-started/
Examples
Example 1: Setting up an evaluation pipeline for a RAG application
User request:
I have a RAG chatbot that answers questions from our docs. Set up Motherduck to evaluate answer quality.
The agent creates an evaluation suite with appropriate metrics (faithfulness, relevance, answer correctness), configures test datasets from real user questions, runs baseline evaluations, and sets up CI integration so evaluations run on every prompt or retrieval change.
Example 2: Comparing model performance across prompts
User request:
We're testing GPT-4o vs Claude on our customer support prompts. Set up a comparison with Motherduck.
The agent creates a structured experiment with the existing prompt set, configures both model providers, defines scoring criteria specific to customer support (accuracy, tone, completeness), runs the comparison, and generates a summary report with statistical significance indicators.
Guidelines
- Hybrid execution — MotherDuck pushes queries to the cloud OR runs locally depending on where data lives; leverage both
- Parquet for everything — Store raw data as Parquet files on S3; query them directly without loading into tables
- Materialized tables for dashboards — Pre-compute daily/weekly aggregates; don't recalculate on every dashboard load
- Share databases, not exports — Instead of emailing CSVs, share a MotherDuck database; recipients get live, queryable data
- Use DuckDB locally for development — Write and test queries locally with DuckDB; deploy the same SQL to MotherDuck
- Token management — Store
in environment variables; never commit tokens to codeMOTHERDUCK_TOKEN - Incremental loads — For large datasets, append new data with
; avoid full reloadsINSERT INTO ... WHERE NOT IN - Cost awareness — MotherDuck charges by query compute; pre-aggregate expensive queries into summary tables