Agent-skills motherduck-load-data
Load data into MotherDuck from local files, object storage, HTTPS, dataframes, or external databases. Use when choosing a MotherDuck-specific ingestion path, especially CTAS and INSERT...SELECT, bulk loading, secrets, and Postgres-endpoint versus DuckDB-client tradeoffs.
install
source · Clone the upstream repo
git clone https://github.com/motherduckdb/agent-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/motherduckdb/agent-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/motherduck-skills/skills/motherduck-load-data" ~/.claude/skills/motherduckdb-agent-skills-motherduck-load-data-6863bf && rm -rf "$T"
manifest: plugins/motherduck-skills/skills/motherduck-load-data/SKILL.md
Load Data into MotherDuck
Use this skill when the job is getting data into MotherDuck correctly and efficiently, not just writing one ad hoc import query.
Source Of Truth
- Prefer current MotherDuck loading, cloud-storage, and Postgres-endpoint loading docs first.
- Use CREATE SECRET and the cloud-storage docs for protected-object-store workflows.
- Keep the loading advice aligned with MotherDuck's documented posture:
  - batch over streaming
  - Parquet over CSV when you control the format
  - dataframe, COPY, CTAS, or INSERT ... SELECT over row-by-row inserts
  - native MotherDuck storage first unless DuckLake is explicitly required
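As a sketch of the protected-object-store posture above: a scoped secret is created once, then remote reads against the bucket authenticate through it implicitly. The secret name, bucket, and credentials here are placeholders, not values from this skill.

```sql
-- Hypothetical secret name, bucket, and credentials; substitute your own.
-- Store S3 credentials in the secrets manager so remote reads can authenticate.
CREATE SECRET my_s3_secret (
    TYPE S3,
    KEY_ID 'AKIA...',
    SECRET '...',
    REGION 'us-east-1'
);

-- Subsequent reads against the protected bucket pick up the secret automatically.
SELECT count(*) FROM read_parquet('s3://my-bucket/raw/events/*.parquet');
```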
Default Posture
- Start by classifying the source: object storage or HTTPS, local file or local DuckDB, in-memory rows, or an external database.
- Prefer CREATE TABLE AS SELECT for first loads and INSERT INTO ... SELECT for appends.
- Use Parquet for durable bulk movement whenever you control the source format.
- Treat the Postgres endpoint as a thin-client path for server-side remote reads, not for local-file or extension-driven ingestion.
- Bootstrap the target MotherDuck database first when the ingestion tool does not create it automatically.
- Keep source storage close to the MotherDuck region when you control placement.
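The CTAS-for-first-load, INSERT-for-append posture can be sketched as follows; database, table, and object paths are illustrative only.

```sql
-- First load: CTAS creates the landing table directly from a remote Parquet read.
CREATE TABLE my_db.raw_events AS
SELECT * FROM read_parquet('s3://my-bucket/raw/events/2024-*.parquet');

-- Later appends: bulk INSERT ... SELECT from new files, never row-by-row inserts.
INSERT INTO my_db.raw_events
SELECT * FROM read_parquet('s3://my-bucket/raw/events/2025-01.parquet');
```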
Workflow
- Identify where the source data actually lives.
- Choose the loading path:
- object storage or HTTPS: remote read into MotherDuck
- local file or local DuckDB: use a DuckDB client path
- in-memory rows: Arrow or dataframe bulk load first, batched inserts only as a fallback
- external database: use the appropriate scan or replication path from a DuckDB-capable environment
- Land the data into a raw or staging table with minimal transformation.
- Validate row counts, types, and a few business aggregates immediately after the load.
- Promote into modeled tables only after the landing step is correct.
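The validation step in the workflow above can be a handful of queries run immediately after the load, before any promotion; table and column names here are hypothetical.

```sql
-- Row count and schema check against expectations.
SELECT count(*) AS row_count FROM my_db.raw_events;
DESCRIBE my_db.raw_events;

-- Spot-check a business aggregate against the source system.
SELECT date_trunc('day', event_ts) AS day, sum(amount) AS total
FROM my_db.raw_events
GROUP BY 1
ORDER BY 1
LIMIT 7;
```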
Open Next
references/INGESTION_PATTERNS.md for format-specific options, cloud-storage secrets, Postgres-endpoint loading tradeoffs, Python dataframe paths, and advanced ingestion patterns
Related Skills
- motherduck-connect for choosing between the Postgres endpoint and a DuckDB client path
- motherduck-explore for inspecting destination databases and validating landed tables
- motherduck-query for writing CTAS, append, and validation SQL
- motherduck-model-data for promoting landed data into staging and analytics tables
- motherduck-ducklake only when object-storage-backed lakehouse storage is an explicit requirement