Asi ducklake
Create, query, migrate, and maintain DuckLake lakehouse databases using the DuckDB ducklake extension. Use when working with DuckLake catalogs, time travel queries, snapshots, schema evolution, cross-lake federation, or migrating plain DuckDB files to DuckLake format.
```sh
# Clone the full repo
git clone https://github.com/plurigrid/asi

# Or install only this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/plurigrid/asi "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/ducklake" ~/.claude/skills/plurigrid-asi-ducklake && rm -rf "$T"
```
skills/ducklake/SKILL.md

DuckLake
DuckLake is an integrated data lake and catalog format from the DuckDB team. It stores metadata in a catalog database (DuckDB, PostgreSQL, SQLite, MySQL) and data as Parquet files. MIT licensed.
Docs: https://ducklake.select/docs/stable/
Setup
```sql
INSTALL ducklake;
LOAD ducklake;
```
Connecting
```sql
-- DuckDB catalog (auto-creates if not exists, data in .files/ sibling dir)
ATTACH 'ducklake:my_catalog.ducklake' AS my_lake;

-- PostgreSQL catalog + S3 storage
ATTACH 'ducklake:postgres:dbname=catalog host=pg_host' AS my_lake (DATA_PATH 's3://bucket/data/');

-- SQLite catalog
ATTACH 'ducklake:sqlite:metadata.sqlite' AS my_lake (DATA_PATH 'data_files/');

-- Read-only
ATTACH 'ducklake:my_catalog.ducklake' (READ_ONLY);

-- At a specific snapshot
ATTACH 'ducklake:my_catalog.ducklake' (SNAPSHOT_VERSION 3);
ATTACH 'ducklake:my_catalog.ducklake' (SNAPSHOT_TIME '2025-01-15 00:00:00');
```
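Once attached, the lake behaves like any other DuckDB catalog. A minimal sketch; the events table and its columns are illustrative and reused in later examples:

```sql
-- Illustrative table; names are assumptions, not part of the DuckLake API
CREATE TABLE my_lake.events (user_id BIGINT, ts TIMESTAMP, payload VARCHAR);
INSERT INTO my_lake.events VALUES (1, now(), 'hello');
SELECT * FROM my_lake.events;
```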
Snapshots
Every mutation creates a snapshot. List them:
```sql
SELECT * FROM my_lake.snapshots();
SELECT * FROM my_lake.current_snapshot();
```
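A sketch of the mutation-to-snapshot mapping, assuming the illustrative events table from above:

```sql
-- A committed write produces a new snapshot with a higher snapshot_id
INSERT INTO my_lake.events VALUES (2, now(), 'world');
SELECT * FROM my_lake.snapshots() ORDER BY snapshot_id DESC LIMIT 1;
```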
Add commit messages inside transactions:
```sql
BEGIN;
INSERT INTO my_lake.events VALUES (...);
CALL my_lake.set_commit_message('alice', 'Daily ingest', extra_info => '{"source": "amp"}');
COMMIT;
```
Time Travel
```sql
SELECT * FROM tbl AT (VERSION => 3);
SELECT * FROM tbl AT (TIMESTAMP => now() - INTERVAL '1 week');
```
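Time travel composes with ordinary SQL, so a table can be compared against its own past. A sketch, assuming the events table above has at least a week of history:

```sql
-- Row-count drift over the last week (illustrative)
SELECT
  (SELECT count(*) FROM my_lake.events) AS rows_now,
  (SELECT count(*) FROM my_lake.events AT (TIMESTAMP => now() - INTERVAL '1 week')) AS rows_then;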
Schema Evolution
Add/drop/rename columns freely. Each change creates a new snapshot:
```sql
ALTER TABLE my_lake.tbl ADD COLUMN new_col VARCHAR;
ALTER TABLE my_lake.tbl DROP COLUMN old_col;
ALTER TABLE my_lake.tbl RENAME COLUMN x TO y;
```
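Because each change is a snapshot, old snapshots keep their old schema. A sketch, assuming new_col was added at snapshot 4:

```sql
SELECT * FROM my_lake.tbl AT (VERSION => 3);  -- pre-change schema, no new_col
SELECT * FROM my_lake.tbl;                    -- current schema; new_col is NULL for old rows
```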
Cross-Lake Federation
Attach multiple DuckLakes and JOIN across them:
```sql
ATTACH 'ducklake:lake_a.ducklake' AS a;
ATTACH 'ducklake:lake_b.ducklake' AS b;
SELECT * FROM a.events JOIN b.users ON a.events.user_id = b.users.id;
```
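The same mechanism supports cross-lake copies, not just reads. A sketch, assuming the attaches above; the target table name is illustrative:

```sql
-- Materialize a join result into lake b
CREATE TABLE b.enriched_events AS
SELECT e.*, u.name
FROM a.events e JOIN b.users u ON e.user_id = u.id;
```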
Maintenance
Run these periodically, or use CHECKPOINT as an all-in-one:
```sql
-- Merge small Parquet files (after many small inserts)
CALL my_lake.merge_adjacent_files('my_table');

-- Expire old snapshots (frees time travel before cutoff)
CALL my_lake.expire_snapshots(TIMESTAMP '2025-01-01');

-- Clean up unreferenced files after expiry
CALL my_lake.cleanup_files();

-- Rewrite files with many deletes
CALL my_lake.rewrite_data_files('my_table');

-- All-in-one
CHECKPOINT my_lake;
```
Migrating DuckDB to DuckLake
`COPY FROM DATABASE` works but fails on:

- Non-literal defaults (e.g., `CURRENT_TIMESTAMP`) -- use `CREATE TABLE AS SELECT` instead
- PRIMARY KEY / UNIQUE constraints -- not supported in DuckLake by design
Workaround pattern:
```sql
ATTACH 'old.duckdb' AS src;
ATTACH 'ducklake:new.ducklake' AS dst;
CREATE TABLE dst.my_table AS SELECT * FROM src.my_table;
```
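For databases with many tables, the CTAS statements can be generated from DuckDB's own catalog metadata. A sketch using duckdb_tables(); it prints the statements, which are then run separately:

```sql
-- Emit one CTAS per table in the source database (statements are printed, not executed)
SELECT 'CREATE TABLE dst.' || table_name || ' AS SELECT * FROM src.' || table_name || ';'
FROM duckdb_tables()
WHERE database_name = 'src';
```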
Unsupported Features
Will not be supported: PRIMARY KEY, UNIQUE, FOREIGN KEY, indexes, sequences, VARINT, BITSTRING, UNION types.
May be supported later: User-defined types, ARRAY type, ENUM type, CHECK constraints, non-literal defaults, generated columns, macros.
Extension limitations: Data inlining only works with DuckDB catalogs. MySQL catalogs not fully supported.
Advanced Features
- Partitioning: `ALTER TABLE tbl ADD PARTITION (col)`
- Data inlining: Small tables stored inline in catalog (DuckDB catalogs only): `ATTACH ... (DATA_INLINING_ROW_LIMIT 1000)`
- Encryption: `ATTACH ... (ENCRYPTED true)`
- Row lineage: Track row-level provenance
- Views: `CREATE VIEW my_lake.v AS SELECT ...`
- Data change feed: Query changes between snapshots
- Upserting: Via `MERGE INTO` (not `INSERT ON CONFLICT`, since no PKs); see the sketch below
- Secrets: Persist connection config: `CREATE PERSISTENT SECRET (TYPE ducklake, METADATA_PATH '...', DATA_PATH '...')`
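A minimal upsert sketch for the `MERGE INTO` item above. Table and column names are illustrative; since DuckLake has no primary keys, the match condition is an ordinary column comparison:

```sql
-- Sketch: upsert via MERGE INTO (users/staged_users are assumed, not part of the API)
MERGE INTO my_lake.users AS t
USING staged_users AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET name = s.name
WHEN NOT MATCHED THEN INSERT;
```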