Skillshub clickhouse-migration-deep-dive
install
source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/jeremylongshore/claude-code-plugins-plus-skills/clickhouse-migration-deep-dive" ~/.claude/skills/comeonoliver-skillshub-clickhouse-migration-deep-dive && rm -rf "$T"
manifest:
skills/jeremylongshore/claude-code-plugins-plus-skills/clickhouse-migration-deep-dive/SKILL.md
ClickHouse Migration Deep Dive
Overview
Plan and execute ClickHouse schema migrations: column changes, engine migrations, ORDER BY modifications, and versioned migration runners.
Prerequisites
- ClickHouse admin access
- Backup of production data (see clickhouse-prod-checklist)
- Test environment for validation
Instructions
Step 1: Understanding ClickHouse DDL
ClickHouse ALTER operations are mutations — they run asynchronously and rewrite data parts in the background. This is fundamentally different from PostgreSQL/MySQL where ALTER is often instant or blocking.
```sql
-- Lightweight operations (instant, metadata only)
ALTER TABLE events ADD COLUMN country LowCardinality(String) DEFAULT '';
ALTER TABLE events RENAME COLUMN old_name TO new_name;
ALTER TABLE events COMMENT COLUMN user_id 'Unique user identifier';

-- Heavyweight operations (mutations — rewrite parts in background)
ALTER TABLE events MODIFY COLUMN properties String CODEC(ZSTD(3));
ALTER TABLE events DROP COLUMN deprecated_field;
ALTER TABLE events DELETE WHERE user_id = 0;
ALTER TABLE events UPDATE email = '' WHERE created_at < '2024-01-01';

-- Check mutation progress
SELECT database, table, mutation_id, command, is_done, parts_to_do, create_time
FROM system.mutations
WHERE NOT is_done
ORDER BY create_time;
```
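If a migration script needs to block until a mutation finishes rather than polling system.mutations, the mutations_sync setting makes ALTER wait. A minimal sketch (1 waits for the current server, 2 for all replicas):

```sql
-- Make mutations synchronous for this session
SET mutations_sync = 2;

-- This now returns only once the background rewrite has completed
ALTER TABLE events DROP COLUMN deprecated_field;
```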
Step 2: Column Operations
```sql
-- Add a column (instant — no data rewrite)
ALTER TABLE analytics.events
    ADD COLUMN IF NOT EXISTS country LowCardinality(String) DEFAULT '' AFTER user_id;

-- Add column with materialized default (fills new data, not old)
ALTER TABLE analytics.events
    ADD COLUMN IF NOT EXISTS event_date Date MATERIALIZED toDate(created_at);

-- Modify column type (mutation — rewrites all parts)
ALTER TABLE analytics.events MODIFY COLUMN user_id UInt64; -- Was UInt32, now UInt64

-- Drop a column
ALTER TABLE analytics.events DROP COLUMN IF EXISTS deprecated_field;

-- Change default value
ALTER TABLE analytics.events MODIFY COLUMN created_at DateTime DEFAULT now();

-- Add codec to existing column (mutation)
ALTER TABLE analytics.events MODIFY COLUMN properties String CODEC(ZSTD(3));
```
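Before running a heavyweight MODIFY COLUMN on a large table, it helps to estimate how much data the mutation will rewrite. One way, a sketch against system.columns (database, table, and column names match the examples above):

```sql
SELECT
    name,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE database = 'analytics' AND table = 'events' AND name = 'user_id';
```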
Step 3: Change ORDER BY (Requires Table Recreation)
ClickHouse supports ALTER TABLE ... MODIFY ORDER BY only for appending newly added columns to the end of the sorting key. To change the existing key columns, you must create a new table and migrate the data.
```sql
-- Step 1: Create new table with desired ORDER BY
CREATE TABLE analytics.events_v2 AS analytics.events
ENGINE = MergeTree()
ORDER BY (tenant_id, event_type, toDate(created_at)) -- New key
PARTITION BY toYYYYMM(created_at);

-- Step 2: Copy data
INSERT INTO analytics.events_v2 SELECT * FROM analytics.events;

-- Step 3: Atomic swap (zero-downtime if app handles reconnect)
RENAME TABLE analytics.events TO analytics.events_old,
             analytics.events_v2 TO analytics.events;

-- Step 4: Verify and drop old table
SELECT count() FROM analytics.events;
SELECT count() FROM analytics.events_old;
-- When satisfied:
DROP TABLE analytics.events_old;
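For very large tables, a single INSERT ... SELECT is one long-running query with no resume point. A hedged alternative is to copy partition by partition, which gives natural checkpoints; the month value below is a placeholder, so list the real ones from system.parts first:

```sql
-- Enumerate the partitions that actually exist
SELECT DISTINCT partition
FROM system.parts
WHERE database = 'analytics' AND table = 'events' AND active;

-- Copy one partition at a time (202401 is a placeholder value)
INSERT INTO analytics.events_v2
SELECT * FROM analytics.events
WHERE toYYYYMM(created_at) = 202401;
```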
Step 4: Change Engine (MergeTree to ReplacingMergeTree)
```sql
-- Create new table with ReplacingMergeTree
CREATE TABLE analytics.users_v2 (
    user_id UInt64,
    email String,
    plan LowCardinality(String),
    updated_at DateTime DEFAULT now()
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY user_id;

-- Migrate data
INSERT INTO analytics.users_v2 SELECT * FROM analytics.users;

-- Atomic swap
RENAME TABLE analytics.users TO analytics.users_old,
             analytics.users_v2 TO analytics.users;

DROP TABLE analytics.users_old;
```
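ReplacingMergeTree deduplicates only when parts merge, so right after the swap duplicate rows may still be visible. A quick sanity check:

```sql
-- Raw count may still include duplicates until parts merge
SELECT count() FROM analytics.users;

-- FINAL collapses duplicates at read time (slower, but exact)
SELECT count() FROM analytics.users FINAL;

-- Optionally force the merge so future reads are already deduplicated
OPTIMIZE TABLE analytics.users FINAL;
```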
Step 5: Versioned Migration Runner
```ts
// src/clickhouse/migrations/runner.ts
import { createClient } from '@clickhouse/client';
import { readFileSync, readdirSync } from 'fs';
import { join } from 'path';

const client = createClient({ url: process.env.CLICKHOUSE_HOST! });

async function runMigrations() {
  // Create migration tracking table
  await client.command({
    query: `
      CREATE TABLE IF NOT EXISTS _migrations (
        version String,
        name String,
        applied_at DateTime DEFAULT now(),
        checksum String
      ) ENGINE = ReplacingMergeTree(applied_at)
      ORDER BY version
    `,
  });

  // Get applied migrations
  const rs = await client.query({
    query: 'SELECT version FROM _migrations FINAL',
    format: 'JSONEachRow',
  });
  const applied = new Set((await rs.json<{ version: string }>()).map((r) => r.version));

  // Read migration files
  const migrationsDir = join(__dirname, 'sql');
  const files = readdirSync(migrationsDir)
    .filter((f) => f.endsWith('.sql'))
    .sort(); // 001-create-events.sql, 002-add-country.sql, etc.

  for (const file of files) {
    const version = file.split('-')[0]; // "001"
    if (applied.has(version)) {
      console.log(`  [SKIP] ${file} (already applied)`);
      continue;
    }

    const sql = readFileSync(join(migrationsDir, file), 'utf-8');
    console.log(`  [APPLY] ${file}...`);
    try {
      // Split on semicolons to handle multi-statement files
      const statements = sql.split(';').filter((s) => s.trim());
      for (const stmt of statements) {
        await client.command({ query: stmt });
      }
      // Record migration
      await client.insert({
        table: '_migrations',
        values: [{ version, name: file, checksum: '' }],
        format: 'JSONEachRow',
      });
      console.log(`  [OK] ${file}`);
    } catch (err) {
      console.error(`  [FAIL] ${file}: ${(err as Error).message}`);
      throw err; // Stop on first failure
    }
  }
  console.log('Migrations complete.');
}

// Exit non-zero so CI fails when a migration fails
runMigrations().catch(() => process.exit(1));
```
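After a run, the tracking table doubles as an audit log. Because _migrations is a ReplacingMergeTree, query it with FINAL so re-recorded rows collapse; for example, from clickhouse-client:

```sql
SELECT version, name, applied_at
FROM _migrations FINAL
ORDER BY version;
```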
Step 6: Example Migration Files
```sql
-- migrations/sql/001-create-events.sql
CREATE TABLE IF NOT EXISTS analytics.events (
    event_id UUID DEFAULT generateUUIDv4(),
    event_type LowCardinality(String),
    user_id UInt64,
    properties String CODEC(ZSTD(3)),
    created_at DateTime DEFAULT now()
)
ENGINE = MergeTree()
ORDER BY (event_type, created_at)
PARTITION BY toYYYYMM(created_at);
```
```sql
-- migrations/sql/002-add-country.sql
ALTER TABLE analytics.events
    ADD COLUMN IF NOT EXISTS country LowCardinality(String) DEFAULT '';
```
```sql
-- migrations/sql/003-add-ttl.sql
ALTER TABLE analytics.events MODIFY TTL created_at + INTERVAL 90 DAY;
```
```sql
-- migrations/sql/004-add-bloom-index.sql
-- Note: assumes a session_id column already exists on the table
-- (it is not part of the 001 schema above).
ALTER TABLE analytics.events
    ADD INDEX IF NOT EXISTS idx_session session_id TYPE bloom_filter(0.01) GRANULARITY 4;
ALTER TABLE analytics.events MATERIALIZE INDEX idx_session;
```
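To confirm the new skip index is actually consulted, EXPLAIN with indexes = 1 shows which indexes prune granules for a given query. A sketch (the session value is a placeholder):

```sql
EXPLAIN indexes = 1
SELECT count()
FROM analytics.events
WHERE session_id = 'abc123';
```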
Step 7: Migration Best Practices
| Operation | Downtime? | Notes |
|---|---|---|
| ADD COLUMN | None | Instant metadata change |
| DROP COLUMN | None | Mutation runs in background |
| MODIFY COLUMN type | None* | Mutation rewrites — can be slow on large tables |
| Change ORDER BY | Brief | Requires table recreation + RENAME |
| Change ENGINE | Brief | Requires table recreation + RENAME |
| ADD INDEX | None | MATERIALIZE runs in background |
| ALTER TTL | None | Takes effect on next merge |
*No application downtime, but queries on the affected column may be slower during mutation.
Pre-Migration Checklist
- Backup production data (BACKUP TABLE ... TO S3(...); see the sketch after this list)
- Test migration on staging with production-like data
- Check disk space (mutations create temporary extra parts)
- Schedule during low-traffic window (for heavy mutations)
- Prepare rollback procedure
- Verify mutations complete (system.mutations WHERE NOT is_done)
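For the backup item above, a sketch of the BACKUP/RESTORE syntax (available in ClickHouse 22.8+); the endpoint and credentials are placeholders for your own bucket:

```sql
-- Back up before migrating
BACKUP TABLE analytics.events
TO S3('https://my-bucket.s3.amazonaws.com/backups/events-pre-migration', 'ACCESS_KEY', 'SECRET_KEY');

-- Roll back if the migration goes wrong
RESTORE TABLE analytics.events
FROM S3('https://my-bucket.s3.amazonaws.com/backups/events-pre-migration', 'ACCESS_KEY', 'SECRET_KEY');
```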
Error Handling
| Cause | Solution |
|---|---|
| Mutation queue full | Wait, or cancel the stuck mutation with KILL MUTATION (sketched below) |
| Re-running a migration | Use IF NOT EXISTS / IF EXISTS guards so statements are idempotent |
| Incompatible type change | Create a new column, backfill, then drop the old one |
| Mutation temporarily doubles data on disk | Free disk space, then retry |
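To cancel a stuck mutation, find its id in system.mutations and kill it; the mutation_id below is a placeholder:

```sql
-- Find the stuck mutation
SELECT mutation_id, command, latest_fail_reason
FROM system.mutations
WHERE database = 'analytics' AND table = 'events' AND NOT is_done;

-- Cancel it
KILL MUTATION WHERE database = 'analytics' AND table = 'events'
  AND mutation_id = 'mutation_42.txt';
```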
Next Steps
For architecture patterns, see clickhouse-reference-architecture.