Claude-code-plugins-plus-skills clickhouse-core-workflow-a
install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/clickhouse-pack/skills/clickhouse-core-workflow-a" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-clickhouse-core-workflow-a && rm -rf "$T"
manifest:
plugins/saas-packs/clickhouse-pack/skills/clickhouse-core-workflow-a/SKILL.mdsource content
ClickHouse Schema Design (Core Workflow A)
Overview
Design ClickHouse tables with correct engine selection, ORDER BY keys, partitioning, and codec choices for analytical workloads.
Prerequisites
connected (see@clickhouse/client
)clickhouse-install-auth- Understanding of your query patterns (what you filter and group on)
Instructions
Step 1: Choose the Right Engine
| Engine | Best For | Dedup? | Example |
|---|---|---|---|
| General analytics, append-only logs | No | Clickstream, IoT |
| Mutable rows (upserts) | Yes (on merge) | User profiles, state |
| Pre-aggregated counters | Sums numerics | Page view counts |
| Materialized view targets | Merges states | Dashboards |
| Stateful row updates | Collapses +-1 | Shopping carts |
ClickHouse Cloud uses
— it is a drop-in replacement for
SharedMergeTree
MergeTree on Cloud. You do not need to change your DDL.
Step 2: Design the ORDER BY (Sort Key)
The
ORDER BY clause is the single most important schema decision. It defines:
- Primary index — sparse index over sort-key granules (8192 rows default)
- Data layout on disk — rows sorted physically by these columns
- Query speed — queries filtering on ORDER BY prefix columns hit fewer granules
Rules of thumb:
- Put low-cardinality filter columns first (
,event_type
)status - Then high-cardinality columns you filter on (
,user_id
)tenant_id - End with a time column if you use range filters (
)created_at - Do NOT put high-cardinality columns you never filter on in ORDER BY
-- Good: filter by tenant, then by time ranges ORDER BY (tenant_id, event_type, created_at) -- Bad: UUID first means every query scans the full index ORDER BY (event_id, created_at) -- event_id is random UUID
Step 3: Schema Examples
Event Analytics Table
CREATE TABLE analytics.events ( event_id UUID DEFAULT generateUUIDv4(), tenant_id UInt32, event_type LowCardinality(String), user_id UInt64, session_id String, properties String CODEC(ZSTD(3)), -- JSON blob, compress well url String CODEC(ZSTD(1)), ip_address IPv4, country LowCardinality(FixedString(2)), created_at DateTime64(3) DEFAULT now64(3) ) ENGINE = MergeTree() ORDER BY (tenant_id, event_type, toDate(created_at), user_id) PARTITION BY toYYYYMM(created_at) TTL created_at + INTERVAL 1 YEAR SETTINGS index_granularity = 8192;
User Profile Table (Upserts)
CREATE TABLE analytics.users ( user_id UInt64, email String, plan LowCardinality(String), mrr_cents UInt32, properties String CODEC(ZSTD(3)), updated_at DateTime DEFAULT now() ) ENGINE = ReplacingMergeTree(updated_at) -- keeps latest row per ORDER BY key ORDER BY user_id; -- Query with FINAL to get deduplicated results SELECT * FROM analytics.users FINAL WHERE user_id = 42;
Daily Aggregation Table
CREATE TABLE analytics.daily_stats ( date Date, tenant_id UInt32, event_type LowCardinality(String), event_count UInt64, unique_users AggregateFunction(uniq, UInt64) ) ENGINE = AggregatingMergeTree() ORDER BY (tenant_id, event_type, date);
Step 4: Partitioning Guidelines
| Partition Expression | Typical Use | Parts Per Partition |
|---|---|---|
| Most common — monthly | Target 10-1000 |
| Weekly rollups | More parts, finer drops |
| Daily TTL drops | Many parts — use carefully |
| None | Small tables (<1M rows) | Fine |
Warning: Each partition creates separate parts on disk. Over-partitioning (e.g., by
user_id) creates millions of tiny parts and kills performance.
Step 5: Codecs and Compression
-- Column-level compression codecs column1 UInt64 CODEC(Delta, ZSTD(3)), -- Time series / sequential IDs column2 Float64 CODEC(Gorilla, ZSTD(1)), -- Floating point (similar values) column3 String CODEC(ZSTD(3)), -- General text / JSON column4 DateTime CODEC(DoubleDelta, ZSTD), -- Timestamps (near-sequential)
Applying Schema via Node.js
import { createClient } from '@clickhouse/client'; const client = createClient({ url: process.env.CLICKHOUSE_HOST! }); async function applySchema() { await client.command({ query: 'CREATE DATABASE IF NOT EXISTS analytics' }); await client.command({ query: ` CREATE TABLE IF NOT EXISTS analytics.events ( event_id UUID DEFAULT generateUUIDv4(), tenant_id UInt32, event_type LowCardinality(String), user_id UInt64, payload String CODEC(ZSTD(3)), created_at DateTime DEFAULT now() ) ENGINE = MergeTree() ORDER BY (tenant_id, event_type, created_at) PARTITION BY toYYYYMM(created_at) `, }); console.log('Schema applied.'); }
Error Handling
| Error | Cause | Solution |
|---|---|---|
| PRIMARY KEY != ORDER BY | Remove explicit PRIMARY KEY or align |
| Over-partitioning | Use coarser partition expression |
| Wrong data type | Match insert types to schema |
| TTL on non-date column | TTL must reference DateTime column |
Resources
Next Steps
For inserting and querying data, see
clickhouse-core-workflow-b.