Datastoria optimize-clickhouse-sql

Optimize slow queries, analyze SQL performance, and collect evidence for expensive workloads.

install

source · Clone the upstream repo

git clone https://github.com/FrankChen021/datastoria

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/FrankChen021/datastoria "$T" && mkdir -p ~/.claude/skills && cp -r "$T/resources/skills/optimize-clickhouse-sql" ~/.claude/skills/frankchen021-datastoria-optimize-clickhouse-sql && rm -rf "$T"

manifest: resources/skills/optimize-clickhouse-sql/SKILL.md

source content

SQL Optimization Skill

Workflow is evidence-driven: collect evidence with tools, then recommend based on evidence only.

Pre-flight Check

HAS SQL: Conversation contains a SQL query -> Go to WORKFLOW step 2 (Collect Evidence).
HAS QUERY_ID: Conversation contains query_id -> Go to WORKFLOW step 2 (Call
```
collect_sql_optimization_evidence
```
immediately).
DISCOVERY REQUEST: User asks to optimize the slowest/heaviest queries but does not provide SQL/query_id -> Go to WORKFLOW step 1 (Discovery).

NEITHER: Call

ask_user_question

with exactly one question:

header

Please provide one of the following for optimization

options

{ "id": "sql", "label": "Provide SQL", "input": "text" }

{ "id": "query_id", "label": "Provide query_id", "input": "text" }

{ "id": "resource", "label": "Find the query that consumes the most", "input": "select", "choices": ["duration", "cpu", "memory", "disk"] }

After the tool returns:

If
```
optionId
```
is
```
sql
```
, treat
```
value
```
as the SQL text and continue with evidence collection.
If
```
optionId
```
is
```
query_id
```
, treat
```
value
```
as the query_id and continue with evidence collection.
If
```
optionId
```
is
```
resource
```
, treat
```
value
```
as the ranking metric and run discovery for the top 1 query in the last 1 day before continuing.

Discovery

Prefer
```
search_query_log
```
for discovery from
```
system.query_log
```
(slowest, most expensive, user-scoped, database-scoped, text-scoped, etc.).
If
```
search_query_log
```
cannot express the request, then load the
```
clickhouse-system-queries
```
skill, immediately call
```
skill_resource
```
for
```
references/system-query-log.md
```
, and follow that reference strictly.
Do NOT write ad-hoc SQL against
```
system.query_log
```
from this skill when
```
search_query_log
```
can satisfy the request.
Extract
```
query_id
```
from the discovery results for the next step (evidence collection).

Time Filtering

```
time_window
```
: Relative minutes from now (e.g., 60 = last hour).

time_range

: Absolute range

{ from: "ISO date", to: "ISO date" }

When calling
```
collect_sql_optimization_evidence
```
after discovery, you MUST pass the same time_window or time_range used in discovery.

Mode Selection

Default
```
collect_sql_optimization_evidence
```
to light mode for the first pass.
Prefer omitting the
```
mode
```
argument entirely unless full detail is required.
Use
```
mode: "full"
```
only when the user explicitly asks for detailed/raw evidence or the light pass is insufficient.
Do not choose
```
full
```
just because the request says "optimize", "analyze", or "investigate".

Workflow

Discovery (if needed): Prefer
```
search_query_log
```
to find candidates. If the request exceeds the tool's schema, then load
```
clickhouse-system-queries
```
, load
```
references/system-query-log.md
```
via
```
skill_resource
```
, and use that reference. Extract
```
query_id
```
from the results.
Collect Evidence: Call
```
collect_sql_optimization_evidence
```
with query_id (preferred) or sql (and same time params if coming from discovery).
Analyze: Review evidence for optimization opportunities.
Recommendations: Rank by Impact/Risk/Effort. Prefer low-risk query rewrites first.
Validate: Use
```
validate_sql
```
for any proposed SQL changes. Add inline comments (
```
-- comment
```
) to highlight key changes.

Table Schema Evidence

Use table_schema fields: columns, engine, partition_key, primary_key, sorting_key, secondary_indexes.
When
```
optimization_target
```
is present, treat it as the real local-table schema behind a
```
Distributed
```
table and base key/index recommendations on it.
Suggest secondary indexes only when evidence shows frequent WHERE filters on selective columns and the index type fits the predicate.
- Use
```
minmax
```
  for range predicates on sorted columns.
- Use
```
set
```
  for low-cardinality equality filters.
- Use
```
bloom_filter
```
  for high-cardinality equality filters (e.g., trace_id, user_id).
- Use
```
tokenbf_v1
```
  for frequent token-based text search.

Rules

Do NOT recommend based on assumptions. If evidence is missing, collect it with tools.
If tools return NO meaningful evidence, output only a brief 3-5 sentence message explaining what's missing.
Always validate proposed SQL with
```
validate_sql
```
before recommending.
If discovery results include both query text and query_id, prefer query_id to avoid truncation issues.
If the SQL appears incomplete (truncated/ellipsized/ends mid-clause), use
```
query_id
```
instead of sql.
When both
```
query_id
```
and SQL are available, prefer
```
query_id
```
to reduce tokens and avoid truncation issues.