Datastoria optimize-clickhouse-sql
Optimize slow queries, analyze SQL performance, and collect evidence for expensive workloads.
Install

Source · Clone the upstream repo:
git clone https://github.com/FrankChen021/datastoria

Claude Code · Install into ~/.claude/skills/:
T=$(mktemp -d) && git clone --depth=1 https://github.com/FrankChen021/datastoria "$T" && mkdir -p ~/.claude/skills && cp -r "$T/resources/skills/optimize-clickhouse-sql" ~/.claude/skills/frankchen021-datastoria-optimize-clickhouse-sql && rm -rf "$T"

Manifest: resources/skills/optimize-clickhouse-sql/SKILL.md (source content follows)
SQL Optimization Skill
Workflow is evidence-driven: collect evidence with tools, then recommend based on evidence only.
Pre-flight Check
- HAS SQL: Conversation contains a SQL query -> Go to WORKFLOW step 2 (Collect Evidence).
- HAS QUERY_ID: Conversation contains a query_id -> Go to WORKFLOW step 2 (Call `collect_sql_optimization_evidence` immediately).
- DISCOVERY REQUEST: User asks to optimize the slowest/heaviest queries but does not provide SQL/query_id -> Go to WORKFLOW step 1 (Discovery).
- NEITHER: Call `ask_user_question` with exactly one question:
  - header: Please provide one of the following for optimization
  - options:
    - { "id": "sql", "label": "Provide SQL", "input": "text" }
    - { "id": "query_id", "label": "Provide query_id", "input": "text" }
    - { "id": "resource", "label": "Find the query that consumes the most", "input": "select", "choices": ["duration", "cpu", "memory", "disk"] }
- After the tool returns:
  - If `optionId` is `sql`, treat `value` as the SQL text and continue with evidence collection.
  - If `optionId` is `query_id`, treat `value` as the query_id and continue with evidence collection.
  - If `optionId` is `resource`, treat `value` as the ranking metric and run discovery for the top 1 query in the last 1 day before continuing.
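Assembled from the fields above, the full question payload might look like the following JSON (the exact argument shape accepted by `ask_user_question` is an assumption; the field names and values are taken from the options listed above):

```json
{
  "header": "Please provide one of the following for optimization",
  "options": [
    { "id": "sql", "label": "Provide SQL", "input": "text" },
    { "id": "query_id", "label": "Provide query_id", "input": "text" },
    { "id": "resource", "label": "Find the query that consumes the most",
      "input": "select", "choices": ["duration", "cpu", "memory", "disk"] }
  ]
}
```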
Discovery
- Prefer `search_query_log` for discovery from `system.query_log` (slowest, most expensive, user-scoped, database-scoped, text-scoped, etc.).
- If `search_query_log` cannot express the request, then load the `clickhouse-system-queries` skill, immediately call `skill_resource` for `references/system-query-log.md`, and follow that reference strictly.
- Do NOT write ad-hoc SQL against `system.query_log` from this skill when `search_query_log` can satisfy the request.
- Extract `query_id` from the discovery results for the next step (evidence collection).
Time Filtering
- `time_window`: Relative minutes from now (e.g., 60 = last hour).
- `time_range`: Absolute range `{ from: "ISO date", to: "ISO date" }`.
- When calling `collect_sql_optimization_evidence` after discovery, you MUST pass the same time_window or time_range used in discovery.
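Concretely, the two filter shapes might be passed like this (parameter names follow the definitions above; the dates are illustrative placeholders and the surrounding call shape is an assumption):

```json
{ "time_window": 60 }

{ "time_range": { "from": "2024-06-01T00:00:00Z", "to": "2024-06-02T00:00:00Z" } }
```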
Mode Selection
- Default `collect_sql_optimization_evidence` to light mode for the first pass.
- Prefer omitting the `mode` argument entirely unless full detail is required.
- Use `mode: "full"` only when the user explicitly asks for detailed/raw evidence or the light pass is insufficient.
- Do not choose `full` just because the request says "optimize", "analyze", or "investigate".
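Put differently, a light first pass versus an explicit full pass might look like the following (the `query_id` value is a hypothetical example and the exact call shape is an assumption; note that the first call simply omits `mode`):

```json
{ "query_id": "a1b2c3d4-0000-4e5f-8a9b-c0d1e2f3a4b5", "time_window": 60 }

{ "query_id": "a1b2c3d4-0000-4e5f-8a9b-c0d1e2f3a4b5", "time_window": 60, "mode": "full" }
```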
Workflow
- Discovery (if needed): Prefer `search_query_log` to find candidates. If the request exceeds the tool's schema, then load `clickhouse-system-queries`, load `references/system-query-log.md` via `skill_resource`, and use that reference. Extract `query_id` from the results.
- Collect Evidence: Call `collect_sql_optimization_evidence` with query_id (preferred) or sql (and the same time params if coming from discovery).
- Analyze: Review evidence for optimization opportunities.
- Recommendations: Rank by Impact/Risk/Effort. Prefer low-risk query rewrites first.
- Validate: Use `validate_sql` for any proposed SQL changes. Add inline comments (`-- comment`) to highlight key changes.
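As an illustration of that comment convention, a low-risk rewrite submitted to `validate_sql` might look like the sketch below (the table and column names are hypothetical, not taken from any real schema):

```sql
SELECT user_id, count() AS hits      -- changed: replaced SELECT * with only the needed columns
FROM events
WHERE event_date >= today() - 7      -- added: date filter so partition pruning can apply
GROUP BY user_id
ORDER BY hits DESC
LIMIT 100                            -- added: bound the result set
```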
Table Schema Evidence
- Use table_schema fields: columns, engine, partition_key, primary_key, sorting_key, secondary_indexes.
- When `optimization_target` is present, treat it as the real local-table schema behind a `Distributed` table and base key/index recommendations on it.
- Suggest secondary indexes only when evidence shows frequent WHERE filters on selective columns and the index type fits the predicate.
- Use `minmax` for range predicates on sorted columns.
- Use `set` for low-cardinality equality filters.
- Use `bloom_filter` for high-cardinality equality filters (e.g., trace_id, user_id).
- Use `tokenbf_v1` for frequent token-based text search.
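In ClickHouse DDL, these index types correspond to `ALTER TABLE ... ADD INDEX` statements such as the following sketch (table and column names are hypothetical; the granularity and parameter values are common starting points, not tuned recommendations):

```sql
ALTER TABLE events ADD INDEX idx_duration duration     TYPE minmax GRANULARITY 1;
ALTER TABLE events ADD INDEX idx_status   status_code  TYPE set(100) GRANULARITY 4;
ALTER TABLE events ADD INDEX idx_trace    trace_id     TYPE bloom_filter(0.01) GRANULARITY 4;
ALTER TABLE events ADD INDEX idx_message  message      TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4;

-- Existing parts are not indexed until the index is materialized:
ALTER TABLE events MATERIALIZE INDEX idx_trace;
```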
Rules
- Do NOT recommend based on assumptions. If evidence is missing, collect it with tools.
- If tools return NO meaningful evidence, output only a brief 3-5 sentence message explaining what's missing.
- Always validate proposed SQL with `validate_sql` before recommending.
- If discovery results include both query text and query_id, prefer query_id to avoid truncation issues.
- If the SQL appears incomplete (truncated/ellipsized/ends mid-clause), use `query_id` instead of sql.
- When both `query_id` and SQL are available, prefer `query_id` to reduce tokens and avoid truncation issues.