Datastoria sql-expert
Expert system for generating, validating, and optimizing ClickHouse SQL. Use this when the user needs data, queries, or analysis.
install
source · Clone the upstream repo

```bash
git clone https://github.com/FrankChen021/datastoria
```

Claude Code · Install into ~/.claude/skills/

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/FrankChen021/datastoria "$T" && mkdir -p ~/.claude/skills && cp -r "$T/resources/skills/sql-expert" ~/.claude/skills/frankchen021-datastoria-sql-expert && rm -rf "$T"
```
manifest: resources/skills/sql-expert/SKILL.md
🚨 CRITICAL RULE: MANDATORY VALIDATION
You MUST call `validate_sql(sql)` for every new query you generate. Context Note: Historical validation steps are pruned to save tokens, but this does NOT excuse you from validating new queries in the current turn. Always validate before executing.
1. Schema Discovery & Context
- Missing Schema: If you do not have the table schema, you MUST use `get_tables` and `explore_schema` first.
- Optimization: If the user already mentioned exact field names, pass them in the `columns` argument of `explore_schema` instead of loading the full table schema.
- Exact Identifier Rule: Treat exact identifier-like tokens from the user's question as candidate columns on the first schema lookup. This especially applies to ClickHouse metric names such as `ProfileEvent_*`, `CurrentMetric_*`, and flattened event columns on `system.*` tables.
- Missing Columns: If you don't see the expected column, retry `explore_schema` with a narrower `columns` list based on the user-mentioned identifier or the closest confirmed column names.
- Schema Fidelity: Only use columns confirmed to exist in the table schema returned by `explore_schema`. Do not assume standard columns exist if they are not in the tool output.
- User Context: If the user asks about "my data", use `WHERE user = '<clickHouseUser>'` (see the sketch after this list).
- System Tables: For queries on `system.*` tables (e.g., `system.query_log`, `system.parts`, `system.merges`), defer to the `clickhouse-system-queries` skill - it contains table-specific patterns, predicates, and resource metrics that this skill does not cover. For `system.query_log`, do not generate SQL until `references/system-query-log.md` has been loaded via `skill_resource`, and do not call `search_query_log` for chart/time-series requests.
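To make the User Context rule concrete, here is a minimal sketch; the table `my_db.events` and its columns are hypothetical stand-ins for whatever `explore_schema` confirms, and `<clickHouseUser>` is the placeholder for the authenticated ClickHouse user:

```sql
-- Hypothetical table and columns; substitute the schema confirmed via explore_schema.
-- '<clickHouseUser>' is a placeholder for the authenticated ClickHouse user.
SELECT event_date, event_type, count() AS events
FROM my_db.events
WHERE user = '<clickHouseUser>'
  AND event_date >= today() - 7
GROUP BY event_date, event_type
ORDER BY event_date
LIMIT 100
```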
2. Syntax Rules (The Grammar)
- Tables: ALWAYS use fully qualified names (e.g., `database.table`).
- Semicolons: NEVER include a trailing semicolon (`;`).
- Enums: Use exact string literals for Enum columns.
- Safety: ALWAYS use `LIMIT` for data exploration queries.
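A minimal exploration query that satisfies all four rules at once (the table `my_db.orders` and its `status` Enum are hypothetical examples, not part of any real schema):

```sql
-- Fully qualified table name, exact Enum literal, LIMIT, and no trailing semicolon.
SELECT order_id, status, created_at
FROM my_db.orders
WHERE status = 'shipped'
ORDER BY created_at DESC
LIMIT 50
```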
3. Optimization Rules (Best Practices)
- Time Filters: Always filter by the partition key (usually `event_date` or `timestamp`) first. Use bounded time windows (e.g., last 24h, 7 days) unless the user asks for all history.
- Primary Keys (CRITICAL): ClickHouse indexes are sparse. You MUST filter on the leading column of the Primary Key if you filter on any secondary column.
  - Bad: `WHERE event_time > now() - 1h` (if the PK is `event_date, event_time`, this scans everything).
  - Good: `WHERE event_date >= toDate(now() - 1h) AND event_time > now() - 1h` (uses the index and handles midnight crossover).
- Approximation: Use `uniq()` instead of `uniqExact()` unless precision is explicitly requested.
- Joins: Put the smaller table on the RIGHT. Use `GLOBAL IN` only for distributed queries.
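Putting the time-filter, primary-key, and approximation rules together, here is a sketch of a bounded aggregation, assuming a hypothetical table `my_db.page_views` whose primary key is `(event_date, event_time)`:

```sql
-- Hypothetical table, assuming ORDER BY (event_date, event_time).
-- The leading PK column (event_date) is filtered first so the sparse index
-- can prune granules; uniq() returns an approximate distinct count.
SELECT
    toStartOfHour(event_time) AS hour,
    uniq(user_id) AS approx_users
FROM my_db.page_views
WHERE event_date >= toDate(now() - INTERVAL 24 HOUR)
  AND event_time >= now() - INTERVAL 24 HOUR
GROUP BY hour
ORDER BY hour
LIMIT 100
```

The `event_date` predicate looks redundant next to the `event_time` filter, but it is what lets the sparse primary index prune whole parts rather than scanning everything.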
4. Execution Workflow
- Generate: Create the SQL following the rules above.
- Validate (MANDATORY): Call `validate_sql(sql)`.
  - If invalid: Read the error, fix the SQL, and retry (max 3 attempts).
- Decide Action:
  - Visualization: IF the user wants a chart, DO NOT execute. Pass the SQL to the visualization skill logic.
  - Data: IF the user wants answers (lists, counts), call `execute_sql(sql)`.
  - Code Only: IF the user asks to "write SQL", just output the code block.