Claude-skill-registry bsl-query-expert
Query BSL semantic models with group_by, aggregate, filter, and visualizations. Use for data analysis from existing semantic tables.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/bsl-query-expert" ~/.claude/skills/majiayu000-claude-skill-registry-bsl-query-expert && rm -rf "$T"
manifest:
skills/data/bsl-query-expert/SKILL.mdsource content
BSL Query Expert
Query semantic models using BSL. Be concise.
Workflow
→ discover available modelslist_models()
→ get schema (REQUIRED before querying)get_model(name)
→ call before first query to learn syntaxget_documentation("query-methods")
→ execute, auto-displays resultsquery_model(query)- Brief summary (1-2 sentences max)
Behavior
- Execute queries immediately - don't show code to user
- Never stop after listing models - proceed to query
- Charts/tables auto-display - don't print data inline
- Reuse context: Don't re-call tools if info already in context
- IMPORTANT: If query fails → call
to learn correct syntax before retryingget_documentation("query-methods")
CRITICAL: Field Names
- Use EXACT names from
outputget_model() - Joined columns:
(nott.customers.country
)t.customer_id.country() - Direct columns:
(nott.region
)t.model.region - NEVER invent methods on columns - they don't exist!
CRITICAL: Never Guess Filter Values
- WRONG:
without checking actual values first.filter(lambda t: t.region.isin(["US", "EU"])) - Data uses codes/IDs that differ from what you expect (e.g., "California" might be "CA" or "US-CA")
- Always discover values first, then filter with real data
Multi-Hop Query Pattern
When filtering by names/locations/categories you haven't seen:
Step 1 (discover): query_model(query="model.group_by('region').aggregate('count')", records_limit=50, get_chart=false) Step 2 (filter): query_model(query="model.filter(lambda t: t.region.isin(['CA','NY'])).group_by('region').aggregate('count')", get_records=false)
- Step 1: Get data to LLM (
), hide chart (records_limit=50
)get_chart=false - Step 2: Display to user (
), show chart (default)get_records=false
query_model Parameters
(default): Return data to LLM, table auto-displaysget_records=true
: Display-only, no data returned to LLMget_records=false
: Max records to LLM (increase for discovery queries)records_limit=N
(default): Show chart;get_chart=true
for table-onlyfalse
CRITICAL: Exploration vs Final Query
- Discovery/exploration queries: Use
- no chart when exploring data valuesget_chart=false - Final answer query: Use
(default) - show chart for user's answerget_chart=true - Example: Looking up airport codes? →
. Final flight count? → chart enabledget_chart=false
Charts
- Default: Omit chart_spec - auto-detect handles most cases
- Override only if needed:
orchart_spec={"chart_type": "line"}"bar" - CRITICAL: Charting only works on BSL SemanticQuery results (after group_by + aggregate)
- If you use filter-only queries (returns Ibis Table), set
- charts will fail on raw tablesget_chart=false
Time Dimensions
- Use
for time columns:.truncate()with_dimensions(year=lambda t: t.date.truncate("Y")) - Units:
,"Y"
,"Q"
,"M"
,"W"
,"D"
,"h"
,"m""s"
CRITICAL: Case Expressions
- Use
(PLURAL) - NOTibis.cases()ibis.case() - Syntax:
ibis.cases((condition1, value1), (condition2, value2), else_=default) - Example:
ibis.cases((t.value > 100, "high"), (t.value > 50, "medium"), else_="low")
Help
get_documentation(topic) for:
- Core: getting-started, semantic-table, yaml-config, profile, compose, query-methods
- Advanced: windowing, bucketing, nested-subtotals, percentage-total, indexing, sessionized, comparison
- Charts: charting, charting-altair, charting-plotly, charting-plotext
Additional Information
Available documentation:
- Getting Started: Introduction to BSL, installation, and basic usage with semantic tables
- Semantic Tables: Building semantic models with dimensions, measures, and expressions
- YAML Configuration: Defining semantic models in YAML files for better organization
- Profiles: Database connection profiles for connecting to data sources
- Composing Models: Joining multiple semantic tables together
- Query Methods: Complete API reference for group_by, aggregate, filter, order_by, limit, mutate
- Window Functions: Running totals, moving averages, rankings, lag/lead, and cumulative calculations
- Bucketing with Other: Create categorical buckets and consolidate long-tail into 'Other' category
- Nested Subtotals: Rollup calculations with subtotals at each grouping level
- Percent of Total: Calculate percentages using t.all() for market share and distribution analysis
- Dimensional Indexing: Compare values to baselines and calculate indexed metrics
- Charting Overview: Data visualization basics with automatic chart type detection
- Altair Charts: Interactive web charts with Vega-Lite via Altair backend
- Plotly Charts: Interactive charts with Plotly backend for dashboards
- Terminal Charts: ASCII charts for terminal/CLI with Plotext backend
- Sessionized Data: Working with session-based data and user journey analysis
- Comparison Queries: Period-over-period comparisons and trend analysis
Query Syntax Reference
Execute BSL queries and visualize results. Returns query results with optional charts.
Core Pattern
model.group_by(<dimensions>).aggregate(<measures>) # Both take STRING names only
CRITICAL:
aggregate() takes measure names as strings, NOT expressions or lambdas!
Method Order
model -> with_dimensions -> filter -> with_measures -> group_by -> aggregate -> order_by -> mutate -> limit
Lambda Column Access
CRITICAL: In
with_dimensions and with_measures lambdas, access columns directly - NO model prefix!
# ✅ CORRECT - access columns directly via t flights.with_dimensions(x=lambda t: ibis.cases((t.carrier == "WN", "Southwest"), else_="Other")) flights.with_measures(pct=lambda t: t.flight_count / t.all(t.flight_count) * 100) # ❌ WRONG - model prefix fails in with_dimensions/with_measures flights.with_dimensions(x=lambda t: t.flights.carrier) # ERROR: 'Table' has no attribute 'flights' flights.with_measures(x=lambda t: t.flights.flight_count) # ERROR!
Note: Model prefix (e.g.,
t.flights.carrier) works in .filter() but NOT in with_dimensions/with_measures.
Filtering
# Simple filter model.filter(lambda t: t.status == "active").group_by("category").aggregate("count") # Multiple conditions - use ibis.and_() / ibis.or_() model.filter(lambda t: ibis.and_(t.amount > 1000, t.year >= 2023)) # IN operator - MUST use .isin() (Python "in" does NOT work!) model.filter(lambda t: t.region.isin(["US", "EU"])) # ✅ model.filter(lambda t: t.region in ["US", "EU"]) # ❌ ERROR! # Post-aggregate filter (SQL HAVING) - filter AFTER aggregate model.group_by("carrier").aggregate("count").filter(lambda t: t.count > 1000)
Joined Columns
Models with joins expose prefixed columns (e.g.,
customers.country). Use EXACT names from get_model():
# ✅ CORRECT - use prefixed column name model.filter(lambda t: t.customers.country.isin(["US", "CA"])).group_by("customers.country").aggregate("count") # ❌ WRONG - columns don't have lookup methods! model.filter(lambda t: t.customer_id.country()) # ERROR: no 'country' attribute
Key: Look for prefixed columns in
get_model() output - don't call methods on ID columns.
Time Transformations
group_by() only accepts strings. Use .with_dimensions() first:
model.with_dimensions(year=lambda t: t.created_at.truncate("Y")).group_by("year").aggregate("count")
Truncate units:
"Y", "Q", "M", "W", "D", "h", "m", "s"
Filtering Timestamps - Match Types!
# .year() returns int -> compare with int model.filter(lambda t: t.created_at.year() >= 2023) # .truncate() returns timestamp -> compare with ISO string model.with_dimensions(yr=lambda t: t.created_at.truncate("Y")).filter(lambda t: t.yr >= '2023-01-01')
Percentage of Total
Use
t.all(t.measure) in .with_measures() for grand total:
# Simple percentage by category sales.with_measures(pct=lambda t: t.revenue / t.all(t.revenue) * 100).group_by("category").aggregate("revenue", "pct") # Complex: filter + joined column + time dimension + percentage orders.filter(lambda t: t.customers.country.isin(["US", "CA"])).with_dimensions( order_date=lambda t: t.created_at.date() ).with_measures( pct=lambda t: t.order_count / t.all(t.order_count) * 100 ).group_by("order_date").aggregate("order_count", "pct").order_by("order_date")
More:
get_documentation(topic="percentage-total")
Sorting & Limiting
model.group_by("category").aggregate("revenue").order_by(ibis.desc("revenue")).limit(10)
CRITICAL:
.limit() in query limits data before calculations. Use limit parameter for display-only limiting.
Window Functions
.mutate() for post-aggregation transforms - MUST come after .order_by():
model.group_by("week").aggregate("count").order_by("week").mutate( rolling_avg=lambda t: t.count.mean().over(ibis.window(rows=(-9, 0), order_by="week")) )
More:
get_documentation(topic="windowing")
Chart
chart_spec={"chart_type": "bar"} # or "line", "scatter" - omit for auto-detect