Skills ibis
Expert guidance for Ibis, the Python dataframe library that provides a pandas-like API but generates SQL for execution on any backend — DuckDB, PostgreSQL, BigQuery, Snowflake, Spark, and more. Helps developers write analytics code once and run it anywhere without rewriting SQL for each database.
install
source · Clone the upstream repo
git clone https://github.com/TerminalSkills/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/TerminalSkills/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/ibis" ~/.claude/skills/terminalskills-skills-ibis && rm -rf "$T"
manifest:
skills/ibis/SKILL.mdsafety · automated scan (low risk)
This is a pattern-based risk scan, not a security review. Our crawler flagged:
- pip install
Always read a skill's source content before installing. Patterns alone don't mean the skill is malicious — but they warrant attention.
source content
Ibis — Portable Python Analytics
Overview
Ibis, the Python dataframe library that provides a pandas-like API but generates SQL for execution on any backend — DuckDB, PostgreSQL, BigQuery, Snowflake, Spark, and more. Helps developers write analytics code once and run it anywhere without rewriting SQL for each database.
Instructions
Basic Usage
# src/analytics.py — Portable analytics with Ibis import ibis from ibis import _ # Shorthand for column references # Connect to a backend (DuckDB for local development) con = ibis.duckdb.connect("analytics.duckdb") # Or connect to production databases — same code, different backend # con = ibis.postgres.connect(url="postgresql://...") # con = ibis.bigquery.connect(project_id="my-project") # con = ibis.snowflake.connect(...) # Load data orders = con.table("orders") # Build a query — this is lazy (no execution until you call .execute()) monthly_revenue = ( orders .filter(_.status == "completed") .filter(_.created_at >= "2026-01-01") .group_by(month=_.created_at.truncate("M")) .agg( revenue=_.amount.sum(), order_count=_.count(), unique_customers=_.customer_id.nunique(), avg_order_value=_.amount.mean(), ) .order_by(_.month) ) # Execute and get a pandas DataFrame df = monthly_revenue.execute() print(df) # Or see the generated SQL print(ibis.to_sql(monthly_revenue))
Complex Transformations
# Window functions, joins, and case expressions import ibis from ibis import _ con = ibis.duckdb.connect("analytics.duckdb") orders = con.table("orders") customers = con.table("customers") # Window functions — running totals and rankings ranked = ( orders .filter(_.status == "completed") .group_by(_.customer_id) .agg( total_spent=_.amount.sum(), order_count=_.count(), first_order=_.created_at.min(), last_order=_.created_at.max(), ) .mutate( # Rank customers by revenue revenue_rank=ibis.rank().over( order_by=ibis.desc(_.total_spent) ), # Percentile revenue_percentile=ibis.percent_rank().over( order_by=_.total_spent ), # Customer segment based on spending segment=ibis.case() .when(_.total_spent >= 1000, "whale") .when(_.total_spent >= 100, "regular") .else_("casual") .end(), ) ) # Joins customer_analytics = ( ranked .join(customers, _.customer_id == customers.id) .select( _.customer_id, customers.name, customers.email, customers.plan, _.total_spent, _.order_count, _.segment, _.revenue_rank, # Days since last order days_inactive=(ibis.now() - _.last_order).cast("int32") // 86400, ) ) # Cohort analysis cohorts = ( orders .filter(_.status == "completed") .group_by(_.customer_id) .mutate( cohort_month=_.created_at.min().truncate("M"), ) .mutate( months_since=((_.created_at.truncate("M") - _.cohort_month) .cast("int32") // (30 * 86400)), ) .group_by(_.cohort_month, _.months_since) .agg( active_users=_.customer_id.nunique(), revenue=_.amount.sum(), ) )
Backend Portability
# The same analytics code runs on any backend import ibis def build_revenue_report(con: ibis.BaseBackend): """Build a revenue report — works on any Ibis backend. Args: con: Any Ibis connection (DuckDB, Postgres, BigQuery, etc.) """ orders = con.table("orders") return ( orders .filter(_.status == "completed") .group_by( month=_.created_at.truncate("M"), category=_.category, ) .agg( revenue=_.amount.sum(), orders=_.count(), ) .order_by(_.month.desc()) ) # Development: DuckDB on local Parquet files dev_con = ibis.duckdb.connect() dev_con.read_parquet("data/orders.parquet", table_name="orders") report = build_revenue_report(dev_con).execute() # Production: BigQuery prod_con = ibis.bigquery.connect(project_id="prod-project", dataset_id="analytics") report = build_revenue_report(prod_con).execute() # Testing: in-memory with DuckDB test_con = ibis.duckdb.connect() test_con.create_table("orders", test_data_df) report = build_revenue_report(test_con).execute()
UDFs and Custom Functions
# Custom scalar and aggregate functions import ibis from ibis import udf @udf.scalar.python def normalize_email(email: str) -> str: """Normalize email addresses for deduplication.""" local, domain = email.lower().split("@") # Remove dots and plus aliases from Gmail if domain in ("gmail.com", "googlemail.com"): local = local.split("+")[0].replace(".", "") return f"{local}@{domain}" # Use in queries customers = con.table("customers") deduped = ( customers .mutate(clean_email=normalize_email(_.email)) .group_by(_.clean_email) .agg( count=_.count(), first_seen=_.created_at.min(), ) .filter(_.count > 1) )
Installation
# Core library pip install ibis-framework # With specific backends pip install "ibis-framework[duckdb]" pip install "ibis-framework[postgres]" pip install "ibis-framework[bigquery]" pip install "ibis-framework[snowflake]" pip install "ibis-framework[pyspark]" # Interactive mode (for notebooks) ibis.options.interactive = True # Auto-execute and display results
Examples
Example 1: Migrating a pandas pipeline to run on BigQuery
User request:
I have a pandas script that calculates cohort retention from our events table. Rewrite it using Ibis so it runs on BigQuery instead of loading everything into memory.
The agent rewrites the pandas code using Ibis expressions (
ibis.bigquery.connect(), t.group_by(), _.mutate(), window functions with ibis.cumulative_window()), keeping the same logic but generating SQL that executes on BigQuery. The script goes from loading 50M rows into memory to pushing all computation to the warehouse.
Example 2: Building a portable analytics module with DuckDB for dev
User request:
Write an analytics module that computes daily active users and revenue per plan from Parquet files locally, but can switch to Snowflake in production.
The agent creates a module using
ibis.duckdb.connect() for local development with Parquet files, writes composable Ibis expressions for DAU (t.select('user_id', 'event_date').distinct().group_by('event_date').count()) and revenue by plan, and adds a get_connection() function that switches to ibis.snowflake.connect() based on an environment variable — same analytics code, different backend.
Guidelines
- Write once, run anywhere — Define analytics logic with Ibis; swap backends by changing the connection, not the code
- Lazy by default — Ibis expressions are lazy; they only execute when you call
or.execute().to_pandas() - DuckDB for development — Use DuckDB locally with Parquet files; switch to BigQuery/Snowflake for production
- Use
for readability —_
gives you clean column references:from ibis import _
vs_.amount.sum()orders.amount.sum() - Generate SQL for debugging — Use
to see the SQL being generated; helps debug unexpected resultsibis.to_sql(expr) - Functions for reuse — Wrap analytics logic in functions that take a connection; test with DuckDB, deploy on any backend
- Interactive mode in notebooks — Set
for immediate result display during explorationibis.options.interactive = True - Type your schemas — Use
to define expected table schemas; catch type mismatches earlyibis.schema()