install
source · Clone the upstream repo
git clone https://github.com/ai-analyst-lab/ai-analyst
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ai-analyst-lab/ai-analyst "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/connect-data" ~/.claude/skills/ai-analyst-lab-ai-analyst-connect-data && rm -rf "$T"
manifest: .claude/skills/connect-data/skill.md
source content
Skill: Connect Data
Purpose
Guided wizard to connect a new dataset. Walks the user through selecting a connection type, configuring credentials, validating the connection, profiling the schema, and setting up the knowledge brain.
When to Use
- User says /connect-data or "connect my database" or "add a new dataset"
- First-run welcome suggests connecting data
- After /switch-dataset when the target dataset doesn't exist yet
Invocation
/connect-data — start the connection wizard
/connect-data type=postgres — skip type selection
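The argument syntax above is handled by the agent at runtime, but a hypothetical parser shows the intended token shape: a bare /connect-data starts the wizard, while type=... tokens pre-answer a step. Everything in this sketch (the function name, the token rules) is illustrative, not code that ships with the repo.

```python
# Hypothetical parser for the invocation argument string shown above.
# The real skill is interpreted by the agent; this only illustrates
# the expected "key=value" token shape.
def parse_invocation(args: str) -> dict[str, str]:
    """Split "type=postgres"-style tokens into a dict; a bare call yields {}."""
    return dict(token.split("=", 1) for token in args.split() if "=" in token)

assert parse_invocation("type=postgres") == {"type": "postgres"}
assert parse_invocation("") == {}
```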
Instructions
Step 1: Choose Connection Type
Present options:
- CSV files — "I have CSV files in a local directory"
- DuckDB — "I have a local DuckDB database file"
- MotherDuck — "I have a MotherDuck cloud database"
- PostgreSQL — "I have a PostgreSQL database"
- BigQuery — "I have a Google BigQuery dataset"
- Snowflake — "I have a Snowflake warehouse"
Step 2: Collect Connection Details
For CSV:
- Ask: "What's the path to your CSV directory? (relative to this repo)"
- Verify the directory exists and contains .csv files
- List found files and ask to confirm (see the sketch below)
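A minimal sketch of that directory check, assuming plain pathlib; the function name is illustrative and not one of the repo's helpers.

```python
# Sketch of the CSV directory check described above (assumed helper,
# not part of helpers/ in the repo).
from pathlib import Path

def find_csv_files(directory: str) -> list[Path]:
    """Verify the directory exists and return its .csv files, sorted."""
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"No such directory: {directory}")
    files = sorted(root.glob("*.csv"))
    if not files:
        raise ValueError(f"No .csv files found in {directory}")
    return files
```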
For DuckDB:
- Ask: "Path to your .duckdb file?"
- Verify file exists
- Test connection with SELECT 1 (see the sketch below)
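A sketch of that probe using the duckdb Python package; the wizard may route this through ConnectionManager instead, so the direct connect call here is an assumption.

```python
# Sketch of the SELECT 1 connectivity probe with the duckdb package.
import duckdb

def test_duckdb(path: str) -> bool:
    """Open the .duckdb file read-only and run a trivial query."""
    try:
        conn = duckdb.connect(path, read_only=True)
        try:
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
        return True
    except duckdb.Error as exc:
        print(f"Connection test failed: {exc}")
        return False
```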
For MotherDuck:
- Ask: "Database name and schema?"
- Note: "MotherDuck connects via MCP. Make sure your token is configured."
For PostgreSQL / BigQuery / Snowflake:
- Copy the appropriate template from connection_templates/
- Ask user to fill in required fields
- IMPORTANT: Never ask for or store passwords directly. Guide the user to use environment variables (e.g., $PG_PASSWORD); see the sketch below.
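One way to honor that rule is to expand $VAR references from the environment at connect time and never write the resolved value back to disk. A minimal sketch, assuming manifest values use the plain $NAME form:

```python
# Sketch of env-var credential resolution (assumed convention: the
# manifest stores "$PG_PASSWORD"; the real value lives only in the
# environment and is never persisted).
import os

def resolve_secret(value: str) -> str:
    """Expand a $VAR reference from the environment; never persist the result."""
    if value.startswith("$"):
        name = value[1:]
        secret = os.environ.get(name)
        if secret is None:
            raise RuntimeError(f"Environment variable {name} is not set")
        return secret
    return value
```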
Step 3: Create Dataset Brain
- Generate a dataset_id from the display name (lowercase, hyphens)
- Create .knowledge/datasets/{id}/ directory
- Write manifest.yaml from the connection template + user inputs
- Create empty quirks.md with section headers
- Create empty metrics/index.yaml (scaffold sketched below)
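A sketch of the scaffold, assuming PyYAML and a simple lowercase-hyphen slug rule; the manifest keys and quirks.md section headers here are placeholders, not the repo's real schema.

```python
# Sketch of the dataset-brain scaffold (manifest keys and quirks.md
# headers are illustrative placeholders).
from pathlib import Path
import re
import yaml

def create_dataset_brain(display_name: str, connection: dict) -> Path:
    """Create .knowledge/datasets/{id}/ with manifest, quirks, and metrics index."""
    dataset_id = re.sub(r"[^a-z0-9]+", "-", display_name.lower()).strip("-")
    root = Path(".knowledge/datasets") / dataset_id
    (root / "metrics").mkdir(parents=True, exist_ok=True)
    manifest = {"id": dataset_id, "display_name": display_name, "connection": connection}
    (root / "manifest.yaml").write_text(yaml.safe_dump(manifest, sort_keys=False))
    (root / "quirks.md").write_text("# Quirks\n\n## Data\n\n## Joins\n\n## Gotchas\n")
    (root / "metrics" / "index.yaml").write_text("metrics: []\n")
    return root
```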
Step 4: Test Connection
Use ConnectionManager from helpers/connection_manager.py:
- Instantiate with the new config
- Call test_connection() (see the sketch below)
- If fails: show error, offer to retry or edit config
- If passes: proceed
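A sketch of that test-and-retry loop; ConnectionManager and test_connection() are named by this skill, but the constructor signature and the raise-on-failure behavior are assumptions.

```python
# Sketch of the test-and-retry step (assumes ConnectionManager(config)
# as the constructor and that test_connection() raises on failure).
from helpers.connection_manager import ConnectionManager

def verify_connection(config: dict, max_attempts: int = 3) -> bool:
    """Probe the connection, surfacing each error so the user can retry or edit."""
    manager = ConnectionManager(config)
    for attempt in range(1, max_attempts + 1):
        try:
            manager.test_connection()
            return True
        except Exception as exc:
            print(f"Attempt {attempt}/{max_attempts} failed: {exc}")
    return False
```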
Step 5: Profile Schema
- Call list_tables() to enumerate tables
- For each table: get column names and types via get_table_schema()
- Generate schema.md using schema_to_markdown() from helpers/data_helpers.py (see the sketch below)
- Write to .knowledge/datasets/{id}/schema.md
- Offer to run full data profiling: "Want me to deep-profile this dataset?"
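A sketch of the profiling pass; list_tables(), get_table_schema(), and schema_to_markdown() are named above, but treating the first two as ConnectionManager methods, and the exact signatures, are assumptions here.

```python
# Sketch of the schema-profiling step (helper signatures are assumed;
# `manager` is the ConnectionManager instance from Step 4).
from pathlib import Path
from helpers.data_helpers import schema_to_markdown

def profile_schema(manager, dataset_id: str) -> Path:
    """Enumerate tables, collect column names/types, and write schema.md."""
    schemas = {
        table: manager.get_table_schema(table)  # column names and types
        for table in manager.list_tables()
    }
    out = Path(".knowledge/datasets") / dataset_id / "schema.md"
    out.write_text(schema_to_markdown(schemas))
    return out
```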
Step 6: Set Active
- Update .knowledge/active.yaml to point to the new dataset (sketched below)
- Confirm: "Connected! {display_name} is now your active dataset."
- Show: table count, estimated row count, date range (if detected)
- Suggest next steps: /explore to browse, /metrics to define metrics, or just ask a question
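A sketch of the activation write, assuming active.yaml holds a single dataset pointer under an assumed key name; the real file may carry more fields.

```python
# Sketch of the activation step (the "active_dataset" key is an
# assumption about active.yaml's schema).
from pathlib import Path
import yaml

def set_active(dataset_id: str) -> None:
    """Point .knowledge/active.yaml at the newly connected dataset."""
    Path(".knowledge/active.yaml").write_text(
        yaml.safe_dump({"active_dataset": dataset_id})
    )
```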
Rules
- Never store credentials in plain text in manifest files
- Always test the connection before declaring success
- Always generate a schema.md — it's required for analysis
- Create the full .knowledge/datasets/{id}/ tree even if profiling fails
- If the user already has this dataset, ask before overwriting
Edge Cases
- Directory doesn't exist: Offer to create it
- No CSV files found: Check for other formats (.parquet, .json)
- Connection fails repeatedly: Suggest checking credentials, firewall, VPN
- Schema too large (>100 tables): Profile at the table level only; skip per-table column details
- Dataset name collision: Append a number (e.g., "mydata-2"); see the sketch below
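The collision rule in the last item is mechanical enough to sketch; the helper name is illustrative.

```python
# Sketch of the name-collision rule: append a counter until the
# dataset directory name is free (first collision yields "-2").
from pathlib import Path

def unique_dataset_id(base_id: str, root: Path = Path(".knowledge/datasets")) -> str:
    """Return base_id, or base_id-2, base_id-3, ... if the directory exists."""
    candidate, n = base_id, 1
    while (root / candidate).exists():
        n += 1
        candidate = f"{base_id}-{n}"  # e.g., "mydata-2"
    return candidate
```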