Ai-analyst connect-data

Skill: Connect Data

install

source · Clone the upstream repo

git clone https://github.com/ai-analyst-lab/ai-analyst

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ai-analyst-lab/ai-analyst "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/connect-data" ~/.claude/skills/ai-analyst-lab-ai-analyst-connect-data && rm -rf "$T"

manifest: .claude/skills/connect-data/skill.md


Purpose

Guided wizard to connect a new dataset. Walks the user through selecting a connection type, configuring credentials, validating the connection, profiling the schema, and setting up the knowledge brain.

When to Use

User says `/connect-data`, "connect my database", or "add a new dataset"
First-run welcome suggests connecting data
After `/switch-dataset`, when the target dataset doesn't exist yet

Invocation

`/connect-data`: start the connection wizard
`/connect-data type=postgres`: skip type selection

Instructions

Step 1: Choose Connection Type

Present options:

CSV files — "I have CSV files in a local directory"
DuckDB — "I have a local DuckDB database file"
MotherDuck — "I have a MotherDuck cloud database"
PostgreSQL — "I have a PostgreSQL database"
BigQuery — "I have a Google BigQuery dataset"
Snowflake — "I have a Snowflake warehouse"

Step 2: Collect Connection Details

For CSV:

Ask: "What's the path to your CSV directory? (relative to this repo)"
Verify the directory exists and contains .csv files
List found files and ask to confirm
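
The CSV checks above can be sketched in Python roughly as follows. This is a minimal illustration, not the skill's actual implementation; the function name `find_csv_files` is hypothetical.

```python
from pathlib import Path


def find_csv_files(csv_dir: str) -> list[str]:
    """Return sorted names of .csv files in csv_dir (hypothetical helper).

    Raises FileNotFoundError if the directory is missing, so the caller
    can offer to create it instead.
    """
    directory = Path(csv_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"Directory not found: {csv_dir}")
    # Match the extension case-insensitively; do not recurse into subfolders.
    return sorted(
        p.name for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"
    )
```

An empty result is the cue to check for other formats before asking the user to confirm anything.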

For DuckDB:

Ask: "Path to your .duckdb file?"
Verify file exists
Test the connection with `SELECT 1`

For MotherDuck:

Ask: "Database name and schema?"
Note: "MotherDuck connects via MCP. Make sure your token is configured."

For PostgreSQL / BigQuery / Snowflake:

Copy the appropriate template from `connection_templates/`
Ask the user to fill in the required fields
IMPORTANT: Never ask for or store passwords directly. Guide the user to use environment variables (e.g., `$PG_PASSWORD`).
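
To illustrate the environment-variable rule, the sketch below resolves `$NAME` placeholders only at connection time, so the stored config holds nothing but the variable name. The function name `resolve_env_placeholders` and the config keys are assumptions made for illustration; the real connection templates may be structured differently.

```python
import os


def resolve_env_placeholders(config: dict) -> dict:
    """Return a copy of config with "$VAR" string values read from the environment.

    Hypothetical helper: the stored config keeps only the variable name
    (e.g. "$PG_PASSWORD"); the secret is looked up when connecting.
    """
    resolved = {}
    for key, value in config.items():
        if isinstance(value, str) and value.startswith("$"):
            var_name = value[1:]
            if var_name not in os.environ:
                raise KeyError(f"Environment variable {var_name} is not set")
            resolved[key] = os.environ[var_name]
        else:
            resolved[key] = value
    return resolved
```

Because the input dictionary is never mutated, the secret does not leak back into whatever gets written to disk.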

Step 3: Create Dataset Brain

Generate a dataset_id from the display name (lowercase, hyphens)
Create the `.knowledge/datasets/{id}/` directory
Write `manifest.yaml` from the connection template + user inputs
Create an empty `quirks.md` with section headers
Create an empty `metrics/index.yaml`
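
As a rough sketch of this step, the ID generation and directory scaffolding might look like the following. The function names `make_dataset_id` and `create_dataset_tree` are hypothetical, and the manifest and the `quirks.md` section headers are left out because they depend on templates not shown here.

```python
import re
from pathlib import Path


def make_dataset_id(display_name: str) -> str:
    """Lowercase the name and collapse runs of non-alphanumerics into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", display_name.lower()).strip("-")
    if not slug:
        raise ValueError("Display name has no usable characters")
    return slug


def create_dataset_tree(root: str, dataset_id: str) -> Path:
    """Create .knowledge/datasets/{id}/ with placeholder files under root."""
    base = Path(root) / ".knowledge" / "datasets" / dataset_id
    (base / "metrics").mkdir(parents=True, exist_ok=True)
    # Placeholders only: the real section headers and manifest contents
    # come from the skill's templates, which are not shown here.
    (base / "quirks.md").touch()
    (base / "metrics" / "index.yaml").touch()
    return base
```

For example, `make_dataset_id("My Sales Data (2024)")` yields `my-sales-data-2024`.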

Step 4: Test Connection

Use `ConnectionManager` from `helpers/connection_manager.py`
Instantiate with the new config
Call `test_connection()`
If it fails: show the error, offer to retry or edit the config
If it passes: proceed

Step 5: Profile Schema

Call `list_tables()` to enumerate tables
For each table: get column names and types via `get_table_schema()`
Generate `schema.md` using `schema_to_markdown()` from `helpers/data_helpers.py`
Write to `.knowledge/datasets/{id}/schema.md`
Offer to run full data profiling: "Want me to deep-profile this dataset?"
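
To give a sense of what the generated file might contain, here is a stand-in renderer. It is not the repo's `schema_to_markdown()`, whose signature and output format are not shown in this document; the name `render_schema_markdown`, the input shape, and the layout are all assumptions.

```python
def render_schema_markdown(tables: dict[str, list[tuple[str, str]]]) -> str:
    """Render {table: [(column, type), ...]} as one markdown table per table."""
    lines = ["# Schema", ""]
    for table_name in sorted(tables):
        lines += [f"## {table_name}", "", "| Column | Type |", "| --- | --- |"]
        lines += [f"| {col} | {dtype} |" for col, dtype in tables[table_name]]
        lines.append("")  # blank line between tables
    return "\n".join(lines)
```

The input mapping would be assembled from the table list and per-table column information gathered earlier in this step.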

Step 6: Set Active

Update `.knowledge/active.yaml` to point to the new dataset
Confirm: "Connected! {display_name} is now your active dataset."
Show: table count, estimated row count, date range (if detected)
Suggest next steps: `/explore` to browse, `/metrics` to define metrics, or just ask a question

Rules

Never store credentials in plain text in manifest files
Always test the connection before declaring success
Always generate a schema.md — it's required for analysis
Create the full .knowledge/datasets/{id}/ tree even if profiling fails
If the user already has this dataset, ask before overwriting

Edge Cases

Directory doesn't exist: Offer to create it
No CSV files found: Check for other formats (.parquet, .json)
Connection fails repeatedly: Suggest checking credentials, firewall, VPN
Schema too large (>100 tables): Profile only, skip per-table details
Dataset name collision: Append a number (e.g., "mydata-2")
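
The name-collision case can be handled with a small helper along these lines. This is a sketch only: the function name `unique_dataset_id` is hypothetical, and starting the suffix at 2 follows the "mydata-2" example above.

```python
def unique_dataset_id(base_id: str, existing_ids: set[str]) -> str:
    """Return base_id, or base_id-2, base_id-3, ... if already taken."""
    if base_id not in existing_ids:
        return base_id
    suffix = 2
    while f"{base_id}-{suffix}" in existing_ids:
        suffix += 1
    return f"{base_id}-{suffix}"
```

In practice `existing_ids` would be the directory names already present under `.knowledge/datasets/`.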