Ai-analyst setup

Skill: /setup

install

source · Clone the upstream repo

git clone https://github.com/ai-analyst-lab/ai-analyst

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ai-analyst-lab/ai-analyst "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/setup" ~/.claude/skills/ai-analyst-lab-ai-analyst-setup && rm -rf "$T"

manifest: .claude/skills/setup/skill.md

source content

Skill: /setup

Run a 4-phase conversational interview that populates the knowledge system from the user's real context. Turns a blank `.knowledge/` directory into a fully configured analytical environment.

Parameters

No arguments: Start from Phase 1 (or resume from last incomplete phase)
`/setup status`: Show current setup state
`/setup reset`: Reset profile and preferences (Tier 1)
`/setup reset everything`: Full reset including dataset connections (Tier 2)

Trigger Phrases

`/setup`
`set up my environment`
`configure the analyst`
`onboard me`

Design Principles

Conversational, not interrogative. You are a colleague getting to know someone, not a form engine. Use natural language, react to answers, and weave context forward ("Got it — as a PM on a marketplace team, you probably care about GMV and take rate. Let me ask about your data next.").
2-3 questions at a time, max. Never dump a wall of questions. Group them thematically and wait for a response before continuing.
Validate responses. If a role sounds unusual or a path does not exist, confirm before recording. ("You said your CSV directory is `data/sales/`. I do not see that directory. Did you mean `data/`?")
Allow skipping. Mark optional fields clearly. If the user says "skip" or "I'll do this later", record `null` and move on. Never block progress on optional fields.
Show progress. After each phase, display a brief summary of what was captured and what comes next.

State File

All setup state lives in `.knowledge/setup-state.yaml`. Create it on first run if it does not exist.

Schema

# .knowledge/setup-state.yaml
setup_version: 1
started_at: "YYYY-MM-DDTHH:MM:SS"
last_updated: "YYYY-MM-DDTHH:MM:SS"
status: "complete" | "partial" | "in-progress"

phases:
  role_and_team:
    status: "complete" | "skipped" | "pending"
    completed_at: "YYYY-MM-DDTHH:MM:SS" | null
  data_connection:
    status: "complete" | "partial" | "skipped" | "pending"
    completed_at: "YYYY-MM-DDTHH:MM:SS" | null
    partial_reason: null | "warehouse_mcp_needed"
  business_context:
    status: "complete" | "skipped" | "pending"
    completed_at: "YYYY-MM-DDTHH:MM:SS" | null
  preferences:
    status: "complete" | "skipped" | "pending"
    completed_at: "YYYY-MM-DDTHH:MM:SS" | null
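
As an illustration of the schema above, a fresh state can be sketched as a plain Python dictionary. This is a minimal sketch, not the skill's implementation: the helper name `new_setup_state` is hypothetical, and starting the top-level `status` at `"in-progress"` is an assumption, since the schema lists the allowed values but not the initial one.

```python
from datetime import datetime

PHASES = ("role_and_team", "data_connection", "business_context", "preferences")


def new_setup_state(now=None):
    """Build a fresh state with every phase pending (hypothetical helper)."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%dT%H:%M:%S")
    phases = {name: {"status": "pending", "completed_at": None} for name in PHASES}
    # Only data_connection carries a partial_reason field in the schema.
    phases["data_connection"]["partial_reason"] = None
    return {
        "setup_version": 1,
        "started_at": stamp,
        "last_updated": stamp,
        "status": "in-progress",  # assumed initial value
        "phases": phases,
    }
```

Serializing this dictionary to `.knowledge/setup-state.yaml` is left out here because it depends on which YAML library is available.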

Phase 1: Role & Team

Goal: Understand who the user is so we can adapt communication style, technical depth, and default output formats.

Questions (ask in 1-2 groups)

Group 1:

"What's your role? (e.g., Product Manager, Data Scientist, Engineer, Marketing Analyst, exec)"
"How technical are you with data? Pick the one that fits best:
- Beginner — I look at dashboards but rarely write queries
- Intermediate — I can write SQL and read basic stats
- Advanced — I build models, write complex SQL, and review pipelines"

Group 2:

"What team or department are you on?" (optional)
"What domain does your product operate in? (e.g., e-commerce, SaaS, fintech, marketplace, healthcare, media)" (optional)

Validation

If role is empty or unrecognizable, ask once for clarification.
Map common synonyms: "PM" -> Product Manager, "DS" -> Data Scientist, "analyst" -> Analyst, "eng" -> Engineer.
Technical level must resolve to one of: beginner, intermediate, advanced.
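
The validation rules above could be sketched roughly as follows. The function names are hypothetical, and matching the technical level by prefix is an assumption about how a free-text answer gets resolved. Detecting an "unrecognizable" role is left to the interviewer and is not modeled here.

```python
ROLE_SYNONYMS = {
    "pm": "Product Manager",
    "ds": "Data Scientist",
    "analyst": "Analyst",
    "eng": "Engineer",
}
TECHNICAL_LEVELS = ("beginner", "intermediate", "advanced")


def normalize_role(raw):
    """Return the canonical role, or None when the answer is empty."""
    text = raw.strip()
    if not text:
        return None  # caller should ask once for clarification
    return ROLE_SYNONYMS.get(text.lower(), text)


def normalize_technical_level(raw):
    """Return one of TECHNICAL_LEVELS, or None if the answer does not resolve."""
    text = raw.strip().lower()
    for level in TECHNICAL_LEVELS:
        if text.startswith(level):  # assumed matching rule
            return level
    return None
```

A `None` result from either function signals that the interview should ask again rather than record a value.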

Outputs

Write to `.knowledge/user/profile.md` (create directory if needed):

# User Profile

## Role & Expertise

- **Role:** {role}
- **Technical level:** {technical_level}
- **SQL comfort:** {inferred from technical_level: none|basic|intermediate|advanced}
- **Statistics comfort:** {inferred: none|basic|intermediate|advanced}
- **Domain:** {domain or "not specified"}
- **Team:** {team or "not specified"}

## Communication Preferences

_Set in Phase 4._

## Corrections Log

<!-- Format: YYYY-MM-DD | What was wrong | What was right -->

Update `.knowledge/setup-state.yaml`:

Set `phases.role_and_team.status: complete`
Set `phases.role_and_team.completed_at` to the current timestamp

Phase 1 Summary

Display:

Phase 1 complete — Role & Team

  Role:       {role}
  Tech level: {technical_level}
  Domain:     {domain}
  Team:       {team}

Next up: Phase 2 — Data Connection

Phase 2: Data Connection

Goal: Get the user's data connected so analyses can run.

Questions

Group 1:

"Let's connect your data. What do you have?
- CSV files in a local directory
- DuckDB database file
- Cloud warehouse (MotherDuck, Postgres, BigQuery, Snowflake)
- Nothing yet — I want to use a sample dataset"

Branch Logic

If CSV:

Ask: "What's the path to your CSV directory? (relative to this repo root)"
Verify the directory exists and list .csv files found.
If the directory is not found, suggest alternatives (check `data/`, `data/examples/`).
If confirmed, invoke the Connect Data skill internally (`/connect-data type=csv`) to create the dataset brain and profile schema.

If DuckDB:

Ask: "What's the path to your .duckdb file?"
Verify it exists.
If confirmed, invoke `/connect-data type=duckdb` to set up the connection.

If Cloud warehouse:

Explain: "Cloud warehouses connect via MCP (Model Context Protocol). This requires configuring `.claude/mcp.json` with your credentials."
Route to `/connect-data` for full setup.
Mark this phase as `partial` with `partial_reason: warehouse_mcp_needed`.
Do not block Phase 3. Continue the interview; data connection can be completed separately.

If Nothing yet / sample dataset:

Check `data/examples/` for available sample datasets.
List them with brief descriptions.
If the user picks one, copy/link it and invoke `/connect-data type=csv`.
If the user wants to skip: mark the phase as `skipped` and note that `/connect-data` is available later.

Fork Decision

After Phase 2:

If `data_connection.status == "complete"`: data is available. Continue to Phase 3.
If `data_connection.status == "partial"` (warehouse MCP needed): continue to Phase 3 anyway. The user can finish data connection separately.
If `data_connection.status == "skipped"`: continue to Phase 3.

Outputs

Dataset artifacts are created by the `/connect-data` skill (manifest.yaml, schema.md, active.yaml). Phase 2 only tracks the interview state.

Update `.knowledge/setup-state.yaml`:

Set `phases.data_connection.status` appropriately
Set `phases.data_connection.completed_at` to the current timestamp, or leave it null if partial/skipped
Set `phases.data_connection.partial_reason` if applicable
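
A rough sketch of how the Phase 2 outcome might map onto the state fields, assuming the state is held as a dictionary shaped like the schema. The function name and the `source_type` labels (`"csv"`, `"duckdb"`, `"sample"`, `"warehouse"`, `"skip"`) are hypothetical, and treating a local source that never connected as `skipped` is an assumption the source does not spell out.

```python
from datetime import datetime

LOCAL_SOURCES = {"csv", "duckdb", "sample"}  # assumed labels


def record_data_connection(state, source_type, connected, now=None):
    """Update phases.data_connection in place from the interview outcome.

    connected: True when /connect-data finished for a local source.
    """
    phase = state["phases"]["data_connection"]
    phase["completed_at"] = None
    phase["partial_reason"] = None
    if source_type == "warehouse":
        # Warehouse setup continues separately via MCP configuration.
        phase["status"] = "partial"
        phase["partial_reason"] = "warehouse_mcp_needed"
    elif source_type in LOCAL_SOURCES and connected:
        phase["status"] = "complete"
        stamp = (now or datetime.now()).strftime("%Y-%m-%dT%H:%M:%S")
        phase["completed_at"] = stamp
    else:
        phase["status"] = "skipped"  # assumed fallback
    return state
```

Whatever the resulting status, the interview proceeds to Phase 3.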

Phase 2 Summary

Display:

Phase 2 complete — Data Connection

  Source:     {type} ({path or "pending MCP setup"})
  Tables:     {N} tables found  (or "N/A — skipped")
  Status:     {connected | partial — warehouse setup needed | skipped}

Next up: Phase 3 — Business Context

Phase 3: Business Context

Goal: Understand the business so analyses produce relevant insights, not just numbers.

Questions (ask in 2-3 groups)

Group 1:

"What does your company/product do? Just a sentence or two is fine."
"What are the 2-3 metrics your team cares about most? (e.g., conversion rate, MRR, DAU, retention, NPS)"

Group 2:

"What business question or problem are you trying to answer right now? This helps me prioritize what to explore first." (optional)
"Are there any current OKRs or goals I should know about?" (optional)

Group 3 (if domain warrants it):

"Any key segments I should know about? (e.g., free vs paid users, regions, platforms)" (optional)
"Is there seasonality or known patterns in your data? (e.g., holiday spikes, end-of-quarter effects)" (optional)

Validation

Metrics: normalize common names ("CVR" -> "conversion rate", "rev" -> "revenue"). If a metric is ambiguous, ask for a brief definition.
Business question: if provided, classify it using the Question Router skill (L1-L5) and note the level. This seeds the first analysis.
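
Metric normalization could look something like the sketch below. Only the two aliases named above come from the source. The function name is hypothetical, and splitting the answer on commas is an assumption about how the user lists metrics.

```python
METRIC_ALIASES = {
    "cvr": "conversion rate",
    "rev": "revenue",
}


def normalize_metrics(raw_answer):
    """Split a comma-separated answer and expand known abbreviations."""
    metrics = []
    for part in raw_answer.split(","):
        name = part.strip()
        if not name:
            continue
        canonical = METRIC_ALIASES.get(name.lower(), name)
        if canonical not in metrics:  # drop duplicates, keep first-seen order
            metrics.append(canonical)
    return metrics
```

Names that are not in the alias table pass through unchanged, so an ambiguous metric still needs a follow-up question for its definition.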

Outputs

Write to `.knowledge/user/business-context.md`:

# Business Context

## Company & Product

{company_description}

## Key Metrics

| Metric | Definition | Notes |
|--------|-----------|-------|
| {metric_1} | {definition or "TBD"} | {any notes} |
| {metric_2} | {definition or "TBD"} | |

## Current Focus

- **Primary question:** {business_question or "Not specified"}
- **OKRs/Goals:** {okrs or "Not specified"}

## Segments & Patterns

- **Key segments:** {segments or "Not specified"}
- **Seasonality:** {seasonality or "Not specified"}

If the user provided metrics and a dataset is connected, seed `.knowledge/datasets/{active}/metrics/index.yaml` with stub entries for each metric (name + empty definition). These can be fleshed out later with `/metrics`.

Update `.knowledge/setup-state.yaml`:

Set `phases.business_context.status: complete`
Set `phases.business_context.completed_at` to the current timestamp

Phase 3 Summary

Display:

Phase 3 complete — Business Context

  Product:    {one-line summary}
  Key metrics: {metric_1}, {metric_2}, {metric_3}
  Focus:      {business_question or "General exploration"}

Next up: Phase 4 — Preferences

Phase 4: Preferences

Goal: Configure output style and communication preferences so results match what the user actually wants.

Questions (ask in 1-2 groups)

Group 1:

"How much detail do you usually want in results?
- Executive summary — just the key findings and recommendations
- Standard — findings with supporting evidence and charts
- Deep dive — full methodology, validation details, and data tables"
"Do you prefer lots of charts, or mostly text with a few visuals?
- Minimal — text-first, charts only when essential
- Standard — a chart for each key finding
- Chart-heavy — visualize everything possible"

Group 2:

"How do you usually share results? (helps me format exports)
- Slide deck
- Email summary
- Slack message
- Written brief
- Jupyter notebook
- Multiple of the above" (optional)
"Anything else I should know about how you like to work? (e.g., 'always show me the SQL', 'I hate pie charts', 'keep it under 5 slides')" (optional)

Validation

Detail level must resolve to: executive-summary, standard, deep-dive.
Chart preference must resolve to: minimal, standard, chart-heavy.
Export channels are free-form but normalize to the `/export` format list.

Outputs

Update `.knowledge/user/profile.md` by filling in the Communication Preferences section:

## Communication Preferences

- **Detail level:** {detail_level}
- **Chart preference:** {chart_preference}
- **Narrative style:** {inferred: bullet-points for exec-summary, prose for deep-dive, mixed for standard}
- **Preferred exports:** {export_channels}
- **Custom notes:** {anything_else or "None"}
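
The allowed values and the narrative-style inference can be sketched as below. The mapping from detail level to narrative style follows the template above. The helper names and the prefix-based matching rule are assumptions made for illustration.

```python
DETAIL_LEVELS = ("executive-summary", "standard", "deep-dive")
CHART_PREFERENCES = ("minimal", "standard", "chart-heavy")

NARRATIVE_STYLE = {
    "executive-summary": "bullet-points",
    "standard": "mixed",
    "deep-dive": "prose",
}


def resolve_choice(raw, allowed):
    """Map a free-text answer onto one allowed value, or None if unclear."""
    key = raw.strip().lower().replace(" ", "-")
    matches = [v for v in allowed if v.startswith(key) or key.startswith(v)]
    return matches[0] if len(matches) == 1 else None


def communication_preferences(detail_answer, chart_answer):
    """Resolve both answers and infer the narrative style from the detail level."""
    detail = resolve_choice(detail_answer, DETAIL_LEVELS)
    charts = resolve_choice(chart_answer, CHART_PREFERENCES)
    return {
        "detail_level": detail,
        "chart_preference": charts,
        "narrative_style": NARRATIVE_STYLE.get(detail),
    }
```

A `None` in any field means the answer did not resolve and the question should be asked again.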

Update `.knowledge/setup-state.yaml`:

Set `phases.preferences.status: complete`
Set `phases.preferences.completed_at` to the current timestamp
Set `status: complete` (or `partial` if data_connection was partial)
Set `last_updated` to the current timestamp
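
The final state update might be sketched like this, assuming the state is a dictionary shaped like the schema. The function name `finish_setup` is hypothetical.

```python
from datetime import datetime


def finish_setup(state, now=None):
    """Mark Phase 4 complete and derive the overall status."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%dT%H:%M:%S")
    phases = state["phases"]
    phases["preferences"]["status"] = "complete"
    phases["preferences"]["completed_at"] = stamp
    # Overall status is partial only when the data connection is partial.
    data_partial = phases["data_connection"]["status"] == "partial"
    state["status"] = "partial" if data_partial else "complete"
    state["last_updated"] = stamp
    return state
```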

Setup Complete Summary

After Phase 4, display the final summary:

=== SETUP COMPLETE ===

  Role:         {role} ({technical_level})
  Domain:       {domain}
  Data:         {dataset_name} — {N} tables ({source_type})
  Key metrics:  {metric_1}, {metric_2}, {metric_3}
  Detail level: {detail_level}
  Charts:       {chart_preference}

  Status: {"Ready for analysis" | "Partial — data connection pending"}

Get started:
  - Ask a question: "What's our {metric_1} trend?"
  - Explore data:   /data
  - Full pipeline:  /run-pipeline
  - Dev context:    /setup-dev-context (optional — for development workflow preferences)

If setup status is `partial`, also display:

  To finish data setup: /connect-data

Subcommand: /setup status

Show the current setup state by reading `.knowledge/setup-state.yaml`.

Output Format

Setup Status
============

  Phase 1 — Role & Team:       {status}  {completed_at or ""}
  Phase 2 — Data Connection:   {status}  {completed_at or ""}
  Phase 3 — Business Context:  {status}  {completed_at or ""}
  Phase 4 — Preferences:       {status}  {completed_at or ""}

  Overall: {status}
  Started: {started_at}
  Updated: {last_updated}

If no setup-state.yaml exists:

Setup has not been started yet. Run /setup to begin.

Subcommand: /setup reset

Two-tier reset system to prevent accidental data loss.

Tier 1: `/setup reset`

Clears profile and preferences (Phase 1 + Phase 4 data). Does NOT touch data connections or business context.

What it does:

Delete `.knowledge/user/profile.md`
Reset `phases.role_and_team` and `phases.preferences` to `pending` in setup-state.yaml
Set `status: partial`
Set `last_updated` to the current timestamp

Confirmation required: Ask once: "This will reset your role profile and output preferences. Your data connections and business context are safe. Continue? (yes/no)"
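
The state-file part of a Tier 1 reset could be sketched as follows. The function name is hypothetical, clearing `completed_at` alongside the status is an assumption, and deleting `profile.md` is not shown.

```python
from datetime import datetime


def apply_tier1_reset(state, now=None):
    """Reset only the Phase 1 and Phase 4 entries; leave the others untouched."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%dT%H:%M:%S")
    for name in ("role_and_team", "preferences"):
        state["phases"][name]["status"] = "pending"
        state["phases"][name]["completed_at"] = None  # assumed
    state["status"] = "partial"
    state["last_updated"] = stamp
    return state
```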

Tier 2: `/setup reset everything`

Clears the entire setup — profile, preferences, business context, AND dataset connections. This is a destructive operation.

What it does:

Delete `.knowledge/user/profile.md`
Delete `.knowledge/user/business-context.md`
Delete all `.knowledge/datasets/*/` directories
Reset `.knowledge/active.yaml` to `active_dataset: null`
Reset `.knowledge/setup-state.yaml` to all-pending state
Clear `data_sources.yaml` entries added by setup

Confirmation required: The user must type the exact phrase `reset everything` to confirm.

Prompt:

This will erase your entire setup:
  - User profile and preferences
  - Business context
  - All dataset connections and schema documentation

This cannot be undone.

To confirm, type: reset everything

If the user types anything other than `reset everything`, cancel the operation: "Reset cancelled. Your setup is unchanged."
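
The Tier 2 confirmation check can be sketched as a strict string comparison. The function names are hypothetical. Stripping surrounding whitespace and keeping the match case-sensitive are assumptions about what "exact phrase" means in practice.

```python
CONFIRM_PHRASE = "reset everything"
CANCEL_MESSAGE = "Reset cancelled. Your setup is unchanged."


def confirm_full_reset(user_input):
    """Return True only when the user typed the confirmation phrase."""
    return user_input.strip() == CONFIRM_PHRASE


def full_reset_reply(user_input):
    """Return None to proceed with the reset, or the cancellation message."""
    return None if confirm_full_reset(user_input) else CANCEL_MESSAGE
```

Anything short of the full phrase, including "yes", cancels the operation.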

Phase 5 Note: Development Context

Phase 5 (development context) is opt-in and independent of the core setup flow. It covers development workflow preferences such as IDE, language, framework conventions, and code style.

At the end of Phase 4, mention its existence:

Optional: Run /setup-dev-context to configure development workflow
preferences (IDE, languages, code style). This is independent of
your analytics setup.

This skill does NOT implement Phase 5. The `/setup-dev-context` skill handles it separately.

Resume Logic

When `/setup` is invoked and `.knowledge/setup-state.yaml` already exists:

Read the state file.
Find the first phase with status `pending`.

If all phases are `complete`, display:

Setup is already complete. Use /setup status to review,
or /setup reset to start over.

If some phases are complete, greet briefly and resume:

Welcome back. Phases 1-2 are done. Picking up at Phase 3 —
Business Context.

If a phase is `partial`, offer to complete it or skip:

Phase 2 (Data Connection) is partially complete — your warehouse
needs MCP configuration. Want to finish that now, or continue
to Phase 3?
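
The resume rules above could be sketched as one scan over the phases in order. This is an illustration under stated assumptions: the function name and the action labels (`"done"`, `"offer"`, `"resume"`) are hypothetical, and the source does not say what to do when every phase is either complete or skipped, so this sketch treats that case as done.

```python
PHASE_ORDER = ("role_and_team", "data_connection", "business_context", "preferences")


def resume_point(state):
    """Return (action, phase_name) for an existing state file."""
    phases = state["phases"]
    for name in PHASE_ORDER:
        status = phases[name]["status"]
        if status == "partial":
            return ("offer", name)  # offer to finish it or move on
        if status == "pending":
            return ("resume", name)  # pick up the interview here
    # Nothing pending or partial remains (assumed to mean setup is done).
    return ("done", None)
```

Scanning in order means a partial Phase 2 is offered before a pending Phase 3 is resumed, which matches the example prompt above.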

Anti-Patterns

Never dump all questions at once. Always group 2-3 and wait for a response.
Never block on optional fields. If the user says "skip" or "later", accept it and move on.
Never overwrite existing profile data silently. If profile.md already exists when starting Phase 1, warn: "You already have a profile. Running setup will overwrite it. Continue?"
Never store credentials in setup-state.yaml. Data connection credentials go through `/connect-data` and are stored in manifest.yaml or environment variables only.
Never skip the state file update. Every phase completion must be written to setup-state.yaml before proceeding to the next phase.
Never run Phase 3+ without asking Phase 1 first (unless resuming). The role context from Phase 1 shapes how questions are asked in later phases.
Never combine reset tiers. `/setup reset` is always Tier 1. Tier 2 requires the explicit `reset everything` phrase.

Edge Cases

| Scenario | Handling |
|----------|----------|
| User runs `/setup` but profile.md already exists | Warn and ask to confirm overwrite before proceeding |
| CSV path does not exist | Suggest alternatives, check `data/` and `data/examples/` |
| User provides warehouse type but no MCP | Mark Phase 2 as partial, continue interview |
| User skips all optional fields | That is fine. Record nulls and proceed. |
| User wants to jump to a specific phase | Allow it: "/setup phase 3" resumes from Phase 3 |
| Session ends mid-interview | State is saved per-phase. Next `/setup` resumes. |
| `/setup` called inside a pipeline | Warn: "Setup changes may affect the running pipeline. Finish the pipeline first, or continue at your own risk." |
| User gives contradictory answers | Ask once for clarification. Record what they confirm. |
