git clone https://github.com/ai-analyst-lab/ai-analyst
T=$(mktemp -d) && git clone --depth=1 https://github.com/ai-analyst-lab/ai-analyst "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/setup" ~/.claude/skills/ai-analyst-lab-ai-analyst-setup && rm -rf "$T"
.claude/skills/setup/skill.mdSkill: /setup
Run a 4-phase conversational interview that populates the knowledge system from the user's real context. Turns a blank
.knowledge/ directory into a
fully configured analytical environment.
Parameters
- No arguments: Start from Phase 1 (or resume from last incomplete phase)
: Show current setup state/setup status
: Reset profile and preferences (Tier 1)/setup reset
: Full reset including dataset connections (Tier 2)/setup reset everything
Trigger Phrases
/setupset up my environmentconfigure the analystonboard me
Design Principles
- Conversational, not interrogative. You are a colleague getting to know someone, not a form engine. Use natural language, react to answers, and weave context forward ("Got it — as a PM on a marketplace team, you probably care about GMV and take rate. Let me ask about your data next.").
- 2-3 questions at a time, max. Never dump a wall of questions. Group them thematically and wait for a response before continuing.
- Validate responses. If a role sounds unusual or a path does not exist,
confirm before recording. ("You said your CSV directory is
. I do not see that directory — did you meandata/sales/
?")data/ - Allow skipping. Mark optional fields clearly. If the user says "skip"
or "I'll do this later", record
and move on. Never block progress on optional fields.null - Show progress. After each phase, display a brief summary of what was captured and what comes next.
State File
All setup state lives in
.knowledge/setup-state.yaml. Create it on first
run if it does not exist.
Schema
# .knowledge/setup-state.yaml setup_version: 1 started_at: "YYYY-MM-DDTHH:MM:SS" last_updated: "YYYY-MM-DDTHH:MM:SS" status: "complete" | "partial" | "in-progress" phases: role_and_team: status: "complete" | "skipped" | "pending" completed_at: "YYYY-MM-DDTHH:MM:SS" | null data_connection: status: "complete" | "partial" | "skipped" | "pending" completed_at: "YYYY-MM-DDTHH:MM:SS" | null partial_reason: null | "warehouse_mcp_needed" business_context: status: "complete" | "skipped" | "pending" completed_at: "YYYY-MM-DDTHH:MM:SS" | null preferences: status: "complete" | "skipped" | "pending" completed_at: "YYYY-MM-DDTHH:MM:SS" | null
Phase 1: Role & Team
Goal: Understand who the user is so we can adapt communication style, technical depth, and default output formats.
Questions (ask in 1-2 groups)
Group 1:
- "What's your role? (e.g., Product Manager, Data Scientist, Engineer, Marketing Analyst, exec)"
- "How technical are you with data? Pick the one that fits best:
- Beginner — I look at dashboards but rarely write queries
- Intermediate — I can write SQL and read basic stats
- Advanced — I build models, write complex SQL, and review pipelines"
Group 2: 3. "What team or department are you on?" (optional) 4. "What domain does your product operate in? (e.g., e-commerce, SaaS, fintech, marketplace, healthcare, media)" (optional)
Validation
- If role is empty or unrecognizable, ask once for clarification.
- Map common synonyms: "PM" -> Product Manager, "DS" -> Data Scientist, "analyst" -> Analyst, "eng" -> Engineer.
- Technical level must resolve to one of: beginner, intermediate, advanced.
Outputs
Write to
.knowledge/user/profile.md (create directory if needed):
# User Profile ## Role & Expertise - **Role:** {role} - **Technical level:** {technical_level} - **SQL comfort:** {inferred from technical_level: none|basic|intermediate|advanced} - **Statistics comfort:** {inferred: none|basic|intermediate|advanced} - **Domain:** {domain or "not specified"} - **Team:** {team or "not specified"} ## Communication Preferences _Set in Phase 4._ ## Corrections Log <!-- Format: YYYY-MM-DD | What was wrong | What was right -->
Update
.knowledge/setup-state.yaml:
- Set
phases.role_and_team.status: complete - Set
to current timestampphases.role_and_team.completed_at
Phase 1 Summary
Display:
Phase 1 complete — Role & Team Role: {role} Tech level: {technical_level} Domain: {domain} Team: {team} Next up: Phase 2 — Data Connection
Phase 2: Data Connection
Goal: Get the user's data connected so analyses can run.
Questions
Group 1:
- "Let's connect your data. What do you have?
- CSV files in a local directory
- DuckDB database file
- Cloud warehouse (MotherDuck, Postgres, BigQuery, Snowflake)
- Nothing yet — I want to use a sample dataset"
Branch Logic
If CSV:
- Ask: "What's the path to your CSV directory? (relative to this repo root)"
- Verify the directory exists and list .csv files found.
- If directory not found, suggest alternatives (check
,data/
).data/examples/ - If confirmed, invoke the Connect Data skill internally (
) to create the dataset brain and profile schema./connect-data type=csv
If DuckDB:
- Ask: "What's the path to your .duckdb file?"
- Verify it exists.
- If confirmed, invoke
to set up the connection./connect-data type=duckdb
If Cloud warehouse:
- Explain: "Cloud warehouses connect via MCP (Model Context Protocol). This
requires configuring
with your credentials.".claude/mcp.json - Route to
for full setup./connect-data - Mark this phase as
withpartial
.partial_reason: warehouse_mcp_needed - Do not block Phase 3. Continue the interview — data connection can be completed separately.
If Nothing yet / sample dataset:
- Check
for available sample datasets.data/examples/ - List them with brief descriptions.
- If user picks one, copy/link it and invoke
./connect-data type=csv - If user wants to skip: mark phase as
, note thatskipped
is available later./connect-data
Fork Decision
After Phase 2:
- If
: data is available. Continue to Phase 3.data_connection.status == "complete" - If
(warehouse MCP needed): continue to Phase 3 anyway. The user can finish data connection separately.data_connection.status == "partial" - If
: continue to Phase 3.data_connection.status == "skipped"
Outputs
Dataset artifacts are created by the
/connect-data skill (manifest.yaml,
schema.md, active.yaml). Phase 2 only tracks the interview state.
Update
.knowledge/setup-state.yaml:
- Set
appropriatelyphases.data_connection.status - Set
or leave null if partial/skippedphases.data_connection.completed_at - Set
if applicablephases.data_connection.partial_reason
Phase 2 Summary
Display:
Phase 2 complete — Data Connection Source: {type} ({path or "pending MCP setup"}) Tables: {N} tables found (or "N/A — skipped") Status: {connected | partial — warehouse setup needed | skipped} Next up: Phase 3 — Business Context
Phase 3: Business Context
Goal: Understand the business so analyses produce relevant insights, not just numbers.
Questions (ask in 2-3 groups)
Group 1:
- "What does your company/product do? Just a sentence or two is fine."
- "What are the 2-3 metrics your team cares about most? (e.g., conversion rate, MRR, DAU, retention, NPS)"
Group 2: 3. "What business question or problem are you trying to answer right now? This helps me prioritize what to explore first." (optional) 4. "Are there any current OKRs or goals I should know about?" (optional)
Group 3 (if domain warrants it): 5. "Any key segments I should know about? (e.g., free vs paid users, regions, platforms)" (optional) 6. "Is there seasonality or known patterns in your data? (e.g., holiday spikes, end-of-quarter effects)" (optional)
Validation
- Metrics: normalize common names ("CVR" -> "conversion rate", "rev" -> "revenue"). If a metric is ambiguous, ask for a brief definition.
- Business question: if provided, classify it using the Question Router skill (L1-L5) and note the level. This seeds the first analysis.
Outputs
Write to
.knowledge/user/business-context.md:
# Business Context ## Company & Product {company_description} ## Key Metrics | Metric | Definition | Notes | |--------|-----------|-------| | {metric_1} | {definition or "TBD"} | {any notes} | | {metric_2} | {definition or "TBD"} | | ## Current Focus - **Primary question:** {business_question or "Not specified"} - **OKRs/Goals:** {okrs or "Not specified"} ## Segments & Patterns - **Key segments:** {segments or "Not specified"} - **Seasonality:** {seasonality or "Not specified"}
If the user provided metrics and a dataset is connected, seed
.knowledge/datasets/{active}/metrics/index.yaml with stub entries for each
metric (name + empty definition). These can be fleshed out later with
/metrics.
Update
.knowledge/setup-state.yaml:
- Set
phases.business_context.status: complete - Set
phases.business_context.completed_at
Phase 3 Summary
Display:
Phase 3 complete — Business Context Product: {one-line summary} Key metrics: {metric_1}, {metric_2}, {metric_3} Focus: {business_question or "General exploration"} Next up: Phase 4 — Preferences
Phase 4: Preferences
Goal: Configure output style and communication preferences so results match what the user actually wants.
Questions (ask in 1-2 groups)
Group 1:
- "How much detail do you usually want in results?
- Executive summary — just the key findings and recommendations
- Standard — findings with supporting evidence and charts
- Deep dive — full methodology, validation details, and data tables"
- "Do you prefer lots of charts, or mostly text with a few visuals?
- Minimal — text-first, charts only when essential
- Standard — a chart for each key finding
- Chart-heavy — visualize everything possible"
Group 2: 3. "How do you usually share results? (helps me format exports)
- Slide deck
- Email summary
- Slack message
- Written brief
- Jupyter notebook
- Multiple of the above" (optional)
- "Anything else I should know about how you like to work? (e.g., 'always show me the SQL', 'I hate pie charts', 'keep it under 5 slides')" (optional)
Validation
- Detail level must resolve to: executive-summary, standard, deep-dive.
- Chart preference must resolve to: minimal, standard, chart-heavy.
- Export channels are free-form but normalize to the
format list./export
Outputs
Update
.knowledge/user/profile.md — fill in the Communication Preferences
section:
## Communication Preferences - **Detail level:** {detail_level} - **Chart preference:** {chart_preference} - **Narrative style:** {inferred: bullet-points for exec-summary, prose for deep-dive, mixed for standard} - **Preferred exports:** {export_channels} - **Custom notes:** {anything_else or "None"}
Update
.knowledge/setup-state.yaml:
- Set
phases.preferences.status: complete - Set
phases.preferences.completed_at - Set
(orstatus: complete
if data_connection was partial)partial - Set
last_updated
Setup Complete Summary
After Phase 4, display the final summary:
=== SETUP COMPLETE === Role: {role} ({technical_level}) Domain: {domain} Data: {dataset_name} — {N} tables ({source_type}) Key metrics: {metric_1}, {metric_2}, {metric_3} Detail level: {detail_level} Charts: {chart_preference} Status: {"Ready for analysis" | "Partial — data connection pending"} Get started: - Ask a question: "What's our {metric_1} trend?" - Explore data: /data - Full pipeline: /run-pipeline - Dev context: /setup-dev-context (optional — for development workflow preferences)
If setup status is
partial, also display:
To finish data setup: /connect-data
Subcommand: /setup status
Show the current setup state by reading
.knowledge/setup-state.yaml.
Output Format
Setup Status ============ Phase 1 — Role & Team: {status} {completed_at or ""} Phase 2 — Data Connection: {status} {completed_at or ""} Phase 3 — Business Context: {status} {completed_at or ""} Phase 4 — Preferences: {status} {completed_at or ""} Overall: {status} Started: {started_at} Updated: {last_updated}
If no setup-state.yaml exists:
Setup has not been started yet. Run /setup to begin.
Subcommand: /setup reset
Two-tier reset system to prevent accidental data loss.
Tier 1: /setup reset
/setup resetClears profile and preferences (Phase 1 + Phase 4 data). Does NOT touch data connections or business context.
What it does:
- Delete
.knowledge/user/profile.md - Reset
andphases.role_and_team
tophases.preferences
in setup-state.yamlpending - Set
status: partial - Set
last_updated
Confirmation required: Ask once: "This will reset your role profile and output preferences. Your data connections and business context are safe. Continue? (yes/no)"
Tier 2: /setup reset everything
/setup reset everythingClears the entire setup — profile, preferences, business context, AND dataset connections. This is a destructive operation.
What it does:
- Delete
.knowledge/user/profile.md - Delete
.knowledge/user/business-context.md - Delete all
directories.knowledge/datasets/*/ - Reset
to.knowledge/active.yamlactive_dataset: null - Reset
to all-pending state.knowledge/setup-state.yaml - Clear
entries added by setupdata_sources.yaml
Confirmation required: The user must type the exact phrase
reset everything to confirm.
Prompt:
This will erase your entire setup: - User profile and preferences - Business context - All dataset connections and schema documentation This cannot be undone. To confirm, type: reset everything
If the user types anything other than
reset everything, cancel the
operation: "Reset cancelled. Your setup is unchanged."
Phase 5 Note: Development Context
Phase 5 (development context) is opt-in and independent of the core setup flow. It covers development workflow preferences such as IDE, language, framework conventions, and code style preferences.
At the end of Phase 4, mention its existence:
Optional: Run /setup-dev-context to configure development workflow preferences (IDE, languages, code style). This is independent of your analytics setup.
This skill does NOT implement Phase 5. The
/setup-dev-context skill
handles it separately.
Resume Logic
When
/setup is invoked and .knowledge/setup-state.yaml already exists:
- Read the state file.
- Find the first phase with status
.pending - If all phases are
, display:completeSetup is already complete. Use /setup status to review, or /setup reset to start over. - If some phases are complete, greet briefly and resume:
Welcome back. Phases 1-2 are done. Picking up at Phase 3 — Business Context. - If a phase is
, offer to complete it or skip:partialPhase 2 (Data Connection) is partially complete — your warehouse needs MCP configuration. Want to finish that now, or continue to Phase 3?
Anti-Patterns
- Never dump all questions at once. Always group 2-3 and wait for a response.
- Never block on optional fields. If the user says "skip" or "later", accept it and move on.
- Never overwrite existing profile data silently. If profile.md already exists when starting Phase 1, warn: "You already have a profile. Running setup will overwrite it. Continue?"
- Never store credentials in setup-state.yaml. Data connection
credentials go through
and are stored in manifest.yaml or environment variables only./connect-data - Never skip the state file update. Every phase completion must be written to setup-state.yaml before proceeding to the next phase.
- Never run Phase 3+ without asking Phase 1 first (unless resuming). The role context from Phase 1 shapes how questions are asked in later phases.
- Never combine reset tiers.
is always Tier 1. Tier 2 requires the explicit/setup reset
phrase.reset everything
Edge Cases
| Scenario | Handling |
|---|---|
User runs but profile.md already exists | Warn and ask to confirm overwrite before proceeding |
| CSV path does not exist | Suggest alternatives, check and |
| User provides warehouse type but no MCP | Mark Phase 2 as partial, continue interview |
| User skips all optional fields | That is fine. Record nulls and proceed. |
| User wants to jump to a specific phase | Allow it: "/setup phase 3" resumes from Phase 3 |
| Session ends mid-interview | State is saved per-phase. Next resumes. |
called inside a pipeline | Warn: "Setup changes may affect the running pipeline. Finish the pipeline first, or continue at your own risk." |
| User gives contradictory answers | Ask once for clarification. Record what they confirm. |