git clone https://github.com/ai-analyst-lab/ai-analyst
T=$(mktemp -d) && git clone --depth=1 https://github.com/ai-analyst-lab/ai-analyst "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/knowledge-bootstrap" ~/.claude/skills/ai-analyst-lab-ai-analyst-knowledge-bootstrap && rm -rf "$T"
.claude/skills/knowledge-bootstrap/skill.md
Skill: Knowledge Bootstrap
Purpose
Initialize all 7 knowledge subsystems for a new session. Loads setup state, dataset, user profile, integrations, org context, corrections, learnings, query archaeology, and analysis archive into working memory.
When to Use
- At the start of any session
- After /connect-data or /switch-dataset
- When the system detects missing or stale knowledge files
Instructions
Load each subsystem in order. Every file read MUST gracefully degrade: if the file does not exist, skip silently and note "not yet populated" in the summary. Never block the session on a missing subsystem.
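The degrade-gracefully rule can be sketched as a small helper. This is a minimal illustration in Python, not the skill's actual implementation; the names read_optional, summarize, and NOT_POPULATED are hypothetical.

```python
from pathlib import Path
from typing import Optional

NOT_POPULATED = "not yet populated"  # note used in the summary when a file is absent


def read_optional(path: Path) -> Optional[str]:
    """Return the file's text, or None if the file does not exist."""
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError:
        # A missing subsystem file is expected on first run; skip silently.
        return None


def summarize(label: str, text: Optional[str]) -> str:
    """Build one summary line; a missing subsystem is noted, never raised."""
    return f"{label}: {'loaded' if text is not None else NOT_POPULATED}"
```

Each step below can then call read_optional and branch on None instead of handling exceptions individually.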
Step 1: Setup State
Read .knowledge/setup-state.yaml.
- Parse setup_complete and count phases with status: "complete".
- If setup_complete: false, note incomplete phases to offer /setup.
- If missing: Note "Setup: not initialized -- offer /setup".
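A minimal sketch of the Step 1 logic, assuming the YAML has already been parsed into a dict whose phases key maps phase names to mappings with a status field. The function name, the example phase names, and the exact wording of the "incomplete" line are assumptions for illustration.

```python
from typing import Optional


def summarize_setup(state: Optional[dict]) -> str:
    """Summarize a parsed setup-state mapping; None means the file was missing."""
    if state is None:
        return "Setup: not initialized -- offer /setup"
    phases = state.get("phases") or {}
    # A phase counts as done only when its status is exactly "complete".
    done = [name for name, p in phases.items() if (p or {}).get("status") == "complete"]
    if state.get("setup_complete"):
        return f"Setup: complete ({len(done)}/{len(phases)} phases)"
    missing = [name for name in phases if name not in done]
    return f"Setup: incomplete ({', '.join(missing)}) -- offer /setup"
```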
Step 2: Active Dataset
Read .knowledge/active.yaml.
- If active_dataset is null or missing, note "No active dataset" and continue.
- If set, load from .knowledge/datasets/{active}/:
| File | Required | If Missing |
|---|---|---|
| manifest | Yes | Note "manifest missing -- not usable" |
| schema.md | Yes | Generate via schema YAML conversion or profiling |
|  | No | Create empty template |
|  | No | Count as 0 |
Schema generation if schema.md is missing:
- Check data/schemas/{active}.yaml -- use schema_to_markdown() if found.
- Otherwise fall back to get_connection_for_profiling().
- Staleness: if last_profile.md is newer, regenerate.
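The schema rules above amount to a small decision procedure. The sketch below only names the action to take and does not call schema_to_markdown() or get_connection_for_profiling(), whose signatures are not given here. Comparing file modification times for the staleness check is an assumption, as are the function name and return labels.

```python
from pathlib import Path


def schema_action(schema_md: Path, schema_yaml: Path, last_profile: Path) -> str:
    """Return which action the schema rules call for; does not perform it."""
    if schema_md.exists():
        # Staleness (assumed mtime-based): a profile newer than schema.md
        # means the schema should be regenerated.
        if last_profile.exists() and last_profile.stat().st_mtime > schema_md.stat().st_mtime:
            return "regenerate"
        return "use-existing"
    if schema_yaml.exists():
        return "convert-yaml"  # the schema_to_markdown() path
    return "profile"  # the get_connection_for_profiling() fallback
```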
Extract system variables from manifest:
{{SCHEMA}}, {{DISPLAY_NAME}},
{{DATE_RANGE}}, {{DATABASE}}.
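How these variables are consumed is not specified here. Assuming they are substituted into text templates using the {{NAME}} syntax shown, a minimal sketch might look like this (render is a hypothetical name):

```python
import re


def render(template: str, variables: dict) -> str:
    """Replace {{NAME}} placeholders; unknown names are left untouched."""

    def substitute(match: "re.Match") -> str:
        # Fall back to the original placeholder text when the name is unknown.
        return str(variables.get(match.group(1), match.group(0)))

    return re.sub(r"\{\{([A-Z_]+)\}\}", substitute, template)
```

Leaving unknown placeholders intact, rather than replacing them with an empty string, makes a missing manifest field visible downstream.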
Step 3: User Profile
Read .knowledge/user/profile.md.
- If exists: Apply Detail level, Chart preference, Narrative style.
- If missing: Create from template (see below), note "Profile: new".
On explicit user corrections during session, update the profile: append
YYYY-MM-DD | Assumed [X] | User prefers [Y] to the Corrections Log
section. Never infer from silence.
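A minimal sketch of the correction entry format. It assumes the Corrections Log is the last section of the profile, as in the template, so that appending to the end of the file is sufficient; both function names are hypothetical.

```python
from datetime import date


def correction_entry(assumed: str, prefers: str, on: date) -> str:
    """Format one log line as YYYY-MM-DD | Assumed [X] | User prefers [Y]."""
    return f"{on.isoformat()} | Assumed {assumed} | User prefers {prefers}"


def append_correction(profile_text: str, entry: str) -> str:
    """Append an entry at the end of the profile text (assumes the log is the last section)."""
    return profile_text.rstrip("\n") + "\n" + entry + "\n"
```

This should be called only on an explicit correction from the user, never on an inferred preference.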
Step 4: User Integrations
Read .knowledge/user/integrations.yaml.
- Extract preferred_export_format, communication.detail_level.
- Count configured channels (configured: true).
- If missing: Note "Integrations: not configured -- defaults apply".
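A sketch of the channel count, assuming the parsed file holds a channels mapping whose values each carry a configured flag. The channels key and the "unset" fallback are assumptions; only preferred_export_format and configured: true come from the steps above.

```python
from typing import Optional


def summarize_integrations(cfg: Optional[dict]) -> str:
    """Summarize parsed integrations; None means the file was missing."""
    if cfg is None:
        return "Integrations: not configured -- defaults apply"
    channels = cfg.get("channels") or {}  # hypothetical key name
    configured = sum(
        1 for c in channels.values() if isinstance(c, dict) and c.get("configured") is True
    )
    fmt = cfg.get("preferred_export_format", "unset")
    return f"Integrations: {fmt}, {configured} channels"
```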
Step 5: Organization Context
Check for org ID in
setup-state.yaml (phases.phase_3_business.data.organization_id)
or in the active dataset manifest's organization field.
If an org ID exists and is not
_example:
- Read .knowledge/organizations/{org_id}/manifest.yaml for name, industry.
- Read .knowledge/organizations/{org_id}/business/index.yaml for section counts (glossary terms, products, metrics, objectives, teams).
- If org dir missing: Note "Org: linked but not found".
If no org linked: Note "Org: not configured".
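The org lookup order can be sketched as follows, assuming both files are already parsed into dicts. The function name is hypothetical, and treating an empty string the same as a missing ID is an assumption.

```python
from typing import Optional


def resolve_org_id(setup_state: Optional[dict], manifest: Optional[dict]) -> Optional[str]:
    """Return the linked org ID, or None if unset or the _example placeholder."""
    org_id = None
    try:
        org_id = setup_state["phases"]["phase_3_business"]["data"]["organization_id"]
    except (KeyError, TypeError):
        # Missing file (None) or missing keys: fall through to the manifest.
        pass
    if not org_id and manifest:
        org_id = manifest.get("organization")
    if not org_id or org_id == "_example":
        return None
    return org_id
```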
Step 6: Corrections
Read .knowledge/corrections/index.yaml.
- Extract total_corrections and by_severity counts.
- If total_corrections > 0, highlight critical/high counts so agents check the full log before writing SQL.
- If missing: Note "Corrections: not yet populated".
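A sketch of the corrections summary, assuming by_severity is a mapping from severity name to count. That shape is a guess; only the key names total_corrections and by_severity come from the step above.

```python
from typing import Optional


def corrections_note(index: Optional[dict]) -> str:
    """Summarize the parsed corrections index; None means the file was missing."""
    if index is None:
        return "Corrections: not yet populated"
    total = index.get("total_corrections", 0)
    if total == 0:
        return "Corrections: none"
    severity = index.get("by_severity") or {}
    # Surface critical/high counts so they are checked before SQL is written.
    return (
        f"Corrections: {total} logged "
        f"({severity.get('critical', 0)} critical, {severity.get('high', 0)} high)"
    )
```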
Step 7: Learnings
Read .knowledge/learnings/index.md.
- Scan for category headings (### N. Category Name).
- Note which categories have content entries vs are empty.
- Do NOT load full content -- just category presence.
- If missing: Note "Learnings: not yet populated".
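The heading scan can be sketched with a regular expression. What counts as a "content entry" is not defined here; this sketch assumes any non-blank, non-heading line under a category heading counts. The function name is hypothetical.

```python
import re

# Matches headings of the form "### N. Category Name".
CATEGORY_HEADING = re.compile(r"^### (\d+)\. (.+)$")


def category_presence(markdown: str) -> dict:
    """Map each category name to True if it has at least one content line."""
    presence = {}
    current = None
    for line in markdown.splitlines():
        match = CATEGORY_HEADING.match(line)
        if match:
            current = match.group(2).strip()
            presence[current] = False
        elif line.startswith("#"):
            current = None  # any other heading ends the current category
        elif current is not None and line.strip():
            presence[current] = True
    return presence
```

Only the presence flags are kept, which matches the instruction not to load full content.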
Step 8: Query Archaeology
Read .knowledge/query-archaeology/curated/index.yaml.
- Extract cookbook_entries, table_cheatsheets, join_patterns counts.
- If missing: Note "Archaeology: not yet populated".
Step 9: Analysis Archive
Read .knowledge/analyses/index.yaml:
- Extract total_analyses and last 5 entries (title, date, findings count, level).
- If most recent analysis was <24h ago, flag for continuity.
Read .knowledge/analyses/_patterns.yaml:
- Count patterns[] entries and note pattern names if any.
- If missing: Note "Patterns: not yet populated".
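A sketch of the continuity check and the last-five selection from the archive index. It assumes the index lists analyses oldest-first and that entry dates have already been parsed into datetime values; the stored date format is not specified here. Both function names are hypothetical.

```python
from datetime import datetime, timedelta


def needs_continuity_flag(last_analysis: datetime, now: datetime) -> bool:
    """True if the most recent analysis finished less than 24 hours before now."""
    return timedelta(0) <= now - last_analysis < timedelta(hours=24)


def recent_entries(entries: list, n: int = 5) -> list:
    """Return the last n entries (assumes the index is ordered oldest-first)."""
    return entries[-n:]
```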
Step 10: Report Readiness
Compile an internal context summary (held in working memory, not shown raw):
Setup: {complete (N/M phases) | incomplete (list missing) | not initialized}
Dataset: {display_name} ({source_type}, {N} tables, ~{rows} rows, {date_range}) | not configured
Profile: {role}, {detail_level} | new
Integrations: {preferred_format}, {N} channels | not configured
Org: {company} ({industry}), {N} glossary, {N} products, {N} metrics | not configured
Corrections: {N} logged ({N} critical, {N} high) | none
Learnings: {N}/{6} categories populated | not yet populated
Archaeology: {N} cookbook, {N} cheatsheets, {N} join patterns | not yet populated
Archive: {N} analyses, {N} recurring patterns | none
Then output the user-facing status:
Dataset: {display_name} ({source_type})
Tables: {N} tables, ~{row_count} rows
Date range: {date_range}
Metrics: {M} defined
Profile: {loaded | new}
Status: Ready for analysis
If a critical subsystem is missing (no dataset, no manifest), adjust the status and suggest
/connect-data or /setup.
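A sketch of rendering the user-facing status, including the adjusted output when no dataset is available. The dict keys mirror the placeholders in the status template. The function name and the exact "Not ready" wording are assumptions, since the adjusted text is not specified.

```python
from typing import Optional


def render_status(dataset: Optional[dict], profile_loaded: bool) -> str:
    """Render the brief status; dataset=None means no usable dataset or manifest."""
    if dataset is None:
        # Critical subsystem missing: adjust the status and suggest next steps.
        return "Dataset: not configured\nStatus: Not ready -- run /connect-data or /setup"
    lines = [
        f"Dataset: {dataset['display_name']} ({dataset['source_type']})",
        f"Tables: {dataset['tables']} tables, ~{dataset['row_count']} rows",
        f"Date range: {dataset['date_range']}",
        f"Metrics: {dataset['metrics']} defined",
        f"Profile: {'loaded' if profile_loaded else 'new'}",
        "Status: Ready for analysis",
    ]
    return "\n".join(lines)
```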
User Profile Template
# User Profile

Auto-created by knowledge bootstrap. Updated as the system learns preferences.

## Role & Expertise
- **Role:** _[auto-detected or user-specified]_
- **Technical level:** _[beginner | intermediate | advanced]_
- **SQL comfort:** _[none | basic | intermediate | advanced]_
- **Statistics comfort:** _[none | basic | intermediate | advanced]_
- **Domain:** _[e-commerce | fintech | saas | marketplace | other]_

## Communication Preferences
- **Detail level:** _[executive-summary | standard | deep-dive]_
- **Chart preference:** _[minimal | standard | chart-heavy]_
- **Narrative style:** _[bullet-points | prose | mixed]_

## Corrections Log
_Records of times the user corrected the system's assumptions._
<!-- Format: YYYY-MM-DD | What was wrong | What was right -->
Edge Cases
- No .knowledge/ dir: Create full tree and prompt /connect-data.
- Empty schema.md: Regenerate via profiling.
- No data files: Suggest checking connection or falling back to CSV.
- Multiple datasets: Report active, remind about /switch-dataset.
- Setup incomplete: Note phases, do not block. Suggest /setup.
Anti-Patterns
- Never skip bootstrap. Always read manifest -- details may have changed.
- Never hardcode dataset names. Resolve from active.yaml.
- Never modify manifest during bootstrap. Bootstrap is read-only.
- Never dump raw YAML to the user. Show the brief status, not the load.
- Never block on a missing subsystem. Graceful degradation always.