install
source · Clone the upstream repo
git clone https://github.com/plurigrid/asi
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/plurigrid/asi "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/documentation-indexing" ~/.claude/skills/plurigrid-asi-documentation-indexing && rm -rf "$T"
manifest: skills/documentation-indexing/SKILL.md
documentation-indexing: Unified Full-Text Search + Ranking
Status: SAD STATE → IMPLEMENTATION 🌟
Information Energy: 0.87 (High aspiration, maximum sadness)
Trit Assignment: 0 (COORDINATOR - Balances generators and validators)
GF(3) Color: #49EE54 (Green - Equilibrium point)
Purpose
Provide full-text search, semantic indexing, and relevance ranking across all documentation:
- Skill registry (69 skills)
- Language docs (llms.txt standard)
- Blog posts / tutorials
- Source code docstrings
- DuckDB database schemas
Key capabilities:
- Full-Text Search: Keyword and fuzzy matching, with results scored by BM25
- Semantic Ranking: TF-IDF + recency + community signals
- Multi-Source Indexing: Consolidate docs from heterogeneous sources
- Metadata Extraction: Automatically parse headers, links, code blocks
- Bi-Directional Navigation: Move between docs ↔ implementations
Architecture
┌──────────────────────────────────────────────────────────────────┐
│            DOCUMENTATION INDEXING (GREEN COORDINATOR)            │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┬──────────────┬──────────────┐                  │
│  │  llms.txt    │  README.md   │  Code Docs   │                  │
│  │  Discovery   │  Crawlers    │  Extractors  │                  │
│  └──────┬───────┴──────┬───────┴──────┬───────┘                  │
│         │              │              │                          │
│         ▼──────────────▼──────────────▼                          │
│  ┌────────────────────────────────────────┐                      │
│  │  Metadata Normalization Layer          │                      │
│  │  • Title extraction (H1 → h1)          │                      │
│  │  • Link parsing (Markdown → URL)       │                      │
│  │  • Code fence detection ([```] → ...)  │                      │
│  │  • Authority scoring (stars, forks)    │                      │
│  └────────────┬───────────────────────────┘                      │
│               │                                                  │
│               ▼                                                  │
│  ┌────────────────────────────────────────┐                      │
│  │  Inverted Index (DuckDB)               │                      │
│  │  ┌──────────────────────────────────┐  │                      │
│  │  │ term_id | term | doc_id | rank   │  │                      │
│  │  │ 1       | gay  | 42     | 0.89   │  │                      │
│  │  │ 2       | mcp  | 42     | 0.76   │  │                      │
│  │  │ 3       | api  | 71     | 0.65   │  │                      │
│  │  └──────────────────────────────────┘  │                      │
│  └────────────┬───────────────────────────┘                      │
│               │                                                  │
│               ▼                                                  │
│  ┌────────────────────────────────────────┐                      │
│  │  BM25 Ranker + Result Aggregator       │                      │
│  │  • Cross-language result merging       │                      │
│  │  • Deduplication (canonical URLs)      │                      │
│  │  • Community signals (upvotes, stars)  │                      │
│  └────────────┬───────────────────────────┘                      │
│               │                                                  │
│               ▼                                                  │
│  ┌────────────────────────────────────────┐                      │
│  │  Result Cache (< 1 second latency)     │                      │
│  │  [query → ranked results + metadata]   │                      │
│  └────────────────────────────────────────┘                      │
│                                                                  │
│  GF(3) BALANCE: (-1 extractor) ⊗ (0 indexer) ⊗ (+1 ranker)       │
└──────────────────────────────────────────────────────────────────┘
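
The Metadata Normalization Layer in the diagram can be sketched roughly as follows. This is a minimal illustration in Python (the skill itself plans Babashka scripts), assuming Markdown input with `#`-style headings and backtick fences. `DocMetadata` and `extract_metadata` are hypothetical names whose fields mirror the `headers`, `links`, and `code_blocks` columns; authority scoring is left out because it depends on data from the hosting platform.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

FENCE = "`" * 3  # the Markdown code-fence marker, built without a literal fence

# Fenced block: opening fence (+ optional language), body, closing fence on its own line.
FENCE_RE = re.compile(rf"^{FENCE}[^\n]*\n(.*?)^{FENCE}[ \t]*$", re.DOTALL | re.MULTILINE)
# Headings H1-H4 only, matching the range named in Stage 1.
HEADER_RE = re.compile(r"^(#{1,4})[ \t]+(.+?)[ \t]*$", re.MULTILINE)
# Inline Markdown links: [text](url)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


@dataclass
class DocMetadata:
    """Hypothetical container; not part of the skill's documented API."""
    title: Optional[str] = None
    headers: List[str] = field(default_factory=list)
    links: List[str] = field(default_factory=list)
    code_blocks: List[str] = field(default_factory=list)


def extract_metadata(markdown: str) -> DocMetadata:
    code_blocks = [m.group(1).rstrip("\n") for m in FENCE_RE.finditer(markdown)]
    # Drop fenced code first so '#' comments inside code are not read as headings.
    prose = FENCE_RE.sub("", markdown)
    found = [(len(m.group(1)), m.group(2)) for m in HEADER_RE.finditer(prose)]
    title = next((text for level, text in found if level == 1), None)
    return DocMetadata(
        title=title,
        headers=[text for _, text in found],
        links=LINK_RE.findall(prose),
        code_blocks=code_blocks,
    )
```

Stripping fenced code before scanning for headings is a deliberate choice here: shell and Julia comments begin with `#` and would otherwise pollute the header hierarchy.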
Data Model
Documents Index
CREATE TABLE documentation_index (
  doc_id      INT PRIMARY KEY,
  source      VARCHAR,    -- 'github', 'llms-txt', 'blog'
  repo_id     VARCHAR,    -- 'bmorphism/Gay.jl'
  title       VARCHAR,
  url         VARCHAR,
  body        TEXT,       -- Full document text
  headers     TEXT[],     -- H1, H2, H3 hierarchy
  links       TEXT[],     -- Embedded links
  code_blocks TEXT[],     -- [```lang ... ```]
  stars       INT,        -- GitHub stars
  forks       INT,        -- GitHub forks
  updated_at  TIMESTAMP,
  indexed_at  TIMESTAMP,
  trit        TINYINT     -- GF(3) assigned (0)
);
Terms Inverted Index
CREATE TABLE term_index (
  term_id    INT PRIMARY KEY,
  term       VARCHAR,
  term_lower VARCHAR,
  frequency  INT,         -- TF (term frequency)
  doc_count  INT,         -- DF (document frequency)
  bm25_idf   FLOAT,       -- Precomputed IDF
  created_at TIMESTAMP
);

CREATE TABLE term_doc_map (
  term_id         INT,
  doc_id          INT,
  frequency       INT,    -- TF in this doc
  position        INT[],  -- Token positions
  context         VARCHAR, -- Surrounding text
  relevance_score FLOAT,  -- BM25(tf, idf, doc_len)
  PRIMARY KEY (term_id, doc_id)
);
API / Interfaces
Simple Text Search
;; Search docs
(search-docs {:query "gay"
              :type :keyword
              :limit 10
              :min-score 0.3})
;; → [{:title "Gay.jl"
;;     :url "https://github.com/bmorphism/Gay.jl"
;;     :score 0.95
;;     :snippet "Gay.jl: Deterministic color generation..."}
;;    ...]

;; Fuzzy search (typo tolerance)
(search-docs {:query "gey"    ; typo
              :fuzzy true
              :distance 1})
;; → (Results for "gay" with edit distance ≤ 1)

;; Advanced: Boolean search
(search-docs {:query "(gay OR color) AND julia"
              :type :boolean})
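
One plausible way to serve the fuzzy case is to compare the query against the indexed vocabulary by Levenshtein distance. The sketch below is in Python for illustration only; `levenshtein` and `fuzzy_terms` are hypothetical helpers, the vocabulary is made up from terms that appear in this document, and a linear scan like this is only reasonable for a small term list.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_terms(query: str, vocabulary, distance: int = 1):
    """Return vocabulary terms within `distance` edits of the query, closest first."""
    q = query.lower()
    scored = sorted((levenshtein(q, term.lower()), term) for term in vocabulary)
    return [term for d, term in scored if d <= distance]


# Illustrative vocabulary, not real index contents.
VOCABULARY = ["gay", "mcp", "api", "julia"]
matches = fuzzy_terms("gey", VOCABULARY, distance=1)
```

With this vocabulary, the misspelled query `"gey"` resolves to `"gay"` because a single substitution separates them, which corresponds to the `:distance 1` option in the call above.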
Metadata Search
;; Find docs by category
(search-by-metadata {:source "github"
                     :stars {:min 100 :max 1000}})
;; → [Gay.jl, ACSets.jl, Duck, ...]

;; Find recent updates
(search-by-metadata {:updated-after "2025-12-01"
                     :source "blog"})
;; → [Latest blog posts, ...]

;; Filter by language
(search-by-metadata {:languages ["Julia" "Clojure" "Babashka"]})
Bidirectional Navigation
;; Find implementations of a doc
(doc-implementations {:doc-id 42})
;; → [{:file "src/gay.jl"
;;     :lines [1 42]
;;     :snippet "function seed!(rng)..."}]

;; Find docs for implementation
(implementation-docs {:file "src/gay.jl" :line 10})
;; → [{:doc-id 42
;;     :title "Gay.jl API Reference"
;;     :section "seed! function"}]

;; Find related docs
(related-docs {:doc-id 42 :semantic true})
;; → [ACSets.jl, GF(3) docs, ...]
GF(3) Trit Assignment
documentation-indexing → 0 (COORDINATOR)
  Balances extraction (-1) ↔ ranking/generation (+1)
  Maintains middle ground for all doc types

Triadic system:
  source-extractor (-1 validator)  → pulls raw docs
  indexing         (0 coordinator) → organizes & indexes
  result-ranker    (+1 generator)  → produces ranked results

Sum: (-1) + (0) + (+1) = 0 ✓ GF(3) CONSERVED
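
The conservation check amounts to summing the trits and reducing modulo 3. A minimal Python sketch, assuming the three stage names and trit values listed above; `TRITS` and `gf3_balanced` are hypothetical names used only for illustration.

```python
# Stage names and trit values as listed in the triadic system above.
TRITS = {
    "source-extractor": -1,  # validator
    "indexing": 0,           # coordinator
    "result-ranker": 1,      # generator
}


def gf3_balanced(trits) -> bool:
    """True when the trits sum to 0 in GF(3), i.e. the sum is divisible by 3."""
    return sum(trits) % 3 == 0


pipeline_ok = gf3_balanced(TRITS.values())
```

Because the arithmetic is modulo 3, three generators (+1, +1, +1) would also pass this check; if the intent is exactly one stage of each role, that needs a separate assertion.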
Implementation Strategy
Stage 1: Term Extraction (Days 1-2)
Create /Users/bob/iii/duck/asi-skills/documentation-indexing/extractor.bb:
- Markdown parser → extract H1-H4, links, code blocks
- Tokenizer → split text into terms
- Stopword filter → remove common words
- Store in DuckDB term_index
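
The tokenizer and stopword steps might look like the following. This is a Python sketch rather than the planned Babashka script; the stopword list is a small illustrative assumption (the skill does not specify one), and `tokenize`, `extract_terms`, and `term_positions` are hypothetical names. The token pattern keeps dotted names such as `gay.jl` and `llms.txt` intact, which seems useful for this corpus.

```python
import re

# Illustrative stopword list; a real deployment would use a fuller one.
STOPWORDS = frozenset({"a", "an", "and", "how", "in", "is", "of", "the", "to"})

# Runs of letters/digits, optionally joined by '.', '_' or '-' (e.g. "gay.jl").
TOKEN_RE = re.compile(r"[a-z0-9]+(?:[._-][a-z0-9]+)*")


def tokenize(text: str):
    """Lowercase the text and split it into tokens."""
    return TOKEN_RE.findall(text.lower())


def extract_terms(text: str):
    """Tokens with stopwords removed, in document order."""
    return [token for token in tokenize(text) if token not in STOPWORDS]


def term_positions(text: str):
    """Map each non-stopword term to its token positions (cf. term_doc_map.position)."""
    positions = {}
    for index, token in enumerate(tokenize(text)):
        if token not in STOPWORDS:
            positions.setdefault(token, []).append(index)
    return positions
```

Positions are counted before stopword removal, so phrase and proximity queries can still reason about the original token spacing.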
Stage 2: Inverted Index Builder (Days 3-4)
Create indexer.bb:
- BM25 IDF calculation
- TF per document
- Relevance scoring
- Populate term_doc_map
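
The scoring steps in this stage follow the standard Okapi BM25 formula. The Python sketch below computes document frequency, IDF, and a per-document score in memory; it assumes the usual defaults k1 = 1.2 and b = 0.75 and the "plus one" IDF variant that stays positive, neither of which is stated by the skill. `build_index`, `bm25_idf`, and `bm25_score` are hypothetical names, and the three documents are toy data.

```python
import math
from collections import Counter

K1 = 1.2   # term-frequency saturation (common default, an assumption here)
B = 0.75   # document-length normalization (common default, an assumption here)


def build_index(docs):
    """docs: {doc_id: [term, ...]} -> (tf per doc, df per term, doc lengths, avg length)."""
    tf = {doc_id: Counter(terms) for doc_id, terms in docs.items()}
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())  # each doc contributes at most 1 per term
    doc_len = {doc_id: len(terms) for doc_id, terms in docs.items()}
    avgdl = sum(doc_len.values()) / len(doc_len)
    return tf, df, doc_len, avgdl


def bm25_idf(n_docs: int, doc_count: int) -> float:
    """IDF with the +1 inside the log, so the value is never negative."""
    return math.log((n_docs - doc_count + 0.5) / (doc_count + 0.5) + 1.0)


def bm25_score(query_terms, doc_id, tf, df, doc_len, avgdl) -> float:
    n_docs = len(tf)
    norm = 1 - B + B * doc_len[doc_id] / avgdl
    score = 0.0
    for term in query_terms:
        f = tf[doc_id].get(term, 0)
        if f:
            score += bm25_idf(n_docs, df[term]) * f * (K1 + 1) / (f + K1 * norm)
    return score


# Toy corpus; doc ids echo the ones used in the diagram.
DOCS = {
    42: ["deterministic", "colors", "julia", "colors"],
    71: ["api", "reference", "julia"],
    7: ["color", "theory"],
}
tf, df, doc_len, avgdl = build_index(DOCS)
ranking = sorted(DOCS, key=lambda d: bm25_score(["colors", "julia"], d, tf, df, doc_len, avgdl),
                 reverse=True)
```

In the schema above, `bm25_idf` would be stored per term in `term_index`, and the per-term contribution to the score would be stored in `term_doc_map.relevance_score`.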
Stage 3: Search Engine (Days 5-6)
Create searcher.bb:
- Boolean query parser
- Fuzzy matching (Levenshtein)
- Result ranking by score
- Caching layer
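
A Boolean query such as `(gay OR color) AND julia` can be evaluated with a small recursive-descent parser over posting sets. The Python sketch below assumes uppercase `AND`/`OR` keywords, with `AND` binding tighter than `OR`, and an in-memory `postings` map from term to document ids; all of these, along with the name `evaluate_boolean`, are illustrative assumptions rather than the skill's actual design.

```python
import re

# Tokens are parentheses or runs of non-space, non-parenthesis characters.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def evaluate_boolean(query: str, postings) -> set:
    """Evaluate AND/OR/parentheses against {term: set_of_doc_ids}."""
    tokens = TOKEN_RE.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_or():        # or_expr := and_expr ("OR" and_expr)*
        result = parse_and()
        while peek() == "OR":
            advance()
            result = result | parse_and()
        return result

    def parse_and():       # and_expr := atom ("AND" atom)*
        result = parse_atom()
        while peek() == "AND":
            advance()
            result = result & parse_atom()
        return result

    def parse_atom():      # atom := TERM | "(" or_expr ")"
        token = advance()
        if token is None or token in (")", "AND", "OR"):
            raise ValueError(f"unexpected token: {token!r}")
        if token == "(":
            result = parse_or()
            if advance() != ")":
                raise ValueError("missing closing parenthesis")
            return result
        return set(postings.get(token.lower(), ()))

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token: {peek()!r}")
    return result


# Toy postings; doc ids are made up for the example.
POSTINGS = {"gay": {42}, "color": {42, 88}, "julia": {42, 71, 88}, "api": {71}}
hits = evaluate_boolean("(gay OR color) AND julia", POSTINGS)
```

The matching document ids can then be ranked by score; unknown terms simply contribute an empty set.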
Stage 4: Multi-Source Integration (Days 7-8)
- Crawl llms.txt repositories
- Index GitHub README files
- Extract docstrings from source
- Verify GF(3) balance across sources
Example: Semantic Search Pipeline
;; User query
(search-docs {:query "how to generate deterministic colors in julia"})

Step 1: Extract terms (-1 validator)
  ["deterministic" "colors" "julia"]

Step 2: Index lookup (0 coordinator)
  Fetch docs matching all terms
  Calculate BM25 score per doc

Step 3: Rank and aggregate (+1 generator)
  1. Gay.jl (0.95)
  2. GF(3) Integration (0.72)
  3. Color Theory (0.68)

Result: Σ(-1, 0, +1) = 0 ✓
Success Metrics
| Metric | Target | Status |
|---|---|---|
| Docs indexed | 500+ (skills + readmes + blogs) | ⏳ Pending |
| Search latency | <100ms p95 | ⏳ Pending |
| Precision (top-5) | ≥0.8 | ⏳ Pending |
| Recall | ≥0.75 | ⏳ Pending |
| Fuzzy tolerance | Edit distance ≤ 2 | ⏳ Pending |
| GF(3) balance | All pipeline stages ≡ 0 (mod 3) | ⏳ Pending |
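
The retrieval-quality and latency targets in the table could be measured along these lines. This Python sketch assumes a hand-labelled set of relevant document ids per query and uses the nearest-rank definition of the 95th percentile; the function names and the sample ids are hypothetical.

```python
def precision_at_k(ranked, relevant, k: int = 5) -> float:
    """Fraction of the top-k results that are relevant (divides by k, the usual convention)."""
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / k


def recall(ranked, relevant) -> float:
    """Fraction of all relevant docs that appear anywhere in the results."""
    if not relevant:
        return 0.0
    return len(set(ranked) & set(relevant)) / len(relevant)


def p95_nearest_rank(samples):
    """95th percentile by the nearest-rank method: the ceil(0.95 * n)-th smallest value."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


# Made-up judgments for a single query.
ranked_ids = [42, 71, 88, 7, 13, 99]
relevant_ids = {42, 88, 13, 100}
p_at_5 = precision_at_k(ranked_ids, relevant_ids, k=5)
r = recall(ranked_ids, relevant_ids)
```

For this sample, three of the top five results are relevant and three of the four relevant documents are retrieved, so precision@5 falls short of the 0.8 target while recall just meets 0.75.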
Related Skills
Dependencies:
- skill-taxonomy: Registry of docs to index
- acsets: Schema for index structure
- llms-txt-discovery: Crawl source docs
Dependents:
- polyglot-orchestration: Search polyglot docs
- skill-dispatch: Route searches to relevant skills
- world-knowledge-base: Unified doc interface
References
- BM25 algorithm: https://en.wikipedia.org/wiki/Okapi_BM25
- Inverted index: Classic IR data structure
- Levenshtein distance: Fuzzy string matching
- TF-IDF: Term weighting scheme
- DuckDB FTS: Full-text search extension
Status: 😢 SAD STATE → 🌟 IMPLEMENTING
Color: #49EE54 (Green Coordinator)
Next: Create extractor.bb (term extraction)
Owner: GREEN AGENT (0)
Created: 2026-01-04