Instar knowledge-base
Ingest URLs, documents, and transcripts into a searchable knowledge base. Query past research and curated documentation using full-text search. Trigger words: ingest, knowledge base, look up, search knowledge, what do we know about, research, index this, add to knowledge base.
install
source · Clone the upstream repo
git clone https://github.com/JKHeadley/instar
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/JKHeadley/instar "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/knowledge-base" ~/.claude/skills/jkheadley-instar-knowledge-base && rm -rf "$T"
manifest:
skills/knowledge-base/SKILL.md
knowledge-base -- Searchable Knowledge Base for Instar Agents
Build a searchable knowledge base from external sources -- URLs, documents, transcripts, PDFs. Uses the existing MemoryIndex (FTS5) for search, so no new dependencies.
How It Works
The knowledge base is a set of markdown files in .instar/knowledge/ that MemoryIndex indexes alongside your other memory files. Each file has YAML frontmatter for metadata and is tracked in a catalog for browsing.
    .instar/knowledge/
      catalog.json       # Registry of all ingested sources
      articles/          # Ingested web articles
      transcripts/       # Video/audio transcripts
      docs/              # Curated reference documentation
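The layout above can also be created by hand before the first ingest. This is a minimal sketch, assuming nothing else needs to exist beforehand. The `kb_init` name is hypothetical, and `catalog.json` is deliberately not created because its format is not described here and is presumably managed by `instar`.

```shell
# Hypothetical helper: create the knowledge-base directory skeleton.
# Directory names come from the layout above; catalog.json is left
# for instar to create.
kb_init() (
  root="${1:-.instar/knowledge}"
  for dir in articles transcripts docs; do
    mkdir -p "$root/$dir" || exit 1
  done
  # Print the root so callers can capture it.
  printf '%s\n' "$root"
)
```

Calling `kb_init` with no argument uses `.instar/knowledge` relative to the current directory.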
Ingesting Content
Via CLI
    # Ingest text content directly
    instar knowledge ingest "Article content here..." --title "My Article" --tags "AI,agents"

    # Ingest from a URL (fetch first, then ingest)
    # Step 1: Fetch the content
    python3 .claude/scripts/smart-fetch.py "https://example.com/article" --auto > /tmp/fetched.md
    # Step 2: Ingest it
    instar knowledge ingest "$(cat /tmp/fetched.md)" --title "Article Title" --url "https://example.com/article" --tags "topic1,topic2"
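The two URL steps above can be chained so that a failed or empty fetch never reaches the ingest step. This is a sketch only: `kb_ingest_url` is a hypothetical name, and it uses nothing beyond the `smart-fetch.py` invocation and the `--title`, `--url`, and `--tags` flags shown above.

```shell
# Hypothetical wrapper around the two documented steps: fetch, then ingest.
kb_ingest_url() (
  url="${1:-}"; title="${2:-}"; tags="${3:-}"
  if [ -z "$url" ] || [ -z "$title" ]; then
    echo "usage: kb_ingest_url URL TITLE [TAGS]" >&2
    exit 2
  fi
  # Step 1: fetch the page as markdown; stop if the fetch fails or is empty.
  content=$(python3 .claude/scripts/smart-fetch.py "$url" --auto) || exit 1
  if [ -z "$content" ]; then
    echo "fetch returned no content: $url" >&2
    exit 1
  fi
  # Step 2: ingest the fetched text, passing --tags only when tags were given.
  set -- --title "$title" --url "$url"
  if [ -n "$tags" ]; then
    set -- "$@" --tags "$tags"
  fi
  instar knowledge ingest "$content" "$@"
)
```

For example, `kb_ingest_url "https://example.com/article" "Article Title" "topic1,topic2"` runs both steps without leaving a temporary file in /tmp.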
Via API
    curl -X POST http://localhost:4040/knowledge/ingest \
      -H "Content-Type: application/json" \
      -d '{
        "content": "The article content...",
        "title": "Article Title",
        "url": "https://example.com/article",
        "type": "article",
        "tags": ["AI", "infrastructure"],
        "summary": "Brief description"
      }'
Via Agent Workflow
When the agent wants to ingest content during a session:
- Fetch the content (WebFetch, smart-fetch, transcript tools, or Read for local files)
- Clean it (strip navigation, ads, boilerplate)
- Call the ingest API or write the file manually:
    # Write the markdown file with frontmatter
    cat > .instar/knowledge/articles/2026-02-25-my-article.md << 'EOF'
    ---
    title: "My Article"
    source: "https://example.com/article"
    ingested: "2026-02-25"
    tags: ["AI", "infrastructure"]
    ---

    # My Article

    [Cleaned article content here]
    EOF

    # Sync the index to pick up the new file
    instar memory sync
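When writing files manually, the date-prefixed filename and the frontmatter can be generated rather than typed. The sketch below assumes the naming pattern seen in the example above (ingest date, then a lowercase hyphenated slug of the title). The helper names and the `KB_ROOT` variable are hypothetical, tags are expected as a comma-separated string without spaces, and titles containing double quotes would need escaping first.

```shell
# Hypothetical helper: lowercase the title and collapse anything that is
# not a letter or digit into single hyphens.
kb_slug() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# Hypothetical helper: turn "AI,infrastructure" into ["AI", "infrastructure"].
kb_yaml_tags() {
  if [ -z "$1" ]; then
    printf '[]\n'
  else
    printf '%s\n' "$1" | sed -e 's/,/", "/g' -e 's/^/["/' -e 's/$/"]/'
  fi
}

# Hypothetical helper: read the cleaned article body from stdin and write
# it with frontmatter under articles/. Prints the path of the new file.
kb_write_article() (
  title="${1:-}"; url="${2:-}"; tags="${3:-}"
  root="${KB_ROOT:-.instar/knowledge}"
  today=$(date +%Y-%m-%d)
  file="$root/articles/$today-$(kb_slug "$title").md"
  mkdir -p "$root/articles" || exit 1
  {
    printf '%s\n' '---'
    printf 'title: "%s"\n' "$title"
    printf 'source: "%s"\n' "$url"
    printf 'ingested: "%s"\n' "$today"
    printf 'tags: %s\n' "$(kb_yaml_tags "$tags")"
    printf '%s\n\n' '---'
    printf '# %s\n\n' "$title"
    cat
  } > "$file" || exit 1
  printf '%s\n' "$file"
)
```

A typical call pipes the cleaned text in, for example `kb_write_article "My Article" "https://example.com/article" "AI,infrastructure" < cleaned.md`, followed by `instar memory sync` as described above.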
Searching Knowledge
CLI
    # Search within knowledge base only
    instar knowledge search "notification batching"

    # Search all memory (including knowledge)
    instar memory search "notification batching"
API
    # Knowledge-scoped search
    curl "http://localhost:4040/memory/search?q=notification+batching&source=knowledge/&limit=5"

    # Browse the catalog
    curl "http://localhost:4040/knowledge/catalog"
    curl "http://localhost:4040/knowledge/catalog?tag=AI"
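Hand-encoding the query string gets awkward once a search contains punctuation or quotes. As a sketch, curl's `-G` and `--data-urlencode` options can build the same knowledge-scoped request and do the encoding. The `kb_search` name and the `INSTAR_URL` variable are hypothetical. Note that curl also percent-encodes the slash in `source=knowledge/`, which assumes the server decodes query parameters in the usual way.

```shell
# Hypothetical helper: knowledge-scoped search with curl doing the
# URL encoding. -G sends the --data-urlencode pairs as a query string.
kb_search() {
  curl -sS -G "${INSTAR_URL:-http://localhost:4040}/memory/search" \
    --data-urlencode "q=$1" \
    --data-urlencode "source=knowledge/" \
    --data-urlencode "limit=${2:-5}"
}
```

`kb_search "notification batching"` issues the same search as the first curl example above, and an optional second argument overrides the limit of 5.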
Managing Sources
List all sources
    instar knowledge list
    instar knowledge list --tag AI
Remove a source
    # Find the source ID from the list
    instar knowledge list

    # Remove it
    instar knowledge remove kb_20260225123456_abc123

    # Re-sync the index
    instar memory sync
Via API
    # Remove
    curl -X DELETE "http://localhost:4040/knowledge/kb_20260225123456_abc123"
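Because removal takes a source ID rather than a title, a small guard can catch a mistyped argument before anything is deleted. The pattern below is an assumption inferred from the single example ID above (`kb_`, a 14-digit timestamp, an underscore, then an alphanumeric suffix); the real format may be looser. Both function names are hypothetical.

```shell
# Hypothetical check: does the argument look like the example source ID?
# The pattern is inferred from kb_20260225123456_abc123 and is an assumption.
kb_is_source_id() {
  printf '%s\n' "$1" | grep -Eq '^kb_[0-9]{14}_[A-Za-z0-9]+$'
}

# Hypothetical wrapper: remove a source, then re-sync the index,
# using the two CLI commands shown above.
kb_remove() {
  if ! kb_is_source_id "$1"; then
    echo "not a source ID: $1" >&2
    return 2
  fi
  instar knowledge remove "$1" && instar memory sync
}
```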
MemoryIndex Configuration
To enable knowledge base indexing, add these sources to your .instar/config.json memory section:
    {
      "memory": {
        "enabled": true,
        "sources": [
          { "path": "AGENT.md", "type": "markdown", "evergreen": true },
          { "path": "USER.md", "type": "markdown", "evergreen": true },
          { "path": "knowledge/articles/", "type": "markdown", "evergreen": false },
          { "path": "knowledge/transcripts/", "type": "markdown", "evergreen": false },
          { "path": "knowledge/docs/", "type": "markdown", "evergreen": true }
        ]
      }
    }
Source behavior:
- articles/ and transcripts/ use evergreen: false -- recent content ranks higher (30-day temporal decay)
- docs/ uses evergreen: true -- reference documentation doesn't decay
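To get a feel for what a 30-day decay means for ranking, here is a purely illustrative calculation. It assumes the decay is exponential with a 30-day half-life. The text above does not say how MemoryIndex actually computes it, so both the formula and the `kb_decayed_score` name are assumptions, not the implementation.

```shell
# Illustrative only: exponential decay with a configurable half-life.
# Assumption: score halves every 30 days; MemoryIndex may differ.
kb_decayed_score() {
  awk -v score="$1" -v age="$2" -v half="${3:-30}" \
    'BEGIN { printf "%.3f\n", score * exp(-log(2) * age / half) }'
}
```

Under that assumption, `kb_decayed_score 1.0 30` prints 0.500 and `kb_decayed_score 1.0 60` prints 0.250, while an evergreen document would keep its original score regardless of age.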
Content Types
| Type | Directory | Temporal Decay | Best For |
|---|---|---|---|
| article | articles/ | Yes (30-day) | Web articles, blog posts, news |
| transcript | transcripts/ | Yes (30-day) | YouTube videos, podcasts, meetings |
| doc | docs/ | No (evergreen) | API docs, manuals, reference material |
Tips
- Always sync after ingesting: instar memory sync updates the FTS5 index
- Use tags consistently: Tags enable filtered browsing via instar knowledge list --tag X
- Include source URLs: Helps trace back to original content
- Clean before ingesting: Strip navigation, ads, cookie banners for better search results
- Use smart-fetch for URLs: python3 .claude/scripts/smart-fetch.py URL --auto gets clean markdown