Claude-skill-registry doc-search

Token-efficient documentation search using the Serena Document Index. The worked examples in this document show token savings of roughly 87-94% compared with reading full files. Use it BEFORE reading README.md or docs/ files. It triggers on architecture questions, pattern lookups, and project-specific documentation needs.

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/doc-search" ~/.claude/skills/majiayu000-claude-skill-registry-doc-search && rm -rf "$T"

manifest: skills/data/doc-search/SKILL.md

Document Search

Search project documentation efficiently using the Serena Document Index system.

Why This Matters

Approach	Tokens	Use Case
Read full README.md	3000-8000	Last resort only (wasteful)
Read docs/*.md	2000-5000 each	Rarely needed
Document Index Search	100-500	Always prefer
Section Retrieval	200-800	After finding relevant section

Rule: Never read documentation files until the document index fails to answer.

Document Index Location

.serena/cache/documents/document_index.json

Index Types Available:

- `tag_index`: Search by tags (architecture, api, testing, etc.)
- `title_index`: Search by section titles
- `project_index`: Filter by project (basecamp-server, interface-cli, etc.)
- `doc_type_index`: Filter by document type (readme, guide, api-reference, etc.)
- `content_index`: Keyword-based content search
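As a rough illustration of how these five indexes might fit together, the sketch below builds a tiny in-memory index and performs a lookup. The top-level layout (each index mapping a key to a list of section references) and the sample data are assumptions made for illustration; the real schema of `document_index.json` may differ. The helper `lookup` is hypothetical.

```python
# Hypothetical miniature of document_index.json; the real schema may differ.
ref = {
    "section_title": "Module Placement Rules",
    "relative_path": "project-basecamp-server/docs/PATTERNS.md",
    "line_start": 45,
    "line_end": 80,
}

sample_index = {
    "tag_index": {"architecture": [ref]},
    "title_index": {"module placement rules": [ref]},
    "project_index": {"project-basecamp-server": [ref]},
    "doc_type_index": {"guide": [ref]},
    "content_index": {"hexagonal": [ref]},
}


def lookup(index, index_name, key):
    """Return the section references stored under `key`, or an empty list."""
    return index.get(index_name, {}).get(key, [])


hits = lookup(sample_index, "tag_index", "architecture")
misses = lookup(sample_index, "tag_index", "no-such-tag")
print(len(hits), len(misses))
```

Returning an empty list for unknown keys keeps call sites free of `KeyError` handling, which suits an index whose tags vary between projects.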

Workflow Pattern

Step 1: Search Document Index (Python CLI)

# Search for relevant documentation sections
# Run from the project root; adjust this path to your own checkout of the project
cd /Users/kun/github/1ambda/dataops-platform
python3 scripts/serena/document_indexer.py --search "hexagonal architecture" --max-results 5

Step 2: Read Specific Section Only

After finding a relevant section in the search results:

# Use section coordinates from search result
# Example: project-basecamp-server/docs/PATTERNS.md#module-placement-rules
# Read only that section (lines 45-80 inclusive, i.e. 36 lines) instead of the entire file
Read(file_path="project-basecamp-server/docs/PATTERNS.md", offset=45, limit=36)
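Outside an agent session, the same "read only the section" idea can be sketched in plain Python. The helpers `read_section` and `to_offset_limit` are hypothetical, and the sketch assumes that line numbers are 1-based and inclusive and that `offset` names the first line to read.

```python
from itertools import islice
from pathlib import Path


def read_section(path, line_start, line_end):
    """Return lines line_start..line_end (1-based, inclusive) of a text file."""
    if line_start < 1 or line_end < line_start:
        raise ValueError("expected 1 <= line_start <= line_end")
    with Path(path).open(encoding="utf-8") as handle:
        # islice stops at line_end, so the rest of the file is never read.
        return list(islice(handle, line_start - 1, line_end))


def to_offset_limit(line_start, line_end):
    """Convert an inclusive line range into an (offset, limit) pair."""
    return line_start, line_end - line_start + 1
```

Under these assumptions, an inclusive range of 45-80 spans 36 lines, so `to_offset_limit(45, 80)` yields `(45, 36)`.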

Step 3: Alternative - Direct JSON Query

# For programmatic access in agent workflows
import json
from pathlib import Path

cache_path = Path(".serena/cache/documents/document_index.json")
index = json.loads(cache_path.read_text())

# Search by tag
architecture_docs = index['tag_index'].get('architecture', [])

# Search by project
server_docs = index['project_index'].get('project-basecamp-server', [])

# Get section content
for ref in architecture_docs[:3]:
    print(f"Section: {ref['section_title']}")
    print(f"File: {ref['relative_path']}")
    print(f"Lines: {ref['line_start']}-{ref['line_end']}")
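Tag and project lookups can also be combined. The following is a minimal sketch, assuming each reference is a dict that carries `relative_path` and `line_start` and that the pair identifies a section uniquely. The helper names and sample data are hypothetical.

```python
def ref_key(ref):
    """Identify a section by its file and starting line (assumed unique)."""
    return (ref["relative_path"], ref["line_start"])


def intersect_refs(first, second):
    """Keep the references from `first` that also appear in `second`."""
    wanted = {ref_key(r) for r in second}
    return [r for r in first if ref_key(r) in wanted]


# Hypothetical sample references for illustration only.
hexagonal = {"section_title": "Hexagonal Architecture",
             "relative_path": "project-basecamp-server/docs/PATTERNS.md",
             "line_start": 10, "line_end": 44}
cli_arch = {"section_title": "CLI Architecture",
            "relative_path": "project-interface-cli/docs/PATTERNS.md",
            "line_start": 5, "line_end": 30}
testing = {"section_title": "Testing",
           "relative_path": "project-basecamp-server/docs/TESTING.md",
           "line_start": 1, "line_end": 60}

by_tag = [hexagonal, cli_arch]      # e.g. tag_index["architecture"]
by_project = [hexagonal, testing]   # e.g. project_index["project-basecamp-server"]

both = intersect_refs(by_tag, by_project)
print([r["section_title"] for r in both])  # prints ['Hexagonal Architecture']
```

Dicts are not hashable, so the sketch compares references through a derived key rather than putting the dicts themselves into a set.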

Decision Tree

Need documentation?
|
+-- What patterns exist for X?
|   +-- doc-search: tag_index["patterns"] or tag_index["architecture"]
|
+-- How to implement feature in project Y?
|   +-- doc-search: project_index["project-Y"] + tag_index["implementation"]
|
+-- What does README say about Z?
|   +-- doc-search: title_index["Z"] or content_index["keyword"]
|
+-- Full context needed?
    +-- Read specific section (lines from search result)
    +-- LAST RESORT: Read full file
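The fallback order in this tree can be sketched as a small function. This is an illustration only: `find_sections` is a hypothetical helper, and it assumes index keys are stored in lowercase and that each index maps a key to a list of references.

```python
def find_sections(index, query):
    """Try tag, title, then content lookups; return (index_name, refs).

    Returns (None, []) when nothing matches, which is the signal to fall
    back to reading a file directly.
    """
    key = query.strip().lower()
    for name in ("tag_index", "title_index", "content_index"):
        refs = index.get(name, {}).get(key, [])
        if refs:
            return name, refs
    return None, []


# Hypothetical sample index for illustration only.
demo_index = {
    "tag_index": {"patterns": [{"section_title": "Patterns"}]},
    "title_index": {"quick start": [{"section_title": "Quick Start"}]},
    "content_index": {"jpa": [{"section_title": "Entity Rules"}]},
}

print(find_sections(demo_index, "Quick Start")[0])  # prints title_index
print(find_sections(demo_index, "kubernetes"))      # prints (None, [])
```

Returning the name of the index that matched makes it easy to log which lookup answered the question and how often the last resort is reached.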

Integration with mcp-efficiency

Document search is the first step before Serena symbol queries:

# 1. Search docs for patterns/context
doc_search("hexagonal architecture", max_results=3)

# 2. Use Serena for code structure
serena.get_symbols_overview("module-core-domain/")

# 3. Find specific symbols
serena.find_symbol("RepositoryJpa", depth=1)

Common Search Queries

Need	Search Query
Architecture patterns	`"hexagonal" OR "architecture"`
API endpoints	`"api" OR "endpoint" OR "controller"`
Testing patterns	`"test" OR "testing" OR "fixture"`
Entity relationships	`"entity" OR "repository" OR "jpa"`
CLI commands	`"command" OR "cli" OR "dli"`
Configuration	`"config" OR "environment" OR "settings"`
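For the direct JSON route, an `OR` query like those in the table could be evaluated as a union over `content_index`. This is a sketch under assumptions: that `content_index` maps lowercase keywords to lists of references, and that quoted terms joined by `OR` mean "match any". How the CLI itself parses queries is not specified here. `parse_or_query` and `search_any` are hypothetical helpers.

```python
import re


def parse_or_query(query):
    """Extract the quoted terms from a query such as '"api" OR "endpoint"'."""
    return [term.lower() for term in re.findall(r'"([^"]+)"', query)]


def search_any(content_index, query):
    """Return references matching any quoted term, without duplicates."""
    seen = set()
    results = []
    for term in parse_or_query(query):
        for ref in content_index.get(term, []):
            key = (ref["relative_path"], ref["line_start"])
            if key not in seen:
                seen.add(key)
                results.append(ref)
    return results


# Hypothetical sample data for illustration only.
api_ref = {"relative_path": "docs/API.md", "line_start": 12}
ctl_ref = {"relative_path": "docs/CONTROLLERS.md", "line_start": 3}
content = {"api": [api_ref], "endpoint": [api_ref], "controller": [ctl_ref]}

found = search_any(content, '"api" OR "endpoint" OR "controller"')
print(len(found))  # prints 2
```

The section that matches both "api" and "endpoint" appears once in the result, so the caller does not read the same lines twice.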

Token Savings Examples

Task	Without Doc Search	With Doc Search	Savings
Find architecture pattern	5000 tokens (full PATTERNS.md)	300 tokens	94%
Check entity rules	3000 tokens (full README)	400 tokens	87%
Find API reference	4000 tokens (full docs)	250 tokens	94%
Implementation guide	6000 tokens (multiple files)	500 tokens	92%
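The savings column follows from `1 - with / without`, rounded to a whole percent. A short check using the token counts from the table (the function name `savings_percent` is introduced here for illustration):

```python
def savings_percent(without_tokens, with_tokens):
    """Percentage of tokens saved, rounded to the nearest whole number."""
    return round(100 * (1 - with_tokens / without_tokens))


# (without doc search, with doc search) pairs taken from the table rows.
rows = [(5000, 300), (3000, 400), (4000, 250), (6000, 500)]
percentages = [savings_percent(without, with_) for without, with_ in rows]
print(percentages)  # prints [94, 87, 94, 92]
```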

Updating the Index

# Rebuild after documentation changes
python3 scripts/serena/update-symbols.py --with-docs

# Incremental update (changed files only)
python3 scripts/serena/update-symbols.py --changed-only --with-docs

# Full rebuild
python3 scripts/serena/document_indexer.py --project-root . --rebuild
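To decide when a rebuild is worth running, one option is to compare modification times. The sketch below is not part of the Serena scripts: `index_is_stale` is a hypothetical helper, and it assumes that documentation lives in `*.md` files under the project root and that the index sits at the cache path given earlier.

```python
from pathlib import Path

# Index location as stated in this document, relative to the project root.
INDEX_REL = ".serena/cache/documents/document_index.json"


def index_is_stale(project_root, index_rel=INDEX_REL):
    """Return True if the index is missing or older than any Markdown file."""
    root = Path(project_root)
    index_path = root / index_rel
    if not index_path.exists():
        return True
    index_mtime = index_path.stat().st_mtime
    # any() stops at the first Markdown file newer than the index.
    return any(md.stat().st_mtime > index_mtime for md in root.rglob("*.md"))
```

Calling `index_is_stale(".")` from the project root before a search gives a cheap hint about whether one of the update commands should run first. Modification times can be misleading after a fresh clone or checkout, so treat the result as a heuristic.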

Anti-Patterns

Anti-Pattern	Problem	Solution
Read full README.md first	3000+ tokens wasted	Search index, read section
Read all docs/*.md	10000+ tokens wasted	Search by tag/title
Skip doc search, use web	Slower, less relevant	Use indexed local docs
Guess file locations	Miss relevant docs	Use project_index filter

Quick Reference

# CLI search (recommended)
python3 scripts/serena/document_indexer.py --search "QUERY" --max-results 5

# Build/rebuild index
python3 scripts/serena/update-symbols.py --with-docs

# Check index stats
python3 -c "import json; d=json.load(open('.serena/cache/documents/document_index.json')); print(d['metadata'])"