Claude-skill-registry-data mcp-local-rag
Provides score interpretation (< 0.3 good, > 0.5 skip), query optimization, and source naming for query_documents, ingest_file, ingest_data tools. Use this skill when working with RAG, searching documents, ingesting files, saving web content, or handling PDF, HTML, DOCX, TXT, Markdown.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry-data
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry-data "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/mcp-local-rag" ~/.claude/skills/majiayu000-claude-skill-registry-data-mcp-local-rag && rm -rf "$T"
manifest:
data/mcp-local-rag/SKILL.md
source content
MCP Local RAG Skills
Tools
| Tool | Use When |
|---|---|
| ingest_file | Local files (PDF, DOCX, TXT, MD) |
| ingest_data | Raw content (HTML, text) with source URL |
| query_documents | Semantic + keyword hybrid search |
| delete_file and other management tools | Management |
Search: Core Rules
Hybrid search combines vector (semantic) and keyword (BM25).
Score Interpretation
Lower = better match. Use this to filter noise.
| Score | Action |
|---|---|
| < 0.3 | Use directly |
| 0.3-0.5 | Include if mentions same concept/entity |
| > 0.5 | Skip unless no better results |
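The thresholds in the table can be sketched as a small helper. This is only an illustration: the function name and the returned labels are hypothetical, and the boundary values 0.3 and 0.5 are assumed to fall into the middle band.

```python
def score_action(score: float) -> str:
    """Map a search score to the action in the table (lower = better match)."""
    if score < 0.3:
        return "use"  # strong match: use directly
    if score <= 0.5:
        return "include_if_same_concept"  # keep only if it mentions the same concept/entity
    return "skip_unless_nothing_better"  # weak match: last resort only
```

For example, `score_action(0.12)` falls in the first band and `score_action(0.8)` in the last.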
Limit Selection
| Intent | Limit |
|---|---|
| Specific answer (function, error) | 5 |
| General understanding | 10 |
| Comprehensive survey | 20 |
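One way to apply the limit table is a lookup keyed by intent. The intent labels, the helper name, and the `query` / `limit` argument keys below are assumptions made for illustration; check the query_documents tool schema for the actual parameter names.

```python
# Limits taken from the table above; the intent labels are hypothetical.
LIMIT_BY_INTENT = {"specific": 5, "general": 10, "survey": 20}


def build_query_args(query: str, intent: str = "general") -> dict:
    """Build an argument payload for a search call (key names are assumed)."""
    return {"query": query, "limit": LIMIT_BY_INTENT[intent]}
```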
Query Formulation
| Situation | Why Transform | Action |
|---|---|---|
| Specific term mentioned | Keyword search needs exact match | KEEP term |
| Vague query | Vector search needs semantic signal | ADD context |
| Error stack or code block | Long text dilutes relevance | EXTRACT core keywords |
| Multiple distinct topics | Single query conflates results | SPLIT queries |
| Few/poor results | Term mismatch | EXPAND (see below) |
Query Expansion
When results are few or all score > 0.5, expand query terms:
- Keep original term first, add 2-4 variants
- Types: synonyms, abbreviations, related terms, word forms
- Example: "config" → "config configuration settings configure"
Avoid over-expansion (causes topic drift).
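The expansion rules above (original term first, a capped number of variants) could look like the following sketch. The helper name is hypothetical, and the variants are supplied by the caller rather than generated.

```python
def expand_query(term: str, variants: list[str], max_variants: int = 4) -> str:
    """Return the original term followed by up to max_variants distinct variants."""
    seen = {term.lower()}
    kept = []
    for variant in variants:
        key = variant.lower()
        if key in seen:
            continue  # drop duplicates of the term or of earlier variants
        seen.add(key)
        kept.append(variant)
        if len(kept) == max_variants:
            break  # cap the expansion to limit topic drift
    return " ".join([term, *kept])
```

Calling `expand_query("config", ["configuration", "settings", "configure"])` reproduces the expanded query from the example.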
Result Selection
Decide whether to include or skip a result based on answer quality, not just score.
INCLUDE if:
- Directly answers the question
- Provides necessary context
- Score < 0.5
SKIP if:
- Same keyword, unrelated context
- Score > 0.7
- Mentions term without explanation
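A rough sketch of one possible reading of these rules follows. The result shape (a dict with a `score` key) and the `is_relevant` callback are assumptions; the callback stands in for the quality judgment (directly answers, provides context) that a score cannot capture. Results scoring between 0.5 and 0.7 are treated here as a fallback used only when nothing better survives, which is an interpretation rather than something the rules state.

```python
def select_results(results, is_relevant):
    """Filter search results by score band plus a caller-supplied relevance check."""
    include, fallback = [], []
    for result in results:
        if result["score"] > 0.7 or not is_relevant(result):
            continue  # too weak, or same keyword in an unrelated context
        if result["score"] < 0.5:
            include.append(result)
        else:
            fallback.append(result)  # borderline: keep only if nothing better
    chosen = include or fallback
    return sorted(chosen, key=lambda r: r["score"])  # best (lowest) score first
```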
Ingestion
ingest_file
ingest_file({ filePath: "/absolute/path/to/document.pdf" })
ingest_data
ingest_data({ content: "<html>...</html>", metadata: { source: "https://example.com/page", format: "html" } })
Format selection (match the data you have):
- HTML string → format: "html"
- Markdown string → format: "markdown"
- Other → format: "text"
Source format:
- Web page → Use URL: https://example.com/page
- Other content → Use scheme: {type}://{date} or {type}://{date}/{detail}
  - Examples: clipboard://2024-12-30, chat://2024-12-30/project-discussion
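The naming scheme for non-web content can be generated rather than typed by hand. The sketch below assumes ISO dates (as in the examples) and uses hypothetical helper names; the payload shape mirrors the ingest_data call shown earlier.

```python
from datetime import date
from typing import Optional

ALLOWED_FORMATS = {"html", "markdown", "text"}


def make_source(kind: str, day: date, detail: Optional[str] = None) -> str:
    """Build a {type}://{date} or {type}://{date}/{detail} source identifier."""
    base = f"{kind}://{day.isoformat()}"
    return f"{base}/{detail}" if detail else base


def build_ingest_data_args(content: str, source: str, fmt: str) -> dict:
    """Assemble an ingest_data payload, rejecting formats outside the listed three."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return {"content": content, "metadata": {"source": source, "format": fmt}}
```

Reusing the same generated source string later keeps re-ingestion and deletion pointed at the same entry.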
HTML source options:
- Static page → LLM fetch
- SPA/JS-rendered → Browser MCP
- Auth required → Manual paste
Re-ingest the same source to update it. Use the same source in delete_file to remove it.
References
For edge cases and examples:
- html-ingestion.md - URL normalization, SPA handling
- query-optimization.md - Query patterns by intent
- result-refinement.md - Contradiction resolution, chunking