Claude-skill-registry-data mcp-local-rag

Provides score interpretation (< 0.3 good, > 0.5 skip), query optimization, and source naming for the query_documents, ingest_file, and ingest_data tools. Use this skill when working with RAG, searching documents, ingesting files, saving web content, or handling PDF, HTML, DOCX, TXT, or Markdown files.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry-data
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry-data "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/mcp-local-rag" ~/.claude/skills/majiayu000-claude-skill-registry-data-mcp-local-rag && rm -rf "$T"
manifest: data/mcp-local-rag/SKILL.md
source content

MCP Local RAG Skills

Tools

| Tool | Use When |
| --- | --- |
| ingest_file | Local files (PDF, DOCX, TXT, MD) |
| ingest_data | Raw content (HTML, text) with source URL |
| query_documents | Semantic + keyword hybrid search |
| delete_file / list_files / status | Management |

Search: Core Rules

Hybrid search combines vector (semantic) and keyword (BM25).

Score Interpretation

Lower = better match. Use this to filter noise.

| Score | Action |
| --- | --- |
| < 0.3 | Use directly |
| 0.3-0.5 | Include if mentions same concept/entity |
| > 0.5 | Skip unless no better results |
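
The score bands above can be sketched as a small triage helper. This is a minimal sketch, assuming each result is a dict with a numeric `score` key; the names `score_action` and `filter_results` and the result shape are hypothetical and not part of the mcp-local-rag tools. How the exact boundary values 0.3 and 0.5 are bucketed is also an assumption, since the bands leave that open.

```python
def score_action(score: float) -> str:
    """Map a score (lower = better match) to one of the three bands."""
    if score < 0.3:
        return "use"    # use directly
    if score <= 0.5:
        return "check"  # include only if it mentions the same concept/entity
    return "skip"       # skip unless no better results


def filter_results(results: list[dict]) -> list[dict]:
    """Drop results scoring above 0.5, unless nothing better came back."""
    better = [r for r in results if score_action(r["score"]) != "skip"]
    return better if better else results


hits = [
    {"text": "a", "score": 0.21},
    {"text": "b", "score": 0.44},
    {"text": "c", "score": 0.63},
]
kept = filter_results(hits)  # keeps "a" and "b"; "c" is dropped
```

Results in the "check" band still need a manual relevance look; the helper only removes the noise above 0.5.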

Limit Selection

| Intent | Limit |
| --- | --- |
| Specific answer (function, error) | 5 |
| General understanding | 10 |
| Comprehensive survey | 20 |
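
As a rough illustration, the intent-to-limit mapping above could be kept in a lookup. The intent labels and the `pick_limit` helper are hypothetical, and falling back to the "general" limit for an unknown intent is an assumption rather than something the skill specifies.

```python
# Hypothetical intent labels; the limits are the ones listed above.
LIMIT_BY_INTENT = {
    "specific": 5,        # specific answer (function, error)
    "general": 10,        # general understanding
    "comprehensive": 20,  # comprehensive survey
}


def pick_limit(intent: str) -> int:
    """Return the result limit for an intent, defaulting to the 'general' limit."""
    return LIMIT_BY_INTENT.get(intent, LIMIT_BY_INTENT["general"])


limit = pick_limit("specific")  # 5
```

The value would then be supplied as the result limit when calling query_documents; the exact parameter name is not stated here.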

Query Formulation

| Situation | Why Transform | Action |
| --- | --- | --- |
| Specific term mentioned | Keyword search needs exact match | KEEP term |
| Vague query | Vector search needs semantic signal | ADD context |
| Error stack or code block | Long text dilutes relevance | EXTRACT core keywords |
| Multiple distinct topics | Single query conflates results | SPLIT queries |
| Few/poor results | Term mismatch | EXPAND (see below) |

Query Expansion

When results are few or all score > 0.5, expand query terms:

  • Keep original term first, add 2-4 variants
  • Types: synonyms, abbreviations, related terms, word forms
  • Example: "config" → "config configuration settings configure"

Avoid over-expansion (causes topic drift).
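
The expansion rule (original term first, a handful of variants after it) can be sketched as follows. The `expand_query` helper and its `max_variants` cap are hypothetical; the variants themselves still have to be chosen by whoever writes the query, and the cap of 4 simply mirrors the "2-4 variants" guidance to limit topic drift.

```python
def expand_query(term: str, variants: list[str], max_variants: int = 4) -> str:
    """Keep the original term first, then append up to max_variants distinct variants."""
    seen = {term.lower()}
    extra = []
    for variant in variants:
        key = variant.lower()
        if key not in seen:  # skip case-insensitive duplicates
            seen.add(key)
            extra.append(variant)
        if len(extra) == max_variants:
            break
    return " ".join([term] + extra)


query = expand_query("config", ["configuration", "settings", "configure"])
# query == "config configuration settings configure"
```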

Result Selection

Decide whether to include or skip a result based on answer quality, not just score.

INCLUDE if:

  • Directly answers the question
  • Provides necessary context
  • Score < 0.5

SKIP if:

  • Same keyword, unrelated context
  • Score > 0.7
  • Mentions term without explanation

Ingestion

ingest_file

ingest_file({ filePath: "/absolute/path/to/document.pdf" })

ingest_data

ingest_data({
  content: "<html>...</html>",
  metadata: { source: "https://example.com/page", format: "html" }
})

Format selection — match the data you have:

  • HTML string → format: "html"
  • Markdown string → format: "markdown"
  • Other → format: "text"
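
A minimal sketch of the format rule, assuming the caller already knows what kind of content it holds. The helpers `pick_format` and `build_ingest_args` are hypothetical; the returned dict only mirrors the shape of the ingest_data call shown above and does not invoke the tool.

```python
def pick_format(kind: str) -> str:
    """Return the ingest_data format for a content kind; anything else is plain text."""
    return kind if kind in ("html", "markdown") else "text"


def build_ingest_args(content: str, source: str, kind: str) -> dict:
    """Assemble arguments in the same shape as the ingest_data example."""
    return {
        "content": content,
        "metadata": {"source": source, "format": pick_format(kind)},
    }


args = build_ingest_args("<html>...</html>", "https://example.com/page", "html")
# args["metadata"]["format"] == "html"
```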

Source format:

  • Web page → Use URL: https://example.com/page
  • Other content → Use scheme: {type}://{date} or {type}://{date}/{detail}
    • Examples: clipboard://2024-12-30, chat://2024-12-30/project-discussion
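
The source scheme for non-web content could be generated with a helper like the one below. This is a sketch: `make_source` is a hypothetical name, and the ISO `YYYY-MM-DD` date form is inferred from the examples rather than stated as a requirement.

```python
from datetime import date


def make_source(kind: str, day: date, detail: str = "") -> str:
    """Build a {type}://{date} or {type}://{date}/{detail} source identifier."""
    base = f"{kind}://{day.isoformat()}"
    return f"{base}/{detail}" if detail else base


clip = make_source("clipboard", date(2024, 12, 30))
# clip == "clipboard://2024-12-30"
chat = make_source("chat", date(2024, 12, 30), "project-discussion")
# chat == "chat://2024-12-30/project-discussion"
```

Generating the identifier the same way every time matters because the same source string is what later updates or removes the entry.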

HTML source options:

  • Static page → LLM fetch
  • SPA/JS-rendered → Browser MCP
  • Auth required → Manual paste

Re-ingest the same source to update. Use the same source in delete_file to remove.

References

For edge cases and examples: