Medical-research-skills knowledge-base-search
Search and locate relevant content within a local knowledge base (files, indices, or PageIndex). Use when you need verifiable citations (file + page/paragraph) to support answers from local sources.
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Other/knowledge-base-search" ~/.claude/skills/aipoch-medical-research-skills-knowledge-base-search && rm -rf "$T"
manifest:
scientific-skills/Other/knowledge-base-search/SKILL.md
source content
Knowledge Base Search
When to Use
- You need to find specific facts, definitions, or procedures from a local knowledge base and return the exact source location.
- You must provide traceable citations (file path + page/paragraph/section) for audit, compliance, or review.
- You need to verify the original wording of a claim in the source document (quote-level validation).
- You want to compare how multiple local documents discuss the same topic and identify differences.
- You need to assemble supporting snippets for a report, FAQ, or internal knowledge response using only local materials.
Key Features
- Supports multiple retrieval approaches: direct file search, index-based search, and PageIndex-style location mapping.
- Query strategy guidance: keyword splitting, synonym expansion, and optional filters (time range, file type, tags).
- Relevance-oriented result ranking and filtering to keep the most supportive evidence first.
- Outputs verifiable hit snippets with precise citation locations (file + page/paragraph/section when available).
- Enforces local-only boundaries: searches only within authorized directories and does not modify source content.
Dependencies
- glob (>= 10.0.0): file path pattern matching
- grep (>= 3.11): in-file text searching
- Local knowledge base index files (one or more of: filename index, content index, vector index, PageIndex mapping)
- assets/hit_list_template.csv: standardized hit list output template
- Optional reference: references/guide.md (output formats, checklists, inspection points)
Example Usage
The following example demonstrates an end-to-end local search workflow and produces a CSV hit list compatible with assets/hit_list_template.csv.
Inputs
- Knowledge base root: ./kb/
- Query: How do we rotate API keys?
- Filters: file types md, pdf; time range 2024-01-01..2026-12-31
Steps
- Confirm index and scope
  - Ensure the search scope is limited to authorized paths (e.g., ./kb/).
  - Identify available indices:
    - filename/content index (fast keyword search)
    - vector index (semantic retrieval)
    - PageIndex mapping (page/paragraph location resolution)
- Build the query
  - Keywords: rotate, API key, key rotation
  - Synonyms/variants: credential rotation, token rotation, regenerate key
  - Filters:
    - file type: *.md, *.pdf
    - time range: 2024-01-01..2026-12-31 (if metadata exists)
- Execute search (local-only)
  - Path discovery (example):
    glob("./kb/**/*.md")
    glob("./kb/**/*.pdf")
  - Content search (example):
    grep -RIn "API key\|key rotation\|rotate" ./kb/
- Filter and rank results
  - Keep hits that directly answer the question (procedure, policy, steps, constraints).
  - Rank by:
    - term proximity (e.g., “rotate” near “API key”)
    - section relevance (e.g., “Security”, “Credentials”, “Operations”)
    - coverage (hits that include prerequisites + steps + verification)
- Output citations and hit list
  - For each hit, output:
    - file_path
    - location (page number for PDFs; heading/paragraph index for Markdown; PageIndex if available)
    - snippet (verbatim excerpt supporting the conclusion)
    - notes (why it is relevant; any assumptions)
  - Save as hit_list.csv using assets/hit_list_template.csv columns.
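The path discovery and content search steps can be sketched in Python using only the standard library. This is a minimal illustration, not the skill's actual implementation: the function names `build_pattern` and `search_markdown` are hypothetical, and the sketch covers Markdown files only, since extracting text from PDFs would need a separate tool that the source does not name.

```python
import re
from pathlib import Path


def build_pattern(terms):
    """Compile one case-insensitive regex that matches any of the given terms."""
    # Longer phrases go first so that, at a given position, the most specific
    # alternative is tried before shorter ones.
    ordered = sorted(set(terms), key=len, reverse=True)
    return re.compile("|".join(re.escape(term) for term in ordered), re.IGNORECASE)


def search_markdown(root, terms):
    """Return (relative_path, line_number, line_text) for every matching line.

    Only *.md files under `root` are read; nothing is written or modified.
    """
    root_path = Path(root).resolve()
    pattern = build_pattern(terms)
    hits = []
    for path in sorted(root_path.rglob("*.md")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                relative = path.relative_to(root_path).as_posix()
                hits.append((relative, number, line.strip()))
    return hits
```

Calling `search_markdown("./kb", ["rotate", "API key", "key rotation", "credential rotation"])` would pass the keywords and synonyms from the query step in a single list; the returned line numbers can serve as a line-range locator for Markdown hits.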
Example Output (CSV rows)
file_path,location,snippet,relevance_score,notes
kb/security/credential_policy.pdf,page 12,"API keys must be rotated every 90 days... Rotation requires...",0.92,"Direct policy + rotation interval + procedure reference."
kb/runbooks/api_key_rotation.md,section 'Procedure' ¶3,"To rotate an API key: (1) create a new key... (2) update services... (3) revoke old key...",0.89,"Step-by-step operational runbook."
kb/audit/controls.md,heading 'Key Management' ¶2,"Evidence of rotation includes change tickets and key revocation logs...",0.81,"Provides verification/evidence requirements."
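Rows like these can be produced with Python's standard `csv` module, which takes care of quoting snippets that contain commas or quotation marks. The sketch below assumes the template's columns match the header shown in the example output; the names `COLUMNS`, `write_hit_list`, and `hit_list_to_text` are hypothetical.

```python
import csv
import io

# Assumed to mirror assets/hit_list_template.csv; taken from the example header.
COLUMNS = ["file_path", "location", "snippet", "relevance_score", "notes"]


def write_hit_list(hits, stream):
    """Write a header row plus one CSV row per hit dictionary to `stream`."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS, quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    for hit in hits:
        # DictWriter raises ValueError if a hit carries an unknown column.
        writer.writerow(hit)


def hit_list_to_text(hits):
    """Render the hit list as CSV text, e.g. for preview before saving."""
    buffer = io.StringIO(newline="")
    write_hit_list(hits, buffer)
    return buffer.getvalue()
```

To save the file, `write_hit_list(hits, open("hit_list.csv", "w", newline="", encoding="utf-8"))` inside a `with` block would be one way to do it; `newline=""` avoids doubled line endings on Windows.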
Implementation Details
Retrieval Workflow
- Index confirmation
  - Determine knowledge base root paths and last update time (if available).
  - Detect which indices exist:
    - filename index: quick narrowing by file names
    - content index: inverted index / grep-like scanning
    - vector index: semantic similarity retrieval
    - PageIndex: mapping from document offsets to page/paragraph identifiers
- Query strategy
  - Tokenize the question into:
    - core entities (e.g., “API key”)
    - actions (e.g., “rotate”, “revoke”, “regenerate”)
    - constraints (e.g., “every 90 days”, “approval required”)
  - Expand with synonyms and variants.
  - Apply filters when metadata exists:
    - time range
    - file type
    - tags/collections
- Result filtering and ranking
  - Remove low-signal hits (navigation, boilerplate, unrelated mentions).
  - Rank by a weighted score (example):
    - Keyword match (exact phrase > partial): 0.45
    - Proximity (terms close together): 0.20
    - Section importance (titles like “Procedure/Policy”): 0.20
    - Coverage (answers include steps + constraints + verification): 0.15
  - Keep the original text snippet verbatim for verification.
- Citation and location resolution
  - Markdown/text:
    - use heading + paragraph index (or line range) as the primary locator
  - PDF:
    - use page number; optionally include bounding text around the hit
  - PageIndex (if present):
    - map internal offsets to stable page/paragraph identifiers
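The example weights in the ranking step sum to 1.0, so the combined score stays in the range 0 to 1 when each component is itself scaled to that range. The sketch below shows the arithmetic only. It assumes each hit already carries per-component signals between 0 and 1; how those signals are computed is not specified in the source, and the names `WEIGHTS`, `weighted_score`, and `rank` are hypothetical.

```python
# Example weights from the ranking step above.
WEIGHTS = {
    "keyword": 0.45,    # exact phrase > partial match
    "proximity": 0.20,  # query terms close together
    "section": 0.20,    # hit sits under a title like "Procedure" or "Policy"
    "coverage": 0.15,   # steps + constraints + verification
}


def weighted_score(signals):
    """Combine per-component signals (each in [0, 1]) into one score.

    Components missing from `signals` count as 0.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = signals.get(name, 0.0)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"signal {name!r} must be in [0, 1], got {value}")
        total += weight * value
    return round(total, 4)


def rank(hits):
    """Return hits sorted by descending weighted score (input is not modified)."""
    return sorted(hits, key=lambda hit: weighted_score(hit["signals"]), reverse=True)
```

For instance, a hit with an exact keyword match (1.0) and moderate proximity (0.5) but no section or coverage signal would score 0.45 × 1.0 + 0.20 × 0.5 = 0.55.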
Constraints and Limitations
- Search only within user-authorized local directories.
- Do not modify source documents.
- Do not execute scripts or arbitrary code.
- Do not access network resources or external APIs.
- If indices are missing/corrupted, fall back to direct file scanning; if scanning is not possible, report the limitation and required remediation (re-indexing).
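The first constraint can be checked before any file is opened. The following is a minimal sketch, assuming the authorized directories are supplied as a list of paths; the function name `is_within_scope` is hypothetical. Paths are resolved first so that `..` segments and symlinks cannot step outside an authorized root.

```python
from pathlib import Path


def is_within_scope(candidate, authorized_roots):
    """Return True if `candidate` resolves to a path inside an authorized root."""
    resolved = Path(candidate).resolve()
    for root in authorized_roots:
        root_resolved = Path(root).resolve()
        # Compare whole path components, not string prefixes, so that
        # "/data/kb_private" is not mistaken for a child of "/data/kb".
        if resolved == root_resolved or root_resolved in resolved.parents:
            return True
    return False
```

A search routine could call this for every discovered path and skip (and report) anything that returns False, which keeps the check read-only and local.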