# claude-skill-registry: contextual-chunking

Contextual Retrieval implementation for RAG - chunks clinical notes with LLM-generated context prepended to each chunk before embedding. Reduces retrieval failures by up to 49% per Anthropic's research.

Clone the full registry:

```sh
git clone https://github.com/majiayu000/claude-skill-registry
```

Or install just this skill:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/contextual-chunking" ~/.claude/skills/majiayu000-claude-skill-registry-contextual-chunking && rm -rf "$T"
```
`skills/data/contextual-chunking/SKILL.md`

# Contextual Chunking Skill

## Overview

This skill implements Anthropic's Contextual Retrieval pattern for RAG systems. It chunks clinical notes into fixed-size segments (1000 tokens, 200-token overlap) and generates a 50-100 token contextual summary for each chunk using Phi-4. The context is prepended to each chunk before embedding, significantly improving retrieval accuracy for citation extraction.
## When to Use

Use this skill when:

- Preparing clinical notes for RAG-based summarization
- Creating embeddings for ChromaDB storage
- Improving citation accuracy and reducing hallucinations
- Processing multi-page clinical notes for semantic search
## Research Background

Anthropic's Contextual Retrieval research reports that prepending chunk-specific context reduces the top-20-chunk retrieval failure rate by up to 49% compared with standard RAG. The context helps the embedding model understand each chunk's role within the larger document.
## Installation

IMPORTANT: This skill has its own isolated virtual environment (`.venv`) managed by uv. Do NOT use system Python.

Initialize the skill's environment:

```sh
# From the skill directory
cd .agent/skills/contextual-chunking
uv sync  # Creates .venv and installs dependencies from pyproject.toml
```

Dependencies are declared in `pyproject.toml`:

- `tiktoken`: token counting for Phi-4
## Usage

CRITICAL: Always use `uv run` to execute code with this skill's `.venv`, NOT system Python.
### Basic Chunking with Context

```python
# From the .agent/skills/contextual-chunking/ directory
# Run with: uv run python <your_script>.py (note: __file__ is undefined under `python -c`)
from contextual_chunking import ContextualChunker

# ollama-client is a sibling skill and must be put on the import path separately
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent / "ollama-client"))
from ollama_client import OllamaClient

# Initialize
chunker = ContextualChunker(
    ollama_client=OllamaClient(),
    chunk_size=1000,    # Tokens per chunk
    chunk_overlap=200,  # Overlap between chunks (20%)
    context_size=75,    # Context tokens (50-100 range)
)

# Chunk a clinical note
clinical_note = "Patient presents with chest pain radiating to left arm..."
enriched_chunks = chunker.chunk_with_context(
    document_text=clinical_note,
    doc_id="note_123",
)

# Each enriched chunk contains:
for chunk in enriched_chunks:
    print(f"Chunk ID: {chunk['id']}")
    print(f"Original text: {chunk['original_text'][:100]}...")
    print(f"Context: {chunk['context']}")
    print(f"Enriched (context + text): {chunk['enriched_text'][:150]}...")
    print(f"Offsets: {chunk['start_offset']}-{chunk['end_offset']}")
    print("---")
```
### Integration with ChromaDB

```python
from src.skills.chroma_client.chroma_client import ChromaClient

# 1. Chunk with context
enriched_chunks = chunker.chunk_with_context(clinical_note, "note_123")

# 2. Store enriched chunks in ChromaDB
chroma_client = ChromaClient()
chroma_client.add_chunks(
    collection_name="clinical_note_session_456",
    chunks=[chunk['enriched_text'] for chunk in enriched_chunks],
    metadatas=[
        {
            'chunk_id': chunk['id'],
            'start_offset': chunk['start_offset'],
            'end_offset': chunk['end_offset'],
            'original_text': chunk['original_text'],
        }
        for chunk in enriched_chunks
    ],
    ids=[chunk['id'] for chunk in enriched_chunks],
)
```
## Context Generation Prompt

The LLM generates context with this prompt template:

```
Given the whole document context, provide succinct context (50-100 tokens) to situate this chunk for search retrieval purposes.

Document title/type: Clinical Note
Document context: [First 2000 chars of full document]

Chunk to contextualize:
{chunk_text}

Provide ONLY the context (no explanations):
```

Example output:

```
Context: This section describes the patient's presenting symptoms during initial triage, specifically cardiovascular complaints requiring urgent evaluation.
```
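For illustration, a minimal sketch of how the template might be filled and sent to the local model. The `generate` method on `OllamaClient` is an assumed name, not the skill's confirmed API:

```python
# Hypothetical sketch: filling the context prompt and calling the local model.
# ASSUMPTION: OllamaClient.generate(prompt) -> str is an invented interface;
# adapt the call to whatever the ollama-client skill actually exposes.
CONTEXT_PROMPT = (
    "Given the whole document context, provide succinct context (50-100 tokens) "
    "to situate this chunk for search retrieval purposes.\n"
    "Document title/type: Clinical Note\n"
    "Document context: {doc_head}\n"
    "Chunk to contextualize:\n{chunk_text}\n"
    "Provide ONLY the context (no explanations):"
)

def generate_context(client, document_text: str, chunk_text: str) -> str:
    prompt = CONTEXT_PROMPT.format(
        doc_head=document_text[:2000],  # first 2000 chars of the full document
        chunk_text=chunk_text,
    )
    return client.generate(prompt).strip()  # assumed call signature
```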
## Chunk Structure

Each enriched chunk dictionary contains:

```python
{
    'id': 'note_123_chunk_0',
    'original_text': 'Patient presents with chest pain...',
    'context': 'This section describes presenting symptoms...',
    'enriched_text': 'This section describes presenting symptoms... Patient presents with chest pain...',
    'start_offset': 0,
    'end_offset': 1200,
    'token_count': 1000,
}
```
## Configuration

Parameters:

- `chunk_size`: Tokens per chunk (default: 1000)
  - Too small: context fragmentation, poor retrieval
  - Too large: embedding quality degrades, slower search
- `chunk_overlap`: Token overlap between chunks (default: 200, ~20%)
  - Prevents information loss at chunk boundaries
  - Critical for accurate citation offsets
- `context_size`: Context tokens per chunk (default: 75, range: 50-100)
  - Balances informativeness against token cost
  - Generated by the LLM for each chunk
## Best Practices

- Token Counting: Use tiktoken for accurate Phi-4 token counts (see the sketch after this list)
- Context Quality: Verify the LLM generates succinct, relevant context
- Offset Tracking: Maintain character offsets for citation extraction
- Batch Processing: Generate contexts in batches for efficiency
- Cache Contexts: Store enriched chunks to avoid regeneration
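A minimal token-counting sketch. Phi-4's tokenizer is reported to be tiktoken-based with a cl100k-style vocabulary, but treat the encoding choice below as an assumption and substitute Phi-4's own tokenizer if exact counts matter:

```python
# Sketch: approximate token counting with tiktoken.
# ASSUMPTION: cl100k_base is a stand-in for Phi-4's actual encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    """Approximate token count used for chunk-size budgeting."""
    return len(enc.encode(text))

print(count_tokens("Patient presents with chest pain radiating to left arm."))
```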
## Performance Considerations

Chunking a 10-page note (~5000 tokens):

- Chunks: ~6 (1000 tokens each, advancing 800 tokens per chunk after the 200-token overlap)
- Context generation: ~6 LLM calls (~5-10 seconds total)
- Total time: 10-15 seconds (acceptable for offline processing)
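The chunk estimate follows from the stride (`chunk_size − chunk_overlap`); a self-contained sketch of that arithmetic:

```python
# Sketch: estimating chunk count from chunk_size and chunk_overlap.
import math

def estimate_chunks(total_tokens: int, chunk_size: int = 1000,
                    chunk_overlap: int = 200) -> int:
    """Number of fixed-size, overlapping chunks needed to cover a document."""
    if total_tokens <= chunk_size:
        return 1
    stride = chunk_size - chunk_overlap  # 800 tokens advanced per chunk
    return 1 + math.ceil((total_tokens - chunk_size) / stride)

print(estimate_chunks(5000))  # -> 6 for a ~10-page note
```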
Trade-offs:

- Pro: Up to 49% fewer retrieval failures
- Pro: Fewer hallucinations, better citations
- Con: Additional LLM inference time
- Con: Slightly higher token usage
## Error Handling

- If LLM context generation fails, fall back to an empty context (still functional; sketched below)
- If a chunk exceeds the token limit, split it further
- Preserve original text and offsets even if context generation fails
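A minimal sketch of that fallback, reusing the hypothetical `generate_context` helper from above; field names match the chunk structure shown earlier:

```python
# Sketch: degrade gracefully when context generation fails.
def enrich_chunk(client, document_text: str, chunk: dict) -> dict:
    try:
        context = generate_context(client, document_text, chunk["original_text"])
    except Exception:
        context = ""  # fallback: empty context keeps the chunk functional
    # Original text and offsets are preserved regardless of the outcome
    chunk["context"] = context
    chunk["enriched_text"] = f"{context} {chunk['original_text']}".strip()
    return chunk
```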
## Integration with RAG Pipeline

Workflow (an end-to-end sketch follows the list):

- Chunk: Use this skill to create enriched chunks
- Embed: Store them in ChromaDB (embedding happens automatically)
- Retrieve: Query ChromaDB for relevant chunks
- Extract: Use the citation-extraction skill to validate citations
- Cleanup: Clear the ChromaDB collection after the session
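A compact sketch of that workflow under stated assumptions: only `add_chunks` is documented above, so `query` and `delete_collection` are hypothetical wrapper methods, and the citation-extraction step is stubbed:

```python
# Hypothetical end-to-end sketch; ChromaClient methods other than add_chunks
# (query, delete_collection) are assumed names, not confirmed API.
collection = "clinical_note_session_456"

# 1. Chunk
enriched_chunks = chunker.chunk_with_context(clinical_note, "note_123")

# 2. Embed: ChromaDB embeds the enriched text on insert
chroma_client.add_chunks(
    collection_name=collection,
    chunks=[c["enriched_text"] for c in enriched_chunks],
    metadatas=[{"chunk_id": c["id"],
                "start_offset": c["start_offset"],
                "end_offset": c["end_offset"]} for c in enriched_chunks],
    ids=[c["id"] for c in enriched_chunks],
)

# 3. Retrieve (assumed method)
results = chroma_client.query(
    collection_name=collection,
    query_text="cardiac symptoms at triage",
    n_results=5,
)

# 4. Extract: validate citations in `results` via the citation-extraction skill (not shown)

# 5. Cleanup (assumed method)
chroma_client.delete_collection(collection)
```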
## Implementation

See `contextual_chunking.py` for the full Python implementation.