Set up RAG pipelines with document chunking, embedding generation, and retrieval strategies using LlamaIndex. Use when building new RAG systems, choosing chunking approaches, selecting embedding models, or implementing vector/hybrid retrieval for src/ or src-iLand/ pipelines.
```bash
# Clone the full registry
git clone https://github.com/majiayu000/claude-skill-registry

# Or install only this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/implementing-rag" ~/.claude/skills/majiayu000-claude-skill-registry-implementing-rag && rm -rf "$T"
```
# Implementing RAG Systems
Quick-start guide for building production-grade RAG pipelines with LlamaIndex. This skill helps you set up the foundational components: document processing, embedding generation, and retrieval.
## When to Use This Skill
- Building a new RAG pipeline from scratch
- Choosing optimal chunking strategy for your documents
- Selecting and configuring embedding models
- Implementing retrieval (vector, BM25, or hybrid search)
- Optimizing for specific use cases (Thai language, legal documents, product catalogs)
- Integrating with existing pipelines in `src/` or `src-iLand/`
## Quick Decision Guide

### Chunking Strategy
- Legal/Technical docs (like Thai land deeds) → 512-1024 tokens
- Narrative content → 1024-2048 tokens
- Q&A pairs → 256-512 tokens
- Rule of thumb: when halving chunk size, double `similarity_top_k` (see the sketch below)
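A minimal sketch of how these presets and the rule of thumb might be wired together with LlamaIndex's `SentenceSplitter`. The preset values and the `configure_retrieval` helper are illustrative assumptions, not part of the skill's pipelines:

```python
from llama_index.core.node_parser import SentenceSplitter

# Illustrative presets based on the guide above; tune against your own corpus.
CHUNK_PRESETS = {
    "legal": dict(chunk_size=512, chunk_overlap=50),        # precise clauses
    "narrative": dict(chunk_size=2048, chunk_overlap=100),  # long-form context
    "qa": dict(chunk_size=256, chunk_overlap=20),           # short Q&A pairs
}

def configure_retrieval(doc_type: str, base_top_k: int = 5):
    params = CHUNK_PRESETS[doc_type]
    splitter = SentenceSplitter(**params)
    # Rule of thumb: halving chunk size doubles similarity_top_k,
    # so scale top_k inversely with chunk size (1024 tokens as the baseline).
    top_k = max(1, round(base_top_k * 1024 / params["chunk_size"]))
    return splitter, top_k
```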
### Embedding Model Selection
- Cost-sensitive → HuggingFace bge-base-en-v1.5 with ONNX (3-7x faster, free)
- Quality-first → OpenAI text-embedding-3-small or JinaAI-base
- Multi-language (Thai) → Cohere embed-multilingual-v3.0
- Balanced → OpenAI or bge-large (open-source)
### Retrieval Strategy
- Simple queries → Vector search
- Keyword-heavy → BM25
- Production systems → Hybrid (vector + BM25)
- Structured data → Metadata filtering (see the sketch below)
- Large doc sets (100+) → Document summary or recursive retrieval
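Vector, BM25, and hybrid retrieval are shown in the patterns below; metadata filtering is not, so here is a minimal sketch using LlamaIndex's `MetadataFilters`. The `doc_type` key, its value, and the query are hypothetical examples:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Restrict semantic search to nodes whose metadata matches exactly.
# "doc_type"/"contract" are hypothetical; use your own metadata keys.
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="doc_type", value="contract")]
)
retriever = index.as_retriever(similarity_top_k=5, filters=filters)
nodes = retriever.retrieve("termination clauses")
```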
## Quick Start Patterns

### Pattern 1: Standard RAG Setup (Vector Search)
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure chunking
Settings.chunk_size = 1024
Settings.chunk_overlap = 20

# Configure embeddings
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    embed_batch_size=100,
)

# Load and index documents
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Create query engine
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("Your question here")
```
### Pattern 2: Multi-Language Setup (Thai Support)
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.cohere import CohereEmbedding

# Use multilingual embeddings
embed_model = CohereEmbedding(
    model_name="embed-multilingual-v3.0",
    api_key="YOUR_COHERE_API_KEY",
)

# Configure chunking for Thai text
node_parser = SentenceSplitter(chunk_size=1024, chunk_overlap=50)

# Load Thai documents
documents = SimpleDirectoryReader("./data").load_data()

# Create index with multilingual support
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embed_model,
    transformations=[node_parser],
)
query_engine = index.as_query_engine(similarity_top_k=5)
```
### Pattern 3: Hybrid Search (Production-Ready)
```python
import Stemmer

from llama_index.core import VectorStoreIndex
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever

# Create vector index and retriever
index = VectorStoreIndex.from_documents(documents)
vector_retriever = index.as_retriever(similarity_top_k=10)

# Create BM25 retriever over the same docstore
bm25_retriever = BM25Retriever.from_defaults(
    docstore=index.docstore,
    similarity_top_k=10,
    stemmer=Stemmer.Stemmer("english"),
)

# Combine with reciprocal-rank query fusion
hybrid_retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    similarity_top_k=5,
    mode="reciprocal_rerank",
    use_async=True,
)

# Use in a query engine
query_engine = RetrieverQueryEngine.from_args(hybrid_retriever)
```
### Pattern 4: Cost-Optimized Local Embeddings
```python
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Use a local model with ONNX acceleration (3-7x faster)
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-base-en-v1.5",
    backend="onnx",  # Requires: pip install optimum[onnxruntime]
)

# The rest of the pipeline stays the same
index = VectorStoreIndex.from_documents(documents)
```
## Your Codebase Integration

### For src/ Pipeline

Document Preprocessing (`src/02_prep_doc_for_embedding.py`):
- Add configurable chunking strategy
- Support multiple node parser types
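A minimal sketch of what that configurable strategy could look like. The helper name, strategy strings, and default parameters are assumptions, not the existing `02_prep_doc_for_embedding.py` code:

```python
from llama_index.core.node_parser import (
    SentenceSplitter,
    SentenceWindowNodeParser,
    TokenTextSplitter,
)

def make_node_parser(strategy: str = "sentence", chunk_size: int = 1024):
    """Pick a node parser from a config value (hypothetical helper)."""
    if strategy == "sentence":
        return SentenceSplitter(chunk_size=chunk_size, chunk_overlap=20)
    if strategy == "token":
        return TokenTextSplitter(chunk_size=chunk_size, chunk_overlap=20)
    if strategy == "sentence-window":
        # Keeps a window of neighboring sentences as node metadata
        return SentenceWindowNodeParser.from_defaults(window_size=3)
    raise ValueError(f"Unknown chunking strategy: {strategy}")
```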
Batch Embeddings (`src/09_enhanced_batch_embeddings.py`):
- Increase `embed_batch_size` to 100 for faster processing
- Consider local models to eliminate API costs
Basic Retrieval (`src/10_basic_query_engine.py`):
- Baseline vector search implementation
- Adjust `similarity_top_k` based on chunk size
### For src-iLand/ Pipeline (Thai Land Deeds)

Data Processing (`src-iLand/data_processing/`):
- Current chunk size: 1024 tokens (good for legal documents)
- Consider testing 512 tokens with increased `top_k` for better precision
Embeddings (`src-iLand/docs_embedding/`):
- Current: OpenAI text-embedding-3-small
- Test: Cohere embed-multilingual-v3.0 for better Thai understanding
Retrieval (`src-iLand/retrieval/retrievers/`):
- Vector strategy: Baseline semantic search
- Hybrid strategy: Combines vector + BM25 for Thai queries
- Metadata strategy: Fast filtering (<50ms) on Thai geographic/legal metadata
## Detailed References
Load these reference files when you need comprehensive details:
- `reference-chunking.md`: Complete chunking strategies guide
  - Node parser types and configurations
  - Chunk size optimization for different domains
  - Sentence-window and hierarchical patterns
  - Metadata preservation
- `reference-embeddings.md`: Embedding model selection and optimization
  - All supported models (OpenAI, Cohere, HuggingFace, JinaAI, Voyage)
  - Performance benchmarks (hit rate, MRR)
  - ONNX/OpenVINO optimization details
  - Multi-language support
  - Cost optimization strategies
- `reference-retrieval-basics.md`: Core retrieval patterns
  - Vector retrieval implementation
  - BM25 keyword search
  - Hybrid search (vector + BM25)
  - Metadata filtering
  - Configuration parameters and tuning
## Common Workflows

### Workflow 1: New RAG Pipeline Setup
- Step 1: Choose chunking strategy based on document type
  - Load `reference-chunking.md` for detailed guidance
  - Test with sample documents
- Step 2: Select embedding model
  - Load `reference-embeddings.md` for model comparison
  - Consider: cost, latency, language support
  - For Thai: use multilingual models
- Step 3: Implement basic vector retrieval
  - Start with a simple VectorStoreIndex
  - Test with sample queries
  - Measure baseline performance (see the sketch after this workflow)
- Step 4: Optimize chunk size and top_k
  - Adjust based on query results
  - Balance precision vs. recall
- Step 5: (Optional) Upgrade to hybrid search
  - Combine vector + BM25 for production quality (see Pattern 3)
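A minimal sketch for the baseline measurement in Step 3, assuming the `index` from Pattern 1; the test queries are placeholders:

```python
# Hypothetical smoke test: run a few representative queries and inspect
# retrieved chunks and similarity scores before tuning chunk_size/top_k.
test_queries = [
    "Your representative question 1",
    "Your representative question 2",
]

retriever = index.as_retriever(similarity_top_k=5)
for query in test_queries:
    print(f"\nQuery: {query}")
    for node_with_score in retriever.retrieve(query):
        source = node_with_score.node.metadata.get("file_name", "unknown")
        print(f"  score={node_with_score.score:.3f}  source={source}")
```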
### Workflow 2: Migrating from API to Local Embeddings
- Step 1: Choose a local model (see `reference-embeddings.md`)
  - Recommended: BAAI/bge-base-en-v1.5
- Step 2: Install the optimization backend: `pip install optimum[onnxruntime]`
Step 3: Update embedding configuration
Settings.embed_model = HuggingFaceEmbedding( model_name="BAAI/bge-base-en-v1.5", backend="onnx" ) -
- Step 4: Re-index all documents
  - CRITICAL: you must re-embed with the new model
  - Use caching to avoid redundant work (see the sketch after this workflow)
- Step 5: Validate retrieval quality
  - Compare results against the API embeddings
  - Measure the latency improvement (3-7x speedup expected)
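For Step 4's caching note, a minimal sketch using LlamaIndex's `IngestionPipeline`, whose built-in cache skips nodes already processed by the same transformations. The path and transformation choices are illustrative:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

documents = SimpleDirectoryReader("./data").load_data()

# The pipeline caches each (node, transformation) result, so unchanged
# documents are not re-chunked or re-embedded on later runs.
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=1024, chunk_overlap=20),
        HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5", backend="onnx"),
    ]
)
nodes = pipeline.run(documents=documents)
pipeline.persist("./pipeline_cache")  # call pipeline.load("./pipeline_cache") before re-runs

index = VectorStoreIndex(nodes)
```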
### Workflow 3: Thai Language RAG Setup
- Step 1: Use a multilingual embedding model:

  ```python
  embed_model = CohereEmbedding(model_name="embed-multilingual-v3.0")
  ```
- Step 2: Configure appropriate chunking
  - Test 512-1024 tokens for Thai text
  - Preserve Thai metadata
- Step 3: Implement hybrid search
  - Vector for semantic understanding
  - BM25 for exact Thai keyword matching
- Step 4: Add metadata filtering (see the sketch after this workflow)
  - จังหวัด (province), อำเภอ (district), ประเภท (type)
  - Enable fast filtering (<50ms)
- Step 5: Test with Thai queries
  - Validate Unicode handling
  - Check retrieval relevance
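A minimal sketch for Step 4's metadata filtering, assuming the deed nodes carry `province` and `deed_type` metadata keys and an existing `index`. The key names, values, and query are hypothetical; check the actual src-iLand schema:

```python
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Filter to one province and deed type before semantic search.
# Keys and values are hypothetical examples.
filters = MetadataFilters(
    filters=[
        ExactMatchFilter(key="province", value="เชียงใหม่"),  # จังหวัด
        ExactMatchFilter(key="deed_type", value="โฉนด"),      # ประเภท
    ],
    condition="and",  # both filters must match
)
retriever = index.as_retriever(similarity_top_k=5, filters=filters)
results = retriever.retrieve("ขั้นตอนการโอนกรรมสิทธิ์ที่ดิน")  # "land title transfer steps"
```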
## Key Reminders
Critical Requirements:
- Re-indexing: When changing embedding models, MUST re-index all data
- Model consistency: Identical models for indexing and querying
- Chunk size + top_k: halving chunk size → double `similarity_top_k`
Best Practices:
- Start simple (vector search) → Add complexity as needed (hybrid, metadata)
- Test with representative queries before full indexing
- Use local models for cost optimization at scale
- Enable parallel loading (`num_workers=10`) for a 13x speedup
Performance Tips:
- Batch size: Set `embed_batch_size=100` for API calls
- Parallel loading: Use `SimpleDirectoryReader(...).load_data(num_workers=10)`
- Caching: Enable ingestion pipeline caching for faster re-runs
## Next Steps
After implementing basic RAG:
- Optimize: Use the `optimizing-rag` skill for reranking, caching, and production deployment
- Evaluate: Use the `evaluating-rag` skill to measure hit rate, MRR, and compare strategies