Developer-kit rag
Implements document chunking, embedding generation, vector storage, and retrieval pipelines for Retrieval-Augmented Generation systems. Use when building RAG applications, creating document Q&A systems, or integrating AI with knowledge bases.
install
source · Clone the upstream repo
git clone https://github.com/giuseppe-trisciuoglio/developer-kit
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/giuseppe-trisciuoglio/developer-kit "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/developer-kit-ai/skills/rag" ~/.claude/skills/giuseppe-trisciuoglio-developer-kit-rag && rm -rf "$T"
manifest:
plugins/developer-kit-ai/skills/rag/SKILL.mdsource content
RAG Implementation
Build Retrieval-Augmented Generation systems that extend AI capabilities with external knowledge sources.
Overview
This skill covers: document processing, embedding generation, vector storage, retrieval configuration, and RAG pipeline implementation.
When to Use
- Building Q&A systems over proprietary documents
- Creating chatbots with factual information from knowledge bases
- Implementing semantic search with natural language queries
- Reducing hallucinations with grounded, sourced responses
- Building documentation assistants and research tools
- Enabling AI systems to access domain-specific knowledge
Instructions
Step 1: Choose Vector Database
Select based on your requirements:
| Requirement | Recommended |
|---|---|
| Production scalability | Pinecone, Milvus |
| Open-source | Weaviate, Qdrant |
| Local development | Chroma, FAISS |
| Hybrid search | Weaviate with BM25 |
Step 2: Select Embedding Model
| Use Case | Model |
|---|---|
| General purpose | text-embedding-ada-002 |
| Fast and lightweight | all-MiniLM-L6-v2 |
| Multilingual | e5-large-v2 |
| Best performance | bge-large-en-v1.5 |
Step 3: Implement Document Processing Pipeline
- Load documents from source (file system, database, API)
- Clean and preprocess (remove formatting, normalize text)
- Split documents into chunks with appropriate strategy
- Generate embeddings for each chunk
- Store embeddings in vector database with metadata
Validation: Verify embeddings were generated successfully:
List<Embedding> embeddings = embeddingModel.embedAll(segments); if (embeddings.isEmpty() || embeddings.get(0).dimension() != expectedDim) { throw new IllegalStateException("Embedding generation failed"); }
Step 4: Configure Retrieval Strategy
Choose the appropriate strategy:
- Dense Retrieval: Semantic similarity via embeddings (default for most cases)
- Hybrid Search: Dense + sparse retrieval for better coverage
- Metadata Filtering: Filter by document attributes
- Reranking: Cross-encoder reranking for high-precision requirements
Step 5: Build RAG Pipeline
- Create content retriever with your embedding store
- Configure AI service with retriever and chat memory
- Implement prompt template with context injection
- Add response validation and grounding checks
Validation: Test with known queries to verify context injection works correctly.
Error Handling: For batch ingestion, wrap in retry logic:
for (Document doc : documents) { int attempts = 0; while (attempts < 3) { try { store.add(embeddingModel.embed(doc).content(), doc.toTextSegment()); break; } catch (EmbeddingException e) { attempts++; if (attempts == 3) throw new RuntimeException("Failed after 3 retries", e); } } }
Step 6: Evaluate and Optimize
- Measure retrieval metrics: precision@k, recall@k, MRR
- Evaluate answer quality: faithfulness, relevance
- Monitor performance and user feedback
- Iterate on chunking, retrieval, and prompt parameters
Examples
Example 1: Basic Document Q&A
List<Document> documents = FileSystemDocumentLoader.loadDocuments("/docs"); InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>(); EmbeddingStoreIngestor.ingest(documents, store); DocumentAssistant assistant = AiServices.builder(DocumentAssistant.class) .chatModel(chatModel) .contentRetriever(EmbeddingStoreContentRetriever.from(store)) .build(); String answer = assistant.answer("What is the company policy on remote work?");
Example 2: Metadata-Filtered Retrieval
EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder() .embeddingStore(store) .embeddingModel(embeddingModel) .maxResults(5) .minScore(0.7) .filter(metadataKey("category").isEqualTo("technical")) .build();
Example 3: Multi-Source RAG Pipeline
ContentRetriever webRetriever = EmbeddingStoreContentRetriever.from(webStore); ContentRetriever docRetriever = EmbeddingStoreContentRetriever.from(docStore); List<Content> results = new ArrayList<>(); results.addAll(webRetriever.retrieve(query)); results.addAll(docRetriever.retrieve(query)); List<Content> topResults = reranker.reorder(query, results).subList(0, 5);
Example 4: RAG with Chat Memory
Assistant assistant = AiServices.builder(Assistant.class) .chatModel(chatModel) .chatMemory(MessageWindowChatMemory.withMaxMessages(10)) .contentRetriever(retriever) .build(); assistant.chat("Tell me about the product features"); assistant.chat("What about pricing for those features?"); // Maintains context
Best Practices
Document Preparation
- Clean documents before ingestion; remove irrelevant content and formatting
- Add relevant metadata for filtering and context
Chunking Strategy
- Use 500-1000 tokens per chunk for optimal balance
- Include 10-20% overlap to preserve context at boundaries
- Test different sizes for your specific use case
Retrieval Optimization
- Start with high k values (10-20), then filter/rerank
- Use metadata filtering to improve relevance
- Monitor retrieval quality and iterate based on user feedback
Performance
- Cache embeddings for frequently accessed content
- Use batch processing for document ingestion
- Optimize vector store indexing for your scale
Constraints and Warnings
System Constraints
- Embedding models have maximum token limits per document
- Vector databases require proper indexing for performance
- Chunk boundaries may lose context for complex documents
- Hybrid search requires additional infrastructure
Quality Warnings
- Retrieval quality depends heavily on chunking strategy
- Embedding models may not capture domain-specific semantics
- Metadata filtering requires proper document annotation
- Reranking adds latency to query responses
Security Warnings
- Never hardcode credentials: Use environment variables for API keys and passwords
- Validate external content: Documents from file systems, APIs, or web sources may contain malicious content (prompt injection)
- Apply content filtering on retrieved documents before passing to LLM
- Restrict allowed data source URLs and file paths using allowlists