Babysitter rag-chunking-strategy
Document chunking with multiple strategies including semantic, recursive, and fixed-size chunking
install
source · Clone the upstream repo
git clone https://github.com/a5c-ai/babysitter
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/a5c-ai/babysitter "$T" && mkdir -p ~/.claude/skills && cp -r "$T/library/specializations/ai-agents-conversational/skills/rag-chunking-strategy" ~/.claude/skills/a5c-ai-babysitter-rag-chunking-strategy && rm -rf "$T"
manifest:
library/specializations/ai-agents-conversational/skills/rag-chunking-strategy/SKILL.md
source content
RAG Chunking Strategy Skill
Capabilities
- Implement multiple document chunking strategies
- Configure semantic chunking based on content boundaries
- Set up recursive character text splitting
- Design fixed-size chunking with overlap
- Implement document-aware chunking (markdown, code, etc.)
- Optimize chunk sizes for retrieval quality
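Document-aware chunking for markdown can be illustrated with a minimal, dependency-free sketch. This is not the skill's or any library's implementation: the function name `split_by_headers` and the output shape are assumptions made for illustration. It treats every line starting with `#` as a heading, so it does not account for `#` lines inside fenced code blocks.

```python
import re
from typing import Dict, List

# Matches ATX-style markdown headings such as "## Install".
HEADER = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headers(markdown: str) -> List[Dict[str, object]]:
    """Split markdown into sections, each tagged with its heading trail."""
    sections: List[Dict[str, object]] = []
    path: List[str] = []  # current heading trail, e.g. ["Guide", "Install"]
    body: List[str] = []  # lines collected since the last heading

    def flush() -> None:
        # Emit the collected body (if any) under the current heading trail.
        content = "\n".join(body).strip()
        if content:
            sections.append({"headers": list(path), "content": content})
        body.clear()

    for line in markdown.splitlines():
        match = HEADER.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at this level or deeper, then record the new one.
            del path[level - 1:]
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return sections


if __name__ == "__main__":
    doc = "# Guide\nIntro text.\n## Install\nRun the installer.\n## Usage\nCall the tool."
    for section in split_by_headers(doc):
        print(section["headers"], "->", section["content"])
```

Keeping the heading trail alongside each chunk lets a retriever show or filter on the section a passage came from.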
Target Processes
- rag-pipeline-implementation
- chunking-strategy-design
Implementation Details
Chunking Strategies
- RecursiveCharacterTextSplitter: Hierarchical splitting with separators
- SemanticChunker: Embedding-based semantic boundaries (provided by langchain-experimental rather than langchain-text-splitters)
- TokenTextSplitter: Token-aware splitting
- MarkdownHeaderTextSplitter: Structure-aware markdown splitting
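- CodeSplitter: Language-aware code chunking (CodeSplitter is the LlamaIndex class name; the langchain-text-splitters counterpart is RecursiveCharacterTextSplitter.from_language)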
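The idea behind hierarchical splitting with separators can be sketched in plain Python. This is a simplified illustration, not the RecursiveCharacterTextSplitter source: the function name `recursive_split` is hypothetical, overlap is left out, and separators must be non-empty strings.

```python
from typing import List, Optional

# Coarsest separator first: paragraphs, then lines, then words.
DEFAULT_SEPARATORS = ["\n\n", "\n", " "]


def recursive_split(
    text: str, chunk_size: int, separators: Optional[List[str]] = None
) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Pieces are merged greedily on the current separator; a piece that is
    still too long is split again with the next, finer separator.
    """
    if separators is None:
        separators = DEFAULT_SEPARATORS
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard character cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        if not piece:
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: recurse with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = (
        "Intro paragraph.\n\n"
        "Second paragraph that is somewhat longer than the limit.\n\n"
        "End."
    )
    for chunk in recursive_split(sample, chunk_size=30):
        print(repr(chunk))
```

Trying coarse separators first keeps paragraphs and lines intact whenever they fit, and only falls back to word or character cuts for oversized pieces.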
Configuration Options
- Chunk size (characters or tokens)
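- Chunk overlap (often chosen as a percentage of chunk size; langchain-text-splitters takes it as an absolute count via chunk_overlap)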
- Separator hierarchy
- Embedding model for semantic chunking
- Document type detection
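How chunk size and overlap interact can be shown with a small fixed-size chunker. This is a sketch under stated assumptions: `ChunkConfig` and `fixed_size_chunks` are hypothetical names, sizes are counted in characters, and the overlap percentage is converted to an absolute count before splitting.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChunkConfig:
    """Hypothetical configuration holder for fixed-size chunking."""
    chunk_size: int = 200     # characters per chunk
    overlap_pct: float = 0.1  # fraction of chunk_size shared by neighbors

    @property
    def overlap(self) -> int:
        # Absolute overlap in characters, rounded down.
        return int(self.chunk_size * self.overlap_pct)


def fixed_size_chunks(text: str, cfg: ChunkConfig) -> List[str]:
    """Slide a window of chunk_size characters, stepping by size minus overlap."""
    if cfg.chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= cfg.overlap_pct < 1:
        raise ValueError("overlap_pct must be in [0, 1)")

    step = max(1, cfg.chunk_size - cfg.overlap)
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + cfg.chunk_size])
        if start + cfg.chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    cfg = ChunkConfig(chunk_size=4, overlap_pct=0.5)
    print(fixed_size_chunks("abcdefghij", cfg))
```

With a chunk size of 4 and 50% overlap the window advances two characters at a time, so each chunk repeats the second half of the previous one.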
Best Practices
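- Keep chunk size within the embedding model's maximum input length (in tokens) so chunks are not truncated
- Use enough overlap that content cut at a chunk boundary still appears with its context in a neighboring chunk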
- Test retrieval quality with different strategies
- Consider document structure in strategy selection
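Comparing strategies on retrieval quality can start with a simple hit-rate check over a few labeled queries. The sketch below is an illustration only: `top_chunk` and `hit_rate` are hypothetical helpers, and the word-overlap scorer is a toy stand-in for real embedding similarity.

```python
import re
from typing import List, Set, Tuple


def tokens(text: str) -> Set[str]:
    # Lowercased word set; punctuation is ignored.
    return set(re.findall(r"\w+", text.lower()))


def top_chunk(query: str, chunks: List[str]) -> str:
    """Return the chunk sharing the most words with the query (toy scorer)."""
    query_tokens = tokens(query)
    return max(chunks, key=lambda chunk: len(query_tokens & tokens(chunk)))


def hit_rate(chunks: List[str], labeled: List[Tuple[str, str]]) -> float:
    """Fraction of queries whose expected answer appears in the top chunk."""
    hits = sum(1 for query, answer in labeled if answer in top_chunk(query, chunks))
    return hits / len(labeled)


if __name__ == "__main__":
    doc = "Paris is the capital of France. Berlin is the capital of Germany."
    labeled = [("capital of France", "Paris"), ("capital of Germany", "Berlin")]

    # Strategy A: split on sentence boundaries.
    by_sentence = doc.split(". ")
    # Strategy B: fixed 20-character windows, no overlap.
    fixed = [doc[i:i + 20] for i in range(0, len(doc), 20)]

    print("sentence:", hit_rate(by_sentence, labeled))
    print("fixed:", hit_rate(fixed, labeled))
```

In this toy case the fixed windows cut each sentence apart, so the chunk that best matches a query no longer contains the answer. A real evaluation would swap the scorer for the production embedding model and use a larger labeled query set.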
Dependencies
- langchain-text-splitters
- sentence-transformers (for semantic chunking)