Claude-skill-registry hybrid-search
Use when building search systems that need both semantic similarity and keyword matching - covers combining vector and BM25 search with Reciprocal Rank Fusion, alpha tuning for search weight control, and optimizing retrieval quality
```bash
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/hybrid-search" ~/.claude/skills/majiayu000-claude-skill-registry-hybrid-search && rm -rf "$T"
```
skills/data/hybrid-search/SKILL.md

LLMemory Hybrid Search
Installation
```bash
uv add llmemory  # or: pip install llmemory
```
Overview
Hybrid search combines vector similarity search (semantic understanding) with full-text search (keyword matching) to deliver superior retrieval quality. Results are merged using Reciprocal Rank Fusion (RRF) to create a unified ranking.
When to use hybrid search:
- Need both semantic similarity AND exact keyword matches
- Queries contain specific terms, names, or technical jargon
- Want best-of-both-worlds retrieval quality (recommended default)
When to use vector-only search:
- Purely semantic/conceptual queries
- Cross-lingual search
- Queries with synonyms or paraphrasing
When to use text-only search:
- Exact keyword/phrase matching required
- Search in structured data or code
- When embeddings are not available
Quick Start
```python
from llmemory import LLMemory, SearchType

async with LLMemory(connection_string="postgresql://localhost/mydb") as memory:
    # Hybrid search (default, recommended)
    results = await memory.search(
        owner_id="workspace-1",
        query_text="machine learning algorithms",
        search_type=SearchType.HYBRID,
        limit=10,
        alpha=0.5  # Equal weight to vector and text
    )

    for result in results:
        print(f"[RRF={result.rrf_score:.3f}] {result.content[:80]}...")
```
Complete API Documentation
SearchType Enum
```python
class SearchType(str, Enum):
    VECTOR = "vector"  # Vector similarity only
    TEXT = "text"      # Full-text search only
    HYBRID = "hybrid"  # Combines vector + text (recommended)
```
search() - Hybrid Mode
Signature:
```python
async def search(
    owner_id: str,
    query_text: str,
    search_type: Union[SearchType, str] = SearchType.HYBRID,
    limit: int = 10,
    alpha: float = 0.5,
    metadata_filter: Optional[Dict[str, Any]] = None,
    id_at_origins: Optional[List[str]] = None,
    date_from: Optional[datetime] = None,
    date_to: Optional[datetime] = None,
    include_parent_context: bool = False,
    context_window: int = 2
) -> List[SearchResult]
```
Hybrid Search Parameters:
- search_type (SearchType, default: HYBRID): Set to SearchType.HYBRID for hybrid search
- alpha (float, default: 0.5): Weight for vector vs text search
  - 0.0 = text search only
  - 0.3 = favor text search (good for keyword-heavy queries)
  - 0.5 = equal weight (balanced, recommended)
  - 0.7 = favor vector search (good for semantic queries)
  - 1.0 = vector search only
Returns:
List[SearchResult] with hybrid-specific fields:
- rrf_score (float): Reciprocal Rank Fusion score (primary ranking)
- similarity (float): Vector similarity score (0-1)
- text_rank (float): Full-text search rank
- score (float): Overall score (equals rrf_score for hybrid)
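The result shape above can be pictured as a plain dataclass (an illustrative sketch only; the actual llmemory SearchResult may carry additional fields such as chunk IDs and metadata):

```python
from dataclasses import dataclass

@dataclass
class SearchResultSketch:
    """Illustrative stand-in for llmemory's SearchResult in hybrid mode."""
    content: str
    rrf_score: float   # primary ranking signal in hybrid mode
    similarity: float  # vector similarity, 0-1
    text_rank: float   # full-text search rank
    score: float       # equals rrf_score for hybrid searches

# Hybrid results are ordered by RRF score, descending
results = [
    SearchResultSketch("Doc A", rrf_score=0.021, similarity=0.91, text_rank=8.2, score=0.021),
    SearchResultSketch("Doc B", rrf_score=0.033, similarity=0.72, text_rank=14.5, score=0.033),
]
ranked = sorted(results, key=lambda r: r.rrf_score, reverse=True)
print(ranked[0].content)  # Doc B ranks first despite lower vector similarity
```

Note how Doc B wins on the fused score even though Doc A has the higher vector similarity: in hybrid mode, rrf_score (not similarity) drives the ordering.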
Example:
```python
# Balanced hybrid search
results = await memory.search(
    owner_id="workspace-1",
    query_text="quarterly revenue growth",
    search_type=SearchType.HYBRID,
    alpha=0.5,  # Equal weight
    limit=20
)

for result in results:
    print(f"RRF Score: {result.rrf_score:.3f}")
    print(f"Vector Similarity: {result.similarity:.3f}")
    print(f"Text Rank: {result.text_rank:.3f}")
    print(f"Content: {result.content[:100]}...")
    print("---")
```
Understanding Alpha Parameter
The alpha parameter controls the balance between vector and text search in hybrid mode.
Alpha Values Guide
```python
# Text-heavy (alpha = 0.0 to 0.3)
# Use when: Query has specific keywords, names, or technical terms
results = await memory.search(
    owner_id="workspace-1",
    query_text="Python asyncio gather timeout",
    search_type=SearchType.HYBRID,
    alpha=0.3  # Favor keyword matching
)

# Balanced (alpha = 0.4 to 0.6)
# Use when: General queries, uncertain which is better
results = await memory.search(
    owner_id="workspace-1",
    query_text="customer retention strategies",
    search_type=SearchType.HYBRID,
    alpha=0.5  # Equal weight (recommended default)
)

# Semantic-heavy (alpha = 0.7 to 1.0)
# Use when: Conceptual queries, synonyms, paraphrasing
results = await memory.search(
    owner_id="workspace-1",
    query_text="ways to keep customers happy",
    search_type=SearchType.HYBRID,
    alpha=0.7  # Favor semantic similarity
)
```
Choosing Alpha for Different Query Types
| Query Type | Example | Recommended Alpha | Reasoning |
|---|---|---|---|
| Specific keywords | "PostgreSQL CONNECTION_LIMIT error" | 0.2-0.3 | Need exact keyword matches |
| Product/person names | "iPhone 15 Pro specifications" | 0.3-0.4 | Names matter more than semantics |
| Technical jargon | "SOLID principles dependency injection" | 0.4-0.5 | Balance needed |
| General concepts | "improve team collaboration" | 0.5-0.6 | Balanced approach |
| Semantic queries | "how to motivate employees" | 0.6-0.7 | Semantic understanding key |
| Paraphrased questions | "what are good ways to retain staff" | 0.7-0.8 | Vector search excels |
Reciprocal Rank Fusion (RRF)
Hybrid search uses RRF to merge vector and text search results into a unified ranking.
How RRF Works
```python
k = 50  # RRF constant (the SearchConfig.rrf_k default; prevents early results from dominating)

# Initialize score accumulator for each chunk
rrf_scores = {}

# Process vector search results
for rank, result in enumerate(vector_results):
    chunk_id = result["chunk_id"]
    vector_contribution = alpha / (k + rank + 1)
    rrf_scores[chunk_id] = rrf_scores.get(chunk_id, 0) + vector_contribution

# Process text search results
for rank, result in enumerate(text_results):
    chunk_id = result["chunk_id"]
    text_contribution = (1 - alpha) / (k + rank + 1)
    rrf_scores[chunk_id] = rrf_scores.get(chunk_id, 0) + text_contribution

# Sort by accumulated RRF score descending
sorted_results = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)
```
Key points:
- Alpha is inside the division: alpha / (k + rank + 1), not multiplied afterward
- The denominator uses rank + 1, where rank starts at 0 (i.e. ranks are effectively 1-indexed)
- Chunks appearing in both result lists get contributions from both
- k = 50 by default (configurable via SearchConfig.rrf_k)
RRF Benefits
- Handles different score scales: Vector similarities (0-1) and text ranks (varying) are normalized
- Position-based fusion: Emphasizes consensus across search methods
- Robust to score outliers: Single high score doesn't dominate
- Tunable with alpha: Control the balance between search methods
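To make the fusion mechanics concrete, here is a self-contained sketch of the RRF computation over two toy ranked lists (plain Python, independent of llmemory; `rrf_fuse` is an illustrative helper, not a library function):

```python
def rrf_fuse(vector_ids, text_ids, alpha=0.5, k=50):
    """Merge two ranked chunk-ID lists with alpha-weighted Reciprocal Rank Fusion."""
    scores = {}
    for rank, chunk_id in enumerate(vector_ids):
        scores[chunk_id] = scores.get(chunk_id, 0.0) + alpha / (k + rank + 1)
    for rank, chunk_id in enumerate(text_ids):
        scores[chunk_id] = scores.get(chunk_id, 0.0) + (1 - alpha) / (k + rank + 1)
    # Sort chunk IDs by accumulated RRF score, descending
    return sorted(scores, key=scores.get, reverse=True)

# "c2" is only rank 2 in each list, but appears in BOTH,
# so consensus lifts it above either list's rank-1 winner
vector_ids = ["c1", "c2", "c3"]
text_ids = ["c4", "c2", "c5"]
print(rrf_fuse(vector_ids, text_ids))  # ['c2', 'c1', 'c4', 'c3', 'c5']
```

This illustrates the "position-based fusion" property: c2 outranks c1 and c4 purely because both search methods agree on it.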
Example: RRF in Action
```python
results = await memory.search(
    owner_id="workspace-1",
    query_text="machine learning neural networks",
    search_type=SearchType.HYBRID,
    alpha=0.5,
    limit=5
)

for i, result in enumerate(results, 1):
    print(f"Result #{i}")
    print(f"  RRF Score: {result.rrf_score:.4f}")
    print(f"  Vector Sim: {result.similarity:.4f} (semantic match)")
    print(f"  Text Rank: {result.text_rank:.4f} (keyword match)")
    print(f"  Content: {result.content[:80]}...")
    print()

# Output shows how RRF balances both signals:
# Result #1
#   RRF Score: 0.0245 (highest combined score)
#   Vector Sim: 0.85 (very semantically similar)
#   Text Rank: 12.5 (good keyword match)
#   Content: Deep learning uses neural networks with multiple layers...
```
Configuring Hybrid Search with SearchConfig
LLMemory's SearchConfig provides fine-grained control over hybrid search behavior, including HNSW vector index parameters and RRF fusion settings. You can configure these settings via environment variables or programmatically through LLMemoryConfig.
HNSW Index Configuration
The HNSW (Hierarchical Navigable Small World) index powers fast approximate nearest neighbor vector search. LLMemory provides three preset profiles and supports custom configuration.
HNSW Parameters
- hnsw_m (int, default: 16): Number of bi-directional links per node
  - Higher values = better recall, larger index, slower construction
  - Range: 8-64; typical values: 8 (fast), 16 (balanced), 32 (accurate)
- hnsw_ef_construction (int, default: 200): Size of dynamic candidate list during index construction
  - Higher values = better index quality, slower construction
  - Typical values: 80 (fast), 200 (balanced), 400 (accurate)
- hnsw_ef_search (int, default: 100): Size of dynamic candidate list during search
  - Higher values = better recall, slower search
  - Range: 40-500; typical values: 40 (fast), 100 (balanced), 200 (accurate)
HNSW Presets
LLMemory includes three built-in presets for common use cases:
```python
HNSW_PRESETS = {
    "fast":     {"m": 8,  "ef_construction": 80,  "ef_search": 40},
    "balanced": {"m": 16, "ef_construction": 200, "ef_search": 100},
    "accurate": {"m": 32, "ef_construction": 400, "ef_search": 200},
}
```
Preset Recommendations:
- fast: Latency-critical applications (40-60ms search, ~95% recall)
- balanced: General-purpose use (80-120ms search, ~98% recall) - Default
- accurate: High-precision requirements (150-250ms search, ~99.5% recall)
Using HNSW Presets via Environment Variable
Set the LLMEMORY_HNSW_PROFILE environment variable to use a preset:
```bash
# Use fast profile for low-latency applications
export LLMEMORY_HNSW_PROFILE=fast

# Use accurate profile for high-precision requirements
export LLMEMORY_HNSW_PROFILE=accurate

# Use balanced profile (default, can be omitted)
export LLMEMORY_HNSW_PROFILE=balanced
```
Then initialize LLMemory normally - the preset will be applied automatically:
```python
from llmemory import LLMemory, SearchType

# Automatically uses HNSW preset from environment
async with LLMemory(connection_string="postgresql://localhost/mydb") as memory:
    results = await memory.search(
        owner_id="workspace-1",
        query_text="machine learning",
        search_type=SearchType.HYBRID,
        limit=10
    )
```
Programmatic HNSW Configuration
For more control, configure HNSW parameters programmatically:
```python
from llmemory import LLMemory, SearchType
from llmemory.config import LLMemoryConfig

# Create custom configuration
config = LLMemoryConfig()

# Configure search parameters
config.search.hnsw_ef_search = 150  # Higher search accuracy

# Configure database/index parameters
config.database.hnsw_m = 24
config.database.hnsw_ef_construction = 300

# Initialize with custom config
async with LLMemory(
    connection_string="postgresql://localhost/mydb",
    config=config
) as memory:
    results = await memory.search(
        owner_id="workspace-1",
        query_text="neural networks",
        search_type=SearchType.HYBRID,
        limit=10
    )
```
Note: Index construction parameters (hnsw_m, hnsw_ef_construction) only affect new indexes. To apply them to an existing index, you must recreate the index:
```sql
-- Recreate HNSW index with new parameters
DROP INDEX IF EXISTS llmemory.document_chunks_embedding_hnsw;

CREATE INDEX document_chunks_embedding_hnsw
ON llmemory.document_chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 300);
```
RRF Configuration
The rrf_k parameter controls the Reciprocal Rank Fusion constant used to merge vector and text search results.
RRF Parameter
- rrf_k (int, default: 50): RRF constant that controls rank position sensitivity
  - Higher values = less weight on top positions, more democratic fusion
  - Lower values = more weight on top positions, favors high-ranking results
  - Range: 10-100; typical values: 30 (aggressive), 50 (balanced), 70 (democratic)
How rrf_k affects fusion:
```python
# For a chunk at rank position r (0-indexed), with search-method weight w
# (alpha for vector results, 1 - alpha for text results):
#   rrf_score_contribution = w / (rrf_k + r + 1)
# The worked examples below use w = 1.0 for illustration.

# Example with rrf_k=50:
# Rank 0:  1.0 / (50 + 0 + 1)  = 0.0196
# Rank 1:  1.0 / (50 + 1 + 1)  = 0.0192
# Rank 10: 1.0 / (50 + 10 + 1) = 0.0164

# Example with rrf_k=20 (favors top results):
# Rank 0:  1.0 / (20 + 0 + 1)  = 0.0476
# Rank 1:  1.0 / (20 + 1 + 1)  = 0.0455
# Rank 10: 1.0 / (20 + 10 + 1) = 0.0323

# Example with rrf_k=80 (more democratic):
# Rank 0:  1.0 / (80 + 0 + 1)  = 0.0123
# Rank 1:  1.0 / (80 + 1 + 1)  = 0.0122
# Rank 10: 1.0 / (80 + 10 + 1) = 0.0110
```
Configuring RRF
Note: rrf_k is not currently exposed via environment variable. To configure it, use programmatic configuration:
```python
from llmemory import LLMemory
from llmemory.config import LLMemoryConfig

config = LLMemoryConfig()
config.search.rrf_k = 30  # Favor top-ranked results

async with LLMemory(
    connection_string="postgresql://localhost/mydb",
    config=config
) as memory:
    results = await memory.search(
        owner_id="workspace-1",
        query_text="search query",
        search_type=SearchType.HYBRID,
        limit=10
    )
```
Complete Configuration Example
Here's a complete example showing both environment variable and programmatic configuration:
```python
import os

from llmemory import LLMemory, SearchType
from llmemory.config import LLMemoryConfig

# Option 1: Environment variable configuration
os.environ["LLMEMORY_HNSW_PROFILE"] = "accurate"
# HNSW will use: m=32, ef_construction=400, ef_search=200

async with LLMemory(connection_string="postgresql://localhost/mydb") as memory:
    results = await memory.search(
        owner_id="workspace-1",
        query_text="deep learning transformers",
        search_type=SearchType.HYBRID,
        alpha=0.6,
        limit=15
    )

# Option 2: Programmatic configuration with fine-tuning
config = LLMemoryConfig()

# HNSW search configuration
config.search.hnsw_ef_search = 150  # Higher accuracy than default

# HNSW index construction (for new indexes)
config.database.hnsw_m = 20
config.database.hnsw_ef_construction = 250

# RRF configuration
config.search.rrf_k = 40  # Favor top-ranked results slightly

# Other search settings
config.search.default_limit = 20
config.search.default_search_type = "hybrid"

async with LLMemory(
    connection_string="postgresql://localhost/mydb",
    config=config
) as memory:
    # Search with custom configuration
    results = await memory.search(
        owner_id="workspace-1",
        query_text="neural network architectures",
        search_type=SearchType.HYBRID,
        alpha=0.5,
        limit=20
    )

    for result in results:
        print(f"RRF: {result.rrf_score:.4f} | "
              f"Vector: {result.similarity:.4f} | "
              f"Text: {result.text_rank:.4f}")
        print(f"  {result.content[:80]}...")
```
Configuration Performance Impact
Different HNSW settings have measurable performance impacts:
| Profile | Index Size (100k docs) | Construction Time | Search Latency | Recall |
|---|---|---|---|---|
| fast | 150 MB | 5 min | 40-60ms | ~95% |
| balanced | 250 MB | 12 min | 80-120ms | ~98% |
| accurate | 450 MB | 30 min | 150-250ms | ~99.5% |
Tuning Guidelines:
- Start with balanced (default) for most applications
- Use fast if:
- Search latency must be under 100ms
- Recall around 95% is acceptable
- Index size is a constraint
- Use accurate if:
- High precision is critical (medical, legal, financial)
- Search latency under 300ms is acceptable
- Maximum recall is required
- Custom tune if:
- You have specific latency/recall requirements
- You've measured performance with your data
- You're optimizing for your embedding model
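One way to turn the guidelines above into code is a small helper that maps a latency budget and recall target onto a preset name. This is a hypothetical helper built from the figures in the performance table; choose_hnsw_profile is not part of llmemory:

```python
def choose_hnsw_profile(max_latency_ms: float, min_recall: float) -> str:
    """Pick an HNSW preset from the rough latency/recall figures above."""
    # (profile name, worst-case search latency in ms, approximate recall)
    profiles = [
        ("fast", 60, 0.95),
        ("balanced", 120, 0.98),
        ("accurate", 250, 0.995),
    ]
    # Prefer the most accurate profile that still fits the latency budget
    for name, latency, recall in reversed(profiles):
        if latency <= max_latency_ms and recall >= min_recall:
            return name
    return "fast"  # fall back to the cheapest profile

print(choose_hnsw_profile(100, 0.95))  # fast: balanced's 120ms exceeds the budget
print(choose_hnsw_profile(300, 0.99))  # accurate: budget allows it, recall requires it
```

The chosen name could then be exported as LLMEMORY_HNSW_PROFILE before initializing LLMemory.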
Search Type Comparison
Vector Search Only
```python
# Pure semantic similarity
results = await memory.search(
    owner_id="workspace-1",
    query_text="artificial intelligence",
    search_type=SearchType.VECTOR,
    limit=10
)

# Good for:
# - "AI" matching "machine learning" (synonym)
# - "dog" matching "puppy" (semantic)
# - Cross-lingual search
#
# Weak for:
# - Specific keywords ("PostgreSQL 14.2")
# - Exact phrases ("return on investment")
# - Technical terms ("ValueError exception")
```
Text Search Only
```python
# Pure keyword matching
results = await memory.search(
    owner_id="workspace-1",
    query_text="PostgreSQL CONNECTION_LIMIT",
    search_type=SearchType.TEXT,
    limit=10
)

# Good for:
# - Exact keyword matches
# - Technical error messages
# - Code search
# - Structured data
#
# Weak for:
# - Synonyms ("automobile" vs "car")
# - Paraphrasing
# - Conceptual queries
```
Hybrid Search (Recommended)
```python
# Combines both vector and text
results = await memory.search(
    owner_id="workspace-1",
    query_text="reduce server response time",
    search_type=SearchType.HYBRID,
    alpha=0.5,
    limit=10
)

# Strengths:
# - Finds semantically similar content ("optimize latency")
# - Also finds exact keywords ("response time")
# - Best overall retrieval quality
# - Robust to different query styles
#
# Use cases:
# - General-purpose search (recommended default)
# - Unknown query patterns
# - Mixed keyword + semantic needs
```
Practical Examples
E-commerce Product Search
```python
# Product search benefits from hybrid
# - Vector: Understands "laptop for programming"
# - Text: Matches exact model numbers "MacBook Pro M3"
results = await memory.search(
    owner_id="store-1",
    query_text="fast laptop for developers",
    search_type=SearchType.HYBRID,
    alpha=0.6,  # Favor semantic understanding
    metadata_filter={"category": "computers"},
    limit=20
)
```
Technical Documentation Search
```python
# Documentation needs both semantic and exact matches
# - Vector: Finds conceptually related docs
# - Text: Finds exact function/class names
results = await memory.search(
    owner_id="docs-site",
    query_text="authenticate users with OAuth2",
    search_type=SearchType.HYBRID,
    alpha=0.4,  # Slight favor to keywords ("OAuth2")
    metadata_filter={"doc_type": "api_reference"},
    limit=15
)
```
Customer Support Search
```python
# Support tickets need semantic understanding
# - Vector: Matches similar issues ("can't log in" = "login failed")
# - Text: Matches error codes, product names
results = await memory.search(
    owner_id="support-team",
    query_text="error code 500 payment processing",
    search_type=SearchType.HYBRID,
    alpha=0.3,  # Favor exact error codes
    metadata_filter={"status": "resolved"},
    limit=10
)
```
Research Paper Search
```python
# Academic search benefits from semantic understanding
# - Vector: Finds related concepts and methods
# - Text: Finds exact citations, author names
results = await memory.search(
    owner_id="research-db",
    query_text="transformer attention mechanism",
    search_type=SearchType.HYBRID,
    alpha=0.7,  # Favor semantic similarity
    date_from=datetime(2020, 1, 1),  # Recent papers
    limit=25
)
```
Performance Optimization
Hybrid Search Performance
Hybrid search runs vector and text searches in parallel for optimal performance:
```python
# Both searches execute concurrently
# Total time ≈ max(vector_time, text_time) + rrf_fusion_time
# Typically: 50-150ms for hybrid search

import time

start = time.time()
results = await memory.search(
    owner_id="workspace-1",
    query_text="customer retention",
    search_type=SearchType.HYBRID,
    limit=20
)
elapsed = (time.time() - start) * 1000
print(f"Search completed in {elapsed:.2f}ms")
```
Tuning for Speed vs Quality
```python
# Faster hybrid search (fewer candidates)
results = await memory.search(
    owner_id="workspace-1",
    query_text="query text",
    search_type=SearchType.HYBRID,
    limit=10,  # Lower limit = faster
    alpha=0.5
)

# Higher quality hybrid search (more candidates considered)
# Note: Uses internal candidate multiplier (typically limit * 2)
results = await memory.search(
    owner_id="workspace-1",
    query_text="query text",
    search_type=SearchType.HYBRID,
    limit=20,  # Higher limit for better recall
    alpha=0.5
)
```
Advanced Filtering with Hybrid Search
```python
# Combine hybrid search with metadata filters
results = await memory.search(
    owner_id="workspace-1",
    query_text="financial performance analysis",
    search_type=SearchType.HYBRID,
    alpha=0.5,
    metadata_filter={
        "department": "finance",
        "year": 2024,
        "confidential": False
    },
    date_from=datetime(2024, 1, 1),
    date_to=datetime(2024, 12, 31),
    limit=15
)

# Hybrid search finds:
# - Vector: Similar financial concepts
# - Text: Exact keyword "performance analysis"
# - Both filtered by metadata and date range
```
Common Mistakes
❌ Wrong: Always using default alpha=0.5
```python
# This works but may not be optimal
results = await memory.search(
    owner_id="workspace-1",
    query_text="iPhone 14 Pro specs",  # Specific product name
    search_type=SearchType.HYBRID,
    alpha=0.5  # Equal weight not ideal here
)
```
✅ Right: Tune alpha for query type
```python
# Product names and specific terms favor text search
results = await memory.search(
    owner_id="workspace-1",
    query_text="iPhone 14 Pro specs",
    search_type=SearchType.HYBRID,
    alpha=0.3  # Favor exact keyword matching
)
```
❌ Wrong: Using VECTOR for exact keyword matching
```python
results = await memory.search(
    owner_id="workspace-1",
    query_text="ERROR CODE 404",
    search_type=SearchType.VECTOR  # Won't find exact "404"
)
```
✅ Right: Use HYBRID or TEXT for exact keywords
```python
results = await memory.search(
    owner_id="workspace-1",
    query_text="ERROR CODE 404",
    search_type=SearchType.HYBRID,
    alpha=0.2  # Heavily favor exact keywords
)
```
❌ Wrong: Using TEXT for conceptual queries
```python
results = await memory.search(
    owner_id="workspace-1",
    query_text="how to improve customer satisfaction",
    search_type=SearchType.TEXT  # Misses semantic matches
)
```
✅ Right: Use HYBRID for conceptual queries
```python
results = await memory.search(
    owner_id="workspace-1",
    query_text="how to improve customer satisfaction",
    search_type=SearchType.HYBRID,
    alpha=0.7  # Favor semantic understanding
)
```
Alpha Tuning Strategies
A/B Testing Different Alpha Values
```python
# Test different alpha values to find the optimal setting
query = "product launch strategy roadmap"
alpha_values = [0.3, 0.5, 0.7]

for alpha in alpha_values:
    results = await memory.search(
        owner_id="workspace-1",
        query_text=query,
        search_type=SearchType.HYBRID,
        alpha=alpha,
        limit=10
    )
    print(f"\nAlpha = {alpha}")
    for i, result in enumerate(results[:3], 1):
        print(f"  #{i}: {result.content[:60]}... (RRF={result.rrf_score:.4f})")

# Compare result quality and adjust
```
Dynamic Alpha Based on Query Analysis
```python
def calculate_alpha(query_text: str) -> float:
    """Dynamically adjust alpha based on query characteristics."""
    # Check for exact phrases (quotes)
    if '"' in query_text:
        return 0.2  # Favor exact matching

    # Check for technical terms or codes (all-digit or all-uppercase words)
    if any(word.isdigit() or word.isupper() for word in query_text.split()):
        return 0.3  # Favor keywords

    # Check for question words (semantic query)
    question_words = {"how", "why", "what", "when", "where", "who"}
    if question_words & set(query_text.lower().split()):
        return 0.7  # Favor semantic

    # Default balanced
    return 0.5

# Use dynamic alpha
query = "how to optimize database queries"
alpha = calculate_alpha(query)

results = await memory.search(
    owner_id="workspace-1",
    query_text=query,
    search_type=SearchType.HYBRID,
    alpha=alpha,
    limit=10
)
```
Monitoring and Debugging
Understanding Result Scores
```python
results = await memory.search(
    owner_id="workspace-1",
    query_text="test query",
    search_type=SearchType.HYBRID,
    alpha=0.5,
    limit=5
)

for result in results:
    # Inspect individual scores
    print(f"Chunk ID: {result.chunk_id}")
    print(f"  RRF Score: {result.rrf_score:.4f} (overall ranking)")
    print(f"  Vector Similarity: {result.similarity:.4f}")
    print(f"  Text Rank: {result.text_rank:.4f}")
    print(f"  Content preview: {result.content[:80]}...")
    print()

# Look for:
# - High RRF but low similarity = text search dominated
# - High RRF but low text rank = vector search dominated
# - High in both = strong consensus (best results)
```
Related Skills
- basic-usage - Core document and search operations
- multi-query - Query expansion for better hybrid search results
- rag - Using hybrid search in RAG systems with reranking
- multi-tenant - Multi-tenant isolation patterns
Important Notes
HNSW Configuration: Hybrid search uses an HNSW (Hierarchical Navigable Small World) index for fast vector similarity. Performance can be tuned with the LLMEMORY_HNSW_PROFILE environment variable or programmatically via SearchConfig. See the "Configuring Hybrid Search with SearchConfig" section for comprehensive configuration details, including:
- Three presets: fast, balanced (default), accurate
- Individual HNSW parameters (m, ef_construction, ef_search)
- RRF tuning with the rrf_k parameter
- Performance impact comparison table
Language Support: Text search automatically detects document language and uses appropriate full-text search configuration (supports 14+ languages including English, Spanish, French, German, etc.).
Embedding Models: Vector search quality depends on the embedding model. The default is OpenAI text-embedding-3-small (1536 dimensions). For local embeddings, use all-MiniLM-L6-v2 (384 dimensions).
Search Limits: Hybrid search internally retrieves limit * 2 candidates from each search method before RRF fusion. This ensures high-quality results even when vector and text search return different chunks.
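The effect of this over-fetching can be sketched in plain Python: even when the two candidate lists overlap only partially, retrieving 2x the requested limit from each side gives the fusion step a deep pool of unique chunks to rank (a toy illustration with a hypothetical `fused_pool` helper, independent of llmemory internals):

```python
def fused_pool(vector_candidates, text_candidates, limit):
    """Union of over-fetched candidates available to RRF fusion (order-preserving)."""
    pool = dict.fromkeys(vector_candidates[: limit * 2])
    pool.update(dict.fromkeys(text_candidates[: limit * 2]))
    return list(pool)

# The two methods agree on only two chunks; the rest are disjoint
vector_candidates = ["shared1", "shared2"] + [f"v{i}" for i in range(20)]
text_candidates = ["shared1", "shared2"] + [f"t{i}" for i in range(20)]

pool = fused_pool(vector_candidates, text_candidates, limit=10)
print(len(pool))  # 38 unique candidates feed fusion for a top-10 result
```

With only `limit` candidates fetched per side, disagreement between the methods would shrink the pool and could starve the final ranking; fetching `limit * 2` keeps it well-stocked.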