Claude-skill-registry hybrid-search

Use when building search systems that need both semantic similarity and keyword matching. Covers combining vector and BM25 search with Reciprocal Rank Fusion, alpha tuning for search-weight control, and optimizing retrieval quality.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/hybrid-search" ~/.claude/skills/majiayu000-claude-skill-registry-hybrid-search && rm -rf "$T"
manifest: skills/data/hybrid-search/SKILL.md
source content

LLMemory Hybrid Search

Installation

uv add llmemory
# or
pip install llmemory

Overview

Hybrid search combines vector similarity search (semantic understanding) with full-text search (keyword matching) to deliver superior retrieval quality. Results are merged using Reciprocal Rank Fusion (RRF) to create a unified ranking.

When to use hybrid search:

  • Need both semantic similarity AND exact keyword matches
  • Queries contain specific terms, names, or technical jargon
  • Want best-of-both-worlds retrieval quality (recommended default)

When to use vector-only search:

  • Purely semantic/conceptual queries
  • Cross-lingual search
  • Queries with synonyms or paraphrasing

When to use text-only search:

  • Exact keyword/phrase matching required
  • Search in structured data or code
  • When embeddings are not available
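
The decision lists above can be folded into a small routing helper. This is an illustrative sketch, not part of the LLMemory API: the keyword and question-word heuristics are assumptions, and the function returns plain strings matching the SearchType values.

```python
def choose_search_type(query: str, have_embeddings: bool = True) -> str:
    """Heuristically map a query to "vector", "text", or "hybrid".

    Mirrors the decision lists above; thresholds and keyword checks
    are illustrative assumptions, not LLMemory behavior.
    """
    if not have_embeddings:
        return "text"  # embeddings unavailable -> full-text only
    words = query.split()
    # Quoted phrases, all-caps tokens, or digits suggest exact matching.
    if '"' in query or any(w.isupper() or any(c.isdigit() for c in w) for w in words):
        return "text"
    # Conceptual question phrasing leans on semantic similarity.
    if words and words[0].lower() in {"how", "why", "what", "ways"}:
        return "vector"
    return "hybrid"  # best-of-both default

print(choose_search_type("PostgreSQL CONNECTION_LIMIT error"))  # -> text
print(choose_search_type("how to retain customers"))            # -> vector
print(choose_search_type("customer retention strategies"))      # -> hybrid
```

In practice the returned string can be passed straight to search(), since search_type accepts either a SearchType or its string value.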

Quick Start

from llmemory import LLMemory, SearchType

async with LLMemory(connection_string="postgresql://localhost/mydb") as memory:
    # Hybrid search (default, recommended)
    results = await memory.search(
        owner_id="workspace-1",
        query_text="machine learning algorithms",
        search_type=SearchType.HYBRID,
        limit=10,
        alpha=0.5  # Equal weight to vector and text
    )

    for result in results:
        print(f"[RRF={result.rrf_score:.3f}] {result.content[:80]}...")

Complete API Documentation

SearchType Enum

class SearchType(str, Enum):
    VECTOR = "vector"   # Vector similarity only
    TEXT = "text"       # Full-text search only
    HYBRID = "hybrid"   # Combines vector + text (recommended)

search() - Hybrid Mode

Signature:

async def search(
    owner_id: str,
    query_text: str,
    search_type: Union[SearchType, str] = SearchType.HYBRID,
    limit: int = 10,
    alpha: float = 0.5,
    metadata_filter: Optional[Dict[str, Any]] = None,
    id_at_origins: Optional[List[str]] = None,
    date_from: Optional[datetime] = None,
    date_to: Optional[datetime] = None,
    include_parent_context: bool = False,
    context_window: int = 2
) -> List[SearchResult]

Hybrid Search Parameters:

  • search_type (SearchType, default: HYBRID): Set to SearchType.HYBRID for hybrid search
  • alpha (float, default: 0.5): Weight for vector vs text search
    • 0.0 = text search only
    • 0.5 = equal weight (balanced, recommended)
    • 1.0 = vector search only
    • 0.3 = favor text search (good for keyword-heavy queries)
    • 0.7 = favor vector search (good for semantic queries)

Returns:

  • List[SearchResult] with hybrid-specific fields:
    • rrf_score (float): Reciprocal Rank Fusion score (primary ranking)
    • similarity (float): Vector similarity score (0-1)
    • text_rank (float): Full-text search rank
    • score (float): Overall score (equals rrf_score for hybrid)
Example:

# Balanced hybrid search
results = await memory.search(
    owner_id="workspace-1",
    query_text="quarterly revenue growth",
    search_type=SearchType.HYBRID,
    alpha=0.5,  # Equal weight
    limit=20
)

for result in results:
    print(f"RRF Score: {result.rrf_score:.3f}")
    print(f"Vector Similarity: {result.similarity:.3f}")
    print(f"Text Rank: {result.text_rank:.3f}")
    print(f"Content: {result.content[:100]}...")
    print("---")

Understanding Alpha Parameter

The alpha parameter controls the balance between vector and text search in hybrid mode.

Alpha Values Guide

# Text-heavy (alpha = 0.0 to 0.3)
# Use when: Query has specific keywords, names, or technical terms
results = await memory.search(
    owner_id="workspace-1",
    query_text="Python asyncio gather timeout",
    search_type=SearchType.HYBRID,
    alpha=0.3  # Favor keyword matching
)

# Balanced (alpha = 0.4 to 0.6)
# Use when: General queries, uncertain which is better
results = await memory.search(
    owner_id="workspace-1",
    query_text="customer retention strategies",
    search_type=SearchType.HYBRID,
    alpha=0.5  # Equal weight (recommended default)
)

# Semantic-heavy (alpha = 0.7 to 1.0)
# Use when: Conceptual queries, synonyms, paraphrasing
results = await memory.search(
    owner_id="workspace-1",
    query_text="ways to keep customers happy",
    search_type=SearchType.HYBRID,
    alpha=0.7  # Favor semantic similarity
)

Choosing Alpha for Different Query Types

| Query Type | Example | Recommended Alpha | Reasoning |
|---|---|---|---|
| Specific keywords | "PostgreSQL CONNECTION_LIMIT error" | 0.2-0.3 | Need exact keyword matches |
| Product/person names | "iPhone 15 Pro specifications" | 0.3-0.4 | Names matter more than semantics |
| Technical jargon | "SOLID principles dependency injection" | 0.4-0.5 | Balance needed |
| General concepts | "improve team collaboration" | 0.5-0.6 | Balanced approach |
| Semantic queries | "how to motivate employees" | 0.6-0.7 | Semantic understanding key |
| Paraphrased questions | "what are good ways to retain staff" | 0.7-0.8 | Vector search excels |

Reciprocal Rank Fusion (RRF)

Hybrid search uses RRF to merge vector and text search results into a unified ranking.

How RRF Works

# Illustrative sketch: vector_results and text_results are the ranked
# result lists returned by each underlying search.
k = 50  # RRF constant, default (dampens the dominance of top-ranked results)

# Initialize score accumulator for each chunk
rrf_scores = {}

# Process vector search results
for rank, result in enumerate(vector_results):
    chunk_id = result["chunk_id"]
    vector_contribution = alpha / (k + rank + 1)
    rrf_scores[chunk_id] = rrf_scores.get(chunk_id, 0) + vector_contribution

# Process text search results
for rank, result in enumerate(text_results):
    chunk_id = result["chunk_id"]
    text_contribution = (1 - alpha) / (k + rank + 1)
    rrf_scores[chunk_id] = rrf_scores.get(chunk_id, 0) + text_contribution

# Sort by accumulated RRF score descending
sorted_results = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)

Key points:

  • Alpha is inside the division: alpha / (k + rank + 1), not multiplied afterward
  • Ranks are effectively 1-indexed: the formula uses rank + 1, where rank starts at 0
  • Chunks appearing in both result lists get contributions from both
  • k = 50 by default (configurable via SearchConfig.rrf_k)

RRF Benefits

  1. Handles different score scales: Vector similarities (0-1) and text ranks (varying) are normalized
  2. Position-based fusion: Emphasizes consensus across search methods
  3. Robust to score outliers: Single high score doesn't dominate
  4. Tunable with alpha: Control the balance between search methods
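
The consensus effect (benefit 2) is easy to see with toy data. In this minimal, self-contained sketch of the fusion formula above, chunk "B" is only second in each list, yet beats "A" and "D", which each top a single list:

```python
def rrf_fuse(vector_ids, text_ids, alpha=0.5, k=50):
    """Merge two ranked id lists with weighted Reciprocal Rank Fusion."""
    scores = {}
    for rank, cid in enumerate(vector_ids):
        scores[cid] = scores.get(cid, 0.0) + alpha / (k + rank + 1)
    for rank, cid in enumerate(text_ids):
        scores[cid] = scores.get(cid, 0.0) + (1 - alpha) / (k + rank + 1)
    # Sort by accumulated RRF score, best first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# "B" appears in BOTH lists (rank 1 in each): 0.5/52 + 0.5/52 ≈ 0.0192
# "A" and "D" each top one list only:         0.5/51        ≈ 0.0098
fused = rrf_fuse(["A", "B", "C"], ["D", "B", "E"])
print(fused[0][0])  # B — consensus wins
```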

Example: RRF in Action

results = await memory.search(
    owner_id="workspace-1",
    query_text="machine learning neural networks",
    search_type=SearchType.HYBRID,
    alpha=0.5,
    limit=5
)

for i, result in enumerate(results, 1):
    print(f"Result #{i}")
    print(f"  RRF Score: {result.rrf_score:.4f}")
    print(f"  Vector Sim: {result.similarity:.4f} (semantic match)")
    print(f"  Text Rank: {result.text_rank:.4f} (keyword match)")
    print(f"  Content: {result.content[:80]}...")
    print()

# Output shows how RRF balances both signals (k=50, alpha=0.5):
# Result #1
#   RRF Score: 0.0196  (rank 0 in both lists: 0.5/51 + 0.5/51)
#   Vector Sim: 0.85   (very semantically similar)
#   Text Rank: 12.5    (good keyword match)
#   Content: Deep learning uses neural networks with multiple layers...

Configuring Hybrid Search with SearchConfig

LLMemory's SearchConfig provides fine-grained control over hybrid search behavior, including HNSW vector index parameters and RRF fusion settings. You can configure these settings via environment variables or programmatically through LLMemoryConfig.

HNSW Index Configuration

The HNSW (Hierarchical Navigable Small World) index powers fast approximate nearest neighbor vector search. LLMemory provides three preset profiles and supports custom configuration.

HNSW Parameters

  • hnsw_m (int, default: 16): Number of bi-directional links per node
    • Higher values = better recall, larger index, slower construction
    • Range: 8-64; typical values: 8 (fast), 16 (balanced), 32 (accurate)
  • hnsw_ef_construction (int, default: 200): Size of the dynamic candidate list during index construction
    • Higher values = better index quality, slower construction
    • Range: 80-1000; typical values: 80 (fast), 200 (balanced), 400 (accurate)
  • hnsw_ef_search (int, default: 100): Size of the dynamic candidate list during search
    • Higher values = better recall, slower search
    • Range: 40-500; typical values: 40 (fast), 100 (balanced), 200 (accurate)

HNSW Presets

LLMemory includes three built-in presets for common use cases:

HNSW_PRESETS = {
    "fast": {
        "m": 8,
        "ef_construction": 80,
        "ef_search": 40
    },
    "balanced": {
        "m": 16,
        "ef_construction": 200,
        "ef_search": 100
    },
    "accurate": {
        "m": 32,
        "ef_construction": 400,
        "ef_search": 200
    }
}

Preset Recommendations:

  • fast: Latency-critical applications (40-60ms search, ~95% recall)
  • balanced: General-purpose use (80-120ms search, ~98% recall) - Default
  • accurate: High-precision requirements (150-250ms search, ~99.5% recall)

Using HNSW Presets via Environment Variable

Set the LLMEMORY_HNSW_PROFILE environment variable to use a preset:

# Use fast profile for low-latency applications
export LLMEMORY_HNSW_PROFILE=fast

# Use accurate profile for high-precision requirements
export LLMEMORY_HNSW_PROFILE=accurate

# Use balanced profile (default, can be omitted)
export LLMEMORY_HNSW_PROFILE=balanced

Then initialize LLMemory normally - the preset will be applied automatically:

from llmemory import LLMemory, SearchType

# Automatically uses HNSW preset from environment
async with LLMemory(connection_string="postgresql://localhost/mydb") as memory:
    results = await memory.search(
        owner_id="workspace-1",
        query_text="machine learning",
        search_type=SearchType.HYBRID,
        limit=10
    )

Programmatic HNSW Configuration

For more control, configure HNSW parameters programmatically:

from llmemory import LLMemory, SearchType
from llmemory.config import LLMemoryConfig

# Create custom configuration
config = LLMemoryConfig()

# Configure search parameters
config.search.hnsw_ef_search = 150  # Higher search accuracy

# Configure database/index parameters
config.database.hnsw_m = 24
config.database.hnsw_ef_construction = 300

# Initialize with custom config
async with LLMemory(
    connection_string="postgresql://localhost/mydb",
    config=config
) as memory:
    results = await memory.search(
        owner_id="workspace-1",
        query_text="neural networks",
        search_type=SearchType.HYBRID,
        limit=10
    )

Note: Index construction parameters (hnsw_m, hnsw_ef_construction) only affect new indexes. To apply them to an existing index, you must recreate the index:

-- Recreate HNSW index with new parameters
DROP INDEX IF EXISTS llmemory.document_chunks_embedding_hnsw;
CREATE INDEX document_chunks_embedding_hnsw
ON llmemory.document_chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 300);

RRF Configuration

The rrf_k parameter controls the Reciprocal Rank Fusion constant used to merge vector and text search results.

RRF Parameter

  • rrf_k (int, default: 50): RRF constant that controls rank-position sensitivity
    • Higher values = less weight on top positions, more democratic fusion
    • Lower values = more weight on top positions, favors high-ranking results
    • Range: 10-100; typical values: 30 (aggressive), 50 (balanced), 70 (democratic)

How rrf_k affects fusion:

# For a chunk at rank position r (0-indexed), each list contributes:
#   weight / (rrf_k + r + 1)
# where weight is alpha for vector results and (1 - alpha) for text results.
# The examples below use weight = 1.0.

# Example with rrf_k=50:
# Rank 0: 1.0 / (50 + 0 + 1) = 0.0196
# Rank 1: 1.0 / (50 + 1 + 1) = 0.0192
# Rank 10: 1.0 / (50 + 10 + 1) = 0.0164

# Example with rrf_k=20 (favors top results):
# Rank 0: 1.0 / (20 + 0 + 1) = 0.0476
# Rank 1: 1.0 / (20 + 1 + 1) = 0.0455
# Rank 10: 1.0 / (20 + 10 + 1) = 0.0323

# Example with rrf_k=80 (more democratic):
# Rank 0: 1.0 / (80 + 0 + 1) = 0.0123
# Rank 1: 1.0 / (80 + 1 + 1) = 0.0122
# Rank 10: 1.0 / (80 + 10 + 1) = 0.0110
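
A compact way to compare these settings is the ratio between top-rank and mid-rank contributions; this minimal sketch reproduces the effect shown in the examples above:

```python
def rrf_contribution(rank: int, rrf_k: int, weight: float = 1.0) -> float:
    """Weighted RRF contribution for a 0-indexed rank position."""
    return weight / (rrf_k + rank + 1)

for k in (20, 50, 80):
    # How much more a rank-0 hit counts than a rank-10 hit
    ratio = rrf_contribution(0, k) / rrf_contribution(10, k)
    print(f"rrf_k={k}: rank0/rank10 contribution ratio = {ratio:.2f}")
# Smaller rrf_k -> larger ratio -> top positions dominate more;
# larger rrf_k flattens the curve (more democratic fusion).
```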

Configuring rrf_k

Note: rrf_k is not currently exposed via an environment variable (there is no LLMEMORY_RRF_K equivalent of LLMEMORY_HNSW_PROFILE). To configure it, use programmatic configuration:

from llmemory import LLMemory
from llmemory.config import LLMemoryConfig

config = LLMemoryConfig()
config.search.rrf_k = 30  # Favor top-ranked results

async with LLMemory(
    connection_string="postgresql://localhost/mydb",
    config=config
) as memory:
    results = await memory.search(
        owner_id="workspace-1",
        query_text="search query",
        search_type=SearchType.HYBRID,
        limit=10
    )

Complete Configuration Example

Here's a complete example showing both environment variable and programmatic configuration:

import os
from llmemory import LLMemory, SearchType
from llmemory.config import LLMemoryConfig

# Option 1: Environment variable configuration
os.environ["LLMEMORY_HNSW_PROFILE"] = "accurate"
# HNSW will use: m=32, ef_construction=400, ef_search=200

async with LLMemory(connection_string="postgresql://localhost/mydb") as memory:
    results = await memory.search(
        owner_id="workspace-1",
        query_text="deep learning transformers",
        search_type=SearchType.HYBRID,
        alpha=0.6,
        limit=15
    )

# Option 2: Programmatic configuration with fine-tuning
config = LLMemoryConfig()

# HNSW search configuration
config.search.hnsw_ef_search = 150  # Higher accuracy than default

# HNSW index construction (for new indexes)
config.database.hnsw_m = 20
config.database.hnsw_ef_construction = 250

# RRF configuration
config.search.rrf_k = 40  # Favor top-ranked results slightly

# Other search settings
config.search.default_limit = 20
config.search.default_search_type = "hybrid"

async with LLMemory(
    connection_string="postgresql://localhost/mydb",
    config=config
) as memory:
    # Search with custom configuration
    results = await memory.search(
        owner_id="workspace-1",
        query_text="neural network architectures",
        search_type=SearchType.HYBRID,
        alpha=0.5,
        limit=20
    )

    for result in results:
        print(f"RRF: {result.rrf_score:.4f} | "
              f"Vector: {result.similarity:.4f} | "
              f"Text: {result.text_rank:.4f}")
        print(f"  {result.content[:80]}...")

Configuration Performance Impact

Different HNSW settings have measurable performance impacts:

| Profile | Index Size (100k docs) | Construction Time | Search Latency | Recall |
|---|---|---|---|---|
| fast | 150 MB | 5 min | 40-60ms | ~95% |
| balanced | 250 MB | 12 min | 80-120ms | ~98% |
| accurate | 450 MB | 30 min | 150-250ms | ~99.5% |

Tuning Guidelines:

  1. Start with balanced (default) for most applications
  2. Use fast if:
    • Search latency must be under 100ms
    • Recall around 95% is acceptable
    • Index size is a constraint
  3. Use accurate if:
    • High precision is critical (medical, legal, financial)
    • Search latency under 300ms is acceptable
    • Maximum recall is required
  4. Custom tune if:
    • You have specific latency/recall requirements
    • You've measured performance with your data
    • You're optimizing for your embedding model

Search Type Comparison

Vector Search Only

# Pure semantic similarity
results = await memory.search(
    owner_id="workspace-1",
    query_text="artificial intelligence",
    search_type=SearchType.VECTOR,
    limit=10
)

# Good for:
# - "AI" matching "machine learning" (synonym)
# - "dog" matching "puppy" (semantic)
# - Cross-lingual search
#
# Weak for:
# - Specific keywords ("PostgreSQL 14.2")
# - Exact phrases ("return on investment")
# - Technical terms ("ValueError exception")

Text Search Only

# Pure keyword matching
results = await memory.search(
    owner_id="workspace-1",
    query_text="PostgreSQL CONNECTION_LIMIT",
    search_type=SearchType.TEXT,
    limit=10
)

# Good for:
# - Exact keyword matches
# - Technical error messages
# - Code search
# - Structured data
#
# Weak for:
# - Synonyms ("automobile" vs "car")
# - Paraphrasing
# - Conceptual queries

Hybrid Search (Recommended)

# Combines both vector and text
results = await memory.search(
    owner_id="workspace-1",
    query_text="reduce server response time",
    search_type=SearchType.HYBRID,
    alpha=0.5,
    limit=10
)

# Strengths:
# - Finds semantically similar content ("optimize latency")
# - Also finds exact keywords ("response time")
# - Best overall retrieval quality
# - Robust to different query styles
#
# Use cases:
# - General-purpose search (recommended default)
# - Unknown query patterns
# - Mixed keyword + semantic needs

Practical Examples

E-commerce Product Search

# Product search benefits from hybrid
# - Vector: Understands "laptop for programming"
# - Text: Matches exact model numbers "MacBook Pro M3"

results = await memory.search(
    owner_id="store-1",
    query_text="fast laptop for developers",
    search_type=SearchType.HYBRID,
    alpha=0.6,  # Favor semantic understanding
    metadata_filter={"category": "computers"},
    limit=20
)

Technical Documentation Search

# Documentation needs both semantic and exact matches
# - Vector: Finds conceptually related docs
# - Text: Finds exact function/class names

results = await memory.search(
    owner_id="docs-site",
    query_text="authenticate users with OAuth2",
    search_type=SearchType.HYBRID,
    alpha=0.4,  # Slight favor to keywords ("OAuth2")
    metadata_filter={"doc_type": "api_reference"},
    limit=15
)

Customer Support Search

# Support tickets need semantic understanding
# - Vector: Matches similar issues ("can't log in" = "login failed")
# - Text: Matches error codes, product names

results = await memory.search(
    owner_id="support-team",
    query_text="error code 500 payment processing",
    search_type=SearchType.HYBRID,
    alpha=0.3,  # Favor exact error codes
    metadata_filter={"status": "resolved"},
    limit=10
)

Research Paper Search

# Academic search benefits from semantic understanding
# - Vector: Finds related concepts and methods
# - Text: Finds exact citations, author names

results = await memory.search(
    owner_id="research-db",
    query_text="transformer attention mechanism",
    search_type=SearchType.HYBRID,
    alpha=0.7,  # Favor semantic similarity
    date_from=datetime(2020, 1, 1),  # Recent papers
    limit=25
)

Performance Optimization

Hybrid Search Performance

Hybrid search runs vector and text searches in parallel for optimal performance:

# Both searches execute concurrently
# Total time ≈ max(vector_time, text_time) + rrf_fusion_time
# Typically: 50-150ms for hybrid search

import time

start = time.time()
results = await memory.search(
    owner_id="workspace-1",
    query_text="customer retention",
    search_type=SearchType.HYBRID,
    limit=20
)
elapsed = (time.time() - start) * 1000
print(f"Search completed in {elapsed:.2f}ms")

Tuning for Speed vs Quality

# Faster hybrid search (fewer candidates)
results = await memory.search(
    owner_id="workspace-1",
    query_text="query text",
    search_type=SearchType.HYBRID,
    limit=10,  # Lower limit = faster
    alpha=0.5
)

# Higher quality hybrid search (more candidates considered)
# Note: Uses internal candidate multiplier (typically limit * 2)
results = await memory.search(
    owner_id="workspace-1",
    query_text="query text",
    search_type=SearchType.HYBRID,
    limit=20,  # Higher limit for better recall
    alpha=0.5
)

Advanced Filtering with Hybrid Search

# Combine hybrid search with metadata filters
results = await memory.search(
    owner_id="workspace-1",
    query_text="financial performance analysis",
    search_type=SearchType.HYBRID,
    alpha=0.5,
    metadata_filter={
        "department": "finance",
        "year": 2024,
        "confidential": False
    },
    date_from=datetime(2024, 1, 1),
    date_to=datetime(2024, 12, 31),
    limit=15
)

# Hybrid search finds:
# - Vector: Similar financial concepts
# - Text: Exact keyword "performance analysis"
# - Both filtered by metadata and date range

Common Mistakes

Wrong: Always using default alpha=0.5

# This works but may not be optimal
results = await memory.search(
    owner_id="workspace-1",
    query_text="iPhone 14 Pro specs",  # Specific product name
    search_type=SearchType.HYBRID,
    alpha=0.5  # Equal weight not ideal here
)

Right: Tune alpha for query type

# Product names and specific terms favor text search
results = await memory.search(
    owner_id="workspace-1",
    query_text="iPhone 14 Pro specs",
    search_type=SearchType.HYBRID,
    alpha=0.3  # Favor exact keyword matching
)

Wrong: Using VECTOR for exact keyword matching

results = await memory.search(
    owner_id="workspace-1",
    query_text="ERROR CODE 404",
    search_type=SearchType.VECTOR  # Won't find exact "404"
)

Right: Use HYBRID or TEXT for exact keywords

results = await memory.search(
    owner_id="workspace-1",
    query_text="ERROR CODE 404",
    search_type=SearchType.HYBRID,
    alpha=0.2  # Heavily favor exact keywords
)

Wrong: Using TEXT for conceptual queries

results = await memory.search(
    owner_id="workspace-1",
    query_text="how to improve customer satisfaction",
    search_type=SearchType.TEXT  # Misses semantic matches
)

Right: Use HYBRID for conceptual queries

results = await memory.search(
    owner_id="workspace-1",
    query_text="how to improve customer satisfaction",
    search_type=SearchType.HYBRID,
    alpha=0.7  # Favor semantic understanding
)

Alpha Tuning Strategies

A/B Testing Different Alpha Values

# Test different alpha values to find optimal setting
query = "product launch strategy roadmap"
alpha_values = [0.3, 0.5, 0.7]

for alpha in alpha_values:
    results = await memory.search(
        owner_id="workspace-1",
        query_text=query,
        search_type=SearchType.HYBRID,
        alpha=alpha,
        limit=10
    )

    print(f"\nAlpha = {alpha}")
    for i, result in enumerate(results[:3], 1):
        print(f"  #{i}: {result.content[:60]}... (RRF={result.rrf_score:.4f})")
    # Compare results quality and adjust

Dynamic Alpha Based on Query Analysis

def calculate_alpha(query_text: str) -> float:
    """Dynamically adjust alpha based on query characteristics."""
    # Check for exact phrases (quotes)
    if '"' in query_text:
        return 0.2  # Favor exact matching

    # Check for technical terms or codes (all-caps tokens or digits)
    if any(word.isupper() or any(ch.isdigit() for ch in word)
           for word in query_text.split()):
        return 0.3  # Favor keywords

    # Check for question words (semantic query)
    question_words = ["how", "why", "what", "when", "where", "who"]
    if any(word in query_text.lower() for word in question_words):
        return 0.7  # Favor semantic

    # Default balanced
    return 0.5

# Use dynamic alpha
query = "how to optimize database queries"
alpha = calculate_alpha(query)

results = await memory.search(
    owner_id="workspace-1",
    query_text=query,
    search_type=SearchType.HYBRID,
    alpha=alpha,
    limit=10
)

Monitoring and Debugging

Understanding Result Scores

results = await memory.search(
    owner_id="workspace-1",
    query_text="test query",
    search_type=SearchType.HYBRID,
    alpha=0.5,
    limit=5
)

for result in results:
    # Inspect individual scores
    print(f"Chunk ID: {result.chunk_id}")
    print(f"  RRF Score: {result.rrf_score:.4f} (overall ranking)")
    print(f"  Vector Similarity: {result.similarity:.4f}")
    print(f"  Text Rank: {result.text_rank:.4f}")
    print(f"  Content preview: {result.content[:80]}...")
    print()

# Look for:
# - High RRF but low similarity = text search dominated
# - High RRF but low text rank = vector search dominated
# - High in both = strong consensus (best results)
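
The three "look for" rules above can be captured in a small classifier for debugging. The score thresholds here are illustrative assumptions, not LLMemory defaults; calibrate them against your own score distributions:

```python
def diagnose(similarity: float, text_rank: float,
             sim_floor: float = 0.5, text_floor: float = 1.0) -> str:
    """Classify which signal drove a hybrid result (illustrative thresholds)."""
    strong_vector = similarity >= sim_floor
    strong_text = text_rank >= text_floor
    if strong_vector and strong_text:
        return "consensus"          # high in both -> strongest results
    if strong_vector:
        return "vector-dominated"   # semantic match carried the ranking
    if strong_text:
        return "text-dominated"     # keyword match carried the ranking
    return "weak"                   # surfaced only through rank fusion

print(diagnose(0.85, 12.5))  # consensus
print(diagnose(0.82, 0.2))   # vector-dominated
```

Logging this label alongside rrf_score makes it easy to spot queries where one signal consistently dominates, which is a hint to retune alpha.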

Related Skills

  • basic-usage - Core document and search operations
  • multi-query - Query expansion for better hybrid search results
  • rag - Using hybrid search in RAG systems with reranking
  • multi-tenant - Multi-tenant isolation patterns

Important Notes

HNSW Configuration: Hybrid search uses an HNSW (Hierarchical Navigable Small World) index for fast vector similarity. Performance can be tuned with the LLMEMORY_HNSW_PROFILE environment variable or programmatically via SearchConfig. See the "Configuring Hybrid Search with SearchConfig" section for comprehensive configuration details, including:

  • Three presets: fast, balanced (default), accurate
  • Individual HNSW parameters (m, ef_construction, ef_search)
  • RRF tuning with the rrf_k parameter
  • Performance impact comparison table

Language Support: Text search automatically detects document language and uses appropriate full-text search configuration (supports 14+ languages including English, Spanish, French, German, etc.).

Embedding Models: Vector search quality depends on the embedding model. The default is OpenAI text-embedding-3-small (1536 dimensions). For local embeddings, use all-MiniLM-L6-v2 (384 dimensions).

Search Limits: Hybrid search internally retrieves limit * 2 candidates from each search method before RRF fusion. This ensures high-quality results even when vector and text search return different chunks.