Skillshub search-engine-setup

Search Engine Setup

install

source · Clone the upstream repo

git clone https://github.com/ComeOnOliver/skillshub

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/TerminalSkills/skills/search-engine-setup" ~/.claude/skills/comeonoliver-skillshub-search-engine-setup && rm -rf "$T"

manifest: skills/TerminalSkills/skills/search-engine-setup/SKILL.md

source content

Search Engine Setup

Overview

This skill helps AI agents implement production-quality search in applications. It covers index design with custom analyzers, database-to-index sync pipelines, search APIs with faceting and highlights, autocomplete, and relevance tuning based on real query data.

Instructions

Index Design (Elasticsearch)

Map source database columns to Elasticsearch field types:
- Text columns users search →
```
text
```
  with custom analyzer
- Enum/category columns for filtering →
```
keyword
```
- Numeric columns for range filters →
```
integer
```
  ,
```
float
```
- Boolean flags →
```
boolean
```
- Dates →
```
date
```
- Fields for autocomplete →
```
completion
```

Custom analyzer template for product/content search:

{
  "analyzer": {
    "content_analyzer": {
      "tokenizer": "standard",
      "filter": ["lowercase", "synonym_filter", "edge_ngram_filter"]
    }
  },
  "filter": {
    "synonym_filter": { "type": "synonym", "synonyms_path": "synonyms.txt" },
    "edge_ngram_filter": { "type": "edge_ngram", "min_gram": 3, "max_gram": 15 }
  }
}

Boost fields by search importance: title/name (3-5x), tags (2x), description (1x).
Always add a
```
suggest
```
field of type
```
completion
```
for typeahead.

Index Design (Algolia)

Set

searchableAttributes

in priority order:

["name", "category", "description"]

Set
```
attributesForFaceting
```
: prefix filterable attributes with
```
filterOnly()
```
for non-displayed facets.

Configure

customRanking

["desc(popularity)", "desc(rating)"]

Enable typo tolerance (on by default) and set
```
minWordSizefor1Typo: 3
```
.

Sync Pipeline

Full re-index: On first run or manual trigger, paginate through all source records (1000 per batch), transform to index documents, bulk insert.
Incremental sync: Poll
```
updated_at > last_sync_time
```
every 10 seconds, or use database triggers/CDC.
Deletions: Track soft-deleted records. Remove from index when detected.
Idempotency: Use source record ID as document ID. Upsert, never blind insert.
Error handling: Log failed documents, continue batch. Retry failures in next cycle.

Search API

Build an endpoint that accepts:

```
q
```
— full-text query string

Filter params —

category

brand

min_price

max_price

rating

in_stock

sort

—

relevance

(default),

price_asc

price_desc

newest

rating

```
page
```
/
```
per_page
```
or cursor-based pagination

Query construction (Elasticsearch):

{
  "query": {
    "bool": {
      "must": [{ "multi_match": { "query": "q", "fields": ["name^5", "description"], "fuzziness": "AUTO" }}],
      "filter": [
        { "term": { "category": "electronics" }},
        { "range": { "price_cents": { "gte": 2000, "lte": 10000 }}},
        { "term": { "in_stock": true }}
      ],
      "should": [{ "term": { "in_stock": { "value": true, "boost": 2 }}}]
    }
  },
  "highlight": { "fields": { "name": {}, "description": {} }},
  "aggs": {
    "categories": { "terms": { "field": "category", "size": 20 }},
    "brands":     { "terms": { "field": "brand", "size": 20 }},
    "price_ranges": { "range": { "field": "price_cents", "ranges": [
      { "to": 2500 }, { "from": 2500, "to": 10000 }, { "from": 10000 }
    ]}}
  }
}

Autocomplete

Use completion suggester for prefix-based typeahead (fastest).
Return top 5 suggestions with category context.
Add "did you mean" using phrase suggester for low-result queries.

Relevance Tuning

Analyze search logs to improve quality:

Zero-result queries: Check for misspellings → add synonyms. Check for missing data → flag content gaps.
Low CTR queries: Top results don't match intent → adjust boost weights or add synonyms.
Position bias: If users consistently click result #3+, the ranking formula needs tuning.
Apply changes iteratively: synonyms first, then boost adjustments, then custom scoring.

Examples

Example 1 — Blog search index

Input: "Set up search for a blog with 10K articles."

Output:

{
  "mappings": {
    "properties": {
      "title":        { "type": "text", "analyzer": "content_analyzer", "boost": 5.0 },
      "body":         { "type": "text", "analyzer": "content_analyzer" },
      "author":       { "type": "keyword" },
      "tags":         { "type": "keyword" },
      "published_at": { "type": "date" },
      "suggest":      { "type": "completion", "contexts": [{ "name": "tag", "type": "category" }] }
    }
  }
}

Example 2 — Algolia configuration for an e-commerce store

Input: "Configure Algolia for a store with products."

Output:

index.setSettings({
  searchableAttributes: ['name', 'brand', 'category', 'description'],
  attributesForFaceting: ['category', 'brand', 'filterOnly(price_cents)', 'rating'],
  customRanking: ['desc(sales_count)', 'desc(rating)'],
  typoTolerance: true,
  minWordSizefor1Typo: 3,
  minWordSizefor2Typos: 6,
  hitsPerPage: 20,
  snippetEllipsisText: '…',
  attributesToSnippet: ['description:30'],
});

Guidelines

Start with Elasticsearch for control, Algolia for speed-to-market. Elasticsearch gives full tuning power; Algolia is faster to set up but costs more at scale.
Never search the primary database. Always sync to a dedicated search index. SQL
```
LIKE
```
does not scale.
Fuzziness AUTO is almost always correct. It allows 1 typo for 3-5 char words and 2 typos for 6+ chars.
Synonyms are the highest-ROI tuning. Most zero-result queries are fixed by adding 10-20 synonym pairs.
Monitor query performance. Set an alert if p95 search latency exceeds 200ms.