skilllibrary · embeddings-indexing
install
source · Clone the upstream repo
git clone https://github.com/merceralex397-collab/skilllibrary
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/skilllibrary "$T" && mkdir -p ~/.claude/skills && cp -r "$T/11-ai-llm-runtime-and-integration/embeddings-indexing" ~/.claude/skills/merceralex397-collab-skilllibrary-embeddings-indexing && rm -rf "$T"
manifest:
11-ai-llm-runtime-and-integration/embeddings-indexing/SKILL.md
Purpose
Build vector search systems: generate embeddings, index them in FAISS/pgvector/Pinecone, and implement similarity retrieval.
When to use this skill
- generating embeddings from text for semantic search or RAG
- setting up FAISS, pgvector, Pinecone, or Chroma as a vector store
- choosing embedding models (OpenAI, Cohere, sentence-transformers)
- tuning similarity search with distance metrics and index parameters
Do not use this skill when
- doing full-text keyword search — use Elasticsearch or Typesense
- designing agent memory policies — prefer agent-memory
- serving LLM inference — prefer inference-serving
Procedure
- Choose embedding model — OpenAI text-embedding-3-small (1536d, cheap), Cohere embed-v3, or local sentence-transformers/all-MiniLM-L6-v2 (384d, free).
- Preprocess text — chunk documents into 256-512 token segments with 50-token overlap. Preserve paragraph boundaries (a chunk-and-embed sketch follows this list).
- Generate embeddings — batch API calls (max 2048 texts per OpenAI call). Normalize vectors to unit length for cosine similarity.
- Choose vector store — FAISS for local/prototyping, pgvector for Postgres-native, Pinecone/Weaviate for managed cloud.
- Create index — FAISS: IndexFlatIP for exact search, IndexIVFFlat for approximate. pgvector: CREATE INDEX USING ivfflat ... WITH (lists = 100).
- Insert vectors — batch upserts with metadata (source, chunk_id, timestamp). Store raw text alongside vectors for retrieval.
- Query — embed query text, search top-k nearest neighbors, apply metadata filters, rerank results if needed.
- Evaluate — measure recall@k on a test set. Tune chunk size, overlap, and index params based on results.
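For the preprocess and embed steps, a minimal sketch, assuming the OpenAI Python SDK (v1+) and tiktoken are installed and OPENAI_API_KEY is set; the helper names and the 384-token window are illustrative, while the 2048-input batch cap and model name come from the steps above:

```python
import numpy as np
import tiktoken
from openai import OpenAI

enc = tiktoken.get_encoding("cl100k_base")
client = OpenAI()

def chunk_text(text: str, max_tokens: int = 384, overlap: int = 50) -> list[str]:
    """Split text into overlapping token windows within the 256-512 token range."""
    tokens = enc.encode(text)
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        if window:
            chunks.append(enc.decode(window))
    return chunks

def embed_batch(texts: list[str], model: str = "text-embedding-3-small") -> np.ndarray:
    """Embed up to 2048 texts per request and L2-normalize for cosine similarity."""
    vectors = []
    for i in range(0, len(texts), 2048):
        resp = client.embeddings.create(model=model, input=texts[i:i + 2048])
        vectors.extend(item.embedding for item in resp.data)
    arr = np.array(vectors, dtype="float32")
    arr /= np.linalg.norm(arr, axis=1, keepdims=True)
    return arr
```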
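For the evaluate step, a recall@k sketch, assuming a FAISS-style index exposing search(queries, k), normalized float32 query vectors, and a labeled set of relevant chunk ids per query; the function name is illustrative:

```python
import numpy as np

def recall_at_k(index, queries: np.ndarray, relevant_ids: list[set[int]], k: int = 10) -> float:
    """Fraction of relevant chunk ids recovered in the top-k results, pooled over all queries."""
    _, neighbors = index.search(queries, k)  # neighbors: (num_queries, k) id matrix
    hits = 0
    total = 0
    for retrieved, relevant in zip(neighbors, relevant_ids):
        hits += len(set(retrieved.tolist()) & relevant)
        total += len(relevant)
    return hits / total if total else 0.0
```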
FAISS example
```python
import faiss
import numpy as np

dimension = 1536
index = faiss.IndexFlatIP(dimension)  # inner product (cosine on normalized vecs)

# Add vectors
vectors = np.array(embeddings, dtype="float32")
faiss.normalize_L2(vectors)
index.add(vectors)

# Search
query_vec = np.array([query_embedding], dtype="float32")
faiss.normalize_L2(query_vec)
distances, indices = index.search(query_vec, k=10)
```
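For corpora where exact search is too slow (the > 100k-vector case in the decision rules below), a sketch of the approximate IndexIVFFlat variant; the nlist and nprobe values here are illustrative starting points, not tuned recommendations:

```python
import faiss
import numpy as np

dimension = 1536
nlist = 100  # number of coarse clusters
quantizer = faiss.IndexFlatIP(dimension)
index = faiss.IndexIVFFlat(quantizer, dimension, nlist, faiss.METRIC_INNER_PRODUCT)

vectors = np.array(embeddings, dtype="float32")
faiss.normalize_L2(vectors)
index.train(vectors)  # IVF indexes must be trained before vectors are added
index.add(vectors)

index.nprobe = 10  # clusters probed per query; higher means better recall, slower search
query_vec = np.array([query_embedding], dtype="float32")
faiss.normalize_L2(query_vec)
distances, indices = index.search(query_vec, 10)
```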
pgvector setup
```sql
CREATE EXTENSION vector;

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding VECTOR(1536),
    metadata JSONB
);

CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- Query
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;
```
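A possible Python counterpart for the insert and query steps, assuming psycopg 3 and the pgvector Python package; the connection string, the chunks structure, query_vec, and the source filter are placeholders, not part of this skill:

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector
from psycopg.types.json import Jsonb

conn = psycopg.connect("dbname=vectors")
register_vector(conn)  # adapts numpy arrays to the vector column type

with conn.cursor() as cur:
    # Batch insert: raw text, normalized embedding, and metadata side by side
    cur.executemany(
        "INSERT INTO documents (content, embedding, metadata) VALUES (%s, %s, %s)",
        [
            (chunk["text"],
             np.array(chunk["embedding"], dtype="float32"),
             Jsonb({"source": chunk["source"], "chunk_id": chunk["chunk_id"]}))
            for chunk in chunks
        ],
    )
    conn.commit()

    # Top-k query with a metadata filter; <=> is cosine distance
    cur.execute(
        """
        SELECT id, content, 1 - (embedding <=> %s) AS similarity
        FROM documents
        WHERE metadata->>'source' = %s
        ORDER BY embedding <=> %s
        LIMIT 10
        """,
        (query_vec, "docs", query_vec),
    )
    rows = cur.fetchall()
```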
Decision rules
- Normalize embeddings to unit length — makes cosine and inner product equivalent (a quick check follows this list).
- Chunk at 256-512 tokens with overlap — too small loses context, too large dilutes relevance.
- Use IndexIVFFlat or HNSW for datasets > 100k vectors — exact search is too slow.
- Store raw text with vectors — you need it for the LLM prompt, not just the vector.
- Batch embedding API calls — single-text calls are 10-50x slower due to overhead.
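A quick numeric check of the first rule: on unit-length vectors, the plain inner product matches cosine similarity computed on the raw vectors.

```python
import numpy as np

a = np.random.rand(1536).astype("float32")
b = np.random.rand(1536).astype("float32")

# Cosine similarity on the raw vectors
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Inner product after normalizing to unit length
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
inner = np.dot(a_unit, b_unit)

assert np.isclose(cosine, inner)  # identical up to float rounding
```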
References
- https://github.com/facebookresearch/faiss
- https://github.com/pgvector/pgvector
- https://platform.openai.com/docs/guides/embeddings
Related skills
- agent-memory — using embeddings for agent recall
- context-management-memory — fitting retrieved chunks into context
- inference-serving — hosting embedding models locally