install
source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/TerminalSkills/skills/chromadb" ~/.claude/skills/comeonoliver-skillshub-chromadb && rm -rf "$T"
manifest:
skills/TerminalSkills/skills/chromadb/SKILL.md
ChromaDB
Overview
ChromaDB is an open-source vector database for storing, searching, and managing embeddings. It provides a simple API for document ingestion, semantic similarity search, and metadata filtering, supporting both Python and JavaScript/TypeScript clients with embedded, server, and cloud deployment options.
Instructions
- When initializing, use `get_or_create_collection` for idempotent collection setup, choose `PersistentClient` for development and `HttpClient` for production server connections.
- When adding documents, batch `add()` calls in chunks of 5,000 documents, always store source metadata (filename, URL, page number) for RAG citations, and use `upsert()` for incremental updates to avoid duplicates.
- When querying, use `collection.query(query_texts=..., n_results=...)` for text-based search, combine `where` metadata filters to narrow results before semantic search, and set `n_results` based on the LLM's context window (5-10 for most RAG pipelines).
- When choosing embeddings, use the default Sentence Transformers for local development without API keys, OpenAI or Cohere embedding functions for production, or pass pre-computed vectors directly.
- When filtering metadata, use operators like `$eq`, `$gt`, `$in` with `$and`/`$or` logical operators, and combine with `where_document` for content-based filtering alongside semantic similarity.
- When deploying, use the embedded `PersistentClient` for single-node applications, Docker for server mode, or Chroma Cloud for managed hosting with multi-tenancy support.
- When tuning performance, configure HNSW parameters (`hnsw:M`, `hnsw:construction_ef`, `hnsw:search_ef`) for the quality-speed tradeoff and choose `cosine` distance for normalized embeddings (OpenAI, Cohere).
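The 5,000-document batching advice can be wrapped in a small helper (a sketch; `batched_add` is a hypothetical name, not part of the Chroma API):

```python
def batched_add(collection, ids, documents, metadatas, batch_size=5000):
    """Add documents to a Chroma collection in fixed-size chunks.

    Bounding each add() call keeps memory usage predictable during
    large ingestions.
    """
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        collection.add(
            ids=ids[start:end],
            documents=documents[start:end],
            metadatas=metadatas[start:end],
        )
```

Swapping `collection.add` for `collection.upsert` inside the loop makes re-ingestion idempotent as well.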
Examples
Example 1: Build a document Q&A pipeline
User request: "Set up a RAG pipeline with ChromaDB for answering questions about our docs"
Actions:
- Load documents and split into chunks with metadata (source, page)
- Create a collection with OpenAI embedding function
- Batch-add document chunks with `upsert()` for idempotent ingestion
- Query with `collection.query()` and pass retrieved chunks as context to the LLM
Output: A semantic search pipeline that retrieves relevant document chunks for LLM-powered Q&A.
Example 2: Add filtered semantic search to an application
User request: "Implement product search that combines text similarity with category filters"
Actions:
- Create a collection with product descriptions and category metadata
- Implement search combining `query_texts` with `where={"category": "electronics"}`
- Return results with distances for relevance ranking
- Add price range filtering with `$gte` and `$lte` operators
Output: A filtered semantic search that narrows by metadata before ranking by text similarity.
Guidelines
- Use `get_or_create_collection` for idempotent collection initialization; it is safe for restarts.
- Batch `add()` calls in chunks of 5,000 documents to manage memory usage.
- Always store source metadata (filename, URL, page number); it is essential for RAG citations.
- Use `upsert()` for incremental updates to avoid duplicate documents when re-ingesting.
- Set `n_results` based on the LLM's context window: 5-10 results for most RAG pipelines.
- Use metadata filtering to narrow results before semantic search to reduce noise.
- Choose `cosine` distance for normalized embeddings (OpenAI, Cohere) and `l2` for unnormalized.