Learn-skills.dev api-vector-db-pinecone
Pinecone serverless vector database -- index management, vector operations, metadata filtering, namespaces, hybrid search, inference API
git clone https://github.com/NeverSight/learn-skills.dev
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/agents-inc/skills/api-vector-db-pinecone" ~/.claude/skills/neversight-learn-skills-dev-api-vector-db-pinecone && rm -rf "$T"
data/skills-md/agents-inc/skills/api-vector-db-pinecone/SKILL.mdPinecone Patterns
Quick Guide: Use
(v7.x) for serverless vector database operations. Target indexes by host (@pinecone-database/pinecone), not by name. Use namespaces for multi-tenant isolation (physically separate, cheaper queries). Batch upserts at 200 records (max 1,000 or 2 MB). Metadata is limited to 40 KB per record with flat key-value pairs only (no nested objects). Pinecone is eventually consistent -- vectors may not appear in queries immediately after upsert. Usepc.index({ host })to verify indexing progress. For hybrid search, usedescribeIndexStats()metric with sparse+dense vectors in a single index.dotproduct
<critical_requirements>
CRITICAL: Before Using This Skill
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
, named constants)import type
(You MUST target indexes by host URL, not by name --
is the v7 API; pc.index({ host })
is deprecated)pc.index('name')
(You MUST batch upserts to max 1,000 records or 2 MB per request -- exceeding either limit causes a 400 error)
(You MUST use flat key-value metadata only -- nested objects, null values, and keys starting with
are rejected by Pinecone)$
(You MUST handle eventual consistency -- vectors are not queryable immediately after upsert; use
or retry logic for freshness-critical flows)describeIndexStats()
</critical_requirements>
Examples
- Core Patterns -- Client setup, index creation, upsert, query, fetch, update, delete
- Namespaces & Multi-Tenancy -- Namespace isolation, multi-tenant patterns, namespace management API
- Metadata Filtering -- Filter operators, compound filters, best practices
- Hybrid Search -- Sparse-dense vectors, hybrid index setup, alpha weighting
- Inference API -- Embedding generation, reranking, integrated inference indexes
- Batch Operations -- Chunked upserts, parallel ingestion, bulk import
Additional resources:
- reference.md -- API quick reference, filter operators, limits, decision frameworks, production checklist
Auto-detection: Pinecone, @pinecone-database/pinecone, createIndex, createIndexForModel, upsert, query, topK, includeMetadata, sparseValues, namespace, describeIndexStats, vector database, similarity search, embedding, cosine, dotproduct, euclidean, RAG retrieval, semantic search, pinecone-sparse-english, rerank, searchRecords, upsertRecords, fetchByMetadata
When to use:
- Semantic search over document embeddings (RAG retrieval)
- Similarity search for recommendations, deduplication, or classification
- Multi-tenant vector isolation using namespaces
- Hybrid semantic + keyword search using sparse-dense vectors
- Embedding generation and result reranking via Pinecone Inference API
Key patterns covered:
- Client setup and index management (serverless vs pod-based)
- Vector CRUD operations (upsert, query, fetch, update, delete)
- Metadata filtering with compound operators
- Namespace-based multi-tenancy
- Sparse-dense hybrid search
- Pinecone Inference API (embed, rerank)
- Batch ingestion with chunking and parallelism
- Integrated inference indexes (automatic embedding)
When NOT to use:
- Full-text search with complex boolean queries (use a dedicated search engine)
- Relational data with joins and transactions (use a relational database)
- Real-time streaming or pub/sub messaging (use a message broker)
- Storing large binary blobs or documents (use object storage; store only embeddings + metadata references)
<philosophy>
Philosophy
Pinecone is a managed serverless vector database purpose-built for similarity search at scale. The core principle: store embeddings and metadata, query by vector similarity, filter by metadata.
Core principles:
- Vectors in, results out -- Pinecone stores high-dimensional vectors and returns the most similar ones. It is not a general-purpose database. Structure your data as embeddings + metadata references.
- Namespaces for isolation -- Use namespaces to physically separate tenant data. Queries scan only the target namespace, reducing cost and latency compared to metadata filtering across a shared namespace.
- Metadata is for filtering, not storage -- Keep metadata small (40 KB limit) and flat. Store document content in your primary database; store only filterable attributes (category, date, tenant ID) as Pinecone metadata.
- Batch for throughput -- Individual upserts are inefficient. Batch at 200 records for optimal throughput (max 1,000 or 2 MB per request).
- Eventual consistency is normal -- Freshly upserted vectors may not appear in query results immediately. Design your application to tolerate brief staleness or poll
before querying.describeIndexStats()
<patterns>
Core Patterns
Pattern 1: Client Initialization
Create a Pinecone client from an API key. See examples/core.md for full examples.
// Good Example import { Pinecone } from "@pinecone-database/pinecone"; function createPineconeClient(): Pinecone { const apiKey = process.env.PINECONE_API_KEY; if (!apiKey) { throw new Error("PINECONE_API_KEY environment variable is required"); } return new Pinecone({ apiKey }); } export { createPineconeClient };
Why good: API key from environment variable, validation before construction, named export
// Bad Example import { Pinecone } from "@pinecone-database/pinecone"; const pc = new Pinecone({ apiKey: "sk-abc123..." }); // Hardcoded key leaks in version control
Why bad: Hardcoded API key is a security risk, no validation
Pattern 2: Index Targeting (v7 API)
Always target an index by its host URL, not its name. See examples/core.md.
// Good Example -- target by host const indexModel = await pc.createIndex({ name: "products", dimension: EMBEDDING_DIMENSION, metric: "cosine", spec: { serverless: { cloud: "aws", region: "us-east-1" } }, }); const index = pc.index({ host: indexModel.host });
Why good:
pc.index({ host }) is the v7 API, avoids an extra API call to resolve the name to a host
// Bad Example -- target by name (deprecated) const index = pc.index("products"); // Triggers an extra describeIndex call to resolve the host URL
Why bad: Targeting by name requires an extra network call and is deprecated in v7
Pattern 3: Upsert with Metadata
Upsert vectors with flat metadata for filtering. See examples/core.md for typed metadata.
// Good Example interface DocumentMetadata { title: string; category: string; createdAt: number; // Unix timestamp (numbers only, no Date objects) } const NAMESPACE = "articles"; await index.namespace(NAMESPACE).upsert({ records: [ { id: "doc-1", values: embedding, // number[] matching index dimension metadata: { title: "Guide", category: "tutorial", createdAt: 1710000000 }, }, ], });
Why good: Typed metadata interface, flat key-value pairs, numeric timestamp (not Date), namespace isolation
Pattern 4: Query with Metadata Filter
Query for similar vectors with metadata filtering. See examples/metadata-filtering.md for all operators.
// Good Example const TOP_K = 10; const results = await index.namespace(NAMESPACE).query({ vector: queryEmbedding, topK: TOP_K, includeMetadata: true, filter: { $and: [ { category: { $eq: "tutorial" } }, { createdAt: { $gte: 1700000000 } }, ], }, }); for (const match of results.matches) { console.log(match.id, match.score, match.metadata); }
Why good: Named constant for topK, structured filter with
$and, includes metadata in response
// Bad Example const results = await index.query({ vector: queryEmbedding, topK: 100, includeMetadata: true, filter: { tags: ["a", "b"] }, // INVALID: arrays are not valid filter values });
Why bad: Missing namespace (queries default namespace), array filter syntax is invalid (use
$in), no named constant for topK
Pattern 5: Namespace-Based Multi-Tenancy
Use namespaces for tenant isolation. See examples/namespaces.md.
// Good Example -- physically isolated tenant data function getTenantIndex(pc: Pinecone, host: string, tenantId: string) { return pc.index({ host }).namespace(`tenant-${tenantId}`); } // Each tenant's queries scan only their namespace const tenantIndex = getTenantIndex(pc, INDEX_HOST, "acme-corp"); const results = await tenantIndex.query({ vector: embedding, topK: TOP_K });
Why good: Physical isolation per tenant, queries scan only the target namespace (lower cost and latency)
// Bad Example -- metadata filtering for multi-tenancy await index.query({ vector: embedding, topK: 10, filter: { tenantId: { $eq: "acme-corp" } }, // Scans ENTIRE index, filters after -- expensive at scale });
Why bad: Metadata filtering scans the full namespace regardless of filter selectivity, cost scales with total data not tenant data
Pattern 6: Pinecone Inference API
Generate embeddings and rerank results. See examples/inference.md.
// Good Example -- embed text const embedResult = await pc.inference.embed({ model: "multilingual-e5-large", inputs: [{ text: "What is machine learning?" }], parameters: { inputType: "query", truncate: "END" }, }); const queryVector = embedResult.data[0].values;
Why good: Specifies
inputType (query vs passage), handles truncation for long inputs
// Good Example -- rerank results const rerankResult = await pc.inference.rerank({ model: "pinecone-rerank-v0", query: "machine learning basics", documents: results.matches.map((m) => ({ id: m.id, text: m.metadata?.content as string, })), topN: 5, returnDocuments: true, });
Why good: Reranks query results for better relevance, limits output with
topN
</patterns>
<decision_framework>
Decision Framework
Which Index Type?
Which Pinecone index type should I use? |-- Serverless? (recommended for most use cases) | |-- Variable or unpredictable traffic? -> Serverless (auto-scales, pay-per-use) | |-- Starting a new project? -> Serverless (simpler, no capacity planning) | '-- Need hybrid sparse-dense search? -> Serverless with dotproduct metric | '-- Pod-based? (legacy, specific needs) |-- Need guaranteed low latency SLAs? -> Pod-based (dedicated compute) '-- Using collections for snapshots? -> Pod-based (collections are pod-only)
Which Metric?
Which distance metric should I use? |-- Using embeddings from a language model? -> cosine (normalized, most common) |-- Need hybrid search (sparse + dense)? -> dotproduct (REQUIRED for hybrid) |-- Comparing raw feature vectors? -> euclidean (absolute distance matters) '-- Unsure? -> cosine (safe default for most embedding models)
Namespaces vs Metadata Filtering?
How should I isolate tenant data? |-- Strict data isolation required? -> Namespaces (physical separation) |-- Need to query across tenants? -> Metadata filtering (logical separation) |-- Cost-sensitive at scale? -> Namespaces (query cost = tenant size, not total) |-- Few tenants (< 10)? -> Either approach works '-- Many tenants (100+)? -> Namespaces (metadata filtering scans everything)
Embedded Inference vs External Embeddings?
How should I generate embeddings? |-- Want simplest architecture? -> Integrated inference (createIndexForModel) | (Pinecone handles embedding automatically on upsert/query) | |-- Need a specific embedding model not hosted by Pinecone? -> External | (Generate embeddings yourself, upsert raw vectors) | |-- Need hybrid search with sparse vectors? -> External sparse model | (Use pinecone-sparse-english-v0 via inference API + your dense model) | '-- Need full control over embedding pipeline? -> External (Custom preprocessing, chunking, model selection)
</decision_framework>
<red_flags>
RED FLAGS
High Priority Issues:
- Targeting index by name instead of host --
is deprecated in v7; usepc.index("name")
to avoid an extra API callpc.index({ host }) - Upserting vectors with wrong dimensions -- dimension mismatch causes a 400 error; verify your embedding model's output dimension matches the index
- Nested metadata objects -- Pinecone only supports flat key-value metadata; nested objects are silently ignored or rejected
- Using metadata filtering for multi-tenancy at scale -- scans the entire namespace regardless of filter selectivity; use namespaces instead
Medium Priority Issues:
- Missing
in queries -- metadata is NOT included by default; omitting this returns only IDs and scoresincludeMetadata: true - Upsert batches exceeding 1,000 records or 2 MB -- triggers a 400 error; chunk at 200 records for safety margin
- Using
objects in metadata -- Pinecone metadata supports strings, numbers, booleans, and string arrays only; convert dates to Unix timestampsDate - Not awaiting
readiness -- index creation is async; the index is not ready for operations immediately aftercreateIndex()
returnscreateIndex()
Common Mistakes:
- Querying immediately after upsert and expecting results -- eventual consistency means freshly upserted vectors may not be queryable for seconds
- Using
> 1,000 withtopK
-- the maxincludeMetadata: true
is 1,000 when including metadata or values; without them, max is 10,000topK - Passing array values as metadata filter values (
) -- use{ tags: ["a", "b"] }
operator instead:$in{ tags: { $in: ["a", "b"] } } - Forgetting that
without a namespace deletes from the default namespace only, not the entire indexdeleteAll()
Gotchas & Edge Cases:
returns approximate counts -- record counts are not exact in real-time, especially after recent upserts or deletesdescribeIndexStats()- Metadata values are always returned as their original types, but filter comparisons are type-strict --
does not match numeric$eq: "42"42 - Sparse vector indices must be positive 32-bit integers (uint32), and values must be non-zero floats -- zero values are silently dropped
returns vector IDs only (no values or metadata) -- uselistPaginated()
to get full vector datafetch()- The
and$in
operators accept a maximum of 10,000 values each$nin - Metadata keys cannot start with
(reserved for operators)$
is an upsert, not an insert -- upserting with an existing ID overwrites the previous vector and metadata entirely (no partial merge)upsert
merges metadata by default -- updating metadata replaces only the fields you specify, not the entire metadata objectupdate()- The v7 SDK includes built-in automatic retry with exponential backoff for transient errors -- custom retry logic is only needed for fine-grained control or non-default retry policies
(integrated inference) accepts a direct array, notupsertRecords
-- this differs from the regular{ records: [...] }
method which usesupsert{ records: [...] }
</red_flags>
<critical_reminders>
CRITICAL REMINDERS
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
, named constants)import type
(You MUST target indexes by host URL, not by name --
is the v7 API; pc.index({ host })
is deprecated)pc.index('name')
(You MUST batch upserts to max 1,000 records or 2 MB per request -- exceeding either limit causes a 400 error)
(You MUST use flat key-value metadata only -- nested objects, null values, and keys starting with
are rejected by Pinecone)$
(You MUST handle eventual consistency -- vectors are not queryable immediately after upsert; use
or retry logic for freshness-critical flows)describeIndexStats()
Failure to follow these rules will cause index creation failures, rejected upserts, empty query results, and degraded multi-tenant performance.
</critical_reminders>