Skills api-ai-cohere-sdk
Official Cohere TypeScript SDK patterns -- CohereClientV2, chat, embeddings, rerank, RAG with citations, tool use, streaming, and model selection
git clone https://github.com/agents-inc/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/agents-inc/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/dist/plugins/api-ai-cohere-sdk/skills/api-ai-cohere-sdk" ~/.claude/skills/agents-inc-skills-api-ai-cohere-sdk && rm -rf "$T"
dist/plugins/api-ai-cohere-sdk/skills/api-ai-cohere-sdk/SKILL.md

Cohere SDK Patterns
Quick Guide: Use the `cohere-ai` npm package for all new Cohere integrations. The V2 API requires `CohereClientV2` with `model` on every call. Use `chatStream` for streaming with `content-delta` events. Embeddings require `inputType` matching your use case (`search_document` for indexing, `search_query` for querying). Rerank scores documents by relevance. RAG works by passing `documents` to `chat()` -- the model returns inline citations automatically. Tool use follows a 4-step loop: user message, model returns `tool_calls`, you execute and return results, model generates a cited response.
<critical_requirements>
CRITICAL: Before Using This Skill
- All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, `import type`, named constants)
- You MUST use `CohereClientV2` (not `CohereClient`) for all new code -- V2 is the current API with the required `model` parameter
- You MUST specify `inputType` on every embed call -- `search_document` for indexing, `search_query` for querying -- mismatched types produce garbage similarity scores
- You MUST handle the tool use loop correctly: append the full assistant message (with `tool_calls`) to messages, then append `tool` role results with matching `tool_call_id`
- You MUST check `finish_reason` in responses -- `MAX_TOKENS` means the output was truncated
- You MUST never hardcode API keys -- pass the key via the `token` constructor parameter, sourced from environment variables
</critical_requirements>
Auto-detection: Cohere, cohere-ai, CohereClientV2, CohereClient, command-a, command-r, command-r-plus, embed-v4, rerank-v4, chatStream, content-delta, inputType, search_document, search_query, embeddingTypes, topN, CO_API_KEY, COHERE_API_KEY
When to use:
- Building applications with Cohere Command models (chat, generation, summarization)
- Creating semantic search pipelines with Cohere embeddings
- Adding relevance scoring to search results with Cohere Rerank
- Implementing RAG with inline document grounding and automatic citations
- Building agentic workflows with Cohere tool use / function calling
- Streaming chat responses for real-time user interfaces
Key patterns covered:
- Client setup with `CohereClientV2` (token, timeout, platform configs)
- Chat and streaming (`chat`, `chatStream`, event types)
- Embeddings with `inputType` for search/classification/clustering
- Rerank for relevance scoring and search result ordering
- RAG with documents and automatic citation handling
- Tool use / function calling with multi-step loops
- Model selection (Command-A, Command-R, Embed v4, Rerank v4)
When NOT to use:
- Multi-provider applications needing OpenAI/Anthropic/Google switching -- use a unified provider SDK
- React-specific chat UI hooks -- use a framework-integrated AI SDK
- Simple text completion without Cohere-specific features (rerank, citations)
Examples Index
- Core: Setup, Chat & Error Handling -- CohereClientV2 init, basic chat, streaming, error handling
- Embeddings & Rerank -- Semantic search, input types, rerank scoring, RAG pipeline
- Tool Use & RAG -- Function calling, document grounding, citation handling
- Quick API Reference -- Model IDs, method signatures, event types, error classes
<philosophy>
Philosophy
The Cohere TypeScript SDK (`cohere-ai`) provides direct access to Cohere's API surface -- chat, embeddings, rerank, and RAG with citations. The SDK is auto-generated from Cohere's API spec using Fern.
Core principles:
- V2 API is current -- `CohereClientV2` provides the modern API. `model` is required on every call. V1 methods on `CohereClient` are legacy.
- Embeddings are typed -- The `inputType` parameter (`search_document`, `search_query`, `classification`, `clustering`) is mandatory for v3+ models. Mismatching input types between indexing and querying silently degrades results.
- RAG is first-class -- Pass `documents` directly to `chat()` and the model returns grounded answers with inline citations. No external retrieval framework required for the grounding step.
- Rerank is a standalone primitive -- Score and reorder search results without building a full RAG pipeline. Feed any list of documents and a query, get relevance scores back.
- Citations are automatic -- When documents are provided (via RAG or tool results), the model generates fine-grained citations with start/end positions and source references.
When to use the Cohere SDK directly:
- You want Cohere-specific features: rerank, citation grounding, multilingual embeddings
- You need semantic search with embed + rerank pipeline
- You want RAG with automatic inline citations
- You are building on Cohere's platform (or Bedrock/Azure/OCI with Cohere models)
When NOT to use:
- You need to switch between multiple LLM providers -- use a unified provider SDK
- You want React-specific chat UI hooks -- use a framework-integrated AI SDK
- You only need basic chat completion without Cohere differentiators
<patterns>
Core Patterns
Pattern 1: Client Setup
Initialize `CohereClientV2`. The `token` parameter is required (pass it from the environment).
```typescript
// lib/cohere.ts -- basic setup
import { CohereClientV2 } from "cohere-ai";

const client = new CohereClientV2({
  token: process.env.CO_API_KEY,
});

export { client };
```

```typescript
// lib/cohere.ts -- production configuration
const TIMEOUT_MS = 30_000;

const client = new CohereClientV2({
  token: process.env.CO_API_KEY,
  timeout: TIMEOUT_MS,
});
```
Why good: Explicit token from env var, named timeout constant, named export
```typescript
// BAD: Hardcoded key, default CohereClient (V1)
import { CohereClient } from "cohere-ai";

const client = new CohereClient({ token: "sk-abc123" });
```
Why bad: A hardcoded API key is a security risk, and `CohereClient` is the legacy V1 client
See: examples/core.md for error handling, platform configs (Bedrock, Azure)
Pattern 2: Chat Completion
V2 chat uses a `messages` array with `system`, `user`, `assistant`, and `tool` roles.
```typescript
const response = await client.chat({
  model: "command-a-03-2025",
  messages: [
    { role: "system", content: "You are a helpful coding assistant." },
    { role: "user", content: "Explain TypeScript generics." },
  ],
});

console.log(response.message.content[0].text);
```
Why good: System message for instruction, `model` explicitly specified, correct V2 content access path
```typescript
// BAD: Missing model (required in V2), wrong response access
const response = await client.chat({
  messages: [{ role: "user", content: "Hello" }],
});

console.log(response.text); // WRONG: V2 uses response.message.content[0].text
```
Why bad: V2 requires `model`, and the response shape is `response.message.content[0].text`, not `response.text`
See: examples/core.md for multi-turn, token tracking, temperature control
Pattern 3: Streaming
Use `chatStream` with `for await` and check the event `type` for `content-delta`.
```typescript
const stream = await client.chatStream({
  model: "command-a-03-2025",
  messages: [{ role: "user", content: "Explain async/await." }],
});

for await (const event of stream) {
  if (event.type === "content-delta") {
    process.stdout.write(event.delta?.message?.content?.text ?? "");
  }
}
```
Why good: Checks event type before accessing delta, handles nullable content safely
```typescript
// BAD: Not checking event type
for await (const event of stream) {
  console.log(event.delta?.message); // Many events don't have a message delta
}
```
Why bad: Only `content-delta` events carry text content -- other events (`message-start`, `citation-start`, `tool-plan-delta`) have different shapes
See: examples/core.md for full streaming with all event types
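When buffering a full reply rather than writing to stdout, the same type check applies. A minimal accumulator sketch -- the `StreamEvent` type here is a simplified stand-in for the SDK's full event union, kept only to the fields this pattern touches:

```typescript
// Simplified stand-in for the SDK's stream event union.
type StreamEvent = {
  type: string;
  delta?: { message?: { content?: { text?: string } } };
};

// Concatenate text from content-delta events; ignore all other event types.
function collectText(events: StreamEvent[]): string {
  let out = "";
  for (const event of events) {
    if (event.type === "content-delta") {
      out += event.delta?.message?.content?.text ?? "";
    }
  }
  return out;
}
```

In real use the events would come from `for await (const event of stream)`; the filtering logic is identical.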
Pattern 4: Embeddings
`inputType` is required for v3+ models. Mismatching types between indexing and querying silently degrades results.
```typescript
const EMBEDDING_MODEL = "embed-v4.0";

// Index documents with search_document
const docEmbeddings = await client.embed({
  model: EMBEDDING_MODEL,
  inputType: "search_document",
  texts: ["TypeScript is a typed superset of JavaScript."],
  embeddingTypes: ["float"],
});

// Query with search_query
const queryEmbedding = await client.embed({
  model: EMBEDDING_MODEL,
  inputType: "search_query",
  texts: ["What is TypeScript?"],
  embeddingTypes: ["float"],
});
```
Why good: Correct `inputType` pairing, `embeddingTypes` explicitly specified, named model constant
```typescript
// BAD: Same inputType for both indexing and querying
const docs = await client.embed({
  model: "embed-v4.0",
  inputType: "search_query", // WRONG for documents
  texts: documents,
  embeddingTypes: ["float"],
});
```
Why bad: Using `search_query` for document indexing silently produces worse similarity scores -- documents must use `search_document`
See: examples/embeddings-rerank.md for cosine similarity, dimension control, batch embedding
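The similarity step itself needs no SDK call. A minimal cosine-similarity sketch over the float vectors returned by `embed` (the function name is ours, not part of the SDK):

```typescript
// Cosine similarity between two embedding vectors.
// Float embeddings are not guaranteed unit-length, so normalize explicitly.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Compare each `search_query` vector against the stored `search_document` vectors and take the highest-scoring documents as retrieval candidates.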
Pattern 5: Rerank
Score documents by relevance to a query. Returns ordered results with relevance scores.
```typescript
const RERANK_MODEL = "rerank-v4.0-pro";
const TOP_N = 3;

const result = await client.rerank({
  model: RERANK_MODEL,
  query: "What is TypeScript?",
  documents: [
    "TypeScript is a typed superset of JavaScript.",
    "Python is a general-purpose language.",
    "TypeScript compiles to JavaScript.",
  ],
  topN: TOP_N,
});

for (const item of result.results) {
  console.log(`Doc ${item.index}: score ${item.relevanceScore}`);
}
```
Why good: Named constants, `topN` limits results, accesses `index` and `relevanceScore`
See: examples/embeddings-rerank.md for embed + rerank pipeline, rank fields
Pattern 6: RAG with Documents
Pass `documents` to `chat()` and the model returns grounded answers with inline citations.
```typescript
const response = await client.chat({
  model: "command-a-03-2025",
  messages: [{ role: "user", content: "What is TypeScript?" }],
  documents: [
    {
      data: {
        text: "TypeScript is a typed superset of JavaScript.",
        title: "TS Docs",
      },
    },
    {
      data: {
        text: "TypeScript was developed by Microsoft.",
        title: "History",
      },
    },
  ],
});

console.log(response.message.content[0].text);

// Citations reference which documents support each claim
if (response.message.citations) {
  for (const citation of response.message.citations) {
    console.log(`"${citation.text}" from doc ${citation.sources}`);
  }
}
```
Why good: Documents passed inline with metadata, citations accessed from response, no external retrieval framework needed
See: examples/tools-rag.md for full RAG pipeline with embed + rerank + chat
Pattern 7: Tool Use / Function Calling
4-step loop: user message -> model returns `tool_calls` -> execute tools -> return results with `tool_call_id`.
```typescript
const tools = [
  {
    type: "function" as const,
    function: {
      name: "get_weather",
      description: "Get weather for a city",
      parameters: {
        type: "object",
        properties: {
          location: { type: "string", description: "City name" },
        },
        required: ["location"],
      },
    },
  },
];

const response = await client.chat({
  model: "command-a-03-2025",
  messages: [{ role: "user", content: "Weather in Paris?" }],
  tools,
});

// Check if model wants to call tools
if (response.message.toolCalls) {
  // See examples/tools-rag.md for the complete tool execution loop
}
```
Why good: Standard JSON Schema tool definition, checks for toolCalls before executing
See: examples/tools-rag.md for complete multi-step tool loop with tool result submission
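The bookkeeping half of the loop -- full assistant message first, then one `tool` result per call with the matching id -- can be sketched as a pure helper. The `ChatMessage` and `ToolCall` types below are simplified stand-ins for the SDK's own types, and `appendToolResults` is a hypothetical helper name:

```typescript
type ToolCall = { id: string; function: { name: string; arguments: string } };

type ChatMessage =
  | { role: "user" | "system"; content: string }
  | { role: "assistant"; content?: string; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };

// Append the assistant message (with its toolCalls) first, then one
// "tool" role message per executed call, each carrying the matching id
// so the model can correlate results.
function appendToolResults(
  messages: ChatMessage[],
  assistantMessage: { role: "assistant"; toolCalls?: ToolCall[] },
  resultsByCallId: Map<string, string>,
): ChatMessage[] {
  const next: ChatMessage[] = [...messages, assistantMessage];
  for (const call of assistantMessage.toolCalls ?? []) {
    const output = resultsByCallId.get(call.id);
    if (output === undefined) continue; // no result produced for this call
    next.push({ role: "tool", toolCallId: call.id, content: output });
  }
  return next;
}
```

After this append, a second `chat()` call with the extended `messages` lets the model generate the final cited response.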
Pattern 8: Error Handling
Catch `CohereError` for API errors and `CohereTimeoutError` for timeouts.
```typescript
import { CohereError, CohereTimeoutError } from "cohere-ai";

try {
  const response = await client.chat({
    model: "command-a-03-2025",
    messages: [{ role: "user", content: "Hello" }],
  });
} catch (error) {
  if (error instanceof CohereTimeoutError) {
    console.error("Request timed out");
  } else if (error instanceof CohereError) {
    console.error(`API Error [${error.statusCode}]: ${error.message}`);
    console.error("Body:", error.body);
  } else {
    throw error; // Re-throw unknown errors
  }
}
```
Why good: Specific error types with status codes, re-throws unexpected errors, timeout handled separately
See: examples/core.md for production error handling patterns
</patterns>
<performance>
Performance Optimization
Model Selection for Cost/Speed
- General purpose (best) -> `command-a-03-2025` (256K context, strongest)
- Reasoning tasks -> `command-a-reasoning-08-2025` (multi-step reasoning)
- Vision/document analysis -> `command-a-vision-07-2025` (images, charts, OCR)
- Translation -> `command-a-translate-08-2025` (23 languages)
- Lightweight / edge -> `command-r7b-12-2024` (7B, fast, 128K context)
- Legacy (still supported) -> `command-r-08-2024`, `command-r-plus-08-2024`
- Embeddings (best) -> `embed-v4.0` (multimodal, 128K context, flexible dims)
- Embeddings (English) -> `embed-english-v3.0` (1024 dims)
- Embeddings (multilingual) -> `embed-multilingual-v3.0` (23 languages)
- Rerank (quality) -> `rerank-v4.0-pro` (32K context, multilingual)
- Rerank (speed) -> `rerank-v4.0-fast` (32K context, latency-optimized)
Key Optimization Patterns
- Batch embeddings -- pass up to 96 texts per `embed()` call instead of calling per-document
- Use `topN` in rerank -- limit results to reduce response size and cost
- Use `outputDimension` with embed-v4 -- reduce dimensions (256/512/1024) for faster similarity search at minimal quality loss
- Check `finish_reason === "MAX_TOKENS"` -- detect truncated output
- Use `temperature: 0` for deterministic output (enables caching)
- Use embed-v4 `int8`/`binary` types for compressed storage with minimal quality loss
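The 96-text limit makes batching mechanical. A sketch of the chunking step (the constant and function names are ours):

```typescript
const MAX_TEXTS_PER_CALL = 96; // Embed API per-request text limit

// Split a large corpus into batches the Embed API will accept.
function chunkTexts(texts: string[], size = MAX_TEXTS_PER_CALL): string[][] {
  const batches: string[][] = [];
  for (let i = 0; i < texts.length; i += size) {
    batches.push(texts.slice(i, i + size));
  }
  return batches;
}
```

Feed each batch to one `embed()` call with `inputType: "search_document"` and concatenate the returned vectors in order.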
</performance>
<decision_framework>
Decision Framework
Which Client Class to Use
```
New project?
+-- YES -> CohereClientV2 (always)
+-- Existing V1 code?
    +-- Working fine? -> Keep CohereClient but plan migration
    +-- Need V2 features? -> Migrate to CohereClientV2
```
Which Model to Choose
```
What is your task?
+-- General chat/generation -> command-a-03-2025 (most capable)
+-- Reasoning / multi-step -> command-a-reasoning-08-2025
+-- Image/document analysis -> command-a-vision-07-2025
+-- Translation -> command-a-translate-08-2025
+-- Lightweight / low latency -> command-r7b-12-2024
+-- Embeddings -> embed-v4.0 (or embed-english-v3.0 for English-only)
+-- Rerank quality -> rerank-v4.0-pro
+-- Rerank speed -> rerank-v4.0-fast
```
Embed `inputType` Selection
```
What are you embedding?
+-- Documents for a search index -> "search_document"
+-- Search queries against an index -> "search_query"
+-- Text for a classifier -> "classification"
+-- Text for clustering -> "clustering"
+-- Images -> "image" (embed-v4+ only)
```
When to Use Rerank
```
Do you have search results to re-order?
+-- YES -> Use rerank as a second-stage ranker
|   +-- Quality matters most? -> rerank-v4.0-pro
|   +-- Latency matters most? -> rerank-v4.0-fast
+-- NO -> Not applicable (rerank needs existing results to score)
```
RAG Approach
```
Do you need grounded answers with citations?
+-- YES -> Pass documents to chat()
|   +-- Have pre-retrieved documents? -> Pass directly via documents param
|   +-- Need retrieval first? -> Use embed + vector search + rerank pipeline,
|       then pass top results to chat()
+-- NO -> Use plain chat without documents
```
</decision_framework>
<red_flags>
RED FLAGS
High Priority Issues:
- Using `CohereClient` instead of `CohereClientV2` for new code (V1 is legacy)
- Missing `model` parameter in V2 API calls (required on every call, unlike V1)
- Using the wrong `inputType` for embeddings (`search_query` for documents or vice versa -- silently degrades results)
- Hardcoding API keys instead of using environment variables
- Not appending the full assistant message (with `tool_calls`) before appending tool results in the tool use loop
Medium Priority Issues:
- Not specifying `embeddingTypes` (defaults may not match your storage format)
- Ignoring `finish_reason: "MAX_TOKENS"` (output was silently truncated)
- Not handling `CohereTimeoutError` separately from `CohereError`
- Processing all stream events without checking `type` (only `content-delta` has text)
- Using V1 parameter names (`preamble`, `connectors`, `conversation_id`) with the V2 client
Common Mistakes:
- Accessing `response.text` instead of `response.message.content[0].text` (V2 response shape changed)
- Forgetting that `embeddingTypes` is required in the V2 Embed API
- Not matching `tool_call_id` when submitting tool results (model cannot correlate results)
- Using `documents` with string values instead of `{ data: { text: "..." } }` objects in V2
- Expecting `response.message.citations` to exist when no documents were provided (citations only appear with grounded responses)
Gotchas & Edge Cases:
- The SDK is in beta -- pin your `cohere-ai` version in package.json to avoid breaking changes
- V2 API is NOT yet supported for cloud deployments (Bedrock, SageMaker, Azure, OCI) -- use the V1 client for cloud platforms
- `inputType` is camelCase in the TypeScript SDK (`inputType`) but snake_case in the REST API (`input_type`)
- Embed API accepts max 96 texts per call -- batch larger sets yourself
- `embed-v4.0` supports `outputDimension` for flexible sizing (256, 512, 1024, 1536) but v3 models have fixed dimensions
- Rerank `relevanceScore` is normalized 0-1 but not calibrated across queries -- compare scores within a single query only
- Stream events include `tool-plan-delta` before `tool-call-start` -- the model's reasoning about which tool to call
- V2 uses the `system` role for instructions (V1 used the `preamble` parameter)
- Citation `sources` in tool use responses reference `tool_call_id` values, not document indices
- The `clientName` constructor parameter is for logging/analytics, not authentication
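Pinning, as the beta note above suggests, is an exact-version entry in package.json (the version number below is illustrative only -- check the currently published release):

```json
{
  "dependencies": {
    "cohere-ai": "7.14.0"
  }
}
```

An exact version (no `^` or `~` range) prevents an auto-generated SDK update from changing method signatures under you.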
</red_flags>
<critical_reminders>
CRITICAL REMINDERS
- All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, `import type`, named constants)
- You MUST use `CohereClientV2` (not `CohereClient`) for all new code -- V2 is the current API with the required `model` parameter
- You MUST specify `inputType` on every embed call -- `search_document` for indexing, `search_query` for querying -- mismatched types produce garbage similarity scores
- You MUST handle the tool use loop correctly: append the full assistant message (with `tool_calls`) to messages, then append `tool` role results with matching `tool_call_id`
- You MUST check `finish_reason` in responses -- `MAX_TOKENS` means the output was truncated
- You MUST never hardcode API keys -- pass the key via the `token` constructor parameter, sourced from environment variables
Failure to follow these rules will produce broken embeddings, missing citations, or insecure AI integrations.
</critical_reminders>