Skills api-ai-cohere-sdk

Official Cohere TypeScript SDK patterns -- CohereClientV2, chat, embeddings, rerank, RAG with citations, tool use, streaming, and model selection

install
source · Clone the upstream repo
git clone https://github.com/agents-inc/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/agents-inc/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/dist/plugins/api-ai-cohere-sdk/skills/api-ai-cohere-sdk" ~/.claude/skills/agents-inc-skills-api-ai-cohere-sdk && rm -rf "$T"
manifest: dist/plugins/api-ai-cohere-sdk/skills/api-ai-cohere-sdk/SKILL.md
source content

Cohere SDK Patterns

Quick Guide: Use the `cohere-ai` npm package with `CohereClientV2` for all new Cohere integrations. The V2 API requires `model` on every call. Use `chatStream` for streaming with `content-delta` events. Embeddings require an `inputType` matching your use case (`search_document` for indexing, `search_query` for querying). Rerank scores documents by relevance. RAG works by passing `documents` to `chat()` -- the model returns inline citations automatically. Tool use follows a 4-step loop: user message, model returns `tool_calls`, you execute and return results, model generates a cited response.


<critical_requirements>

CRITICAL: Before Using This Skill

All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, `import type`, named constants)

(You MUST use `CohereClientV2` (not `CohereClient`) for all new code -- V2 is the current API with required `model` parameter)

(You MUST specify `inputType` on every embed call -- `search_document` for indexing, `search_query` for querying -- mismatched types produce garbage similarity scores)

(You MUST handle the tool use loop correctly: append the full assistant message (with `tool_calls`) to messages, then append `tool` role results with matching `tool_call_id`)

(You MUST check `finish_reason` in responses -- `MAX_TOKENS` means the output was truncated)

(You MUST never hardcode API keys -- pass the key via the `token` constructor parameter, sourced from environment variables)

</critical_requirements>


Auto-detection: Cohere, cohere-ai, CohereClientV2, CohereClient, command-a, command-r, command-r-plus, embed-v4, rerank-v4, chatStream, content-delta, inputType, search_document, search_query, embeddingTypes, topN, CO_API_KEY, COHERE_API_KEY

When to use:

  • Building applications with Cohere Command models (chat, generation, summarization)
  • Creating semantic search pipelines with Cohere embeddings
  • Adding relevance scoring to search results with Cohere Rerank
  • Implementing RAG with inline document grounding and automatic citations
  • Building agentic workflows with Cohere tool use / function calling
  • Streaming chat responses for real-time user interfaces

Key patterns covered:

  • Client setup with `CohereClientV2` (token, timeout, platform configs)
  • Chat and streaming (`chat`, `chatStream`, event types)
  • Embeddings with `inputType` for search/classification/clustering
  • Rerank for relevance scoring and search result ordering
  • RAG with documents and automatic citation handling
  • Tool use / function calling with multi-step loops
  • Model selection (Command-A, Command-R, Embed v4, Rerank v4)

When NOT to use:

  • Multi-provider applications needing OpenAI/Anthropic/Google switching -- use a unified provider SDK
  • React-specific chat UI hooks -- use a framework-integrated AI SDK
  • Simple text completion without Cohere-specific features (rerank, citations)

Examples Index


<philosophy>

Philosophy

The Cohere TypeScript SDK (`cohere-ai`) provides direct access to Cohere's API surface -- chat, embeddings, rerank, and RAG with citations. The SDK is auto-generated from Cohere's API spec using Fern.

Core principles:

  1. V2 API is current -- `CohereClientV2` provides the modern API. `model` is required on every call. V1 methods on `CohereClient` are legacy.
  2. Embeddings are typed -- The `inputType` parameter (`search_document`, `search_query`, `classification`, `clustering`) is mandatory for v3+ models. Mismatching input types between indexing and querying silently degrades results.
  3. RAG is first-class -- Pass `documents` directly to `chat()` and the model returns grounded answers with inline citations. No external retrieval framework is required for the grounding step.
  4. Rerank is a standalone primitive -- Score and reorder search results without building a full RAG pipeline. Feed it any list of documents and a query, and get relevance scores back.
  5. Citations are automatic -- When documents are provided (via RAG or tool results), the model generates fine-grained citations with start/end positions and source references.

When to use the Cohere SDK directly:

  • You want Cohere-specific features: rerank, citation grounding, multilingual embeddings
  • You need semantic search with embed + rerank pipeline
  • You want RAG with automatic inline citations
  • You are building on Cohere's platform (or Bedrock/Azure/OCI with Cohere models)

When NOT to use:

  • You need to switch between multiple LLM providers -- use a unified provider SDK
  • You want React-specific chat UI hooks -- use a framework-integrated AI SDK
  • You only need basic chat completion without Cohere differentiators
</philosophy>
<patterns>

Core Patterns

Pattern 1: Client Setup

Initialize `CohereClientV2`. The `token` parameter is required (pass it from the environment).

// lib/cohere.ts -- basic setup
import { CohereClientV2 } from "cohere-ai";

const client = new CohereClientV2({
  token: process.env.CO_API_KEY,
});

export { client };
// lib/cohere.ts -- production configuration
const TIMEOUT_MS = 30_000;

const client = new CohereClientV2({
  token: process.env.CO_API_KEY,
  timeout: TIMEOUT_MS,
});

Why good: Explicit token from env var, named timeout constant, named export

// BAD: Hardcoded key, default CohereClient (V1)
import { CohereClient } from "cohere-ai";
const client = new CohereClient({ token: "sk-abc123" });

Why bad: A hardcoded API key is a security risk, and `CohereClient` is the legacy V1 client

See: examples/core.md for error handling, platform configs (Bedrock, Azure)
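Since keys must come from the environment, a small guard that fails fast when the variable is missing avoids silent misconfiguration. A sketch -- the `requireEnv` helper and its error message are illustrative, not part of the SDK:

```typescript
// lib/env.ts -- fail fast when a required environment variable is missing
const requireEnv = (name: string): string => {
  const value = process.env[name];
  if (value === undefined || value.length === 0) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
};

export { requireEnv };

// Usage with the client:
// const client = new CohereClientV2({ token: requireEnv("CO_API_KEY") });
```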


Pattern 2: Chat Completion

V2 chat uses a `messages` array with `system`, `user`, `assistant`, and `tool` roles.

const response = await client.chat({
  model: "command-a-03-2025",
  messages: [
    { role: "system", content: "You are a helpful coding assistant." },
    { role: "user", content: "Explain TypeScript generics." },
  ],
});

console.log(response.message.content[0].text);

Why good: System message for instruction, `model` explicitly specified, correct V2 content access path

// BAD: Missing model (required in V2), wrong response access
const response = await client.chat({
  messages: [{ role: "user", content: "Hello" }],
});
console.log(response.text); // WRONG: V2 uses response.message.content[0].text

Why bad: V2 requires `model`, and the response shape is `response.message.content[0].text`, not `response.text`

See: examples/core.md for multi-turn, token tracking, temperature control
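Because `MAX_TOKENS` means the output was truncated, a guard after each call is cheap insurance. A sketch, assuming the V2 response exposes the finish reason as `finishReason`; the `isTruncated` helper is illustrative:

```typescript
// Detect truncated output before using a chat response.
// finishReason is assumed to be "MAX_TOKENS" when the output hit the limit.
interface FinishInfo {
  finishReason?: string;
}

const isTruncated = (response: FinishInfo): boolean =>
  response.finishReason === "MAX_TOKENS";

// const response = await client.chat({ ... });
// if (isTruncated(response)) {
//   // retry with a higher token budget, or warn that the answer is incomplete
// }
```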


Pattern 3: Streaming

Use `chatStream` with `for await` and check the event `type` for `content-delta`.

const stream = await client.chatStream({
  model: "command-a-03-2025",
  messages: [{ role: "user", content: "Explain async/await." }],
});

for await (const event of stream) {
  if (event.type === "content-delta") {
    process.stdout.write(event.delta?.message?.content?.text ?? "");
  }
}

Why good: Checks event type before accessing delta, handles nullable content safely

// BAD: Not checking event type
for await (const event of stream) {
  console.log(event.delta?.message); // Many events don't have message delta
}

Why bad: Only `content-delta` events have text content -- other events (`message-start`, `citation-start`, `tool-plan-delta`) have different shapes

See: examples/core.md for full streaming with all event types
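The event-type check generalizes to a small reducer that collects only text deltas. A sketch using a simplified event shape -- the `StreamEvent` interface here is narrowed from the SDK's real event union:

```typescript
// Collect the full response text from a sequence of stream events,
// ignoring non-text events such as message-start or citation-start.
interface StreamEvent {
  type: string;
  delta?: { message?: { content?: { text?: string } } };
}

const collectText = (events: StreamEvent[]): string => {
  let text = "";
  for (const event of events) {
    if (event.type === "content-delta") {
      text += event.delta?.message?.content?.text ?? "";
    }
  }
  return text;
};

// With a live stream, accumulate the same way inside the `for await` loop.
```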


Pattern 4: Embeddings

`inputType` is required for v3+ models. Mismatching types between indexing and querying silently degrades results.

const EMBEDDING_MODEL = "embed-v4.0";

// Index documents with search_document
const docEmbeddings = await client.embed({
  model: EMBEDDING_MODEL,
  inputType: "search_document",
  texts: ["TypeScript is a typed superset of JavaScript."],
  embeddingTypes: ["float"],
});

// Query with search_query
const queryEmbedding = await client.embed({
  model: EMBEDDING_MODEL,
  inputType: "search_query",
  texts: ["What is TypeScript?"],
  embeddingTypes: ["float"],
});

Why good: Correct `inputType` pairing, `embeddingTypes` explicitly specified, named model constant

// BAD: Same inputType for both indexing and querying
const docs = await client.embed({
  model: "embed-v4.0",
  inputType: "search_query", // WRONG for documents
  texts: documents,
  embeddingTypes: ["float"],
});

Why bad: Using `search_query` for document indexing silently produces worse similarity scores -- documents must use `search_document`

See: examples/embeddings-rerank.md for cosine similarity, dimension control, batch embedding
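Once you have correctly typed document and query embeddings, comparing them is plain vector math. A minimal cosine similarity helper (not part of the SDK; the commented usage assumes the V2 by-type embed response shape):

```typescript
// Cosine similarity between two equal-length embedding vectors.
const cosineSimilarity = (a: number[], b: number[]): number => {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
};

// const score = cosineSimilarity(
//   docEmbeddings.embeddings.float[0],
//   queryEmbedding.embeddings.float[0],
// );
```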


Pattern 5: Rerank

Score documents by relevance to a query. Returns ordered results with relevance scores.

const RERANK_MODEL = "rerank-v4.0-pro";
const TOP_N = 3;

const result = await client.rerank({
  model: RERANK_MODEL,
  query: "What is TypeScript?",
  documents: [
    "TypeScript is a typed superset of JavaScript.",
    "Python is a general-purpose language.",
    "TypeScript compiles to JavaScript.",
  ],
  topN: TOP_N,
});

for (const item of result.results) {
  console.log(`Doc ${item.index}: score ${item.relevanceScore}`);
}

Why good: Named constants, `topN` limits results, accesses `index` and `relevanceScore`

See: examples/embeddings-rerank.md for embed + rerank pipeline, rank fields


Pattern 6: RAG with Documents

Pass `documents` to `chat()` and the model returns grounded answers with inline citations.

const response = await client.chat({
  model: "command-a-03-2025",
  messages: [{ role: "user", content: "What is TypeScript?" }],
  documents: [
    {
      data: {
        text: "TypeScript is a typed superset of JavaScript.",
        title: "TS Docs",
      },
    },
    {
      data: {
        text: "TypeScript was developed by Microsoft.",
        title: "History",
      },
    },
  ],
});

console.log(response.message.content[0].text);

// Citations reference which documents support each claim
if (response.message.citations) {
  for (const citation of response.message.citations) {
    console.log(`"${citation.text}" from doc ${citation.sources}`);
  }
}

Why good: Documents passed inline with metadata, citations accessed from response, no external retrieval framework needed

See: examples/tools-rag.md for full RAG pipeline with embed + rerank + chat
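One way to wire rerank into RAG is a small adapter that keeps only the top reranked texts and wraps them in the V2 `{ data: { text } }` document shape. A sketch -- the `toChatDocuments` helper is illustrative, not an SDK function:

```typescript
// Turn rerank output into the V2 chat `documents` payload.
// Each rerank result carries the index of the original document.
interface RerankResultItem {
  index: number;
  relevanceScore: number;
}

const toChatDocuments = (
  results: RerankResultItem[],
  texts: string[],
): { data: { text: string } }[] =>
  results.map((item) => ({ data: { text: texts[item.index] } }));

// const reranked = await client.rerank({ model, query, documents: candidateTexts, topN: 3 });
// const documents = toChatDocuments(reranked.results, candidateTexts);
// const answer = await client.chat({ model, messages, documents });
```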


Pattern 7: Tool Use / Function Calling

4-step loop: user message -> model returns `tool_calls` -> execute tools -> return results with `tool_call_id`.
.

const tools = [
  {
    type: "function" as const,
    function: {
      name: "get_weather",
      description: "Get weather for a city",
      parameters: {
        type: "object",
        properties: {
          location: { type: "string", description: "City name" },
        },
        required: ["location"],
      },
    },
  },
];

const response = await client.chat({
  model: "command-a-03-2025",
  messages: [{ role: "user", content: "Weather in Paris?" }],
  tools,
});

// Check if model wants to call tools
if (response.message.toolCalls) {
  // See examples/tools-rag.md for the complete tool execution loop
}

Why good: Standard JSON Schema tool definition, checks for toolCalls before executing

See: examples/tools-rag.md for complete multi-step tool loop with tool result submission
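The execute step of the loop is a dispatch from tool name to a local function, with arguments arriving as a JSON string. A sketch of just that step -- the `get_weather` handler and `runToolCall` helper are illustrative, and the full loop with message appending lives in examples/tools-rag.md:

```typescript
// Map tool names to local implementations; arguments arrive as a JSON string.
type ToolHandler = (args: Record<string, unknown>) => string;

const handlers: Record<string, ToolHandler> = {
  get_weather: (args) => `Weather in ${String(args.location)}: 18C, cloudy`,
};

const runToolCall = (name: string, rawArguments: string): string => {
  const handler = handlers[name];
  if (handler === undefined) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return handler(JSON.parse(rawArguments) as Record<string, unknown>);
};

// For each tool call: append the full assistant message first, then append a
// tool-role message carrying the matching tool call id and the result string.
```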


Pattern 8: Error Handling

Catch `CohereError` for API errors and `CohereTimeoutError` for timeouts.

import { CohereError, CohereTimeoutError } from "cohere-ai";

try {
  const response = await client.chat({
    model: "command-a-03-2025",
    messages: [{ role: "user", content: "Hello" }],
  });
} catch (error) {
  if (error instanceof CohereTimeoutError) {
    console.error("Request timed out");
  } else if (error instanceof CohereError) {
    console.error(`API Error [${error.statusCode}]: ${error.message}`);
    console.error("Body:", error.body);
  } else {
    throw error; // Re-throw unknown errors
  }
}

Why good: Specific error types with status codes, re-throws unexpected errors, timeout handled separately

See: examples/core.md for production error handling patterns

</patterns>
<performance>

Performance Optimization

Model Selection for Cost/Speed

General purpose (best)      -> command-a-03-2025 (256K context, strongest)
Reasoning tasks             -> command-a-reasoning-08-2025 (multi-step reasoning)
Vision/document analysis    -> command-a-vision-07-2025 (images, charts, OCR)
Translation                 -> command-a-translate-08-2025 (23 languages)
Lightweight / edge          -> command-r7b-12-2024 (7B, fast, 128K context)
Legacy (still supported)    -> command-r-08-2024, command-r-plus-08-2024
Embeddings (best)           -> embed-v4.0 (multimodal, 128K context, flexible dims)
Embeddings (English)        -> embed-english-v3.0 (1024 dims)
Embeddings (multilingual)   -> embed-multilingual-v3.0 (23 languages)
Rerank (quality)            -> rerank-v4.0-pro (32K context, multilingual)
Rerank (speed)              -> rerank-v4.0-fast (32K context, latency-optimized)

Key Optimization Patterns

  • Batch embeddings -- pass up to 96 texts per `embed()` call instead of calling per-document
  • Use `topN` in rerank -- limit results to reduce response size and cost
  • Use `outputDimension` with embed-v4 -- reduce dimensions (256/512/1024) for faster similarity search at minimal quality loss
  • Check `finish_reason === "MAX_TOKENS"` -- detect truncated output
  • Use `temperature: 0` for deterministic output (enables caching)
  • Use embed-v4 `int8`/`binary` types for compressed storage with minimal quality loss
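Since `embed()` accepts at most 96 texts per call, larger corpora need client-side chunking. A sketch of a chunking helper (the constant name and `chunkTexts` helper are illustrative):

```typescript
// Split texts into chunks of at most EMBED_BATCH_LIMIT per embed() call.
const EMBED_BATCH_LIMIT = 96;

const chunkTexts = (
  texts: string[],
  size: number = EMBED_BATCH_LIMIT,
): string[][] => {
  const chunks: string[][] = [];
  for (let i = 0; i < texts.length; i += size) {
    chunks.push(texts.slice(i, i + size));
  }
  return chunks;
};

// for (const batch of chunkTexts(allTexts)) {
//   await client.embed({
//     model: "embed-v4.0",
//     inputType: "search_document",
//     texts: batch,
//     embeddingTypes: ["float"],
//   });
// }
```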
</performance>

<decision_framework>

Decision Framework

Which Client Class to Use

New project?
+-- YES -> CohereClientV2 (always)
+-- Existing V1 code?
    +-- Working fine? -> Keep CohereClient but plan migration
    +-- Need V2 features? -> Migrate to CohereClientV2

Which Model to Choose

What is your task?
+-- General chat/generation -> command-a-03-2025 (most capable)
+-- Reasoning / multi-step -> command-a-reasoning-08-2025
+-- Image/document analysis -> command-a-vision-07-2025
+-- Translation -> command-a-translate-08-2025
+-- Lightweight / low latency -> command-r7b-12-2024
+-- Embeddings -> embed-v4.0 (or embed-english-v3.0 for English-only)
+-- Rerank quality -> rerank-v4.0-pro
+-- Rerank speed -> rerank-v4.0-fast

Embed `inputType` Selection

What are you embedding?
+-- Documents for a search index -> "search_document"
+-- Search queries against an index -> "search_query"
+-- Text for a classifier -> "classification"
+-- Text for clustering -> "clustering"
+-- Images -> "image" (embed-v4+ only)

When to Use Rerank

Do you have search results to re-order?
+-- YES -> Use rerank as a second-stage ranker
|   +-- Quality matters most? -> rerank-v4.0-pro
|   +-- Latency matters most? -> rerank-v4.0-fast
+-- NO -> Not applicable (rerank needs existing results to score)

RAG Approach

Do you need grounded answers with citations?
+-- YES -> Pass documents to chat()
|   +-- Have pre-retrieved documents? -> Pass directly via documents param
|   +-- Need retrieval first? -> Use embed + vector search + rerank pipeline, then pass top results to chat()
+-- NO -> Use plain chat without documents

</decision_framework>


<red_flags>

RED FLAGS

High Priority Issues:

  • Using `CohereClient` instead of `CohereClientV2` for new code (V1 is legacy)
  • Missing `model` parameter in V2 API calls (required on every call, unlike V1)
  • Using the wrong `inputType` for embeddings (`search_query` for documents or vice versa -- silently degrades results)
  • Hardcoding API keys instead of using environment variables
  • Not appending the full assistant message (with `tool_calls`) before appending tool results in the tool use loop

Medium Priority Issues:

  • Not specifying `embeddingTypes` (defaults may not match your storage format)
  • Ignoring `finish_reason: "MAX_TOKENS"` (output was silently truncated)
  • Not handling `CohereTimeoutError` separately from `CohereError`
  • Processing all stream events without checking `type` (only `content-delta` has text)
  • Using V1 parameter names (`preamble`, `connectors`, `conversation_id`) with the V2 client

Common Mistakes:

  • Accessing `response.text` instead of `response.message.content[0].text` (the V2 response shape changed)
  • Forgetting that `embeddingTypes` is required in the V2 Embed API
  • Not matching `tool_call_id` when submitting tool results (the model cannot correlate results)
  • Using `documents` with string values instead of `{ data: { text: "..." } }` objects in V2
  • Expecting `response.message.citations` to exist when no documents were provided (citations only appear with grounded responses)

Gotchas & Edge Cases:

  • The SDK is in beta -- pin your `cohere-ai` version in package.json to avoid breaking changes
  • The V2 API is NOT yet supported for cloud deployments (Bedrock, SageMaker, Azure, OCI) -- use the V1 client for cloud platforms
  • `inputType` is camelCase in the TypeScript SDK (`inputType`) but snake_case in the REST API (`input_type`)
  • The Embed API accepts at most 96 texts per call -- batch larger sets yourself
  • `embed-v4.0` supports `outputDimension` for flexible sizing (256, 512, 1024, 1536), but v3 models have fixed dimensions
  • Rerank `relevanceScore` is normalized to 0-1 but not calibrated across queries -- compare scores within a single query only
  • Stream events include `tool-plan-delta` before `tool-call-start` -- the model's reasoning about which tool to call
  • V2 uses the `system` role for instructions (V1 used the `preamble` parameter)
  • Citation `sources` in tool use responses reference `tool_call_id` values, not document indices
  • The `clientName` constructor parameter is for logging/analytics, not authentication

</red_flags>


<critical_reminders>

CRITICAL REMINDERS

All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, `import type`, named constants)

(You MUST use `CohereClientV2` (not `CohereClient`) for all new code -- V2 is the current API with required `model` parameter)

(You MUST specify `inputType` on every embed call -- `search_document` for indexing, `search_query` for querying -- mismatched types produce garbage similarity scores)

(You MUST handle the tool use loop correctly: append the full assistant message (with `tool_calls`) to messages, then append `tool` role results with matching `tool_call_id`)

(You MUST check `finish_reason` in responses -- `MAX_TOKENS` means the output was truncated)

(You MUST never hardcode API keys -- pass the key via the `token` constructor parameter, sourced from environment variables)

Failure to follow these rules will produce broken embeddings, missing citations, or insecure AI integrations.

</critical_reminders>