Claude-code-plugins-plus-skills cohere-migration-deep-dive
install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/cohere-pack/skills/cohere-migration-deep-dive" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-cohere-migration-deep-dive && rm -rf "$T"
manifest:
plugins/saas-packs/cohere-pack/skills/cohere-migration-deep-dive/SKILL.mdsource content
Cohere Migration Deep Dive
Overview
Comprehensive guide for migrating to Cohere from OpenAI, Anthropic, or other LLM providers, including embedding re-vectorization, prompt adaptation, and gradual traffic shifting.
Prerequisites
- Current LLM integration documented
- Cohere API key and SDK installed
- Feature flag infrastructure
- Rollback strategy
Migration Types
| From | Complexity | Duration | Key Challenge |
|---|---|---|---|
| OpenAI → Cohere | Medium | 1-2 weeks | Prompt adaptation, embedding migration |
| Anthropic → Cohere | Medium | 1-2 weeks | Message format, tool definitions |
| Custom/OSS → Cohere | Low | Days | SDK integration |
| Embedding migration | High | 2-4 weeks | Re-vectorize entire corpus |
Instructions
Step 1: OpenAI to Cohere Chat Migration
// --- OpenAI (before) --- import OpenAI from 'openai'; const openai = new OpenAI(); const response = await openai.chat.completions.create({ model: 'gpt-4o', messages: [ { role: 'system', content: 'You are helpful.' }, { role: 'user', content: 'Hello' }, ], max_tokens: 500, temperature: 0.7, }); const text = response.choices[0].message.content; // --- Cohere (after) --- import { CohereClientV2 } from 'cohere-ai'; const cohere = new CohereClientV2(); const response = await cohere.chat({ model: 'command-a-03-2025', // GPT-4o equivalent messages: [ { role: 'system', content: 'You are helpful.' }, // Same format! { role: 'user', content: 'Hello' }, ], maxTokens: 500, // camelCase, not snake_case temperature: 0.7, }); const text = response.message?.content?.[0]?.text; // Different response shape
Step 2: Embedding Migration
// OpenAI embeddings: 3072 dims (text-embedding-3-large) // Cohere embeddings: 1024 dims (embed-v4.0) // IMPORTANT: You CANNOT mix embeddings from different models in the same vector DB // Migration plan: // 1. Create new vector collection with Cohere dimensions // 2. Re-embed all documents with Cohere // 3. Switch queries to new collection // 4. Delete old collection async function migrateEmbeddings( documents: Array<{ id: string; text: string }>, batchSize = 96 ) { const cohere = new CohereClientV2(); let processed = 0; for (let i = 0; i < documents.length; i += batchSize) { const batch = documents.slice(i, i + batchSize); const response = await cohere.embed({ model: 'embed-v4.0', texts: batch.map(d => d.text), inputType: 'search_document', embeddingTypes: ['float'], }); // Upsert to new vector collection for (let j = 0; j < batch.length; j++) { await vectorDB.upsert({ collection: 'docs-cohere', // New collection id: batch[j].id, vector: response.embeddings.float[j], metadata: { text: batch[j].text }, }); } processed += batch.length; console.log(`Migrated ${processed}/${documents.length} embeddings`); } }
Step 3: Tool Use Migration
// --- OpenAI tools --- const openaiTools = [{ type: 'function', function: { name: 'get_weather', description: 'Get weather', parameters: { type: 'object', properties: { city: { type: 'string' } }, required: ['city'], }, }, }]; // --- Cohere tools (same format in v2!) --- const cohereTools = [{ type: 'function', function: { name: 'get_weather', description: 'Get weather', parameters: { type: 'object', properties: { city: { type: 'string' } }, required: ['city'], }, }, }]; // Tool definitions are identical! The difference is in response handling. // OpenAI: response.choices[0].message.tool_calls // Cohere: response.message?.toolCalls
Step 4: Streaming Migration
// --- OpenAI streaming --- const openaiStream = await openai.chat.completions.create({ model: 'gpt-4o', messages: [...], stream: true, }); for await (const chunk of openaiStream) { process.stdout.write(chunk.choices[0]?.delta?.content ?? ''); } // --- Cohere streaming --- const cohereStream = await cohere.chatStream({ model: 'command-a-03-2025', messages: [...], }); for await (const event of cohereStream) { if (event.type === 'content-delta') { process.stdout.write(event.delta?.message?.content?.text ?? ''); } }
Step 5: Adapter Pattern for Gradual Migration
interface LLMAdapter { chat(message: string, options?: { system?: string; maxTokens?: number }): Promise<string>; embed(texts: string[]): Promise<number[][]>; rerank(query: string, docs: string[], topN?: number): Promise<Array<{ index: number; score: number }>>; } class CohereAdapter implements LLMAdapter { private client = new CohereClientV2(); async chat(message: string, options?: { system?: string; maxTokens?: number }): Promise<string> { const messages: any[] = []; if (options?.system) messages.push({ role: 'system', content: options.system }); messages.push({ role: 'user', content: message }); const response = await this.client.chat({ model: 'command-a-03-2025', messages, maxTokens: options?.maxTokens, }); return response.message?.content?.[0]?.text ?? ''; } async embed(texts: string[]): Promise<number[][]> { const response = await this.client.embed({ model: 'embed-v4.0', texts, inputType: 'search_document', embeddingTypes: ['float'], }); return response.embeddings.float; } async rerank(query: string, docs: string[], topN = 5): Promise<Array<{ index: number; score: number }>> { const response = await this.client.rerank({ model: 'rerank-v3.5', query, documents: docs, topN, }); return response.results.map(r => ({ index: r.index, score: r.relevanceScore })); } } class OpenAIAdapter implements LLMAdapter { // ... OpenAI implementation } // Traffic splitting via feature flag function getLLMAdapter(): LLMAdapter { const coherePercentage = getFeatureFlag('cohere_migration_pct'); // 0-100 if (Math.random() * 100 < coherePercentage) { return new CohereAdapter(); } return new OpenAIAdapter(); }
Step 6: Validation and Comparison
async function compareOutputs(message: string): Promise<{ openai: string; cohere: string; latencyMs: { openai: number; cohere: number }; }> { const startOpenAI = Date.now(); const openaiResult = await openaiAdapter.chat(message); const openaiLatency = Date.now() - startOpenAI; const startCohere = Date.now(); const cohereResult = await cohereAdapter.chat(message); const cohereLatency = Date.now() - startCohere; return { openai: openaiResult, cohere: cohereResult, latencyMs: { openai: openaiLatency, cohere: cohereLatency }, }; } // Run comparison on sample queries during migration const testQueries = ['Summarize this text', 'Translate to French', 'Extract key points']; for (const q of testQueries) { const result = await compareOutputs(q); console.log(`Query: ${q}`); console.log(`OpenAI (${result.latencyMs.openai}ms): ${result.openai.slice(0, 100)}`); console.log(`Cohere (${result.latencyMs.cohere}ms): ${result.cohere.slice(0, 100)}`); }
Cohere-Unique Features (Not in OpenAI)
| Feature | Cohere | OpenAI |
|---|---|---|
| Built-in Rerank | | Not available |
| RAG with citations | param + citations | Manual implementation |
| Connectors (data sources) | param | Not available |
| Classify endpoint | | Not available |
| Safety modes | param | Moderation API (separate) |
Rollback Plan
# Set feature flag to 0% Cohere traffic curl -X POST https://flagservice/flags/cohere_migration_pct -d '{"value": 0}' # Verify traffic is back on old provider # Monitor error rates for 15 minutes # If stable, migration is paused safely
Output
- Adapter layer abstracting LLM provider
- Embedding migration with batch processing
- A/B comparison for output quality validation
- Feature-flag controlled traffic shifting
- Rollback via feature flag (instant, no deploy)
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Embedding dimension mismatch | Mixed providers in same DB | Separate collections per provider |
| Response shape different | Provider-specific format | Use adapter pattern |
| Higher latency on Cohere | Different model size | Try command-r7b for speed |
| Quality difference | Different model strengths | Tune system prompts per provider |
Resources
Next Steps
For Cohere-specific architecture patterns, see
cohere-reference-architecture.