Claude-code-plugins-plus-skills cohere-migration-deep-dive

install

source · Clone the upstream repo

git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/cohere-pack/skills/cohere-migration-deep-dive" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-cohere-migration-deep-dive && rm -rf "$T"

manifest: plugins/saas-packs/cohere-pack/skills/cohere-migration-deep-dive/SKILL.md

Cohere Migration Deep Dive

Overview

Comprehensive guide for migrating to Cohere from OpenAI, Anthropic, or other LLM providers, including embedding re-vectorization, prompt adaptation, and gradual traffic shifting.

Prerequisites

Current LLM integration documented
Cohere API key and SDK installed
Feature flag infrastructure
Rollback strategy

Migration Types

Migration	Complexity	Duration	Key Challenge
OpenAI → Cohere	Medium	1-2 weeks	Prompt adaptation, embedding migration
Anthropic → Cohere	Medium	1-2 weeks	Message format, tool definitions
Custom/OSS → Cohere	Low	Days	SDK integration
Embedding migration	High	2-4 weeks	Re-vectorize entire corpus

Instructions

Step 1: OpenAI to Cohere Chat Migration

// --- OpenAI (before) ---
import OpenAI from 'openai';
const openai = new OpenAI();

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello' },
  ],
  max_tokens: 500,
  temperature: 0.7,
});
const text = response.choices[0].message.content;

// --- Cohere (after) ---
import { CohereClientV2 } from 'cohere-ai';
const cohere = new CohereClientV2();

const response = await cohere.chat({
  model: 'command-a-03-2025',   // Assumed closest counterpart to gpt-4o; validate on your own prompts
  messages: [
    { role: 'system', content: 'You are helpful.' },  // Same format!
    { role: 'user', content: 'Hello' },
  ],
  maxTokens: 500,               // camelCase, not snake_case
  temperature: 0.7,
});
const text = response.message?.content?.[0]?.text;  // Different response shape
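The before/after pair above changes three things: the model name, the casing of request parameters, and the response shape. A minimal sketch of a translation layer follows, assuming only the fields shown here matter to your calls. `OpenAIChatParams`, `CohereChatParams`, `MODEL_MAP`, `toCohereParams`, and `extractCohereText` are hypothetical names introduced for illustration, and the `top_p` to `p` and `stop` to `stopSequences` renames should be checked against the SDK version you install. Parameter ranges and defaults may also differ between providers.

```typescript
// Hypothetical local types: only the fields used in this sketch.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

interface OpenAIChatParams {
  model: string;
  messages: ChatMessage[];
  max_tokens?: number;
  temperature?: number;
  top_p?: number;
  stop?: string[];
}

interface CohereChatParams {
  model: string;
  messages: ChatMessage[];
  maxTokens?: number;
  temperature?: number;
  p?: number;
  stopSequences?: string[];
}

// Assumed model mapping; choose targets based on your own evaluation.
const MODEL_MAP: Record<string, string> = {
  'gpt-4o': 'command-a-03-2025',
};

// Translates snake_case OpenAI parameters into the camelCase Cohere equivalents.
function toCohereParams(params: OpenAIChatParams): CohereChatParams {
  const model = MODEL_MAP[params.model];
  if (!model) throw new Error(`No Cohere mapping for model: ${params.model}`);
  return {
    model,
    messages: params.messages, // Message format is shared, so it passes through unchanged
    maxTokens: params.max_tokens,
    temperature: params.temperature,
    p: params.top_p,
    stopSequences: params.stop,
  };
}

// Joins every text item in a v2-style response instead of reading only content[0].
function extractCohereText(response: {
  message?: { content?: Array<{ type: string; text?: string }> };
}): string {
  return (response.message?.content ?? [])
    .filter(item => item.type === 'text')
    .map(item => item.text ?? '')
    .join('');
}
```

Failing loudly on an unmapped model is a deliberate choice here: a silent default would hide calls that were never reviewed for migration.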

Step 2: Embedding Migration

// OpenAI embeddings: 3072 dims (text-embedding-3-large)
// Cohere embeddings: 1536 dims by default (embed-v4.0); 256, 512, and 1024 are also selectable
// IMPORTANT: You CANNOT mix embeddings from different models in the same collection or index

// Migration plan:
// 1. Create new vector collection with Cohere dimensions
// 2. Re-embed all documents with Cohere
// 3. Switch queries to new collection
// 4. Delete old collection

async function migrateEmbeddings(
  documents: Array<{ id: string; text: string }>,
  batchSize = 96
) {
  const cohere = new CohereClientV2();
  let processed = 0;

  for (let i = 0; i < documents.length; i += batchSize) {
    const batch = documents.slice(i, i + batchSize);

    const response = await cohere.embed({
      model: 'embed-v4.0',
      texts: batch.map(d => d.text),
      inputType: 'search_document',
      embeddingTypes: ['float'],
    });

    // Upsert to the new vector collection.
    // `vectorDB` is a placeholder for your vector store client.
    const vectors = response.embeddings.float ?? [];
    for (let j = 0; j < batch.length; j++) {
      await vectorDB.upsert({
        collection: 'docs-cohere', // New collection
        id: batch[j].id,
        vector: vectors[j],
        metadata: { text: batch[j].text },
      });
    }

    processed += batch.length;
    console.log(`Migrated ${processed}/${documents.length} embeddings`);
  }
}
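Because a dimension mismatch usually surfaces only when the vector store rejects a write, it can help to validate each batch before upserting. The sketch below shows the batching step from the loop above as a reusable helper, plus a shape check. `chunk` and `assertEmbeddingShape` are hypothetical helper names, and the expected dimension is whatever your new collection was created with.

```typescript
// Splits items into batches of at most `size` (the migration above uses 96 per embed call).
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Throws if the provider returned a different count or dimension than the collection expects.
function assertEmbeddingShape(
  vectors: number[][],
  expectedCount: number,
  expectedDim: number
): void {
  if (vectors.length !== expectedCount) {
    throw new Error(`Expected ${expectedCount} vectors, got ${vectors.length}`);
  }
  vectors.forEach((vector, i) => {
    if (vector.length !== expectedDim) {
      throw new Error(`Vector ${i} has ${vector.length} dims, expected ${expectedDim}`);
    }
  });
}
```

Calling `assertEmbeddingShape(vectors, batch.length, dim)` right after each embed call stops the run at the first bad batch rather than leaving a partially corrupted collection.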

Step 3: Tool Use Migration

// --- OpenAI tools ---
const openaiTools = [{
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get weather',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
  },
}];

// --- Cohere tools (same format in v2!) ---
const cohereTools = [{
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get weather',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
  },
}];
// Tool definitions are identical! The difference is in response handling.

// OpenAI: response.choices[0].message.tool_calls
// Cohere: response.message?.toolCalls
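Since the difference lies in response handling, a short sketch of that step may help. It parses the requested tool calls, runs a local handler for each, and builds the follow-up `tool` messages. The `ToolCall` type, the `handlers` registry, and `runToolCalls` are hypothetical. The sketch also assumes that arguments arrive as a JSON string and that result messages take a `toolCallId` with string content, so verify both against the SDK types you have installed.

```typescript
// Minimal local shape for a tool call; fields are optional to force defensive parsing.
interface ToolCall {
  id?: string;
  type?: string;
  function?: { name?: string; arguments?: string };
}

type ToolHandler = (args: Record<string, unknown>) => unknown;

// Hypothetical local implementation of the get_weather tool defined above.
const handlers: Record<string, ToolHandler> = {
  get_weather: args => ({ city: args['city'], forecast: 'sunny' }),
};

// Runs each requested tool and builds the follow-up `tool` messages.
// Assumption: result messages use `toolCallId` plus string content.
function runToolCalls(toolCalls: ToolCall[] = []) {
  return toolCalls.map(call => {
    const name = call.function?.name ?? '';
    const handler = handlers[name];
    if (!handler) throw new Error(`Unknown tool: ${name}`);
    // Arguments are assumed to be a JSON string and are parsed before use.
    const args = JSON.parse(call.function?.arguments ?? '{}') as Record<string, unknown>;
    return {
      role: 'tool' as const,
      toolCallId: call.id ?? '',
      content: JSON.stringify(handler(args)),
    };
  });
}
```

The returned messages would be appended to the conversation, after the assistant message that requested the tools, before calling chat again.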

Step 4: Streaming Migration

// --- OpenAI streaming ---
const openaiStream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [...],
  stream: true,
});
for await (const chunk of openaiStream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}

// --- Cohere streaming ---
const cohereStream = await cohere.chatStream({
  model: 'command-a-03-2025',
  messages: [...],
});
for await (const event of cohereStream) {
  if (event.type === 'content-delta') {
    process.stdout.write(event.delta?.message?.content?.text ?? '');
  }
}
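Callers that previously concatenated OpenAI chunks often need the full text as well as the live tokens. The sketch below wraps the event filtering from the Cohere loop above in a small collector. `StreamEvent` is a hypothetical local type covering only the fields read here, and `deltaText` and `collectStreamText` are illustrative names.

```typescript
// Local event shape covering only what this sketch reads from the stream.
interface StreamEvent {
  type: string;
  delta?: { message?: { content?: { text?: string } } };
}

// Returns the text carried by a content-delta event, or '' for any other event type.
function deltaText(event: StreamEvent): string {
  if (event.type !== 'content-delta') return '';
  return event.delta?.message?.content?.text ?? '';
}

// Consumes any async iterable of events, forwards tokens, and returns the full text.
async function collectStreamText(
  stream: AsyncIterable<StreamEvent>,
  onToken?: (token: string) => void
): Promise<string> {
  let full = '';
  for await (const event of stream) {
    const text = deltaText(event);
    if (text) {
      onToken?.(text);
      full += text;
    }
  }
  return full;
}
```

Passing `token => process.stdout.write(token)` as `onToken` reproduces the printing behavior of the loop above while still returning the complete string.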

Step 5: Adapter Pattern for Gradual Migration

interface LLMAdapter {
  chat(message: string, options?: { system?: string; maxTokens?: number }): Promise<string>;
  embed(texts: string[]): Promise<number[][]>;
  rerank(query: string, docs: string[], topN?: number): Promise<Array<{ index: number; score: number }>>;
}

class CohereAdapter implements LLMAdapter {
  private client = new CohereClientV2();

  async chat(message: string, options?: { system?: string; maxTokens?: number }): Promise<string> {
    const messages: any[] = [];
    if (options?.system) messages.push({ role: 'system', content: options.system });
    messages.push({ role: 'user', content: message });

    const response = await this.client.chat({
      model: 'command-a-03-2025',
      messages,
      maxTokens: options?.maxTokens,
    });
    return response.message?.content?.[0]?.text ?? '';
  }

  async embed(texts: string[]): Promise<number[][]> {
    const response = await this.client.embed({
      model: 'embed-v4.0',
      texts,
      inputType: 'search_document', // Use 'search_query' when embedding user queries
      embeddingTypes: ['float'],
    });
    return response.embeddings.float ?? [];
  }

  async rerank(query: string, docs: string[], topN = 5): Promise<Array<{ index: number; score: number }>> {
    const response = await this.client.rerank({
      model: 'rerank-v3.5',
      query,
      documents: docs,
      topN,
    });
    return response.results.map(r => ({ index: r.index, score: r.relevanceScore }));
  }
}

class OpenAIAdapter implements LLMAdapter {
  // ... OpenAI implementation
}

// Traffic splitting via feature flag
function getLLMAdapter(): LLMAdapter {
  const coherePercentage = getFeatureFlag('cohere_migration_pct'); // 0-100
  if (Math.random() * 100 < coherePercentage) {
    return new CohereAdapter();
  }
  return new OpenAIAdapter();
}
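One caveat with `Math.random()` in `getLLMAdapter` is that the same user can land on a different provider on every request, which makes quality comparisons and bug reports harder to interpret. A common alternative is deterministic bucketing on a stable key. The sketch below is one way to do that, assuming you have a user or session ID available. `fnv1a`, `bucketFor`, and `useCohere` are hypothetical helpers, not part of any SDK.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units: small, deterministic, dependency-free.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // Keep the value an unsigned 32-bit integer
  }
  return hash;
}

// Maps a stable key (for example a user ID) to a bucket in [0, 100).
function bucketFor(key: string): number {
  return fnv1a(key) % 100;
}

// True when the key falls inside the rollout percentage (0-100).
function useCohere(key: string, coherePercentage: number): boolean {
  return bucketFor(key) < coherePercentage;
}
```

With this scheme, raising the flag from 10 to 25 only moves new users onto Cohere; everyone already in the first 10 buckets stays there.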

Step 6: Validation and Comparison

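// Instantiate both adapters once and reuse them across comparisons
const openaiAdapter = new OpenAIAdapter();
const cohereAdapter = new CohereAdapter();

async function compareOutputs(message: string): Promise<{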
  openai: string;
  cohere: string;
  latencyMs: { openai: number; cohere: number };
}> {
  const startOpenAI = Date.now();
  const openaiResult = await openaiAdapter.chat(message);
  const openaiLatency = Date.now() - startOpenAI;

  const startCohere = Date.now();
  const cohereResult = await cohereAdapter.chat(message);
  const cohereLatency = Date.now() - startCohere;

  return {
    openai: openaiResult,
    cohere: cohereResult,
    latencyMs: { openai: openaiLatency, cohere: cohereLatency },
  };
}

// Run comparison on sample queries during migration
const testQueries = ['Summarize this text', 'Translate to French', 'Extract key points'];
for (const q of testQueries) {
  const result = await compareOutputs(q);
  console.log(`Query: ${q}`);
  console.log(`OpenAI (${result.latencyMs.openai}ms): ${result.openai.slice(0, 100)}`);
  console.log(`Cohere (${result.latencyMs.cohere}ms): ${result.cohere.slice(0, 100)}`);
}
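Printing the first 100 characters is fine for spot checks, but a larger sample benefits from numbers. The sketch below adds two rough measures: word-level Jaccard overlap between the two outputs, and a nearest-rank latency percentile. Both are assumptions about what is worth tracking, not part of the original comparison, and lexical overlap is only a weak proxy for quality (the word splitter also handles accented text poorly).

```typescript
// Lowercased word set used for a rough lexical comparison.
function tokenSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

// Jaccard similarity in [0, 1]: shared words divided by total distinct words.
function jaccard(a: string, b: string): number {
  const setA = tokenSet(a);
  const setB = tokenSet(b);
  if (setA.size === 0 && setB.size === 0) return 1; // Two empty outputs count as identical
  let shared = 0;
  setA.forEach(word => {
    if (setB.has(word)) shared++;
  });
  return shared / (setA.size + setB.size - shared);
}

// Nearest-rank percentile (p in 0-100) over latency samples in milliseconds.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((x, y) => x - y);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}
```

Low overlap is a signal to read the pair by hand, not a verdict; two good summaries can share few words.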

Cohere-Unique Features (Not in OpenAI)

Feature	Cohere	OpenAI
Built-in Rerank	`cohere.rerank()`	Not available
RAG with citations	`documents` param + citations	Manual implementation
Connectors (data sources)	`connectors` param (v1 Chat API only; not supported by the v2 client)	Not available
Classify endpoint	`cohere.classify()`	Not available
Safety modes	`safetyMode` param	Moderation API (separate)
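Rerank returns indices and relevance scores, not documents, so callers need to map results back to their own records. The sketch below shows one way to do that with the `{ index, score }` pairs produced by the adapter's `rerank` method. `applyRerank` and the `minScore` cutoff are hypothetical additions for illustration.

```typescript
interface RerankResult {
  index: number;
  score: number;
}

// Reorders the original documents by rerank output, dropping out-of-range
// indices and any result scoring below `minScore`.
function applyRerank<T>(
  docs: T[],
  results: RerankResult[],
  minScore = 0
): Array<{ doc: T; score: number }> {
  return results
    .filter(r => r.index >= 0 && r.index < docs.length && r.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .map(r => ({ doc: docs[r.index], score: r.score }));
}
```

Keeping the function generic over `T` lets the same helper reorder plain strings or full records with metadata.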

Rollback Plan

# Set feature flag to 0% Cohere traffic (flagservice is a placeholder for your flag provider's API)
curl -X POST https://flagservice/flags/cohere_migration_pct -H 'Content-Type: application/json' -d '{"value": 0}'

# Verify traffic is back on old provider
# Monitor error rates for 15 minutes
# If stable, migration is paused safely
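The monitoring step above can also drive an automatic decision. The sketch below aggregates per-minute request and error counts over the observation window and reports whether the error rate exceeds a threshold. `WindowStats`, `shouldRollback`, the 5% threshold, and the minimum sample size are all assumptions to tune against your own traffic.

```typescript
interface WindowStats {
  requests: number;
  errors: number;
}

// Decides whether to roll back, given per-minute stats for the monitoring window.
// Threshold and minimum sample size are assumptions; tune them to your traffic.
function shouldRollback(
  windows: WindowStats[],
  maxErrorRate = 0.05,
  minRequests = 50
): boolean {
  const total = windows.reduce(
    (acc, w) => ({ requests: acc.requests + w.requests, errors: acc.errors + w.errors }),
    { requests: 0, errors: 0 }
  );
  if (total.requests < minRequests) return false; // Not enough data to judge
  return total.errors / total.requests > maxErrorRate;
}
```

Aggregating across the whole window, instead of reacting to a single bad minute, avoids flapping the flag on short error spikes.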

Output

Adapter layer abstracting LLM provider
Embedding migration with batch processing
A/B comparison for output quality validation
Feature-flag controlled traffic shifting
Rollback via feature flag (instant, no deploy)

Error Handling

Issue	Cause	Solution
Embedding dimension mismatch	Mixed providers in same DB	Separate collections per provider
Response shape different	Provider-specific format	Use adapter pattern
Higher latency on Cohere	Different model size	Try the smaller command-r7b-12-2024 model for speed
Quality difference	Different model strengths	Tune system prompts per provider

Next Steps

For Cohere-specific architecture patterns, see cohere-reference-architecture.