Cohere Production Checklist
Overview
Complete go-live checklist for deploying Cohere API v2 integrations to production with safety gates, health checks, and rollback procedures.
Prerequisites
- Staging environment tested and verified
- Production API key (not trial) from dashboard.cohere.com
- Deployment pipeline configured
- Monitoring and alerting ready
Checklist
API & Authentication
- Using production API key (not trial — trial is rate-limited to 20 calls/min)
- `CO_API_KEY` stored in secret manager (Vault, AWS Secrets Manager, GCP Secret Manager)
- Key rotation procedure documented and tested
- Billing alerts configured at dashboard.cohere.com
- Using API v2 endpoints (`CohereClientV2`, not `CohereClient`)
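One way to back the key-handling items above is a startup guard that fails fast when the key is absent. This is a minimal sketch, assuming the secret manager injects the key into the environment as `CO_API_KEY`; `loadCohereApiKey`, `maskKey`, and the placeholder list are hypothetical names for illustration and are not part of the Cohere SDK.

```typescript
// Hypothetical startup guard; not part of the Cohere SDK.
type Env = Record<string, string | undefined>;

// Values that usually mean "nobody configured this" (illustrative list).
const PLACEHOLDERS = new Set(['', 'changeme', 'your-api-key']);

// Return the trimmed key, or throw so the process refuses to start.
function loadCohereApiKey(env: Env): string {
  const key = env['CO_API_KEY']?.trim();
  if (key === undefined || PLACEHOLDERS.has(key.toLowerCase())) {
    throw new Error('CO_API_KEY is missing or a placeholder; refusing to start');
  }
  return key;
}

// Mask a key for logs: keep only the last 4 characters.
function maskKey(key: string): string {
  return key.length <= 4 ? '****' : `****${key.slice(-4)}`;
}
```

In a Node.js service this would typically be called once at boot as `loadCohereApiKey(process.env)`, with `maskKey` used wherever the key needs to appear in diagnostics.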
Code Quality
- All API calls specify `model` parameter explicitly
- `embeddingTypes` set for all Embed calls (required for v3+)
- `inputType` set for all Embed calls (required for v3+)
- Error handling catches `CohereError` and `CohereTimeoutError`
- Retry logic with exponential backoff for 429 and 5xx
- No hardcoded API keys in source code
- Request/response logging excludes API keys and PII
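The retry item above could be implemented along these lines. This is a sketch rather than the SDK's own retry mechanism: `withRetry`, `isRetryable`, `backoffDelayMs`, and the default values are hypothetical, and it assumes the thrown error exposes the HTTP status as `statusCode`, which is how the health check endpoint later in this checklist reads it from `CohereError`.

```typescript
// Hypothetical helper: retries a call on HTTP 429 and 5xx with exponential backoff.
interface RetryOptions {
  maxAttempts?: number; // total attempts, including the first
  baseDelayMs?: number; // backoff window before the first retry
  maxDelayMs?: number;  // upper bound on any single backoff window
}

// Assumption: the error carries its HTTP status in `statusCode`.
function isRetryable(err: unknown): boolean {
  const status = (err as { statusCode?: number } | null)?.statusCode;
  return typeof status === 'number' && (status === 429 || (status >= 500 && status <= 599));
}

// attempt is 1-based: 1 -> base, 2 -> 2x base, 3 -> 4x base, capped at maxDelayMs.
function backoffDelayMs(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
}

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions = {}): Promise<T> {
  const { maxAttempts = 4, baseDelayMs = 500, maxDelayMs = 8_000 } = opts;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on the last attempt or on errors that a retry cannot fix (e.g. 401).
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      // Full jitter: wait a random time within the current backoff window.
      const delay = Math.random() * backoffDelayMs(attempt, baseDelayMs, maxDelayMs);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

A call site would wrap the SDK call in a closure, for example `withRetry(() => cohere.chat({ ... }))`. The jitter is a design choice to keep many clients from retrying in lockstep after a shared 429.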
Model Selection
- Correct model IDs used (not deprecated names):
| Use Case | Recommended Model | Fallback |
|---|---|---|
| Chat/generation | | |
| Lightweight chat | | |
| Embeddings | | |
| Reranking | | |
Performance
- Embed calls batched (up to 96 texts per request)
- Rerank calls limited to 1000 documents per request
- Streaming enabled for user-facing chat (`chatStream`)
- Connection pooling / keep-alive configured
- Response caching for repeated embed/rerank queries
- `maxTokens` set to prevent runaway generation costs
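The batching item above can be illustrated with a small helper that splits input texts into groups of at most 96 before each Embed request. This is a sketch under assumptions: `chunk` and `embedAll` are hypothetical names, and `embedBatch` stands in for whatever wrapper issues the real Embed call and returns one vector per text in order.

```typescript
// Batch limit for Embed stated in the checklist above.
const EMBED_MAX_TEXTS = 96;

// Split items into consecutive batches of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// `embedBatch` is a hypothetical callback wrapping the real Embed request.
// Batches are sent sequentially so output order matches input order.
async function embedAll(
  texts: readonly string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of chunk(texts, EMBED_MAX_TEXTS)) {
    vectors.push(...(await embedBatch(batch)));
  }
  return vectors;
}
```

Sending batches one at a time keeps the request rate predictable; running them concurrently would be faster but is more likely to hit 429 responses.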
Health Check Endpoint
    // /api/health
    import { CohereClientV2, CohereError } from 'cohere-ai';

    const cohere = new CohereClientV2();

    export async function GET() {
      const start = Date.now();
      let cohereStatus: 'healthy' | 'degraded' | 'down' = 'down';

      try {
        // Cheapest possible health check: minimal chat
        await cohere.chat({
          model: 'command-r7b-12-2024',
          messages: [{ role: 'user', content: 'ping' }],
          maxTokens: 1,
        });
        cohereStatus = 'healthy';
      } catch (err) {
        if (err instanceof CohereError && err.statusCode === 429) {
          cohereStatus = 'degraded'; // Rate limited but reachable
        }
      }

      return Response.json({
        status: cohereStatus === 'healthy' ? 'ok' : 'degraded',
        cohere: {
          status: cohereStatus,
          latencyMs: Date.now() - start,
        },
        timestamp: new Date().toISOString(),
      });
    }
Circuit Breaker
    class CohereCircuitBreaker {
      private failures = 0;
      private lastFailure = 0;
      private state: 'closed' | 'open' | 'half-open' = 'closed';

      constructor(
        private threshold = 5,
        private resetMs = 60_000
      ) {}

      async call<T>(fn: () => Promise<T>, fallback?: () => T): Promise<T> {
        if (this.state === 'open') {
          if (Date.now() - this.lastFailure > this.resetMs) {
            this.state = 'half-open';
          } else if (fallback) {
            return fallback();
          } else {
            throw new Error('Cohere circuit breaker is open');
          }
        }

        try {
          const result = await fn();
          this.failures = 0;
          this.state = 'closed';
          return result;
        } catch (err) {
          this.failures++;
          this.lastFailure = Date.now();
          if (this.failures >= this.threshold) {
            this.state = 'open';
            console.error(`Cohere circuit breaker OPEN after ${this.failures} failures`);
          }
          throw err;
        }
      }
    }

    const breaker = new CohereCircuitBreaker();
Gradual Rollout
    # Pre-flight
    curl -sf https://staging.example.com/api/health | jq '.cohere'
    curl -s https://status.cohere.com/api/v2/status.json | jq '.status'

    # Deploy with canary (10% traffic)
    kubectl apply -f k8s/production.yaml
    kubectl rollout pause deployment/app

    # Monitor for 10 minutes: error rate, latency, 429s
    # Check: No increase in CohereError rate
    # Check: P95 latency < 5s for chat, < 500ms for embed/rerank

    # Proceed to 100%
    kubectl rollout resume deployment/app
    kubectl rollout status deployment/app
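The two checks in the rollout comments above (no increase in `CohereError` rate, and P95 latency under 5 s for chat and under 500 ms for embed/rerank) can be turned into an explicit gate before resuming the rollout. The sketch below assumes latency samples and error rates for the 10-minute window have already been exported from the monitoring system; `CanaryWindow`, `percentile`, and `canaryGate` are hypothetical names, and the nearest-rank percentile is one of several common definitions.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new RangeError('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Hypothetical shape for metrics collected during the canary window.
interface CanaryWindow {
  baselineErrorRate: number;        // CohereError rate before the deploy, 0..1
  canaryErrorRate: number;          // CohereError rate during the canary, 0..1
  chatLatenciesMs: number[];        // per-request chat latencies
  embedRerankLatenciesMs: number[]; // per-request embed/rerank latencies
}

// Returns proceed=false with one reason per failed check.
function canaryGate(w: CanaryWindow): { proceed: boolean; reasons: string[] } {
  const reasons: string[] = [];
  if (w.canaryErrorRate > w.baselineErrorRate) {
    reasons.push('CohereError rate increased');
  }
  if (percentile(w.chatLatenciesMs, 95) >= 5_000) {
    reasons.push('chat P95 latency is not below 5s');
  }
  if (percentile(w.embedRerankLatenciesMs, 95) >= 500) {
    reasons.push('embed/rerank P95 latency is not below 500ms');
  }
  return { proceed: reasons.length === 0, reasons };
}
```

An empty latency array throws instead of passing, so a canary that received no traffic cannot be promoted by accident.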
Monitoring Alerts
| Alert | Condition | Severity |
|---|---|---|
| Cohere unreachable | Health check fails 3x | P1 |
| High error rate | 5xx > 5% of requests/5min | P1 |
| Rate limited | 429 > 10/min | P2 |
| High latency | Chat P95 > 10s | P2 |
| Auth failure | Any 401 response | P1 |
| Budget exceeded | Daily token cost > threshold | P2 |
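The alert table above maps onto a small rule evaluator. This is an illustrative sketch, not a replacement for a real alerting system: `WindowMetrics` and `evaluateAlerts` are hypothetical, "fails 3x" is read as three consecutive failures, and the budget threshold is a value the operator supplies.

```typescript
type Severity = 'P1' | 'P2';

// Hypothetical metrics snapshot; field names are illustrative.
interface WindowMetrics {
  consecutiveHealthCheckFailures: number;
  requestsLast5Min: number;
  serverErrorsLast5Min: number; // 5xx responses
  rateLimitedLastMin: number;   // 429 responses
  chatP95Ms: number;
  unauthorizedCount: number;    // 401 responses
  dailyTokenCost: number;
  dailyBudget: number;          // operator-chosen threshold
}

interface Alert {
  name: string;
  severity: Severity;
}

// Evaluate each row of the alert table and return the alerts that fire.
function evaluateAlerts(m: WindowMetrics): Alert[] {
  const errorRate = m.requestsLast5Min > 0 ? m.serverErrorsLast5Min / m.requestsLast5Min : 0;
  const rules: Array<[string, Severity, boolean]> = [
    ['Cohere unreachable', 'P1', m.consecutiveHealthCheckFailures >= 3],
    ['High error rate', 'P1', errorRate > 0.05],
    ['Rate limited', 'P2', m.rateLimitedLastMin > 10],
    ['High latency', 'P2', m.chatP95Ms > 10_000],
    ['Auth failure', 'P1', m.unauthorizedCount > 0],
    ['Budget exceeded', 'P2', m.dailyTokenCost > m.dailyBudget],
  ];
  return rules
    .filter(([, , fired]) => fired)
    .map(([name, severity]) => ({ name, severity }));
}
```

Keeping the rules in one table-like array makes it easy to compare the code against the alert table when thresholds change.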
Rollback
    # Immediate rollback
    kubectl rollout undo deployment/app
    kubectl rollout status deployment/app

    # Verify rollback
    curl -sf https://api.example.com/api/health | jq '.cohere'
Output
- Production-ready Cohere integration with health checks
- Circuit breaker preventing cascade failures
- Monitoring alerts for Cohere-specific error conditions
- Documented rollback procedure
Next Steps
For version upgrades, see `cohere-upgrade-migration`.