Claude-code-plugins groq-prod-checklist
install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/groq-pack/skills/groq-prod-checklist" ~/.claude/skills/jeremylongshore-claude-code-plugins-groq-prod-checklist && rm -rf "$T"
manifest:
plugins/saas-packs/groq-pack/skills/groq-prod-checklist/SKILL.mdsource content
Groq Production Checklist
Overview
Complete pre-launch checklist for deploying Groq-powered applications to production. Covers API key security, model selection, rate limit planning, fallback strategies, and monitoring setup.
Prerequisites
- Staging environment tested with Groq API
- Groq Developer or Enterprise plan (free tier is not suitable for production)
- Production API key created in console.groq.com
- Monitoring and alerting infrastructure ready
Pre-Deployment Checklist
API Key & Auth
- Production API key stored in secret manager (not
files).env - Key is NOT shared with development or staging environments
- Key rotation procedure documented and tested
- Pre-commit hook blocks
pattern in codegsk_
Model Selection
- Production model chosen and tested (recommend
)llama-3.3-70b-versatile - Fallback model configured (
)llama-3.1-8b-instant - Deprecated model IDs removed (check deprecations)
-
set to actual expected output size (not context max)max_tokens
Rate Limit Planning
- Production rate limits known (check console.groq.com/settings/limits)
- Estimated peak RPM < 80% of limit
- Estimated peak TPM < 80% of limit
- Exponential backoff with
header implementedretry-after - Request queue for burst protection (
or similar)p-queue
Error Handling
- All Groq error types caught (
,Groq.APIError
)Groq.APIConnectionError - 429 errors retried with backoff
- 5xx errors retried with backoff
- 401 errors trigger alert (key may be revoked)
- Network timeouts configured (default 60s may be too long)
- Circuit breaker pattern for sustained failures
Fallback & Degradation
async function completionWithFallback(messages: any[]) { try { return await groq.chat.completions.create({ model: "llama-3.3-70b-versatile", messages, timeout: 15_000, }); } catch (err: any) { if (err.status === 429 || err.status >= 500) { console.warn("Groq primary failed, trying fallback model"); try { return await groq.chat.completions.create({ model: "llama-3.1-8b-instant", messages, timeout: 10_000, }); } catch { console.error("Groq fully unavailable, degrading gracefully"); return { choices: [{ message: { content: "Service temporarily unavailable. Please try again." } }] }; } } throw err; } }
Health Check Endpoint
// /api/health or /healthz export async function GET() { const checks: Record<string, any> = { status: "healthy" }; const start = performance.now(); try { await groq.chat.completions.create({ model: "llama-3.1-8b-instant", messages: [{ role: "user", content: "OK" }], max_tokens: 1, temperature: 0, }); checks.groq = { status: "connected", latencyMs: Math.round(performance.now() - start) }; } catch (err: any) { checks.status = "degraded"; checks.groq = { status: "error", error: err.status || err.message }; } return Response.json(checks, { status: checks.status === "healthy" ? 200 : 503 }); }
Monitoring Setup
- Latency histogram (p50, p95, p99)
- Token throughput counter (tokens/sec by model)
- Error rate by status code (429, 5xx)
- Rate limit remaining gauge (from response headers)
- Cost tracking (tokens * price per million)
- Alert: latency p95 > 1s (Groq normally < 200ms)
- Alert: error rate > 5%
- Alert: rate limit remaining < 10%
Spending Controls
- Monthly spending cap set in Groq Console
- Budget alerts at 50%, 80%, 95%
- Auto-pause enabled when cap is reached
Documentation
- Incident runbook created (see
)groq-incident-runbook - Key rotation SOP documented
- On-call knows how to check status.groq.com
- Rollback procedure tested
Go-Live Verification
set -euo pipefail # Pre-flight checks echo "1. Groq API status..." curl -sf https://status.groq.com > /dev/null && echo "OK" || echo "ISSUE" echo "2. Production key valid..." curl -sf https://api.groq.com/openai/v1/models \ -H "Authorization: Bearer $GROQ_API_KEY_PROD" | jq '.data | length' echo "3. Health endpoint..." curl -sf https://your-app.com/api/health | jq . echo "4. Rate limit headroom..." curl -si https://api.groq.com/openai/v1/chat/completions \ -H "Authorization: Bearer $GROQ_API_KEY_PROD" \ -H "Content-Type: application/json" \ -d '{"model":"llama-3.1-8b-instant","messages":[{"role":"user","content":"ping"}],"max_tokens":1}' \ 2>/dev/null | grep -i "x-ratelimit-remaining"
Error Handling
| Alert | Condition | Severity |
|---|---|---|
| API errors spike | 5xx rate > 5/min | P1 |
| Latency degraded | p95 > 1000ms | P2 |
| Rate limited | 429 count > 5/min | P2 |
| Auth failure | Any 401 error | P1 |
| Spending near cap | >90% of monthly budget | P3 |
Resources
Next Steps
For version upgrades, see
groq-upgrade-migration.