Claude-code-plugins-plus-skills perplexity-prod-checklist
install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/perplexity-pack/skills/perplexity-prod-checklist" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-perplexity-prod-checklist && rm -rf "$T"
manifest: plugins/saas-packs/perplexity-pack/skills/perplexity-prod-checklist/SKILL.md
Perplexity Production Checklist
Overview
Complete checklist for deploying Perplexity Sonar API integrations to production. Perplexity-specific concerns: every API call performs a live web search (variable latency), citations link to third-party sites (must validate), and costs scale per-request plus per-token.
Prerequisites
- Staging environment tested
- Production API key generated (separate from dev/staging)
- Monitoring configured
- Cost budget defined
Production Readiness Checklist
API Configuration
- Production `PERPLEXITY_API_KEY` in secret manager (not env file)
- Key starts with `pplx-` and has credits loaded
- Separate API keys for dev/staging/prod
- Base URL is `https://api.perplexity.ai` (not localhost/proxy)
- Model selection configured: `sonar` for fast, `sonar-pro` for deep
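The configuration items above can be enforced with a startup check. The following is a minimal sketch, not part of the upstream skill: `PerplexityConfig`, `ALLOWED_MODELS`, and `validatePerplexityConfig` are hypothetical names, and the values are assumed to have been loaded from the secret manager already.

```typescript
// Hypothetical config shape; values are assumed to come from a secret manager.
interface PerplexityConfig {
  apiKey: string;
  baseUrl: string;
  model: string;
}

// Models named in this checklist's model-selection item (an assumption about
// which models the deployment allows).
const ALLOWED_MODELS = ["sonar", "sonar-pro"];

// Returns a list of problems; an empty list means every check passed.
function validatePerplexityConfig(cfg: PerplexityConfig): string[] {
  const problems: string[] = [];
  if (!cfg.apiKey.startsWith("pplx-")) {
    problems.push("API key does not start with pplx-");
  }
  // Ignore trailing slashes when comparing the base URL.
  if (cfg.baseUrl.replace(/\/+$/, "") !== "https://api.perplexity.ai") {
    problems.push("Base URL is not https://api.perplexity.ai");
  }
  if (!ALLOWED_MODELS.includes(cfg.model)) {
    problems.push(`Unexpected model: ${cfg.model}`);
  }
  return problems;
}
```

Failing fast at boot when the returned list is non-empty keeps a misconfigured key or proxy URL from reaching production traffic. The check cannot confirm that credits are loaded; that still needs a live request.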
Code Quality
- All search calls wrapped in retry with exponential backoff
- Rate limiting implemented (50 RPM default)
- Query sanitization strips PII before sending to Perplexity
- Citations parsed from response (not extracted from text)
- `max_tokens` set on all requests (prevents runaway costs)
- Timeouts configured: 15s for sonar, 30s for sonar-pro
- Error handling covers 401, 402, 429, 500+ status codes
- No hardcoded API keys in source code
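The retry item above could be implemented roughly as follows. This is a sketch under stated assumptions: the helper names (`isRetryable`, `backoffDelayMs`, `withRetry`), the 500 ms base delay, the 8 s cap, and the three-retry limit are illustrative choices, not values from the Perplexity documentation. It assumes thrown errors carry an HTTP `status` property, as in the fallback example in this document.

```typescript
// Status codes worth retrying: rate limits and server-side failures.
// 401 and 402 are not retried because a retry cannot fix them.
function isRetryable(status: number | undefined): boolean {
  return status === 429 || (status !== undefined && status >= 500);
}

// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Runs `fn`, retrying retryable failures up to `maxRetries` times.
// `sleep` is injectable so the wait can be replaced in tests.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      if (attempt >= maxRetries || !isRetryable(err?.status)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

A search call would then be wrapped as `withRetry(() => perplexity.chat.completions.create({ ... }))`. Adding random jitter to the delay is a common refinement when many clients retry at once.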
Performance
- Result caching implemented for repeated queries
- Cache TTL appropriate: 30min for news, 4hrs for research, 24hrs for facts
- Streaming enabled for user-facing search (reduces perceived latency)
- Request queue prevents burst overload
- `search_domain_filter` used where appropriate (reduces search time)
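The caching items above can be sketched as a small in-memory cache keyed by model and normalized query. The TTL values come from the checklist; everything else (`TtlCache`, `cacheKey`, the `QueryKind` categories as a type) is a hypothetical illustration, and a multi-instance deployment would more likely use a shared store.

```typescript
// TTLs from the checklist, in milliseconds.
const TTL_MS = {
  news: 30 * 60 * 1000,
  research: 4 * 60 * 60 * 1000,
  facts: 24 * 60 * 60 * 1000,
} as const;

type QueryKind = keyof typeof TTL_MS;

class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private now: () => number;

  // `now` is injectable so expiry can be tested without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  set(key: string, value: V, kind: QueryKind): void {
    this.entries.set(key, { value, expiresAt: this.now() + TTL_MS[kind] });
  }

  // Returns the cached value, or undefined if missing or expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}

// Normalizes a query so trivially different phrasings share a cache entry.
function cacheKey(model: string, query: string): string {
  return `${model}:${query.trim().toLowerCase().replace(/\s+/g, " ")}`;
}
```

Including the model in the key keeps `sonar` and `sonar-pro` answers from overwriting each other.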
Monitoring
- Latency tracked per model (sonar ~2s, sonar-pro ~5s, deep-research ~30s)
- Error rate monitored (alert on >5% failure rate)
- Token usage tracked for cost projection
- Citation count per response logged (quality signal)
- 429 rate limit errors tracked with alert
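The error-rate and per-model latency items above reduce to two small calculations over recorded calls. The sketch below assumes a hypothetical `CallRecord` shape collected by the application; a production setup would more likely emit these as metrics to an existing monitoring system than compute them in process.

```typescript
// Hypothetical record of one API call, captured by the application.
interface CallRecord {
  model: string;
  latencyMs: number;
  status: number; // HTTP status; 200 for success
}

// Fraction of calls that failed (any status >= 400); 0 when there are no calls.
function errorRate(records: CallRecord[]): number {
  if (records.length === 0) return 0;
  const failures = records.filter((r) => r.status >= 400).length;
  return failures / records.length;
}

// Nearest-rank percentile of latency for one model; undefined if no samples.
function latencyPercentile(
  records: CallRecord[],
  model: string,
  p: number,
): number | undefined {
  const sorted = records
    .filter((r) => r.model === model)
    .map((r) => r.latencyMs)
    .sort((a, b) => a - b);
  if (sorted.length === 0) return undefined;
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}
```

Comparing `errorRate(windowRecords)` against `0.05` corresponds to the >5% failure-rate alert in the checklist.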
Cost Controls
- Monthly budget cap set on API key
- Model routing: simple queries to `sonar`, complex to `sonar-pro`
- `max_tokens` capped per endpoint
- Cache hit rate monitored (target >30%)
- Cost per query tracked by model
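Model routing can start as a simple heuristic. The sketch below is an assumption-heavy illustration: the 25-word threshold and the keyword list are hypothetical and should be tuned against real traffic, while the `max_tokens` values mirror the ones used in the fallback example in this document.

```typescript
type SonarModel = "sonar" | "sonar-pro";

interface RouteDecision {
  model: SonarModel;
  maxTokens: number;
}

// Hypothetical signals that a query needs a deeper answer.
const COMPLEX_HINTS = /\b(compare|analy[sz]e|explain why|trade-?offs?|in depth)\b/i;

// Routes long or analytical queries to sonar-pro and everything else to sonar.
function routeQuery(query: string): RouteDecision {
  const wordCount = query.trim().split(/\s+/).filter(Boolean).length;
  const complex = wordCount > 25 || COMPLEX_HINTS.test(query);
  return complex
    ? { model: "sonar-pro", maxTokens: 2048 }
    : { model: "sonar", maxTokens: 512 };
}
```

Logging each `RouteDecision` alongside token usage gives the per-model cost data the checklist asks for.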
Graceful Degradation
```typescript
async function searchWithFallback(query: string) {
  try {
    // Primary: sonar-pro for deep answers
    return await perplexity.chat.completions.create({
      model: "sonar-pro",
      messages: [{ role: "user", content: query }],
      max_tokens: 2048,
    });
  } catch (err: any) {
    if (err.status === 429 || err.status >= 500) {
      // Fallback: sonar for faster, cheaper response
      return await perplexity.chat.completions.create({
        model: "sonar",
        messages: [{ role: "user", content: query }],
        max_tokens: 512,
      });
    }
    throw err;
  }
}
```
Health Check Endpoint
```typescript
app.get("/health/perplexity", async (req, res) => {
  const start = Date.now();
  try {
    const response = await perplexity.chat.completions.create({
      model: "sonar",
      messages: [{ role: "user", content: "ping" }],
      max_tokens: 5,
    });
    res.json({
      status: "healthy",
      latencyMs: Date.now() - start,
      model: response.model,
    });
  } catch (err: any) {
    res.status(503).json({
      status: "unhealthy",
      error: err.status || err.message,
      latencyMs: Date.now() - start,
    });
  }
});
```
Alerting Rules
| Alert | Condition | Severity |
|---|---|---|
| API Unreachable | Health check fails 3x | P1 |
| High Error Rate | 429/5xx > 5% over 5min | P2 |
| High Latency | p95 > 15s for sonar | P2 |
| Budget Exceeded | Monthly cost > 80% cap | P2 |
| Auth Failure | Any 401/402 error | P1 |
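The alerting rules in the table can be expressed as a pure function over a metrics snapshot, which makes the thresholds easy to unit test. The `Metrics` and `Alert` shapes and the `evaluateAlerts` name are hypothetical; in practice these rules usually live in the alerting system's own configuration.

```typescript
// Hypothetical snapshot of the metrics the table refers to.
interface Metrics {
  consecutiveHealthFailures: number;
  retryableErrorRate5m: number; // fraction of 429/5xx over the last 5 minutes
  sonarP95Ms: number;
  monthlyCostUsd: number;
  monthlyCapUsd: number;
  authErrors: number; // count of 401/402 responses
}

interface Alert {
  name: string;
  severity: "P1" | "P2";
}

// Applies each row of the alerting table and returns the alerts that fire.
function evaluateAlerts(m: Metrics): Alert[] {
  const alerts: Alert[] = [];
  if (m.consecutiveHealthFailures >= 3) {
    alerts.push({ name: "API Unreachable", severity: "P1" });
  }
  if (m.retryableErrorRate5m > 0.05) {
    alerts.push({ name: "High Error Rate", severity: "P2" });
  }
  if (m.sonarP95Ms > 15000) {
    alerts.push({ name: "High Latency", severity: "P2" });
  }
  if (m.monthlyCostUsd > 0.8 * m.monthlyCapUsd) {
    alerts.push({ name: "Budget Exceeded", severity: "P2" });
  }
  if (m.authErrors > 0) {
    alerts.push({ name: "Auth Failure", severity: "P1" });
  }
  return alerts;
}
```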
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Variable latency | Web search per request | Set appropriate timeouts per model |
| Broken citations | Source pages changed | Validate citation URLs before displaying |
| Cost overrun | No model routing | Route simple queries to sonar |
| Rate limit spikes | Burst traffic | Queue requests with p-queue |
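For the broken-citations row, a first validation pass can run without any network calls. The sketch below assumes the response exposes citations as an array of URL strings (check the current API reference for the exact field) and uses a hypothetical helper name, `cleanCitations`. It only checks that each URL is well formed; it does not confirm that the page still exists.

```typescript
// Keeps only well-formed http(s) URLs, dropping duplicates while preserving order.
function cleanCitations(citations: unknown): string[] {
  if (!Array.isArray(citations)) return [];
  const seen = new Set<string>();
  const result: string[] = [];
  for (const item of citations) {
    if (typeof item !== "string") continue;
    let url: URL;
    try {
      url = new URL(item);
    } catch {
      continue; // not a parseable URL
    }
    // Reject schemes such as javascript: or data: before rendering links.
    if (url.protocol !== "http:" && url.protocol !== "https:") continue;
    if (seen.has(url.href)) continue;
    seen.add(url.href);
    result.push(url.href);
  }
  return result;
}
```

A reachability check (for example a `HEAD` request with a short timeout) could be layered on top for links shown to users, at the cost of extra latency.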
Output
- Production-ready Perplexity integration with all checks passing
- Health check endpoint for monitoring
- Graceful degradation from sonar-pro to sonar
- Alerting rules configured
Next Steps
For version upgrades, see `perplexity-upgrade-migration`.