Claude-code-plugins-plus-skills exa-performance-tuning
install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/exa-pack/skills/exa-performance-tuning" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-exa-performance-tuning && rm -rf "$T"
manifest: plugins/saas-packs/exa-pack/skills/exa-performance-tuning/SKILL.md
Exa Performance Tuning
Overview
Optimize Exa search API response times for production workloads. Key levers: search type selection (instant < fast < auto < neural < deep), result count reduction, content scope control, result caching, and parallel query execution.
Latency by Search Type
| Type | Typical Latency | Use Case |
|---|---|---|
| `instant` | < 150ms | Real-time autocomplete, typeahead |
| `fast` | p50 < 425ms | Speed-critical user-facing search |
| `auto` | 300-1500ms | General purpose (default) |
| `neural` | 500-2000ms | Best semantic quality |
|  | 2-5s | Maximum coverage, light deep search |
|  | 5-15s | Complex research questions |
Instructions
Step 1: Match Search Type to Latency Budget
```typescript
import Exa from "exa-js";

const exa = new Exa(process.env.EXA_API_KEY);

// Pick the highest-quality search type that still fits the latency budget.
function selectSearchType(latencyBudgetMs: number) {
  if (latencyBudgetMs < 200) return "instant";
  if (latencyBudgetMs < 500) return "fast";
  if (latencyBudgetMs < 1500) return "auto";
  if (latencyBudgetMs < 3000) return "neural";
  return "deep";
}

async function optimizedSearch(query: string, latencyBudgetMs: number) {
  const type = selectSearchType(latencyBudgetMs);
  // Tighter budgets also get fewer results.
  const numResults = latencyBudgetMs < 500 ? 3 : latencyBudgetMs < 2000 ? 5 : 10;
  return exa.search(query, { type, numResults });
}
```
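To check whether a chosen type actually stays inside its budget, each call can be timed. The following is a minimal sketch, not part of exa-js: `timed` and `withinBudget` are hypothetical helper names, and `timed` wraps any async function rather than the Exa client specifically.

```typescript
// Hypothetical helper (not part of exa-js): measures the wall-clock time
// of any async call and returns it alongside the call's result.
async function timed<T>(
  fn: () => Promise<T>,
): Promise<{ value: T; elapsedMs: number }> {
  const start = Date.now();
  const value = await fn();
  return { value, elapsedMs: Date.now() - start };
}

// Returns true when the measured latency stayed within the budget.
function withinBudget(elapsedMs: number, latencyBudgetMs: number): boolean {
  return elapsedMs <= latencyBudgetMs;
}
```

In practice this could wrap the function above, for example `timed(() => optimizedSearch("query", 500))`, logging a warning whenever `withinBudget` returns false.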
Step 2: Minimize Content Retrieval
```typescript
// Each content option adds latency. Only request what you need.

// Fastest: metadata only (no content retrieval)
const metadataOnly = await exa.search("query", { numResults: 5 });

// Medium: highlights only (much smaller than full text)
const highlightsOnly = await exa.searchAndContents("query", {
  numResults: 5,
  highlights: { maxCharacters: 300 },
  // No text or summary: saves content retrieval time
});

// Slower: full text (use maxCharacters to limit)
const withText = await exa.searchAndContents("query", {
  numResults: 3, // fewer results = faster
  text: { maxCharacters: 1000 }, // limit content size
});
```
Step 3: Cache Search Results
```typescript
import { LRUCache } from "lru-cache";

const searchCache = new LRUCache<string, any>({
  max: 5000,
  ttl: 2 * 3600 * 1000, // 2-hour TTL
});

async function cachedSearch(query: string, opts: any) {
  const key = `${query}:${opts.type || "auto"}:${opts.numResults || 10}`;
  const cached = searchCache.get(key);
  if (cached) return cached; // Cache hit: ~0ms vs 500-2000ms

  const results = await exa.search(query, opts);
  searchCache.set(key, results);
  return results;
}
```
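The cache key above covers only `type` and `numResults`, so two calls that differ in any other option would share one cache entry. A possible refinement is a key built from every option. The sketch below assumes the options are plain JSON-serializable objects; `stableStringify` and `stableKey` are hypothetical helper names, not part of exa-js or lru-cache.

```typescript
// Hypothetical helper: serializes a value with object keys sorted, so that
// property order does not change the resulting string.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .filter((k) => obj[k] !== undefined) // skip unset options
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Builds a cache key from the query plus all options.
function stableKey(query: string, opts: Record<string, unknown> = {}): string {
  return `${query}|${stableStringify(opts)}`;
}
```

With this, `stableKey("q", { type: "auto", numResults: 3 })` and `stableKey("q", { numResults: 3, type: "auto" })` yield the same key, while a different `text.maxCharacters` yields a different one.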
Step 4: Parallelize Independent Searches
```typescript
// Run independent queries concurrently instead of sequentially.
// 3 parallel searches: ~600ms total (limited by slowest)
// 3 sequential searches: ~1800ms total
async function parallelSearch(queries: string[]) {
  const searches = queries.map(q =>
    cachedSearch(q, { type: "auto", numResults: 3 })
  );
  return Promise.all(searches);
}
```
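`Promise.all` rejects as soon as any single query fails, which discards the results of the queries that succeeded. Where partial results are acceptable, `Promise.allSettled` keeps them. This is a sketch of that variant; `parallelSearchSettled` is a hypothetical name and `searchFn` is a stand-in for any async search call such as `cachedSearch`.

```typescript
// Runs all queries concurrently and separates successes from failures.
// `searchFn` is a placeholder for any async search call.
async function parallelSearchSettled<T>(
  queries: string[],
  searchFn: (query: string) => Promise<T>,
): Promise<{ results: Map<string, T>; errors: Map<string, unknown> }> {
  const settled = await Promise.allSettled(queries.map((q) => searchFn(q)));
  const results = new Map<string, T>();
  const errors = new Map<string, unknown>();
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      results.set(queries[i], outcome.value);
    } else {
      errors.set(queries[i], outcome.reason);
    }
  });
  return { results, errors };
}
```

The caller can then render whatever is in `results` and decide separately whether the entries in `errors` are worth a retry.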
Step 5: Two-Phase Search Pattern
```typescript
// Phase 1: Fast search for URLs only
// Phase 2: Selective content retrieval for top results only
async function twoPhaseSearch(query: string) {
  // Phase 1: metadata only (fast)
  const results = await exa.search(query, { type: "auto", numResults: 10 });

  // Phase 2: get content only for top 3 results
  const topUrls = results.results.slice(0, 3).map(r => r.url);
  const contents = await exa.getContents(topUrls, {
    text: { maxCharacters: 2000 },
    highlights: { maxCharacters: 500, query },
  });

  // Saves content retrieval time for the 7 results you won't use
  return contents;
}
```
Step 6: Query Normalization for Cache Hits
```typescript
function normalizeQuery(query: string): string {
  return query
    .toLowerCase()
    .trim()
    .replace(/\s+/g, " ") // collapse whitespace
    .replace(/[?.!,;:]+$/, ""); // strip trailing punctuation
}

// Increases cache hit rate by 20-40% for user-generated queries
async function normalizedSearch(query: string, opts: any) {
  return cachedSearch(normalizeQuery(query), opts);
}
```
Performance Comparison
| Strategy | Latency Savings | Implementation |
|---|---|---|
| Lower-latency search `type` | 5-10x faster than neural | One-line change |
| Reduce numResults (10 -> 3) | ~200-500ms saved | One-line change |
| Highlights instead of text | ~100-300ms saved | Replace `text` with `highlights` |
| LRU cache | 100% for cache hits | ~20 lines |
| Parallel queries | 2-3x throughput | `Promise.all` wrapper |
| Two-phase search | ~30-50% for large result sets | ~15 lines |
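The cache row saves the full search latency only on hits, so the average saving depends on the hit rate. A rough way to estimate it is a weighted average of hit and miss latency. The function name and the numbers below are illustrative assumptions, not measurements.

```typescript
// Expected per-request latency for a given cache hit rate (0..1).
// Assumes a cache hit costs `hitMs` (near zero for an in-memory LRU).
function expectedLatencyMs(hitRate: number, missMs: number, hitMs = 0): number {
  if (hitRate < 0 || hitRate > 1) {
    throw new RangeError("hitRate must be between 0 and 1");
  }
  return hitRate * hitMs + (1 - hitRate) * missMs;
}

// Illustrative numbers only: a 1000ms uncached search at a 50% hit rate.
const averageMs = expectedLatencyMs(0.5, 1000); // 500
```

A low hit rate therefore leaves most of the uncached latency in place, which is why query normalization matters for the cache to pay off.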
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Search taking 3s+ | Neural search on complex query | Switch to a lower-latency search type |
| Timeout on content | Large pages, slow sources | Set a `maxCharacters` limit |
| Cache miss rate high | Unique queries each time | Normalize queries before caching |
| Rate limit (429) | Too many concurrent searches | Add request queue with concurrency limit |
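The request queue suggested for 429 errors can be as small as a concurrency limiter. The following is a minimal sketch under the assumption that capping in-flight requests is enough to stay under the rate limit; `createLimiter` is a hypothetical helper, not part of exa-js, and the right limit depends on your plan.

```typescript
// Hypothetical helper: runs async tasks with at most `limit` in flight.
function createLimiter(limit: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  const release = () => {
    const next = waiting.shift();
    if (next) {
      next(); // hand the freed slot directly to the next waiter
    } else {
      active--;
    }
  };

  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= limit) {
      // Queue up until a finishing task hands over its slot.
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active++;
    }
    try {
      return await task();
    } finally {
      release();
    }
  };
}
```

A caller would create one limiter, for example `const limited = createLimiter(5)`, and route every search through it: `limited(() => cachedSearch(q, opts))`. Tasks beyond the limit wait in FIFO order instead of hitting the API at once.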
Resources
Next Steps
For cost optimization, see exa-cost-tuning. For reliability, see exa-reliability-patterns.