Skillshub exa-architecture-variants

install
source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/jeremylongshore/claude-code-plugins-plus-skills/exa-architecture-variants" ~/.claude/skills/comeonoliver-skillshub-exa-architecture-variants && rm -rf "$T"
manifest: skills/jeremylongshore/claude-code-plugins-plus-skills/exa-architecture-variants/SKILL.md
source content

Exa Architecture Variants

Overview

Three deployment architectures for Exa neural search at different scales. Each uses real Exa SDK methods:

search
,
searchAndContents
,
findSimilar
,
getContents
, and
answer
.

Decision Matrix

FactorDirect SearchCached SearchRAG Pipeline
Volume< 1K/day1K-50K/dayAny volume
Latency500-2000ms~50ms (cached)3-8s total
Use CaseSimple search UIContent aggregationAI answers with citations
ComplexityLowMediumHigh
Cache RequiredNoYes (Redis/LRU)Yes
Exa Methods
searchAndContents
searchAndContents
+ cache
All methods

Instructions

Variant 1: Direct Search Integration

Best for: Adding search to an existing app, < 1K queries/day.

import Exa from "exa-js";
import express from "express";

const app = express();
const exa = new Exa(process.env.EXA_API_KEY);

// Simple search endpoint
app.get("/api/search", async (req, res) => {
  const query = req.query.q as string;
  if (!query) return res.status(400).json({ error: "q required" });

  try {
    const results = await exa.searchAndContents(query, {
      type: "auto",
      numResults: 5,
      text: { maxCharacters: 500 },
      highlights: { maxCharacters: 300, query },
    });

    res.json(results.results.map(r => ({
      title: r.title,
      url: r.url,
      snippet: r.highlights?.join(" ") || r.text?.substring(0, 200),
      score: r.score,
    })));
  } catch (err: any) {
    res.status(err.status || 500).json({ error: err.message });
  }
});

Variant 2: Cached Search with Category Profiles

Best for: High-traffic search, 1K-50K queries/day, content discovery.

import Exa from "exa-js";
import { LRUCache } from "lru-cache";

const exa = new Exa(process.env.EXA_API_KEY);
const cache = new LRUCache<string, any>({ max: 5000, ttl: 3600 * 1000 });

const PROFILES = {
  news: {
    type: "auto" as const,
    category: "news" as const,
    numResults: 10,
    text: { maxCharacters: 500 },
  },
  research: {
    type: "neural" as const,
    category: "research paper" as const,
    numResults: 10,
    text: { maxCharacters: 2000 },
    highlights: { maxCharacters: 500 },
  },
  companies: {
    type: "auto" as const,
    category: "company" as const,
    numResults: 10,
    text: { maxCharacters: 500 },
  },
};

async function cachedProfileSearch(
  query: string,
  profile: keyof typeof PROFILES
) {
  const key = `${query.toLowerCase()}:${profile}`;
  const cached = cache.get(key);
  if (cached) return cached;

  const results = await exa.searchAndContents(query, PROFILES[profile]);
  cache.set(key, results);
  return results;
}

Variant 3: Full RAG Pipeline

Best for: AI-powered answers, research agents, 50K+ queries/day.

import Exa from "exa-js";
import { LRUCache } from "lru-cache";

const exa = new Exa(process.env.EXA_API_KEY);
const contextCache = new LRUCache<string, any>({ max: 10000, ttl: 7200 * 1000 });

class ExaRAGPipeline {
  // Phase 1: Search for relevant sources
  async gatherContext(question: string, maxSources = 5) {
    const cacheKey = question.toLowerCase().trim();
    const cached = contextCache.get(cacheKey);
    if (cached) return cached;

    const results = await exa.searchAndContents(question, {
      type: "neural",
      numResults: maxSources,
      text: { maxCharacters: 2000 },
      highlights: { maxCharacters: 500, query: question },
    });

    contextCache.set(cacheKey, results);
    return results;
  }

  // Phase 2: Expand with similar content
  async expandContext(topResultUrl: string, numSimilar = 3) {
    return exa.findSimilarAndContents(topResultUrl, {
      numResults: numSimilar,
      text: { maxCharacters: 1500 },
      excludeSourceDomain: true,
    });
  }

  // Phase 3: Format for LLM context injection
  formatForLLM(results: any[]) {
    return results.map((r, i) =>
      `[Source ${i + 1}] ${r.title}\n` +
      `URL: ${r.url}\n` +
      `Content: ${r.text}\n` +
      `Key points: ${r.highlights?.join(" | ") || "N/A"}`
    ).join("\n\n---\n\n");
  }

  // Phase 4: Use Exa's built-in answer endpoint
  async getAnswer(question: string) {
    const answer = await exa.answer(question, { text: true });
    return {
      answer: answer.answer,
      sources: answer.results.map(r => ({
        title: r.title,
        url: r.url,
      })),
    };
  }

  // Full pipeline
  async research(question: string) {
    const context = await this.gatherContext(question, 5);

    // Expand with similar content from top result
    let expanded = { results: [] as any[] };
    if (context.results[0]?.url) {
      expanded = await this.expandContext(context.results[0].url);
    }

    const allResults = [...context.results, ...expanded.results];
    const llmContext = this.formatForLLM(allResults);

    return {
      context: llmContext,
      sourceCount: allResults.length,
      sources: allResults.map(r => ({ title: r.title, url: r.url, score: r.score })),
    };
  }
}

Scaling Notes

Architecture10 QPS Limit Strategy
DirectNatural limit: ~864K searches/day at full rate
Cached50% cache hit = ~1.7M effective searches/day
RAG Pipeline2-3 API calls per question; cache aggressively

Error Handling

IssueCauseSolution
Slow search in UINo cachingAdd LRU or Redis cache
Stale cached resultsLong TTLReduce TTL for time-sensitive profiles
RAG hallucinationPoor source selectionUse highlights, increase numResults
High API costsNo query deduplicationCache layer deduplicates identical queries

Resources

Next Steps

For reference architecture details, see

exa-reference-architecture
.