Skillshub clade-architecture-variants
git clone https://github.com/ComeOnOliver/skillshub
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/jeremylongshore/claude-code-plugins-plus-skills/clade-architecture-variants" ~/.claude/skills/comeonoliver-skillshub-clade-architecture-variants && rm -rf "$T"
skills/jeremylongshore/claude-code-plugins-plus-skills/clade-architecture-variants/SKILL.mdClaude Architecture Variants
Overview
Five architecture patterns for Claude-powered applications: Chatbot (stateless API wrapper), RAG (retrieval-augmented generation with vector search), Agent (tool use loop), Content Pipeline (batch processing), and Evaluation (using Claude as a judge). Each includes complete code and a comparison table.
1. Chatbot (Stateless API Wrapper)
Simplest pattern — proxy Claude with a system prompt.
// api/chat.ts export async function POST(req: Request) { const { messages } = await req.json(); const response = await client.messages.create({ model: 'claude-sonnet-4-20250514', max_tokens: 2048, system: 'You are a helpful assistant for our SaaS product.', messages, stream: true, }); return new Response(response.toReadableStream()); }
Best for: Customer support, Q&A, simple conversational interfaces.
2. RAG (Retrieval-Augmented Generation)
Fetch relevant context, inject into prompt, generate grounded answer.
async function ragQuery(question: string) { // 1. Embed the question (use Voyage, OpenAI, or Cohere — not Anthropic) const embedding = await embeddingClient.embed(question); // 2. Search vector DB for relevant chunks const chunks = await vectorDb.query(embedding, { topK: 5 }); // 3. Send to Claude with context const message = await client.messages.create({ model: 'claude-sonnet-4-20250514', max_tokens: 2048, system: `Answer based on the provided context. If the context doesn't contain the answer, say so.`, messages: [{ role: 'user', content: `Context:\n${chunks.map(c => c.text).join('\n---\n')}\n\nQuestion: ${question}`, }], }); return message.content[0].text; }
Best for: Documentation Q&A, knowledge bases, support with source citations.
3. Agent (Tool Use Loop)
Claude decides which tools to call, you execute them, loop until done.
async function agentLoop(userInput: string, tools: Anthropic.Tool[]) { let messages: MessageParam[] = [{ role: 'user', content: userInput }]; while (true) { const response = await client.messages.create({ model: 'claude-sonnet-4-20250514', max_tokens: 4096, tools, messages, }); messages.push({ role: 'assistant', content: response.content }); if (response.stop_reason === 'end_turn') { return response.content.find(b => b.type === 'text')?.text; } // Execute tools const results = []; for (const block of response.content) { if (block.type === 'tool_use') { const result = await executeTool(block.name, block.input); results.push({ type: 'tool_result', tool_use_id: block.id, content: JSON.stringify(result) }); } } messages.push({ role: 'user', content: results }); } }
Best for: Data analysis, code generation, multi-step workflows.
4. Content Pipeline (Batch Processing)
Process thousands of documents through Claude asynchronously.
const batch = await client.messages.batches.create({ requests: documents.map((doc, i) => ({ custom_id: doc.id, params: { model: 'claude-haiku-4-5-20251001', // Cheap for bulk max_tokens: 512, messages: [{ role: 'user', content: `Extract entities: ${doc.text}` }], }, })), }); // 50% cheaper, processes within 24h
Best for: Summarization, classification, extraction at scale.
5. Evaluation / Grading
Use Claude to evaluate other AI outputs or human content.
const evaluation = await client.messages.create({ model: 'claude-opus-4-20250514', // Best judgment max_tokens: 1024, system: `You are an expert evaluator. Score the response 1-5 on accuracy, relevance, and completeness. Return JSON: { "accuracy": N, "relevance": N, "completeness": N, "reasoning": "..." }`, messages: [{ role: 'user', content: `Question: ${question}\nResponse to evaluate: ${candidateResponse}`, }], });
Best for: AI output quality, content moderation, automated grading.
Choosing a Pattern
| Pattern | Latency | Cost | Complexity |
|---|---|---|---|
| Chatbot | Low (streaming) | Low | Simple |
| RAG | Medium (embed + search + generate) | Medium | Medium |
| Agent | High (multi-turn) | High | Complex |
| Pipeline | High (async batch) | Low (50% off) | Simple |
| Evaluation | Medium | Varies | Simple |
Output
- Architecture pattern selected based on requirements
- Implementation code for chosen pattern
- Cost and latency characteristics understood
- Scaling strategy identified (streaming for chatbots, batches for pipelines)
Error Handling
| Error | Cause | Solution |
|---|---|---|
| API Error | Check error type and status code | See |
Examples
See five numbered pattern sections with complete TypeScript code, and the Choosing a Pattern comparison table with latency, cost, and complexity ratings.
Resources
Next Steps
See
clade-known-pitfalls for common mistakes.
Prerequisites
- Completed
andclade-install-authclade-model-inference - Understanding of your use case requirements (latency, cost, complexity)
- For RAG: vector database and embedding model (Voyage, OpenAI, or Cohere)
Instructions
Step 1: Review the patterns below
Each section contains production-ready code examples. Copy and adapt them to your use case.
Step 2: Apply to your codebase
Integrate the patterns that match your requirements. Test each change individually.
Step 3: Verify
Run your test suite to confirm the integration works correctly.