Skillshub anth-architecture-variants

install

source · Clone the upstream repo

git clone https://github.com/ComeOnOliver/skillshub

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/jeremylongshore/claude-code-plugins-plus-skills/anth-architecture-variants" ~/.claude/skills/comeonoliver-skillshub-anth-architecture-variants && rm -rf "$T"

manifest: skills/jeremylongshore/claude-code-plugins-plus-skills/anth-architecture-variants/SKILL.md

Anthropic Architecture Variants

Overview

Four architecture patterns for Claude API integrations, each suited to a different scale and use case.

Variant 1: Serverless (AWS Lambda / Cloud Functions)

# Best for: < 100 RPM, event-driven, pay-per-invocation
# lambda_function.py
import anthropic
import json

# Module scope: warm invocations reuse this client and its HTTP connection pool.
# The API key is read from the ANTHROPIC_API_KEY Lambda environment variable.
client = anthropic.Anthropic()

def handler(event, context):
    body = json.loads(event["body"])
    msg = client.messages.create(
        model="claude-3-5-haiku-20241022",  # Haiku: low latency keeps Lambda duration short
        max_tokens=512,
        messages=[{"role": "user", "content": body["prompt"]}]
    )

    return {
        "statusCode": 200,
        "body": json.dumps({
            "text": msg.content[0].text,
            "tokens": msg.usage.input_tokens + msg.usage.output_tokens
        })
    }

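Trade-offs: Cold starts typically add 1-3 s. The 15-minute Lambda timeout caps how long a single generation can run. HTTP connections are reused only across warm invocations of the same execution environment, and only when the client is created at module scope.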
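
The handler assumes `event["body"]` is a JSON string containing a `"prompt"` key; a missing or malformed body raises an unhandled exception instead of returning a 400. Below is a minimal validation sketch, assuming an API Gateway proxy-style event in which `body` is a string and `isBase64Encoded` flags base64 content. `extract_prompt` and `_error` are hypothetical helper names, not part of any SDK.

```python
import base64
import json
from typing import Optional, Tuple


def _error(status: int, message: str) -> dict:
    # Same response shape the handler returns on success, with an error payload
    return {"statusCode": status, "body": json.dumps({"error": message})}


def extract_prompt(event: dict) -> Tuple[Optional[str], Optional[dict]]:
    """Return (prompt, None) on success or (None, error_response) on bad input."""
    raw = event.get("body") or ""
    if event.get("isBase64Encoded"):
        # Assumption: proxy events may deliver the body base64-encoded
        raw = base64.b64decode(raw).decode("utf-8")
    try:
        body = json.loads(raw)
    except json.JSONDecodeError:
        return None, _error(400, "body must be valid JSON")
    prompt = body.get("prompt") if isinstance(body, dict) else None
    if not isinstance(prompt, str) or not prompt.strip():
        return None, _error(400, "missing or empty 'prompt'")
    return prompt, None
```

Inside `handler`, `prompt, error = extract_prompt(event)` followed by `if error: return error` would take the place of the direct `json.loads(event["body"])` call, so bad requests fail fast without spending an API call.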

Variant 2: Streaming Microservice (FastAPI + WebSocket)

# Best for: chatbots, interactive UIs, real-time responses
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
import anthropic

app = FastAPI()
# Async client: the sync client would block the event loop while streaming
client = anthropic.AsyncAnthropic()

@app.websocket("/chat")
async def chat_ws(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            prompt = await websocket.receive_text()
            async with client.messages.stream(
                model="claude-sonnet-4-20250514",
                max_tokens=2048,
                messages=[{"role": "user", "content": prompt}]
            ) as stream:
                async for text in stream.text_stream:
                    await websocket.send_text(text)
            await websocket.send_text("[DONE]")  # end-of-response sentinel
    except WebSocketDisconnect:
        pass  # client closed the connection; nothing left to send

Variant 3: Queue-Based Pipeline (Celery / Cloud Tasks)

# Best for: batch processing, async workflows, high volume
from celery import Celery
import anthropic

app = Celery("tasks", broker="redis://localhost")

def save_result(doc_id: str, summary: str) -> None:
    # Hypothetical placeholder: persist the summary to your own store
    raise NotImplementedError

@app.task(bind=True, max_retries=3, default_retry_delay=30)
def process_document(self, doc_id: str, content: str):
    client = anthropic.Anthropic()
    try:
        msg = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            messages=[{"role": "user", "content": f"Summarize:\n\n{content}"}]
        )
    except anthropic.RateLimitError as e:
        # retry() raises celery.exceptions.Retry; re-raising makes the control flow explicit
        raise self.retry(exc=e, countdown=int(e.response.headers.get("retry-after", 30)))
    save_result(doc_id, msg.content[0].text)
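
The task trusts `retry-after` to be an integer number of seconds. HTTP also allows an HTTP-date in that header, and when the header is absent every retry waits the same fixed 30 s. A sketch of a more defensive delay calculation follows; `retry_delay` and its `base`/`cap` defaults are hypothetical choices, and the fallback swaps in capped exponential backoff with full jitter.

```python
import math
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def _parse_retry_after(value: str, now: datetime) -> Optional[float]:
    """Parse Retry-After as delta-seconds or an HTTP-date; None if neither."""
    try:
        secs = float(value)
        return secs if math.isfinite(secs) else None
    except ValueError:
        pass
    try:
        # A naive result (e.g. a "-0000" zone) raises TypeError on subtraction
        return (parsedate_to_datetime(value) - now).total_seconds()
    except (TypeError, ValueError):
        return None


def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 30.0, cap: float = 300.0,
                now: Optional[datetime] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    value = headers.get("retry-after")
    if value is not None:
        parsed = _parse_retry_after(value, now or datetime.now(timezone.utc))
        if parsed is not None:
            # Honor the server, but never wait a negative time or beyond the cap
            return min(max(parsed, 0.0), cap)
    # No usable header: uniform in [0, min(cap, base * 2**attempt)]
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Inside the task, `countdown=retry_delay(e.response.headers, self.request.retries)` would replace the `int(...)` parse; `self.request.retries` is Celery's count of retries so far. The SDK's response headers are case-insensitive, while a plain `dict` (as in a quick test) needs the lowercase key.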

Variant 4: Multi-Model Orchestrator

# Best for: complex workflows needing different model strengths
import anthropic

HAIKU = "claude-3-5-haiku-20241022"
SONNET = "claude-sonnet-4-20250514"

class ClaudeOrchestrator:
    def __init__(self):
        self.client = anthropic.Anthropic()

    def classify_then_respond(self, user_input: str) -> str:
        # Step 1: Classify intent with Haiku (fast, cheap)
        classification = self.client.messages.create(
            model=HAIKU,
            max_tokens=32,
            messages=[{
                "role": "user",
                "content": (
                    "Classify the input as exactly one of: question, task, creative, code. "
                    "Reply with that single word only.\n"
                    f"Input: {user_input[:200]}"
                )
            }]
        )
        intent = classification.content[0].text.strip().lower()

        # Step 2: Route to a model; unrecognized labels fall back to Sonnet
        model = {
            "question": HAIKU,
            "task": SONNET,
            "creative": SONNET,
            "code": SONNET,
        }.get(intent, SONNET)

        # Step 3: Generate the response with the routed model
        msg = self.client.messages.create(
            model=model,
            max_tokens=4096,
            messages=[{"role": "user", "content": user_input}]
        )
        return msg.content[0].text
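
Routing depends on Haiku's reply matching a dictionary key exactly, but `.strip().lower()` turns a reply like `"Code."` into `"code."`, which silently falls back to Sonnet. A sketch of a more tolerant parser follows; `normalize_intent` is a hypothetical helper, and the `"task"` default is chosen only because it routes to Sonnet, matching the existing fallback.

```python
import re
from typing import Iterable

INTENTS = ("question", "task", "creative", "code")


def normalize_intent(raw: str, labels: Iterable[str] = INTENTS,
                     default: str = "task") -> str:
    """Map free-form classifier output to one known label.

    Scans alphabetic words left to right and returns the first that is a
    known label, so "Intent: Code." -> "code". Returns `default` otherwise.
    """
    known = tuple(labels)
    for word in re.findall(r"[a-z]+", raw.lower()):
        if word in known:
            return word
    return default
```

In `classify_then_respond`, `intent = normalize_intent(classification.content[0].text)` would replace the `.strip().lower()` line. Matching whole words keeps near-misses like `"coding"` from being mapped to `"code"`; they fall through to the default instead.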

Architecture Selection Guide

Factor	Serverless	Microservice	Queue-Based	Orchestrator
Latency	High (cold start)	Low (streaming)	N/A (async)	Medium
Volume	Low (<100 RPM)	Medium	High	Medium
Cost	Pay-per-use	Fixed infra	Batch savings	Optimized per-task
Complexity	Low	Medium	Medium	High
Best for	APIs, triggers	Chatbots	ETL, processing	Complex workflows
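
The table can be read as a short decision procedure. The sketch below encodes one possible reading of it; the `Workload` fields, the rule order, and the fallback to the microservice for synchronous traffic above 100 RPM are assumptions, not rules stated by the guide.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    rpm: float              # expected peak requests per minute
    interactive: bool       # a user is waiting and wants tokens as they arrive
    multi_step: bool        # sub-tasks benefit from different models
    async_ok: bool = False  # results can be delivered later (jobs, ETL)


def choose_architecture(w: Workload) -> str:
    """One heuristic reading of the selection guide; not a hard rule."""
    if w.multi_step:
        return "orchestrator"   # complex workflows
    if w.interactive:
        return "microservice"   # chatbots: low latency via streaming
    if w.async_ok:
        return "queue"          # ETL, processing: latency not critical
    if w.rpm < 100:
        return "serverless"     # APIs, triggers at low volume
    return "microservice"       # synchronous traffic beyond serverless volume
```

For example, an event-triggered endpoint at 20 RPM maps to `"serverless"`, while a nightly summarization job that sets `async_ok=True` maps to `"queue"` regardless of its volume.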


Next Steps

For common pitfalls, see anth-known-pitfalls.