Claude-scaffold claude-api-patterns

Claude API Patterns

install
source · Clone the upstream repo
git clone https://github.com/pyramidheadshark/claude-scaffold
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/pyramidheadshark/claude-scaffold "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/claude-api-patterns" ~/.claude/skills/pyramidheadshark-claude-scaffold-claude-api-patterns && rm -rf "$T"
manifest: .claude/skills/claude-api-patterns/SKILL.md
source content

Claude API Patterns

When to Load This Skill

Load when working with: Anthropic SDK,

anthropic
package, Claude API, tool use, streaming responses, message batches,
MessageCreate
,
@anthropic-ai/sdk
.

SDK Setup

import anthropic

client = anthropic.Anthropic(api_key=settings.ANTHROPIC_API_KEY)

Never hardcode the API key. Always use environment variables validated at startup.

Basic Message

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
return message.content[0].text

System Prompt

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=system_prompt,
    messages=[{"role": "user", "content": user_message}],
)

Keep system prompts in separate

.txt
or
.md
files, not inline strings. Version them.

Streaming

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
) as stream:
    for text in stream.text_stream:
        yield text

Use streaming for: long responses, real-time UX, progress indication.

Tool Use (Function Calling)

from pydantic import BaseModel

class SearchInput(BaseModel):
    query: str
    max_results: int = 10

tools = [{
    "name": "search",
    "description": "Search for information",
    "input_schema": SearchInput.model_json_schema(),
}]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Find recent papers on RAG"}],
)

Always define tool schemas with Pydantic — never write raw JSON schemas by hand.

Tool Result Loop

messages = [{"role": "user", "content": user_message}]

while True:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        tools=tools,
        messages=messages,
    )

    if response.stop_reason == "end_turn":
        break

    if response.stop_reason == "tool_use":
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = dispatch_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })

        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

Observability

Log every call:

  • model, max_tokens, stop_reason
  • input_tokens, output_tokens (from
    response.usage
    )
  • latency (wall time)
  • cost estimate (use token counts × price per token)
import time

start = time.monotonic()
response = client.messages.create(...)
latency_ms = (time.monotonic() - start) * 1000

logger.info("claude_call", extra={
    "model": response.model,
    "input_tokens": response.usage.input_tokens,
    "output_tokens": response.usage.output_tokens,
    "stop_reason": response.stop_reason,
    "latency_ms": round(latency_ms, 1),
})

Rate Limits and Retry

import anthropic

client = anthropic.Anthropic(
    max_retries=3,
)

The SDK retries on 429 and 529 by default. For production, add exponential backoff via

httpx
transport or a wrapper. Never swallow
anthropic.APIError
silently.

Model Selection

TaskModel
Code, architecture, tests
claude-sonnet-4-6
Complex reasoning, long docs
claude-opus-4-6
Fast classification, routing
claude-haiku-4-5-20251001

Routing must be explicit in code — never auto-escalate based on vague heuristics.

See Also

  • resources/streaming-patterns.md
    — advanced streaming with SSE + FastAPI
  • resources/eval-design.md
    — building evals for prompt changes