Skills api-ai-anthropic-sdk
Official Anthropic SDK patterns for TypeScript/Node.js — client setup, Messages API, streaming, tool use, vision, extended thinking, structured outputs, prompt caching, batch API, and production best practices
git clone https://github.com/agents-inc/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/agents-inc/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/dist/plugins/api-ai-anthropic-sdk/skills/api-ai-anthropic-sdk" ~/.claude/skills/agents-inc-skills-api-ai-anthropic-sdk && rm -rf "$T"
dist/plugins/api-ai-anthropic-sdk/skills/api-ai-anthropic-sdk/SKILL.md
Anthropic SDK Patterns
Quick Guide: Use the official `@anthropic-ai/sdk` package to interact with Claude models directly. Use `client.messages.create()` for single-turn and multi-turn conversations. Use `client.messages.stream()` for streaming with event-based consumption. `max_tokens` is always required. Content blocks are typed unions (`text`, `tool_use`, `thinking`). Use `zodOutputFormat()` with `client.messages.parse()` for structured outputs. Tool use requires a tool-result loop -- Claude returns `tool_use` blocks, you execute the tool and send back `tool_result` blocks. Extended thinking adds `thinking` content blocks before the response.
<critical_requirements>
CRITICAL: Before Using This Skill
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, `import type`, named constants)
(You MUST always provide `max_tokens` in every `messages.create()` / `messages.stream()` call -- it is required and has no default)
(You MUST handle the `stop_reason` field to detect `end_turn`, `max_tokens`, `tool_use`, and `stop_sequence` -- ignoring it causes silent truncation or broken tool loops)
(You MUST iterate over `response.content` blocks (not assume a single text block) -- responses can contain `text`, `tool_use`, and `thinking` blocks mixed together)
(You MUST handle errors using `Anthropic.APIError` and its subclasses -- never use bare catch blocks without error type checking)
(You MUST never hardcode API keys -- always use environment variables via `process.env.ANTHROPIC_API_KEY`)
</critical_requirements>
Auto-detection: Anthropic, @anthropic-ai/sdk, client.messages.create, client.messages.stream, client.messages.parse, client.messages.countTokens, client.messages.batches, ANTHROPIC_API_KEY, claude-sonnet, claude-opus, claude-haiku, ContentBlock, ToolUseBlock, tool_use, tool_result, thinking, budget_tokens, adaptive, cache_control, zodOutputFormat, betaZodTool, toolRunner
When to use:
- Building applications that call Claude models directly (Opus, Sonnet, Haiku families)
- Implementing streaming chat responses with event-based text accumulation
- Using tool use / function calling where Claude decides which tools to invoke
- Processing images, PDFs, or documents alongside text prompts
- Enabling extended thinking for complex reasoning tasks
- Extracting structured data from responses with Zod schema validation
- Caching large system prompts or conversation prefixes for cost savings
- Running batch jobs for high-volume, asynchronous processing
- Counting tokens before sending requests for cost estimation
Key patterns covered:
- Client initialization and configuration (retries, timeouts, API key)
- Messages API (`messages.create`, system prompts, multi-turn conversations)
- Streaming with `.stream()` helper and low-level `stream: true` SSE
- Tool use / function calling (tools array, `tool_use`/`tool_result` content blocks)
- Vision (base64 images, URL images, PDFs/documents)
- Extended thinking (`thinking` config, `budget_tokens`, thinking content blocks)
- Structured outputs (`zodOutputFormat`, `messages.parse`, `output_config`)
- Prompt caching (`cache_control: { type: "ephemeral" }`)
- Batch API (`messages.batches.create`)
- Token counting (`messages.countTokens`)
- Error handling, retries, and production best practices
When NOT to use:
- Multi-provider applications where you need to switch between multiple LLM providers -- use a unified provider SDK instead
- React-specific chat UI hooks (`useChat`, `useCompletion`) -- use a framework-integrated AI SDK
- When you need a higher-level agent framework -- consider the Claude Agent SDK (`@anthropic-ai/claude-agent-sdk`)
Examples Index
- Core: Setup & Configuration -- Client init, production config, error handling, token counting
- Streaming -- `.stream()` helper, `stream: true` SSE, event types, abort
- Tool Use / Function Calling -- Tool definitions, tool loops, parallel tool calls, automated tool runner
- Vision & Documents -- Base64 images, URL images, PDFs, multi-modal
- Extended Thinking -- Thinking config, streaming thinking, thinking with tool use
- Quick API Reference -- Model IDs, method signatures, error types, streaming events, content block types
<philosophy>
Philosophy
The official Anthropic SDK provides direct, typed access to the Claude API. It is auto-generated from Anthropic's API specification using Stainless, giving you the exact API surface that Anthropic documents with full TypeScript types.
Core principles:
- Content blocks, not strings -- Responses are arrays of typed content blocks (`TextBlock`, `ToolUseBlock`, `ThinkingBlock`), not plain strings. Always iterate over `response.content` and switch on `block.type`.
- Explicit resource limits -- `max_tokens` is always required. There is no default. The API will reject requests without it.
- Tool use is a conversation loop -- When `stop_reason === "tool_use"`, Claude is requesting you execute a tool. You must send the result back as a `tool_result` content block to continue the conversation.
- Built-in resilience -- The SDK retries 2 times by default on 429, 409, 408, 529, and 5xx errors with exponential backoff.
- Streaming as a first-class pattern -- Use `.stream()` for an event-based API with `.on("text", ...)`, or `stream: true` for raw SSE iteration.
When to use the Anthropic SDK directly:
- You only use Claude models and want the simplest, most direct integration
- You need access to Anthropic-specific features (extended thinking, prompt caching, batch API)
- You want minimal dependencies and zero abstraction overhead
- You need the latest API features on day one
When NOT to use:
- You need to switch between multiple LLM providers -- use a unified provider SDK
- You want React-specific chat UI hooks -- use a framework-integrated AI SDK
- You want a higher-level agent framework -- consider the Claude Agent SDK
<patterns>
Core Patterns
Pattern 1: Client Setup
Initialize the Anthropic client. It auto-reads `ANTHROPIC_API_KEY` from the environment.

```ts
// lib/anthropic.ts -- basic setup
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

export { client };
```

```ts
// lib/anthropic.ts -- production configuration
const TIMEOUT_MS = 30_000;
const MAX_RETRIES = 3;

const client = new Anthropic({ timeout: TIMEOUT_MS, maxRetries: MAX_RETRIES });
```
Why good: Minimal setup, env var auto-detected, named constants for production settings
```ts
// BAD: Hardcoded API key
const client = new Anthropic({ apiKey: "sk-ant-api03-..." });
```
Why bad: Hardcoded keys get committed to version control, causing security breaches
See: examples/core.md for per-request overrides, error handling patterns, token counting
Pattern 2: Messages API
All interactions use `client.messages.create()`. `max_tokens` is always required.

```ts
const MAX_TOKENS = 1024;

const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  system: "You are a helpful coding assistant.",
  messages: [{ role: "user", content: "Explain TypeScript generics." }],
});

// Response is an array of content blocks -- iterate, don't assume
for (const block of message.content) {
  if (block.type === "text") {
    console.log(block.text);
  }
}
```
Why good: Named constant for max_tokens, system prompt separated from messages, content blocks iterated
```ts
// BAD: Assuming content is a single text string
const text = message.content[0].text; // Crashes if block is tool_use or thinking
```
Why bad: Content can contain multiple blocks of different types -- direct index access without type checking crashes at runtime
See: examples/core.md for multi-turn conversations, system prompts, token tracking
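The iterate-don't-assume rule can be factored into a small helper. A sketch using local stand-in types for the SDK's content block union (`extractText` is a hypothetical helper, not an SDK export):

```ts
// Minimal stand-ins for the SDK's content block union (illustrative only).
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ThinkingBlock = { type: "thinking"; thinking: string };
type ContentBlock = TextBlock | ToolUseBlock | ThinkingBlock;

// Concatenate only the text blocks -- safe even when tool_use or
// thinking blocks are mixed into the response.
export function extractText(content: ContentBlock[]): string {
  return content
    .filter((block): block is TextBlock => block.type === "text")
    .map((block) => block.text)
    .join("");
}
```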
Pattern 3: Streaming
Use `.stream()` for event-based streaming with text accumulation helpers.

```ts
const MAX_TOKENS = 1024;

const stream = client.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  messages: [{ role: "user", content: "Explain async/await." }],
});

stream.on("text", (text) => {
  process.stdout.write(text);
});

const finalMessage = await stream.finalMessage();
```
Why good: Event-based API handles accumulation, `finalMessage()` gives the complete response object

```ts
// BAD: Using stream: true without consuming events
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
});
// Response is an async iterable, not a Message -- must iterate
```
Why bad: `stream: true` returns an async iterable of raw SSE events, not a Message. Treating it as a Message silently breaks.
See: examples/streaming.md for raw SSE iteration, abort, stream events, streaming with thinking
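When consuming `stream: true` directly, you accumulate `content_block_delta` events yourself. A minimal sketch of that accumulation step, with locally defined event shapes standing in for the SDK's full raw-event union (illustrative, not the SDK's actual types):

```ts
// Illustrative event shapes -- a simplified subset of the raw stream
// event union, defined locally so the sketch stands alone.
type TextDeltaEvent = {
  type: "content_block_delta";
  delta: { type: "text_delta"; text: string };
};
type OtherEvent = {
  type: "message_start" | "content_block_start" | "message_stop";
};
type StreamEvent = TextDeltaEvent | OtherEvent;

// Fold raw SSE events into the final text, ignoring non-text events.
export function accumulateText(events: StreamEvent[]): string {
  let text = "";
  for (const event of events) {
    if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
      text += event.delta.text;
    }
  }
  return text;
}
```

With a live stream you would apply the same check inside `for await (const event of response)` instead of over an array.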
Pattern 4: Tool Use / Function Calling
Define tools Claude can invoke. Handle the `tool_use` -> `tool_result` conversation loop.

```ts
const tools: Anthropic.Messages.Tool[] = [
  {
    name: "get_weather",
    description: "Get current weather for a location",
    input_schema: {
      type: "object" as const,
      properties: {
        location: { type: "string", description: "City name" },
      },
      required: ["location"],
    },
  },
];

const MAX_TOKENS = 1024;

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  tools,
  messages: [{ role: "user", content: "Weather in Paris?" }],
});

// Check stop_reason to know if Claude wants to call a tool
if (response.stop_reason === "tool_use") {
  const toolBlock = response.content.find(
    (block): block is Anthropic.Messages.ToolUseBlock => block.type === "tool_use",
  );
  if (toolBlock) {
    console.log(`Call ${toolBlock.name} with:`, toolBlock.input);
  }
}
```
Why good: Typed tool definitions, `stop_reason` checked, type guard for `ToolUseBlock`

```ts
// BAD: Not checking stop_reason, not sending tool_result back
const response = await client.messages.create({ /* ... with tools */ });
console.log(response.content[0]); // May be a tool_use block, not text!
```
Why bad: When Claude wants to call a tool, there is no text content -- only `tool_use` blocks. You must execute the tool and send back a `tool_result` to get the final answer.
See: examples/tool-use.md for complete tool loops, parallel tool calls, automated tool runner
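The result half of the loop can be sketched as a dispatcher that maps tool names to handlers and builds the `tool_result` block the API expects back. The registry and handler below are hypothetical, not SDK exports; only the `tool_result` block shape follows the Messages API:

```ts
// Hypothetical registry of tool implementations, keyed by tool name.
type ToolHandler = (input: unknown) => string;

const toolHandlers: Record<string, ToolHandler> = {
  get_weather: (input) => {
    const { location } = input as { location: string };
    return `Sunny in ${location}`; // stub -- a real handler would call a weather API
  },
};

type ToolResultBlock = {
  type: "tool_result";
  tool_use_id: string;
  content: string;
};

// Execute the requested tool and wrap its output for the follow-up request.
export function buildToolResult(block: {
  id: string;
  name: string;
  input: unknown;
}): ToolResultBlock {
  const handler = toolHandlers[block.name];
  if (!handler) {
    throw new Error(`Unknown tool: ${block.name}`);
  }
  return {
    type: "tool_result",
    tool_use_id: block.id, // must echo the id from the matching tool_use block
    content: handler(block.input),
  };
}
```

You would then send `{ role: "user", content: [buildToolResult(toolBlock)] }` in the next request, keeping the assistant's full previous content (including the `tool_use` block) in the history.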
Pattern 5: Vision & Documents
Pass images and PDFs as content blocks alongside text.
```ts
import { readFileSync } from "node:fs";

const MAX_TOKENS = 1024;
const imageData = readFileSync("photo.jpg").toString("base64");

const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          source: { type: "base64", media_type: "image/jpeg", data: imageData },
        },
        { type: "text", text: "What's in this image?" },
      ],
    },
  ],
});
```
Why good: Multi-part content array, explicit media type, text and image combined in one message
See: examples/vision-documents.md for URL images, PDFs, multiple images
Pattern 6: Extended Thinking
Enable extended thinking for complex reasoning. Responses include `thinking` content blocks. Use adaptive thinking on Opus 4.6 and Sonnet 4.6 (recommended). Use manual `budget_tokens` on older models.

```ts
const MAX_TOKENS = 16_000;

// Adaptive thinking (recommended for 4.6 models)
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  thinking: { type: "adaptive" },
  messages: [
    { role: "user", content: "Prove there are infinitely many primes." },
  ],
} as unknown as Anthropic.MessageCreateParamsNonStreaming);

for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("Thinking:", block.thinking);
  } else if (block.type === "text") {
    console.log("Answer:", block.text);
  }
}
```
Why good: Adaptive thinking lets Claude decide how much to reason, iterates content blocks, handles both thinking and text blocks
```ts
// Manual thinking (deprecated on 4.6 models, required on older models)
const THINKING_BUDGET = 10_000;

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: MAX_TOKENS,
  thinking: { type: "enabled", budget_tokens: THINKING_BUDGET },
  messages: [
    { role: "user", content: "Prove there are infinitely many primes." },
  ],
});
```
Note: The TypeScript SDK does not yet have `"adaptive"` in its type definitions. The `as unknown as Anthropic.MessageCreateParamsNonStreaming` assertion is required until the SDK types are updated.
See: examples/extended-thinking.md for streaming thinking, thinking with tools, display options
Pattern 7: Structured Outputs
Use `zodOutputFormat()` and `messages.parse()` for type-safe structured responses.

```ts
import { zodOutputFormat } from "@anthropic-ai/sdk/helpers/zod";
import { z } from "zod";

const ContactInfo = z.object({
  name: z.string(),
  email: z.string(),
  topics: z.array(z.string()),
});

const MAX_TOKENS = 1024;

const response = await client.messages.parse({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  messages: [
    {
      role: "user",
      content:
        "Extract info: John (john@example.com) asked about billing and API limits.",
    },
  ],
  output_config: { format: zodOutputFormat(ContactInfo) },
});

const parsed = response.parsed_output; // Fully typed: { name, email, topics }
```
Why good: Auto-converts Zod schema, validates output, fully typed result
See: examples/core.md for raw JSON schema, combined with tool use
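If the Zod helper is unavailable, the parsed output can still be narrowed at runtime with a hand-rolled type guard mirroring the schema. A sketch (the guard is illustrative, not an SDK feature):

```ts
// Hand-rolled runtime guard mirroring a ContactInfo schema -- a fallback
// sketch for validating structured output without zod.
type ContactInfo = { name: string; email: string; topics: string[] };

export function isContactInfo(value: unknown): value is ContactInfo {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === "string" &&
    typeof v.email === "string" &&
    Array.isArray(v.topics) &&
    v.topics.every((t) => typeof t === "string")
  );
}
```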
Pattern 8: Prompt Caching
Cache large system prompts and conversation prefixes for cost savings.
```ts
const MAX_TOKENS = 1024;

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  system: [
    {
      type: "text",
      text: "You are a legal document analyst.",
    },
    {
      type: "text",
      text: largeDocumentText, // 50+ pages of legal text
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "What are the key terms?" }],
});

// Check cache performance
console.log("Cache read tokens:", response.usage.cache_read_input_tokens);
console.log("Cache write tokens:", response.usage.cache_creation_input_tokens);
```
Why good: Cache breakpoint on the large static content, cache metrics tracked
See: reference.md for cache pricing, TTL options, automatic caching
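Cache effectiveness can be tracked from the same `usage` fields. A sketch of a hypothetical helper, under the assumption that `input_tokens` reports only non-cached input tokens (with cache reads and writes reported separately):

```ts
// Usage fields relevant to caching, as shown in the example above.
type Usage = {
  input_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
};

// Hypothetical helper: fraction of total input tokens served from cache.
export function cacheHitRatio(usage: Usage): number {
  const read = usage.cache_read_input_tokens ?? 0;
  const written = usage.cache_creation_input_tokens ?? 0;
  const total = usage.input_tokens + read + written;
  return total === 0 ? 0 : read / total;
}
```

Logging this ratio per request makes it obvious when a cache breakpoint has been invalidated (the ratio drops to zero).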
Pattern 9: Error Handling
Always catch `Anthropic.APIError` and its subclasses. Re-throw unexpected errors.

```ts
try {
  const message = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Hello" }],
  });
} catch (error) {
  if (error instanceof Anthropic.APIError) {
    console.error(`API Error [${error.status}]: ${error.message}`);
    if (error instanceof Anthropic.RateLimitError) {
      console.error("Rate limited -- SDK will auto-retry 2 times");
    }
    if (error instanceof Anthropic.AuthenticationError) {
      throw new Error("Invalid API key. Check ANTHROPIC_API_KEY.");
    }
  } else {
    throw error; // Re-throw non-API errors
  }
}
```
Why good: Specific error types, status code access, re-throws unexpected errors
See: examples/core.md for full error hierarchy, stream error handling
</patterns><performance>
Performance Optimization
Model Selection for Cost/Speed
- Most capable, complex reasoning -> claude-opus-4-6 (1M context, 128K output)
- General purpose, best value -> claude-sonnet-4-6 (1M context, 64K output)
- Fast + cheap, simple tasks -> claude-haiku-4-5 (200K context, 64K output)
- Extended thinking -> claude-sonnet-4-6 or claude-opus-4-6 (use adaptive thinking)
- Vision / multimodal -> claude-sonnet-4-6 or claude-opus-4-6
- Batch processing -> Any model at 50% batch discount
Key Optimization Patterns
- Track token usage via `message.usage` (`input_tokens`, `output_tokens`) for cost visibility
- Check `stop_reason === "max_tokens"` to detect truncated output
- Use prompt caching for large system prompts -- cache reads cost 0.1x base input price
- Use `messages.countTokens()` before sending to estimate costs
- Use Batch API for high-volume async jobs at 50% cost reduction
- Use `AbortController` to cancel long-running requests
- Set `temperature: 0` for deterministic output when caching matters
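A pre-flight cost estimate combines the token count with per-million-token pricing. A minimal sketch -- the prices below are placeholders (check current published rates) and `estimateCostUsd` is a hypothetical helper:

```ts
// Placeholder per-million-token prices -- substitute current published rates.
const INPUT_PRICE_PER_MTOK = 3.0;
const OUTPUT_PRICE_PER_MTOK = 15.0;

// Worst-case cost: counted input tokens plus the full max_tokens budget.
export function estimateCostUsd(inputTokens: number, maxOutputTokens: number): number {
  const inputCost = (inputTokens / 1_000_000) * INPUT_PRICE_PER_MTOK;
  const outputCost = (maxOutputTokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK;
  return inputCost + outputCost;
}

// In practice, get inputTokens from:
// const { input_tokens } = await client.messages.countTokens({ model, messages });
```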
<decision_framework>
Decision Framework
Which Model to Choose
```
What is your task?
+-- Complex reasoning / analysis -> claude-opus-4-6
+-- General purpose (best balance) -> claude-sonnet-4-6
+-- Fast + cheap, high throughput -> claude-haiku-4-5
+-- Extended thinking needed -> claude-sonnet-4-6 (or opus-4-6 with adaptive thinking)
+-- Vision / image analysis -> claude-sonnet-4-6 or claude-opus-4-6
+-- Batch processing -> Any model (50% discount)
```
Streaming vs Non-Streaming
```
Is the response user-facing?
+-- YES -> Use streaming (client.messages.stream())
|   +-- Need event-level control? -> .on("text", ...) + .on("contentBlock", ...)
|   +-- Just want final message? -> stream.finalMessage() (avoids HTTP timeouts on large responses)
+-- NO -> Use non-streaming (client.messages.create())
    +-- Background processing -> messages.create()
    +-- Structured output -> messages.parse()
    +-- High volume -> Batch API
```
When to Use Extended Thinking
```
Does the task require multi-step reasoning?
+-- YES -> Which model?
|   +-- Opus 4.6 or Sonnet 4.6? -> Use adaptive: thinking: { type: "adaptive" }
|   |   +-- Control depth? -> Add output_config: { effort: "high" | "medium" | "low" }
|   |   +-- Opus only max depth? -> effort: "max"
|   +-- Older models? -> Manual: thinking: { type: "enabled", budget_tokens: N }
+-- NO -> Standard messages.create() is sufficient (omit thinking param or type: "disabled")
```
</decision_framework>
<red_flags>
RED FLAGS
High Priority Issues:
- Not providing `max_tokens` (request will be rejected -- it has no default)
- Hardcoding API keys instead of using environment variables (security breach risk)
- Treating `response.content` as a string instead of iterating content blocks (crashes on `tool_use` or `thinking` blocks)
- Not checking `stop_reason` for `"tool_use"` (breaks function calling flows -- Claude is waiting for tool results)
- Using bare `catch` blocks without checking `Anthropic.APIError` (hides API-specific error information)
Medium Priority Issues:
- Not setting `maxRetries` / `timeout` for production deployments (default timeout is 10 minutes, which may be too long)
- Ignoring `stop_reason === "max_tokens"` (response was truncated but you are using it as complete)
- Ignoring `usage` data (no cost visibility or budget tracking)
- Not sending `thinking` blocks back in multi-turn conversations when using extended thinking (Claude loses reasoning context)
- Changing `thinking` parameters between turns in a tool use loop (invalidates message cache, causes errors)
Common Mistakes:
- Using `system` as a message role instead of the top-level `system` parameter (there is no `system` role in messages -- use the `system` parameter)
- Assuming `response.content` has exactly one block (it can have multiple `text`, `tool_use`, and `thinking` blocks)
- Not passing `tool_result` back after a `tool_use` response (Claude cannot continue without it)
- Using `max_completion_tokens` instead of `max_tokens` (the Anthropic API uses `max_tokens`, not `max_completion_tokens`)
- Using `response_format` instead of `output_config` for structured outputs (wrong parameter name)
- Forgetting that `budget_tokens` must be less than `max_tokens` (except with interleaved thinking)
Gotchas & Edge Cases:
- The SDK auto-retries on 429 (rate limit), 529 (overloaded), 408 (timeout), 409 (conflict), and 5xx errors -- 2 retries by default with exponential backoff. Disable with `maxRetries: 0`.
- `client.messages.stream()` returns a `MessageStream` with event helpers. `client.messages.create({ stream: true })` returns a raw async iterable of SSE events. They are different APIs.
- When using extended thinking with tool use, you must include the `thinking` blocks unmodified when sending conversation history back. Omitting or modifying them causes errors.
- `tool_choice: { type: "any" }` forces Claude to call a tool but cannot be used with extended thinking. Only `"auto"` and `"none"` work with thinking enabled.
- Prompt caching requires a minimum of 1024-4096 tokens (model-dependent) to be cacheable. Small prompts will not be cached.
- Cache breakpoints on messages are invalidated when `thinking` parameters change between requests. System prompt cache is preserved.
- `budget_tokens` is deprecated on both Claude Opus 4.6 and Sonnet 4.6 -- use `thinking: { type: "adaptive" }` instead. `budget_tokens` still works but will be removed in a future release.
- The `display` field on thinking config controls whether thinking text is returned: `"summarized"` (default) or `"omitted"` (only signature, faster streaming).
- Adaptive thinking automatically enables interleaved thinking (thinking between tool calls). Manual mode on Sonnet 4.6 requires the `interleaved-thinking-2025-05-14` beta header for interleaved thinking.
- The `effort` parameter (`output_config: { effort: "high" | "medium" | "low" | "max" }`) works with adaptive thinking to control thinking depth. `"max"` is Opus 4.6 only.
- The TypeScript SDK does not yet include `"adaptive"` in its type definitions -- use a type assertion when passing `thinking: { type: "adaptive" }`.
- Multi-turn conversations require you to include the full assistant response (all content blocks) in the conversation history, not just the text.
- Batch API requests have a 24-hour completion window. Use `messages.batches.results()` to retrieve completed results.
</red_flags>
<critical_reminders>
CRITICAL REMINDERS
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, `import type`, named constants)
(You MUST always provide `max_tokens` in every `messages.create()` / `messages.stream()` call -- it is required and has no default)
(You MUST handle the `stop_reason` field to detect `end_turn`, `max_tokens`, `tool_use`, and `stop_sequence` -- ignoring it causes silent truncation or broken tool loops)
(You MUST iterate over `response.content` blocks (not assume a single text block) -- responses can contain `text`, `tool_use`, and `thinking` blocks mixed together)
(You MUST handle errors using `Anthropic.APIError` and its subclasses -- never use bare catch blocks without error type checking)
(You MUST never hardcode API keys -- always use environment variables via `process.env.ANTHROPIC_API_KEY`)
Failure to follow these rules will produce broken tool loops, silent truncation, security vulnerabilities, or untyped AI integrations.
</critical_reminders>