claude-skill-registry · claude-api-cost-optimization

Save 50-90% on Claude API costs with Batch API, Prompt Caching & Extended Thinking. Official techniques, verified.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/claude-api-cost-optimization" ~/.claude/skills/majiayu000-claude-skill-registry-claude-api-cost-optimization && rm -rf "$T"
manifest: skills/data/claude-api-cost-optimization/SKILL.md
source content

Claude API Cost Optimization

Save 50-90% on Claude API costs with three officially verified techniques

Quick Reference

| Technique | Savings | Use When |
|---|---|---|
| Batch API | 50% | Tasks can wait up to 24h |
| Prompt Caching | 90% | Repeated system prompts (>1K tokens) |
| Extended Thinking | ~80% | Complex reasoning tasks |
| Batch + Cache | ~95% | Bulk tasks with shared context |

1. Batch API (50% Off)

When to Use

  • Bulk translations
  • Daily content generation
  • Overnight report processing
  • NOT for real-time chat

Code Example

import time

import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "task-001",
            "params": {
                "model": "claude-sonnet-4-5",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Task 1"}]
            }
        }
    ]
)

# Results are available once the batch ends (within 24h, usually <1h)
while client.messages.batches.retrieve(batch.id).processing_status != "ended":
    time.sleep(60)

for result in client.messages.batches.results(batch.id):
    if result.result.type == "succeeded":
        print(f"{result.custom_id}: {result.result.message.content[0].text}")

Key Finding: Bigger Batches = Faster!

| Batch Size | Time/Request |
|---|---|
| Large (294 requests) | 0.45 min |
| Small (10 requests) | 9.84 min |

22x efficiency difference! Always batch 100+ requests together.
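
To batch 100+ requests, build the request list programmatically. A minimal sketch, assuming a hypothetical tasks list of prompt strings; the custom_id scheme is illustrative:

requests = [
    {
        "custom_id": f"task-{i:04d}",
        "params": {
            "model": "claude-sonnet-4-5",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": task}],
        },
    }
    for i, task in enumerate(tasks)  # tasks: your own list of 100+ prompts
]
batch = client.messages.batches.create(requests=requests)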


2. Prompt Caching (90% Off)

When to Use

  • Long system prompts (>1K tokens)
  • Repeated instructions
  • RAG with large context

Code Example

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "Your long system prompt here...",
        "cache_control": {"type": "ephemeral"}  # Enable caching!
    }],
    messages=[{"role": "user", "content": "User question"}]
)
# First call: +25% (cache write)
# Subsequent: -90% (cache read!)
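
To confirm caching is actually working, check the usage fields on the response. A minimal sketch, reusing the response object from the call above:

# Cache write on the first call, cache reads on later calls within the TTL
print(response.usage.cache_creation_input_tokens)  # tokens written to the cache (+25% rate)
print(response.usage.cache_read_input_tokens)      # tokens served from the cache (90% cheaper)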

Cache Rules

  • Minimum: 1,024 tokens (Sonnet)
  • TTL: 5 minutes (refreshes on use)
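
The "Batch + Cache (~95%)" row in the Quick Reference combines both techniques: put cache_control on the shared system prompt inside each batch request. A minimal sketch, assuming a hypothetical LONG_SYSTEM_PROMPT string and tasks list; note that cache hits inside a batch are not guaranteed, since batched requests can process concurrently:

requests = [
    {
        "custom_id": f"task-{i:04d}",
        "params": {
            "model": "claude-sonnet-4-5",
            "max_tokens": 1024,
            "system": [{
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,  # hypothetical shared prompt, >1,024 tokens
                "cache_control": {"type": "ephemeral"},
            }],
            "messages": [{"role": "user", "content": task}],
        },
    }
    for i, task in enumerate(tasks)  # hypothetical list of user prompts
]
batch = client.messages.batches.create(requests=requests)  # 50% batch discount on top of cache reads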

3. Extended Thinking (~80% Off)

When to Use

  • Complex code architecture
  • Strategic planning
  • Mathematical reasoning

Code Example

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[{"role": "user", "content": "Design architecture for..."}]
)
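
With thinking enabled, response.content mixes thinking blocks and text blocks. A minimal sketch for separating the reasoning from the final answer, reusing the response from the call above:

for block in response.content:
    if block.type == "thinking":
        reasoning = block.thinking  # the model's reasoning (billed as output tokens)
    elif block.type == "text":
        print(block.text)  # the final answer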

Decision Flowchart

Can wait 24h? → Yes → Batch API (50% off)
                 ↓ No
Repeated prompts >1K? → Yes → Prompt Caching (90% off)
                         ↓ No
Complex reasoning? → Yes → Extended Thinking
                      ↓ No
Use normal API
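
The same flowchart as a tiny helper function; the name and flags are purely illustrative:

def pick_technique(can_wait_24h: bool, repeated_long_prompt: bool, complex_reasoning: bool) -> str:
    if can_wait_24h:
        return "Batch API (50% off)"
    if repeated_long_prompt:
        return "Prompt Caching (90% off)"
    if complex_reasoning:
        return "Extended Thinking"
    return "Normal API"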

Official Docs


Made with 🐾 by Washin Village - Verified against official Anthropic documentation