claude-skill-registry · claude-api-cost-optimization

Save 50-90% on Claude API costs with Batch API, Prompt Caching & Extended Thinking. Official techniques, verified.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/claude-api-cost-optimization" ~/.claude/skills/majiayu000-claude-skill-registry-claude-api-cost-optimization && rm -rf "$T"
manifest: skills/data/claude-api-cost-optimization/SKILL.md
source content

Claude API Cost Optimization

Save 50-90% on Claude API costs with three officially verified techniques

Quick Reference

| Technique | Savings | Use When |
|---|---|---|
| Batch API | 50% | Tasks can wait up to 24h |
| Prompt Caching | 90% | Repeated system prompts (>1K tokens) |
| Extended Thinking | ~80% | Complex reasoning tasks |
| Batch + Cache | ~95% | Bulk tasks with shared context |

1. Batch API (50% Off)

When to Use

  • Bulk translations
  • Daily content generation
  • Overnight report processing
  • NOT for real-time chat

Code Example

import time

import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "task-001",
            "params": {
                "model": "claude-sonnet-4-5",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Task 1"}]
            }
        }
    ]
)

# Results are available once the batch ends (within 24h, usually <1h)
while client.messages.batches.retrieve(batch.id).processing_status != "ended":
    time.sleep(60)

for result in client.messages.batches.results(batch.id):
    if result.result.type == "succeeded":
        print(f"{result.custom_id}: {result.result.message.content[0].text}")

Key Finding: Bigger Batches = Faster!

| Batch Size | Time/Request |
|---|---|
| Large (294 requests) | 0.45 min |
| Small (10 requests) | 9.84 min |

22x efficiency difference! Always batch 100+ requests together.
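
To batch 100+ requests, build the request list programmatically. A minimal sketch, assuming a hypothetical tasks list of prompt strings; the custom_id scheme is illustrative:

requests = [
    {
        "custom_id": f"task-{i:04d}",
        "params": {
            "model": "claude-sonnet-4-5",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": task}],
        },
    }
    for i, task in enumerate(tasks)  # tasks: your own list of 100+ prompts
]
batch = client.messages.batches.create(requests=requests)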


2. Prompt Caching (90% Off)

When to Use

  • Long system prompts (>1K tokens)
  • Repeated instructions
  • RAG with large context

Code Example

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "Your long system prompt here...",
        "cache_control": {"type": "ephemeral"}  # Enable caching!
    }],
    messages=[{"role": "user", "content": "User question"}]
)
# First call: +25% (cache write)
# Subsequent: -90% (cache read!)
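
To confirm caching is actually working, check the usage fields on the response. A minimal sketch, reusing the response object from the call above:

# Cache write on the first call, cache reads on later calls within the TTL
print(response.usage.cache_creation_input_tokens)  # tokens written to the cache (+25% rate)
print(response.usage.cache_read_input_tokens)      # tokens served from the cache (90% cheaper)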

Cache Rules

  • Minimum: 1,024 tokens (Sonnet)
  • TTL: 5 minutes (refreshes on use)
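
The "Batch + Cache (~95%)" row in the Quick Reference combines both techniques: put cache_control on the shared system prompt inside each batch request. A minimal sketch, assuming a hypothetical LONG_SYSTEM_PROMPT string and tasks list; note that cache hits inside a batch are not guaranteed, since batched requests can process concurrently:

requests = [
    {
        "custom_id": f"task-{i:04d}",
        "params": {
            "model": "claude-sonnet-4-5",
            "max_tokens": 1024,
            "system": [{
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,  # hypothetical shared prompt, >1,024 tokens
                "cache_control": {"type": "ephemeral"},
            }],
            "messages": [{"role": "user", "content": task}],
        },
    }
    for i, task in enumerate(tasks)  # hypothetical list of user prompts
]
batch = client.messages.batches.create(requests=requests)  # 50% batch discount on top of cache reads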

3. Extended Thinking (~80% Off)

When to Use

  • Complex code architecture
  • Strategic planning
  • Mathematical reasoning

Code Example

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[{"role": "user", "content": "Design architecture for..."}]
)
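
With thinking enabled, response.content mixes thinking blocks and text blocks. A minimal sketch for separating the reasoning from the final answer, reusing the response from the call above:

for block in response.content:
    if block.type == "thinking":
        reasoning = block.thinking  # the model's reasoning (billed as output tokens)
    elif block.type == "text":
        print(block.text)  # the final answer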

Decision Flowchart

Can wait 24h? → Yes → Batch API (50% off)
                 ↓ No
Repeated prompts >1K? → Yes → Prompt Caching (90% off)
                         ↓ No
Complex reasoning? → Yes → Extended Thinking
                      ↓ No
Use normal API
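
The same flowchart as a tiny helper function; the name and flags are purely illustrative:

def pick_technique(can_wait_24h: bool, repeated_long_prompt: bool, complex_reasoning: bool) -> str:
    if can_wait_24h:
        return "Batch API (50% off)"
    if repeated_long_prompt:
        return "Prompt Caching (90% off)"
    if complex_reasoning:
        return "Extended Thinking"
    return "Normal API"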

Official Docs


Made with 🐾 by Washin Village - Verified against official Anthropic documentation