# claude-api-cost-optimization

Save 50-90% on Claude API costs with Batch API, Prompt Caching & Extended Thinking. Official techniques, verified.

## Install

Source · clone the upstream repo:

```bash
git clone https://github.com/majiayu000/claude-skill-registry
```

Claude Code · install into `~/.claude/skills/`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/claude-api-cost-optimization" ~/.claude/skills/majiayu000-claude-skill-registry-claude-api-cost-optimization && rm -rf "$T"
```

Manifest: `skills/data/claude-api-cost-optimization/SKILL.md`
# Claude API Cost Optimization

Save 50-90% on Claude API costs with three officially verified techniques.

## Quick Reference
| Technique | Savings | Use When |
|---|---|---|
| Batch API | 50% | Tasks can wait up to 24h |
| Prompt Caching | 90% | Repeated system prompts (>1K tokens) |
| Extended Thinking | ~80% | Complex reasoning tasks |
| Batch + Cache | ~95% | Bulk tasks with shared context |
## 1. Batch API (50% Off)

### When to Use
- Bulk translations
- Daily content generation
- Overnight report processing
- NOT for real-time chat
### Code Example

```python
import time

import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "task-001",
            "params": {
                "model": "claude-sonnet-4-5",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Task 1"}],
            },
        }
    ]
)

# Results become available only after the batch finishes processing
# (usually <1h, guaranteed within 24h), so poll before reading them.
while client.messages.batches.retrieve(batch.id).processing_status != "ended":
    time.sleep(60)

for result in client.messages.batches.results(batch.id):
    if result.result.type == "succeeded":
        print(f"{result.custom_id}: {result.result.message.content[0].text}")
```
### Key Finding: Bigger Batches = Faster!
| Batch Size | Time/Request |
|---|---|
| Large (294) | 0.45 min |
| Small (10) | 9.84 min |
22x efficiency difference! Always batch 100+ requests together.
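To act on that finding, a large job should be split into as few `messages.batches.create` calls as possible. Below is a minimal, hypothetical helper (not part of the skill itself) that chunks a list of request dicts; the 10,000-per-batch ceiling is illustrative, not an official limit.

```python
def chunk_requests(requests, max_per_batch=10_000):
    """Split a list of Batch API request dicts into large chunks so each
    batch submission carries as many requests as possible."""
    return [
        requests[i : i + max_per_batch]
        for i in range(0, len(requests), max_per_batch)
    ]
```

Each chunk can then be passed as the `requests` argument of a single `client.messages.batches.create(...)` call.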
## 2. Prompt Caching (90% Off)

### When to Use
- Long system prompts (>1K tokens)
- Repeated instructions
- RAG with large context
### Code Example

```python
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "Your long system prompt here...",
            "cache_control": {"type": "ephemeral"},  # Enable caching!
        }
    ],
    messages=[{"role": "user", "content": "User question"}],
)
# First call: +25% on the cached tokens (cache write)
# Subsequent calls within the TTL: -90% (cache read!)
```
### Cache Rules
- Minimum: 1,024 tokens (Sonnet)
- TTL: 5 minutes (refreshes on use)
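The "Batch + Cache" row in the Quick Reference combines both techniques: every request in a batch shares one long system prompt marked for caching, so cache reads (best-effort inside batches, since batch requests process concurrently) stack on top of the 50% batch discount. A minimal sketch, assuming a placeholder `SHARED_CONTEXT` standing in for your own >1,024-token prompt:

```python
SHARED_CONTEXT = "...a long (>1,024 token) system prompt shared by every task..."

def batch_with_cached_system(prompts, model="claude-sonnet-4-5"):
    """Build Batch API request dicts that all reuse one cached system prompt."""
    return [
        {
            "custom_id": f"task-{i:04d}",
            "params": {
                "model": model,
                "max_tokens": 1024,
                "system": [
                    {
                        "type": "text",
                        "text": SHARED_CONTEXT,
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
                "messages": [{"role": "user", "content": p}],
            },
        }
        for i, p in enumerate(prompts)
    ]

# client.messages.batches.create(requests=batch_with_cached_system(prompts))
```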
## 3. Extended Thinking (~80% Off)

### When to Use
- Complex code architecture
- Strategic planning
- Mathematical reasoning
### Code Example

```python
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,  # must exceed the thinking budget
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,
    },
    messages=[{"role": "user", "content": "Design architecture for..."}],
)
```
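With extended thinking enabled, the response's `content` list mixes `"thinking"` blocks with the final `"text"` blocks. A small helper to extract just the answer, shown here on plain dicts (the SDK returns typed objects exposing the same `type`/`text` fields):

```python
def final_text(content_blocks):
    """Concatenate the text blocks of a response, skipping thinking blocks."""
    return "".join(
        block["text"] for block in content_blocks if block["type"] == "text"
    )
```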
## Decision Flowchart

```
Can wait 24h? → Yes → Batch API (50% off)
  ↓ No
Repeated prompts >1K tokens? → Yes → Prompt Caching (90% off)
  ↓ No
Complex reasoning? → Yes → Extended Thinking
  ↓ No
Use normal API
```
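The flowchart above can be expressed as a tiny pure function (a hypothetical helper, not part of the skill itself), handy if you route requests programmatically:

```python
def pick_technique(can_wait_24h, repeated_long_prompt, complex_reasoning):
    """Apply the decision flowchart: first matching branch wins."""
    if can_wait_24h:
        return "batch_api"
    if repeated_long_prompt:
        return "prompt_caching"
    if complex_reasoning:
        return "extended_thinking"
    return "normal_api"
```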
## Official Docs
Made with 🐾 by Washin Village - Verified against official Anthropic documentation