Skillshub anth-cost-tuning
Install
source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/jeremylongshore/claude-code-plugins-plus-skills/anth-cost-tuning" ~/.claude/skills/comeonoliver-skillshub-anth-cost-tuning && rm -rf "$T"
manifest: skills/jeremylongshore/claude-code-plugins-plus-skills/anth-cost-tuning/SKILL.md
Anthropic Cost Tuning
Overview
Optimize Claude API spend through model routing, prompt caching, the Message Batches API, and real-time cost tracking. The four biggest levers: model selection (4-19x), prompt caching (10x on input), batches (2x), and max_tokens discipline.
Pricing Reference (per million tokens)
| Model | Input | Output | Cache Read | Cache Write |
|---|---|---|---|---|
| Claude Haiku | $0.80 | $4.00 | $0.08 | $1.00 |
| Claude Sonnet | $3.00 | $15.00 | $0.30 | $3.75 |
| Claude Opus | $15.00 | $75.00 | $1.50 | $18.75 |
Message Batches: 50% off all model pricing for async processing.
Cost Calculator
```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    model: str = "claude-sonnet-4-20250514",
    cached_input: int = 0,
    use_batch: bool = False,
) -> float:
    pricing = {
        "claude-haiku-4-20250514": {"input": 0.80, "output": 4.00, "cache_read": 0.08},
        "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00, "cache_read": 0.30},
        "claude-opus-4-20250514": {"input": 15.00, "output": 75.00, "cache_read": 1.50},
    }
    rates = pricing[model]
    uncached_input = input_tokens - cached_input
    cost = (
        uncached_input * rates["input"]
        + cached_input * rates["cache_read"]
        + output_tokens * rates["output"]
    ) / 1_000_000
    if use_batch:
        cost *= 0.5
    return cost

# Example: 10K requests/day, 500 input + 200 output tokens each
daily = estimate_cost(500, 200, "claude-sonnet-4-20250514") * 10_000
print(f"Daily: ${daily:.2f}")         # ~$0.0045 * 10K = $45/day
print(f"Monthly: ${daily * 30:.2f}")  # ~$1,350/month

# Same workload with Haiku + batching
daily_optimized = estimate_cost(500, 200, "claude-haiku-4-20250514", use_batch=True) * 10_000
print(f"Optimized: ${daily_optimized:.2f}/day")  # ~$6/day (7.5x cheaper)
```
Strategy 1: Model Routing
```python
def route_to_model(task: str, complexity: str) -> str:
    """Route tasks to the cheapest adequate model."""
    # Haiku: classification, extraction, yes/no, routing ($0.80/$4)
    if task in ("classify", "extract", "route", "validate"):
        return "claude-haiku-4-20250514"
    # Sonnet: general tasks, code, tool use ($3/$15)
    if complexity in ("low", "medium"):
        return "claude-sonnet-4-20250514"
    # Opus: only for complex reasoning, research ($15/$75)
    return "claude-opus-4-20250514"
```
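To see what routing buys you, here is a back-of-envelope sketch. The workload mix (70% classification, 30% general reasoning, 10K requests/day at 500 in / 200 out tokens) is an assumed illustration, not a figure from this skill; the rates come from the pricing table above.

```python
# Illustrative savings from routing a mixed workload instead of
# sending everything to Sonnet. All workload numbers are assumptions.
RATES = {"haiku": (0.80, 4.00), "sonnet": (3.00, 15.00)}  # ($/M input, $/M output)

def request_cost(model: str, inp: int, out: int) -> float:
    r_in, r_out = RATES[model]
    return (inp * r_in + out * r_out) / 1_000_000

# 10K requests/day: 70% classification -> Haiku, 30% reasoning -> Sonnet
all_sonnet = 10_000 * request_cost("sonnet", 500, 200)
routed = 7_000 * request_cost("haiku", 500, 200) + 3_000 * request_cost("sonnet", 500, 200)
print(f"All-Sonnet: ${all_sonnet:.2f}/day, routed: ${routed:.2f}/day")
# -> All-Sonnet: $45.00/day, routed: $21.90/day (~51% saved)
```

Even before batching or caching, routing the easy 70% to Haiku roughly halves the bill in this mix.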
Strategy 2: Prompt Caching
```python
# Cache system prompts and reference documents (90% input savings)
# Break-even: 2 requests with the same cached content
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    system=[{
        "type": "text",
        "text": large_reference_document,  # 10K+ tokens
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": user_question}],
)
```
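The 2-request break-even can be verified with the Sonnet rates from the pricing table (cache write $3.75/M, cache read $0.30/M, plain input $3.00/M). A minimal arithmetic sketch, assuming a 10K-token cached document:

```python
# Cache break-even for a 10K-token document on Sonnet.
# First request pays the cache-write premium; later requests pay cache-read.
DOC = 10_000
WRITE, READ, PLAIN = 3.75, 0.30, 3.00  # $/M tokens, from the pricing table

def cached_cost(n_requests: int) -> float:
    # one cache write, then (n - 1) cache reads
    return (DOC * WRITE + (n_requests - 1) * DOC * READ) / 1_000_000

def uncached_cost(n_requests: int) -> float:
    return n_requests * DOC * PLAIN / 1_000_000

for n in (1, 2, 10):
    print(f"n={n}: cached ${cached_cost(n):.4f} vs uncached ${uncached_cost(n):.4f}")
# n=1: cached $0.0375 vs uncached $0.0300  (caching loses)
# n=2: cached $0.0405 vs uncached $0.0600  (caching wins)
# n=10: cached $0.0645 vs uncached $0.3000 (~4.7x cheaper)
```

One request loses to the write premium; by the second request caching is already ahead, which is exactly the break-even the comment above states.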
Strategy 3: Batches for Non-Real-Time
```python
# 50% cost reduction for anything that doesn't need an immediate response
# Ideal for: summarization pipelines, data extraction, content generation
batch = client.messages.batches.create(requests=[...])  # Up to 100K requests
```
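Each entry in `requests` pairs a `custom_id` (to match results back to inputs) with `params` mirroring a normal `messages.create` call. A sketch of building that list for a summarization pipeline; the `documents` dict and prompt wording are illustrative assumptions:

```python
# Build Batches API request entries for a summarization pipeline.
# `documents` is a hypothetical input set; custom_id lets you match
# each result in the batch output back to its source document.
documents = {"doc-1": "First article text...", "doc-2": "Second article text..."}

requests = [
    {
        "custom_id": doc_id,
        "params": {
            "model": "claude-haiku-4-20250514",
            "max_tokens": 256,
            "messages": [{"role": "user", "content": f"Summarize: {text}"}],
        },
    }
    for doc_id, text in documents.items()
]
# batch = client.messages.batches.create(requests=requests)
# ...then poll the batch until processing ends and fetch results.
```

Combining batching with Haiku routing stacks both discounts, which is where the ~20x figures in the overview come from.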
Strategy 4: Spend Tracking
```python
from dataclasses import dataclass

@dataclass
class SpendTracker:
    budget_usd: float = 100.0
    spent_usd: float = 0.0
    requests: int = 0

    def track(self, response):
        # estimate_cost is defined in the Cost Calculator section above
        cost = estimate_cost(
            response.usage.input_tokens,
            response.usage.output_tokens,
            response.model,
            getattr(response.usage, "cache_read_input_tokens", 0),
        )
        self.spent_usd += cost
        self.requests += 1
        if self.spent_usd > self.budget_usd * 0.8:
            print(f"WARNING: 80% budget used (${self.spent_usd:.2f}/${self.budget_usd})")
        if self.spent_usd > self.budget_usd:
            raise RuntimeError(f"Budget exceeded: ${self.spent_usd:.2f}")

tracker = SpendTracker(budget_usd=50.0)
```
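You can exercise the tracking logic offline by feeding it mock responses. The sketch below is a simplified, self-contained stand-in for SpendTracker: it hard-codes flat Sonnet rates instead of calling estimate_cost, and builds fake usage objects with SimpleNamespace.

```python
# Offline test of budget-enforcement logic with mock responses.
# MiniTracker is a simplified stand-in: flat Sonnet rates, no caching.
from dataclasses import dataclass
from types import SimpleNamespace

@dataclass
class MiniTracker:
    budget_usd: float = 1.0
    spent_usd: float = 0.0

    def track(self, response) -> None:
        u = response.usage
        self.spent_usd += (u.input_tokens * 3.00 + u.output_tokens * 15.00) / 1_000_000
        if self.spent_usd > self.budget_usd:
            raise RuntimeError(f"Budget exceeded: ${self.spent_usd:.4f}")

# Fake response shaped like the SDK's usage fields
mock = SimpleNamespace(usage=SimpleNamespace(input_tokens=500, output_tokens=200))

tracker = MiniTracker(budget_usd=0.01)
for _ in range(2):
    tracker.track(mock)  # 2 x $0.0045 = $0.0090, still under the $0.01 budget
print(f"Spent: ${tracker.spent_usd:.4f}")  # -> Spent: $0.0090
```

A third tracked request would push spend to $0.0135 and raise the budget error, which is the fail-closed behavior you want before it reaches production traffic.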
Cost Reduction Checklist
- Use Haiku for classification/extraction/routing tasks
- Enable prompt caching for repeated system prompts
- Use Message Batches for non-real-time processing
- Set max_tokens to realistic values (not the maximum)
- Use prefill to reduce output preamble tokens
- Implement spend tracking and budget alerts
- Monitor via Usage API
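The prefill item in the checklist can be sketched like this: you seed the final assistant turn yourself, and the model continues from it, skipping "Sure, here is..." preamble tokens. The open-brace trick for forcing JSON is one common illustrative choice:

```python
# Prefill: start the assistant's reply yourself; the model continues it.
# Seeding "{" skips conversational preamble and forces JSON-shaped output.
messages = [
    {"role": "user", "content": "Extract the product name and price as JSON."},
    {"role": "assistant", "content": "{"},  # model continues from the open brace
]
# response = client.messages.create(
#     model="claude-haiku-4-20250514", max_tokens=128, messages=messages)
# full_json = "{" + response.content[0].text  # re-attach the prefilled prefix
```

Remember to prepend the prefill string when parsing, since the response text continues after it rather than repeating it.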
Next Steps
For architecture patterns, see anth-reference-architecture.