# llm-cost-optimization

Reduce LLM API costs without sacrificing quality. Covers prompt caching (Anthropic), local response caching, prompt compression, debouncing triggers, and cost analysis. Use when building LLM-powered features, analyzing API costs, optimizing prompts, or implementing caching strategies.
## Install

Source · clone the upstream repo:

```bash
git clone https://github.com/majiayu000/claude-skill-registry
```

Claude Code · install into `~/.claude/skills/`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/llm-cost-optimization" ~/.claude/skills/majiayu000-claude-skill-registry-llm-cost-optimization && rm -rf "$T"
```

Manifest: `skills/data/llm-cost-optimization/SKILL.md` (source content below)
# LLM Cost Optimization

Practical techniques to reduce LLM API costs by 35-65%.

## Quick Reference
| Technique | Savings | When to Use | Reference |
|---|---|---|---|
| Prompt Caching | 25-45% | Same system prompt, frequent calls | caching.md |
| Response Cache | 100% on hits | Repeated identical requests | caching.md |
| Prompt Compression | 10-20% | Long system prompts | prompts.md |
| Debouncing | 50%+ | Duplicate triggers | triggers.md |
## The 80/20 of LLM Costs
For short user inputs, system prompts dominate costs:
| Text Length | Input Tokens | System Prompt Share of Input |
|---|---|---|
| Short (~100 chars) | ~250 | 80-87% |
| Medium (~500 chars) | ~450 | 44% |
| Long (~2000 chars) | ~900 | 22% |
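A quick way to sanity-check these shares: they are consistent with a system prompt of roughly 200 tokens. A minimal sketch, where the 200-token figure is an assumption inferred from the table rather than a measured value:

```rust
/// Back-of-envelope: what fraction of input tokens does the system prompt take?
fn system_prompt_share(system_tokens: u32, user_tokens: u32) -> f64 {
    system_tokens as f64 / (system_tokens + user_tokens) as f64
}

fn main() {
    // Assuming the ~200-token system prompt the table above implies:
    println!("{:.0}%", 100.0 * system_prompt_share(200, 50));  // short:  80%
    println!("{:.0}%", 100.0 * system_prompt_share(200, 250)); // medium: 44%
    println!("{:.0}%", 100.0 * system_prompt_share(200, 700)); // long:   22%
}
```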
Optimization priority:

1. Cache system prompts (biggest impact)
2. Cache identical requests (free repeats)
3. Debounce triggers (prevent waste)
4. Compress prompts (last resort)
## Cost Estimation (Claude Haiku 3.5)
| Text Length | Est. Cost per Request |
|---|---|
| Short (~100 chars) | ~$0.0004 |
| Medium (~500 chars) | ~$0.0008 |
| Long (~2000 chars) | ~$0.002 |
Benchmark: 1000 translations ≈ $0.80 (before optimization)
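A minimal estimator that reproduces these numbers. The per-token prices are Anthropic's list prices for Claude Haiku 3.5 at the time of writing ($0.80/MTok input, $4.00/MTok output; check current pricing), and the output token counts are assumptions chosen to match the table:

```rust
/// Rough per-request cost in USD at assumed Claude Haiku 3.5 list prices.
fn estimate_cost(input_tokens: u64, output_tokens: u64) -> f64 {
    const INPUT_PER_MTOK: f64 = 0.80;
    const OUTPUT_PER_MTOK: f64 = 4.00;
    (input_tokens as f64 * INPUT_PER_MTOK + output_tokens as f64 * OUTPUT_PER_MTOK)
        / 1_000_000.0
}

fn main() {
    // Short request from the table: ~250 input + ~50 output tokens
    println!("${:.4}", estimate_cost(250, 50)); // ~$0.0004
    // 1000 medium translations: ~450 input + ~110 output tokens each
    println!("${:.2}", 1000.0 * estimate_cost(450, 110)); // ~$0.80
}
```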
## Implementation Checklist

### Before Building
- Add logging to every AI trigger point
- Verify triggers fire exactly once per user action
- Check for Pressed/Released event duplicates (a minimal debouncer sketch follows this list)
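Debouncing (the 50%+ line in the Quick Reference) needs nothing more than remembering the last accepted trigger. A std-only sketch; the 300 ms window is an arbitrary starting point, not a recommendation from triggers.md:

```rust
use std::time::{Duration, Instant};

/// Swallows triggers that arrive within `window` of the last accepted one.
struct Debouncer {
    last: Option<Instant>,
    window: Duration,
}

impl Debouncer {
    fn new(window: Duration) -> Self {
        Self { last: None, window }
    }

    /// Returns true if this trigger should actually reach the LLM call.
    fn should_fire(&mut self) -> bool {
        let now = Instant::now();
        match self.last {
            Some(prev) if now.duration_since(prev) < self.window => false,
            _ => {
                self.last = Some(now);
                true
            }
        }
    }
}

// Usage: one Debouncer per trigger point.
// let mut debounce = Debouncer::new(Duration::from_millis(300));
// if !debounce.should_fire() { return; }
```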
### Caching Strategy
- Enable Anthropic Prompt Caching for system prompts
- Implement local response cache (hash-based)
- Include model name in cache key
- Set reasonable cache limits (e.g., 500 entries LRU; see the sketch after this list)
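A std-only sketch of the local response cache described above. `get_cached` and `save_to_cache` mirror the hypothetical helpers used in Quick Win 3 below; eviction here is FIFO for brevity (a true LRU, e.g. the `lru` crate, would also refresh entries on reads):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, VecDeque};
use std::hash::{Hash, Hasher};

const CACHE_CAP: usize = 500; // the checklist's suggested limit

struct ResponseCache {
    map: HashMap<u64, String>,
    order: VecDeque<u64>, // oldest key at the front
}

impl ResponseCache {
    fn new() -> Self {
        Self { map: HashMap::new(), order: VecDeque::new() }
    }

    /// The model name MUST be part of the key:
    /// Haiku and Sonnet answer the same input differently.
    fn key(text: &str, model: &str) -> u64 {
        let mut h = DefaultHasher::new();
        model.hash(&mut h);
        text.hash(&mut h);
        h.finish()
    }

    fn get_cached(&self, text: &str, model: &str) -> Option<String> {
        self.map.get(&Self::key(text, model)).cloned()
    }

    fn save_to_cache(&mut self, text: &str, result: &str, model: &str) {
        if self.map.len() >= CACHE_CAP {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        let k = Self::key(text, model);
        self.map.insert(k, result.to_string());
        self.order.push_back(k);
    }
}
```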
### Prompt Design
- Measure current token count (a counting sketch follows this list)
- Identify critical rules (security, output format)
- Test quality after compression
- Document WHY for each rule kept
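For the first item, Anthropic's Messages API exposes a `count_tokens` endpoint, so you can measure instead of guessing. A sketch assuming the `reqwest` crate (with the `blocking` and `json` features) and `serde_json`; the model alias and minimal error handling are simplifications:

```rust
use serde_json::json;

/// Ask the API how many input tokens a prompt uses (counting itself is free).
fn count_tokens(system_prompt: &str, api_key: &str) -> Result<u64, Box<dyn std::error::Error>> {
    let body = json!({
        "model": "claude-3-5-haiku-latest",
        "system": system_prompt,
        "messages": [{ "role": "user", "content": "ping" }]
    });
    let resp: serde_json::Value = reqwest::blocking::Client::new()
        .post("https://api.anthropic.com/v1/messages/count_tokens")
        .header("x-api-key", api_key)
        .header("anthropic-version", "2023-06-01")
        .json(&body)
        .send()?
        .error_for_status()?
        .json()?;
    Ok(resp["input_tokens"].as_u64().unwrap_or(0))
}
```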
## Common Mistakes
| Mistake | Impact | Fix |
|---|---|---|
| Trigger fires twice | 2x cost | Check `event.state` |
| No prompt caching | Full price every call | Use `cache_control` |
| Aggressive prompt compression | Quality drops | Keep critical rules |
| Cache key missing model | Wrong results | Include model in key |
## Quick Wins

### 1. Check for Duplicate Triggers
```rust
// Before ANY optimization, verify this
log::info!("AI trigger fired: {:?}", event);
if event.state != ShortcutState::Pressed {
    return; // Ignore Released events
}
```
### 2. Enable Prompt Caching (Anthropic)
```rust
let system = vec![SystemBlock {
    block_type: "text".to_string(),
    text: system_prompt,
    cache_control: CacheControl {
        cache_type: "ephemeral".to_string(),
    },
}];
```
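`SystemBlock` and `CacheControl` above are hypothetical local types; what matters is the JSON they serialize to. A `serde_json` sketch of the request shape the Messages API expects, where `cache_control` marks the prefix up to and including that block as cacheable. Note that per Anthropic's docs at the time of writing, prompts below the minimum cacheable length (1024 tokens for Sonnet/Opus, 2048 for Haiku models) are not cached and bill at the normal rate:

```rust
use serde_json::{json, Value};

/// Build a Messages API body with the system prompt marked cacheable.
fn cached_request_body(system_prompt: &str, user_text: &str) -> Value {
    json!({
        "model": "claude-3-5-haiku-latest",
        "max_tokens": 1024,
        "system": [{
            "type": "text",
            "text": system_prompt,
            "cache_control": { "type": "ephemeral" }
        }],
        "messages": [{ "role": "user", "content": user_text }]
    })
}
```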
### 3. Add Response Cache
```rust
// Check cache before API call
if let Some(cached) = get_cached(&text, &model) {
    return Ok(cached); // Free!
}

// ... make the API call ...

// Save after API call
save_to_cache(&text, &result, &model)?;
```
## Anti-Patterns

- **TOON format for plain text** - only helps with structured data
- **Caching without the model in the key** - Haiku vs Sonnet give different results
- **Prompt compression first** - optimize triggers and caching before touching prompts