claude-skill-registry · llm-cost-optimization

Reduce LLM API costs without sacrificing quality. Covers prompt caching (Anthropic), local response caching, prompt compression, debouncing triggers, and cost analysis. Use when building LLM-powered features, analyzing API costs, optimizing prompts, or implementing caching strategies.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/llm-cost-optimization" ~/.claude/skills/majiayu000-claude-skill-registry-llm-cost-optimization && rm -rf "$T"
manifest: skills/data/llm-cost-optimization/SKILL.md
source content

LLM Cost Optimization

Practical techniques to reduce LLM API costs by 35-65%.

Quick Reference

| Technique | Savings | When to Use | Reference |
|-----------|---------|-------------|-----------|
| Prompt Caching | 25-45% | Same system prompt, frequent calls | caching.md |
| Response Cache | 100% | Repeated identical requests | caching.md |
| Prompt Compression | 10-20% | Long system prompts | prompts.md |
| Debouncing | 50%+ | Duplicate triggers | triggers.md |

The 80/20 of LLM Costs

For short user inputs, system prompts dominate costs:

| Text Length | Input Tokens | System Prompt % |
|-------------|--------------|-----------------|
| Short (~100 chars) | ~250 | 80-87% |
| Medium (~500 chars) | ~450 | 44% |
| Long (~2000 chars) | ~900 | 22% |
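
For example, with a ~200-token system prompt, a short request adds only ~50 tokens of user text and message scaffolding, so the system prompt accounts for 200/250 = 80% of input tokens.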

Optimization priority:

  1. Cache system prompts (biggest impact)
  2. Cache identical requests (free repeats)
  3. Debounce triggers (prevent waste)
  4. Compress prompts (last resort)

Cost Estimation (Claude Haiku 3.5)

| Text Length | Est. Cost |
|-------------|-----------|
| Short (~100 chars) | ~$0.0004 |
| Medium (~500 chars) | ~$0.0008 |
| Long (~2000 chars) | ~$0.002 |

Benchmark: 1000 translations ≈ $0.80 (before optimization)
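
The arithmetic behind these figures, as a back-of-envelope sketch. The per-million-token prices are assumptions for Claude Haiku 3.5 and will drift; confirm against current Anthropic pricing.

// Assumed prices per million tokens; verify against current pricing.
const INPUT_PER_MTOK: f64 = 0.80;
const OUTPUT_PER_MTOK: f64 = 4.00;

fn estimate_cost_usd(input_tokens: u64, output_tokens: u64) -> f64 {
    (input_tokens as f64 * INPUT_PER_MTOK
        + output_tokens as f64 * OUTPUT_PER_MTOK)
        / 1_000_000.0
}

// A short translation (~250 input, ~50 output tokens):
// 250 * $0.80/M + 50 * $4.00/M ≈ $0.0004, matching the table above.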

Implementation Checklist

Before Building

  • Add logging to every AI trigger point
  • Verify triggers fire exactly once per user action
  • Check for Pressed/Released event duplicates

Caching Strategy

  • Enable Anthropic Prompt Caching for system prompts
  • Implement local response cache (hash-based; see the sketch after this list)
  • Include model name in cache key
  • Set reasonable cache limits (e.g., 500 entries LRU)
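
A minimal sketch of the three cache items above, assuming the lru crate for eviction; the type and helper names are illustrative, not a fixed API:

// Hash-based response cache with a bounded LRU (e.g. 500 entries).
// Assumes the `lru` crate; all names here are illustrative.
use lru::LruCache;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::num::NonZeroUsize;

struct ResponseCache {
    entries: LruCache<u64, String>,
}

impl ResponseCache {
    fn new(capacity: usize) -> Self {
        let cap = NonZeroUsize::new(capacity).expect("capacity must be > 0");
        Self { entries: LruCache::new(cap) }
    }

    // The model name is part of the key: Haiku and Sonnet return
    // different results for the same prompt.
    fn key(model: &str, text: &str) -> u64 {
        let mut h = DefaultHasher::new();
        model.hash(&mut h);
        text.hash(&mut h);
        h.finish()
    }

    fn get(&mut self, model: &str, text: &str) -> Option<String> {
        self.entries.get(&Self::key(model, text)).cloned()
    }

    fn put(&mut self, model: &str, text: &str, response: String) {
        self.entries.put(Self::key(model, text), response);
    }
}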

Prompt Design

  • Measure current token count (rough heuristic sketched below)
  • Identify critical rules (security, output format)
  • Test quality after compression
  • Document WHY each rule is kept
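
For the rough heuristic: English text averages about four characters per token, which is good enough for before/after comparisons; use the provider's token-counting endpoint when you need exact numbers.

// Rough token estimate (~4 chars/token for English text).
// Fine for comparing prompt versions, not for billing math.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}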

Common Mistakes

| Mistake | Impact | Fix |
|---------|--------|-----|
| Trigger fires twice | 2x cost | Check event.state |
| No prompt caching | Full price every call | Use cache_control |
| Aggressive prompt compression | Quality drops | Keep critical rules |
| Cache key missing model | Wrong results | Include model in key |

Quick Wins

1. Check for Duplicate Triggers

// Before ANY optimization, verify this
log::info!("AI trigger fired: {:?}", event);
if event.state != ShortcutState::Pressed {
    return;  // Ignore Released events
}
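
Filtering Released events handles press/release duplicates; a short cooldown then debounces rapid re-fires (the Debouncing row in the Quick Reference). A minimal sketch using only std; the struct and the 300 ms window are illustrative:

use std::time::{Duration, Instant};

struct Debouncer {
    cooldown: Duration,
    last_fired: Option<Instant>,
}

impl Debouncer {
    fn new(cooldown: Duration) -> Self {
        Self { cooldown, last_fired: None }
    }

    // Returns true only if enough time has passed since the last trigger.
    fn should_fire(&mut self) -> bool {
        let now = Instant::now();
        match self.last_fired {
            Some(last) if now.duration_since(last) < self.cooldown => false,
            _ => {
                self.last_fired = Some(now);
                true
            }
        }
    }
}

// Usage: a 300 ms window collapses duplicate triggers into one call.
// let mut debouncer = Debouncer::new(Duration::from_millis(300));
// if !debouncer.should_fire() { return; }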

2. Enable Prompt Caching (Anthropic)

// SystemBlock/CacheControl are illustrative request types; they
// serialize to the Messages API `system` array, where the cached
// block carries `"cache_control": {"type": "ephemeral"}`.
let system = vec![SystemBlock {
    block_type: "text".to_string(),  // serialized as "type": "text"
    text: system_prompt,
    cache_control: CacheControl { cache_type: "ephemeral".to_string() },
}];
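
Cache reads are billed at roughly 10% of the base input price, cache writes at about a 25% premium, and ephemeral entries expire after ~5 minutes of inactivity (refreshed on each hit); check the current Anthropic prompt-caching docs for exact figures and minimum cacheable prompt lengths.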

3. Add Response Cache

// Check the cache before the API call. `get_cached` and
// `save_to_cache` are illustrative helpers (see the ResponseCache
// sketch under Caching Strategy); both take the model name so it
// becomes part of the cache key.
if let Some(cached) = get_cached(&text, &model) {
    return Ok(cached);  // Cache hit: zero API cost
}

// Save after a successful API call
save_to_cache(&text, &result, &model)?;

Anti-Patterns

  • TOON format for plain text - Only helps with structured data
  • Caching without model key - Haiku vs Sonnet give different results
  • Prompt compression first - Optimize triggers and caching before touching prompts