Claude-Skills llm-cost-optimizer
Install

Clone the upstream repo:

```bash
git clone https://github.com/borghei/Claude-Skills
```

Claude Code: install into ~/.claude/skills/:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/borghei/Claude-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/engineering/llm-cost-optimizer" ~/.claude/skills/borghei-claude-skills-llm-cost-optimizer && rm -rf "$T"
```
Manifest: engineering/llm-cost-optimizer/SKILL.md
LLM Cost Optimizer
Category: Engineering · Domain: AI Cost Management
Overview
The LLM Cost Optimizer skill provides tools for counting tokens, estimating costs across different LLM providers, and optimizing prompts to reduce token usage without sacrificing quality. Essential for teams managing LLM API budgets at scale.
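The skill's scripts are not reproduced on this page, but the counting step itself is small. A minimal sketch, assuming the tiktoken library as the tokenizer (the skill's own token_counter.py may count differently, and counts vary by provider):

```python
# Minimal token-counting sketch. Assumption: tiktoken's encoding is used;
# other providers (e.g. Anthropic) tokenize differently, so treat the
# result as an estimate rather than an exact billable count.
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Number of tokens `text` occupies under the given model's encoding."""
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text))

print(count_tokens("Hello world"))  # e.g. 2 under gpt-4o's encoding
```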
Quick Start
```bash
# Count tokens in a prompt file and estimate costs
python scripts/token_counter.py --file prompt.txt --models gpt-4o claude-sonnet

# Count tokens from stdin
echo "Hello world" | python scripts/token_counter.py --stdin --models all

# Analyze a prompt for optimization opportunities
python scripts/prompt_optimizer.py --file system_prompt.txt

# Optimize with target reduction
python scripts/prompt_optimizer.py --file prompt.txt --target-reduction 30
```
Tools Overview
| Tool | Purpose | Key Flags |
|---|---|---|
| `token_counter.py` | Count tokens and estimate costs across models | `--file`, `--stdin`, `--models` |
| `prompt_optimizer.py` | Analyze prompts for token reduction opportunities | `--file`, `--target-reduction` |
Workflows
Cost Estimation for New Project
- Collect sample prompts (system prompt + user messages)
- Run `token_counter.py` with target models
- Multiply per-request cost by expected daily volume (see the cost sketch after this list)
- Compare models on cost-quality tradeoff
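A minimal sketch of the volume math in the last two steps. The model names and per-million-token rates below are illustrative placeholders, not current prices; pull real rates from the pricing guide under Reference Documentation:

```python
# Cost-scaling sketch. PRICE_PER_MTOK values are hypothetical placeholders,
# NOT current provider rates.
PRICE_PER_MTOK = {
    # model: (input $ per 1M tokens, output $ per 1M tokens)
    "small-model": (0.15, 0.60),
    "large-model": (3.00, 15.00),
}

def per_request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: token counts times per-million-token rates."""
    in_rate, out_rate = PRICE_PER_MTOK[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

def monthly_cost(model: str, input_tokens: int, output_tokens: int,
                 requests_per_day: int, days: int = 30) -> float:
    """Scale per-request cost by expected daily volume."""
    return per_request_cost(model, input_tokens, output_tokens) * requests_per_day * days

# Compare models at 10k requests/day, ~1500 input / ~400 output tokens each.
for name in PRICE_PER_MTOK:
    print(f"{name}: ${monthly_cost(name, 1500, 400, 10_000):,.2f}/month")
```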
Prompt Optimization Sprint
- Identify highest-cost prompts from usage logs
- Run `prompt_optimizer.py` on each
- Apply suggested optimizations
- Re-count tokens to verify reduction (see the sketch after this list)
- A/B test optimized vs. original for quality
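For the re-count step, a minimal verification sketch, again assuming tiktoken as the tokenizer (the skill's own scripts may measure differently):

```python
# Verify that an optimized prompt actually saves tokens before shipping it.
import tiktoken

def percent_reduction(original: str, optimized: str, model: str = "gpt-4o") -> float:
    """Token savings of the optimized prompt, as a percentage of the original."""
    enc = tiktoken.encoding_for_model(model)
    before = len(enc.encode(original))
    after = len(enc.encode(optimized))
    return 100.0 * (before - after) / before

# e.g. gate a rollout on hitting the sprint's target reduction:
# assert percent_reduction(old_prompt, new_prompt) >= 30
```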
Reference Documentation
- LLM Pricing Guide - Current pricing for major LLM providers, token estimation methods
Common Patterns
Token Reduction Techniques
- Remove redundant instructions and examples
- Use shorter variable names in few-shot examples
- Compress verbose system prompts (a minimal example follows this list)
- Replace repeated context with references
- Use structured output formats (JSON) to reduce response tokens
- Batch multiple requests into single prompts where possible
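As one mechanical instance of the compression technique above, a lossless whitespace squeeze; tokenizers charge for blank-line runs and trailing spaces, so stripping them saves tokens without changing meaning. A sketch, not the skill's own optimizer:

```python
# Lossless whitespace compression -- always re-count tokens afterwards to
# confirm the saving on your tokenizer.
import re

def squeeze_whitespace(prompt: str) -> str:
    prompt = re.sub(r"[ \t]+\n", "\n", prompt)  # drop trailing spaces
    prompt = re.sub(r"\n{3,}", "\n\n", prompt)  # collapse runs of blank lines
    return prompt.strip()
```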
Cost-Effective Model Selection
- Use smaller models for classification/extraction tasks
- Reserve large models for complex reasoning
- Implement model routing based on query complexity (see the sketch after this list)
- Cache responses for identical or similar queries
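A toy sketch combining the routing and caching ideas. The word-count threshold, model names, and `call_llm` client are hypothetical stand-ins; production routers typically use a trained classifier or heuristics derived from usage logs:

```python
# Complexity-based router plus an exact-match response cache (sketch only).
from functools import lru_cache

def call_llm(model: str, prompt: str) -> str:
    """Placeholder for a real provider SDK call."""
    raise NotImplementedError

def route_model(query: str) -> str:
    """Send short, single-question queries to a small model; route long or
    multi-part queries to a large one."""
    complex_query = len(query.split()) > 40 or query.count("?") > 1
    return "large-model" if complex_query else "small-model"

@lru_cache(maxsize=4096)
def cached_answer(query: str) -> str:
    """Identical repeat queries hit the cache instead of paying for a new call."""
    return call_llm(model=route_model(query), prompt=query)
```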