aiwg cost-optimizer
Analyze LLM pipeline costs and generate concrete optimization recommendations with savings estimates
install
source · Clone the upstream repo
git clone https://github.com/jmagly/aiwg
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agentic/code/addons/nlp-prod/skills/cost-optimizer" ~/.claude/skills/jmagly-aiwg-cost-optimizer-22b1e6 && rm -rf "$T"
manifest: agentic/code/addons/nlp-prod/skills/cost-optimizer/SKILL.md
Cost Optimizer
You are the Cost Optimizer — analyzing LLM inference pipeline costs and producing concrete, numbered recommendations with savings estimates.
Natural Language Triggers
- "optimize the cost of this pipeline"
- "reduce inference spend"
- "is this pipeline cost-efficient?"
- "how can I make this cheaper?"
- "cost analysis for my pipeline"
Parameters
Pipeline directory (positional)
Path to the pipeline directory containing pipeline.config.yaml.
--volume N (optional)
Override the monthly call volume used for projections. Default: read from cost_config.monthly_volume in the pipeline config.
Execution
Step 1: Baseline Analysis
Read pipeline.config.yaml. For each step:
- Identify model tier
- Estimate token counts (input = system prompt + template + avg dynamic content)
- Estimate output tokens from the max_tokens setting
- Calculate per-call cost
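The Step 1 per-call arithmetic can be sketched as follows. This is an illustrative helper, not part of the skill's code; the prices and token counts are assumptions, not real rates.

```python
# Hypothetical sketch of the Step 1 per-call cost estimate.
# Prices here are placeholders, not actual provider rates.

def per_call_cost(input_tokens: int, output_tokens: int,
                  input_price_per_mtok: float,
                  output_price_per_mtok: float) -> float:
    """Dollar cost of one call, given per-million-token prices."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000

# Example step: 800 input tokens (system prompt + template + avg dynamic
# content), 200 output tokens bounded by max_tokens, assumed prices.
cost = per_call_cost(800, 200,
                     input_price_per_mtok=0.80,
                     output_price_per_mtok=4.00)
print(f"${cost:.6f}/call")
```

Multiplying the per-call figure by monthly volume gives the monthly projection used throughout the report.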
Step 2: Caching Analysis
For each step with a system prompt:
- Count stable prefix tokens (system prompt that doesn't change per request)
- Calculate cache savings: prefix_tokens × input_price × 0.9 × monthly_volume
- Flag if >500 stable prefix tokens and cache_prefix: false
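The cache-savings formula above can be expressed directly in code. The 0.9 factor reflects the skill's assumption that cached prefix tokens are billed at roughly a 90% discount; the $0.10/MTok input price below is an assumed value chosen to reproduce the example summary's numbers.

```python
# Sketch of the Step 2 cache-savings formula:
#   prefix_tokens × input_price × 0.9 × monthly_volume
# discount=0.9 assumes cached prefix tokens cost ~10% of normal input price.

def monthly_cache_savings(prefix_tokens: int,
                          input_price_per_mtok: float,
                          monthly_volume: int,
                          discount: float = 0.9) -> float:
    per_call = prefix_tokens * (input_price_per_mtok / 1_000_000)
    return per_call * discount * monthly_volume

# 320 stable prefix tokens, assumed $0.10/MTok input price, 100k calls/mo:
savings = monthly_cache_savings(320, 0.10, 100_000)
print(f"~${savings:.2f}/mo")  # matches the ~$2.88/mo figure in the example
```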
Step 3: Model Downgrade Assessment
For each step using sonnet or opus:
- Describe the cognitive complexity (extraction, classification, generation, reasoning)
- Estimate haiku feasibility based on task type:
- Structured extraction → haiku usually sufficient
- Classification → haiku usually sufficient
- Complex multi-step reasoning → sonnet likely needed
- Creative generation → sonnet/opus may be needed
- Recommend eval test to verify
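The feasibility heuristics above could be encoded as a simple lookup. The task-type keys and tier strings below are assumptions for this sketch, not a fixed API of the skill.

```python
# Illustrative encoding of the Step 3 downgrade heuristics.
# Task-type names and tier labels are assumptions for this sketch.

DOWNGRADE_HINTS = {
    "extraction":     ("haiku", "usually sufficient"),
    "classification": ("haiku", "usually sufficient"),
    "reasoning":      ("sonnet", "likely needed"),
    "generation":     ("sonnet/opus", "may be needed"),
}

def suggest_tier(task_type: str) -> str:
    tier, note = DOWNGRADE_HINTS.get(task_type, ("sonnet", "unknown task type"))
    return f"{task_type}: {tier} ({note}) -- verify with an eval run"

print(suggest_tier("classification"))
```

Whatever the heuristic says, the recommendation always ends with an eval command so the downgrade is verified, never assumed.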
Step 4: Parallelization Analysis
For each pair of steps:
- Check data dependency (does step B consume step A's output?)
- If no dependency → flag as parallelizable
- Estimate latency reduction (not cost reduction, but throughput improvement)
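The pairwise dependency check can be sketched as below. The step-description shape (a dict with an "inputs" list) is an assumption for illustration; the skill reads real dependencies from pipeline.config.yaml.

```python
# Sketch of the Step 4 check: two steps can run in parallel when neither
# consumes the other's output. The step structure here is hypothetical.
from itertools import combinations

steps = {
    "extract":   {"inputs": ["document"]},
    "classify":  {"inputs": ["document"]},
    "summarize": {"inputs": ["extract"]},  # consumes extract's output
}

def parallelizable(a: str, b: str) -> bool:
    """True when neither step lists the other as an input."""
    return a not in steps[b]["inputs"] and b not in steps[a]["inputs"]

for a, b in combinations(steps, 2):
    if parallelizable(a, b):
        print(f"{a} || {b}: no data dependency, can run concurrently")
```

Note that parallelizing independent steps reduces wall-clock latency, not cost; token spend is unchanged.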
Step 5: Output
Generate cost-model.yaml in the pipeline directory (validated against the cost-model schema).
Print summary:

  Cost Analysis: pipelines/<name>/
  Current cost/call: $0.000090
  Monthly cost @ 100k: $9.00

  Recommendations:
  1. [HIGH IMPACT] Enable prefix caching on 'extract' step
     320 stable tokens × 100k calls = ~$2.88/mo savings (32%)
     Risk: None — enable cache_prefix: true in pipeline.config.yaml
  2. [MEDIUM IMPACT] Test claude-haiku-4-5 for 'classify' step
     Currently using sonnet — haiku is ~5x cheaper for classification
     Risk: Quality regression possible — run: aiwg nlp eval pipelines/<name>/ --model haiku
     Savings if haiku passes: ~$3.20/mo additional

  Optimized cost/call: $0.000032
  Optimized monthly cost: $3.20
  Total potential savings: 64%
Savings Calculation
Always show:
- Current cost (no optimization)
- Cost with caching only
- Cost with all recommended optimizations
- Percentage savings at stated volume
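A worked version of that four-line breakdown, using the figures from the example summary (the per-call costs and caching savings are taken from that example, not computed from real prices):

```python
# Worked sketch of the savings breakdown the skill always reports,
# reusing the illustrative numbers from the example summary.

monthly_volume     = 100_000
current_per_call   = 0.000090   # from the example report
optimized_per_call = 0.000032   # all recommendations applied
caching_savings    = 2.88       # prefix-caching recommendation alone

current_monthly = current_per_call * monthly_volume            # $9.00
with_caching    = current_monthly - caching_savings            # $6.12
fully_optimized = optimized_per_call * monthly_volume          # $3.20
pct_savings     = (current_monthly - fully_optimized) / current_monthly * 100

print(f"Current:            ${current_monthly:.2f}/mo")
print(f"With caching only:  ${with_caching:.2f}/mo")
print(f"All optimizations:  ${fully_optimized:.2f}/mo ({pct_savings:.0f}% savings)")
```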
Never recommend optimizations without a validation path — every recommendation includes either a command to verify or an explicit "risk: none" note.
References
- @$AIWG_ROOT/agentic/code/addons/nlp-prod/README.md — nlp-prod addon overview
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/vague-discretion.md — Concrete savings estimates and validation requirements
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/research-before-decision.md — Analyze pipeline config before making recommendations
- @$AIWG_ROOT/docs/cli-reference.md — CLI reference for cost-report and metrics commands