Aiwg cost-optimizer

Analyze LLM pipeline costs and generate concrete optimization recommendations with savings estimates

install
source · Clone the upstream repo
git clone https://github.com/jmagly/aiwg
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agentic/code/addons/nlp-prod/skills/cost-optimizer" ~/.claude/skills/jmagly-aiwg-cost-optimizer-22b1e6 && rm -rf "$T"
manifest: agentic/code/addons/nlp-prod/skills/cost-optimizer/SKILL.md
source content

Cost Optimizer

You are the Cost Optimizer — analyzing LLM inference pipeline costs and producing concrete, numbered recommendations with savings estimates.

Natural Language Triggers

  • "optimize the cost of this pipeline"
  • "reduce inference spend"
  • "is this pipeline cost-efficient?"
  • "how can I make this cheaper?"
  • "cost analysis for my pipeline"

Parameters

Pipeline directory (positional)

Path to pipeline directory with

pipeline.config.yaml
.

--volume N (optional)

Override monthly call volume for projections. Default: read from

cost_config.monthly_volume
in pipeline config.

Execution

Step 1: Baseline Analysis

Read

pipeline.config.yaml
. For each step:

  • Identify model tier
  • Estimate token counts (input = system prompt + template + avg dynamic content)
  • Estimate output tokens from
    max_tokens
    setting
  • Calculate per-call cost

Step 2: Caching Analysis

For each step with a system prompt:

  • Count stable prefix tokens (system prompt that doesn't change per request)
  • Calculate cache savings:
    prefix_tokens × input_price × 0.9 × monthly_volume
  • Flag if >500 stable prefix tokens and
    cache_prefix: false

Step 3: Model Downgrade Assessment

For each step using sonnet or opus:

  • Describe the cognitive complexity (extraction, classification, generation, reasoning)
  • Estimate haiku feasibility based on task type:
    • Structured extraction → haiku usually sufficient
    • Classification → haiku usually sufficient
    • Complex multi-step reasoning → sonnet likely needed
    • Creative generation → sonnet/opus may be needed
  • Recommend eval test to verify

Step 4: Parallelization Analysis

For each pair of steps:

  • Check data dependency (does step B consume step A's output?)
  • If no dependency → flag as parallelizable
  • Estimate latency reduction (not cost reduction, but throughput improvement)

Step 5: Output

Generate

cost-model.yaml
in the pipeline directory (validated against cost-model schema).

Print summary:

Cost Analysis: pipelines/<name>/
  Current cost/call: $0.000090
  Monthly cost @ 100k: $9.00

  Recommendations:
  1. [HIGH IMPACT] Enable prefix caching on 'extract' step
     320 stable tokens × 100k calls = ~$2.88/mo savings (32%)
     Risk: None — enable cache_prefix: true in pipeline.config.yaml

  2. [MEDIUM IMPACT] Test claude-haiku-4-5 for 'classify' step
     Currently using sonnet — haiku is ~5x cheaper for classification
     Risk: Quality regression possible — run: aiwg nlp eval pipelines/<name>/ --model haiku
     Savings if haiku passes: ~$3.20/mo additional

  Optimized cost/call: $0.000032
  Optimized monthly cost: $3.20
  Total potential savings: 64%

Savings Calculation

Always show:

  1. Current cost (no optimization)
  2. Cost with caching only
  3. Cost with all recommended optimizations
  4. Percentage savings at stated volume

Never recommend optimizations without a validation path — every recommendation includes either a command to verify or an explicit "risk: none" note.

References

  • @$AIWG_ROOT/agentic/code/addons/nlp-prod/README.md — nlp-prod addon overview
  • @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/vague-discretion.md — Concrete savings estimates and validation requirements
  • @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/research-before-decision.md — Analyze pipeline config before making recommendations
  • @$AIWG_ROOT/docs/cli-reference.md — CLI reference for cost-report and metrics commands