some_claude_skills · cost-verification-auditor

Audit LLM token cost estimates against actual API usage. Activate on 'cost verification', 'token estimate accuracy', 'API cost audit', 'estimation variance'. NOT for pricing lookups, budget planning, cost optimization, or model price comparisons.

Install
source · Clone the upstream repo
git clone https://github.com/curiositech/some_claude_skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/curiositech/some_claude_skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/cost-verification-auditor" ~/.claude/skills/curiositech-some-claude-skills-cost-verification-auditor && rm -rf "$T"
manifest: .claude/skills/cost-verification-auditor/SKILL.md
Source Content

Cost Verification Auditor

Verify that token cost estimates are within ±20% of actual Claude API usage.

When to Use

Use for:

  • Validating token estimation systems after implementation
  • Pre-deployment cost accuracy checks
  • Debugging unexpected API bills
  • Periodic estimation drift detection

NOT for:

  • Looking up model pricing (use pricing docs)
  • Budget planning or forecasting
  • Cost optimization strategies
  • Comparing models by price

Core Audit Process

Decision Tree

Has estimator? ──No──→ Build estimator first (see Calibration Guidelines)
      │
     Yes
      ↓
Define 3+ test cases (simple/medium/complex)
      ↓
Estimate BEFORE execution (no peeking!)
      ↓
Execute against real API
      ↓
Calculate variance: (actual - estimated) / estimated
      ↓
Variance ≤ ±20%? ──Yes──→ PASS ✓
      │
     No
      ↓
Apply fixes from Anti-Patterns section
      ↓
Re-run verification

Variance Formula

// estimate and actual share the shape { inputTokens, outputTokens, totalCost }
const inputVariance = (actual.inputTokens - estimate.inputTokens) / estimate.inputTokens;
const outputVariance = (actual.outputTokens - estimate.outputTokens) / estimate.outputTokens;
const costVariance = (actual.totalCost - estimate.totalCost) / estimate.totalCost;

// PASS if both input AND output variance are within ±20%
const passed = Math.abs(inputVariance) <= 0.20 && Math.abs(outputVariance) <= 0.20;
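
To run the process end to end, a minimal harness sketch ties the decision tree and the formula together. estimateTokens() and runPrompt() are hypothetical stand-ins for your estimation heuristic and your API wrapper; they are not part of this skill.

// Audit one test case: estimate first, execute, then compare.
interface Usage { inputTokens: number; outputTokens: number; }

declare function estimateTokens(prompt: string): Usage;     // hypothetical heuristic
declare function runPrompt(prompt: string): Promise<Usage>; // hypothetical API wrapper

async function auditCase(label: string, prompt: string): Promise<boolean> {
  const estimate = estimateTokens(prompt);  // BEFORE execution (no peeking)
  const actual = await runPrompt(prompt);   // real API call
  const inputVariance = (actual.inputTokens - estimate.inputTokens) / estimate.inputTokens;
  const outputVariance = (actual.outputTokens - estimate.outputTokens) / estimate.outputTokens;
  const passed = Math.abs(inputVariance) <= 0.20 && Math.abs(outputVariance) <= 0.20;
  console.log(`${label}: input ${(inputVariance * 100).toFixed(1)}%, output ${(outputVariance * 100).toFixed(1)}% → ${passed ? "PASS" : "FAIL"}`);
  return passed;
}

Run it once per test case (simple, medium, complex) and treat any FAIL as a cue to apply the fixes below.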

Common Anti-Patterns

Anti-Pattern: The 500-Token Overhead Myth

Novice thinking: "Claude Code adds ~500 tokens overhead, so add that to every estimate."

Reality: Direct API calls add only ~10 tokens of overhead. The 500+ token figure applies ONLY when using Claude Code's full context (system prompts, tools, conversation history).

Timeline:

  • Pre-2025: Many tutorials used 500+ token estimates
  • 2025+: Direct API overhead is minimal (~10 tokens)

What to use instead:

Context                    Overhead
Direct API call            ~10 tokens
With system prompt         50-200 tokens
With tools/functions       100-500 tokens
Claude Code full context   500-2000 tokens

How to detect: Consistent 40-90% overestimation = overhead too high.
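
As a sketch, the overhead can be keyed off the calling context instead of hard-coded. The type name and the midpoint values below are illustrative picks from the ranges in the table, not constants defined by this skill.

// Overhead is a property of the calling context, not a universal constant.
type CallContext = "direct" | "systemPrompt" | "tools" | "claudeCodeFull";

const OVERHEAD_TOKENS: Record<CallContext, number> = {
  direct: 10,           // bare API call
  systemPrompt: 125,    // midpoint of 50-200
  tools: 300,           // midpoint of 100-500
  claudeCodeFull: 1250, // midpoint of 500-2000
};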


Anti-Pattern: Per-Node Accuracy Obsession

Novice thinking: "Every node must be within ±20% or the estimator is broken."

Reality: LLM output length is non-deterministic. Per-node output variance of 30-50% is normal. What matters is aggregate cost accuracy.

What to use instead:

  • Focus on total DAG cost variance (should stay within ±20%; see the sketch below)
  • Accept per-node output variance up to ±40%
  • Use constrained prompts ("list exactly 3") to reduce variance

How to detect: Input estimates accurate, output varies wildly = normal LLM behavior.
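
A short sketch of the aggregate check, assuming each node records its estimated and actual cost (the NodeResult shape is an assumption for illustration):

// Judge the estimator on the DAG total, not on individual nodes.
interface NodeResult { estimatedCost: number; actualCost: number; }

function totalCostVariance(nodes: NodeResult[]): number {
  const estimated = nodes.reduce((sum, n) => sum + n.estimatedCost, 0);
  const actual = nodes.reduce((sum, n) => sum + n.actualCost, 0);
  return (actual - estimated) / estimated;
}

// Individual nodes may swing ±40%; PASS when |totalCostVariance| ≤ 0.20.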


Anti-Pattern: Peeking Before Estimating

Novice thinking: "Let me run the API call first to see what tokens we get, then build the estimator."

Reality: This produces perfectly-fitted estimates that fail on new prompts. Estimation must happen BEFORE execution.

Correct approach:

  1. Estimate based on prompt length and heuristics
  2. Execute API call
  3. Compare variance
  4. Adjust heuristics if needed
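
For step 4, one hedged approach is to recompute the chars-per-token ratio from the batch you just executed, then re-verify on fresh prompts so the fit is never scored on the data that produced it. The update rule below is an illustrative assumption, not this skill's prescribed method.

// Recalibrate CHARS_PER_TOKEN from completed audit results.
interface Sample { promptChars: number; actualInputTokens: number; }

function recalibrateCharsPerToken(samples: Sample[], overhead = 10): number {
  // Subtracting a fixed per-call overhead before dividing is an assumption.
  const chars = samples.reduce((sum, s) => sum + s.promptChars, 0);
  const tokens = samples.reduce((sum, s) => sum + (s.actualInputTokens - overhead), 0);
  return chars / tokens;
}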

Calibration Guidelines

Input Token Estimation

// Calibrated 2026-01-30
const CHARS_PER_TOKEN = 3.5; // "Mixed" default from the table below
const OVERHEAD = 10;         // direct API call; see the overhead table above
const inputTokens = Math.ceil(prompt.length / CHARS_PER_TOKEN) + OVERHEAD;
Text Type         CHARS_PER_TOKEN   Notes
English prose     4.0               Most consistent
Code              3.0-3.5           Symbols tokenize differently
Mixed             3.5               Balanced (recommended default)
JSON/structured   3.0               Punctuation heavy
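
For example, a 2,100-character mixed prompt sent as a direct API call estimates to ceil(2100 / 3.5) + 10 = 610 input tokens.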

Output Token Estimation

Prompt Constraint        Multiplier    Notes
"List exactly N items"   0.8x input    Highly constrained
"Brief summary"          1.0x input    Moderate
"Explain in detail"      2-3x input    Expansive
Unconstrained            1.5x input    Variable

Always: Minimum 100 output tokens for any meaningful response.
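
A sketch applying the multipliers plus the 100-token floor; the constraint-detection patterns are illustrative assumptions, not the skill's actual matching logic:

function estimateOutputTokens(inputTokens: number, prompt: string): number {
  let multiplier = 1.5;                                      // unconstrained default
  if (/list exactly \d+/i.test(prompt)) multiplier = 0.8;    // highly constrained
  else if (/brief summary/i.test(prompt)) multiplier = 1.0;  // moderate
  else if (/in detail/i.test(prompt)) multiplier = 2.5;      // midpoint of 2-3x
  return Math.max(100, Math.ceil(inputTokens * multiplier)); // 100-token floor
}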

Model Behavior

Model           Output Tendency
Claude Opus     Longer, more detailed
Claude Sonnet   Balanced
Claude Haiku    Concise, efficient

Quick Fixes

Symptom                                Cause                  Fix
Overestimating by 40%+                 Overhead too high      Reduce from 500 → 10
Underestimating inputs                 Chars/token too high   Reduce from 4.0 → 3.5
Output wildly varies                   LLM non-determinism    Use constrained prompts
Total cost accurate but per-node off   Normal aggregation     Accept it, focus on totals

Verification Checklist

  • 3+ test cases (simple, medium, complex)
  • Estimates run BEFORE API calls
  • Variance formula: (actual - estimated) / estimated
  • Target: ±20% for input AND output
  • Report includes actionable recommendations

References

See /references/calibration-data.md for detailed calibration tables and historical data.