Claude-code-plugins-plus openrouter-cost-controls
install
source · Clone the upstream repo
```bash
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
```
Claude Code · Install into ~/.claude/skills/
```bash
T=$(mktemp -d) && \
  git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && \
  mkdir -p ~/.claude/skills && \
  cp -r "$T/plugins/saas-packs/openrouter-pack/skills/openrouter-cost-controls" \
        ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-openrouter-cost-controls && \
  rm -rf "$T"
```
manifest:
plugins/saas-packs/openrouter-pack/skills/openrouter-cost-controls/SKILL.md
OpenRouter Cost Controls
Overview
OpenRouter provides per-key credit limits, a credit balance API, and per-generation cost queries. Combined with client-side budget middleware, you can enforce hard spending caps at the key level and soft caps in your application. This skill covers key-level limits, per-request cost tracking, budget enforcement middleware, and alert systems.
Check Credit Balance
```bash
# Current balance and limits
curl -s https://openrouter.ai/api/v1/auth/key \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" | jq '{
    credits_used: .data.usage,
    credit_limit: .data.limit,
    remaining: ((.data.limit // 0) - .data.usage),
    is_free_tier: .data.is_free_tier,
    rate_limit: .data.rate_limit
  }'
```
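The same check is easy to fold into application code; a minimal stdlib-only sketch, assuming the response shape matches the `jq` query above (a `null` limit means the key is uncapped):

```python
import json
import os
import urllib.request


def summarize_key(payload: dict) -> dict:
    """Reduce a /api/v1/auth/key response to the fields worth alerting on."""
    data = payload.get("data", {})
    limit = data.get("limit")  # None => no per-key credit limit
    usage = data.get("usage", 0.0)
    return {
        "credits_used": usage,
        "credit_limit": limit,
        "remaining": None if limit is None else limit - usage,
        "is_free_tier": data.get("is_free_tier", False),
    }


def fetch_key_summary(api_key: str) -> dict:
    """Fetch the live balance and summarize it (network call)."""
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/auth/key",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return summarize_key(json.load(resp))
```

Call `fetch_key_summary(os.environ["OPENROUTER_API_KEY"])` at startup or on a health-check endpoint to fail fast when credits run low.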
Per-Key Credit Limits
```python
import os
import requests

MGMT_KEY = os.environ["OPENROUTER_MGMT_KEY"]  # Management key

# Create a key with a $50 credit limit
resp = requests.post(
    "https://openrouter.ai/api/v1/keys",
    headers={"Authorization": f"Bearer {MGMT_KEY}"},
    json={"name": "backend-prod", "limit": 50.0},
)
new_key = resp.json()["data"]["key"]  # sk-or-v1-...

# List all keys with their limits and usage
keys = requests.get(
    "https://openrouter.ai/api/v1/keys",
    headers={"Authorization": f"Bearer {MGMT_KEY}"},
).json()
for k in keys.get("data", []):
    print(f"{k['name']}: ${k.get('usage', 0):.4f} / ${k.get('limit', 'unlimited')}")
```
Budget Enforcement Middleware
```python
import os
import time

import requests
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
    default_headers={"HTTP-Referer": "https://my-app.com", "X-Title": "my-app"},
)


class BudgetEnforcer:
    """Client-side budget enforcement with server-side cost verification."""

    def __init__(self, daily_limit: float = 10.0, per_request_limit: float = 0.50):
        self.daily_limit = daily_limit
        self.per_request_limit = per_request_limit
        self._daily_spend = 0.0
        self._day = time.strftime("%Y-%m-%d")

    def _reset_if_new_day(self):
        today = time.strftime("%Y-%m-%d")
        if today != self._day:
            self._daily_spend = 0.0
            self._day = today

    def estimate_cost(self, model: str, prompt_tokens: int, max_tokens: int) -> float:
        """Pre-flight cost estimate using cached pricing."""
        # Representative rates (fetch from /models in production)
        RATES = {
            "anthropic/claude-3.5-sonnet": (3.0, 15.0),  # per 1M tokens
            "openai/gpt-4o": (2.50, 10.0),
            "openai/gpt-4o-mini": (0.15, 0.60),
            "meta-llama/llama-3.1-8b-instruct": (0.06, 0.06),
        }
        prompt_rate, comp_rate = RATES.get(model, (3.0, 15.0))
        return (prompt_tokens * prompt_rate / 1_000_000) + (max_tokens * comp_rate / 1_000_000)

    def check_budget(self, model: str, prompt_tokens: int, max_tokens: int):
        """Raise if request would exceed budget."""
        self._reset_if_new_day()
        estimated = self.estimate_cost(model, prompt_tokens, max_tokens)
        if estimated > self.per_request_limit:
            raise ValueError(
                f"Request estimated at ${estimated:.4f} exceeds per-request limit ${self.per_request_limit}"
            )
        if self._daily_spend + estimated > self.daily_limit:
            raise ValueError(
                f"Daily spend ${self._daily_spend:.4f} + request ${estimated:.4f} "
                f"exceeds daily limit ${self.daily_limit}"
            )

    def record_cost(self, generation_id: str):
        """Record actual cost from generation endpoint."""
        try:
            gen = requests.get(
                f"https://openrouter.ai/api/v1/generation?id={generation_id}",
                headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
                timeout=5,
            ).json()
            cost = float(gen.get("data", {}).get("total_cost", 0))
            self._daily_spend += cost
            return cost
        except Exception:
            return 0.0


budget = BudgetEnforcer(daily_limit=25.0, per_request_limit=1.0)
```
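`check_budget` needs a prompt token count before the request is sent. A crude chars-divided-by-4 heuristic is enough for budget gating; this is an assumption that roughly holds for English prose, so use the model's real tokenizer where precision matters:

```python
def rough_token_count(text: str) -> int:
    """Rough pre-flight token estimate: ~4 characters per token for English
    text. Good enough to gate a request against a budget, not for billing."""
    return max(1, len(text) // 4)


# e.g. gate a request before sending it:
# budget.check_budget("openai/gpt-4o-mini", rough_token_count(prompt), max_tokens=500)
```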
Cost-Saving Model Variants
```python
# :floor variant -- cheapest provider for a model
response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet:floor",  # Cheapest provider
    messages=[{"role": "user", "content": "Summarize this..."}],
    max_tokens=500,
)

# :free variant -- free providers (where available)
response = client.chat.completions.create(
    model="google/gemma-2-9b-it:free",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=100,
)

# Route simple tasks to cheap models
ROUTING = {
    "classification": "openai/gpt-4o-mini",            # $0.15/$0.60 per 1M
    "summarization": "anthropic/claude-3-haiku",       # $0.25/$1.25 per 1M
    "code_generation": "anthropic/claude-3.5-sonnet",  # $3/$15 per 1M
    "simple_qa": "meta-llama/llama-3.1-8b-instruct",   # $0.06/$0.06 per 1M
}
```
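A tiny dispatcher over a routing table like the one above keeps the "cheap model for simple tasks" policy in one place; the premium fallback model for unknown task types is an example choice:

```python
ROUTING = {
    "classification": "openai/gpt-4o-mini",
    "summarization": "anthropic/claude-3-haiku",
    "code_generation": "anthropic/claude-3.5-sonnet",
    "simple_qa": "meta-llama/llama-3.1-8b-instruct",
}
DEFAULT_MODEL = "anthropic/claude-3.5-sonnet"  # premium fallback for unknown tasks


def pick_model(task: str) -> str:
    """Return the budget model for a known task type, else the premium default."""
    return ROUTING.get(task, DEFAULT_MODEL)
```

Callers then pass `pick_model(task)` as the `model` argument, so cost policy changes never touch request code.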
Budget Alert Script
```bash
#!/bin/bash
# Alert when credits drop below threshold
THRESHOLD=5.0
REMAINING=$(curl -s https://openrouter.ai/api/v1/auth/key \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" | \
  jq '((.data.limit // 0) - .data.usage)')
if (( $(echo "$REMAINING < $THRESHOLD" | bc -l) )); then
  echo "ALERT: OpenRouter credits low: \$$REMAINING remaining"
  # Send to Slack, PagerDuty, etc.
fi
```
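To run the alert on a schedule, a crontab entry like the following works; the script path, log location, and key placeholder are examples to adapt:

```bash
# m    h dom mon dow  command -- check credits every 15 minutes
*/15 * * * * OPENROUTER_API_KEY=sk-or-v1-... /opt/scripts/openrouter-credit-alert.sh >> /var/log/openrouter-alerts.log 2>&1
```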
Error Handling
| Error | Cause | Fix |
|---|---|---|
| 402 Payment Required | Credits exhausted | Top up at openrouter.ai/credits or use `:free` models |
| 402 Key limit reached | Per-key credit limit hit | Increase key limit or create new key |
| Budget middleware rejects | Client-side limit exceeded | Increase limit or optimize prompt tokens |
| Stale pricing data | Cached rates outdated | Refresh rates from `/models` daily |
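For the 402 cases, a degrade-instead-of-fail pattern is often enough. This sketch is deliberately SDK-agnostic: `call(model)` is whatever function performs the request, and matching "402" in the exception text is an assumption to adapt to your client library (e.g. `openai.APIStatusError` exposes `status_code`):

```python
def with_budget_fallback(call, primary: str, fallback: str):
    """Try the primary model; on a 402-style budget failure, retry once on a
    cheaper or :free fallback. `call` is assumed to raise an exception whose
    message contains "402" on payment failures -- adapt the check to your SDK.
    """
    try:
        return call(primary)
    except Exception as exc:
        if "402" in str(exc):
            return call(fallback)
        raise
```

A usage shape would be `with_budget_fallback(lambda m: client.chat.completions.create(model=m, messages=msgs, max_tokens=300), "anthropic/claude-3.5-sonnet", "google/gemma-2-9b-it:free")`.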
Enterprise Considerations
- Set per-key credit limits via the management API to isolate blast radius per service/team
- Query `/api/v1/generation?id=` after each request for exact cost auditing
- Use the `:floor` variant to automatically pick the cheapest provider for a model
- Route simple tasks to budget models ($0.06/1M) and reserve premium models for complex tasks
- Set `max_tokens` on every request to cap completion cost
- Enable auto-topup in the dashboard to prevent production service interruptions