Skills ai-infrastructure-litellm
LiteLLM proxy server setup, TypeScript client patterns via OpenAI SDK, model routing, fallbacks, load balancing, spend tracking, virtual keys, and production deployment
git clone https://github.com/agents-inc/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/agents-inc/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/dist/plugins/ai-infrastructure-litellm/skills/ai-infrastructure-litellm" ~/.claude/skills/agents-inc-skills-ai-infrastructure-litellm && rm -rf "$T"
dist/plugins/ai-infrastructure-litellm/skills/ai-infrastructure-litellm/SKILL.md
LiteLLM Proxy Patterns
Quick Guide: LiteLLM is an OpenAI-compatible proxy (AI gateway) that routes requests to 100+ LLM providers. TypeScript clients connect via the standard OpenAI SDK with `baseURL` pointed at the proxy. Configure models, fallbacks, load balancing, and budgets in `config.yaml`. Use `provider/model-name` format in `litellm_params.model` (e.g., `anthropic/claude-sonnet-4-20250514`). The `model_name` in config is the user-facing alias clients request. Virtual keys require PostgreSQL. Master key must start with `sk-`.
<critical_requirements>
CRITICAL: Before Using This Skill
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, `import type`, named constants)
(You MUST use the `provider/model-name` format in `litellm_params.model` -- e.g., `anthropic/claude-sonnet-4-20250514`, `openai/gpt-4o`, `azure/my-deployment` -- the provider prefix is how LiteLLM routes to the correct API)
(You MUST set `model_name` as the user-facing alias that clients request -- this is NOT the provider model ID, it is the name your TypeScript client passes as `model`)
(You MUST point the OpenAI SDK `baseURL` at the proxy URL (e.g., `http://localhost:4000`) and pass the proxy key as `apiKey` -- do NOT use provider API keys directly in client code)
(You MUST start master keys with `sk-` -- LiteLLM rejects master keys that do not follow this prefix convention)
(You MUST configure `database_url` pointing to PostgreSQL before using virtual keys, spend tracking, or team/user management -- these features require persistent storage)
</critical_requirements>
Auto-detection: LiteLLM, litellm, litellm_params, litellm_settings, LLM proxy, LLM gateway, model_list, master_key, virtual keys, model fallback, load balancing LLM, provider/model, anthropic/claude, openai/gpt, azure/, litellm --config, LITELLM_MASTER_KEY, LITELLM_SALT_KEY
When to use:
- Running a unified LLM gateway that routes to multiple providers (OpenAI, Anthropic, Azure, Bedrock, etc.)
- Configuring model fallbacks, load balancing, or routing strategies across deployments
- Managing API key access with virtual keys, per-key budgets, and rate limits
- Tracking spend across models, teams, users, and tags
- Deploying a self-hosted OpenAI-compatible proxy with Docker
Key patterns covered:
- Proxy server config.yaml structure (model_list, litellm_settings, router_settings, general_settings)
- TypeScript client setup via OpenAI SDK pointed at proxy
- Model routing with provider prefixes and user-facing aliases
- Fallback chains (regular, context window, content policy, default)
- Load balancing strategies (simple-shuffle, least-busy, usage-based, latency-based, cost-based)
- Virtual keys with budgets, rate limits, and model restrictions
- Spend tracking per key, user, team, and tag
- Docker Compose production deployment (see the sketch after this list)
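A minimal Docker Compose sketch for the deployment pattern listed above. The image name and tag are assumptions to verify against LiteLLM's deployment docs; the port, `--config` flag, and the `LITELLM_MASTER_KEY`, `LITELLM_SALT_KEY`, and `DATABASE_URL` environment variables match those referenced elsewhere in this skill.

```yaml
# docker-compose.yml -- sketch only; image tag is an assumption, check LiteLLM's deployment docs
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest   # assumed image name/tag
    command: ["--config", "/app/config.yaml"]
    ports:
      - "4000:4000"
    volumes:
      - ./config.yaml:/app/config.yaml
    environment:
      LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY}   # must start with sk-
      LITELLM_SALT_KEY: ${LITELLM_SALT_KEY}       # encrypts stored credentials
      DATABASE_URL: postgresql://litellm:litellm@db:5432/litellm
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      OPENAI_API_KEY: ${OPENAI_API_KEY}
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: litellm
      POSTGRES_PASSWORD: litellm
      POSTGRES_DB: litellm
    volumes:
      - pg-data:/var/lib/postgresql/data
volumes:
  pg-data:
```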
When NOT to use:
- Calling a single LLM provider directly with no proxy layer -- use the provider's SDK directly
- Building a Python application that calls LiteLLM as a library -- this skill covers the proxy server + TypeScript client pattern
- When you need framework-specific chat UI hooks -- use a framework-integrated AI SDK
Examples Index
- Core: Config & Client Setup -- config.yaml structure, TypeScript OpenAI SDK client, model routing, Docker deployment
- Routing & Reliability -- Fallbacks, load balancing, cooldowns, retries, priority routing
- Keys & Spend -- Virtual keys, budgets, rate limits, spend tracking, team management
<philosophy>
Philosophy
LiteLLM Proxy is an AI gateway -- a single OpenAI-compatible endpoint that routes to 100+ LLM providers. TypeScript applications never talk to providers directly; they talk to the proxy using the standard OpenAI SDK.
Core principles:
- Provider abstraction -- Client code uses a single `baseURL` and standard OpenAI SDK. Switching providers means changing `config.yaml`, not application code.
- Two-layer naming -- `model_name` is what clients request (e.g., `"claude-sonnet"`). `litellm_params.model` is the actual provider routing (e.g., `"anthropic/claude-sonnet-4-20250514"`). This decouples client code from provider specifics.
- Resilience via config -- Fallbacks, retries, load balancing, and cooldowns are all declared in `config.yaml`. No application-level retry logic needed.
- Spend governance -- Virtual keys, per-key budgets, rate limits, and tag-based tracking give fine-grained cost control without changing client code.
- OpenAI compatibility -- Any client, SDK, or tool that works with OpenAI's API works with LiteLLM. No custom SDK required.
<patterns>
Core Patterns
Pattern 1: Minimal config.yaml
The proxy needs a `config.yaml` with at least one model defined. `model_name` is client-facing; `litellm_params.model` is the provider route.
```yaml
# config.yaml
model_list:
  - model_name: claude-sonnet                        # What clients request
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514      # Provider/model route
      api_key: os.environ/ANTHROPIC_API_KEY          # Never hardcode keys
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
```
Why good: Two-layer naming decouples clients from providers; the `os.environ/` syntax reads secrets from the environment at runtime
```yaml
# BAD: Missing provider prefix, hardcoded key
model_list:
  - model_name: claude-sonnet-4-20250514   # Using provider model ID as name
    litellm_params:
      model: claude-sonnet-4-20250514      # No provider prefix -- routing fails
      api_key: sk-ant-abc123               # Hardcoded API key
```
Why bad: Without the `anthropic/` prefix, LiteLLM cannot route to the correct provider; hardcoded keys are a security risk; using the provider model ID as `model_name` couples clients to provider naming
See: examples/core.md for complete config with general_settings, Docker setup
Pattern 2: TypeScript Client via OpenAI SDK
Connect to the proxy using the standard OpenAI SDK. Point `baseURL` at the proxy and use the proxy key as `apiKey`.
```typescript
// lib/llm-client.ts
import OpenAI from "openai";

const PROXY_URL = "http://localhost:4000";

const client = new OpenAI({
  baseURL: PROXY_URL,
  apiKey: process.env.LITELLM_API_KEY, // Virtual key or master key
});

export { client };
```
```typescript
// usage.ts
import { client } from "./lib/llm-client.js";

const completion = await client.chat.completions.create({
  model: "claude-sonnet", // model_name from config.yaml, NOT provider model ID
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain TypeScript generics." },
  ],
});

console.log(completion.choices[0].message.content);
```
Why good: Standard OpenAI SDK, no custom dependencies; model name matches the config.yaml `model_name`; proxy key keeps provider keys server-side
```typescript
// BAD: Using provider model ID, provider API key
const client = new OpenAI({
  baseURL: "http://localhost:4000",
  apiKey: process.env.ANTHROPIC_API_KEY, // Wrong -- use proxy key
});

const completion = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-20250514", // Wrong -- use model_name alias
  messages: [{ role: "user", content: "Hello" }],
});
```
Why bad: Provider API key bypasses proxy auth and virtual key controls; using provider model ID instead of alias couples client to provider naming and bypasses proxy routing logic
See: examples/core.md for streaming, metadata tagging
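As a complement, here is a minimal streaming sketch through the proxy. It assumes the shared `client` and the `claude-sonnet` alias from the pattern above; no proxy-side configuration is needed for streaming.

```typescript
// streaming-usage.ts -- minimal sketch; assumes the client and alias from Pattern 2
import { client } from "./lib/llm-client.js";

const stream = await client.chat.completions.create({
  model: "claude-sonnet",  // model_name alias from config.yaml
  messages: [{ role: "user", content: "Explain TypeScript generics." }],
  stream: true,            // proxy forwards provider stream chunks in OpenAI format
});

for await (const chunk of stream) {
  // Each chunk carries an incremental content delta
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```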
Pattern 3: Fallback Chains
Configure model fallbacks so requests automatically retry on a different model when the primary fails.
```yaml
# config.yaml
model_list:
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  num_retries: 2                                               # Retries per model before fallback
  fallbacks: [{ "claude-sonnet": ["gpt-4o"] }]                 # General fallback chain
  context_window_fallbacks: [{ "gpt-4o": ["claude-sonnet"] }]  # Context overflow fallback
  default_fallbacks: ["gpt-4o"]                                # Catch-all for any model failure
```
Why good: Fallbacks use `model_name` aliases (not provider IDs); ordered chains are tried sequentially; separate chains handle context overflow vs general errors
See: examples/routing.md for content policy fallbacks, combining with load balancing
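For reference, a hedged sketch of the content policy fallback mentioned above. It assumes a `content_policy_fallbacks` key configured like the other fallback types; verify the exact key name against the LiteLLM routing docs.

```yaml
# Sketch only -- content_policy_fallbacks shown by analogy with the fallback types above
litellm_settings:
  fallbacks: [{ "claude-sonnet": ["gpt-4o"] }]                 # general errors
  context_window_fallbacks: [{ "gpt-4o": ["claude-sonnet"] }]  # context overflow
  content_policy_fallbacks: [{ "claude-sonnet": ["gpt-4o"] }]  # provider content-policy refusals
```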
Pattern 4: Load Balancing Across Deployments
Multiple entries with the same `model_name` create a load-balanced group. The proxy distributes requests using the configured strategy.
```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-eastus
      api_base: https://eastus.openai.azure.com/
      api_key: os.environ/AZURE_EASTUS_KEY
      rpm: 100                              # Requests per minute for this deployment
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-westus
      api_base: https://westus.openai.azure.com/
      api_key: os.environ/AZURE_WESTUS_KEY
      rpm: 100

router_settings:
  routing_strategy: usage-based-routing     # Route to deployment with lowest RPM/TPM usage
  num_retries: 2
  timeout: 30
```
Why good: The same `model_name` across entries creates automatic load balancing; `rpm`/`tpm` limits per deployment enable usage-aware routing
See: examples/routing.md for all five routing strategies, priority routing with `order`
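A hedged sketch of the priority routing mentioned above. It assumes `order` is set per deployment inside `litellm_params`, with lower values tried first; confirm the placement and semantics against the LiteLLM router docs.

```yaml
# Sketch only -- order placement and semantics (lower = higher priority) are assumptions
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-eastus
      api_base: https://eastus.openai.azure.com/
      api_key: os.environ/AZURE_EASTUS_KEY
      order: 1          # preferred deployment
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
      order: 2          # tried when the Azure deployment is unavailable
```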
Pattern 5: Virtual Keys with Budgets
Virtual keys let you distribute access with per-key budgets, rate limits, and model restrictions. Requires PostgreSQL.
```yaml
# config.yaml
general_settings:
  master_key: sk-litellm-master-key-change-me   # Must start with sk-
  database_url: os.environ/DATABASE_URL         # PostgreSQL required
```
```bash
# Generate a virtual key via API
curl 'http://localhost:4000/key/generate' \
  -H 'Authorization: Bearer sk-litellm-master-key-change-me' \
  -H 'Content-Type: application/json' \
  -d '{
    "models": ["claude-sonnet", "gpt-4o"],
    "max_budget": 50.0,
    "duration": "30d",
    "metadata": {"team": "backend", "project": "search"}
  }'
# Returns: { "key": "sk-generated-key-abc123", ... }
```
Why good: Per-key model restrictions, budget caps, and expiry; metadata enables tag-based spend tracking; master key authentication protects key generation
See: examples/keys-and-spend.md for team management, spend queries, rate limit tiers
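To close the loop, a minimal sketch of handing the generated virtual key to the Pattern 2 client. The key value is the hypothetical one returned by `/key/generate` above, distributed to a team instead of the master key.

```typescript
// virtual-key-client.ts -- sketch; the key value is the hypothetical /key/generate response above
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4000",
  apiKey: process.env.TEAM_LITELLM_KEY, // e.g. "sk-generated-key-abc123" for the backend team
});

// Allowed: "claude-sonnet" is in the key's models list and within its $50 budget
const completion = await client.chat.completions.create({
  model: "claude-sonnet",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(completion.choices[0].message.content);

// A model outside the key's restrictions (or a request past its budget) is rejected
// by the proxy and surfaces in the SDK as an APIError.
```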
Pattern 6: Spend Tracking with Tags
Attach metadata tags to requests for granular cost attribution. The proxy tracks spend automatically per key, user, team, and tag.
```typescript
// Tag requests for cost attribution
const completion = await client.chat.completions.create({
  model: "claude-sonnet",
  messages: [{ role: "user", content: "Summarize this document." }],
  // LiteLLM-specific: pass metadata for spend tracking
  metadata: {
    tags: ["project:search", "team:backend"],
    trace_user_id: "user-123",
  },
} as any); // metadata is a LiteLLM extension, not in OpenAI types
```
Why good: Tags enable cost attribution by project, team, or feature without changing model routing; cost appears in the `x-litellm-response-cost` response header
When to use: When you need cost visibility across teams, projects, or features
See: examples/keys-and-spend.md for querying spend by tag, user, and team
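A minimal sketch of reading the per-request cost mentioned above from the client side, using the OpenAI SDK's `withResponse()` to access the raw response headers.

```typescript
// response-cost.ts -- sketch; assumes the proxy sets x-litellm-response-cost as described above
import { client } from "./lib/llm-client.js";

const { data: completion, response } = await client.chat.completions
  .create({
    model: "claude-sonnet",
    messages: [{ role: "user", content: "Summarize this document." }],
  })
  .withResponse();

// Header value is the dollar cost computed by the proxy for this request
const cost = response.headers.get("x-litellm-response-cost");
console.log(`Cost: $${cost ?? "unknown"}`);
console.log(completion.choices[0].message.content);
```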
</patterns>
<decision_framework>
Decision Framework
Do You Need a Proxy?
```
Do you call multiple LLM providers?
+-- YES -> LiteLLM Proxy adds value (unified API, routing, fallbacks)
+-- NO  -> Do you need budgets, rate limits, or virtual keys?
    +-- YES -> LiteLLM Proxy (governance layer)
    +-- NO  -> Do you need fallbacks or load balancing?
        +-- YES -> LiteLLM Proxy (reliability layer)
        +-- NO  -> Use the provider SDK directly (simpler)
```
Which Routing Strategy?
```
What is your priority?
+-- Even distribution      -> simple-shuffle (default)
+-- Minimize latency       -> latency-based-routing
+-- Respect rate limits    -> usage-based-routing
+-- Minimize cost          -> cost-based-routing
+-- Handle concurrent load -> least-busy
```
Virtual Keys vs Master Key Only
```
Do you have multiple teams or users?
+-- YES -> Virtual keys (per-team budgets, model restrictions)
|          Requires: PostgreSQL database
+-- NO  -> Do you need spend tracking?
    +-- YES -> Virtual keys (even for single user, enables spend logs)
    |          Requires: PostgreSQL database
    +-- NO  -> Master key only (simplest setup, no database needed)
```
</decision_framework>
<red_flags>
RED FLAGS
High Priority Issues:
- Missing provider prefix in `litellm_params.model` (e.g., `claude-sonnet-4-20250514` instead of `anthropic/claude-sonnet-4-20250514`) -- proxy cannot route without the prefix
- Hardcoding provider API keys in config.yaml instead of using `os.environ/VAR_NAME` -- security breach risk
- Using provider model IDs as `model_name` -- couples all clients to provider naming, breaks when you switch providers
- Master key not starting with `sk-` -- LiteLLM silently rejects it
- Using virtual keys without a PostgreSQL `database_url` -- key generation fails
Medium Priority Issues:
- Not setting `num_retries` in `litellm_settings` -- defaults to 0, no retries on transient failures
- Confusing `model_name` (client-facing alias) with `litellm_params.model` (provider route) -- most common config mistake
- Not setting `rpm`/`tpm` on deployments when using `usage-based-routing` -- routing strategy has no data to work with
- Missing `LITELLM_SALT_KEY` in production -- virtual key credentials stored without encryption
Common Mistakes:
- Passing `anthropic/claude-sonnet-4-20250514` as the `model` parameter in TypeScript client code -- use the `model_name` alias instead
- Expecting the `metadata` field to be typed in the OpenAI SDK -- it is a LiteLLM extension, requires `as any` or `extra_body`
- Setting fallbacks using provider model IDs instead of `model_name` aliases -- fallbacks reference model names, not provider routes
- Forgetting that `config.yaml` changes require a proxy restart (or use the `/config/update` API endpoint)
Gotchas & Edge Cases:
- The `os.environ/` syntax in config.yaml (no `$` prefix) is LiteLLM-specific -- not standard YAML environment variable substitution
- `model_name` matching is exact -- `"claude-sonnet"` and `"Claude-Sonnet"` are different models
- When using `default_fallbacks`, they do NOT apply to `ContentPolicyViolationError` or `ContextWindowExceededError` -- use specialized fallback types for those
- The proxy adds a network hop -- expect 5-20ms additional latency compared to direct provider calls
- `rpm`/`tpm` limits in config are per-deployment, not per-model-group -- a model group with 3 deployments at `rpm: 100` each gets 300 RPM total
- Virtual key spend tracking is eventually consistent -- the `spend` field on a key may lag a few seconds behind actual usage
- The `/v1/` prefix on endpoints is optional -- both `http://localhost:4000/chat/completions` and `http://localhost:4000/v1/chat/completions` work
- Streaming through the proxy works transparently -- no special configuration needed on the proxy side
- The LiteLLM admin UI is available at `http://localhost:4000/ui` when the proxy is running
</red_flags>
<critical_reminders>
CRITICAL REMINDERS
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, `import type`, named constants)
(You MUST use the `provider/model-name` format in `litellm_params.model` -- e.g., `anthropic/claude-sonnet-4-20250514`, `openai/gpt-4o`, `azure/my-deployment` -- the provider prefix is how LiteLLM routes to the correct API)
(You MUST set `model_name` as the user-facing alias that clients request -- this is NOT the provider model ID, it is the name your TypeScript client passes as `model`)
(You MUST point the OpenAI SDK `baseURL` at the proxy URL (e.g., `http://localhost:4000`) and pass the proxy key as `apiKey` -- do NOT use provider API keys directly in client code)
(You MUST start master keys with `sk-` -- LiteLLM rejects master keys that do not follow this prefix convention)
(You MUST configure `database_url` pointing to PostgreSQL before using virtual keys, spend tracking, or team/user management -- these features require persistent storage)
Failure to follow these rules will produce misconfigured proxies with broken routing, security issues, or missing spend data.
</critical_reminders>