Skills: ai-infrastructure-litellm

LiteLLM proxy server setup, TypeScript client patterns via the OpenAI SDK, model routing, fallbacks, load balancing, spend tracking, virtual keys, and production deployment.

Install

Source · Clone the upstream repo:

```bash
git clone https://github.com/agents-inc/skills
```

Claude Code · Install into `~/.claude/skills/`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/agents-inc/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/dist/plugins/ai-infrastructure-litellm/skills/ai-infrastructure-litellm" ~/.claude/skills/agents-inc-skills-ai-infrastructure-litellm && rm -rf "$T"
```

Manifest: `dist/plugins/ai-infrastructure-litellm/skills/ai-infrastructure-litellm/SKILL.md`

Source content

LiteLLM Proxy Patterns

Quick Guide: LiteLLM is an OpenAI-compatible proxy (AI gateway) that routes requests to 100+ LLM providers. TypeScript clients connect via the standard OpenAI SDK with `baseURL` pointed at the proxy. Configure models, fallbacks, load balancing, and budgets in `config.yaml`. Use the `provider/model-name` format in `litellm_params.model` (e.g., `anthropic/claude-sonnet-4-20250514`). The `model_name` in config is the user-facing alias clients request. Virtual keys require PostgreSQL. The master key must start with `sk-`.


<critical_requirements>

CRITICAL: Before Using This Skill

All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, `import type`, named constants).

(You MUST use the `provider/model-name` format in `litellm_params.model` -- e.g., `anthropic/claude-sonnet-4-20250514`, `openai/gpt-4o`, `azure/my-deployment` -- the provider prefix is how LiteLLM routes to the correct API.)

(You MUST set `model_name` as the user-facing alias that clients request -- this is NOT the provider model ID; it is the name your TypeScript client passes as `model`.)

(You MUST point the OpenAI SDK `baseURL` at the proxy URL (e.g., `http://localhost:4000`) and pass the proxy key as `apiKey` -- do NOT use provider API keys directly in client code.)

(You MUST start master keys with `sk-` -- LiteLLM rejects master keys that do not follow this prefix convention.)

(You MUST configure `database_url` pointing to PostgreSQL before using virtual keys, spend tracking, or team/user management -- these features require persistent storage.)

</critical_requirements>


Auto-detection: LiteLLM, litellm, litellm_params, litellm_settings, LLM proxy, LLM gateway, model_list, master_key, virtual keys, model fallback, load balancing LLM, provider/model, anthropic/claude, openai/gpt, azure/, litellm --config, LITELLM_MASTER_KEY, LITELLM_SALT_KEY

When to use:

  • Running a unified LLM gateway that routes to multiple providers (OpenAI, Anthropic, Azure, Bedrock, etc.)
  • Configuring model fallbacks, load balancing, or routing strategies across deployments
  • Managing API key access with virtual keys, per-key budgets, and rate limits
  • Tracking spend across models, teams, users, and tags
  • Deploying a self-hosted OpenAI-compatible proxy with Docker

Key patterns covered:

  • Proxy server config.yaml structure (model_list, litellm_settings, router_settings, general_settings)
  • TypeScript client setup via OpenAI SDK pointed at proxy
  • Model routing with provider prefixes and user-facing aliases
  • Fallback chains (regular, context window, content policy, default)
  • Load balancing strategies (simple-shuffle, least-busy, usage-based, latency-based, cost-based)
  • Virtual keys with budgets, rate limits, and model restrictions
  • Spend tracking per key, user, team, and tag
  • Docker Compose production deployment

When NOT to use:

  • Calling a single LLM provider directly with no proxy layer -- use the provider's SDK directly
  • Building a Python application that calls LiteLLM as a library -- this skill covers the proxy server + TypeScript client pattern
  • When you need framework-specific chat UI hooks -- use a framework-integrated AI SDK

Examples Index


<philosophy>

Philosophy

LiteLLM Proxy is an AI gateway -- a single OpenAI-compatible endpoint that routes to 100+ LLM providers. TypeScript applications never talk to providers directly; they talk to the proxy using the standard OpenAI SDK.

Core principles:

  1. Provider abstraction -- Client code uses a single `baseURL` and the standard OpenAI SDK. Switching providers means changing `config.yaml`, not application code.
  2. Two-layer naming -- `model_name` is what clients request (e.g., `"claude-sonnet"`). `litellm_params.model` is the actual provider route (e.g., `"anthropic/claude-sonnet-4-20250514"`). This decouples client code from provider specifics.
  3. Resilience via config -- Fallbacks, retries, load balancing, and cooldowns are all declared in `config.yaml`. No application-level retry logic is needed.
  4. Spend governance -- Virtual keys, per-key budgets, rate limits, and tag-based tracking give fine-grained cost control without changing client code.
  5. OpenAI compatibility -- Any client, SDK, or tool that works with OpenAI's API works with LiteLLM. No custom SDK required.
</philosophy>
<patterns>

Core Patterns

Pattern 1: Minimal config.yaml

The proxy needs a `config.yaml` with at least one model defined. `model_name` is client-facing; `litellm_params.model` is the provider route.

```yaml
# config.yaml
model_list:
  - model_name: claude-sonnet # What clients request
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514 # Provider/model route
      api_key: os.environ/ANTHROPIC_API_KEY # Never hardcode keys

  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
```

Why good: Two-layer naming decouples clients from providers; the `os.environ/` syntax reads secrets from the environment at runtime.

```yaml
# BAD: Missing provider prefix, hardcoded key
model_list:
  - model_name: claude-sonnet-4-20250514 # Using provider model ID as name
    litellm_params:
      model: claude-sonnet-4-20250514 # No provider prefix -- routing fails
      api_key: sk-ant-abc123 # Hardcoded API key
```

Why bad: Without the `anthropic/` prefix, LiteLLM cannot route to the correct provider; hardcoded keys are a security risk; using the provider model ID as `model_name` couples clients to provider naming.

See: examples/core.md for complete config with general_settings, Docker setup
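The referenced examples/core.md covers the full Docker setup; as a rough sketch of what a Compose service for the proxy can look like, the fragment below mounts the config and passes keys via the environment. The image tag, port, and variable names are assumptions to adapt to your deployment.

```yaml
# docker-compose.yaml -- minimal sketch; image tag and env names are assumptions
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    command: ["--config", "/app/config.yaml", "--port", "4000"]
    ports:
      - "4000:4000"
    volumes:
      - ./config.yaml:/app/config.yaml # The config from Pattern 1
    environment:
      LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY}
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      OPENAI_API_KEY: ${OPENAI_API_KEY}
```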


Pattern 2: TypeScript Client via OpenAI SDK

Connect to the proxy using the standard OpenAI SDK. Point `baseURL` at the proxy and use the proxy key as `apiKey`.

```typescript
// lib/llm-client.ts
import OpenAI from "openai";

const PROXY_URL = "http://localhost:4000";

const client = new OpenAI({
  baseURL: PROXY_URL,
  apiKey: process.env.LITELLM_API_KEY, // Virtual key or master key
});

export { client };
```

```typescript
// usage.ts
import { client } from "./lib/llm-client.js";

const completion = await client.chat.completions.create({
  model: "claude-sonnet", // model_name from config.yaml, NOT the provider model ID
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain TypeScript generics." },
  ],
});

console.log(completion.choices[0].message.content);
```

Why good: Standard OpenAI SDK with no custom dependencies; the model name matches the `model_name` in config.yaml; the proxy key keeps provider keys server-side.

```typescript
// BAD: Using provider model ID, provider API key
const client = new OpenAI({
  baseURL: "http://localhost:4000",
  apiKey: process.env.ANTHROPIC_API_KEY, // Wrong -- use proxy key
});

const completion = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-20250514", // Wrong -- use model_name alias
  messages: [{ role: "user", content: "Hello" }],
});
```

Why bad: Provider API key bypasses proxy auth and virtual key controls; using provider model ID instead of alias couples client to provider naming and bypasses proxy routing logic

See: examples/core.md for streaming, metadata tagging
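Because the proxy is OpenAI-compatible, streaming also works with no SDK at all: setting `stream: true` returns OpenAI-style server-sent events. The sketch below parses that stream with the global `fetch` available in Node 18+; the `streamChat` helper name, the `claude-sonnet` alias, and the `LITELLM_API_KEY` variable are illustrative assumptions.

```typescript
// stream-chat.ts -- dependency-free streaming sketch (Node 18+, global fetch)
const PROXY_URL = "http://localhost:4000";

// Stream a chat completion through the proxy, yielding content deltas.
async function* streamChat(prompt: string): AsyncGenerator<string> {
  const response = await fetch(`${PROXY_URL}/v1/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.LITELLM_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-sonnet", // model_name alias, as in Pattern 2
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // SSE frames arrive as "data: {json}" lines; keep the trailing partial line.
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";
    for (const line of lines) {
      const data = line.replace(/^data: /, "").trim();
      if (!data || data === "[DONE]") continue;
      const delta = JSON.parse(data).choices?.[0]?.delta?.content;
      if (delta) yield delta;
    }
  }
}

export { streamChat };
```

Typical usage would be `for await (const delta of streamChat("hi")) process.stdout.write(delta);`.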


Pattern 3: Fallback Chains

Configure model fallbacks so requests automatically retry on a different model when the primary fails.

```yaml
# config.yaml
model_list:
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY

  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  num_retries: 2 # Retries per model before fallback
  fallbacks: [{ "claude-sonnet": ["gpt-4o"] }] # General fallback chain
  context_window_fallbacks: [{ "gpt-4o": ["claude-sonnet"] }] # Context overflow fallback
  default_fallbacks: ["gpt-4o"] # Catch-all for any model failure
```

Why good: Fallbacks use `model_name` aliases (not provider IDs); ordered chains are tried sequentially; separate chains handle context overflow vs. general errors.

See: examples/routing.md for content policy fallbacks, combining with load balancing
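For content-policy rejections, LiteLLM supports a dedicated chain alongside the general and context-window ones; a minimal sketch of the `litellm_settings` addition (aliases are the ones defined above):

```yaml
litellm_settings:
  # Tried only when the primary model raises ContentPolicyViolationError
  content_policy_fallbacks: [{ "claude-sonnet": ["gpt-4o"] }]
```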


Pattern 4: Load Balancing Across Deployments

Multiple entries with the same `model_name` create a load-balanced group. The proxy distributes requests using the configured strategy.

```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-eastus
      api_base: https://eastus.openai.azure.com/
      api_key: os.environ/AZURE_EASTUS_KEY
      rpm: 100 # Requests per minute for this deployment

  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o-westus
      api_base: https://westus.openai.azure.com/
      api_key: os.environ/AZURE_WESTUS_KEY
      rpm: 100

router_settings:
  routing_strategy: usage-based-routing # Route to the deployment with lowest RPM/TPM usage
  num_retries: 2
  timeout: 30
```

Why good: The same `model_name` across entries creates automatic load balancing; per-deployment `rpm`/`tpm` limits enable usage-aware routing.

See: examples/routing.md for all five routing strategies and priority routing with `order`


Pattern 5: Virtual Keys with Budgets

Virtual keys let you distribute access with per-key budgets, rate limits, and model restrictions. Requires PostgreSQL.

```yaml
# config.yaml
general_settings:
  master_key: sk-litellm-master-key-change-me # Must start with sk-
  database_url: os.environ/DATABASE_URL # PostgreSQL required
```

```bash
# Generate a virtual key via the API
curl 'http://localhost:4000/key/generate' \
  -H 'Authorization: Bearer sk-litellm-master-key-change-me' \
  -H 'Content-Type: application/json' \
  -d '{
    "models": ["claude-sonnet", "gpt-4o"],
    "max_budget": 50.0,
    "duration": "30d",
    "metadata": {"team": "backend", "project": "search"}
  }'
# Returns: { "key": "sk-generated-key-abc123", ... }
```

Why good: Per-key model restrictions, budget caps, and expiry; metadata enables tag-based spend tracking; master key authentication protects key generation

See: examples/keys-and-spend.md for team management, spend queries, rate limit tiers
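The curl call above can also be issued from TypeScript, e.g. in a provisioning script. A sketch using the global `fetch` (Node 18+); the `createVirtualKey` helper name, the option shape, and the `LITELLM_MASTER_KEY` variable are illustrative assumptions mirroring the request body shown above.

```typescript
// create-virtual-key.ts -- sketch; assumes a proxy at http://localhost:4000
const PROXY_URL = "http://localhost:4000";

interface VirtualKeyOptions {
  models: string[]; // model_name aliases this key may call
  maxBudget: number; // USD cap before the key is blocked
  duration: string; // e.g. "30d"
  metadata?: Record<string, unknown>; // enables tag-based spend tracking
}

// POST /key/generate authenticated with the master key; returns the sk- key.
async function createVirtualKey(opts: VirtualKeyOptions): Promise<string> {
  const response = await fetch(`${PROXY_URL}/key/generate`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.LITELLM_MASTER_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      models: opts.models,
      max_budget: opts.maxBudget,
      duration: opts.duration,
      metadata: opts.metadata,
    }),
  });
  if (!response.ok) {
    throw new Error(`key generation failed: ${response.status}`);
  }
  const body = (await response.json()) as { key: string };
  return body.key;
}

export { createVirtualKey };
```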


Pattern 6: Spend Tracking with Tags

Attach metadata tags to requests for granular cost attribution. The proxy tracks spend automatically per key, user, team, and tag.

```typescript
// Tag requests for cost attribution
const completion = await client.chat.completions.create({
  model: "claude-sonnet",
  messages: [{ role: "user", content: "Summarize this document." }],
  // LiteLLM-specific: pass metadata for spend tracking
  metadata: {
    tags: ["project:search", "team:backend"],
    trace_user_id: "user-123",
  },
} as any); // metadata is a LiteLLM extension, not in OpenAI types
```

Why good: Tags enable cost attribution by project, team, or feature without changing model routing; the cost appears in the `x-litellm-response-cost` response header.

When to use: When you need cost visibility across teams, projects, or features

See: examples/keys-and-spend.md for querying spend by tag, user, and team
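Reading that cost header requires access to the raw response, which plain `fetch` gives directly. A sketch (Node 18+, global `fetch`); the `chatWithCost` helper name, the `claude-sonnet` alias, and the tag values are illustrative assumptions.

```typescript
// response-cost.ts -- sketch; assumes a proxy at http://localhost:4000
const PROXY_URL = "http://localhost:4000";

// Send a tagged chat request and surface the per-request USD cost that
// LiteLLM reports in the x-litellm-response-cost header.
async function chatWithCost(
  prompt: string,
): Promise<{ content: string; costUsd: number }> {
  const response = await fetch(`${PROXY_URL}/v1/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.LITELLM_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-sonnet",
      messages: [{ role: "user", content: prompt }],
      metadata: { tags: ["project:search"] }, // LiteLLM spend-tracking extension
    }),
  });
  const costUsd = Number(response.headers.get("x-litellm-response-cost") ?? 0);
  const body = (await response.json()) as {
    choices: { message: { content: string } }[];
  };
  return { content: body.choices[0].message.content, costUsd };
}

export { chatWithCost };
```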

</patterns>

<decision_framework>

Decision Framework

Do You Need a Proxy?

```
Do you call multiple LLM providers?
+-- YES -> LiteLLM Proxy adds value (unified API, routing, fallbacks)
+-- NO -> Do you need budgets, rate limits, or virtual keys?
    +-- YES -> LiteLLM Proxy (governance layer)
    +-- NO -> Do you need fallbacks or load balancing?
        +-- YES -> LiteLLM Proxy (reliability layer)
        +-- NO -> Use the provider SDK directly (simpler)
```

Which Routing Strategy?

```
What is your priority?
+-- Even distribution       -> simple-shuffle (default)
+-- Minimize latency        -> latency-based-routing
+-- Respect rate limits     -> usage-based-routing
+-- Minimize cost           -> cost-based-routing
+-- Handle concurrent load  -> least-busy
```

Virtual Keys vs Master Key Only

```
Do you have multiple teams or users?
+-- YES -> Virtual keys (per-team budgets, model restrictions)
|   Requires: PostgreSQL database
+-- NO -> Do you need spend tracking?
    +-- YES -> Virtual keys (even for a single user, enables spend logs)
    |   Requires: PostgreSQL database
    +-- NO -> Master key only (simplest setup, no database needed)
```

</decision_framework>


<red_flags>

RED FLAGS

High Priority Issues:

  • Missing provider prefix in `litellm_params.model` (e.g., `claude-sonnet-4-20250514` instead of `anthropic/claude-sonnet-4-20250514`) -- the proxy cannot route without the prefix
  • Hardcoding provider API keys in config.yaml instead of using `os.environ/VAR_NAME` -- security breach risk
  • Using provider model IDs as `model_name` -- couples all clients to provider naming and breaks when you switch providers
  • Master key not starting with `sk-` -- LiteLLM silently rejects it
  • Using virtual keys without a PostgreSQL `database_url` -- key generation fails

Medium Priority Issues:

  • Not setting `num_retries` in `litellm_settings` -- defaults to 0, so there are no retries on transient failures
  • Confusing `model_name` (client-facing alias) with `litellm_params.model` (provider route) -- the most common config mistake
  • Not setting `rpm`/`tpm` on deployments when using `usage-based-routing` -- the routing strategy has no data to work with
  • Missing `LITELLM_SALT_KEY` in production -- virtual key credentials are stored without encryption

Common Mistakes:

  • Passing `anthropic/claude-sonnet-4-20250514` as the `model` parameter in TypeScript client code -- use the `model_name` alias instead
  • Expecting the `metadata` field to be typed in the OpenAI SDK -- it is a LiteLLM extension and requires `as any` or `extra_body`
  • Setting fallbacks using provider model IDs instead of `model_name` aliases -- fallbacks reference model names, not provider routes
  • Forgetting that `config.yaml` changes require a proxy restart (or use the `/config/update` API endpoint)

Gotchas & Edge Cases:

  • The `os.environ/` syntax in config.yaml (no `$` prefix) is LiteLLM-specific -- not standard YAML environment variable substitution
  • `model_name` matching is exact -- `"claude-sonnet"` and `"Claude-Sonnet"` are different models
  • `default_fallbacks` do NOT apply to `ContentPolicyViolationError` or `ContextWindowExceededError` -- use the specialized fallback types for those
  • The proxy adds a network hop -- expect 5-20 ms of additional latency compared to direct provider calls
  • `rpm`/`tpm` limits in config are per-deployment, not per-model-group -- a model group with 3 deployments at `rpm: 100` each gets 300 RPM total
  • Virtual key spend tracking is eventually consistent -- the `spend` field on a key may lag a few seconds behind actual usage
  • The `/v1/` prefix on endpoints is optional -- both `http://localhost:4000/chat/completions` and `http://localhost:4000/v1/chat/completions` work
  • Streaming through the proxy works transparently -- no special configuration is needed on the proxy side
  • The LiteLLM admin UI is available at `http://localhost:4000/ui` while the proxy is running

</red_flags>
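Several of these gotchas (exact alias matching, typo'd `model_name` values) can be caught early by probing the proxy's OpenAI-compatible models endpoint, which lists the client-facing aliases the proxy will accept. A sketch using the global `fetch` (Node 18+); the `listAliases` helper name and the `LITELLM_API_KEY` variable are illustrative assumptions.

```typescript
// list-aliases.ts -- sketch; assumes a proxy at http://localhost:4000
const PROXY_URL = "http://localhost:4000";

// GET /v1/models returns the model_name aliases from config.yaml, so a
// wrong alias (or wrong casing) surfaces before any chat request fails.
async function listAliases(): Promise<string[]> {
  const response = await fetch(`${PROXY_URL}/v1/models`, {
    headers: { Authorization: `Bearer ${process.env.LITELLM_API_KEY}` },
  });
  const body = (await response.json()) as { data: { id: string }[] };
  return body.data.map((m) => m.id);
}

export { listAliases };
```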


<critical_reminders>

CRITICAL REMINDERS

All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, `import type`, named constants).

(You MUST use the `provider/model-name` format in `litellm_params.model` -- e.g., `anthropic/claude-sonnet-4-20250514`, `openai/gpt-4o`, `azure/my-deployment` -- the provider prefix is how LiteLLM routes to the correct API.)

(You MUST set `model_name` as the user-facing alias that clients request -- this is NOT the provider model ID; it is the name your TypeScript client passes as `model`.)

(You MUST point the OpenAI SDK `baseURL` at the proxy URL (e.g., `http://localhost:4000`) and pass the proxy key as `apiKey` -- do NOT use provider API keys directly in client code.)

(You MUST start master keys with `sk-` -- LiteLLM rejects master keys that do not follow this prefix convention.)

(You MUST configure `database_url` pointing to PostgreSQL before using virtual keys, spend tracking, or team/user management -- these features require persistent storage.)

Failure to follow these rules will produce misconfigured proxies with broken routing, security issues, or missing spend data.

</critical_reminders>