```bash
git clone https://github.com/ComeOnOliver/skillshub

T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/skills/TerminalSkills/skills/litellm" ~/.claude/skills/comeonoliver-skillshub-litellm \
  && rm -rf "$T"
```
skills/TerminalSkills/skills/litellm/SKILL.md

LiteLLM
Overview
LiteLLM provides a single API to call 100+ LLM providers — OpenAI, Anthropic, Google Gemini, Mistral, Cohere, Azure, Bedrock, Ollama, and more. Write your code once using the OpenAI SDK format, then switch providers by changing a model string. As a proxy server, it adds load balancing, fallbacks, rate limiting, spend tracking, and API key management for teams.
When to Use
- Using multiple LLM providers and want a unified interface
- Need automatic fallbacks (if Claude is down, use GPT)
- Cost tracking across multiple providers and teams
- Load balancing requests across multiple API keys or models
- Self-hosted proxy to manage LLM access for a team
Instructions
Setup
```bash
pip install litellm

# Or run as proxy server
pip install 'litellm[proxy]'
```
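Provider credentials can also come from environment variables instead of being passed explicitly; a minimal sketch (the key values are placeholders):

```python
import os

# LiteLLM reads provider keys from the standard environment variables
os.environ["OPENAI_API_KEY"] = "sk-..."         # placeholder
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."  # placeholder
```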
SDK Usage (Python)
```python
# llm.py — Call any LLM with the same interface
from litellm import completion

# OpenAI
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Anthropic — same interface, just change the model string
response = completion(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Google Gemini
response = completion(
    model="gemini/gemini-2.0-flash",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Local Ollama
response = completion(
    model="ollama/llama3",
    messages=[{"role": "user", "content": "Hello!"}],
    api_base="http://localhost:11434",
)

# All return the same response format (OpenAI-compatible)
print(response.choices[0].message.content)
```
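Streaming uses the same call; a minimal sketch, assuming the OpenAI-style delta chunk format noted in the guidelines below:

```python
# Same interface, just add stream=True; chunks follow the OpenAI delta format
from litellm import completion

stream = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about proxies."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")
```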
Proxy Server
```yaml
# litellm_config.yaml — Proxy configuration
model_list:
  - model_name: "fast"
    litellm_params:
      model: gpt-4o-mini
      api_key: sk-...
  - model_name: "smart"
    litellm_params:
      model: claude-sonnet-4-20250514
      api_key: sk-ant-...
  - model_name: "smart"  # Second "smart" model = load balancing
    litellm_params:
      model: gpt-4o
      api_key: sk-...
  - model_name: "cheap"
    litellm_params:
      model: gemini/gemini-2.0-flash
      api_key: AIza...

router_settings:
  routing_strategy: "latency-based-routing"
  num_retries: 3
  timeout: 30
  fallbacks: [{"smart": ["fast"]}]  # If smart fails, use fast

general_settings:
  master_key: "sk-master-key-xxx"  # Admin key
```
```bash
# Start proxy
litellm --config litellm_config.yaml --port 4000

# Call via OpenAI SDK (any language!)
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-master-key-xxx" \
  -H "Content-Type: application/json" \
  -d '{"model": "smart", "messages": [{"role": "user", "content": "Hello"}]}'
```
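The same proxy can be called from Python through the official OpenAI SDK; a sketch assuming the proxy from the config above is running on localhost:4000:

```python
# Any OpenAI-compatible client can talk to the LiteLLM proxy
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000/v1",
    api_key="sk-master-key-xxx",  # the master_key from general_settings
)

response = client.chat.completions.create(
    model="smart",  # routed and load-balanced by the proxy
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```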
Node.js via Proxy
```typescript
// app.ts — Use any OpenAI SDK client with LiteLLM proxy
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4000/v1",
  apiKey: "sk-master-key-xxx",
});

// Calls route to Claude or GPT based on load balancing config
const response = await client.chat.completions.create({
  model: "smart",
  messages: [{ role: "user", content: "Explain monads simply." }],
});
```
Spend Tracking
```python
# Track costs per team/user/project
from litellm import completion

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    metadata={
        "user": "user-123",
        "team": "engineering",
        "project": "chatbot",
    },
)

# LiteLLM proxy stores costs in its database
# Query via API: GET /spend/logs?user=user-123
```
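For quick cost checks without the proxy, litellm also ships a `completion_cost` helper that prices a response from its built-in model cost map; a minimal sketch:

```python
# Per-request cost estimate on the SDK side
from litellm import completion, completion_cost

response = completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
cost_usd = completion_cost(completion_response=response)
print(f"request cost: ${cost_usd:.6f}")
```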
Examples
Example 1: Multi-provider AI application
User prompt: "My app uses Claude for reasoning and GPT-4o for function calling. Set up a unified interface."
The agent will configure LiteLLM with named model groups, route by capability, and add fallbacks between providers.
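One way this could look with litellm's client-side Router; a sketch where the group names ("reasoning", "function-calling") and keys are illustrative assumptions, not fixed names:

```python
# Named model groups routed by capability, with a cross-provider fallback
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "reasoning",  # illustrative group name
            "litellm_params": {"model": "claude-sonnet-4-20250514", "api_key": "sk-ant-..."},
        },
        {
            "model_name": "function-calling",  # illustrative group name
            "litellm_params": {"model": "gpt-4o", "api_key": "sk-..."},
        },
    ],
    fallbacks=[{"reasoning": ["function-calling"]}],  # if Claude fails, try GPT-4o
)

response = router.completion(
    model="reasoning",
    messages=[{"role": "user", "content": "Plan the refactor step by step."}],
)
```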
Example 2: Team LLM gateway with cost controls
User prompt: "Set up an LLM proxy for our team with per-user rate limits and spend tracking."
The agent will deploy the LiteLLM proxy, configure API keys per team member, set rate limits and budget caps, and enable spend logging.
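A sketch of issuing per-user keys with limits against the proxy's key-management endpoint; the field names (`max_budget`, `rpm_limit`) follow LiteLLM's `/key/generate` API but should be verified against your proxy version:

```python
# Issue a virtual key with a budget cap and rate limit for one team member
import requests

resp = requests.post(
    "http://localhost:4000/key/generate",
    headers={"Authorization": "Bearer sk-master-key-xxx"},  # admin/master key
    json={
        "max_budget": 10.0,  # USD budget cap for this key (assumed field name)
        "rpm_limit": 60,     # requests per minute (assumed field name)
        "metadata": {"user": "user-123", "team": "engineering"},
    },
    timeout=30,
)
print(resp.json()["key"])  # hand this virtual key to the user
```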
Guidelines
- Model format: `provider/model` — e.g. `anthropic/claude-sonnet-4-20250514`, `gemini/gemini-2.0-flash`
- Proxy for teams — centralize API keys, track spend, enforce rate limits
- Fallbacks for reliability — if primary model fails, route to backup
- Load balancing — multiple entries with the same `model_name` distribute traffic
- Latency-based routing — LiteLLM picks the fastest responding provider
- Spend tracking — costs calculated per-request, queryable via API (see the sketch after this list)
- OpenAI SDK compatible — any OpenAI client library works with the proxy
- Streaming works — `stream=True` works across all providers
- Environment variables — `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc. auto-detected
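Spend logs can be pulled back from the proxy; a sketch assuming the `GET /spend/logs` endpoint and `user` query parameter mentioned in the Spend Tracking section above (verify against your LiteLLM version):

```python
# Query per-user spend records from the proxy
import requests

resp = requests.get(
    "http://localhost:4000/spend/logs",
    headers={"Authorization": "Bearer sk-master-key-xxx"},
    params={"user": "user-123"},  # parameter name as shown in the Spend Tracking section
    timeout=30,
)
for entry in resp.json():
    print(entry)
```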