install
source · Clone the upstream repo
git clone https://github.com/MacPhobos/research-mind
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/MacPhobos/research-mind "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/toolchains-ai-services-openrouter" ~/.claude/skills/macphobos-research-mind-toolchains-ai-services-openrouter && rm -rf "$T"
manifest:
.claude/skills/toolchains-ai-services-openrouter/skill.md
OpenRouter - Unified AI API Gateway
Overview
OpenRouter provides a single API to access 200+ language models from OpenAI, Anthropic, Google, Meta, Mistral, and more. It offers intelligent routing, streaming, cost optimization, and a standardized, OpenAI-compatible interface.
Key Features:
- Access 200+ models through one API
- OpenAI-compatible interface (drop-in replacement)
- Intelligent model routing and fallbacks
- Real-time streaming responses
- Cost tracking and optimization
- Model performance analytics
- Function calling support
- Vision model support
Pricing Model:
- Pay-per-token (no subscriptions)
- Volume discounts available
- Free tier with credits
- Per-model pricing varies
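As a rough illustration of pay-per-token billing, here is a hedged sketch using the Claude 3 Haiku prices quoted later in this document ($0.25 input / $1.25 output per 1M tokens); the token counts are made up and actual prices vary by model and may change:

// Illustration only: pay-per-token cost for a single request.
// Prices are the Claude 3 Haiku figures quoted later in this document.
const inputTokens = 1_200;
const outputTokens = 800;
const inputPricePerM = 0.25;   // USD per 1M input tokens
const outputPricePerM = 1.25;  // USD per 1M output tokens

const cost =
  (inputTokens / 1_000_000) * inputPricePerM +
  (outputTokens / 1_000_000) * outputPricePerM;

console.log(`Estimated request cost: $${cost.toFixed(6)}`); // ~$0.0013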
Installation:
npm install openai   # Use OpenAI SDK
# or
pip install openai   # Python
Quick Start
1. Get API Key
# Sign up at https://openrouter.ai/keys
export OPENROUTER_API_KEY="sk-or-v1-..."
2. Basic Chat Completion
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
  defaultHeaders: {
    'HTTP-Referer': 'https://your-app.com', // Optional
    'X-Title': 'Your App Name',             // Optional
  },
});

async function chat() {
  const completion = await client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'user', content: 'Explain quantum computing in simple terms' },
    ],
  });

  console.log(completion.choices[0].message.content);
}
3. Streaming Response
async function streamChat() {
  const stream = await client.chat.completions.create({
    model: 'openai/gpt-4-turbo',
    messages: [
      { role: 'user', content: 'Write a short story about AI' },
    ],
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
  }
}
Model Selection Strategy
Available Model Categories
Flagship Models (Highest Quality):
const flagshipModels = {
  claude: 'anthropic/claude-3.5-sonnet', // Best reasoning
  gpt4: 'openai/gpt-4-turbo',            // Best general purpose
  gemini: 'google/gemini-pro-1.5',       // Best long context
  opus: 'anthropic/claude-3-opus',       // Best complex tasks
};
Fast Models (Low Latency):
const fastModels = {
  claude: 'anthropic/claude-3-haiku',        // Fastest Claude
  gpt35: 'openai/gpt-3.5-turbo',             // Fast GPT
  gemini: 'google/gemini-flash-1.5',         // Fast Gemini
  llama: 'meta-llama/llama-3.1-8b-instruct', // Fast open source
};
Cost-Optimized Models:
const budgetModels = {
  haiku: 'anthropic/claude-3-haiku',          // $0.25/$1.25 per 1M tokens
  gemini: 'google/gemini-flash-1.5',          // $0.075/$0.30 per 1M tokens
  llama: 'meta-llama/llama-3.1-8b-instruct',  // $0.06/$0.06 per 1M tokens
  mixtral: 'mistralai/mixtral-8x7b-instruct', // $0.24/$0.24 per 1M tokens
};
Specialized Models:
const specializedModels = {
  vision: 'openai/gpt-4-vision-preview', // Image understanding
  code: 'anthropic/claude-3.5-sonnet',   // Code generation
  longContext: 'google/gemini-pro-1.5',  // 2M token context
  function: 'openai/gpt-4-turbo',        // Function calling
};
Model Selection Logic
interface ModelSelector {
  task: 'chat' | 'code' | 'vision' | 'function' | 'summary';
  priority: 'quality' | 'speed' | 'cost';
  maxCost?: number;     // Max cost per 1M tokens
  contextSize?: number;
}

function selectModel(criteria: ModelSelector): string {
  if (criteria.task === 'vision') {
    return 'openai/gpt-4-vision-preview';
  }

  if (criteria.task === 'code') {
    return criteria.priority === 'quality'
      ? 'anthropic/claude-3.5-sonnet'
      : 'meta-llama/llama-3.1-70b-instruct';
  }

  if (criteria.contextSize && criteria.contextSize > 100000) {
    return 'google/gemini-pro-1.5'; // 2M context
  }

  // Default selection by priority
  switch (criteria.priority) {
    case 'quality':
      return 'anthropic/claude-3.5-sonnet';
    case 'speed':
      return 'anthropic/claude-3-haiku';
    case 'cost':
      return criteria.maxCost && criteria.maxCost < 0.5
        ? 'google/gemini-flash-1.5'
        : 'anthropic/claude-3-haiku';
    default:
      return 'openai/gpt-4-turbo';
  }
}

// Usage
const model = selectModel({
  task: 'code',
  priority: 'quality',
});
Streaming Implementation
TypeScript Streaming with Error Handling
async function robustStreamingChat(
  prompt: string,
  model: string = 'anthropic/claude-3.5-sonnet'
) {
  try {
    const stream = await client.chat.completions.create({
      model,
      messages: [{ role: 'user', content: prompt }],
      stream: true,
      max_tokens: 4000,
    });

    let fullResponse = '';

    for await (const chunk of stream) {
      const delta = chunk.choices[0]?.delta;

      if (delta?.content) {
        fullResponse += delta.content;
        process.stdout.write(delta.content);
      }

      // Handle function calls
      if (delta?.function_call) {
        console.log('\nFunction call:', delta.function_call);
      }

      // Check for finish reason
      if (chunk.choices[0]?.finish_reason) {
        console.log(`\n[Finished: ${chunk.choices[0].finish_reason}]`);
      }
    }

    return fullResponse;
  } catch (error) {
    if (error instanceof Error) {
      console.error('Streaming error:', error.message);
    }
    throw error;
  }
}
Python Streaming
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ.get("OPENROUTER_API_KEY"),
)

def stream_chat(prompt: str, model: str = "anthropic/claude-3.5-sonnet"):
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )

    full_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            full_response += content
            print(content, end="", flush=True)

    print()  # New line
    return full_response
React Streaming Component
import { useState } from 'react';

function StreamingChat() {
  const [response, setResponse] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  async function handleSubmit(prompt: string) {
    setIsStreaming(true);
    setResponse('');

    try {
      // NOTE: calling OpenRouter directly from the browser exposes the API key.
      // In production, route this request through a server-side proxy (see Common Pitfalls).
      const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'anthropic/claude-3.5-sonnet',
          messages: [{ role: 'user', content: prompt }],
          stream: true,
        }),
      });

      const reader = res.body?.getReader();
      const decoder = new TextDecoder();

      while (true) {
        const { done, value } = await reader!.read();
        if (done) break;

        const chunk = decoder.decode(value);
        const lines = chunk.split('\n').filter(line => line.trim());

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = line.slice(6);
            if (data === '[DONE]') continue;

            try {
              const parsed = JSON.parse(data);
              const content = parsed.choices[0]?.delta?.content || '';
              setResponse(prev => prev + content);
            } catch (e) {
              // Skip invalid JSON
            }
          }
        }
      }
    } catch (error) {
      console.error('Streaming error:', error);
    } finally {
      setIsStreaming(false);
    }
  }

  return (
    <div>
      <textarea
        value={response}
        readOnly
        rows={20}
        cols={80}
        placeholder="Response will appear here..."
      />
      <button onClick={() => handleSubmit('Explain AI')}>
        {isStreaming ? 'Streaming...' : 'Send'}
      </button>
    </div>
  );
}
Function Calling
Basic Function Calling
const tools = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get current weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: {
            type: 'string',
            description: 'City name, e.g. San Francisco',
          },
          unit: {
            type: 'string',
            enum: ['celsius', 'fahrenheit'],
          },
        },
        required: ['location'],
      },
    },
  },
];

async function chatWithFunctions() {
  const completion = await client.chat.completions.create({
    model: 'openai/gpt-4-turbo',
    messages: [
      { role: 'user', content: 'What is the weather in Tokyo?' },
    ],
    tools,
    tool_choice: 'auto',
  });

  const message = completion.choices[0].message;

  if (message.tool_calls) {
    for (const toolCall of message.tool_calls) {
      console.log('Function:', toolCall.function.name);
      console.log('Arguments:', toolCall.function.arguments);

      // Execute function
      const args = JSON.parse(toolCall.function.arguments);
      const result = await getWeather(args.location, args.unit);

      // Send result back
      const followUp = await client.chat.completions.create({
        model: 'openai/gpt-4-turbo',
        messages: [
          { role: 'user', content: 'What is the weather in Tokyo?' },
          message,
          {
            role: 'tool',
            tool_call_id: toolCall.id,
            content: JSON.stringify(result),
          },
        ],
        tools,
      });

      console.log(followUp.choices[0].message.content);
    }
  }
}
Multi-Step Function Calling
async function multiStepFunctionCall(userQuery: string) {
  const messages = [{ role: 'user', content: userQuery }];
  let iterationCount = 0;
  const maxIterations = 5;

  while (iterationCount < maxIterations) {
    const completion = await client.chat.completions.create({
      model: 'openai/gpt-4-turbo',
      messages,
      tools,
      tool_choice: 'auto',
    });

    const message = completion.choices[0].message;
    messages.push(message);

    if (!message.tool_calls) {
      // No more function calls, return final response
      return message.content;
    }

    // Execute all function calls
    for (const toolCall of message.tool_calls) {
      const functionName = toolCall.function.name;
      const args = JSON.parse(toolCall.function.arguments);

      // Execute function (implement your function registry)
      const result = await executeFunctionCall(functionName, args);

      messages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(result),
      });
    }

    iterationCount++;
  }

  throw new Error('Max iterations reached');
}
Cost Optimization
Token Counting and Cost Estimation
import { encoding_for_model } from 'tiktoken';

interface CostEstimate {
  promptTokens: number;
  completionTokens: number;
  promptCost: number;
  completionCost: number;
  totalCost: number;
}

const modelPricing: Record<string, { input: number; output: number }> = {
  'anthropic/claude-3.5-sonnet': { input: 3.00, output: 15.00 }, // per 1M tokens
  'anthropic/claude-3-haiku': { input: 0.25, output: 1.25 },
  'openai/gpt-4-turbo': { input: 10.00, output: 30.00 },
  'openai/gpt-3.5-turbo': { input: 0.50, output: 1.50 },
  'google/gemini-flash-1.5': { input: 0.075, output: 0.30 },
};

function estimateCost(
  prompt: string,
  expectedCompletion: number,
  model: string
): CostEstimate {
  const encoder = encoding_for_model('gpt-4'); // Approximation
  const promptTokens = encoder.encode(prompt).length;
  const completionTokens = expectedCompletion;

  const pricing = modelPricing[model] || { input: 0, output: 0 };
  const promptCost = (promptTokens / 1_000_000) * pricing.input;
  const completionCost = (completionTokens / 1_000_000) * pricing.output;

  return {
    promptTokens,
    completionTokens,
    promptCost,
    completionCost,
    totalCost: promptCost + completionCost,
  };
}

// Usage
const estimate = estimateCost(
  'Explain quantum computing',
  500, // Expected response tokens
  'anthropic/claude-3.5-sonnet'
);
console.log(`Estimated cost: $${estimate.totalCost.toFixed(4)}`);
Dynamic Model Selection by Budget
async function budgetOptimizedChat(
  prompt: string,
  maxCostPerRequest: number = 0.01 // $0.01 max
) {
  // Estimate with expensive model
  const expensiveEstimate = estimateCost(
    prompt,
    1000,
    'anthropic/claude-3.5-sonnet'
  );

  let selectedModel = 'anthropic/claude-3.5-sonnet';

  if (expensiveEstimate.totalCost > maxCostPerRequest) {
    // Try cheaper models
    const cheapEstimate = estimateCost(
      prompt,
      1000,
      'anthropic/claude-3-haiku'
    );

    if (cheapEstimate.totalCost > maxCostPerRequest) {
      selectedModel = 'google/gemini-flash-1.5';
    } else {
      selectedModel = 'anthropic/claude-3-haiku';
    }
  }

  console.log(`Selected model: ${selectedModel}`);

  const completion = await client.chat.completions.create({
    model: selectedModel,
    messages: [{ role: 'user', content: prompt }],
  });

  return completion.choices[0].message.content;
}
Batching for Cost Reduction
async function batchProcess(prompts: string[], model: string) {
  // Process multiple prompts in parallel with rate limiting
  const concurrency = 5;
  const results = [];

  for (let i = 0; i < prompts.length; i += concurrency) {
    const batch = prompts.slice(i, i + concurrency);

    const batchResults = await Promise.all(
      batch.map(prompt =>
        client.chat.completions.create({
          model,
          messages: [{ role: 'user', content: prompt }],
          max_tokens: 500, // Limit tokens to control cost
        })
      )
    );

    results.push(...batchResults);

    // Rate limiting delay
    if (i + concurrency < prompts.length) {
      await new Promise(resolve => setTimeout(resolve, 1000));
    }
  }

  return results;
}
Model Fallback and Retry Strategy
Automatic Fallback
const modelFallbackChain = [
  'anthropic/claude-3.5-sonnet',
  'openai/gpt-4-turbo',
  'anthropic/claude-3-haiku',
  'google/gemini-flash-1.5',
];

async function chatWithFallback(prompt: string): Promise<string> {
  for (const model of modelFallbackChain) {
    try {
      console.log(`Trying model: ${model}`);

      const completion = await client.chat.completions.create({
        model,
        messages: [{ role: 'user', content: prompt }],
        max_tokens: 2000,
      });

      return completion.choices[0].message.content || '';
    } catch (error) {
      console.warn(`Model ${model} failed:`, error);

      // Continue to next model
      if (model === modelFallbackChain[modelFallbackChain.length - 1]) {
        throw new Error('All models failed');
      }
    }
  }

  throw new Error('No models available');
}
Exponential Backoff for Rate Limits
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries: number = 5
): Promise<T> {
  let lastError: Error;

  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error: any) {
      lastError = error as Error;

      // Check if rate limit error
      if (error.status === 429) {
        const delay = Math.pow(2, i) * 1000; // Exponential backoff
        console.log(`Rate limited. Retrying in ${delay}ms...`);
        await new Promise(resolve => setTimeout(resolve, delay));
      } else {
        throw error; // Non-retryable error
      }
    }
  }

  throw lastError!;
}

// Usage
const result = await retryWithBackoff(() =>
  client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: 'Hello' }],
  })
);
Prompt Engineering Best Practices
System Prompts for Consistency
const systemPrompts = {
  concise: 'You are a helpful assistant. Be concise and direct.',
  detailed: 'You are a knowledgeable expert. Provide comprehensive answers with examples.',
  code: 'You are an expert programmer. Provide clean, well-commented code with explanations.',
  creative: 'You are a creative writing assistant. Be imaginative and engaging.',
};

async function chatWithPersonality(
  prompt: string,
  personality: keyof typeof systemPrompts
) {
  const completion = await client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'system', content: systemPrompts[personality] },
      { role: 'user', content: prompt },
    ],
  });

  return completion.choices[0].message.content;
}
Few-Shot Prompting
async function fewShotClassification(text: string) {
  const completion = await client.chat.completions.create({
    model: 'openai/gpt-4-turbo',
    messages: [
      {
        role: 'system',
        content: 'Classify text sentiment as positive, negative, or neutral.',
      },
      { role: 'user', content: 'I love this product!' },
      { role: 'assistant', content: 'positive' },
      { role: 'user', content: 'This is terrible.' },
      { role: 'assistant', content: 'negative' },
      { role: 'user', content: 'It works fine.' },
      { role: 'assistant', content: 'neutral' },
      { role: 'user', content: text },
    ],
  });

  return completion.choices[0].message.content;
}
Chain of Thought Prompting
async function reasoningTask(problem: string) {
  const completion = await client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      {
        role: 'user',
        content: `${problem}\n\nLet's solve this step by step:\n1.`,
      },
    ],
    max_tokens: 3000,
  });

  return completion.choices[0].message.content;
}
Rate Limits and Throttling
Rate Limit Handler
class RateLimitedClient {
  private requestQueue: Array<() => Promise<any>> = [];
  private processing = false;
  private requestsPerMinute = 60;
  private requestInterval = 60000 / this.requestsPerMinute;

  async enqueue<T>(request: () => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      this.requestQueue.push(async () => {
        try {
          const result = await request();
          resolve(result);
        } catch (error) {
          reject(error);
        }
      });

      this.processQueue();
    });
  }

  private async processQueue() {
    if (this.processing || this.requestQueue.length === 0) return;

    this.processing = true;

    while (this.requestQueue.length > 0) {
      const request = this.requestQueue.shift()!;
      await request();
      await new Promise(resolve => setTimeout(resolve, this.requestInterval));
    }

    this.processing = false;
  }
}

// Usage
const rateLimitedClient = new RateLimitedClient();

const result = await rateLimitedClient.enqueue(() =>
  client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: 'Hello' }],
  })
);
Vision Models
Image Understanding
async function analyzeImage(imageUrl: string, question: string) {
  const completion = await client.chat.completions.create({
    model: 'openai/gpt-4-vision-preview',
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: question },
          { type: 'image_url', image_url: { url: imageUrl } },
        ],
      },
    ],
    max_tokens: 1000,
  });

  return completion.choices[0].message.content;
}

// Usage
const result = await analyzeImage(
  'https://example.com/image.jpg',
  'What objects are in this image?'
);
Multi-Image Analysis
async function compareImages(imageUrls: string[]) {
  const completion = await client.chat.completions.create({
    model: 'openai/gpt-4-vision-preview',
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Compare these images and describe the differences:' },
          ...imageUrls.map(url => ({
            type: 'image_url' as const,
            image_url: { url },
          })),
        ],
      },
    ],
  });

  return completion.choices[0].message.content;
}
Error Handling and Monitoring
Comprehensive Error Handler
interface ErrorResponse {
  error: {
    message: string;
    type: string;
    code: string;
  };
}

async function robustCompletion(prompt: string) {
  try {
    const completion = await client.chat.completions.create({
      model: 'anthropic/claude-3.5-sonnet',
      messages: [{ role: 'user', content: prompt }],
    });

    return completion.choices[0].message.content;
  } catch (error: any) {
    // Rate limit errors
    if (error.status === 429) {
      console.error('Rate limit exceeded. Please wait.');
      throw new Error('RATE_LIMIT_EXCEEDED');
    }

    // Invalid API key
    if (error.status === 401) {
      console.error('Invalid API key');
      throw new Error('INVALID_API_KEY');
    }

    // Model not found
    if (error.status === 404) {
      console.error('Model not found');
      throw new Error('MODEL_NOT_FOUND');
    }

    // Server errors
    if (error.status >= 500) {
      console.error('OpenRouter server error');
      throw new Error('SERVER_ERROR');
    }

    // Unknown error
    console.error('Unknown error:', error);
    throw error;
  }
}
Request/Response Logging
class LoggingClient {
  async chat(prompt: string, model: string) {
    const startTime = Date.now();

    console.log('[Request]', {
      timestamp: new Date().toISOString(),
      model,
      promptLength: prompt.length,
    });

    try {
      const completion = await client.chat.completions.create({
        model,
        messages: [{ role: 'user', content: prompt }],
      });

      const duration = Date.now() - startTime;

      console.log('[Response]', {
        timestamp: new Date().toISOString(),
        duration,
        usage: completion.usage,
        finishReason: completion.choices[0].finish_reason,
      });

      return completion;
    } catch (error) {
      console.error('[Error]', {
        timestamp: new Date().toISOString(),
        duration: Date.now() - startTime,
        error,
      });

      throw error;
    }
  }
}
Best Practices
- Model Selection:
  - Use fast models (Haiku, Flash) for simple tasks
  - Use flagship models (Sonnet, GPT-4) for complex reasoning
  - Consider context size requirements
  - Test multiple models for your use case
- Cost Optimization:
  - Estimate costs before requests
  - Use cheaper models when possible
  - Implement token limits
  - Cache common responses
  - Batch similar requests
- Streaming:
  - Always use streaming for user-facing apps
  - Handle connection interruptions
  - Show progress indicators
  - Buffer partial responses
- Error Handling:
  - Implement retry logic with exponential backoff
  - Use model fallbacks for reliability
  - Log all errors for debugging
  - Handle rate limits gracefully
- Prompt Engineering:
  - Use system prompts for consistency
  - Implement few-shot learning for specific tasks
  - Use chain-of-thought for complex reasoning
  - Keep prompts concise to reduce costs
- Rate Limiting:
  - Respect API rate limits
  - Implement request queuing
  - Use exponential backoff
  - Monitor usage metrics
- Security:
  - Never expose API keys in client code
  - Use environment variables
  - Implement server-side proxies
  - Validate user inputs
- Monitoring:
  - Track token usage (see the usage-tracking sketch after this list)
  - Monitor response times
  - Log errors and failures
  - Analyze model performance
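A minimal sketch of the usage tracking mentioned under Monitoring, assuming the configured `client` from the Quick Start and the standard OpenAI-compatible `usage` field (`prompt_tokens`, `completion_tokens`) already shown in the LoggingClient example above:

// Minimal usage tracker: accumulate token counts per model across requests.
const usageByModel = new Map<string, { prompt: number; completion: number; requests: number }>();

async function trackedChat(prompt: string, model: string) {
  const completion = await client.chat.completions.create({
    model,
    messages: [{ role: 'user', content: prompt }],
  });

  const usage = completion.usage; // { prompt_tokens, completion_tokens, total_tokens }
  if (usage) {
    const entry = usageByModel.get(model) ?? { prompt: 0, completion: 0, requests: 0 };
    entry.prompt += usage.prompt_tokens;
    entry.completion += usage.completion_tokens;
    entry.requests += 1;
    usageByModel.set(model, entry);
  }

  return completion.choices[0].message.content;
}

// Later: inspect totals to spot expensive models or runaway prompts
console.log(Object.fromEntries(usageByModel));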
Common Pitfalls
❌ Exposing API keys in frontend:
// WRONG - API key exposed
const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: 'sk-or-v1-...', // Exposed!
});
✅ Correct - Server-side proxy:
// Backend proxy
app.post('/api/chat', async (req, res) => {
  const { prompt } = req.body;

  const completion = await client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: prompt }],
  });

  res.json(completion);
});
❌ Not handling streaming errors:
// WRONG - no error handling
for await (const chunk of stream) {
  console.log(chunk.choices[0].delta.content);
}
✅ Correct - with error handling:
try {
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
  }
} catch (error) {
  console.error('Stream error:', error);
  // Implement retry or fallback
}
❌ Ignoring rate limits:
// WRONG - no rate limiting
const promises = prompts.map(prompt => chat(prompt));
await Promise.all(promises); // May hit rate limits
✅ Correct - with rate limiting:
const results = [];

for (let i = 0; i < prompts.length; i += 5) {
  const batch = prompts.slice(i, i + 5);
  const batchResults = await Promise.all(batch.map(chat));
  results.push(...batchResults);

  await new Promise(r => setTimeout(r, 1000)); // Delay between batches
}
Performance Optimization
Caching Responses
const responseCache = new Map<string, string>();

async function cachedChat(prompt: string, model: string) {
  const cacheKey = `${model}:${prompt}`;

  if (responseCache.has(cacheKey)) {
    console.log('Cache hit');
    return responseCache.get(cacheKey)!;
  }

  const completion = await client.chat.completions.create({
    model,
    messages: [{ role: 'user', content: prompt }],
  });

  const response = completion.choices[0].message.content || '';
  responseCache.set(cacheKey, response);

  return response;
}
Parallel Processing
async function parallelChat(prompts: string[], model: string) {
  const results = await Promise.all(
    prompts.map(prompt =>
      client.chat.completions.create({
        model,
        messages: [{ role: 'user', content: prompt }],
      })
    )
  );

  return results.map(r => r.choices[0].message.content);
}
Resources
- Documentation: https://openrouter.ai/docs
- API Reference: https://openrouter.ai/docs/api-reference
- Model List: https://openrouter.ai/models
- Pricing: https://openrouter.ai/docs/pricing
- Status Page: https://status.openrouter.ai
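The model catalog can also be queried programmatically. A hedged sketch, assuming OpenRouter's OpenAI-compatible model listing endpoint at GET https://openrouter.ai/api/v1/models and a JSON response with a `data` array; the exact auth requirements and response shape should be checked against the API reference above:

// List available model IDs (response shape assumed; verify against the API reference).
async function listModels() {
  const res = await fetch('https://openrouter.ai/api/v1/models', {
    headers: { 'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}` },
  });

  if (!res.ok) {
    throw new Error(`Failed to list models: ${res.status}`);
  }

  const body = await res.json();
  return body.data.map((m: any) => m.id);
}

// Usage
const modelIds = await listModels();
console.log(`Available models: ${modelIds.length}`);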
Related Skills
- MCP Servers: Integration with Model Context Protocol (when built)
- TypeScript API Integration: Type-safe OpenRouter clients
- Python API Integration: Python SDK usage patterns
Summary
- OpenRouter provides unified access to 200+ LLMs
- OpenAI-compatible API for easy migration
- Cost optimization through model selection and token management
- Streaming for responsive user experiences
- Function calling for tool integration
- Vision models for image understanding
- Fallback strategies for reliability
- Rate limiting and error handling are essential
- Perfect for multi-model apps, cost-sensitive deployments, and avoiding vendor lock-in