Paperresearchagent nvidia-api
NVIDIA API documentation for integrating NVIDIA services. Use for NVIDIA NIM (NVIDIA Inference Microservices), LLM APIs, visual models, multimodal APIs, retrieval APIs, healthcare APIs, and CUDA-X microservices integration.
git clone https://github.com/rish2jain/paperresearchagent
T=$(mktemp -d) && git clone --depth=1 https://github.com/rish2jain/paperresearchagent "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/nvidia-api" ~/.claude/skills/rish2jain-paperresearchagent-nvidia-api-f56451 && rm -rf "$T"
.claude/skills/nvidia-api/SKILL.md
Nvidia-Api Skill
Comprehensive assistance with NVIDIA API development, focusing on NIM (NVIDIA Inference Microservices) and cloud-hosted AI endpoints for building prototype and production applications.
When to Use This Skill
Trigger this skill for the following use cases and scenarios:
Primary Use Cases
- LLM Integration: Working with Large Language Models via NVIDIA NIM API (Llama, Mistral, Gemma, etc.)
- Chat Completions: Implementing chat interfaces, chatbots, or conversational AI using NVIDIA-hosted models
- Code Generation: Using code-specialized models (CodeLlama, StarCoder, Codestral, Granite)
- Multimodal AI: Integrating visual design, image understanding, or vision-language models
- Retrieval Systems: Building RAG (Retrieval-Augmented Generation) applications with NVIDIA retrieval APIs
- Healthcare AI: Implementing medical AI solutions with NVIDIA healthcare-specific APIs
- Weather & Simulation: Working with Earth-2 weather prediction APIs
Technical Scenarios
- Setting up authentication with NVIDIA API keys
- Migrating from OpenAI API to NVIDIA NIM (OpenAI-compatible endpoints)
- Choosing between different LLM models for specific tasks
- Implementing streaming responses for chat applications
- Building production AI applications with NVIDIA cloud endpoints
- Prototyping AI features before self-hosting NIMs
Quick Reference
Authentication Setup
Get your API key: Visit the NVIDIA API Catalog.

    # Set environment variable
    export NVIDIA_API_KEY="nvapi-your-key-here"

    # Python - Using environment variable
    import os
    api_key = os.environ.get("NVIDIA_API_KEY")
Basic Chat Completion (Python)
    import os
    import requests

    api_key = os.environ["NVIDIA_API_KEY"]  # set as shown in Authentication Setup

    url = "https://integrate.api.nvidia.com/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": "meta/llama3-70b-instruct",
        "messages": [
            {"role": "user", "content": "Explain quantum computing in simple terms"}
        ],
        "max_tokens": 150,
        "temperature": 0.7,
    }

    response = requests.post(url, json=payload, headers=headers)
    response.raise_for_status()  # fail on 4xx/5xx instead of a KeyError below
    result = response.json()
    print(result["choices"][0]["message"]["content"])
Streaming Chat Response
    import json
    import os
    import requests

    api_key = os.environ["NVIDIA_API_KEY"]

    url = "https://integrate.api.nvidia.com/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": "mistralai/mixtral-8x7b-instruct",
        "messages": [{"role": "user", "content": "Write a short story"}],
        "stream": True,
    }

    response = requests.post(url, json=payload, headers=headers, stream=True)
    response.raise_for_status()

    # Each event arrives as a line of the form "data: {json}"; "[DONE]" ends the stream
    for line in response.iter_lines():
        if line:
            line = line.decode("utf-8")
            if line.startswith("data: "):
                data = line[6:]
                if data != "[DONE]":
                    chunk = json.loads(data)
                    content = chunk["choices"][0]["delta"].get("content", "")
                    print(content, end="", flush=True)
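The parsing step of the streaming loop can be pulled out of the network call so it can be exercised offline. The following is a minimal sketch, assuming the stream uses the `data: ` prefix and `[DONE]` sentinel shown in the example above; `iter_content_deltas` is a hypothetical helper name, not part of any NVIDIA or OpenAI library.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from the event lines of a streamed chat completion."""
    for raw in lines:
        if not raw:
            continue  # blank separator lines carry no data
        line = raw.decode("utf-8")
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# Hand-written sample lines standing in for response.iter_lines()
sample_lines = [
    b'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    b'data: {"choices": [{"delta": {"content": "lo"}}]}',
    b"data: [DONE]",
]
streamed_text = "".join(iter_content_deltas(sample_lines))
print(streamed_text)
```

With `requests`, the same function would be fed `response.iter_lines()` from a request made with `stream=True`.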
OpenAI-Compatible Integration
    import os
    from openai import OpenAI

    api_key = os.environ["NVIDIA_API_KEY"]

    # Drop-in replacement for OpenAI client
    client = OpenAI(
        base_url="https://integrate.api.nvidia.com/v1",
        api_key=api_key,
    )

    completion = client.chat.completions.create(
        model="meta/llama3-70b-instruct",
        messages=[{"role": "user", "content": "Hello!"}],
        temperature=0.5,
        max_tokens=100,
    )
    print(completion.choices[0].message.content)
Code Generation Example
    import requests

    # Using code-specialized models
    # url and headers are defined as in the Basic Chat Completion example
    payload = {
        "model": "bigcode/starcoder2-15b",
        "messages": [
            {"role": "system", "content": "You are an expert programmer."},
            {"role": "user", "content": "Write a Python function to calculate Fibonacci numbers"},
        ],
        "temperature": 0.2,  # Lower temperature for code generation
        "max_tokens": 500,
    }

    response = requests.post(url, json=payload, headers=headers)
    response.raise_for_status()
    code = response.json()["choices"][0]["message"]["content"]
Multi-Turn Conversation
    import requests

    # url and headers are defined as in the Basic Chat Completion example
    model = "meta/llama3-70b-instruct"
    conversation = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is machine learning?"},
    ]

    # First response
    response1 = requests.post(url, json={"model": model, "messages": conversation}, headers=headers)
    response1.raise_for_status()
    assistant_reply = response1.json()["choices"][0]["message"]["content"]

    # Continue conversation: append the reply and the next user turn, then resend the full history
    conversation.append({"role": "assistant", "content": assistant_reply})
    conversation.append({"role": "user", "content": "Can you give me an example?"})
    response2 = requests.post(url, json={"model": model, "messages": conversation}, headers=headers)
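The example above resends the full `conversation` list on every request, so the list grows with each turn. One option for long sessions is to trim older messages before sending. This is a sketch only: `trim_history` and its `max_messages` parameter are hypothetical names, and the trimming policy (keep all system messages plus the most recent others) is an assumption, not something the NVIDIA documentation prescribes.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(messages: List[Message], max_messages: int) -> List[Message]:
    """Keep every system message plus the most recent `max_messages` other messages."""
    if max_messages < 0:
        raise ValueError("max_messages must be non-negative")
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # rest[-0:] would return the whole list, so handle zero explicitly
    recent = rest[-max_messages:] if max_messages > 0 else []
    return system + recent


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is machine learning?"},
    {"role": "assistant", "content": "It is a way for programs to learn from data."},
    {"role": "user", "content": "Can you give me an example?"},
]
trimmed = trim_history(history, max_messages=2)
print([m["role"] for m in trimmed])
```

The trimmed list would be passed as `messages` in the request payload in place of the full history.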
JavaScript/Node.js Example
    const axios = require('axios');

    const url = 'https://integrate.api.nvidia.com/v1/chat/completions';
    const apiKey = process.env.NVIDIA_API_KEY;

    async function chatCompletion() {
      const response = await axios.post(
        url,
        {
          model: "mistralai/mistral-7b-instruct",
          messages: [
            { role: "user", content: "What are the benefits of renewable energy?" }
          ],
          max_tokens: 200
        },
        {
          headers: {
            'Authorization': `Bearer ${apiKey}`,
            'Content-Type': 'application/json'
          }
        }
      );
      console.log(response.data.choices[0].message.content);
    }

    chatCompletion();
cURL Example
    curl https://integrate.api.nvidia.com/v1/chat/completions \
      -H "Authorization: Bearer $NVIDIA_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "meta/llama3-8b-instruct",
        "messages": [
          {"role": "user", "content": "Hello, how are you?"}
        ],
        "max_tokens": 50,
        "temperature": 0.7
      }'
Error Handling
    import requests

    # url, headers, and payload are defined as in the Basic Chat Completion example
    try:
        response = requests.post(url, json=payload, headers=headers)
        response.raise_for_status()  # Raise exception for 4xx/5xx status codes
        result = response.json()
    except requests.exceptions.HTTPError as e:
        status = e.response.status_code  # the response attached to the exception
        if status == 401:
            print("Authentication failed. Check your API key.")
        elif status == 429:
            print("Rate limit exceeded. Please slow down requests.")
        else:
            print(f"HTTP Error: {e}")
    except requests.exceptions.RequestException as e:
        # Connection errors, timeouts, and other non-HTTP failures
        print(f"Request failed: {e}")
Available Model Categories
Large Language Models (LLMs)
Access to top language models for various tasks:
Meta Models
- meta/llama3-70b-instruct: High-performance instruction-following
- meta/llama3-8b-instruct: Efficient smaller model
- meta/codellama-70b: Specialized for code generation
Mistral Models
- mistralai/mixtral-8x7b-instruct: High-quality mixture-of-experts
- mistralai/mistral-large-2-instruct: Latest large model
- mistralai/codestral-22b-instruct-v0.1: Code generation specialist
Google Models
- google/gemma-2-27b-it: Instruction-tuned Gemma
- google/codegemma-7b: Code understanding and generation
- google/shieldgemma-9b: Safety and content filtering
Other Notable Models
- nvidia/llama3-chatqa-1.5-70b: Optimized for Q&A
- ibm/granite-34b-code-instruct: Enterprise code model
- deepseek-ai/deepseek-r1: Advanced reasoning model
Code Generation Models
- bigcode/starcoder2-15b: Open-source code completion
- mistralai/codestral-22b-instruct-v0.1: Instruction-following for code
- google/codegemma-7b: Google's code specialist
Other API Categories
- Visual Design - Image generation and visual models
- Multimodal - Vision-language models
- Retrieval - Embedding and retrieval APIs for RAG
- Healthcare - Medical AI specialized models
- Weather Prediction - Earth-2 climate simulation
Key Concepts
NVIDIA NIM (NVIDIA Inference Microservices)
Cloud-hosted inference endpoints that provide:
- Simple REST API access to leading AI models
- OpenAI API compatibility for easy migration
- Prototype-friendly with production capabilities
- Support for both cloud endpoints and downloadable containers
Endpoint Structure
Base URL: https://integrate.api.nvidia.com
Endpoint: POST /v1/chat/completions
API Compatibility
NVIDIA NIM APIs follow OpenAI's API specification, making it easy to:
- Migrate existing OpenAI-based applications
- Use OpenAI client libraries with minimal changes
- Maintain familiar request/response patterns
Authentication
- Uses Bearer token authentication
- API keys obtained from NVIDIA API Catalog
- Key format: nvapi-*
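The authentication points above can be combined into a small check that runs before any request is sent. This is a minimal sketch: `build_auth_headers` is a hypothetical helper, and rejecting keys that lack the `nvapi-` prefix is an assumption based only on the key format stated above.

```python
import os
from typing import Dict, Mapping, Optional


def build_auth_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read NVIDIA_API_KEY from the environment and return Bearer-token request headers."""
    env = os.environ if env is None else env
    key = env.get("NVIDIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("NVIDIA_API_KEY is not set")
    if not key.startswith("nvapi-"):
        # Assumption: every valid key follows the nvapi-* format described above
        raise ValueError("NVIDIA_API_KEY does not match the expected nvapi-* format")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


# A mapping is passed here so the example runs without a real key in the environment
example_headers = build_auth_headers({"NVIDIA_API_KEY": "nvapi-your-key-here"})
print(sorted(example_headers))
```

Calling `build_auth_headers()` with no argument reads from `os.environ`, which keeps the key out of source code.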
Response Format
Standard OpenAI-compatible response structure:
    {
      "id": "chatcmpl-...",
      "object": "chat.completion",
      "created": 1234567890,
      "model": "meta/llama3-70b-instruct",
      "choices": [{
        "index": 0,
        "message": {
          "role": "assistant",
          "content": "Response text here"
        },
        "finish_reason": "stop"
      }],
      "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 50,
        "total_tokens": 60
      }
    }
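A response with this shape can be reduced to the few fields most applications need. The sketch below assumes exactly the structure shown above; `ChatResult` and `parse_chat_response` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChatResult:
    content: str
    finish_reason: str
    total_tokens: int


def parse_chat_response(body: dict) -> ChatResult:
    """Extract the first choice's text, its finish reason, and the token total."""
    choices = body.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    first = choices[0]
    usage = body.get("usage") or {}
    return ChatResult(
        content=first["message"]["content"],
        finish_reason=first.get("finish_reason") or "",
        total_tokens=usage.get("total_tokens", 0),
    )


# Sample body copied from the response structure above
sample_body = {
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "created": 1234567890,
    "model": "meta/llama3-70b-instruct",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Response text here"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 10, "completion_tokens": 50, "total_tokens": 60},
}
parsed = parse_chat_response(sample_body)
print(parsed.content, parsed.total_tokens)
```

Checking for an empty `choices` list up front gives a clearer error than an `IndexError` deep inside application code.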
Reference Files
This skill includes comprehensive documentation in references/:
getting_started.md
- Overview of NVIDIA API platform
- Getting started guide for cloud endpoints
- Authentication and setup instructions
- Links to main API categories
other.md
- Additional NVIDIA API documentation
- Related services and microservices
- Integration patterns and best practices
Note: For detailed API specifications, parameter descriptions, and model-specific information, refer to the official documentation at docs.api.nvidia.com
Working with This Skill
For Beginners
- Start with authentication: Get your API key from NVIDIA API Catalog
- Try simple examples: Use the Quick Reference curl or Python examples above
- Test different models: Experiment with various models to understand their strengths
- Read the docs: Check references/getting_started.md for foundational concepts
For Intermediate Users
- Implement streaming: Add real-time response streaming for better UX
- Optimize parameters: Experiment with temperature, max_tokens, and other settings
- Build conversations: Maintain context across multiple turns
- Handle errors gracefully: Implement robust error handling and retry logic
- Choose optimal models: Select models based on task requirements (speed vs accuracy)
For Advanced Users
- Production deployment: Move from prototypes to production-ready applications
- Batch processing: Implement efficient batch inference patterns
- Self-hosted NIMs: Download and deploy container images for local inference
- Multi-modal integration: Combine LLM, vision, and retrieval APIs
- Performance optimization: Fine-tune requests for latency and cost efficiency
- RAG implementation: Build retrieval-augmented generation systems
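For the batch processing item above, one simple pattern is to fan prompts out over a thread pool, since each request spends most of its time waiting on the network. This is a sketch under stated assumptions: `run_batch` is a hypothetical helper, and `complete` stands in for any function that sends one prompt to the chat completions endpoint and returns the reply text. A fake is used here so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_batch(
    prompts: List[str],
    complete: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Run `complete` over all prompts concurrently, returning replies in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs, not completion order
        return list(pool.map(complete, prompts))


def fake_complete(prompt: str) -> str:
    """Offline stand-in for a real API call."""
    return f"echo: {prompt}"


batch_answers = run_batch(["first", "second", "third"], fake_complete, max_workers=2)
print(batch_answers)
```

Keeping `max_workers` small helps stay under rate limits; a real `complete` function would also need the HTTP 429 handling described under Error Handling.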
Navigation Tips
- Model selection: Choose based on task complexity, latency needs, and cost
- OpenAI migration: Use the OpenAI-compatible client for seamless migration
- API documentation: Access detailed specs at docs.api.nvidia.com/nim
- Model catalog: Browse available models at build.nvidia.com
Common Use Cases
Chatbots & Conversational AI
Use models like meta/llama3-70b-instruct or mistralai/mixtral-8x7b-instruct to build conversational interfaces.
Code Assistants
Use specialized models like bigcode/starcoder2-15b, mistralai/codestral-22b-instruct-v0.1, or ibm/granite-34b-code-instruct.
Question Answering
Use nvidia/llama3-chatqa-1.5-70b, which is optimized for Q&A tasks.
Content Generation
Use creative models like mistralai/mistral-large-2-instruct with higher temperature settings.
RAG Applications
Combine LLM APIs with NVIDIA's Retrieval APIs for knowledge-grounded responses.
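The retrieve-then-generate flow can be illustrated without calling any endpoint. In the sketch below the embedding vectors are hand-written placeholders; in a real application they would come from an embedding model, and the request format of NVIDIA's retrieval APIs is deliberately not shown because it is not described here. `cosine`, `top_k`, and `build_rag_messages` are hypothetical helpers.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], docs: List[Tuple[str, Sequence[float]]], k: int = 2) -> List[str]:
    """Return the k passages whose vectors are most similar to the query vector."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_rag_messages(question: str, passages: List[str]) -> List[Dict[str, str]]:
    """Place retrieved passages in the system message so the answer is grounded in them."""
    context = "\n".join(f"- {p}" for p in passages)
    return [
        {"role": "system", "content": "Answer using only the context below.\n" + context},
        {"role": "user", "content": question},
    ]


# Placeholder vectors; real ones would be produced by an embedding model
indexed_docs = [
    ("NIM exposes OpenAI-compatible endpoints.", [1.0, 0.0, 0.0]),
    ("Earth-2 covers weather prediction.", [0.0, 1.0, 0.0]),
    ("API keys use the nvapi- prefix.", [0.9, 0.1, 0.0]),
]
retrieved = top_k([1.0, 0.0, 0.0], indexed_docs, k=2)
rag_messages = build_rag_messages("How do I call NIM?", retrieved)
print(retrieved)
```

The resulting `rag_messages` list has the same shape as the `messages` field used in the chat completion examples.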
Best Practices
API Key Security
- Never commit API keys to version control
- Use environment variables for key storage
- Rotate keys regularly for security
- Monitor usage to detect unauthorized access
Parameter Tuning
- Temperature: Lower (0.1-0.3) for factual/code, higher (0.7-1.0) for creative
- Max tokens: Set appropriate limits to control costs and response length
- Top-p: Alternative to temperature for controlling randomness
- Streaming: Enable for better user experience in interactive applications
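These guidelines can be captured as per-task presets so that callers do not hand-tune every request. This is a sketch: the specific values are picked from inside the ranges listed above, and `PRESETS` and `build_payload` are hypothetical names.

```python
from typing import Any, Dict, List

# Values chosen from the ranges above: 0.1-0.3 for factual/code, 0.7-1.0 for creative
PRESETS: Dict[str, Dict[str, float]] = {
    "factual": {"temperature": 0.2},
    "code": {"temperature": 0.2},
    "creative": {"temperature": 0.9},
}


def build_payload(
    model: str,
    messages: List[Dict[str, str]],
    task: str = "factual",
    max_tokens: int = 256,
    stream: bool = False,
) -> Dict[str, Any]:
    """Assemble a chat completions payload with task-appropriate sampling settings."""
    if task not in PRESETS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PRESETS)}")
    payload: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,  # caps response length and therefore cost
        "stream": stream,
    }
    payload.update(PRESETS[task])
    return payload


code_payload = build_payload(
    "bigcode/starcoder2-15b",
    [{"role": "user", "content": "Write a Python function to calculate Fibonacci numbers"}],
    task="code",
    max_tokens=500,
)
print(code_payload["temperature"], code_payload["max_tokens"])
```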
Error Handling
- Implement retry logic with exponential backoff
- Handle rate limits gracefully (HTTP 429)
- Validate responses before using in production
- Log errors for debugging and monitoring
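Retry with exponential backoff can be sketched independently of the HTTP library. Assumptions in this example: `send` is a caller-supplied function returning a `(status_code, body)` pair, the 5xx codes are treated as retryable alongside the 429 named above, and a small random jitter is added to each wait. `call_with_retry` and `backoff_delays` are hypothetical helpers.

```python
import random
import time
from typing import Callable, List, Tuple

# 429 comes from the guidance above; retrying these 5xx codes is an assumption
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Delays that double on each attempt, limited to `cap` seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    retries: int = 3,
    base: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns a non-retryable status or retries run out."""
    if retries < 0:
        raise ValueError("retries must be non-negative")
    delays = backoff_delays(retries, base)
    for attempt in range(retries + 1):
        status, body = send()
        if status not in RETRYABLE_STATUS or attempt == retries:
            break
        # Up to 10% jitter so concurrent clients do not retry in lockstep
        sleep(delays[attempt] + random.uniform(0, 0.1 * delays[attempt]))
    return status, body


# Offline demonstration: two retryable failures, then success
fake_responses = iter([(429, ""), (503, ""), (200, "ok")])
recorded_waits: List[float] = []
final = call_with_retry(lambda: next(fake_responses), retries=3, base=0.5, sleep=recorded_waits.append)
print(final, len(recorded_waits))
```

Passing `sleep` as a parameter lets tests record the waits instead of actually pausing.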
Model Selection
- Small models (7B-8B): Fast, cost-effective for simple tasks
- Medium models (13B-34B): Balance of performance and efficiency
- Large models (70B+): Best quality for complex reasoning and generation
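The size tiers above can be turned into a simple selection rule. In this sketch the model IDs are taken from the lists earlier in this document, while the rule itself (complex reasoning wins over latency, and medium is the default) is an assumption for illustration; `MODELS_BY_TIER` and `pick_model` are hypothetical names.

```python
from typing import Dict

# One representative model per tier, using IDs listed earlier in this document
MODELS_BY_TIER: Dict[str, str] = {
    "small": "meta/llama3-8b-instruct",    # 7B-8B: fast, cost-effective
    "medium": "google/gemma-2-27b-it",     # 13B-34B: balanced
    "large": "meta/llama3-70b-instruct",   # 70B+: best quality
}


def pick_model(needs_complex_reasoning: bool, latency_sensitive: bool) -> str:
    """Choose a model tier from two coarse task properties."""
    if needs_complex_reasoning:
        return MODELS_BY_TIER["large"]
    if latency_sensitive:
        return MODELS_BY_TIER["small"]
    return MODELS_BY_TIER["medium"]


print(pick_model(needs_complex_reasoning=False, latency_sensitive=True))
```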
Resources
Official Documentation
- API Docs: docs.api.nvidia.com
- NIM Reference: docs.api.nvidia.com/nim
- Model Catalog: build.nvidia.com
Getting Started
- Get API key at NVIDIA API Catalog
- Browse available models and try them in the playground
- Read getting started guides in references/getting_started.md
Community & Support
- Check NVIDIA Developer Forums for community support
- Review example applications and integration patterns
- Explore NVIDIA AI Enterprise documentation for production deployments
Notes
- NVIDIA NIM APIs follow OpenAI API specifications for easy integration
- Models are cloud-hosted for prototyping; downloadable containers available for production
- API is designed for both prototyping and production workloads
- Multiple language support: Works with Python, JavaScript, Java, Ruby, PHP, and any other language that has an HTTP client
- Streaming support: Real-time response generation for interactive applications
- Select models available as self-hosted NIMs with NVIDIA AI Enterprise entitlement
Updating
To refresh this skill with updated documentation:
- Re-run the scraper with the same configuration
- The skill will be rebuilt with the latest model information and API updates
- Check for new models and API endpoints in the official documentation