Claude-skill-registry azure-openai-2025
Azure OpenAI Service 2025 models including GPT-5, GPT-4.1, reasoning models, and Azure AI Foundry integration
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/azure-openai-2025" ~/.claude/skills/majiayu000-claude-skill-registry-azure-openai-2025 && rm -rf "$T"
skills/data/azure-openai-2025/SKILL.md
Azure OpenAI Service - 2025 Models and Features
Complete knowledge base for Azure OpenAI Service with latest 2025 models including GPT-5, GPT-4.1, reasoning models, and Azure AI Foundry integration.
Overview
Azure OpenAI Service provides REST API access to OpenAI's most powerful models with enterprise-grade security, compliance, and regional availability.
Latest Models (2025)
GPT-5 Series (GA August 2025)
Registration Required Models:
- gpt-5-pro: Highest capability, complex reasoning
- gpt-5: Balanced performance and cost
- gpt-5-codex: Optimized for code generation
No Registration Required:
- gpt-5-mini: Faster, more affordable
- gpt-5-nano: Ultra-fast for simple tasks
- gpt-5-chat: Optimized for conversational use
GPT-4.1 Series
- gpt-4.1: 1 million token context window
- gpt-4.1-mini: Efficient version with 1M context
- gpt-4.1-nano: Fastest variant
Key Improvements:
- 1,000,000 token context (vs 128K in GPT-4 Turbo)
- Better instruction following
- Reduced hallucinations
- Improved multilingual support
Reasoning Models
o4-mini: Lightweight reasoning model
- Faster inference
- Lower cost
- Suitable for structured reasoning tasks
o3: Advanced reasoning model
- Complex problem solving
- Mathematical reasoning
- Scientific analysis
o1: Original reasoning model
- General-purpose reasoning
- Step-by-step explanations
o1-mini: Efficient reasoning
- Balanced cost and performance
Image Generation
GPT-image-1 (2025-04-15)
- DALL-E 3 successor
- Higher quality images
- Better prompt understanding
- Improved safety filters
Video Generation
Sora (2025-05-02)
- Text-to-video generation
- Realistic and imaginative scenes
- Up to 60 seconds of video
- Multiple camera angles and styles
Audio Models
gpt-4o-transcribe: Speech-to-text powered by GPT-4o
- High accuracy transcription
- Multiple languages
- Speaker diarization
gpt-4o-mini-transcribe: Faster, more affordable transcription
- Good accuracy
- Lower latency
- Cost-effective
Deploying Azure OpenAI
Create Azure OpenAI Resource
    # Create OpenAI account
    az cognitiveservices account create \
      --name myopenai \
      --resource-group MyRG \
      --kind OpenAI \
      --sku S0 \
      --location eastus \
      --custom-domain myopenai \
      --public-network-access Disabled \
      --assign-identity

    # Get endpoint and key
    az cognitiveservices account show \
      --name myopenai \
      --resource-group MyRG \
      --query "properties.endpoint" \
      --output tsv

    az cognitiveservices account keys list \
      --name myopenai \
      --resource-group MyRG \
      --query "key1" \
      --output tsv
Deploy GPT-5 Model
    # --model-version normally takes a dated version string; the versions
    # available to a resource can be listed with:
    #   az cognitiveservices account list-models --name myopenai --resource-group MyRG

    # Deploy gpt-5
    az cognitiveservices account deployment create \
      --resource-group MyRG \
      --name myopenai \
      --deployment-name gpt-5 \
      --model-name gpt-5 \
      --model-version latest \
      --model-format OpenAI \
      --sku-name Standard \
      --sku-capacity 100 \
      --scale-type Standard

    # Deploy gpt-5-pro (requires registration)
    az cognitiveservices account deployment create \
      --resource-group MyRG \
      --name myopenai \
      --deployment-name gpt-5-pro \
      --model-name gpt-5-pro \
      --model-version latest \
      --model-format OpenAI \
      --sku-name Standard \
      --sku-capacity 50
Deploy Reasoning Models
    # Deploy o3 reasoning model
    az cognitiveservices account deployment create \
      --resource-group MyRG \
      --name myopenai \
      --deployment-name o3-reasoning \
      --model-name o3 \
      --model-version latest \
      --model-format OpenAI \
      --sku-name Standard \
      --sku-capacity 50

    # Deploy o4-mini
    az cognitiveservices account deployment create \
      --resource-group MyRG \
      --name myopenai \
      --deployment-name o4-mini \
      --model-name o4-mini \
      --model-version latest \
      --model-format OpenAI \
      --sku-name Standard \
      --sku-capacity 100
Deploy GPT-4.1 with 1M Context
    az cognitiveservices account deployment create \
      --resource-group MyRG \
      --name myopenai \
      --deployment-name gpt-4-1 \
      --model-name gpt-4.1 \
      --model-version latest \
      --model-format OpenAI \
      --sku-name Standard \
      --sku-capacity 100
Deploy Image Generation Model
    az cognitiveservices account deployment create \
      --resource-group MyRG \
      --name myopenai \
      --deployment-name image-gen \
      --model-name gpt-image-1 \
      --model-version 2025-04-15 \
      --model-format OpenAI \
      --sku-name Standard \
      --sku-capacity 10
Deploy Sora Video Generation
    az cognitiveservices account deployment create \
      --resource-group MyRG \
      --name myopenai \
      --deployment-name sora \
      --model-name sora \
      --model-version 2025-05-02 \
      --model-format OpenAI \
      --sku-name Standard \
      --sku-capacity 5
Using Azure OpenAI Models
Python SDK (GPT-5)
    import os

    from openai import AzureOpenAI

    # Initialize client
    client = AzureOpenAI(
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        api_version="2025-02-01-preview",
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    )

    # GPT-5 completion
    response = client.chat.completions.create(
        model="gpt-5",  # deployment name
        messages=[
            {"role": "system", "content": "You are a helpful AI assistant."},
            {"role": "user", "content": "Explain quantum computing in simple terms."},
        ],
        # GPT-5 is a reasoning model: it takes max_completion_tokens instead of
        # max_tokens and does not accept custom temperature/top_p values
        max_completion_tokens=1000,
    )

    print(response.choices[0].message.content)
Python SDK (o3 Reasoning Model)
    # o3 reasoning with chain-of-thought
    response = client.chat.completions.create(
        model="o3-reasoning",  # deployment name
        messages=[
            {"role": "system", "content": "You are an expert problem solver. Show your reasoning step-by-step."},
            {"role": "user", "content": "If a train travels 120 km in 2 hours, then speeds up to travel 180 km in the next 2 hours, what is the average speed for the entire journey?"},
        ],
        # o-series models take max_completion_tokens (the budget also covers
        # hidden reasoning tokens) and reject max_tokens and temperature
        max_completion_tokens=2000,
    )

    print(response.choices[0].message.content)
Python SDK (GPT-4.1 with 1M Context)
    # Read a large document
    with open('large_document.txt', 'r') as f:
        document = f.read()

    # GPT-4.1 can handle up to 1M tokens
    response = client.chat.completions.create(
        model="gpt-4-1",
        messages=[
            {"role": "system", "content": "You are a document analysis expert."},
            {"role": "user", "content": f"Analyze this document and provide key insights:\n\n{document}"},
        ],
        max_tokens=4000,
    )

    print(response.choices[0].message.content)
Image Generation (GPT-image-1)
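    import base64

    # Generate image with DALL-E 3 successor
    response = client.images.generate(
        model="image-gen",  # deployment name
        prompt="A futuristic city with flying cars and vertical gardens, cyberpunk style, highly detailed, 4K",
        size="1024x1024",
        quality="high",  # gpt-image-1 accepts low, medium, or high
        n=1,
    )

    # gpt-image-1 returns base64-encoded image data rather than a URL
    image_bytes = base64.b64decode(response.data[0].b64_json)
    with open("generated_image.png", "wb") as f:
        f.write(image_bytes)
    print("Generated image: generated_image.png")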
Video Generation (Sora)
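    # Generate video with Sora
    # NOTE: client.videos.generate and its parameters are shown as an
    # illustrative interface; confirm the call shape against the current
    # Azure OpenAI video generation reference before relying on it.
    response = client.videos.generate(
        model="sora",
        prompt="A serene lakeside at sunset with birds flying overhead and gentle waves on the shore",
        duration=10,  # seconds
        resolution="1080p",
        fps=30,
    )

    video_url = response.data[0].url
    print(f"Generated video: {video_url}")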
Audio Transcription
    # Transcribe audio file (the context manager closes the file afterwards)
    with open("meeting_recording.mp3", "rb") as audio_file:
        response = client.audio.transcriptions.create(
            model="gpt-4o-transcribe",
            file=audio_file,
            language="en",
            # gpt-4o-transcribe returns plain json/text output; segment-level
            # timestamps via verbose_json are a Whisper feature
            response_format="json",
        )

    print(f"Transcription: {response.text}")
Azure AI Foundry Integration
Model Router (Automatic Model Selection)
    # Model router is deployed like any other model (model name: model-router)
    # and called through the regular chat completions API.
    response = client.chat.completions.create(
        model="model-router",  # deployment name of the model-router deployment
        messages=[
            {"role": "user", "content": "Analyze this complex scientific paper..."},
        ],
    )

    # The response reports which underlying model handled the request
    print(f"Selected model: {response.model}")
    print(f"Response: {response.choices[0].message.content}")
Benefits:
- Automatic model selection based on prompt complexity
- Balance quality vs cost
- Reduce costs by up to 40% while maintaining quality
Agentic Retrieval (Azure AI Search Integration)
    import json
    import os

    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    # Initialize search client
    search_client = SearchClient(
        endpoint=os.getenv("SEARCH_ENDPOINT"),
        index_name="documents",
        credential=AzureKeyCredential(os.getenv("SEARCH_KEY")),
    )

    # Agentic retrieval with Azure OpenAI
    response = client.chat.completions.create(
        model="gpt-5",
        messages=[
            {"role": "system", "content": "You have access to a document search system."},
            {"role": "user", "content": "What are the company's revenue projections for Q3?"},
        ],
        tools=[{
            "type": "function",
            "function": {
                "name": "search_documents",
                "description": "Search company documents",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "Search query"}
                    },
                    "required": ["query"],
                },
            },
        }],
        tool_choice="auto",
    )

    # Process tool calls
    if response.choices[0].message.tool_calls:
        for tool_call in response.choices[0].message.tool_calls:
            if tool_call.function.name == "search_documents":
                query = json.loads(tool_call.function.arguments)["query"]
                results = search_client.search(query)
                # Feed results back to model for final answer
Improvements:
- 40% better on complex, multi-part questions
- Automatic query decomposition
- Relevance ranking
- Citation generation
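The "feed results back to model" step in the retrieval example is left as a comment. One way it might look is sketched below, assuming the standard chat completions tool-calling message shape (an assistant message carrying `tool_calls`, followed by one `tool` message per call). The helper name `build_followup_messages` and the shape of `tool_results` are hypothetical, chosen only for illustration.

```python
import json


def build_followup_messages(messages, assistant_message, tool_results):
    """Return a new message list that includes the tool outputs.

    messages:          the original request messages (list of dicts)
    assistant_message: the assistant message that contained tool_calls
    tool_results:      hypothetical mapping of tool_call_id -> JSON-serializable results
    """
    followup = list(messages)  # copy so the caller's list is not mutated
    followup.append(assistant_message)
    for tool_call_id, results in tool_results.items():
        followup.append({
            "role": "tool",
            "tool_call_id": tool_call_id,  # must match the id in tool_calls
            "content": json.dumps(results),
        })
    return followup
```

The returned list would then be sent in a second `client.chat.completions.create` call so the model can compose the final answer. Search results need to be converted to plain dictionaries or strings first, since the `tool` message content is a JSON string here.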
Foundry Observability (Preview)
    import os

    # NOTE: FoundryObservability is shown as an illustrative interface; confirm
    # the package and class names against the current Azure AI Foundry SDK.
    from azure.ai.foundry import FoundryObservability

    # Enable observability
    observability = FoundryObservability(
        workspace_id=os.getenv("AI_FOUNDRY_WORKSPACE_ID"),
        enable_tracing=True,
        enable_metrics=True,
    )

    # Monitor agent execution
    with observability.trace_agent("customer_support_agent") as trace:
        response = client.chat.completions.create(
            model="gpt-5",
            messages=messages,
        )

        trace.log_tool_call("search_kb", {"query": "refund policy"})
        trace.log_reasoning_step("Retrieved refund policy document")
        trace.log_token_usage(response.usage.total_tokens)

    # View in Azure AI Foundry portal:
    # - End-to-end trace logs
    # - Reasoning steps and tool calls
    # - Performance metrics
    # - Cost analysis
Capacity and Quota Management
Check Quota
    # List deployments with usage
    az cognitiveservices account deployment list \
      --resource-group MyRG \
      --name myopenai \
      --output table

    # Check usage metrics
    az monitor metrics list \
      --resource "$(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv)" \
      --metric "TokenTransaction" \
      --start-time 2025-01-01T00:00:00Z \
      --end-time 2025-01-31T23:59:59Z \
      --interval PT1H \
      --aggregation Total
Update Capacity
    # Scale up deployment capacity
    az cognitiveservices account deployment update \
      --resource-group MyRG \
      --name myopenai \
      --deployment-name gpt-5 \
      --sku-capacity 200

    # Scale down during off-peak
    az cognitiveservices account deployment update \
      --resource-group MyRG \
      --name myopenai \
      --deployment-name gpt-5 \
      --sku-capacity 50
Request Quota Increase
- Navigate to Azure Portal → Azure OpenAI resource
- Go to "Quotas" blade
- Select model and region
- Click "Request quota increase"
- Provide justification and target capacity
Security and Networking
Private Endpoint
    # Create private endpoint
    az network private-endpoint create \
      --name openai-private-endpoint \
      --resource-group MyRG \
      --vnet-name MyVNet \
      --subnet PrivateEndpointSubnet \
      --private-connection-resource-id "$(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv)" \
      --group-id account \
      --connection-name openai-connection

    # Create private DNS zone
    az network private-dns zone create \
      --resource-group MyRG \
      --name privatelink.openai.azure.com

    # Link to VNet
    az network private-dns link vnet create \
      --resource-group MyRG \
      --zone-name privatelink.openai.azure.com \
      --name openai-dns-link \
      --virtual-network MyVNet \
      --registration-enabled false

    # Create DNS zone group
    az network private-endpoint dns-zone-group create \
      --resource-group MyRG \
      --endpoint-name openai-private-endpoint \
      --name default \
      --private-dns-zone privatelink.openai.azure.com \
      --zone-name privatelink.openai.azure.com
Managed Identity Access
    # Enable system-assigned identity
    az cognitiveservices account identity assign \
      --name myopenai \
      --resource-group MyRG

    # Grant role to managed identity
    PRINCIPAL_ID=$(az cognitiveservices account show -g MyRG -n myopenai --query identity.principalId -o tsv)

    # The scope is quoted so the shell does not treat <sub-id> as a redirection;
    # replace <sub-id> with your subscription ID
    az role assignment create \
      --assignee "$PRINCIPAL_ID" \
      --role "Cognitive Services OpenAI User" \
      --scope "/subscriptions/<sub-id>/resourceGroups/MyRG"
Content Filtering
    # Configure content filtering
    az cognitiveservices account update \
      --name myopenai \
      --resource-group MyRG \
      --set properties.customContentFilter='{
        "hate": {"severity": "medium", "enabled": true},
        "violence": {"severity": "medium", "enabled": true},
        "sexual": {"severity": "medium", "enabled": true},
        "selfHarm": {"severity": "high", "enabled": true}
      }'
Cost Optimization
Model Selection Strategy
Use GPT-5-mini or GPT-5-nano for:
- Simple questions
- Classification tasks
- Content moderation
- Summarization
Use GPT-5 or GPT-4.1 for:
- Complex reasoning
- Long-form content generation
- Document analysis
- Code generation
Use Reasoning Models (o3, o4-mini) for:
- Mathematical problems
- Scientific analysis
- Step-by-step reasoning
- Logic puzzles
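The selection strategy above can be expressed as a small lookup table. The sketch below is illustrative only: the task category names are hypothetical, the deployment names follow the ones used in this guide's deployment commands (with `gpt-5-mini` and `gpt-5-nano` assumed to be deployed under their model names), and where the guidance offers two models the choice between them is an assumption.

```python
# Hypothetical task categories mapped to deployment names.
DEPLOYMENT_BY_TASK = {
    # Lightweight tasks -> small, cheap models
    "simple_qa": "gpt-5-mini",
    "classification": "gpt-5-nano",
    "moderation": "gpt-5-nano",
    "summarization": "gpt-5-mini",
    # Heavier generation and analysis -> full-size models
    "long_form": "gpt-5",
    "code_generation": "gpt-5",
    "document_analysis": "gpt-4-1",  # 1M-token context deployment
    # Explicit multi-step reasoning -> reasoning models
    "math": "o3-reasoning",
    "scientific_analysis": "o3-reasoning",
    "logic_puzzle": "o4-mini",
}


def pick_deployment(task: str, default: str = "gpt-5") -> str:
    """Return the deployment name for a task category, or the default."""
    return DEPLOYMENT_BY_TASK.get(task, default)
```

A caller would pass the result as the `model` argument, for example `model=pick_deployment("classification")`. Unknown categories fall back to the general-purpose deployment rather than raising.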
Implement Caching
    # Use semantic cache to reduce duplicate requests
    # NOTE: SemanticCache is shown as an illustrative interface; confirm the
    # package name against the caching library you actually use.
    from azure.ai.cache import SemanticCache

    cache = SemanticCache(
        similarity_threshold=0.95,
        ttl_seconds=3600,
    )

    def answer(user_query, messages):
        # Check cache before API call
        cached_response = cache.get(user_query)
        if cached_response:
            return cached_response

        response = client.chat.completions.create(
            model="gpt-5",
            messages=messages,
        )

        cache.set(user_query, response)
        return response
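For readers who want something runnable without an external cache service, here is a minimal in-memory sketch using only the standard library. It is deliberately a simpler technique than semantic caching: it matches prompts exactly after normalizing case and whitespace, with no embedding similarity. The class name `ExactMatchCache` and the injectable `clock` parameter are assumptions made for illustration and testing.

```python
import hashlib
import time


class ExactMatchCache:
    """In-memory TTL cache keyed on a normalized prompt hash."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so expiry can be tested
        self._store = {}      # key -> (expires_at, value)

    @staticmethod
    def _key(prompt):
        # Lowercase and collapse whitespace so trivial variations still hit
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, prompt):
        key = self._key(prompt)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # drop the expired entry
            return None
        return value

    def set(self, prompt, value):
        self._store[self._key(prompt)] = (self._clock() + self._ttl, value)
```

It exposes the same `get`/`set` shape as the example above. The trade-off is a lower hit rate than similarity-based matching, in exchange for never returning an answer to a prompt that merely resembles a cached one.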
Token Management
    import tiktoken

    # Count tokens before API call
    # (o200k_base is the encoding used by the GPT-4o generation and later models)
    encoding = tiktoken.get_encoding("o200k_base")
    tokens = len(encoding.encode(prompt))

    if tokens > 100000:
        print(f"Warning: Prompt has {tokens} tokens, this will be expensive!")

    # Use a smaller output limit when appropriate
    response = client.chat.completions.create(
        model="gpt-5",
        messages=messages,
        max_completion_tokens=500,  # Limit output tokens
    )
Monitoring and Alerts
Set Up Cost Alerts
    # Create budget alert
    az consumption budget create \
      --budget-name openai-monthly-budget \
      --resource-group MyRG \
      --amount 1000 \
      --category Cost \
      --time-grain Monthly \
      --start-date 2025-01-01 \
      --end-date 2025-12-31 \
      --notifications '{
        "actual_GreaterThan_80_Percent": {
          "enabled": true,
          "operator": "GreaterThan",
          "threshold": 80,
          "contactEmails": ["billing@example.com"]
        }
      }'
Application Insights Integration
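    import logging
    import os
    import time

    from opencensus.ext.azure.log_exporter import AzureLogHandler

    # Configure logging
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.INFO)  # without this, info-level records are dropped
    logger.addHandler(AzureLogHandler(
        connection_string=os.getenv("APPLICATIONINSIGHTS_CONNECTION_STRING")
    ))

    # Time the API call; the SDK response does not carry a latency field
    start = time.perf_counter()
    response = client.chat.completions.create(model="gpt-5", messages=messages)
    latency_ms = (time.perf_counter() - start) * 1000

    # Log API calls
    # calculate_cost is a user-defined helper, not part of any SDK
    logger.info("OpenAI API call", extra={
        "custom_dimensions": {
            "model": "gpt-5",
            "tokens": response.usage.total_tokens,
            "cost": calculate_cost(response.usage.total_tokens),
            "latency_ms": latency_ms,
        }
    })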
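The logging example relies on a cost helper that the guide does not define. A minimal sketch of such a helper follows, under stated assumptions: the function name `estimate_cost` is hypothetical, and the prices in the table are placeholders, not actual Azure rates. It takes input and output token counts separately because the two are typically billed at different rates.

```python
# Placeholder prices in USD per 1M tokens. These are NOT real Azure prices;
# replace them with the current rates for your region and deployment type.
PRICE_PER_1M_TOKENS = {
    "gpt-5": {"input": 1.00, "output": 10.00},
}


def estimate_cost(model, prompt_tokens, completion_tokens, prices=PRICE_PER_1M_TOKENS):
    """Estimate the USD cost of one call from its token usage."""
    rate = prices[model]  # raises KeyError for models without a price entry
    total = prompt_tokens * rate["input"] + completion_tokens * rate["output"]
    return total / 1_000_000
```

With the SDK response from the example above, this could be called as `estimate_cost("gpt-5", response.usage.prompt_tokens, response.usage.completion_tokens)`.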
Best Practices
✓ Use Model Router for automatic cost optimization
✓ Implement caching to reduce duplicate requests
✓ Monitor token usage and set budgets
✓ Use private endpoints for production workloads
✓ Enable managed identity instead of API keys
✓ Configure content filtering for safety
✓ Right-size capacity based on actual demand
✓ Use Foundry Observability for monitoring
✓ Implement retry logic with exponential backoff
✓ Choose appropriate models for task complexity
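The retry recommendation above is not illustrated elsewhere in this guide, so here is a minimal standard-library sketch. The function name `call_with_backoff` and its parameters are hypothetical. The `sleep` argument is injectable purely so the behavior can be tested without waiting.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add up to 10% jitter so concurrent clients do not retry in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))
```

With the `openai` package, `retry_on` would typically be narrowed to transient errors such as `openai.RateLimitError` and `openai.APITimeoutError`, wrapping the request in a lambda. The SDK client also accepts a `max_retries` argument, which may be enough on its own for simple cases.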
Azure OpenAI Service with GPT-5 and reasoning models brings enterprise-grade AI to your applications!