Awesome-omni-skill: ai-agent-development

Build production-ready AI agents with Microsoft Foundry and Agent Framework. Use when creating AI agents, selecting LLM models, implementing agent orchestration, adding tracing/observability, or evaluating agent quality. Covers agent architecture, model selection, multi-agent workflows, and production deployment.

Clone the repository:

```sh
git clone https://github.com/diegosouzapw/awesome-omni-skill
```

Or install just this skill into `~/.claude/skills`:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/ai-agents/ai-agent-development" ~/.claude/skills/diegosouzapw-awesome-omni-skill-ai-agent-development && rm -rf "$T"
```
# AI Agent Development
Purpose: Build production-ready AI agents with Microsoft Foundry and Agent Framework.
Scope: Agent architecture, model selection, orchestration, observability, evaluation.
## When to Use This Skill
- Building AI agents with Microsoft Foundry or Agent Framework
- Selecting LLM models for agent scenarios
- Implementing multi-agent orchestration workflows
- Adding tracing and observability to AI agents
- Evaluating agent quality and response accuracy
## Prerequisites
- Python 3.11+ or .NET 8+
- agent-framework-azure-ai package
- Microsoft Foundry workspace with deployed model
## Quick Start

### Installation

Python (Recommended):

```sh
pip install agent-framework-azure-ai --pre  # --pre required during preview
```

.NET:

```sh
dotnet add package Microsoft.Agents.AI.AzureAI --prerelease
dotnet add package Microsoft.Agents.AI.Workflows --prerelease
```
### Model Selection

Top Production Models (Microsoft Foundry):

| Model | Best For | Context (input/output) | Cost / 1M tokens |
|---|---|---|---|
| gpt-5.2 | Enterprise agents, structured outputs | 200K/100K | TBD |
| gpt-5.1-codex-max | Agentic coding workflows | 272K/128K | $3.44 |
| claude-opus-4-5 | Complex agents, coding, computer use | 200K/64K | $10 |
| gpt-5.1 | Multi-step reasoning | 200K/100K | $3.44 |
| o3 | Advanced reasoning | 200K/100K | $3.50 |

Deploy Model: `Ctrl+Shift+P` → `AI Toolkit: Deploy Model`
## Agent Patterns

### Single Agent

```python
import os
from pathlib import Path

from agent_framework.openai import OpenAIChatClient

# Load prompt from file — NEVER embed prompts as inline strings
prompt = Path("prompts/assistant.md").read_text(encoding="utf-8")

client = OpenAIChatClient(
    model="gpt-5.1",
    api_key=os.getenv("FOUNDRY_API_KEY"),
    endpoint=os.getenv("FOUNDRY_ENDPOINT"),
)

agent = {
    "name": "Assistant",
    "instructions": prompt,  # Loaded from prompts/assistant.md
    "tools": [],  # Add tools as needed
}

response = await client.chat(
    messages=[{"role": "user", "content": "Hello"}],
    agent=agent,
)
```
### Multi-Agent Orchestration

```python
from pathlib import Path

from agent_framework.workflows import SequentialWorkflow

# Each agent loads its prompt from a dedicated file
researcher = {
    "name": "Researcher",
    "instructions": Path("prompts/researcher.md").read_text(encoding="utf-8"),
}
writer = {
    "name": "Writer",
    "instructions": Path("prompts/writer.md").read_text(encoding="utf-8"),
}

workflow = SequentialWorkflow(
    agents=[researcher, writer],
    handoff_strategy="on_completion",
)

result = await workflow.run(query="Write about AI agents")
```
Advanced Patterns: Search github.com/microsoft/agent-framework for:
- Group Chat, Concurrent, Conditional, Loop
- Human-in-the-Loop, Reflection, Fan-out/Fan-in
- MCP, Multimodal, Custom Executors
## Best Practices

### Prompt & Template File Management
RULE: NEVER embed prompts or output templates as inline strings in code. Always store them as separate files.
Why: Prompts are content, not code. Separating them enables:
- Version control diffs that show exactly what changed in a prompt
- Non-developer editing (PMs, prompt engineers) without touching code
- A/B testing different prompts without code changes
- Reuse across agents, languages, and test harnesses
- Clear separation of concerns (logic vs. content)
Directory Convention:

```
project/
  prompts/                 # All system/agent prompts
    assistant.md           # One file per agent role
    researcher.md
    writer.md
    reviewer.md
  templates/               # Output templates used by agents
    report-template.md     # Structured output templates
    email-template.md
    summary-template.md
  config/
    models.yaml            # Model configuration
```
Loading Pattern:

```python
from pathlib import Path

# Load prompt
prompt = Path("prompts/assistant.md").read_text(encoding="utf-8")

# Load output template and inject into prompt
template = Path("templates/report-template.md").read_text(encoding="utf-8")
prompt_with_template = f"{prompt}\n\n## Output Format\n{template}"
```
Rules:

- MUST store all system prompts in the `prompts/` directory as `.md` or `.txt` files
- MUST store output format templates in the `templates/` directory
- MUST NOT embed prompt text longer than one sentence directly in code
- SHOULD use Markdown format for prompts (readable, supports structure)
- SHOULD name files after the agent role: `prompts/{agent-name}.md`
- SHOULD include a brief comment header in each prompt file (purpose, version, model target)
- MAY use template variables (`{variable}`) for dynamic content injected at runtime (see the sketch below)
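Where a prompt file uses `{variable}` placeholders, a minimal injection sketch (the file name and variable names here are illustrative):

```python
from pathlib import Path

# Hypothetical example: prompts/researcher.md contains {topic} and {audience}
# placeholders; str.format fills them at runtime.
prompt_template = Path("prompts/researcher.md").read_text(encoding="utf-8")
prompt = prompt_template.format(topic="AI agents", audience="platform engineers")
```

Note that `str.format` treats every brace pair as a placeholder, so prompts that contain literal braces (e.g. JSON examples) are safer with `string.Template` and `$variable` syntax.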
### Development
✅ DO:
- Plan agent architecture before coding (Research → Design → Implement)
- Use Microsoft Foundry models for production
- Implement tracing from day one
- Test with evaluation datasets before deployment
- Use structured outputs for reliable agent responses
- Implement error handling and retry logic (see the retry sketch after these lists)
- Version your agents and track changes
- Store all prompts as separate files in the `prompts/` directory
- Store output templates as separate files in the `templates/` directory
❌ DON'T:
- Hardcode API keys or endpoints
- Embed prompts or output templates as multi-line strings in code
- Skip tracing setup (critical for debugging)
- Deploy without evaluation
- Use GitHub models in production (free tier has limits)
- Ignore token limits and context windows
- Mix agent logic with business logic
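A minimal retry-with-backoff sketch for the error-handling item above; the exception type, attempt count, and backoff schedule are illustrative, and Agent Framework may expose its own retry policy:

```python
import asyncio

async def chat_with_retry(client, messages, agent, max_attempts=3):
    """Retry transient model-call failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await client.chat(messages=messages, agent=agent)
        except Exception:  # narrow this to your client's transient error type
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            await asyncio.sleep(2 ** attempt)  # 2s, 4s, ... between attempts
```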
### Security
- Store credentials in environment variables or Azure Key Vault (see the sketch after this list)
- Validate all tool inputs and outputs
- Implement rate limiting for agent APIs
- Log agent actions for audit trails
- Use role-based access control (RBAC) for Foundry resources
- Review OWASP Top 10 for AI: owasp.org/AI-Security-and-Privacy-Guide
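A minimal sketch of pulling a credential from Azure Key Vault rather than hardcoding it; the vault URL and secret name are placeholders, and the packages assumed are `azure-identity` and `azure-keyvault-secrets`:

```python
import os

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Prefer an environment variable in local dev; fall back to Key Vault in production
api_key = os.getenv("FOUNDRY_API_KEY")
if api_key is None:
    client = SecretClient(
        vault_url="https://<your-vault>.vault.azure.net",  # placeholder vault URL
        credential=DefaultAzureCredential(),
    )
    api_key = client.get_secret("foundry-api-key").value  # placeholder secret name
```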
### Performance
- Cache model responses when appropriate
- Use batch processing for multiple requests
- Monitor token usage and costs
- Implement timeout handling (see the sketch after this list)
- Use async/await for I/O operations
- Consider model size vs. latency tradeoffs
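A minimal timeout-handling sketch using stdlib `asyncio`; the 30-second budget is an arbitrary example:

```python
import asyncio

async def chat_with_timeout(client, messages, agent, timeout_s=30.0):
    # Bound each model call so a hung request cannot stall the agent loop
    try:
        return await asyncio.wait_for(
            client.chat(messages=messages, agent=agent), timeout=timeout_s
        )
    except asyncio.TimeoutError:
        # Callers can retry, or fall back to a smaller/faster model
        raise RuntimeError(f"Model call exceeded {timeout_s}s timeout")
```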
### Monitoring
- Track key metrics: latency, success rate, token usage, cost
- Set up alerts for failures and anomalies
- Use structured logging with context
- Integrate with Azure Monitor / Application Insights (see the sketch after this list)
- Review traces regularly for optimization opportunities
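A minimal observability bootstrap; the `azure-monitor-opentelemetry` and `azure-ai-inference` packages are an assumed choice here, and Agent Framework may ship its own instrumentation:

```python
import os

from azure.ai.inference.tracing import AIInferenceInstrumentor
from azure.monitor.opentelemetry import configure_azure_monitor

# Export OpenTelemetry traces, metrics, and logs to Application Insights
configure_azure_monitor(
    connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"],
)

# Instrument model calls BEFORE creating any agents (see Troubleshooting below)
AIInferenceInstrumentor().instrument()
```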
## Production Checklist

### Development
- Agent architecture documented
- Model selected and deployed
- Tools/plugins implemented and tested
- Error handling with retries
- Structured outputs configured
- No hardcoded secrets
- All prompts stored as separate files in `prompts/` (not inline in code)
- All output templates stored in `templates/` (not inline in code)
### Model Change Management (MANDATORY)

- Model version pinned explicitly (e.g., `gpt-5.1-2026-01-15`)
- Model version configurable via environment variable (see the sketch after this list)
- Evaluation baseline saved for current model
- A/B evaluation run before any model switch
- Structured output schema verified after model change
- Tool/function-calling accuracy verified after model change
- Model change documented in changelog with eval results
- Weekly evaluation monitoring configured for drift detection
- Alert threshold set for score drops > 10% from baseline
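A minimal sketch of the pinning pattern; the environment variable name and version string are illustrative:

```python
import os

# Pin an exact model version as the default, and allow an environment-variable
# override so a model switch never requires a code change.
MODEL = os.getenv("AGENT_MODEL", "gpt-5.1-2026-01-15")  # illustrative version pin
```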
### Model Change Test Automation (MANDATORY)

- Agent designed as model-agnostic (model injected via config)
- `config/models.yaml` defines model test matrix with thresholds (example after this list)
- Tested against ≥2 models (primary + fallback from different provider)
- Multi-model comparison pipeline in CI/CD (weekly + on model config change)
- Deployment gated on threshold checks (CI fails on regression)
- Validated fallback model designated and documented
- Comparison report generated per run (JSON + human-readable)
- Cost and latency evaluators included alongside quality metrics
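A sketch of what `config/models.yaml` could look like; this schema is an assumption for illustration, not a format defined by Agent Framework:

```yaml
# config/models.yaml (hypothetical schema)
models:
  - name: gpt-5.1            # primary model
    role: primary
  - name: claude-opus-4-5    # fallback from a different provider
    role: fallback
thresholds:
  quality_score: 0.85        # CI fails if the eval score drops below this
  max_regression: 0.10       # alert on score drops > 10% from baseline
  p95_latency_ms: 4000       # latency budget checked alongside quality
```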
### Observability
- OpenTelemetry tracing enabled
- Trace viewer tested
- Structured logging implemented
- Metrics collection configured
### Evaluation
- Evaluation dataset created
- Evaluators defined (built-in + custom; see the sketch after this list)
- Evaluation runs passing
- Results meet quality thresholds
- Multi-model comparison run (2+ models tested)
- Fallback model validated and documented
- Model comparison baseline saved
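One way to wire up built-in evaluators, shown with the `azure-ai-evaluation` package as an assumed choice; the dataset path, judge model, and evaluator selection are illustrative:

```python
from azure.ai.evaluation import RelevanceEvaluator, evaluate

# Judge-LLM configuration for the built-in evaluator (illustrative values)
model_config = {
    "azure_endpoint": "https://<your-foundry-endpoint>",  # placeholder
    "azure_deployment": "gpt-5.1",
    "api_key": "<load from Key Vault or env var>",  # never hardcode for real
}

result = evaluate(
    data="eval/dataset.jsonl",  # one {"query": ..., "response": ...} per line
    evaluators={"relevance": RelevanceEvaluator(model_config)},
)
print(result["metrics"])  # compare against saved baseline before deploying
```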
### Security & Compliance
- Credentials in Key Vault/env vars
- Input validation implemented
- RBAC configured
- Audit logging enabled
- OWASP AI Top 10 reviewed
### Operations
- Health checks implemented
- Rate limiting configured
- Monitoring alerts set up
- Deployment strategy defined
- Rollback plan documented
- Cost monitoring enabled
## Resources
Official Documentation:
- Agent Framework: github.com/microsoft/agent-framework
- Microsoft Foundry: ai.azure.com
- Azure AI Projects SDK: learn.microsoft.com/python/api/overview/azure/ai-projects
- OpenTelemetry: opentelemetry.io
AI Toolkit:

- Model Catalog: `Ctrl+Shift+P` → `AI Toolkit: Model Catalog`
- Trace Viewer: `Ctrl+Shift+P` → `AI Toolkit: Open Trace Viewer`
- Playground: `Ctrl+Shift+P` → `AI Toolkit: Model Playground`
Security:
- OWASP AI Security: owasp.org/AI-Security-and-Privacy-Guide
- Azure Security Best Practices: learn.microsoft.com/azure/security
Related: AGENTS.md for agent behavior guidelines • Skills.md for general production practices
Last Updated: January 17, 2026
## Scripts

| Script | Purpose | Usage |
|---|---|---|
| | Scaffold AI agent project (Python/.NET) with tracing & eval | |
| | Validate agent project against production checklist | |
| | Validate model pinning, data drift signals, and judge LLM readiness | |
| | Run eval suite against multiple models and generate comparison report | |
## Troubleshooting

| Issue | Solution |
|---|---|
| Model not found | Verify the model deployment in the Foundry portal and check the endpoint URL |
| Tracing not appearing | Ensure `AIInferenceInstrumentor().instrument()` is called before agent creation |
| Agent loops indefinitely | Set a `max_turns` limit and add termination conditions |