# Awesome-omni-skill: azure-ai-contentunderstanding-py

Azure AI Content Understanding SDK for Python. Use for multimodal content extraction from documents, images, audio, and video.

## Install

Clone the upstream repo:

```shell
git clone https://github.com/diegosouzapw/awesome-omni-skill
```

Claude Code: install into `~/.claude/skills/`:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/content-media/azure-ai-contentunderstanding-py" ~/.claude/skills/diegosouzapw-awesome-omni-skill-azure-ai-contentunderstanding-py && rm -rf "$T"
```

Manifest: `skills/content-media/azure-ai-contentunderstanding-py/SKILL.md`
## Azure AI Content Understanding SDK for Python

Multimodal AI service that extracts semantic content from documents, video, audio, and image files for RAG and automated workflows.
## Installation

```shell
pip install azure-ai-contentunderstanding
```
## Environment Variables

```shell
CONTENTUNDERSTANDING_ENDPOINT=https://<resource>.cognitiveservices.azure.com/
```
## Authentication

```python
import os

from azure.ai.contentunderstanding import ContentUnderstandingClient
from azure.identity import DefaultAzureCredential

endpoint = os.environ["CONTENTUNDERSTANDING_ENDPOINT"]
credential = DefaultAzureCredential()
client = ContentUnderstandingClient(endpoint=endpoint, credential=credential)
```
## Core Workflow

Content Understanding operations are asynchronous long-running operations:

1. **Begin analysis**: start the analysis operation with `begin_analyze()`, which returns a poller.
2. **Poll for results**: poll until analysis completes (the SDK handles this via `.result()`).
3. **Process results**: extract structured results from `AnalyzeResult.contents`.
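The begin/poll/result pattern above can be sketched without the SDK. `StubPoller` here is a hypothetical stand-in for the poller returned by `begin_analyze()`, not part of the library:

```python
import time


class StubPoller:
    """Hypothetical stand-in for an Azure long-running-operation poller."""

    def __init__(self, payload, delay=0.01):
        self._payload = payload
        self._done_at = time.monotonic() + delay

    def done(self):
        # True once the simulated operation has finished
        return time.monotonic() >= self._done_at

    def result(self):
        # Block until the operation completes, then return the payload
        while not self.done():
            time.sleep(0.005)
        return self._payload


# Step 1: "begin" the operation (the real client returns the poller for you)
poller = StubPoller({"contents": [{"markdown": "# Extracted text"}]})

result = poller.result()          # Step 2: poll until complete
content = result["contents"][0]   # Step 3: read the first content item
print(content["markdown"])
```

The real poller blocks the same way inside `.result()`; the stub only makes the three steps visible in one place.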
## Prebuilt Analyzers

| Analyzer | Content Type | Purpose |
|---|---|---|
| `prebuilt-documentSearch` | Documents | Extract markdown for RAG applications |
| `prebuilt-imageSearch` | Images | Extract content from images |
| `prebuilt-audioSearch` | Audio | Transcribe audio with timing |
| `prebuilt-videoSearch` | Video | Extract frames, transcripts, summaries |
| `prebuilt-invoice` | Documents | Extract invoice fields |
## Analyze Document

```python
import os

from azure.ai.contentunderstanding import ContentUnderstandingClient
from azure.ai.contentunderstanding.models import AnalyzeInput
from azure.identity import DefaultAzureCredential

endpoint = os.environ["CONTENTUNDERSTANDING_ENDPOINT"]
client = ContentUnderstandingClient(
    endpoint=endpoint, credential=DefaultAzureCredential()
)

# Analyze a document from a URL
poller = client.begin_analyze(
    analyzer_id="prebuilt-documentSearch",
    inputs=[AnalyzeInput(url="https://example.com/document.pdf")],
)
result = poller.result()

# Access markdown content (contents is a list)
content = result.contents[0]
print(content.markdown)
```
## Access Document Content Details

```python
from azure.ai.contentunderstanding.models import DocumentContent, MediaContentKind

content = result.contents[0]
if content.kind == MediaContentKind.DOCUMENT:
    document_content: DocumentContent = content  # type: ignore
    print(document_content.start_page_number)
```
## Analyze Image

```python
from azure.ai.contentunderstanding.models import AnalyzeInput

poller = client.begin_analyze(
    analyzer_id="prebuilt-imageSearch",
    inputs=[AnalyzeInput(url="https://example.com/image.jpg")],
)
result = poller.result()

content = result.contents[0]
print(content.markdown)
```
## Analyze Video

```python
from azure.ai.contentunderstanding.models import AnalyzeInput

poller = client.begin_analyze(
    analyzer_id="prebuilt-videoSearch",
    inputs=[AnalyzeInput(url="https://example.com/video.mp4")],
)
result = poller.result()

# Access video content (AudioVisualContent)
content = result.contents[0]

# Get transcript phrases with timing
for phrase in content.transcript_phrases:
    print(f"[{phrase.start_time} - {phrase.end_time}]: {phrase.text}")

# Get key frames (for video)
for frame in content.key_frames:
    print(f"Frame at {frame.time}: {frame.description}")
```
## Analyze Audio

```python
from azure.ai.contentunderstanding.models import AnalyzeInput

poller = client.begin_analyze(
    analyzer_id="prebuilt-audioSearch",
    inputs=[AnalyzeInput(url="https://example.com/audio.mp3")],
)
result = poller.result()

# Access the audio transcript
content = result.contents[0]
for phrase in content.transcript_phrases:
    print(f"[{phrase.start_time}] {phrase.text}")
```
## Custom Analyzers

Create custom analyzers with field schemas for specialized extraction:

```python
from azure.ai.contentunderstanding.models import AnalyzeInput

# Create a custom analyzer
analyzer = client.create_analyzer(
    analyzer_id="my-invoice-analyzer",
    analyzer={
        "description": "Custom invoice analyzer",
        "base_analyzer_id": "prebuilt-documentSearch",
        "field_schema": {
            "fields": {
                "vendor_name": {"type": "string"},
                "invoice_total": {"type": "number"},
                "line_items": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "description": {"type": "string"},
                            "amount": {"type": "number"},
                        },
                    },
                },
            }
        },
    },
)

# Use the custom analyzer
poller = client.begin_analyze(
    analyzer_id="my-invoice-analyzer",
    inputs=[AnalyzeInput(url="https://example.com/invoice.pdf")],
)
result = poller.result()

# Access the extracted fields
print(result.fields["vendor_name"])
print(result.fields["invoice_total"])
```
## Analyzer Management

```python
# List all analyzers
analyzers = client.list_analyzers()
for analyzer in analyzers:
    print(f"{analyzer.analyzer_id}: {analyzer.description}")

# Get a specific analyzer
analyzer = client.get_analyzer("prebuilt-documentSearch")

# Delete a custom analyzer
client.delete_analyzer("my-custom-analyzer")
```
## Async Client

```python
import asyncio
import os

from azure.ai.contentunderstanding.aio import ContentUnderstandingClient
from azure.ai.contentunderstanding.models import AnalyzeInput
from azure.identity.aio import DefaultAzureCredential


async def analyze_document():
    endpoint = os.environ["CONTENTUNDERSTANDING_ENDPOINT"]
    credential = DefaultAzureCredential()
    async with ContentUnderstandingClient(
        endpoint=endpoint, credential=credential
    ) as client:
        poller = await client.begin_analyze(
            analyzer_id="prebuilt-documentSearch",
            inputs=[AnalyzeInput(url="https://example.com/doc.pdf")],
        )
        result = await poller.result()
        content = result.contents[0]
        return content.markdown


asyncio.run(analyze_document())
```
## Content Types

| Class | For | Provides |
|---|---|---|
| `DocumentContent` | PDF, images, Office docs | Pages, tables, figures, paragraphs |
| `AudioVisualContent` | Audio, video files | Transcript phrases, timing, key frames |

Both derive from `MediaContent`, which provides basic info and a markdown representation.
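To illustrate the hierarchy, here is a minimal stand-in sketch. The class names mirror the SDK's, but these stripped-down versions exist only for demonstration and carry none of the real fields beyond those named:

```python
class MediaContent:
    """Base type: every content item carries a markdown representation."""

    def __init__(self, markdown):
        self.markdown = markdown


class DocumentContent(MediaContent):
    """Document results add page-level detail."""

    def __init__(self, markdown, start_page_number):
        super().__init__(markdown)
        self.start_page_number = start_page_number


class AudioVisualContent(MediaContent):
    """Audio/video results add timed transcript phrases."""

    def __init__(self, markdown, transcript_phrases):
        super().__init__(markdown)
        self.transcript_phrases = transcript_phrases


def summarize(content: MediaContent) -> str:
    # isinstance checks play the role of dispatching on content.kind
    if isinstance(content, DocumentContent):
        return f"document (page {content.start_page_number})"
    if isinstance(content, AudioVisualContent):
        return f"audio/visual ({len(content.transcript_phrases)} phrases)"
    return "generic media content"


print(summarize(DocumentContent("# Title", 1)))
```

In real code the same dispatch uses `content.kind` against `MediaContentKind`, as shown in the document-details example above.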
## Model Imports

```python
from azure.ai.contentunderstanding.models import (
    AnalyzeInput,
    AnalyzeResult,
    MediaContentKind,
    DocumentContent,
    AudioVisualContent,
)
```
## Client Types

| Client | Purpose |
|---|---|
| `azure.ai.contentunderstanding.ContentUnderstandingClient` | Sync client for all operations |
| `azure.ai.contentunderstanding.aio.ContentUnderstandingClient` | Async client for all operations |
## Best Practices

- Use `begin_analyze` with `AnalyzeInput`: this is the correct method signature.
- Access results via `result.contents[0]`: results are returned as a list.
- Use prebuilt analyzers for common scenarios (document/image/audio/video search).
- Create custom analyzers only for domain-specific field extraction.
- Use the async client for high-throughput scenarios, with `azure.identity.aio` credentials.
- Handle long-running operations: video/audio analysis can take minutes.
- Use URL sources when possible to avoid upload overhead.
## When to Use

Use this skill when you need to extract markdown, transcripts, key frames, or custom fields from documents, images, audio, or video with Azure AI Content Understanding, for RAG pipelines or automated workflows.