Awesome-omni-skill sarvam-ai-skills

Guide for building AI applications with Sarvam AI APIs for Indian languages. Use when working with speech-to-text transcription, text-to-speech synthesis, text translation, chat completion, or document intelligence. Covers models saarika:v2.5, saaras:v2.5/v3, bulbul:v3, mayura:v1, sarvam-translate:v1, sarvam-m, and sarvam-vision for 11-23 Indian languages. Trigger when user asks about Indian language AI, STT, TTS, translation, multilingual chatbots, voice assistants, or document processing.

install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data-ai/sarvam-ai-skills" ~/.claude/skills/diegosouzapw-awesome-omni-skill-sarvam-ai-skills && rm -rf "$T"
manifest: skills/data-ai/sarvam-ai-skills/SKILL.md

Sarvam AI Skills

Build AI applications with Sarvam AI APIs for Indian languages.

Overview

Sarvam AI provides specialized models for Indian language processing:

  • Speech-to-Text (saarika:v2.5, saaras:v3) - Transcribe audio in 11 languages (saaras:v3 has 5 modes)
  • Speech-to-Text-Translate (saaras:v2.5) - Transcribe and auto-translate to English
  • Text-to-Speech (bulbul:v3) - Natural speech with 45 voice options, temperature/pace control
  • Text Translation (mayura:v1, sarvam-translate:v1) - Translate between 11-22 Indian languages
  • Chat Completion (sarvam-m) - 24B parameter multilingual model
  • Document Intelligence (sarvam-vision) - 3B parameter VLM for document processing in 23 languages

Anatomy of a Skill

Every skill consists of a required SKILL.md file and optional bundled resources:

sarvam-ai-skills/
├── SKILL.md (required)
│   ├── YAML frontmatter metadata (required)
│   │   ├── name: sarvam-ai-skills
│   │   └── description: Guide for building AI applications...
│   └── Markdown instructions (required)
│       ├── API Overview
│       ├── Setup instructions
│       ├── Quick Start examples
│       └── Best practices
│
└── Bundled Resources (optional)
    ├── examples/          - Working Python code (STT, TTS, Translation, Chat)
    ├── templates/         - Documentation guides for skill creation
    └── assets/            - Configuration files (.env, requirements.txt)

SKILL.md (required)

The entrypoint file containing:

Frontmatter (YAML):

  • name
    - Skill identifier
  • description
    - When to use this skill
  • license
    - MIT license

Body (Markdown):

  • API overview and endpoints
  • Setup instructions
  • Quick start code examples
  • Best practices and patterns

Bundled Resources (optional)

examples/ - Executable Python scripts demonstrating each API:

  • speech_to_text.py
    - Transcription examples
  • text_to_speech.py
    - Speech generation examples
  • text_translation.py
    - Translation examples
  • chat_completion.py
    - Conversational AI examples
  • end_to_end_example.py
    - Multi-API workflow examples

templates/ - Reference documentation for skill creation:

  • speech-to-text-template.md
    - Complete STT guide
  • text-to-speech-template.md
    - Complete TTS guide
  • text-translation-template.md
    - Complete translation guide
  • chat-completion-template.md
    - Complete chat guide
  • skill-template.md
    - General skill framework

assets/ - Configuration and dependencies:

  • .env
    - API key storage
  • .env.example
    - Environment template
  • requirements.txt
    - Python dependencies

Setup

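# Shell: install the SDK and export your API key
pip install sarvamai
export SARVAM_API_KEY="your_key"

# Python: create a client from the environment variable
import os
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key=os.getenv("SARVAM_API_KEY"))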
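
Because os.getenv returns None when the variable is unset, a missing key may only surface later as an authentication error. A minimal sketch of a fail-fast check follows; require_api_key is a hypothetical helper introduced here for illustration, not part of the sarvamai SDK.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None,
                    name: str = "SARVAM_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper (not part of the sarvamai SDK). `env` defaults to
    os.environ and can be replaced with a plain dict in tests.
    """
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client")
    return key
```

It could then be used as SarvamAI(api_subscription_key=require_api_key()).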

Quick Start

Speech to Text

# File: examples/speech_to_text.py
with open("audio.wav", "rb") as f:
    response = client.speech_to_text.transcribe(
        file=f, language_code="hi-IN"
    )
print(response.transcript)
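
Before uploading, it can help to check the audio locally against the guidance in Best Practices (WAV format, under 25MB). The sketch below uses only the standard library; check_wav is a hypothetical helper, and treating 25MB as 25 * 1024 * 1024 bytes is an assumption.

```python
import os
import wave

# Assumption: the "25MB" guidance is interpreted as 25 * 1024 * 1024 bytes.
MAX_AUDIO_BYTES = 25 * 1024 * 1024


def check_wav(path: str, max_bytes: int = MAX_AUDIO_BYTES) -> dict:
    """Validate a WAV file locally and return basic facts about it.

    Raises ValueError if the file is too large or is not a readable WAV.
    """
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes; limit is {max_bytes}")
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            channels = wav.getnchannels()
    except (wave.Error, EOFError) as exc:
        raise ValueError(f"{path} is not a valid WAV file: {exc}") from exc
    return {
        "bytes": size,
        "channels": channels,
        "sample_rate": rate,
        "duration_s": frames / rate if rate else 0.0,
    }
```

Calling check_wav("audio.wav") before client.speech_to_text.transcribe(...) turns an oversized or corrupt file into a local error instead of a failed request.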

Speech to Text Translate

# File: examples/speech_to_text_translate.py
with open("audio.wav", "rb") as f:
    response = client.speech_to_text.translate(file=f)
print(response.translation)  # English output

Text to Speech

# File: examples/text_to_speech.py
# bulbul:v3 (default) - 45 speakers: aditya, shubh, ritu, priya, neha, rahul, pooja, and more
response = client.text_to_speech.convert(
    text="नमस्ते", target_language_code="hi-IN",
    speaker="shubh", pace=1.0, temperature=0.6
)
# Decode: base64.b64decode(response.audios[0])
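
The decode step in the comment above can be sketched as a small helper that writes the audio to disk. save_audio is a hypothetical name, and the sketch assumes only what the example shows: that each entry in response.audios is a base64-encoded string. The container format of the decoded bytes is not stated in this guide, so the file extension is left to the caller.

```python
import base64
from pathlib import Path


def save_audio(audio_b64: str, out_path: str) -> int:
    """Decode one base64 audio string and write it to out_path.

    Returns the number of bytes written. Hypothetical helper; the input is
    assumed to be one element of response.audios from the example above.
    """
    raw = base64.b64decode(audio_b64)
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(raw)
    return len(raw)
```

With the response from the example above, this would be called as save_audio(response.audios[0], "output.wav").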

Text Translation

# File: examples/text_translation.py
response = client.text.translate(
    input="Hello", source_language_code="en-IN", target_language_code="hi-IN"
)
print(response.translated_text)

Chat Completion

# File: examples/chat_completion.py
response = client.chat.completions(
    messages=[{"role": "user", "content": "What is AI?"}],
    temperature=0.7
)
print(response.choices[0].message.content)

Document Intelligence

# File: examples/document_intelligence.py
# Process documents in 23 languages (22 Indian + English)
# Supports PDF, ZIP; outputs HTML or Markdown only (as ZIP)
import os

from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key=os.getenv("SARVAM_API_KEY"))

# Step 1: Create a document intelligence job
job = client.document_intelligence.create_job(
    language="hi-IN",
    output_format="md"
)
print(f"Job created: {job.job_id}")

# Step 2: Upload document
job.upload_file("document.pdf")
print("File uploaded")

# Step 3: Start processing
job.start()
print("Job started")

# Step 4: Wait for completion
status = job.wait_until_complete()
print(f"Job completed with state: {status.job_state}")

# Step 5: Get processing metrics
metrics = job.get_page_metrics()
print(f"Page metrics: {metrics}")

# Step 6: Download output (ZIP file containing the processed document)
job.download_output("./output.zip")
print("Output saved to ./output.zip")
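
Since the job output arrives as a ZIP, a follow-up step is usually needed to reach the Markdown or HTML inside. The sketch below uses the standard zipfile module; extract_documents is a hypothetical helper, and the member names inside the archive are not documented here, so it simply selects files by extension.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_documents(zip_path: str, dest_dir: str) -> List[Path]:
    """Extract .md and .html members from an output ZIP.

    Returns the extracted paths in sorted order. Hypothetical helper; the
    archive layout is an assumption, so members are matched by extension.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith((".md", ".html")):
                # ZipFile.extract sanitizes member paths and returns the final path
                extracted.append(Path(archive.extract(member, dest)))
    return sorted(extracted)
```

After job.download_output("./output.zip"), calling extract_documents("./output.zip", "./output") would return the processed document files.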

Document Intelligence - Batch Processing (Large PDFs)

# File: examples/document_intelligence_batch.py
# Process large PDFs by splitting into chunks and merging results
# Automatically handles PDFs of any size (small, medium, large)
from document_intelligence_batch import process_large_pdf

# Automatically chooses best strategy:
# ≤5 pages: Direct processing (no splitting)
# >5 pages: Split into 5-page chunks, process, and merge
output = process_large_pdf(
    input_pdf="large_document.pdf",  # 25 pages → 5 chunks
    language="hi-IN",
    output_format="md",  # or "html"
    pages_per_chunk=5,
    cleanup=True
)
# Output: large_document_merged.md (all chunks merged in order)
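
The chunking rule described in the comments (5 pages or fewer processed directly, larger PDFs split into 5-page chunks) can be illustrated with a page-range planner. This is a sketch of the arithmetic only, not the actual implementation of process_large_pdf; plan_chunks is a hypothetical function, and page numbers are assumed to be 1-based and inclusive.

```python
from typing import List, Tuple


def plan_chunks(total_pages: int, pages_per_chunk: int = 5) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (first_page, last_page) ranges for each chunk.

    A document with total_pages <= pages_per_chunk yields a single range,
    which corresponds to direct processing without splitting.
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return [
        (start, min(start + pages_per_chunk - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_chunk)
    ]
```

For the 25-page example above, plan_chunks(25) yields five ranges, and the last chunk of a 12-page document covers only pages 11-12.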

Vision

# File: examples/vision.py
# Image analysis: captioning, OCR, markdown extraction
# Supports 23 languages (22 Indian + English)
import os

import requests

# Option 1: Generate caption in Hindi
files = {"file": ("image.jpg", open("image.jpg", "rb"), "image/jpeg")}
data = {"prompt_type": "caption_in", "language": "hi-IN"}
response = requests.post(
    "https://api.sarvam.ai/vision",
    headers={"API-Subscription-Key": os.getenv("SARVAM_API_KEY")},
    files=files,
    data=data
)
print(response.json()['content'])  # "एक सुंदर पहाड़ी दृश्य"

# Option 2: Extract text (OCR)
files = {"file": ("document.jpg", open("document.jpg", "rb"), "image/jpeg")}
data = {"prompt_type": "default_ocr"}
response = requests.post(
    "https://api.sarvam.ai/vision",
    headers={"API-Subscription-Key": os.getenv("SARVAM_API_KEY")},
    files=files,
    data=data
)
print(response.json()['content'])

# Option 3: Convert to markdown
files = {"file": ("slide.jpg", open("slide.jpg", "rb"), "image/jpeg")}
data = {"prompt_type": "extract_as_markdown"}
response = requests.post(
    "https://api.sarvam.ai/vision",
    headers={"API-Subscription-Key": os.getenv("SARVAM_API_KEY")},
    files=files,
    data=data
)
print(response.json()['content'])

Supported Languages

Core 11 languages (all models): hi-IN, en-IN, bn-IN, gu-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN

Extended 22 languages (sarvam-translate:v1): + as-IN, brx-IN, doi-IN, kok-IN, ks-IN, mai-IN, mni-IN, ne-IN, sa-IN, sat-IN, sd-IN, ur-IN
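
The two language lists above can drive the choice of translation model. The sketch below is one possible rule, not an official one: pick_translation_model is a hypothetical helper, and preferring mayura:v1 whenever both languages are in the core set is an assumption.

```python
# Language codes copied from the lists above.
CORE_LANGUAGES = frozenset({
    "hi-IN", "en-IN", "bn-IN", "gu-IN", "kn-IN", "ml-IN",
    "mr-IN", "od-IN", "pa-IN", "ta-IN", "te-IN",
})
EXTENDED_LANGUAGES = frozenset({
    "as-IN", "brx-IN", "doi-IN", "kok-IN", "ks-IN", "mai-IN",
    "mni-IN", "ne-IN", "sa-IN", "sat-IN", "sd-IN", "ur-IN",
})


def pick_translation_model(source: str, target: str) -> str:
    """Choose a translation model name for a language pair.

    Assumption: mayura:v1 is used when both codes are core languages,
    and sarvam-translate:v1 when either code is in the extended set.
    """
    codes = {source, target}
    unknown = codes - CORE_LANGUAGES - EXTENDED_LANGUAGES
    if unknown:
        raise ValueError(f"unsupported language code(s): {sorted(unknown)}")
    if codes <= CORE_LANGUAGES:
        return "mayura:v1"
    return "sarvam-translate:v1"
```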

Document Intelligence & Vision (23 languages) - sarvam-vision model supports all 22 Indian languages + English with their native scripts:

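| Language | Code | Script | Language | Code | Script |
|---|---|---|---|---|---|
| Hindi | hi-IN | Devanagari | Assamese | as-IN | Assamese |
| Bengali | bn-IN | Bengali | Urdu | ur-IN | Perso-Arabic |
| Tamil | ta-IN | Tamil | Sanskrit | sa-IN | Devanagari |
| Telugu | te-IN | Telugu | Nepali | ne-IN | Devanagari |
| Marathi | mr-IN | Devanagari | Konkani | kok-IN | Devanagari |
| Gujarati | gu-IN | Gujarati | Maithili | mai-IN | Devanagari |
| Kannada | kn-IN | Kannada | Sindhi | sd-IN | Devanagari/Arabic |
| Malayalam | ml-IN | Malayalam | Kashmiri | ks-IN | Perso-Arabic |
| Odia | od-IN | Odia | Dogri | doi-IN | Devanagari |
| Punjabi | pa-IN | Gurmukhi | Manipuri | mni-IN | Meetei Mayek |
| English | en-IN | Latin | Bodo | brx-IN | Devanagari |
| | | | Santali | sat-IN | Ol Chiki |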

Repository Structure

sarvam-skills/
├── SKILL.md                          # This file - Skill definition
├── FILE_STRUCTURE.md                 # Detailed file tree
├── README.md                         # Main documentation
├── CONTRIBUTING.md                   # Contribution guidelines
├── requirements.txt                  # Python dependencies
├── .env                             # Your API key (configured)
├── .env.example                     # Environment template
│
├── examples/                        # Working code examples
│   ├── speech_to_text.py            # STT: Transcribe audio (saarika:v2.5, saaras:v3)
│   ├── speech_to_text_translate.py  # STT-Translate: Audio → English (saaras:v2.5)
│   ├── text_to_speech.py            # TTS: Text → Audio (bulbul:v3)
│   ├── text_translation.py          # Translation: 11-22 languages
│   ├── chat_completion.py           # Chat: sarvam-m model
│   ├── document_intelligence.py     # Document: Process small docs (≤5 pages)
│   ├── document_intelligence_batch.py  # Document: Batch processing for large PDFs
│   ├── end_to_end_example.py        # Multi-API workflows
│   └── README.md                    # Examples documentation
│
└── templates/                       # Skill creation guides
    ├── API_VERSIONS.md              # ✅ Cross-verified API endpoints (Feb 2026)
    ├── speech-to-text-template.md   # STT skill creation
    ├── text-to-speech-template.md   # TTS skill creation
    ├── text-translation-template.md # Translation skills
    ├── chat-completion-template.md  # Chat AI skills
    ├── document-intelligence-template.md  # Document processing skills
    ├── skill-template.md            # General skill template
    └── README.md                    # Templates overview

Key locations:

  • examples/
    - 8 working Python scripts demonstrating each API
  • templates/
    - 7 comprehensive guides (including API_VERSIONS.md)
  • templates/API_VERSIONS.md
    - Cross-verified API endpoints and versions
  • .env
    - Your API key (already configured)
  • requirements.txt
    - Python dependencies

Common Workflows

Multilingual Chatbot (STT → Translate → Chat → Translate → TTS)

See: examples/end_to_end_example.py (lines 87-130)
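
The shape of this workflow can be sketched with the five stages passed in as plain callables, so the flow can be tested without network access. This is not the code in examples/end_to_end_example.py; chatbot_turn and its parameter names are hypothetical, and pivoting through English (en-IN) for the chat step is an assumption about what the two translate steps do.

```python
from typing import Callable, Dict, Union


def chatbot_turn(
    audio: bytes,
    user_lang: str,
    *,
    transcribe: Callable[[bytes, str], str],
    translate: Callable[[str, str, str], str],
    chat: Callable[[str], str],
    synthesize: Callable[[str, str], bytes],
    pivot_lang: str = "en-IN",
) -> Dict[str, Union[str, bytes]]:
    """Run one STT -> Translate -> Chat -> Translate -> TTS turn.

    Each stage is injected, so real SDK calls or test stubs can be used.
    Translation is skipped when the user already speaks the pivot language.
    """
    user_text = transcribe(audio, user_lang)
    needs_translation = user_lang != pivot_lang
    query = translate(user_text, user_lang, pivot_lang) if needs_translation else user_text
    answer = chat(query)
    reply_text = translate(answer, pivot_lang, user_lang) if needs_translation else answer
    reply_audio = synthesize(reply_text, user_lang)
    return {"user_text": user_text, "reply_text": reply_text, "reply_audio": reply_audio}
```

In a real application each callable would wrap the matching client call from the Quick Start section, for example a function that calls client.text.translate and returns response.translated_text.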

Content Localization (Multi-language + Audio)

See: examples/text_translation.py (Example 2, lines 60-90)

Voice Assistant (Voice → Text → AI → Voice)

See: examples/end_to_end_example.py (lines 133-180)

Model Selection

Speech-to-Text: saarika:v2.5 (standard), saaras:v3 (5 modes: transcribe, translate, verbatim, translit, codemix)
Text-to-Speech: bulbul:v3 (45 speakers, temperature/pace control)
Translation: mayura:v1 (11 languages), sarvam-translate:v1 (22 languages)
Chat: sarvam-m only (24B parameters)
Document Intelligence: sarvam-vision (3B VLM, 23 languages, PDF/PNG/JPG input)

Best Practices

API Key: Use environment variables (os.getenv("SARVAM_API_KEY"))
Error Handling: Wrap API calls in try-except
Audio: Use WAV format, keep under 25MB
Translation: Use auto-detection when source unknown
Performance: Cache translations, batch requests
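
Two of these practices, error handling and caching, can be sketched with generic wrappers. Both names below (call_with_retries, TranslationCache) are hypothetical and independent of the sarvamai SDK. The retry helper catches Exception broadly only for illustration; real code would more likely catch the SDK's specific error types.

```python
import time
from typing import Callable, Dict, Tuple, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], attempts: int = 3,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying on failure with exponential backoff.

    Delays are base_delay, 2 * base_delay, 4 * base_delay, ... and the last
    failure is re-raised once all attempts are used.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


class TranslationCache:
    """In-memory cache keyed by (text, source language, target language)."""

    def __init__(self, translate_fn: Callable[[str, str, str], str]) -> None:
        self._translate = translate_fn
        self._store: Dict[Tuple[str, str, str], str] = {}

    def get(self, text: str, source: str, target: str) -> str:
        key = (text, source, target)
        if key not in self._store:
            self._store[key] = self._translate(text, source, target)
        return self._store[key]
```

A translate_fn for the cache could wrap client.text.translate from the Quick Start section, and that wrapper could itself be invoked through call_with_retries.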

Bundled Resources

Examples (examples/)

Working Python scripts for each API endpoint:

  • speech_to_text.py
    - Audio transcription
  • speech_to_text_translate.py
    - Transcribe + translate
  • text_to_speech.py
    - Speech generation
  • text_translation.py
    - Text translation
  • chat_completion.py
    - Conversational AI
  • document_intelligence.py
    - Document processing (small PDFs, ≤5 pages)
  • document_intelligence_batch.py
    - Batch processing for large PDFs (any size)
  • end_to_end_example.py
    - Multi-API workflows

See examples/README.md for detailed documentation.

Templates (templates/)

Comprehensive guides for building skills:

  • API_VERSIONS.md
    - ✅ Cross-verified API endpoints (Feb 2026)
  • speech-to-text-template.md
    - STT skill creation
  • text-to-speech-template.md
    - TTS skill creation
  • text-translation-template.md
    - Translation skills
  • chat-completion-template.md
    - Chat AI skills
  • document-intelligence-template.md
    - Document processing skills
  • skill-template.md
    - General skill template

See templates/README.md and templates/API_VERSIONS.md for details.

Structure Reference

See FILE_STRUCTURE.md for complete file tree with sizes and use cases.

Resources