Awesome-omni-skill sarvam-ai-skills
Guide for building AI applications with Sarvam AI APIs for Indian languages. Use when working with speech-to-text transcription, text-to-speech synthesis, text translation, chat completion, or document intelligence. Covers models saarika:v2.5, saaras:v2.5/v3, bulbul:v3, mayura:v1, sarvam-translate:v1, sarvam-m, and sarvam-vision for 11-23 Indian languages. Trigger when user asks about Indian language AI, STT, TTS, translation, multilingual chatbots, voice assistants, or document processing.
git clone https://github.com/diegosouzapw/awesome-omni-skill
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data-ai/sarvam-ai-skills" ~/.claude/skills/diegosouzapw-awesome-omni-skill-sarvam-ai-skills && rm -rf "$T"
Sarvam AI Skills
Build AI applications with Sarvam AI APIs for Indian languages.
Overview
Sarvam AI provides specialized models for Indian language processing:
- Speech-to-Text (saarika:v2.5, saaras:v3) - Transcribe audio in 11 languages (saaras:v3 has 5 modes)
- Speech-to-Text-Translate (saaras:v2.5) - Transcribe and auto-translate to English
- Text-to-Speech (bulbul:v3) - Natural speech with 45 voice options, temperature/pace control
- Text Translation (mayura:v1, sarvam-translate:v1) - Translate between 11-22 Indian languages
- Chat Completion (sarvam-m) - 24B parameter multilingual model
- Document Intelligence (sarvam-vision) - 3B parameter VLM for document processing in 23 languages
Anatomy of a Skill
Every skill consists of a required SKILL.md file and optional bundled resources:
    sarvam-ai-skills/
    ├── SKILL.md (required)
    │   ├── YAML frontmatter metadata (required)
    │   │   ├── name: sarvam-ai-skills
    │   │   └── description: Guide for building AI applications...
    │   └── Markdown instructions (required)
    │       ├── API Overview
    │       ├── Setup instructions
    │       ├── Quick Start examples
    │       └── Best practices
    │
    └── Bundled Resources (optional)
        ├── examples/ - Working Python code (STT, TTS, Translation, Chat)
        ├── templates/ - Documentation guides for skill creation
        └── assets/ - Configuration files (.env, requirements.txt)
SKILL.md (required)
The entrypoint file containing:
Frontmatter (YAML):
- name - Skill identifier
- description - When to use this skill
- license - MIT license
Body (Markdown):
- API overview and endpoints
- Setup instructions
- Quick start code examples
- Best practices and patterns
Bundled Resources (optional)
examples/ - Executable Python scripts demonstrating each API:
- speech_to_text.py - Transcription examples
- text_to_speech.py - Speech generation examples
- text_translation.py - Translation examples
- chat_completion.py - Conversational AI examples
- end_to_end_example.py - Multi-API workflow examples
templates/ - Reference documentation for skill creation:
- speech-to-text-template.md - Complete STT guide
- text-to-speech-template.md - Complete TTS guide
- text-translation-template.md - Complete translation guide
- chat-completion-template.md - Complete chat guide
- skill-template.md - General skill framework
assets/ - Configuration and dependencies:
- .env - API key storage
- .env.example - Environment template
- requirements.txt - Python dependencies
Setup
    pip install sarvamai
    export SARVAM_API_KEY="your_key"

Then initialize the client in Python:

    from sarvamai import SarvamAI
    import os

    client = SarvamAI(api_subscription_key=os.getenv("SARVAM_API_KEY"))
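If the variable is unset, `os.getenv` returns `None` and the problem only surfaces at the first API call. A minimal sketch of a fail-fast check, assuming the key lives in `SARVAM_API_KEY` as above; `require_api_key` is a hypothetical helper for illustration, not part of the sarvamai SDK.

```python
import os


def require_api_key(env=None, name="SARVAM_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper (not part of the sarvamai SDK). `env` defaults to
    os.environ and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client")
    return key


# Usage (assumes the sarvamai package is installed):
# client = SarvamAI(api_subscription_key=require_api_key())
```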
Quick Start
Speech to Text
    # File: examples/speech_to_text.py
    with open("audio.wav", "rb") as f:
        response = client.speech_to_text.transcribe(
            file=f,
            language_code="hi-IN"
        )
    print(response.transcript)
Speech to Text Translate
    # File: examples/speech_to_text_translate.py
    with open("audio.wav", "rb") as f:
        response = client.speech_to_text.translate(file=f)
    print(response.translation)  # English output
Text to Speech
    # File: examples/text_to_speech.py
    # bulbul:v3 (default) - 45 speakers: aditya, shubh, ritu, priya, neha, rahul, pooja, and more
    response = client.text_to_speech.convert(
        text="नमस्ते",
        target_language_code="hi-IN",
        speaker="shubh",
        pace=1.0,
        temperature=0.6
    )
    # Decode: base64.b64decode(response.audios[0])
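The decode step mentioned in the last comment can be sketched as follows. This assumes `response.audios` is a list of base64-encoded strings, as that comment implies; `save_first_audio` is a hypothetical helper name, and the output container format is not verified here.

```python
import base64
from pathlib import Path


def save_first_audio(audios, out_path):
    """Decode the first base64 audio string and write the bytes to out_path.

    Assumes `audios` is a list of base64 strings (for example
    response.audios). Returns the number of bytes written.
    """
    if not audios:
        raise ValueError("no audio returned")
    data = base64.b64decode(audios[0])
    Path(out_path).write_bytes(data)
    return len(data)


# Usage with the response from the snippet above:
# save_first_audio(response.audios, "output.wav")
```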
Text Translation
    # File: examples/text_translation.py
    response = client.text.translate(
        input="Hello",
        source_language_code="en-IN",
        target_language_code="hi-IN"
    )
    print(response.translated_text)
Chat Completion
    # File: examples/chat_completion.py
    response = client.chat.completions(
        messages=[{"role": "user", "content": "What is AI?"}],
        temperature=0.7
    )
    print(response.choices[0].message.content)
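For multi-turn conversations, the `messages` list has to carry the earlier turns. A minimal sketch of that bookkeeping, with the model call injected as a plain callable so it runs offline; `chat_turn` and `complete` are hypothetical names, and wrapping `client.chat.completions` in `complete` is an assumption based on the snippet above.

```python
def chat_turn(history, user_text, complete):
    """Append a user message, call the model, and record its reply.

    `history` is a list of {"role": ..., "content": ...} dicts that is
    updated in place. `complete` is any callable that takes the message
    list and returns the assistant's reply text.
    """
    history.append({"role": "user", "content": user_text})
    reply = complete(list(history))  # pass a copy so the callee cannot mutate history
    history.append({"role": "assistant", "content": reply})
    return reply


if __name__ == "__main__":
    # Stub model so the sketch runs without network access.
    def echo_model(messages):
        return f"You said: {messages[-1]['content']}"

    history = []
    print(chat_turn(history, "What is AI?", echo_model))
    print(chat_turn(history, "Give an example.", echo_model))
```

With the SDK, `complete` could be something like `lambda msgs: client.chat.completions(messages=msgs).choices[0].message.content`, again under the assumption that the call shape matches the example above.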
Document Intelligence
    # File: examples/document_intelligence.py
    # Process documents in 23 languages (22 Indian + English)
    # Supports PDF, ZIP; outputs HTML or Markdown only (as ZIP)
    import os
    from sarvamai import SarvamAI

    client = SarvamAI(api_subscription_key=os.getenv("SARVAM_API_KEY"))

    # Step 1: Create a document intelligence job
    job = client.document_intelligence.create_job(
        language="hi-IN",
        output_format="md"
    )
    print(f"Job created: {job.job_id}")

    # Step 2: Upload document
    job.upload_file("document.pdf")
    print("File uploaded")

    # Step 3: Start processing
    job.start()
    print("Job started")

    # Step 4: Wait for completion
    status = job.wait_until_complete()
    print(f"Job completed with state: {status.job_state}")

    # Step 5: Get processing metrics
    metrics = job.get_page_metrics()
    print(f"Page metrics: {metrics}")

    # Step 6: Download output (ZIP file containing the processed document)
    job.download_output("./output.zip")
    print("Output saved to ./output.zip")
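Because the output arrives as a ZIP, one more step is needed to get at the Markdown or HTML. A minimal sketch using only the standard library; the archive layout is an assumption (any member ending in `.md` or `.html` is treated as processed output), and `read_output_documents` is a hypothetical helper name.

```python
import zipfile


def read_output_documents(zip_path, suffixes=(".md", ".html")):
    """Return {member_name: text} for document files inside the output ZIP.

    Assumption: processed output is stored as UTF-8 text files whose names
    end in one of `suffixes`; other members (images, metadata) are skipped.
    """
    wanted = tuple(suffixes)
    documents = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().endswith(wanted):
                documents[name] = archive.read(name).decode("utf-8")
    return documents


# Usage after job.download_output("./output.zip"):
# for name, text in read_output_documents("./output.zip").items():
#     print(name, len(text))
```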
Document Intelligence - Batch Processing (Large PDFs)
    # File: examples/document_intelligence_batch.py
    # Process large PDFs by splitting into chunks and merging results
    # Automatically handles PDFs of any size (small, medium, large)
    from document_intelligence_batch import process_large_pdf

    # Automatically chooses best strategy:
    #   ≤5 pages: Direct processing (no splitting)
    #   >5 pages: Split into 5-page chunks, process, and merge
    output = process_large_pdf(
        input_pdf="large_document.pdf",  # 25 pages → 5 chunks
        language="hi-IN",
        output_format="md",  # or "html"
        pages_per_chunk=5,
        cleanup=True
    )
    # Output: large_document_merged.md (all chunks merged in order)
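The chunking arithmetic described in the comments can be sketched as below. This is not the implementation of `process_large_pdf`, which is not shown here; `plan_chunks` is a hypothetical helper that only computes the page ranges.

```python
def plan_chunks(total_pages, pages_per_chunk=5):
    """Return 1-based inclusive (start, end) page ranges covering the PDF.

    A document with at most `pages_per_chunk` pages yields a single range,
    which corresponds to direct processing without splitting.
    """
    if total_pages < 1 or pages_per_chunk < 1:
        raise ValueError("total_pages and pages_per_chunk must be positive")
    return [
        (start, min(start + pages_per_chunk - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_chunk)
    ]


if __name__ == "__main__":
    print(plan_chunks(25))  # five ranges of five pages each
    print(plan_chunks(12))  # the last range is shorter
```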
Vision
    # File: examples/vision.py
    # Image analysis: captioning, OCR, markdown extraction
    # Supports 23 languages (22 Indian + English)
    import os
    import requests

    headers = {"API-Subscription-Key": os.getenv("SARVAM_API_KEY")}

    # Option 1: Generate caption in Hindi
    with open("image.jpg", "rb") as f:
        response = requests.post(
            "https://api.sarvam.ai/vision",
            headers=headers,
            files={"file": ("image.jpg", f, "image/jpeg")},
            data={"prompt_type": "caption_in", "language": "hi-IN"},
        )
    print(response.json()['content'])  # "एक सुंदर पहाड़ी दृश्य"

    # Option 2: Extract text (OCR)
    with open("document.jpg", "rb") as f:
        response = requests.post(
            "https://api.sarvam.ai/vision",
            headers=headers,
            files={"file": ("document.jpg", f, "image/jpeg")},
            data={"prompt_type": "default_ocr"},
        )
    print(response.json()['content'])

    # Option 3: Convert to markdown
    with open("slide.jpg", "rb") as f:
        response = requests.post(
            "https://api.sarvam.ai/vision",
            headers=headers,
            files={"file": ("slide.jpg", f, "image/jpeg")},
            data={"prompt_type": "extract_as_markdown"},
        )
    print(response.json()['content'])
Supported Languages
Core 11 languages (all models): hi-IN, en-IN, bn-IN, gu-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN
Extended 22 languages (sarvam-translate:v1): + as-IN, brx-IN, doi-IN, kok-IN, ks-IN, mai-IN, mni-IN, ne-IN, sa-IN, sat-IN, sd-IN, ur-IN
Document Intelligence & Vision (23 languages) - sarvam-vision model supports all 22 Indian languages + English with their native scripts:
| Language | Code | Script | Language | Code | Script |
|---|---|---|---|---|---|
| Hindi | hi-IN | Devanagari | Assamese | as-IN | Assamese |
| Bengali | bn-IN | Bengali | Urdu | ur-IN | Perso-Arabic |
| Tamil | ta-IN | Tamil | Sanskrit | sa-IN | Devanagari |
| Telugu | te-IN | Telugu | Nepali | ne-IN | Devanagari |
| Marathi | mr-IN | Devanagari | Konkani | kok-IN | Devanagari |
| Gujarati | gu-IN | Gujarati | Maithili | mai-IN | Devanagari |
| Kannada | kn-IN | Kannada | Sindhi | sd-IN | Devanagari/Arabic |
| Malayalam | ml-IN | Malayalam | Kashmiri | ks-IN | Perso-Arabic |
| Odia | od-IN | Odia | Dogri | doi-IN | Devanagari |
| Punjabi | pa-IN | Gurmukhi | Manipuri | mni-IN | Meetei Mayek |
| English | en-IN | Latin | Bodo | brx-IN | Devanagari |
| Santali | sat-IN | Ol Chiki | | | |
Repository Structure
    sarvam-skills/
    ├── SKILL.md                              # This file - Skill definition
    ├── FILE_STRUCTURE.md                     # Detailed file tree
    ├── README.md                             # Main documentation
    ├── CONTRIBUTING.md                       # Contribution guidelines
    ├── requirements.txt                      # Python dependencies
    ├── .env                                  # Your API key (configured)
    ├── .env.example                          # Environment template
    │
    ├── examples/                             # Working code examples
    │   ├── speech_to_text.py                 # STT: Transcribe audio (saarika:v2.5, saaras:v3)
    │   ├── speech_to_text_translate.py       # STT-Translate: Audio → English (saaras:v2.5)
    │   ├── text_to_speech.py                 # TTS: Text → Audio (bulbul:v3)
    │   ├── text_translation.py               # Translation: 11-22 languages
    │   ├── chat_completion.py                # Chat: sarvam-m model
    │   ├── document_intelligence.py          # Document: Process small docs (≤5 pages)
    │   ├── document_intelligence_batch.py    # Document: Batch processing for large PDFs
    │   ├── end_to_end_example.py             # Multi-API workflows
    │   └── README.md                         # Examples documentation
    │
    └── templates/                            # Skill creation guides
        ├── API_VERSIONS.md                   # ✅ Cross-verified API endpoints (Feb 2026)
        ├── speech-to-text-template.md        # STT skill creation
        ├── text-to-speech-template.md        # TTS skill creation
        ├── text-translation-template.md      # Translation skills
        ├── chat-completion-template.md       # Chat AI skills
        ├── document-intelligence-template.md # Document processing skills
        ├── skill-template.md                 # General skill template
        └── README.md                         # Templates overview
Key locations:
- examples/ - 6 working Python scripts demonstrating each API
- templates/ - 6 comprehensive guides (including API_VERSIONS.md)
- templates/API_VERSIONS.md - Cross-verified API endpoints and versions
- .env - Your API key (already configured)
- requirements.txt - Python dependencies
Common Workflows
Multilingual Chatbot (STT → Translate → Chat → Translate → TTS)
See: examples/end_to_end_example.py (lines 87-130)
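The shape of this chain can be sketched with each stage injected as a callable, so the flow is testable without network access. This is not the code in end_to_end_example.py, which is not shown here; the function name, the stage signatures, and pivoting through `en-IN` for the chat step are all assumptions.

```python
def multilingual_chat_turn(audio, user_lang, stt, translate, chat, tts):
    """Run one STT -> Translate -> Chat -> Translate -> TTS turn.

    Hypothetical stage signatures:
      stt(audio, lang) -> text
      translate(text, source_lang, target_lang) -> text
      chat(text) -> text
      tts(text, lang) -> audio
    """
    text = stt(audio, user_lang)                       # speech to text in the user's language
    english = translate(text, user_lang, "en-IN")      # pivot to English for the chat model
    answer = chat(english)                             # model reply in English
    localized = translate(answer, "en-IN", user_lang)  # back to the user's language
    return tts(localized, user_lang)                   # synthesize the spoken reply
```

In practice each callable would wrap the corresponding SDK call from the Quick Start section; keeping them as parameters makes it easy to swap in stubs for tests.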
Content Localization (Multi-language + Audio)
See: examples/text_translation.py (Example 2, lines 60-90)
Voice Assistant (Voice → Text → AI → Voice)
See: examples/end_to_end_example.py (lines 133-180)
Model Selection
Speech-to-Text: saarika:v2.5 (standard), saaras:v3 (5 modes: transcribe, translate, verbatim, translit, codemix)
Text-to-Speech: bulbul:v3 (45 speakers, temperature/pace control)
Translation: mayura:v1 (11 languages), sarvam-translate:v1 (22 languages)
Chat: sarvam-m only (24B parameters)
Document Intelligence: sarvam-vision (3B VLM, 23 languages, PDF/PNG/JPG input)
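For translation, the choice between the two models follows from the language lists in Supported Languages. A minimal sketch, assuming mayura:v1 is preferred whenever both languages are in the core set (that preference is an assumption, not something stated above); `pick_translation_model` is a hypothetical helper.

```python
# Language codes copied from the Supported Languages section.
CORE_LANGUAGES = {
    "hi-IN", "en-IN", "bn-IN", "gu-IN", "kn-IN", "ml-IN",
    "mr-IN", "od-IN", "pa-IN", "ta-IN", "te-IN",
}
EXTENDED_ONLY = {
    "as-IN", "brx-IN", "doi-IN", "kok-IN", "ks-IN", "mai-IN",
    "mni-IN", "ne-IN", "sa-IN", "sat-IN", "sd-IN", "ur-IN",
}


def pick_translation_model(source, target):
    """Return a model name able to handle both language codes.

    Assumption: mayura:v1 is used when both codes are in the core set,
    and sarvam-translate:v1 otherwise. Auto-detected sources are not handled.
    """
    codes = {source, target}
    if codes <= CORE_LANGUAGES:
        return "mayura:v1"
    if codes <= CORE_LANGUAGES | EXTENDED_ONLY:
        return "sarvam-translate:v1"
    raise ValueError(f"unsupported language code(s): {sorted(codes - CORE_LANGUAGES - EXTENDED_ONLY)}")
```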
Best Practices
API Key: Use environment variables (os.getenv("SARVAM_API_KEY"))
Error Handling: Wrap API calls in try-except
Audio: Use WAV format, keep under 25MB
Translation: Use auto-detection when source unknown
Performance: Cache translations, batch requests
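Two of these practices can be sketched with the standard library alone: a pre-upload check for the audio guidance, and a retry wrapper for error handling. Both helpers are hypothetical. Catching a broad `Exception` is an assumption made because the SDK's exception types are not listed here; narrow it to the specific errors you expect.

```python
import os
import time

MAX_AUDIO_BYTES = 25 * 1024 * 1024  # "under 25MB" from the list above


def check_audio_file(path):
    """Return the file size, or raise ValueError if it is not a small .wav."""
    if not path.lower().endswith(".wav"):
        raise ValueError(f"expected a .wav file, got {path!r}")
    size = os.path.getsize(path)
    if size >= MAX_AUDIO_BYTES:
        raise ValueError(f"{path!r} is {size} bytes; keep audio under 25MB")
    return size


def call_with_retry(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait with exponential backoff and try again.

    Re-raises the last exception once `attempts` calls have failed.
    `sleep` is injectable so tests do not have to wait.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Usage (assumes `client` from Setup):
# check_audio_file("audio.wav")
# result = call_with_retry(lambda: client.text.translate(
#     input="Hello", source_language_code="en-IN", target_language_code="hi-IN"))
```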
Bundled Resources
Examples (examples/)
Working Python scripts for each API endpoint:
- speech_to_text.py - Audio transcription
- speech_to_text_translate.py - Transcribe + translate
- text_to_speech.py - Speech generation
- text_translation.py - Text translation
- chat_completion.py - Conversational AI
- document_intelligence.py - Document processing (small PDFs, ≤5 pages)
- document_intelligence_batch.py - Batch processing for large PDFs (any size)
- end_to_end_example.py - Multi-API workflows
See examples/README.md for detailed documentation.
Templates (templates/)
Comprehensive guides for building skills:
- API_VERSIONS.md - ✅ Cross-verified API endpoints (Feb 2026)
- speech-to-text-template.md - STT skill creation
- text-to-speech-template.md - TTS skill creation
- text-translation-template.md - Translation skills
- chat-completion-template.md - Chat AI skills
- document-intelligence-template.md - Document processing skills
- skill-template.md - General skill template
See templates/README.md and templates/API_VERSIONS.md for details.
Structure Reference
See FILE_STRUCTURE.md for complete file tree with sizes and use cases.
Resources
- Docs: https://docs.sarvam.ai
- Dashboard: https://dashboard.sarvam.ai
- Discord: https://discord.com/invite/5rAsykttcs
- Python SDK: https://pypi.org/project/sarvamai/