Indic-ai-skills sarvam-ai

Indian AI toolkit powered by Sarvam AI — text-to-speech, speech-to-text, document intelligence, translation, transliteration, language detection, and chat completion across 23 Indian languages. Use when working with Indian languages, Hindi/Tamil/Bengali text, Sarvam AI, or when the user needs translation, transcription, or TTS for South Asian languages.

install
source · Clone the upstream repo
git clone https://github.com/ankitjh4/indic-ai-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ankitjh4/indic-ai-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/sarvam-ai" ~/.claude/skills/ankitjh4-indic-ai-skills-sarvam-ai && rm -rf "$T"
manifest: skills/sarvam-ai/SKILL.md

Sarvam AI — Indian Language Toolkit

Comprehensive AI toolkit for 23 Indian languages: TTS, STT, Document Intelligence, Translation, Transliteration, Language Detection, and Chat.

Setup

  1. Get a free API key at https://dashboard.sarvam.ai
  2. Set environment variable:
    export SARVAM_API_KEY="your-api-key"
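Before calling any of the scripts from Python, it can help to fail early when the key is missing. A minimal sketch, assuming only the SARVAM_API_KEY variable name from the step above; the helper name load_api_key is hypothetical and not part of the toolkit:

```python
import os


def load_api_key(env=None):
    """Return the Sarvam API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    """
    env = os.environ if env is None else env
    key = env.get("SARVAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SARVAM_API_KEY is not set; export it before running the scripts"
        )
    return key
```

Calling load_api_key() once at startup gives a readable error instead of a failed request later on.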

Supported Languages

hi-IN Hindi, en-IN English, bn-IN Bengali, gu-IN Gujarati, kn-IN Kannada, ml-IN Malayalam, mr-IN Marathi, or-IN / od-IN Odia, pa-IN Punjabi, ta-IN Tamil, te-IN Telugu, ur-IN Urdu, as-IN Assamese, bodo-IN / brx-IN Bodo, doi-IN Dogri, ks-IN Kashmiri, kok-IN Konkani, mai-IN Maithili, mni-IN Manipuri, ne-IN Nepali, sa-IN Sanskrit, sat-IN Santali, sd-IN Sindhi
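The codes listed above can be kept in a small lookup so that a bad --language or --target value is caught before a script runs. This is a sketch, not part of the toolkit: the names LANGUAGES, ALIASES and normalize_language are hypothetical, and treating od-IN and brx-IN as the canonical forms of the two paired codes is an assumption.

```python
# Codes and names copied from the supported-languages list.
LANGUAGES = {
    "hi-IN": "Hindi", "en-IN": "English", "bn-IN": "Bengali",
    "gu-IN": "Gujarati", "kn-IN": "Kannada", "ml-IN": "Malayalam",
    "mr-IN": "Marathi", "od-IN": "Odia", "pa-IN": "Punjabi",
    "ta-IN": "Tamil", "te-IN": "Telugu", "ur-IN": "Urdu",
    "as-IN": "Assamese", "brx-IN": "Bodo", "doi-IN": "Dogri",
    "ks-IN": "Kashmiri", "kok-IN": "Konkani", "mai-IN": "Maithili",
    "mni-IN": "Manipuri", "ne-IN": "Nepali", "sa-IN": "Sanskrit",
    "sat-IN": "Santali", "sd-IN": "Sindhi",
}

# Assumption: the second code of each listed pair is treated as canonical.
ALIASES = {"or-IN": "od-IN", "bodo-IN": "brx-IN"}


def normalize_language(code):
    """Map an alias to its canonical code and reject unknown codes."""
    code = ALIASES.get(code, code)
    if code not in LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return code
```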


Text-to-Speech

python3 scripts/tts.py "नमस्ते, आप कैसे हैं?" --language hi-IN --speaker meera
Parameter      Default      Description
text           (none)       Text to convert (max 2500 chars)
--language     hi-IN        Language code
--speaker      meera        Voice name
--output       output.wav   Output file
--sample-rate  24000        Audio sample rate (Hz)

Speakers — Female: Meera, Priya, Neha, Simran, Kavya, Ishita, Shreya, and more. Male: Shubh, Aditya, Rahul, Amit, Dev, Arjun, and more.
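Text longer than the 2500-character limit has to be split before it is passed to scripts/tts.py. The sketch below is one way to do that, preferring sentence boundaries. It assumes the limit is counted in Unicode code points (what Python's len() reports), which the source does not confirm; chunk_text is a hypothetical helper.

```python
import re

MAX_CHARS = 2500  # per-request text limit stated for the text parameter


def chunk_text(text, limit=MAX_CHARS):
    """Split text into pieces of at most `limit` characters.

    Prefers sentence boundaries (., !, ? or the danda "।"); a single
    sentence longer than the limit is cut hard at the limit.
    """
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        while len(sentence) > limit:  # oversized sentence: hard cut
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent through the TTS command with its own --output file and the resulting audio files joined afterwards.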


Speech-to-Text

Three modes: REST (quick, for audio under 30 s), WebSocket (real-time streaming), and Batch (long audio, diarization).

# REST — quick transcription
python3 scripts/speech_to_text.py rest audio.mp3

# WebSocket — real-time streaming
python3 scripts/speech_to_text.py websocket audio.wav

# Batch — multiple files with speaker diarization
python3 scripts/speech_to_text.py batch audio1.mp3 audio2.mp3 --diarization --num-speakers 3 --output-dir ./transcripts/

Batch workflow: create job → upload files → start → poll status (Accepted → Pending → Running → Completed) → download results.
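The polling step of the batch workflow can be sketched as a small loop. The status names come from the sequence above; get_status is a hypothetical callable standing in for whatever the batch client exposes, and the interval and timeout defaults are arbitrary choices for illustration.

```python
import time

# Status order taken from the batch workflow description.
STATES = ["Accepted", "Pending", "Running", "Completed"]


def wait_for_completion(get_status, interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll `get_status()` until it returns "Completed".

    `get_status` is a hypothetical callable returning the job's current
    status string. `sleep` is injectable so the loop can be tested quickly.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == "Completed":
            return status
        if status not in STATES:
            # Anything outside the documented sequence (for example a
            # failure state) is surfaced instead of being polled forever.
            raise RuntimeError(f"unexpected job status: {status!r}")
        if waited >= timeout:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Raising on unknown statuses is a deliberate choice here: the source lists only the four states of a successful run, so any other value is treated as an error.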

Formats: WAV, MP3, AAC, AIFF, OGG, OPUS, FLAC, MP4/M4A, AMR, WMA, WebM, PCM
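Choosing between the three modes can be automated with a small helper. The sketch below assumes the 30-second figure refers to audio duration and that the listed formats map to the file extensions shown; both are assumptions, and pick_mode is a hypothetical name rather than anything the scripts provide.

```python
from pathlib import Path

# Extensions assumed for the formats listed above; the exact set the
# scripts accept may differ.
SUPPORTED_EXTENSIONS = {
    ".wav", ".mp3", ".aac", ".aiff", ".ogg", ".opus", ".flac",
    ".mp4", ".m4a", ".amr", ".wma", ".webm", ".pcm",
}


def pick_mode(path, duration_seconds, live=False, diarization=False):
    """Choose "rest", "websocket" or "batch" for one audio file."""
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext or path!r}")
    if live:
        return "websocket"   # real-time streaming
    if diarization or duration_seconds >= 30:
        return "batch"       # long audio or speaker diarization
    return "rest"            # quick transcription of short audio
```

The returned string matches the first argument of scripts/speech_to_text.py in the commands above.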


Document Intelligence

Extract text from PDFs and images (JPEG/PNG).

python3 scripts/document_intelligence.py document.pdf --language hi-IN --format md
python3 scripts/document_intelligence.py --job-id <id> --download -o ./output/

Formats: md (default), html, json. Max 200 MB, 500 pages.
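The stated limits can be checked locally before a file is uploaded. This sketch covers file type, output format and size only; the 500-page limit is not checked because that would need a PDF parser. Reading "200 MB" as 200 MiB is an assumption, and check_document is a hypothetical helper.

```python
from pathlib import PurePath

MAX_BYTES = 200 * 1024 * 1024  # assumption: "200 MB" read as 200 MiB
INPUT_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}
OUTPUT_FORMATS = {"md", "html", "json"}


def check_document(filename, size_bytes, output_format="md"):
    """Raise ValueError if the file or format falls outside the stated limits."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in INPUT_EXTENSIONS:
        raise ValueError(f"unsupported input type: {suffix!r}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"{filename} is {size_bytes} bytes, over the 200 MB limit")
```

For a file on disk, the size can be taken from pathlib.Path(filename).stat().st_size.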


Translation

# Auto-detect source, translate to Hindi
python3 scripts/text_processing.py translate "Hello, how are you?" --target hi-IN

# Mayura model with colloquial mode
python3 scripts/text_processing.py translate "What's up?" --target hi-IN --model mayura:v1 --mode modern-colloquial

Models: sarvam-translate:v1 (23 languages), mayura:v1 (12 languages, supports modes and transliteration)

Modes (mayura only): formal, modern-colloquial, classic-colloquial, code-mixed
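Because modes apply only to mayura:v1, it is easy to pass a combination the script will reject. The sketch below builds the argument list for the translate subcommand and validates that rule first. It uses only the flags shown in the commands above; translate_command is a hypothetical helper, and when no model is given the --model flag is omitted so the script's own default applies.

```python
MODES = {"formal", "modern-colloquial", "classic-colloquial", "code-mixed"}


def translate_command(text, target, model=None, mode=None):
    """Build the argv list for `scripts/text_processing.py translate`."""
    if mode is not None:
        if model != "mayura:v1":
            raise ValueError("modes are only supported by mayura:v1")
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode!r}")
    argv = ["python3", "scripts/text_processing.py", "translate", text,
            "--target", target]
    if model is not None:
        argv += ["--model", model]
    if mode is not None:
        argv += ["--mode", mode]
    return argv
```

The returned list can be handed to subprocess.run(argv, check=True), which avoids shell quoting problems with non-ASCII text.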


Transliteration

python3 scripts/text_processing.py transliterate "नमस्ते" --source hi-IN --target en-IN
python3 scripts/text_processing.py transliterate "namaste" --source en-IN --target hi-IN --spoken-form

Language Detection

python3 scripts/text_processing.py detect "नमस्ते दुनिया"
# Output: Language: hi-IN, Script: Deva
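When the detect command is called from another program, its output line has to be parsed. A minimal sketch, assuming the output has exactly the "Language: ..., Script: ..." shape shown in the comment above; the real script may print more, and parse_detection is a hypothetical helper.

```python
import re

# Pattern for the output shape shown above, e.g. "Language: hi-IN, Script: Deva".
_PATTERN = re.compile(
    r"Language:\s*(?P<language>[\w-]+),\s*Script:\s*(?P<script>\w+)"
)


def parse_detection(line):
    """Extract the language code and script from one line of detect output."""
    match = _PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized detect output: {line!r}")
    return match.groupdict()
```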

Chat Completion

Two models: sarvam-105b (flagship, complex reasoning) and sarvam-m (efficient, general chat).

python3 scripts/text_processing.py chat "Explain quantum computing" --model sarvam-105b
python3 scripts/text_processing.py chat "What is the capital of India?" --model sarvam-m --temperature 0.8
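The chat command can also be wrapped for use from Python. This sketch assumes the script prints the model's reply to standard output, which the source does not state; the chat function and its run parameter are hypothetical, and only the flags shown in the commands above are used.

```python
import subprocess

CHAT_MODELS = {"sarvam-105b", "sarvam-m"}


def chat(prompt, model, temperature=None, run=subprocess.run):
    """Run the chat subcommand and return its standard output as text.

    Assumption: the script writes the reply to stdout. `run` is injectable
    so the wrapper can be exercised without calling the real script.
    """
    if model not in CHAT_MODELS:
        raise ValueError(f"unknown chat model: {model!r}")
    argv = ["python3", "scripts/text_processing.py", "chat", prompt,
            "--model", model]
    if temperature is not None:
        argv += ["--temperature", str(temperature)]
    result = run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

With check=True, a non-zero exit from the script raises subprocess.CalledProcessError instead of returning an empty reply.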

Resources