Skills smallest-ai

install

source · Clone the upstream repo

git clone https://github.com/openclaw/skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/abhishekmishragithub/smallest-ai" ~/.claude/skills/clawdbot-skills-smallest-ai && rm -rf "$T"

manifest: skills/abhishekmishragithub/smallest-ai/SKILL.md

source content

Smallest AI — Ultra-Fast Voice Suite

Text-to-speech (sub-100ms) via Lightning v3.1 and speech-to-text (64ms TTFT) via Pulse.

Setup

Get API key from https://waves.smallest.ai → click "API Key" in left panel
Set `SMALLEST_API_KEY` in your environment:

export SMALLEST_API_KEY="your_key_here"
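
A wrapper script could check for the key before making any request. The following is a minimal sketch, not the bundled scripts' actual logic; `get_api_key` is a hypothetical helper name, and only the `SMALLEST_API_KEY` variable name comes from this document.

```python
import os


def get_api_key(env=os.environ):
    """Return the API key from the environment, or raise with a setup hint.

    Hypothetical helper: the shipped scripts may perform this check differently.
    """
    key = env.get("SMALLEST_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            'SMALLEST_API_KEY is not set. Run: export SMALLEST_API_KEY="your_key_here"'
        )
    return key
```

Passing the environment as a parameter keeps the check easy to exercise without touching the real process environment.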

Defaults

Default female voice: `sophia` (American English)
Default male voice: `robert` (American English)
Default language: `en`
Default speed: `1.0`
Default sample rate: `24000`

Voice Selection Rules

Follow these rules to select the voice:

If user explicitly names a voice (e.g. "use advika"), use that voice.
If user asks for a male voice, use the configured `defaultVoiceMale`.
If user asks for a female voice, use the configured `defaultVoiceFemale`.
If no gender preference, use `defaultVoiceFemale` (sophia by default).
For Hindi content: use `advika` (female) or `vivaan` (male).
For Spanish content: use `camilla` (female) or `carlos` (male).
For Tamil content: use `anitha` (female) or `raju` (male).

Always pass the configured `defaultLanguage`, `defaultSpeed`, and `defaultSampleRate` as the `--lang`, `--speed`, and `--rate` flags unless the user overrides them.
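
The selection rules above can be sketched as a small function. This is an illustration under stated assumptions: `select_voice` and `build_tts_args` are hypothetical names, the precedence between a language-specific voice and a gender request is not spelled out in the rules (here the language-specific pair wins and gender picks within it), and `ta` is assumed to be the Tamil code accepted by `--lang`.

```python
# Defaults copied from the Defaults section; the dict itself is illustrative.
DEFAULTS = {
    "defaultVoiceFemale": "sophia",
    "defaultVoiceMale": "robert",
    "defaultLanguage": "en",
    "defaultSpeed": 1.0,
    "defaultSampleRate": 24000,
}

# Language-specific voices named in the rules, as (female, male).
# The "ta" code for Tamil is an assumption (ISO 639-1).
LANGUAGE_VOICES = {
    "hi": ("advika", "vivaan"),
    "es": ("camilla", "carlos"),
    "ta": ("anitha", "raju"),
}


def select_voice(named=None, gender=None, lang=None, config=DEFAULTS):
    """Pick a voice id: explicit name first, then language, then gender."""
    if named:
        return named
    if lang in LANGUAGE_VOICES:
        female, male = LANGUAGE_VOICES[lang]
        return male if gender == "male" else female
    if gender == "male":
        return config["defaultVoiceMale"]
    # Female requested, or no preference at all.
    return config["defaultVoiceFemale"]


def build_tts_args(text, voice, lang=None, speed=None, rate=None, config=DEFAULTS):
    """Build the argument list, filling in configured defaults unless overridden."""
    return [
        text,
        "--voice", voice,
        "--lang", str(lang or config["defaultLanguage"]),
        "--speed", str(speed if speed is not None else config["defaultSpeed"]),
        "--rate", str(rate if rate is not None else config["defaultSampleRate"]),
    ]
```

The resulting list is meant to be appended to the script path when invoking the TTS script.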

Text-to-Speech

Generate speech audio from text using the Lightning v3.1 model.

Shell (preferred — zero dependencies)

{baseDir}/scripts/tts.sh "Text to speak" --voice sophia --rate 24000 --speed 1.0 --lang en

Python (requires `pip install smallestai` or just `requests`)

python3 {baseDir}/scripts/tts.py "Text to speak" --voice sophia --speed 1.0 --lang en --out speech.wav

Voices

Voice	Gender	Accent	Best For
sophia	Female	American	General use (default)
robert	Male	American	Professional, reports (default)
advika	Female	Indian	Hindi content, code-switch
vivaan	Male	Indian	Bilingual English/Hindi
camilla	Female	Mexican/Latin	Spanish content
zara	Female	American	Conversational
melody	Female	American	Storytelling, greetings
arjun	Male	Indian	English/Hindi bilingual
stella	Female	American	Expressive, warm

80+ more voices available. List all with:

{baseDir}/scripts/voices.sh

Options

`--voice <id>`: Voice identifier (default: sophia)
`--rate <hz>`: Sample rate, one of 8000 | 16000 | 24000 | 44100 (default: 24000)
`--speed <n>`: Playback speed 0.5–2.0 (default: 1.0)
`--lang <code>`: Language code (default: en). See `{baseDir}/references/languages.md`
`--out <path>`: Output file (default: auto-named `media/tts_<timestamp>.wav`)
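
Before calling the script, the documented ranges can be checked locally so that a bad value fails fast. A minimal sketch; `validate_tts_options` is a hypothetical helper, and whether the scripts themselves reject out-of-range values is not stated here.

```python
# Allowed values taken from the option list above.
ALLOWED_RATES = {8000, 16000, 24000, 44100}
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def validate_tts_options(rate=24000, speed=1.0):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    if rate not in ALLOWED_RATES:
        errors.append(f"--rate must be one of {sorted(ALLOWED_RATES)}, got {rate}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        errors.append(f"--speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")
    return errors
```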

Output

Scripts print `MEDIA: <filepath>` on success. OpenClaw sends this as an audio attachment.
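
A caller that captures the script's stdout could recover the file path from that line. A sketch assuming the line starts with the literal prefix `MEDIA:` and that nothing else on the line follows the path; `parse_media_path` is a hypothetical name.

```python
def parse_media_path(stdout):
    """Return the path from the first 'MEDIA: <filepath>' line, or None."""
    prefix = "MEDIA:"
    for line in stdout.splitlines():
        if line.startswith(prefix):
            path = line[len(prefix):].strip()
            return path or None
    return None
```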

Multilingual

Supports 30+ languages. Pass `--lang` with ISO code:

{baseDir}/scripts/tts.sh "नमस्ते, कैसे हैं आप?" --voice advika --lang hi
{baseDir}/scripts/tts.sh "Bonjour le monde" --voice sophia --lang fr
{baseDir}/scripts/tts.sh "Hola, buenos días" --voice camilla --lang es

Code-switching (mixing languages) works automatically; no extra flag is needed beyond `--lang`:

{baseDir}/scripts/tts.sh "Hey, मुझे meeting remind कर दो" --voice advika --lang hi

Speech-to-Text

Transcribe audio files using the Pulse model. Supports WAV, MP3, OGG, FLAC.

Shell

{baseDir}/scripts/stt.sh /path/to/audio.wav
{baseDir}/scripts/stt.sh /path/to/audio.wav --diarize --timestamps --emotions

Python

python3 {baseDir}/scripts/stt.py /path/to/audio.wav --diarize --timestamps --lang en

Options

`--lang <code>`: Language (default: en)
`--diarize`: Identify different speakers
`--timestamps`: Word-level timing
`--emotions`: Detect emotional tone

Output

Returns JSON with `transcription` field. With `--diarize`, includes speaker labels per word.
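
Reading the transcript out of that JSON might look like the sketch below. Only the `transcription` field is taken from this document; the layout of the per-word speaker labels is not specified here, so the sketch does not touch them. `extract_transcription` is a hypothetical name.

```python
import json


def extract_transcription(raw_json):
    """Return the 'transcription' string from the STT output, or '' if absent."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        return ""
    return data.get("transcription", "")
```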

When to Use

Trigger this skill when the user:

Asks to "say", "speak", "read aloud", or "generate speech/audio"
Wants a "voice message", "voice note", or "audio file"
Asks to "transcribe", "convert speech/audio to text"
Mentions "Smallest AI", "Lightning TTS", or "Pulse STT"
Needs fast or low-latency speech generation
Wants Hindi, Spanish, multilingual, or code-switched voice output
Asks to compare TTS providers or benchmark latency

Error Handling

Missing API key → tell user to set `SMALLEST_API_KEY`
HTTP 401 → invalid or expired API key
HTTP 429 → rate limited, wait and retry
HTTP 400 → check text length (max ~5000 chars per request). Split long text into chunks.
Empty audio → verify voice_id is valid
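
The status handling above can be sketched as a lookup plus a retry loop for 429. This is an assumption-laden illustration: `explain_status` and `call_with_retry` are hypothetical names, `send` stands in for whatever function performs the HTTP request and returns `(status, body)`, and the exponential backoff schedule is one possible reading of "wait and retry".

```python
import time

# Messages paraphrase the list above.
STATUS_HINTS = {
    400: "Bad request: check text length (max ~5000 chars) and split long text.",
    401: "Invalid or expired API key.",
    429: "Rate limited: wait and retry.",
}


def explain_status(status):
    """Map an HTTP status code to a user-facing hint."""
    return STATUS_HINTS.get(status, f"Unexpected HTTP status {status}")


def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    Waits base_delay, then 2x, then 4x, ... between attempts.
    Returns the last (status, body) pair.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body
```

Only 429 is retried; 400 and 401 are returned immediately because retrying will not fix them.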

Limits

Max text per request: ~5000 characters
For longer text: split into sentences, synthesize each, concatenate with sox or ffmpeg
Free tier: 30 minutes/month of TTS
Basic ($5/mo): 3 hours of TTS + 1 voice clone
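
Splitting long text on sentence boundaries, as suggested above, could be done along these lines. A sketch only: `split_text` is a hypothetical helper, the sentence pattern is a simple heuristic (it also treats the Devanagari danda as a terminator), and 5000 is used as the limit although the document gives it as approximate.

```python
import re

MAX_CHARS = 5000  # approximate per-request limit stated above


def split_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut hard.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately and the resulting files joined with sox or ffmpeg.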
