Awesome-omni-skill add-tts
Adds text-to-speech audio help to a feature using the TTS system. Use when adding voice narration, audio feedback, or spoken instructions to any part of the app.
```shell
git clone https://github.com/diegosouzapw/awesome-omni-skill
```

Or install directly into your skills directory:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/add-tts-antialias" ~/.claude/skills/diegosouzapw-awesome-omni-skill-add-tts && rm -rf "$T"
```
skills/development/add-tts-antialias/SKILL.md

Adding TTS Audio to a Feature

This skill walks you through adding text-to-speech audio to a feature in the app. The TTS system plays audio via a voice chain (pregenerated mp3 → on-demand generation via OpenAI → browser SpeechSynthesis → subtitles). Clips are collected at runtime, persisted to the database, and generated as high-quality OpenAI TTS mp3s — either on-the-fly during playback (if the generate chain entry is configured) or in batch from the admin panel.
Before You Start
Read the integration guide: `apps/web/.claude/reference/tts-audio-system.md`
It contains the full API reference, patterns, anti-patterns, and existing implementations.
Your Job
- Understand what text the feature needs spoken and when
- Create a feature-specific audio hook
- Wire it into the component
- Verify with TypeScript
Step 1: Design the Utterances
For each piece of audio the feature needs, determine:
- What text to speak (static string or dynamic from state/props)
- When to speak it (on mount, on state change, on user action, on completion)
- How it should sound (tone — write as voice-actor stage directions)
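Before writing any hook, it can help to capture this design as plain data. The feature, utterances, and triggers below are illustrative, not from the codebase:

```typescript
// Hypothetical utterance plan for a bead-counting step (illustrative only)
const utterancePlan = [
  {
    text: 'Tap the bead to move it up', // static instruction
    when: 'step becomes active', // trigger
    tone: 'Patiently guiding a young child. Clear, slow, friendly.',
  },
  {
    text: 'Well done!', // completion feedback
    when: 'isComplete turns true',
    tone: 'Warmly congratulating a child. Genuinely encouraging and happy.',
  },
]
```

Each entry maps directly onto one `useTTS` declaration plus one trigger effect in the hook you write in Step 2.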
Step 2: Create a Feature Audio Hook
Create a hook in the feature's `hooks/` directory. This hook owns text construction, tone strings, auto-play logic, and cleanup.
Read the reference implementation first: `apps/web/src/components/practice/hooks/usePracticeAudioHelp.ts`
Key rules:
- Tone strings must be module-level constants — never compute them dynamically per render
- Always clean up on unmount — call `stop()` in a cleanup effect
- Use refs to track previous values — prevents re-playing on every render
- Guard with `isEnabled` — respect the user's audio toggle
Template:
```typescript
'use client'

import { useEffect, useRef } from 'react'
import { useTTS } from '@/hooks/useTTS'
import { useAudioManager } from '@/hooks/useAudioManager'

// Stable tone constants — changing these creates new clips
const INSTRUCTION_TONE = 'Patiently guiding a young child. Clear, slow, friendly.'
const CELEBRATION_TONE = 'Warmly congratulating a child. Genuinely encouraging and happy.'

interface UseMyFeatureAudioHelpOptions {
  currentStep: string
  isComplete: boolean
}

export function useMyFeatureAudioHelp({
  currentStep,
  isComplete,
}: UseMyFeatureAudioHelpOptions) {
  const { isEnabled, stop } = useAudioManager()

  // Declare utterances
  const sayInstruction = useTTS(currentStep, { tone: INSTRUCTION_TONE })
  const sayCelebration = useTTS(
    isComplete ? 'Well done!' : '',
    { tone: CELEBRATION_TONE },
  )

  // Auto-play when step changes
  const prevStepRef = useRef<string>('')
  useEffect(() => {
    if (!isEnabled || !currentStep || currentStep === prevStepRef.current) return
    prevStepRef.current = currentStep
    sayInstruction()
  }, [isEnabled, currentStep, sayInstruction])

  // Auto-play celebration on completion
  useEffect(() => {
    if (!isEnabled || !isComplete) return
    sayCelebration()
  }, [isEnabled, isComplete, sayCelebration])

  // Stop audio on unmount
  useEffect(() => {
    return () => stop()
  }, [stop])

  return { replay: sayInstruction }
}
```
Step 3: Wire Into the Component
```typescript
import { useMyFeatureAudioHelp } from './hooks/useMyFeatureAudioHelp'
import { useAudioManager } from '@/hooks/useAudioManager'

function MyFeature() {
  const { isEnabled, isPlaying } = useAudioManager()
  const { replay } = useMyFeatureAudioHelp({
    currentStep: 'Tap the bead to move it up',
    isComplete: false,
  })

  return (
    <div>
      {isEnabled && (
        <button onClick={replay} disabled={isPlaying}>
          {isPlaying ? 'Speaking...' : 'Replay'}
        </button>
      )}
    </div>
  )
}
```
Step 4: Verify
```shell
cd apps/web && npx tsc --noEmit
```
Common Patterns
Dynamic text from state
```typescript
const text = useMemo(
  () => (terms ? termsToSentence(terms) : ''),
  [terms],
)
const sayProblem = useTTS(text, { tone: MATH_TONE })
```
One-shot playback (play once, don't repeat)
```typescript
const playedRef = useRef(false)

useEffect(() => {
  if (!shouldPlay || playedRef.current) return
  playedRef.current = true
  sayIt()
}, [shouldPlay, sayIt])

// Reset when trigger resets
useEffect(() => {
  if (!shouldPlay) playedRef.current = false
}, [shouldPlay])
```
Multiple utterances — play the right one
```typescript
const sayStep1 = useTTS('First, look at the abacus', { tone: INST })
const sayStep2 = useTTS('Now tap the bead', { tone: INST })

// speak() stops previous before starting
if (step === 0) sayStep1()
if (step === 1) sayStep2()
```
Tone String Guidelines
Write tones as voice-actor stage directions. Be specific about emotion, pace, and audience.
Good examples:
- `'Speaking clearly and steadily, reading a math problem to a young child. Pause slightly between each number and operator.'`
- `'Warmly congratulating a child. Genuinely encouraging and happy.'`
- `'Gently guiding a child after a wrong answer. Kind, not disappointed.'`
- `'Patiently guiding a young child through an abacus tutorial. Clear, slow, friendly.'`
Bad examples:
- `'Read this text'` — too vague
- `` `Speaking ${mood}` `` — dynamic per render, creates new clips every time
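A minimal illustration of the difference (the constant and function names here are hypothetical):

```typescript
// Good: a module-level constant — the same string on every render,
// so the system reuses one cached clip per (text, tone) pair
const FEEDBACK_TONE = 'Gently guiding a child after a wrong answer. Kind, not disappointed.'

// Bad: interpolating state into the tone — every distinct value of
// `mood` produces a new (text, tone) pair, and therefore a new clip
function badTone(mood: string): string {
  return `Speaking ${mood}` // avoid: dynamic per render
}
```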
Anti-Patterns to Avoid
- Never use raw `speechSynthesis` — always go through `useTTS` so the voice chain and collection work
- Never forget cleanup — always `useEffect(() => () => stop(), [stop])`
- Never use dynamic tone strings — keep them as module-level constants
- Never call `speak()` unconditionally in render — always guard with refs and `isEnabled`
Key Files
| File | Role |
|---|---|
| `@/hooks/useTTS` | Primary hook — declare (text, tone), get speak function |
| `@/hooks/useAudioManager` | Reactive state — isEnabled, isPlaying, volume, subtitles, stop() |
| | Core engine — voice chain, playback, collection, subtitles |
| | Voice source class hierarchy — polymorphic per voice type |
| | React context — singleton manager, boot-time manifest loading |
| | Correct/incorrect feedback sentences |
Voice Chain
Audio plays through the voice chain in order. The typical chain is:
pregenerated voice (nova) → auto-generate → browser TTS → subtitles
- Pregenerated: instant playback from pre-generated mp3 on disk
- Auto-generate: calls OpenAI on-the-fly if the pregenerated mp3 is missing, caches result
- Browser TTS: uses the browser's built-in speech synthesis
- Subtitles: shows text on screen with a reading-time timer
You don't need to think about this when adding TTS to a feature — just use `useTTS()` and the chain handles fallback automatically. The admin configures the chain at /admin/audio.
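The fallback behavior can be pictured as a loop over sources that stops at the first one able to play. This is only a sketch of the idea, not the actual engine code; the `VoiceSource` shape and function name below are invented for illustration:

```typescript
// Illustrative model of the voice chain (not the real engine types)
interface VoiceSource {
  name: string
  canPlay(text: string): Promise<boolean> // e.g. "is the pregenerated mp3 on disk?"
  play(text: string): Promise<void>
}

// Try each source in order; the first one that can play wins
async function playThroughChain(
  chain: VoiceSource[],
  text: string,
): Promise<string | null> {
  for (const source of chain) {
    if (await source.canPlay(text)) {
      await source.play(text)
      return source.name
    }
  }
  // Nothing could play — in the real system, subtitles are the last resort
  return null
}
```

This is why each hook only declares utterances: whether a given utterance ends up as a pregenerated mp3, an on-demand OpenAI clip, browser speech, or subtitles is decided at playback time by the chain configuration.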
Reference Implementations
| Hook | Location | What it does |
|---|---|---|
| usePracticeAudioHelp | apps/web/src/components/practice/hooks/usePracticeAudioHelp.ts | Reads math problems, correct/incorrect feedback |
| | | Speaks tutorial step instructions |

Follow usePracticeAudioHelp as the most complete example.