Claude-skill-registry lobe-tts
LobeTTS - High-quality TypeScript TTS/STT toolkit with EdgeSpeech, Microsoft, OpenAI engines, React hooks, audio visualization components, and both server and browser support
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/lobe-tts" ~/.claude/skills/majiayu000-claude-skill-registry-lobe-tts && rm -rf "$T"
skills/data/lobe-tts/SKILL.md

LobeTTS Skill
LobeTTS (@lobehub/tts) is a high-quality TypeScript toolkit for text-to-speech (TTS) and speech-to-text (STT). It works both server-side and in the browser, giving developers an open-source alternative to proprietary TTS solutions with quality comparable to OpenAI's TTS service.
Key Value Proposition: Generate high-quality speech with minimal code (~15 lines), supporting multiple TTS engines (Edge, Microsoft, OpenAI) with React hooks and audio visualization components for seamless frontend integration.
When to Use This Skill
- Implementing text-to-speech in Node.js applications
- Adding speech synthesis to React/Next.js applications
- Building voice-enabled chatbots or assistants
- Creating audio players with visualization
- Implementing speech-to-text functionality
- Comparing or switching between TTS providers
- Building accessible applications with audio output
When NOT to Use This Skill
- For native mobile TTS (use platform-specific APIs)
- For real-time voice streaming (use WebRTC solutions)
- For voice cloning or custom voice training
- For offline TTS without internet (Edge/Microsoft require connectivity)
Core Concepts
Architecture Overview
```
┌───────────────────────────────────────────────────────────┐
│                       @lobehub/tts                        │
│                    (TypeScript Library)                   │
└───────────────────────────────────────────────────────────┘
         │                    │                    │
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│    TTS Core     │  │   React Hooks   │  │   Components    │
├─────────────────┤  ├─────────────────┤  ├─────────────────┤
│ EdgeSpeechTTS   │  │ useEdgeSpeech   │  │ AudioPlayer     │
│ MicrosoftTTS    │  │ useMicrosoft    │  │ AudioVisualizer │
│ OpenAITTS       │  │ useOpenAITTS    │  │                 │
│ OpenAISTT       │  │ useOpenAISTT    │  │                 │
│ SpeechSynth     │  │ useTTS          │  │                 │
└─────────────────┘  └─────────────────┘  └─────────────────┘
         │                    │                    │
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│     Server      │  │     Browser     │  │     Styling     │
├─────────────────┤  ├─────────────────┤  ├─────────────────┤
│ • Node.js       │  │ • React         │  │ • Waveforms     │
│ • Edge/Vercel   │  │ • Next.js       │  │ • Progress      │
│ • File output   │  │ • SPA           │  │ • Controls      │
└─────────────────┘  └─────────────────┘  └─────────────────┘
```
Project Statistics
| Metric | Value |
|---|---|
| GitHub Stars | 695+ |
| Forks | 93+ |
| Contributors | 13+ |
| Releases | 100+ |
| Dependents | ~1,500 projects |
| Primary Language | TypeScript (98.8%) |
| License | MIT |
Supported TTS/STT Engines
| Engine | Type | Provider | Quality | Cost |
|---|---|---|---|---|
| EdgeSpeechTTS | TTS | Microsoft Edge | High | Free |
| MicrosoftTTS | TTS | Azure Cognitive | Very High | Paid |
| OpenAITTS | TTS | OpenAI | Premium | Paid |
| OpenAISTT | STT | OpenAI Whisper | Premium | Paid |
| SpeechSynthesisTTS | TTS | Browser Native | Variable | Free |
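All engines expose the same `create({ input, options })` call shape (see the server-side examples below), which makes switching providers straightforward. A minimal factory sketch; the `EngineName` union and `makeTTS` helper are illustrative names, not part of the library:

```ts
import { EdgeSpeechTTS, MicrosoftSpeechTTS, OpenAITTS } from '@lobehub/tts';

// Illustrative helper: select an engine by name so callers can swap
// providers without touching call sites.
type EngineName = 'edge' | 'microsoft' | 'openai';

function makeTTS(engine: EngineName) {
  switch (engine) {
    case 'edge':
      return new EdgeSpeechTTS({ locale: 'en-US' });
    case 'microsoft':
      return new MicrosoftSpeechTTS({
        locale: 'en-US',
        subscriptionKey: process.env.AZURE_SPEECH_KEY!,
        region: process.env.AZURE_SPEECH_REGION!,
      });
    case 'openai':
      return new OpenAITTS({ apiKey: process.env.OPENAI_API_KEY! });
  }
}
```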
Installation
Package Installation
```bash
# pnpm (recommended)
pnpm add @lobehub/tts

# bun
bun add @lobehub/tts

# npm
npm install @lobehub/tts

# yarn
yarn add @lobehub/tts
```
Important: This is an ESM-only package.
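Because the package is ESM-only, `require('@lobehub/tts')` fails in CommonJS projects; the usual workaround is a dynamic `import()`. A minimal sketch:

```ts
// CommonJS workaround for an ESM-only package: dynamic import() works
// where require() does not. Wrap it in an async function since CJS has
// no top-level await.
async function loadTTS() {
  const { EdgeSpeechTTS } = await import('@lobehub/tts');
  return new EdgeSpeechTTS({ locale: 'en-US' });
}
```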
Next.js Configuration
Add to `next.config.js`:
```js
/** @type {import('next').NextConfig} */
const nextConfig = {
  transpilePackages: ['@lobehub/tts'],
};

module.exports = nextConfig;
```
Node.js WebSocket Polyfill
For server-side usage, polyfill WebSocket:
```ts
import WebSocket from 'ws';

global.WebSocket = WebSocket;
```
Server-Side Usage
EdgeSpeechTTS (Free, High Quality)
```ts
import { EdgeSpeechTTS } from '@lobehub/tts';
import { Buffer } from 'buffer';
import fs from 'fs';
import path from 'path';

// Polyfill WebSocket for Node.js
import WebSocket from 'ws';
global.WebSocket = WebSocket;

// Initialize TTS engine
const tts = new EdgeSpeechTTS({ locale: 'en-US' });

// Create speech payload
const payload = {
  input: 'Hello! This is a speech demonstration using LobeTTS.',
  options: {
    voice: 'en-US-GuyNeural',
  },
};

// Generate speech
const response = await tts.create(payload);

// Save to file
const mp3Buffer = Buffer.from(await response.arrayBuffer());
const speechFile = path.resolve('./speech.mp3');
fs.writeFileSync(speechFile, mp3Buffer);

console.log(`Speech saved to: ${speechFile}`);
```
MicrosoftTTS (Azure)
```ts
import { MicrosoftSpeechTTS } from '@lobehub/tts';

const tts = new MicrosoftSpeechTTS({
  locale: 'en-US',
  subscriptionKey: process.env.AZURE_SPEECH_KEY,
  region: process.env.AZURE_SPEECH_REGION, // e.g., 'eastus'
});

const response = await tts.create({
  input: 'Premium quality speech from Azure.',
  options: {
    voice: 'en-US-JennyNeural',
    style: 'cheerful', // emotional style
    rate: '1.0',
    pitch: '0%',
  },
});
```
OpenAI TTS
```ts
import { OpenAITTS } from '@lobehub/tts';

const tts = new OpenAITTS({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await tts.create({
  input: 'OpenAI TTS provides natural-sounding speech.',
  options: {
    model: 'tts-1-hd', // 'tts-1' or 'tts-1-hd'
    voice: 'alloy',    // alloy, echo, fable, onyx, nova, shimmer
    speed: 1.0,        // 0.25 to 4.0
  },
});
```
OpenAI STT (Speech-to-Text)
```ts
import { OpenAISTT } from '@lobehub/tts';
import fs from 'fs';

const stt = new OpenAISTT({
  apiKey: process.env.OPENAI_API_KEY,
});

// From file
const audioBuffer = fs.readFileSync('./recording.mp3');

const result = await stt.create({
  file: audioBuffer,
  options: {
    model: 'whisper-1',
    language: 'en',
    response_format: 'json',
  },
});

console.log('Transcription:', result.text);
```
API Routes (Next.js/Vercel)
Edge Speech API Route
```ts
// app/api/tts/edge/route.ts
import { EdgeSpeechTTS } from '@lobehub/tts';
import { NextRequest, NextResponse } from 'next/server';

export async function POST(req: NextRequest) {
  const { text, voice = 'en-US-GuyNeural' } = await req.json();

  const tts = new EdgeSpeechTTS({ locale: 'en-US' });
  const response = await tts.create({
    input: text,
    options: { voice },
  });

  const audioBuffer = await response.arrayBuffer();

  return new NextResponse(audioBuffer, {
    headers: {
      'Content-Type': 'audio/mpeg',
      'Content-Length': audioBuffer.byteLength.toString(),
    },
  });
}
```
OpenAI TTS API Route
```ts
// app/api/tts/openai/route.ts
import { OpenAITTS } from '@lobehub/tts';
import { NextRequest, NextResponse } from 'next/server';

export async function POST(req: NextRequest) {
  const { text, voice = 'alloy', model = 'tts-1' } = await req.json();

  const tts = new OpenAITTS({
    apiKey: process.env.OPENAI_API_KEY!,
  });

  const response = await tts.create({
    input: text,
    options: { voice, model },
  });

  const audioBuffer = await response.arrayBuffer();

  return new NextResponse(audioBuffer, {
    headers: {
      'Content-Type': 'audio/mpeg',
    },
  });
}
```
React Components
AudioPlayer Component
```tsx
import { AudioPlayer, useAudioPlayer } from '@lobehub/tts/react';

interface TTSPlayerProps {
  audioUrl: string;
}

export default function TTSPlayer({ audioUrl }: TTSPlayerProps) {
  const { ref, isLoading, ...audio } = useAudioPlayer(audioUrl);

  return (
    <AudioPlayer
      audio={audio}
      isLoading={isLoading}
      style={{ width: '100%' }}
    />
  );
}
```
AudioVisualizer Component
```tsx
import { AudioPlayer, AudioVisualizer, useAudioPlayer } from '@lobehub/tts/react';
import { Flexbox } from 'react-layout-kit';

interface AudioPlayerWithVisualizerProps {
  url: string;
}

export default function AudioPlayerWithVisualizer({ url }: AudioPlayerWithVisualizerProps) {
  const { ref, isLoading, ...audio } = useAudioPlayer(url);

  return (
    <Flexbox align="center" gap={8}>
      <AudioPlayer audio={audio} isLoading={isLoading} style={{ width: '100%' }} />
      <AudioVisualizer audioRef={ref} isLoading={isLoading} />
    </Flexbox>
  );
}
```
Complete TTS Component
```tsx
'use client';

import { useState } from 'react';
import { AudioPlayer, useAudioPlayer } from '@lobehub/tts/react';

export default function TextToSpeech() {
  const [text, setText] = useState('');
  const [audioUrl, setAudioUrl] = useState<string | null>(null);
  const [loading, setLoading] = useState(false);

  const handleSpeak = async () => {
    setLoading(true);
    try {
      const response = await fetch('/api/tts/edge', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ text }),
      });
      const blob = await response.blob();
      const url = URL.createObjectURL(blob);
      setAudioUrl(url);
    } finally {
      setLoading(false);
    }
  };

  const { ref, isLoading, ...audio } = useAudioPlayer(audioUrl || '');

  return (
    <div>
      <textarea
        value={text}
        onChange={(e) => setText(e.target.value)}
        placeholder="Enter text to speak..."
        rows={4}
      />
      <button onClick={handleSpeak} disabled={loading || !text}>
        {loading ? 'Generating...' : 'Speak'}
      </button>
      {audioUrl && (
        <AudioPlayer
          audio={audio}
          isLoading={isLoading}
          style={{ width: '100%', marginTop: 16 }}
        />
      )}
    </div>
  );
}
```
React Hooks
useEdgeSpeech
```tsx
import { useEdgeSpeech } from '@lobehub/tts/react';

function EdgeSpeechComponent() {
  const { speak, stop, isLoading, error } = useEdgeSpeech({
    locale: 'en-US',
    voice: 'en-US-AriaNeural',
  });

  const handleSpeak = () => {
    speak('Hello from Edge Speech!');
  };

  return (
    <div>
      <button onClick={handleSpeak} disabled={isLoading}>
        {isLoading ? 'Speaking...' : 'Speak'}
      </button>
      <button onClick={stop}>Stop</button>
      {error && <p>Error: {error.message}</p>}
    </div>
  );
}
```
useOpenAITTS
```tsx
import { useOpenAITTS } from '@lobehub/tts/react';

function OpenAITTSComponent() {
  const { speak, stop, isLoading, audioUrl } = useOpenAITTS({
    apiKey: process.env.NEXT_PUBLIC_OPENAI_API_KEY,
    voice: 'nova',
    model: 'tts-1-hd',
  });

  return (
    <div>
      <button onClick={() => speak('Hello from OpenAI!')}>
        Speak
      </button>
      {audioUrl && <audio src={audioUrl} controls autoPlay />}
    </div>
  );
}
```
useOpenAISTT (Speech-to-Text)
```tsx
import { useOpenAISTT } from '@lobehub/tts/react';

function SpeechToTextComponent() {
  const { startRecording, stopRecording, isRecording, transcript, error } = useOpenAISTT({
    apiKey: process.env.NEXT_PUBLIC_OPENAI_API_KEY,
  });

  return (
    <div>
      <button onClick={isRecording ? stopRecording : startRecording}>
        {isRecording ? 'Stop Recording' : 'Start Recording'}
      </button>
      {transcript && <p>Transcript: {transcript}</p>}
      {error && <p>Error: {error.message}</p>}
    </div>
  );
}
```
useAudioRecorder
```tsx
import { useAudioRecorder } from '@lobehub/tts/react';

function AudioRecorderComponent() {
  const {
    startRecording,
    stopRecording,
    isRecording,
    audioBlob,
    audioUrl,
    duration,
  } = useAudioRecorder();

  const handleSave = () => {
    if (audioBlob) {
      const a = document.createElement('a');
      a.href = audioUrl!;
      a.download = 'recording.webm';
      a.click();
    }
  };

  return (
    <div>
      <button onClick={isRecording ? stopRecording : startRecording}>
        {isRecording ? `Stop (${duration}s)` : 'Record'}
      </button>
      {audioUrl && (
        <>
          <audio src={audioUrl} controls />
          <button onClick={handleSave}>Download</button>
        </>
      )}
    </div>
  );
}
```
useSpeechRecognition (Browser Native)
```tsx
import { useSpeechRecognition } from '@lobehub/tts/react';

function BrowserSTTComponent() {
  const {
    startListening,
    stopListening,
    isListening,
    transcript,
    isSupported,
  } = useSpeechRecognition({
    language: 'en-US',
    continuous: true,
  });

  if (!isSupported) {
    return <p>Speech recognition not supported in this browser.</p>;
  }

  return (
    <div>
      <button onClick={isListening ? stopListening : startListening}>
        {isListening ? 'Stop Listening' : 'Start Listening'}
      </button>
      <p>Transcript: {transcript}</p>
    </div>
  );
}
```
Voice Options
Edge Speech Voices (Free)
```ts
// Popular English voices
const englishVoices = [
  'en-US-GuyNeural',      // Male, casual
  'en-US-AriaNeural',     // Female, friendly
  'en-US-JennyNeural',    // Female, assistant
  'en-US-DavisNeural',    // Male, deep
  'en-GB-SoniaNeural',    // Female, British
  'en-GB-RyanNeural',     // Male, British
  'en-AU-NatashaNeural',  // Female, Australian
];

// Other languages
const multilingualVoices = [
  'zh-CN-XiaoxiaoNeural', // Chinese, Female
  'ja-JP-NanamiNeural',   // Japanese, Female
  'de-DE-KatjaNeural',    // German, Female
  'fr-FR-DeniseNeural',   // French, Female
  'es-ES-ElviraNeural',   // Spanish, Female
  'ko-KR-SunHiNeural',    // Korean, Female
];
```
OpenAI Voices
```ts
const openaiVoices = [
  'alloy',   // Neutral, versatile
  'echo',    // Warm, conversational
  'fable',   // Expressive, storytelling
  'onyx',    // Deep, authoritative
  'nova',    // Friendly, energetic
  'shimmer', // Clear, professional
];
```
Configuration Options
EdgeSpeechTTS Options
```ts
interface EdgeSpeechTTSOptions {
  locale: string;  // Language/region code
  pitch?: string;  // Pitch adjustment (-50% to +50%)
  rate?: string;   // Speed adjustment (0.5 to 2.0)
  volume?: string; // Volume adjustment (0 to 100)
}

const tts = new EdgeSpeechTTS({
  locale: 'en-US',
});

const payload = {
  input: 'Text to speak',
  options: {
    voice: 'en-US-AriaNeural',
    pitch: '+5%',
    rate: '1.2',
    volume: '80',
  },
};
```
MicrosoftTTS Options
```ts
interface MicrosoftSpeechTTSOptions {
  locale: string;
  subscriptionKey: string;
  region: string;
}

interface MicrosoftSpeakOptions {
  voice: string;
  style?: string;       // cheerful, sad, angry, etc.
  styleDegree?: number; // 0.01 to 2
  role?: string;        // Girl, Boy, etc.
  rate?: string;
  pitch?: string;
}
```
OpenAI TTS Options
```ts
interface OpenAITTSOptions {
  apiKey: string;
  baseUrl?: string; // Custom API endpoint
}

interface OpenAISpeakOptions {
  model: 'tts-1' | 'tts-1-hd';
  voice: 'alloy' | 'echo' | 'fable' | 'onyx' | 'nova' | 'shimmer';
  speed?: number; // 0.25 to 4.0
  response_format?: 'mp3' | 'opus' | 'aac' | 'flac';
}
```
Streaming Support
Stream TTS Response
```ts
import { EdgeSpeechTTS } from '@lobehub/tts';

const tts = new EdgeSpeechTTS({ locale: 'en-US' });

// Get stream instead of buffer
const response = await tts.createStream({
  input: 'Long text to stream...',
  options: { voice: 'en-US-GuyNeural' },
});

// Process stream
const reader = response.body?.getReader();
if (reader) {
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Process chunk
    console.log('Received chunk:', value.length, 'bytes');
  }
}
```
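In an API route, the stream can also be forwarded to the client without buffering, so playback can begin before synthesis completes. A sketch assuming `createStream` resolves to a standard `Response` whose body is a `ReadableStream`, as above:

```ts
// app/api/tts/stream/route.ts
import { EdgeSpeechTTS } from '@lobehub/tts';
import { NextRequest, NextResponse } from 'next/server';

export async function POST(req: NextRequest) {
  const { text } = await req.json();

  const tts = new EdgeSpeechTTS({ locale: 'en-US' });
  // Assumption: createStream() returns a Response with a streaming body,
  // per the example above.
  const response = await tts.createStream({
    input: text,
    options: { voice: 'en-US-GuyNeural' },
  });

  // Forward the body without buffering the whole file in memory
  return new NextResponse(response.body, {
    headers: { 'Content-Type': 'audio/mpeg' },
  });
}
```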
Error Handling
```ts
import { EdgeSpeechTTS } from '@lobehub/tts';

async function generateSpeech(text: string) {
  try {
    const tts = new EdgeSpeechTTS({ locale: 'en-US' });
    const response = await tts.create({
      input: text,
      options: { voice: 'en-US-GuyNeural' },
    });
    return Buffer.from(await response.arrayBuffer());
  } catch (error) {
    if (error instanceof Error) {
      if (error.message.includes('WebSocket')) {
        console.error('WebSocket connection failed. Check network.');
      } else if (error.message.includes('voice')) {
        console.error('Invalid voice selection.');
      } else {
        console.error('TTS Error:', error.message);
      }
    }
    throw error;
  }
}
```
Integration Examples
With LobeChat
```ts
// LobeTTS is used internally by LobeChat for voice features
import { EdgeSpeechTTS } from '@lobehub/tts';

// LobeChat voice assistant pattern
async function speakAssistantResponse(message: string) {
  const tts = new EdgeSpeechTTS({ locale: 'en-US' });
  const audio = await tts.create({
    input: message,
    options: { voice: 'en-US-AriaNeural' },
  });
  return audio;
}
```
With AI Chatbot
```tsx
// Complete voice-enabled chatbot
import { useOpenAITTS, useOpenAISTT } from '@lobehub/tts/react';

function VoiceChatbot() {
  const { speak, isLoading: isSpeaking } = useOpenAITTS({
    apiKey: process.env.NEXT_PUBLIC_OPENAI_API_KEY!,
    voice: 'nova',
  });

  const {
    startRecording,
    stopRecording,
    isRecording,
    transcript,
  } = useOpenAISTT({
    apiKey: process.env.NEXT_PUBLIC_OPENAI_API_KEY!,
  });

  const handleChat = async () => {
    stopRecording();

    // Send transcript to AI
    const response = await fetch('/api/chat', {
      method: 'POST',
      body: JSON.stringify({ message: transcript }),
    });
    const { reply } = await response.json();

    // Speak AI response
    speak(reply);
  };

  return (
    <div>
      <button onClick={isRecording ? handleChat : startRecording}>
        {isRecording ? 'Send' : 'Record'}
      </button>
      {transcript && <p>You said: {transcript}</p>}
    </div>
  );
}
```
Best Practices
Performance
- Reuse TTS instances: Create once, use multiple times (see the sketch after this list)
- Stream for long text: Use streaming for text >500 characters
- Cache audio: Store generated audio to avoid regeneration
- Preload voices: Initialize TTS early in app lifecycle
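A minimal sketch of the instance-reuse pattern at module scope; `sharedTTS` and `synthesize` are illustrative names:

```ts
import { EdgeSpeechTTS } from '@lobehub/tts';

// Module-level singleton: constructed once at import time, then reused
// by every call instead of re-initializing per request.
const sharedTTS = new EdgeSpeechTTS({ locale: 'en-US' });

export async function synthesize(text: string): Promise<Buffer> {
  const response = await sharedTTS.create({
    input: text,
    options: { voice: 'en-US-GuyNeural' },
  });
  return Buffer.from(await response.arrayBuffer());
}
```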
Quality
- Choose appropriate voice: Match voice to content tone
- Use HD models: OpenAI tts-1-hd for critical content
- Add SSML: For precise pronunciation control
- Test across browsers: Verify audio playback compatibility
Cost Optimization
- Use Edge TTS: Free and high quality for most use cases
- Batch requests: Combine short texts when possible
- Implement caching: Hash text → cached audio (see the sketch after this list)
- Rate limit: Prevent abuse in production
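One way to implement the hash-based cache above, using the file system; `cacheDir` and `cachedSpeech` are illustrative, and a production setup might use Redis or object storage instead:

```ts
import { createHash } from 'crypto';
import fs from 'fs';
import path from 'path';
import { EdgeSpeechTTS } from '@lobehub/tts';

const cacheDir = path.resolve('./tts-cache'); // illustrative location
const tts = new EdgeSpeechTTS({ locale: 'en-US' });

export async function cachedSpeech(text: string, voice: string): Promise<Buffer> {
  // Key on voice + text so different voices don't collide
  const key = createHash('sha256').update(`${voice}:${text}`).digest('hex');
  const file = path.join(cacheDir, `${key}.mp3`);

  if (fs.existsSync(file)) return fs.readFileSync(file);

  const response = await tts.create({ input: text, options: { voice } });
  const buffer = Buffer.from(await response.arrayBuffer());

  fs.mkdirSync(cacheDir, { recursive: true });
  fs.writeFileSync(file, buffer);
  return buffer;
}
```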
Troubleshooting
"WebSocket is not defined"
```ts
// Add to Node.js entry point
import WebSocket from 'ws';
global.WebSocket = WebSocket;
```
"Module not found: @lobehub/tts"
```js
// next.config.js
module.exports = {
  transpilePackages: ['@lobehub/tts'],
};
```
Audio not playing in browser
```ts
// Ensure user interaction before playing
document.addEventListener('click', () => {
  audio.play();
}, { once: true });
```
CORS errors with API routes
```ts
// Add CORS headers
return new NextResponse(audioBuffer, {
  headers: {
    'Content-Type': 'audio/mpeg',
    'Access-Control-Allow-Origin': '*',
  },
});
```
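Browsers may also send a preflight OPTIONS request before the POST; handling it in the same route file avoids preflight failures. A standard Next.js pattern:

```ts
// Answer the CORS preflight in the same route file as the POST handler
// (NextResponse is already imported there).
export async function OPTIONS() {
  return new NextResponse(null, {
    headers: {
      'Access-Control-Allow-Origin': '*',
      'Access-Control-Allow-Methods': 'POST, OPTIONS',
      'Access-Control-Allow-Headers': 'Content-Type',
    },
  });
}
```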
Resources
Official Links
- GitHub: https://github.com/lobehub/lobe-tts
- npm: https://www.npmjs.com/package/@lobehub/tts
- License: MIT
Related LobeHub Projects
- LobeChat: Extensible chatbot framework
- LobeUI: Component library for AI apps
- LobeIcons: AI/LLM brand logos
- Lobei18n: Internationalization tool
Version History
- Current: 100+ releases on npm
- Tech Stack: TypeScript, React, ESM
- License: MIT © 2023 LobeHub
Last Updated: 2026-01-13
Skill Version: 1.0.0
Source: https://github.com/lobehub/lobe-tts