Real-time voice conversations in Discord voice channels with Claude AI
git clone https://github.com/openclaw/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/basillytton/basil-voice" ~/.claude/skills/clawdbot-skills-discord-voice-0eabfc && rm -rf "$T"
skills/basillytton/basil-voice/SKILL.md
Discord Voice Plugin for Clawdbot
Real-time voice conversations in Discord voice channels. Join a voice channel, speak, and have your words transcribed, processed by Claude, and spoken back.
Features
- Join/Leave Voice Channels: Via slash commands, CLI, or agent tool
- Voice Activity Detection (VAD): Automatically detects when users are speaking
- Speech-to-Text: Whisper API (OpenAI), Deepgram, or Local Whisper (Offline)
- Streaming STT: Real-time transcription with Deepgram WebSocket (~1s latency reduction)
- Agent Integration: Transcribed speech is routed through the Clawdbot agent
- Text-to-Speech: OpenAI TTS, ElevenLabs, or Kokoro (Local/Offline)
- Audio Playback: Responses are spoken back in the voice channel
- Barge-in Support: Stops speaking immediately when a user starts talking
- Auto-reconnect: Automatic heartbeat monitoring and reconnection on disconnect
Requirements
- Discord bot with voice permissions (Connect, Speak, Use Voice Activity)
- API keys for STT and TTS providers
- System dependencies for voice:
  - ffmpeg (audio processing)
  - Native build tools for @discordjs/opus and sodium-native
Installation
1. Install System Dependencies
    # Ubuntu/Debian
    sudo apt-get install ffmpeg build-essential python3
    # Fedora/RHEL
    sudo dnf install ffmpeg gcc-c++ make python3
    # macOS
    brew install ffmpeg
2. Install via ClawdHub
clawdhub install discord-voice
Or manually:
    cd ~/.clawdbot/extensions
    git clone <repository-url> discord-voice
    cd discord-voice
    npm install
3. Configure in clawdbot.json
    {
      plugins: {
        entries: {
          "discord-voice": {
            enabled: true,
            config: {
              sttProvider: "local-whisper",
              ttsProvider: "openai",
              ttsVoice: "nova",
              vadSensitivity: "medium",
              allowedUsers: [], // Empty = allow all users
              silenceThresholdMs: 1500,
              maxRecordingMs: 30000,
              openai: {
                apiKey: "sk-...", // Or use SKILLBOSS_API_KEY env var
              },
            },
          },
        },
      },
    }
4. Discord Bot Setup
Ensure your Discord bot has these permissions:
- Connect - Join voice channels
- Speak - Play audio
- Use Voice Activity - Speak without push-to-talk
Add these permissions to your bot's OAuth2 invite URL, or configure them in the Discord Developer Portal.
Configuration
| Option | Type | Default | Description |
|---|---|---|---|
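| enabled | boolean | | Enable/disable the plugin |
| sttProvider | string | | STT provider: OpenAI Whisper, Deepgram, or Local Whisper (e.g. "deepgram", "local-whisper") |
| streamingSTT | boolean | true | Use streaming STT (Deepgram only, ~1s faster) |
| ttsProvider | string | | TTS provider (e.g. "openai") |
| ttsVoice | string | | Voice ID for TTS (e.g. "nova") |
| vadSensitivity | string | | "low", "medium", or "high" |
| bargeIn | boolean | true | Stop speaking when a user talks |
| allowedUsers | string[] | | User IDs allowed (empty = all) |
| silenceThresholdMs | number | | Silence before processing (ms) |
| maxRecordingMs | number | 30000 | Max recording length (ms) |
| | number | 30 s | Connection health check interval |
| | string | | Channel ID to auto-join on startup |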
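To make the table concrete, here is a minimal TypeScript sketch of how these options could be typed and merged with defaults. It assumes the field names used in this document's config examples and fills in only the defaults the document states (streaming STT and barge-in on, 30 s maximum recording). The `DiscordVoiceConfig` type and the `resolveConfig` and `isUserAllowed` helpers are hypothetical, not the plugin's real code.

```typescript
// Hypothetical config shape: field names come from the examples in this
// document; this is not the plugin's actual type definition.
type VadSensitivity = "low" | "medium" | "high";

interface DiscordVoiceConfig {
  enabled?: boolean;
  sttProvider?: string; // e.g. "deepgram" or "local-whisper"
  streamingSTT: boolean;
  ttsProvider?: string; // e.g. "openai"
  ttsVoice?: string; // e.g. "nova"
  vadSensitivity?: VadSensitivity;
  bargeIn: boolean;
  allowedUsers?: string[];
  silenceThresholdMs?: number;
  maxRecordingMs: number;
}

// Only the defaults this document states explicitly.
const DOCUMENTED_DEFAULTS = {
  streamingSTT: true,
  bargeIn: true,
  maxRecordingMs: 30000,
};

// Merge user settings over the documented defaults.
function resolveConfig(user: Partial<DiscordVoiceConfig>): DiscordVoiceConfig {
  return {
    ...user,
    streamingSTT: user.streamingSTT ?? DOCUMENTED_DEFAULTS.streamingSTT,
    bargeIn: user.bargeIn ?? DOCUMENTED_DEFAULTS.bargeIn,
    maxRecordingMs: user.maxRecordingMs ?? DOCUMENTED_DEFAULTS.maxRecordingMs,
  };
}

// An empty (or missing) allow-list admits every user.
function isUserAllowed(cfg: DiscordVoiceConfig, userId: string): boolean {
  const allowed = cfg.allowedUsers ?? [];
  return allowed.length === 0 || allowed.includes(userId);
}

// Usage
const cfg = resolveConfig({ sttProvider: "deepgram", allowedUsers: ["123"] });
console.log(cfg.streamingSTT, isUserAllowed(cfg, "456")); // true false
```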
Provider Configuration
OpenAI (Whisper + TTS)
    {
      openai: {
        apiKey: "sk-...",
        whisperModel: "whisper-1",
        ttsModel: "tts-1",
      },
    }
ElevenLabs (TTS only)
    {
      elevenlabs: {
        apiKey: "...",
        voiceId: "21m00Tcm4TlvDq8ikWAM", // Rachel
        modelId: "eleven_multilingual_v2",
      },
    }
Deepgram (STT only)
    {
      deepgram: {
        apiKey: "...",
        model: "nova-2",
      },
    }
Usage
Slash Commands (Discord)
Once registered with Discord, use these commands:
- /discord_voice join <channel> - Join a voice channel
- /discord_voice leave - Leave the current voice channel
- /discord_voice status - Show voice connection status
CLI Commands
    # Join a voice channel
    clawdbot discord_voice join <channelId>
    # Leave voice
    clawdbot discord_voice leave --guild <guildId>
    # Check status
    clawdbot discord_voice status
Agent Tool
The agent can use the discord_voice tool. For example:

    Join voice channel 1234567890
The tool supports actions:
- join - Join a voice channel (requires channelId)
- leave - Leave the voice channel
- speak - Speak text in the voice channel
- status - Get current voice status
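A hypothetical sketch of how a caller might validate tool arguments before dispatching them. The four action names come from the list above, and `channelId` is documented as required for `join`; the `text` field for `speak` and the `validateToolCall` helper are assumptions, not the tool's documented schema.

```typescript
// Hypothetical parameter shape for the discord_voice agent tool.
type VoiceToolAction = "join" | "leave" | "speak" | "status";

interface VoiceToolParams {
  action: VoiceToolAction;
  channelId?: string; // documented as required for "join"
  text?: string; // assumed field carrying the text for "speak"
}

// Returns an error message, or null if the call is well-formed.
function validateToolCall(p: VoiceToolParams): string | null {
  switch (p.action) {
    case "join":
      return p.channelId ? null : "join requires channelId";
    case "speak":
      return p.text && p.text.trim() !== "" ? null : "speak requires text";
    case "leave":
    case "status":
      return null;
    default: {
      // Compile-time exhaustiveness check over VoiceToolAction.
      const unknownAction: never = p.action;
      return `unknown action: ${String(unknownAction)}`;
    }
  }
}

// Usage
console.log(validateToolCall({ action: "join", channelId: "1234567890" })); // null
console.log(validateToolCall({ action: "join" })); // join requires channelId
```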
How It Works
- Join: Bot joins the specified voice channel
- Listen: VAD detects when users start/stop speaking
- Record: Audio is buffered while user speaks
- Transcribe: On silence, audio is sent to STT provider
- Process: Transcribed text is sent to Clawdbot agent
- Synthesize: Agent response is converted to audio via TTS
- Play: Audio is played back in the voice channel
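The steps above can be sketched as one async function that runs per buffered utterance. This is a minimal illustration assuming injected providers: the `SpeechToText`, `Agent`, `TextToSpeech`, and `Player` interfaces and `handleUtterance` are hypothetical names, not the plugin's internals. Skipping empty transcripts is a design choice of the sketch.

```typescript
// Hypothetical provider interfaces; the real plugin's internals may differ.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}
interface Agent {
  reply(userId: string, text: string): Promise<string>;
}
interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}
interface Player {
  play(audio: Uint8Array): Promise<void>;
}

interface PipelineDeps {
  stt: SpeechToText;
  agent: Agent;
  tts: TextToSpeech;
  player: Player;
}

// Transcribe -> Process -> Synthesize -> Play for one recorded utterance.
// Returns the agent's reply text, or null if nothing was transcribed.
async function handleUtterance(
  userId: string,
  audio: Uint8Array,
  deps: PipelineDeps,
): Promise<string | null> {
  const text = (await deps.stt.transcribe(audio)).trim();
  if (text === "") return null; // nothing intelligible: skip the agent call
  const reply = await deps.agent.reply(userId, text);
  const speech = await deps.tts.synthesize(reply);
  await deps.player.play(speech);
  return reply;
}
```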
Streaming STT (Deepgram)
When using Deepgram as your STT provider, streaming mode is enabled by default. This provides:
- ~1 second faster end-to-end latency
- Real-time feedback with interim transcription results
- Automatic keep-alive to prevent connection timeouts
- Fallback to batch transcription if streaming fails
To use streaming STT:
    {
      sttProvider: "deepgram",
      streamingSTT: true, // default
      deepgram: {
        apiKey: "...",
        model: "nova-2",
      },
    }
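The fallback behaviour listed above amounts to "try streaming, otherwise batch". A minimal sketch, assuming hypothetical transcriber function types (`StreamingTranscriber`, `BatchTranscriber`, `transcribeWithFallback` are illustrative names). For brevity it hands the streaming path a finished buffer, whereas a real Deepgram WebSocket client would send audio as it arrives and report interim results along the way.

```typescript
// Hypothetical transcriber signatures; assumed, not the plugin's API.
type StreamingTranscriber = (
  audio: Uint8Array,
  onInterim: (partial: string) => void,
) => Promise<string>;
type BatchTranscriber = (audio: Uint8Array) => Promise<string>;

interface TranscriptResult {
  text: string;
  mode: "streaming" | "batch";
}

// Prefer streaming; if it fails for any reason, retry the same audio in batch mode.
async function transcribeWithFallback(
  audio: Uint8Array,
  streaming: StreamingTranscriber | null,
  batch: BatchTranscriber,
  onInterim: (partial: string) => void = () => {},
): Promise<TranscriptResult> {
  if (streaming) {
    try {
      return { text: await streaming(audio, onInterim), mode: "streaming" };
    } catch {
      // Streaming failed (e.g. socket closed): fall through to batch.
    }
  }
  return { text: await batch(audio), mode: "batch" };
}
```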
Barge-in Support
When enabled (default), the bot will immediately stop speaking if a user starts talking. This creates a more natural conversational flow where you can interrupt the bot.
To disable (let the bot finish speaking):
    {
      bargeIn: false,
    }
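A sketch of the barge-in decision, assuming a hypothetical `PlaybackController` that is told when a response starts and when VAD reports that a user started speaking. The `stopAudio` callback stands in for whatever actually halts playback. With barge-in disabled, the speaking event is ignored and the response plays to the end.

```typescript
// Hypothetical playback controller illustrating barge-in; not the plugin's API.
class PlaybackController {
  private playing = false;
  private readonly bargeIn: boolean;
  private readonly stopAudio: () => void;

  constructor(bargeIn: boolean, stopAudio: () => void) {
    this.bargeIn = bargeIn;
    this.stopAudio = stopAudio; // e.g. stops the underlying audio player
  }

  // A TTS response starts playing in the voice channel.
  startResponse(): void {
    this.playing = true;
  }

  // The response finished on its own.
  finishResponse(): void {
    this.playing = false;
  }

  // Called when VAD reports that a user started speaking.
  // Returns true if the current response was cut off.
  onUserSpeakingStart(): boolean {
    if (!this.bargeIn || !this.playing) return false;
    this.playing = false;
    this.stopAudio();
    return true;
  }

  get isPlaying(): boolean {
    return this.playing;
  }
}
```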
Auto-reconnect
The plugin includes automatic connection health monitoring:
- Heartbeat checks every 30 seconds (configurable)
- Auto-reconnect on disconnect with exponential backoff
- Max 3 attempts before giving up
If the connection drops, you'll see logs like:
    [discord-voice] Disconnected from voice channel
    [discord-voice] Reconnection attempt 1/3
    [discord-voice] Reconnected successfully
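The retry policy described above (exponential backoff, at most 3 attempts) could look like the sketch below. The 1 s base delay and the function names are assumptions; the log messages mirror the ones shown above. `connect` and `sleep` are injected so the loop can be exercised without Discord.

```typescript
// Assumed base delay; the document states only "exponential backoff"
// and a maximum of 3 attempts.
const BASE_DELAY_MS = 1000;
const MAX_ATTEMPTS = 3;

// attempt is 1-based: 1 -> base, 2 -> 2 * base, 3 -> 4 * base
function backoffDelayMs(attempt: number, baseMs: number = BASE_DELAY_MS): number {
  return baseMs * 2 ** (attempt - 1);
}

async function reconnectWithBackoff(
  connect: () => Promise<void>,
  log: (msg: string) => void = console.log,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<boolean> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    log(`[discord-voice] Reconnection attempt ${attempt}/${MAX_ATTEMPTS}`);
    try {
      await connect();
      log("[discord-voice] Reconnected successfully");
      return true;
    } catch {
      // Wait longer after each failure, but not after the last attempt.
      if (attempt < MAX_ATTEMPTS) await sleep(backoffDelayMs(attempt));
    }
  }
  return false; // give up after MAX_ATTEMPTS
}
```

With the assumed 1 s base, the waits between the three attempts are 1 s and then 2 s.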
VAD Sensitivity
- low: Picks up quiet speech, may trigger on background noise
- medium: Balanced (recommended)
- high: Requires louder, clearer speech
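One way to picture how vadSensitivity, silenceThresholdMs, and maxRecordingMs interact is a simple energy-based segmenter over 16-bit PCM frames. This is a stand-in, not the plugin's detector: the per-level RMS thresholds are invented for illustration, ordered so that "low" accepts the quietest audio as described above. `UtteranceSegmenter` and its options are hypothetical names.

```typescript
type VadSensitivity = "low" | "medium" | "high";

// Hypothetical RMS thresholds for 16-bit PCM; "low" accepts quieter audio.
const RMS_THRESHOLD: Record<VadSensitivity, number> = {
  low: 200,
  medium: 500,
  high: 1000,
};

// Root-mean-square energy of one PCM frame.
function rms(frame: Int16Array): number {
  if (frame.length === 0) return 0;
  const sumSquares = frame.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / frame.length);
}

interface SegmenterOptions {
  sensitivity: VadSensitivity;
  frameMs: number; // duration of each pushed frame
  silenceThresholdMs: number;
  maxRecordingMs: number;
}

// Buffers frames while a user speaks; emits the utterance (including the
// trailing silence) after silenceThresholdMs of quiet, or at maxRecordingMs.
class UtteranceSegmenter {
  private frames: Int16Array[] = [];
  private recordedMs = 0;
  private silenceMs = 0;
  private readonly opts: SegmenterOptions;
  private readonly threshold: number;

  constructor(opts: SegmenterOptions) {
    this.opts = opts;
    this.threshold = RMS_THRESHOLD[opts.sensitivity];
  }

  // Feed one frame; returns the finished utterance's frames, or null.
  push(frame: Int16Array): Int16Array[] | null {
    const speaking = rms(frame) >= this.threshold;
    if (this.frames.length === 0 && !speaking) return null; // idle: ignore silence
    this.frames.push(frame);
    this.recordedMs += this.opts.frameMs;
    this.silenceMs = speaking ? 0 : this.silenceMs + this.opts.frameMs;
    const done =
      this.silenceMs >= this.opts.silenceThresholdMs ||
      this.recordedMs >= this.opts.maxRecordingMs;
    if (!done) return null;
    const utterance = this.frames;
    this.frames = [];
    this.recordedMs = 0;
    this.silenceMs = 0;
    return utterance;
  }
}
```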
Troubleshooting
"Discord client not available"
Ensure the Discord channel is configured and the bot is connected before using voice.
Opus/Sodium build errors
Install node-gyp, then rebuild the native modules (the system build tools from step 1 must already be installed):
    npm install -g node-gyp
    npm rebuild @discordjs/opus sodium-native
No audio heard
- Check bot has Connect + Speak permissions
- Check bot isn't server muted
- Verify TTS API key is valid
Transcription not working
- Check STT API key is valid
- Check audio is being recorded (see debug logs)
- Try adjusting VAD sensitivity
Enable debug logging
    DEBUG=discord-voice clawdbot gateway start
Environment Variables
| Variable | Description |
|---|---|
| Discord bot token (required) |
| SKILLBOSS_API_KEY | SkillBoss API key (Whisper/TTS via Hub) |
| Deepgram API key (streaming STT only) |
Limitations
- Only one voice channel per guild at a time
- Maximum recording length: 30 seconds (configurable)
- Requires stable network for real-time audio
- TTS output may have slight delay due to synthesis
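To illustrate the one-channel-per-guild limit, here is a hypothetical registry keyed by guild ID. It assumes that joining a second channel in the same guild moves the bot rather than failing (Discord allows a bot only one voice connection per guild); the plugin may instead reject such a request. `VoiceSessions` and its methods are illustrative names.

```typescript
// Hypothetical registry enforcing "one voice channel per guild".
class VoiceSessions {
  private readonly byGuild = new Map<string, string>(); // guildId -> channelId

  // Joining another channel in the same guild replaces the current one.
  // Returns the channel that was left, if any.
  join(guildId: string, channelId: string): string | undefined {
    const previous = this.byGuild.get(guildId);
    this.byGuild.set(guildId, channelId);
    return previous === channelId ? undefined : previous;
  }

  // Returns true if the bot was connected in that guild.
  leave(guildId: string): boolean {
    return this.byGuild.delete(guildId);
  }

  channelFor(guildId: string): string | undefined {
    return this.byGuild.get(guildId);
  }
}
```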
License
MIT