Rei-skills podcast-generation
Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creatio...
install
source · Clone the upstream repo
git clone https://github.com/rootcastleco/rei-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/rootcastleco/rei-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/podcast-generation" ~/.claude/skills/rootcastleco-rei-skills-podcast-generation && rm -rf "$T"
manifest:
skills/podcast-generation/SKILL.mdsource content
Podcast Generation with GPT Realtime Mini
Generate real audio narratives from text content using Azure OpenAI's Realtime API.
Quick Start
- Configure environment variables for Realtime API
- Connect via WebSocket to Azure OpenAI Realtime endpoint
- Send text prompt, collect PCM audio chunks + transcript
- Convert PCM to WAV format
- Return base64-encoded audio to frontend for playback
Environment Configuration
AZURE_OPENAI_AUDIO_API_KEY=your_realtime_api_key AZURE_OPENAI_AUDIO_ENDPOINT=https://your-resource.cognitiveservices.azure.com AZURE_OPENAI_AUDIO_DEPLOYMENT=gpt-realtime-mini
Note: Endpoint should NOT include
/openai/v1/ - just the base URL.
Core Workflow
Backend Audio Generation
from openai import AsyncOpenAI import base64 # Convert HTTPS endpoint to WebSocket URL ws_url = endpoint.replace("https://", "wss://") + "/openai/v1" client = AsyncOpenAI( websocket_base_url=ws_url, api_key=api_key ) audio_chunks = [] transcript_parts = [] async with client.realtime.connect(model="gpt-realtime-mini") as conn: # Configure for audio-only output await conn.session.update(session={ "output_modalities": ["audio"], "instructions": "You are a narrator. Speak naturally." }) # Send text to narrate await conn.conversation.item.create(item={ "type": "message", "role": "user", "content": [{"type": "input_text", "text": prompt}] }) await conn.response.create() # Collect streaming events async for event in conn: if event.type == "response.output_audio.delta": audio_chunks.append(base64.b64decode(event.delta)) elif event.type == "response.output_audio_transcript.delta": transcript_parts.append(event.delta) elif event.type == "response.done": break # Convert PCM to WAV (see scripts/pcm_to_wav.py) pcm_audio = b''.join(audio_chunks) wav_audio = pcm_to_wav(pcm_audio, sample_rate=24000)
Frontend Audio Playback
// Convert base64 WAV to playable blob const base64ToBlob = (base64, mimeType) => { const bytes = atob(base64); const arr = new Uint8Array(bytes.length); for (let i = 0; i < bytes.length; i++) arr[i] = bytes.charCodeAt(i); return new Blob([arr], { type: mimeType }); }; const audioBlob = base64ToBlob(response.audio_data, 'audio/wav'); const audioUrl = URL.createObjectURL(audioBlob); new Audio(audioUrl).play();
Voice Options
| Voice | Character |
|---|---|
| alloy | Neutral |
| echo | Warm |
| fable | Expressive |
| onyx | Deep |
| nova | Friendly |
| shimmer | Clear |
Realtime API Events
- Base64 audio chunkresponse.output_audio.delta
- Transcript textresponse.output_audio_transcript.delta
- Generation completeresponse.done
- Handle witherrorevent.error.message
Audio Format
- Input: Text prompt
- Output: PCM audio (24kHz, 16-bit, mono)
- Storage: Base64-encoded WAV
References
- Full architecture: See references/architecture.md for complete stack design
- Code examples: See references/code-examples.md for production patterns
- PCM conversion: Use scripts/pcm_to_wav.py for audio format conversion
When to Use
This skill is applicable to execute the workflow or actions described in the overview.
🏰 Rei Skills — Curated by Rootcastle Engineering & Innovation | Batuhan Ayrıbaş
Engineering Beyond Boundaries | admin@rootcastle.com