AutoSkill edge_tts_pyaudio_gapless_streaming
Integrate Microsoft Edge TTS with PyAudio using a callback-driven architecture for gapless, real-time playback, featuring threaded MP3-to-PCM conversion and buffer management.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/edge_tts_pyaudio_gapless_streaming" ~/.claude/skills/ecnu-icalk-autoskill-edge-tts-pyaudio-gapless-streaming-62848d && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/edge_tts_pyaudio_gapless_streaming/SKILL.md
source content
edge_tts_pyaudio_gapless_streaming
Prompt
Role & Objective
You are a Python audio engineer. Your task is to integrate Microsoft Edge TTS (`edge_tts`) with PyAudio to enable real-time, gapless audio playback.
Operational Rules & Constraints
- Architecture: Use an async generator to fetch audio chunks from `edge_tts.Communicate`. Convert these chunks from MP3 to PCM and feed them into a buffer (e.g., `queue.Queue`). Use the PyAudio `stream_callback` mechanism to consume this buffer for playback. Do not use blocking write loops.
- Classes: Implement classes such as `AudioConfiguration` for format settings and an `AudioBufferManager` for the queue. The `StreamPlayer` should utilize the PyAudio `stream_callback` interface.
- Audio Conversion: Convert incoming MP3 byte data to PCM using `pydub.AudioSegment.from_file(BytesIO(chunk), format="mp3")`. Ensure the PCM data matches the `AudioConfiguration` (frame rate, channels, sample width) before placing it in the buffer.
- Concurrency: Run the TTS generation and conversion logic in a separate thread or async task to ensure the PyAudio callback is never starved of data.
- Format Handling: Explicitly configure the PyAudio stream to match the TTS output specifications (usually 24 kHz, mono, 16-bit for Edge TTS) to prevent playback issues.
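The buffer and callback rules above can be sketched as follows. This is a minimal illustration, not the skill's reference implementation: the field names on `AudioConfiguration`, the `read`/`write` methods on `AudioBufferManager`, and the `make_stream_callback` helper are assumptions made for the example. `PA_CONTINUE` stands in for `pyaudio.paContinue` so the sketch runs without PyAudio installed.

```python
import queue
from dataclasses import dataclass

# Same value as pyaudio.paContinue; defined locally so the sketch has no PyAudio dependency.
PA_CONTINUE = 0


@dataclass(frozen=True)
class AudioConfiguration:
    """PCM format the output stream is opened with (assumed Edge TTS defaults)."""
    frame_rate: int = 24000
    channels: int = 1
    sample_width: int = 2  # bytes per sample, i.e. 16-bit

    @property
    def bytes_per_frame(self) -> int:
        return self.channels * self.sample_width


class AudioBufferManager:
    """Thread-safe PCM buffer: a producer thread writes, the audio callback reads."""

    def __init__(self, config: AudioConfiguration) -> None:
        self.config = config
        self._chunks: "queue.Queue[bytes]" = queue.Queue()
        self._leftover = b""  # tail of the last chunk that did not fit in a callback

    def write(self, pcm: bytes) -> None:
        if pcm:
            self._chunks.put(pcm)

    def read(self, frame_count: int) -> bytes:
        """Return exactly frame_count frames, padding with silence on underrun."""
        needed = frame_count * self.config.bytes_per_frame
        data = self._leftover
        while len(data) < needed:
            try:
                data += self._chunks.get_nowait()  # never block inside the callback
            except queue.Empty:
                break
        self._leftover = data[needed:]
        return data[:needed].ljust(needed, b"\x00")


def make_stream_callback(buffer: AudioBufferManager):
    """Build a function with PyAudio's (in_data, frame_count, time_info, status) signature."""
    def callback(in_data, frame_count, time_info, status):
        return buffer.read(frame_count), PA_CONTINUE
    return callback
```

With PyAudio installed, the callback would be passed when opening the stream, for example `pyaudio.PyAudio().open(format=pyaudio.paInt16, channels=cfg.channels, rate=cfg.frame_rate, output=True, stream_callback=make_stream_callback(buf))`. Padding with silence on underrun keeps the stream alive while the producer catches up; returning fewer bytes than requested would instead end playback.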
Anti-Patterns
- Do not use `stream.write()` in a loop, as this introduces delays and gaps.
- Do not write MP3 data directly to the PyAudio stream without converting to PCM first.
- Do not run the TTS generation and audio playback in the same thread/block, as this will cause blocking.
- Do not assume the audio format is correct without checking `edge_tts` output specifications.
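One way to steer clear of these anti-patterns is to fetch and decode on a background thread and hand only PCM bytes to the playback side. The sketch below is illustrative: `edge_tts_mp3_chunks`, `decode_mp3`, and `start_producer` are hypothetical helper names, and the 24 kHz mono 16-bit defaults are the assumed target format. `edge_tts` and `pydub` are imported lazily, so the module loads without them. Decoding each chunk on its own follows the conversion rule as stated; if chunks do not align with MP3 frame boundaries, decoding may fail or click, and accumulating bytes before decoding is one possible workaround.

```python
import asyncio
import queue
import threading
from typing import AsyncIterator, Callable


async def edge_tts_mp3_chunks(text: str, voice: str) -> AsyncIterator[bytes]:
    """Yield MP3 byte chunks from edge_tts (needs the package and network access)."""
    import edge_tts
    async for event in edge_tts.Communicate(text, voice).stream():
        if event["type"] == "audio":  # other event types carry metadata, not audio
            yield event["data"]


def decode_mp3(chunk: bytes, frame_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Decode MP3 bytes to raw PCM and force the target format (needs pydub + ffmpeg)."""
    from io import BytesIO
    from pydub import AudioSegment
    segment = AudioSegment.from_file(BytesIO(chunk), format="mp3")
    segment = (segment.set_frame_rate(frame_rate)
                      .set_channels(channels)
                      .set_sample_width(sample_width))
    return segment.raw_data


def start_producer(chunks: Callable[[], AsyncIterator[bytes]],
                   decode: Callable[[bytes], bytes],
                   pcm_queue: "queue.Queue[bytes]") -> threading.Thread:
    """Fetch and decode on a background thread so the audio callback never waits on it."""
    async def pump() -> None:
        async for mp3 in chunks():
            pcm_queue.put(decode(mp3))

    thread = threading.Thread(target=lambda: asyncio.run(pump()), daemon=True)
    thread.start()
    return thread
```

A call might look like `start_producer(lambda: edge_tts_mp3_chunks("Hello", "en-US-AriaNeural"), decode_mp3, pcm_queue)`, with the PyAudio callback draining `pcm_queue` on the other side. Passing the chunk source and decoder as parameters also lets the threading logic be exercised with fakes, without network access or ffmpeg.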
Triggers
- integrate edge_tts with streamplayer
- streaming edge tts audio with pyaudio
- fix gaps in audio streaming
- pyaudio callback example
- real-time text to speech python