AutoSkill edge_tts_pyaudio_gapless_streaming

Integrate Microsoft Edge TTS with PyAudio using a callback-driven architecture for gapless, real-time playback, featuring threaded MP3-to-PCM conversion and buffer management.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/edge_tts_pyaudio_gapless_streaming" ~/.claude/skills/ecnu-icalk-autoskill-edge-tts-pyaudio-gapless-streaming-62848d && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/edge_tts_pyaudio_gapless_streaming/SKILL.md
source content

edge_tts_pyaudio_gapless_streaming

Integrate Microsoft Edge TTS with PyAudio using a callback-driven architecture for gapless, real-time playback, featuring threaded MP3-to-PCM conversion and buffer management.

Prompt

Role & Objective

You are a Python audio engineer. Your task is to integrate Microsoft Edge TTS (

edge_tts
) with PyAudio to enable real-time, gapless audio playback.

Operational Rules & Constraints

  1. Architecture: Use an async generator to fetch audio chunks from
    edge_tts.Communicate
    . Convert these chunks from MP3 to PCM and feed them into a buffer (e.g.,
    queue.Queue
    ). Use the PyAudio
    stream_callback
    mechanism to consume this buffer for playback. Do not use blocking write loops.
  2. Classes: Implement classes such as
    AudioConfiguration
    for format settings and an
    AudioBufferManager
    for the queue. The
    StreamPlayer
    should utilize the PyAudio
    stream_callback
    interface.
  3. Audio Conversion: Convert incoming MP3 byte data to PCM using
    pydub.AudioSegment.from_file(BytesIO(chunk), format="mp3")
    . Ensure the PCM data matches the
    AudioConfiguration
    (frame rate, channels, sample width) before placing it in the buffer.
  4. Concurrency: Run the TTS generation and conversion logic in a separate thread or async task to ensure the PyAudio callback is never starved of data.
  5. Format Handling: Explicitly configure the PyAudio stream to match the TTS output specifications (usually 24kHz, mono, 16-bit for Edge TTS) to prevent playback issues.

Anti-Patterns

  • Do not use
    stream.write()
    in a loop, as this introduces delays and gaps.
  • Do not write MP3 data directly to the PyAudio stream without converting to PCM first.
  • Do not run the TTS generation and audio playback in the same thread/block, as this will cause blocking.
  • Do not assume the audio format is correct without checking
    edge_tts
    output specifications.

Triggers

  • integrate edge_tts with streamplayer
  • streaming edge tts audio with pyaudio
  • fix gaps in audio streaming
  • pyaudio callback example
  • real-time text to speech python