Codymaster cm-readit

Turn any website into an audio-enabled experience. Covers TTS reading mode (SpeechSynthesis API), pre-recorded MP3 audio player, and Voice CRO trigger system. Zero dependencies, works on any static or dynamic site. Use when adding read-aloud, audio player, or voice-based conversion features.

install
source · Clone the upstream repo
git clone https://github.com/tody-agent/codymaster
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/tody-agent/codymaster "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/cm-readit" ~/.claude/skills/tody-agent-codymaster-cm-readit && rm -rf "$T"
manifest: skills/cm-readit/SKILL.md

CM ReadIt — Web Audio Experience Skill

Philosophy: Reading is passive. Listening is intimate. Voice builds trust faster than any headline.

Core Principle: Zero dependencies. Progressive enhancement. Respect the user's device and preferences.


🎯 Selective Reading Rule (MANDATORY)

| File | Status | When to Read |
| --- | --- | --- |
| tts-engine.md | 🔴 REQUIRED | Adding TTS / read-aloud to any page |
| audio-player.md | ⚪ Optional | Pre-recorded MP3 playback |
| voice-cro.md | ⚪ Optional | Trigger-based voice sales / CRO |
| ui-patterns.md | ⚪ Optional | Player bar & bottom sheet design |

🔴 tts-engine.md = ALWAYS READ when implementing TTS. Others = only if relevant.


Quick Decision Tree

"I need audio on my website"
│
├─ Read article content aloud (text-to-speech)
│  └─ Use: TTS Engine → tts-engine.md
│     ├─ Blog / article pages → Content Reader pattern
│     ├─ Documentation → Section Reader pattern
│     └─ E-commerce → Product Description Reader pattern
│
├─ Play pre-recorded audio files (MP3/WAV)
│  └─ Use: Audio Player → audio-player.md
│     ├─ Podcasts / interviews → Playlist pattern
│     ├─ Sales pitch / welcome → Triggered playback
│     └─ Background ambient → Loop pattern
│
├─ Voice-based conversion optimization (CRO)
│  └─ Use: Voice CRO → voice-cro.md
│     ├─ Landing pages → Trigger-based bottom sheet
│     ├─ Service pages → Per-page audio scripts
│     └─ Course pages → Social proof audio
│
└─ Combination (TTS + CRO)
   └─ Read tts-engine.md + voice-cro.md
      └─ Ensure no conflict (TTS reader vs CRO player)
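
The "ensure no conflict" step in the combination branch can be handled with a small shared arbiter that lets only one audio source play at a time. This is a minimal sketch, not taken from the skill's reference files: `createAudioArbiter`, `claim`, `release`, and `current` are hypothetical names chosen for illustration.

```javascript
// Hypothetical arbiter: only one audio source (e.g. 'tts' or 'cro') is active at a time.
function createAudioArbiter() {
    let active = null; // { name, stop } of the source currently playing

    return {
        // A source calls this right before it starts playing.
        claim(name, stop) {
            if (active && active.name !== name) {
                active.stop(); // silence the other source first
            }
            active = { name, stop };
        },
        // A source calls this when it finishes or is dismissed.
        release(name) {
            if (active && active.name === name) active = null;
        },
        // Name of the active source, or null when nothing is playing.
        current() {
            return active ? active.name : null;
        }
    };
}
```

In a page, the TTS reader might call `arbiter.claim('tts', () => window.speechSynthesis.cancel())` before speaking, and the CRO player might call `arbiter.claim('cro', () => audio.pause())` before playing, so starting one always stops the other.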

🧠 Core Principles (Internalize These)

1. The 3 Audio Engines

| Engine | API | Source | Best For |
| --- | --- | --- | --- |
| TTS Reader | SpeechSynthesis | Page text content | Blogs, articles, docs |
| Audio Player | HTMLAudioElement | Pre-recorded MP3 | Sales, podcasts, guides |
| Voice CRO | Audio + triggers | MP3 + behavior detection | Landing pages, sales |

2. Progressive Enhancement

Feature detection → Graceful degradation → Never break the page

// Inside the feature's IIFE: exit early when the API is missing
if (!('speechSynthesis' in window)) return;  // TTS
if (!window.Audio) return;                    // Audio

Rule: Audio features are ENHANCEMENTS. The page must function 100% without them.

3. Content Extraction Principle

Clone → Strip → Clean → Split → Speak

DON'T read the raw DOM.
DO clone, remove noise, extract clean text.

Strip list (always remove before speaking):

  • CTAs, promotions, ads
  • Navigation, footer, sidebar
  • Images, videos, iframes, SVGs
  • Scripts, styles, hidden elements
  • Tags, badges, metadata
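
The Clone → Strip → Clean steps could look roughly like the sketch below. The selector list is an assumption: the element selectors follow the strip list above, while the class names (`.cta`, `.promo`, `.ad`, `.tag`, `.badge`, `.meta`) are hypothetical and need to be matched to the site's own markup. `extractReadableText` is an illustrative name, not part of the skill.

```javascript
// Assumed strip list; adjust the selectors to your own markup.
const STRIP_SELECTORS = [
    'nav', 'footer', 'aside',                            // navigation, footer, sidebar
    'img', 'video', 'iframe', 'svg',                     // media
    'script', 'style', '[hidden]',                       // scripts, styles, hidden elements
    '.cta', '.promo', '.ad', '.tag', '.badge', '.meta'   // hypothetical class names
];

function extractReadableText(container, selectors = STRIP_SELECTORS) {
    // Clone: never mutate the live DOM.
    const clone = container.cloneNode(true);

    // Strip: remove every noise element from the clone.
    clone.querySelectorAll(selectors.join(',')).forEach((node) => node.remove());

    // Clean: collapse runs of spaces and tabs, reduce blank lines to single newlines.
    return (clone.textContent || '')
        .replace(/[ \t\u00A0]+/g, ' ')
        .replace(/\s*\n\s*/g, '\n')
        .trim();
}
```

A typical call would be `extractReadableText(document.querySelector('article'))`. Note that `textContent` only preserves line breaks that exist in the HTML source, so paragraph boundaries may need extra handling depending on the markup.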

4. The Chunking Problem

Speech engines cap utterance length in practice (roughly 3000-5000 characters, depending on browser and OS), and an utterance over the cap can fail silently. Split long text into chunks before speaking.

Split Strategy:
├─ Split on sentence boundaries (. ! ? \n)
├─ Max chunk: 2500 chars (safe across all browsers)
├─ Preserve sentence integrity (never split mid-sentence)
└─ Chain chunks via onend callback
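
As a rough sketch of the split strategy above, the function below packs whole sentences into chunks of at most `maxLen` characters. `splitIntoChunks` is an illustrative name, and the word-boundary fallback for a single sentence longer than `maxLen` is an assumption added here, since the strategy above does not say what to do in that case.

```javascript
function splitIntoChunks(text, maxLen = 2500) {
    // Mark a sentence boundary after . ! ? when followed by whitespace, then split on newlines.
    const sentences = text
        .replace(/([.!?])\s+/g, '$1\n')
        .split(/\n+/)
        .map((s) => s.trim())
        .filter(Boolean);

    const chunks = [];
    let current = '';
    for (const sentence of sentences) {
        const candidate = current ? current + ' ' + sentence : sentence;
        if (candidate.length <= maxLen) {
            current = candidate; // sentence still fits in the current chunk
            continue;
        }
        if (current) chunks.push(current);

        // Assumed fallback: an oversized sentence is cut at word boundaries.
        let rest = sentence;
        while (rest.length > maxLen) {
            let cut = rest.lastIndexOf(' ', maxLen);
            if (cut <= 0) cut = maxLen; // no space found: hard cut
            chunks.push(rest.slice(0, cut).trim());
            rest = rest.slice(cut).trim();
        }
        current = rest;
    }
    if (current) chunks.push(current);
    return chunks;
}
```

Because a boundary is only recognized when punctuation is followed by whitespace, decimals such as "3.14" stay intact. Chaining the resulting chunks is then a matter of speaking the next chunk from each utterance's `onend` handler.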

5. Voice Selection Priority

Language voices:
1. Local service voice (faster, works offline)
2. Network voice (higher quality, needs internet)
3. Any voice matching language prefix
4. null (browser default)
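
One possible reading of this priority order is sketched below. It assumes that steps 1 and 2 apply to voices whose language matches the requested code exactly, and that step 3 falls back to any voice sharing the language prefix. `pickVoice` is a hypothetical helper; `lang` and `localService` are standard `SpeechSynthesisVoice` properties. The underscore normalization is a defensive assumption for platforms that report codes like `en_US`.

```javascript
function pickVoice(voices, lang) {
    const wanted = lang.toLowerCase();
    const prefix = wanted.split('-')[0];
    // Normalize codes such as "en_US" to "en-us" before comparing.
    const norm = (voice) => voice.lang.replace('_', '-').toLowerCase();
    const exact = voices.filter((voice) => norm(voice) === wanted);

    return (
        exact.find((voice) => voice.localService) ||             // 1. local service voice
        exact.find((voice) => !voice.localService) ||            // 2. network voice
        voices.find((voice) => norm(voice).startsWith(prefix)) || // 3. any prefix match
        null                                                      // 4. browser default
    );
}
```

In a browser this would be called as `pickVoice(speechSynthesis.getVoices(), 'vi-VN')`, and a `null` result simply means the utterance's `voice` is left unset.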

6. Chrome Keep-Alive Bug

⚠️ CRITICAL: Chrome silently stops SpeechSynthesis after ~15 seconds of continuous speech. This is the #1 gotcha.

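const synth = window.speechSynthesis;

// Workaround: pause/resume every 10s while speech is active
const keepAliveTimer = setInterval(() => {
    if (synth.speaking && !synth.paused) {
        synth.pause();
        synth.resume();
    }
}, 10000);

// When reading stops, call clearInterval(keepAliveTimer) so the timer does not leak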

7. synth.cancel() Triggers onerror

⚠️ GOTCHA: Calling synth.cancel() fires the onerror event on any active utterance with error type 'canceled' or 'interrupted'.

Solution: Use a guard flag or check error type:

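// u is the active SpeechSynthesisUtterance; stopReading() is the reader's own cleanup routine
u.onerror = function(e) {
    // A deliberate cancel() is not a real failure, so ignore it
    if (e.error === 'canceled' || e.error === 'interrupted') return;
    stopReading();
};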
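
The guard-flag variant mentioned above might be sketched as follows. `createCancelGuard`, `wrapOnError`, and `reset` are hypothetical names. Because the error event arrives asynchronously after `cancel()`, this sketch keeps the flag set until `reset()` is called right before the next `speak()`.

```javascript
function createCancelGuard(synth) {
    let cancelling = false;

    return {
        // Use this instead of calling synth.cancel() directly.
        cancel() {
            cancelling = true;
            synth.cancel();
        },
        // Wraps the real error handler and swallows errors caused by our own cancel().
        wrapOnError(handler) {
            return function (event) {
                if (cancelling) return;
                if (event.error === 'canceled' || event.error === 'interrupted') return;
                handler(event);
            };
        },
        // Call right before the next synth.speak() so real errors are reported again.
        reset() {
            cancelling = false;
        }
    };
}
```

With `const guard = createCancelGuard(window.speechSynthesis)`, each utterance would get `utterance.onerror = guard.wrapOnError(stopReading)` and the close button would call `guard.cancel()`.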

🏗️ Architecture Pattern

Minimal TTS Reader (Copy-Paste Starting Point)

┌─────────────────────────────────────────┐
│                  IIFE                    │
│                                          │
│  ┌─ Feature Detection ─┐                │
│  │  speechSynthesis?    │                │
│  └──────────┬───────────┘                │
│             ▼                            │
│  ┌─ Content Extraction ─┐               │
│  │  Clone → Strip → Clean│              │
│  └──────────┬────────────┘               │
│             ▼                            │
│  ┌─ Chunking Engine ────┐               │
│  │  Split on sentences   │              │
│  │  Max 2500 chars       │              │
│  └──────────┬────────────┘               │
│             ▼                            │
│  ┌─ Utterance Builder ──┐               │
│  │  Set voice/rate/pitch │              │
│  │  Chain via onend      │              │
│  └──────────┬────────────┘               │
│             ▼                            │
│  ┌─ Player UI ──────────┐               │
│  │  Bar: play/pause/stop │              │
│  │  Progress indicator   │              │
│  │  Trigger button       │              │
│  └──────────┬────────────┘               │
│             ▼                            │
│  ┌─ Keep-Alive Timer ───┐               │
│  │  pause/resume @ 10s  │               │
│  └───────────────────────┘               │
└──────────────────────────────────────────┘

Lifecycle

Init → Detect → Inject Trigger Button
         │
   User clicks ▶
         │
   Extract Text → Chunk → Build Utterances
         │
   synth.speak(chunk[0])
         │
   chunk[0].onend → speak(chunk[1]) → ... → speak(chunk[N])
         │                                        │
   Keep-Alive Timer running                   chunk[N].onend
         │                                        │
   User clicks ⏸ → synth.pause()             stopReading()
   User clicks ▶ → synth.resume()            cleanup UI
   User clicks ✕ → synth.cancel()
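
The lifecycle above can be approximated with a small controller that chains chunks through `onend`. This is a sketch under assumptions: `createReader` and its method names are hypothetical, and the synthesizer and utterance factory are passed in so the same code can be exercised without a browser.

```javascript
function createReader(synth, createUtterance, chunks, onDone) {
    let index = 0;       // chunk currently being spoken
    let stopped = false; // set when the user closes the player

    function speakNext() {
        if (stopped) return;
        if (index >= chunks.length) {
            if (onDone) onDone(); // last chunk finished: clean up the UI
            return;
        }
        const utterance = createUtterance(chunks[index]);
        utterance.onend = function () {
            index += 1;
            speakNext(); // chunk[i].onend → speak(chunk[i + 1])
        };
        synth.speak(utterance);
    }

    return {
        start() { index = 0; stopped = false; speakNext(); },
        pause() { synth.pause(); },
        resume() { synth.resume(); },
        stop() { stopped = true; synth.cancel(); },
        // Fraction of chunks completed, for a progress indicator.
        progress() { return chunks.length ? index / chunks.length : 0; }
    };
}
```

In a page this might be wired as `createReader(window.speechSynthesis, (text) => new SpeechSynthesisUtterance(text), chunks, cleanupUi)`, where `cleanupUi` is whatever hides the player bar. The `stopped` flag keeps a late `onend` from restarting playback after the user has closed the player.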

📐 Implementation Checklist

For TTS Reader

  • Feature detection (speechSynthesis in window)
  • Content container identified (ID or selector)
  • Strip list defined (what to remove before reading)
  • Chunk size set (default 2500)
  • Voice selection logic (language-specific)
  • Player bar UI (play/pause/close + progress)
  • Trigger button injected (topbar or floating)
  • Chrome keep-alive timer (10s interval)
  • onerror guard (handle cancel/interrupted)
  • beforeunload cleanup
  • prefers-reduced-motion respect
  • Mobile safe-area padding

For Audio Player

  • Audio files hosted and accessible
  • Preload strategy (none → load on demand)
  • Play/pause toggle with state management
  • Progress bar with currentTime/duration
  • Error handling (network, format, autoplay policy)
  • Session state (dismissed = don't show again)
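
Several of these items (on-demand loading, play/pause toggle, progress, error handling) fit into one small wrapper, sketched below. `createAudioPlayer` and its return values are hypothetical. The audio constructor is injected so the logic can run outside a browser, and in a page it would simply be `Audio`.

```javascript
function createAudioPlayer(AudioCtor, src, onError) {
    let audio = null; // created lazily so nothing downloads before the user asks

    function ensureAudio() {
        if (!audio) {
            audio = new AudioCtor();
            audio.preload = 'none'; // load on demand
            audio.src = src;
            audio.addEventListener('error', () => onError(new Error('Audio failed to load')));
        }
        return audio;
    }

    return {
        // Call from a click/tap handler so autoplay policies are satisfied.
        async toggle() {
            const a = ensureAudio();
            if (!a.paused) {
                a.pause();
                return 'paused';
            }
            try {
                await a.play(); // rejects when playback is blocked or the format is unsupported
                return 'playing';
            } catch (err) {
                onError(err);
                return 'blocked';
            }
        },
        // Progress in percent; returns 0 until the duration is known.
        progress() {
            if (!audio || !Number.isFinite(audio.duration) || audio.duration <= 0) return 0;
            return Math.min(100, (audio.currentTime / audio.duration) * 100);
        }
    };
}
```

A page might create the player with `createAudioPlayer(Audio, url, console.warn)` and update the progress bar from the element's `timeupdate` event. Session dismissal is left out here and would typically live in `sessionStorage`.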

For Voice CRO

  • Per-page config object (delay, scroll threshold, audio URLs)
  • Trigger conditions (time + scroll AND/OR interaction)
  • Bottom sheet UI (icon, text, CTA, dismiss)
  • Player bar UI (toggle, progress, CTA button)
  • Session dismissal tracking
  • Stats tracking (shown/listened/dismissed)
  • No conflict with TTS Reader
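
To make the config and trigger items concrete, here is a minimal sketch. Every field name, the audio URL, and the storage key are hypothetical placeholders, not values from voice-cro.md. `storage` is expected to behave like `sessionStorage` (`getItem` / `setItem`).

```javascript
// Hypothetical per-page config; field names and values are illustrative only.
const croConfig = {
    delayMs: 15000,        // minimum time on page
    scrollThreshold: 0.4,  // fraction of the page scrolled (0..1)
    requireBoth: true,     // true = time AND scroll, false = time OR scroll
    audioUrl: '/audio/landing-pitch.mp3' // placeholder path
};

// Decide whether the bottom sheet may appear for the current visitor state.
function shouldTrigger(config, state) {
    if (state.dismissed || state.ttsActive) return false; // respect dismissal, avoid TTS conflict
    const timeOk = state.elapsedMs >= config.delayMs;
    const scrollOk = state.scrollRatio >= config.scrollThreshold;
    return config.requireBoth ? timeOk && scrollOk : timeOk || scrollOk;
}

// Session dismissal plus simple counters for shown/listened/dismissed.
function createCroSession(storage, key = 'voiceCroDismissed') {
    const stats = { shown: 0, listened: 0, dismissed: 0 };
    return {
        isDismissed() { return storage.getItem(key) === '1'; },
        dismiss() { storage.setItem(key, '1'); stats.dismissed += 1; },
        track(event) {
            if (Object.prototype.hasOwnProperty.call(stats, event)) stats[event] += 1;
        },
        stats() { return { ...stats }; }
    };
}
```

A page would call `shouldTrigger` from its timer and scroll handlers with fresh state, passing `createCroSession(sessionStorage).isDismissed()` as the `dismissed` flag.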

⚠️ Common Pitfalls

| Pitfall | Symptom | Fix |
| --- | --- | --- |
| Chrome stops after 15s | Audio cuts mid-sentence | Keep-alive timer (pause/resume) |
| synth.cancel() fires onerror | Settings sheet closes immediately | Guard flag or check error type |
| Voices not loaded | No voice available | Listen for voiceschanged event |
| Chunk too large | Utterance fails silently | Max 2500 chars per chunk |
| Reading CTA text | TTS reads "Book Now" button text | Strip non-content elements |
| Autoplay blocked | Audio won't start on mobile | Require user interaction first |
| Multiple audio conflicts | TTS + CRO play simultaneously | Mutual exclusion check |
| No cleanup on nav | Audio keeps playing | beforeunload → synth.cancel() |
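
For the "Voices not loaded" pitfall, a promise-based loader is one way to wait for the `voiceschanged` event. `loadVoices` is a hypothetical helper. The timeout fallback and the `addEventListener` capability check are defensive assumptions for browsers that fill the voice list without firing the event.

```javascript
function loadVoices(synth, timeoutMs = 2000) {
    return new Promise((resolve) => {
        const ready = synth.getVoices();
        if (ready.length > 0) {
            resolve(ready); // voices were already available
            return;
        }

        const canListen = typeof synth.addEventListener === 'function';
        let timer = null;

        function finish() {
            clearTimeout(timer);
            if (canListen) synth.removeEventListener('voiceschanged', finish);
            resolve(synth.getVoices()); // may still be empty: caller falls back to the default voice
        }

        if (canListen) synth.addEventListener('voiceschanged', finish);
        timer = setTimeout(finish, timeoutMs); // fallback when the event never fires
    });
}
```

Usage would be along the lines of `loadVoices(window.speechSynthesis).then((voices) => { /* select a voice, then speak */ })`.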

🌐 Multi-Language Support

Voice selection by language:
├─ Vietnamese: v.lang === 'vi-VN' || v.lang.startsWith('vi')
├─ English: v.lang === 'en-US' || v.lang.startsWith('en')
├─ Japanese: v.lang === 'ja-JP' || v.lang.startsWith('ja')
├─ Korean: v.lang === 'ko-KR' || v.lang.startsWith('ko')
└─ Any: Pass language code as config parameter

Set utterance.lang to match the content language for correct pronunciation.


📚 Reference Files

| File | Content |
| --- | --- |
| tts-engine.md | Complete SpeechSynthesis API reference, chunking strategies, voice selection |
| audio-player.md | HTMLAudioElement patterns, preload strategies, error handling |
| voice-cro.md | Trigger system, bottom sheet patterns, CRO analytics |
| ui-patterns.md | Player bar CSS, bottom sheet CSS, animations, responsive design |

🔗 Reference Implementations

| File | Description |
| --- | --- |
| examples/blog-reader.js | Complete TTS reader, Substack-style, 350 LOC |
| examples/voice-cro.js | Complete Voice CRO trigger system, 390 LOC |

Remember: Voice is the most personal interface. A well-placed audio feature can increase engagement 3-5x. But unwanted audio is the fastest way to lose a user. Always require user initiation. Never autoplay.