Claude-blog blog-audio

install
source · Clone the upstream repo
git clone https://github.com/AgriciDaniel/claude-blog
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/AgriciDaniel/claude-blog "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/blog-audio" ~/.claude/skills/agricidaniel-claude-blog-blog-audio && rm -rf "$T"
manifest: skills/blog-audio/SKILL.md
source content

Blog Audio -- Gemini TTS Narration for Blog Posts

Generate professional audio narration of blog content using Google's Gemini TTS. Three modes: summary (200-300 word spoken overview), full article read-aloud, or two-speaker podcast dialogue. 30 voices, 80+ languages, HTML5 embed output.

Quick Reference

CommandWhat it does
/blog audio generate <file>
Generate audio narration of a blog post
/blog audio voices
Show available voices with characteristics
/blog audio setup
Check/configure API key for Gemini TTS

Prerequisites

  • Python 3.11+ (venv managed automatically by
    run.py
    )
  • GOOGLE_AI_API_KEY
    environment variable (same key used by blog-image)
  • FFmpeg (for WAV-to-MP3 conversion; falls back to WAV if missing)

Always Use run.py Wrapper

# CORRECT:
python3 scripts/run.py generate_audio.py --text "..." --voice Charon --json

# WRONG:
python3 scripts/generate_audio.py --text "..."  # Fails without venv

API Key Check (Gate Pattern)

Before generating audio, check for the API key:

echo $GOOGLE_AI_API_KEY
  • If set: proceed with generation
  • If not set: guide the user: "Audio generation requires a Google AI API key. Get one free at https://aistudio.google.com/apikey Then set it:
    export GOOGLE_AI_API_KEY=your-key
    This is the same key used by
    /blog image
    -- if image generation works, audio works too."
  • When called internally (from blog-write): return silently if key is missing. Never block the writing workflow.

Setup

For

/blog audio setup
:

  1. Check if
    GOOGLE_AI_API_KEY
    is set in environment
  2. If blog-image is configured (check
    .mcp.json
    ), the key is already available
  3. If not, guide user to https://aistudio.google.com/apikey
  4. Verify with a dry run:
    python3 scripts/run.py generate_audio.py --text "Test" --dry-run --json

Voice Selection

For

/blog audio voices
:

Load

references/voices.md
and present the voice catalog to the user.

Ask the user which voice they prefer, or recommend based on content type:

  • Article narration: Charon (Informative) or Sadaltager (Knowledgeable)
  • Tutorial/how-to: Achird (Friendly) or Sulafat (Warm)
  • News/analysis: Rasalgethi (Informative) or Schedar (Even)
  • Lifestyle/wellness: Aoede (Breezy) or Vindemiatrix (Gentle)
  • Dialogue host: Puck (Upbeat) or Laomedeia (Upbeat)
  • Dialogue expert: Kore (Firm) or Charon (Informative)

Generation Workflow

For

/blog audio generate <file>
:

Step 1: Read the Blog Post

Read the file and extract:

  • Title (from H1 or frontmatter)
  • Full content (markdown body)
  • Approximate word count

Step 2: Choose Mode

Ask the user (or auto-select if they specified

--mode
):

ModeWhen to useOutput
SummaryQuick audio overview (1-2 min)200-300 word spoken summary
FullComplete read-aloud (5-15 min)Full article as natural speech
DialoguePodcast-style (3-8 min)Two-person conversation about the article

Step 3: Prepare Text

CRITICAL: Claude prepares the text. The script does TTS only.

Summary mode: Write a 200-300 word spoken summary of the article. Rules:

  • Write as natural speech, not written text
  • Open with the article's key finding or answer
  • Cover 3-5 main takeaways
  • Close with actionable advice
  • No markdown, no "In this article...", no meta-commentary
  • Use conversational transitions ("Here's what matters...", "The key finding is...")

Full mode: Strip the markdown content to clean spoken text:

  • Headings become natural transitions ("Next, let's look at...")
  • Links become plain text (remove URLs, keep anchor text)
  • Images and charts: omit or briefly describe ("As the data shows...")
  • Code blocks: describe verbally ("The code uses a for-loop to...")
  • Lists: convert to natural sentences
  • Remove frontmatter, schema markup, HTML tags
  • Add brief intro: "This is [title], published on [date]."

Dialogue mode: Write a 2-person conversation script about the article:

  • Speaker1 = Host (curious, asks good questions)
  • Speaker2 = Expert (knowledgeable, gives clear answers)
  • Format each line as:
    [Speaker1] What's the key takeaway here?
  • Cover the article's main points conversationally
  • 15-25 exchanges (produces ~3-8 minutes)
  • Natural, not stilted ("That's a great point" over "Indeed, as the research indicates")

Step 4: Select Voice

If the user chose a voice, use it. Otherwise, recommend based on mode:

  • Summary/Full: default to Charon (Informative)
  • Dialogue: default to Puck (Host) + Kore (Expert)

Step 5: Generate Audio

Write the prepared text to a temp file, then call:

# Single voice (summary or full mode)
python3 scripts/run.py generate_audio.py \
  --text-file /tmp/blog_audio_prepared.txt \
  --voice Charon \
  --model flash \
  --output /path/to/audio/post-slug.mp3 \
  --json

# Two voices (dialogue mode)
python3 scripts/run.py generate_audio.py \
  --text-file /tmp/blog_audio_dialogue.txt \
  --voice Puck \
  --voice2 Kore \
  --model pro \
  --output /path/to/audio/post-slug-dialogue.mp3 \
  --json

Model selection:

  • flash
    (default): Fast, cheap. Good for summaries and standard narration.
  • pro
    : Higher quality. Use for dialogue mode or premium content.

Step 6: Deliver

Present the result to the user:

  1. File path -- where the audio was saved
  2. Duration -- human-readable (e.g., "3:42")
  3. Embed code -- ready-to-paste HTML5 audio tag
  4. Cost -- estimated API cost
  5. Placement suggestion -- where to insert the embed in the blog post

Embedding Guide

Standard HTML (Hugo, Jekyll, static sites)

<audio controls preload="metadata">
  <source src="audio/post-slug.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>

MDX (Next.js, Gatsby)

<audio controls preload="metadata">
  <source src="/audio/post-slug.mp3" type="audio/mpeg" />
</audio>

WordPress

[audio src="audio/post-slug.mp3"]

Placement

Insert the audio player after the introduction (below the first H2) or at the very top of the article with a label: "Listen to this article" or "Audio version".

Internal API (for blog-write)

When invoked internally from blog-write:

Input:

  • text
    : Prepared text (already cleaned by Claude)
  • voice
    : Voice name (default: Charon)
  • voice2
    : Second voice for dialogue (optional)
  • model
    : flash or pro
  • output_path
    : Where to save the file

Output:

### Audio Narration
- **Path:** /path/to/audio/post-slug.mp3
- **Duration:** 3:42
- **Voice:** Charon
- **Embed:** `<audio controls preload="metadata"><source src="audio/post-slug.mp3" type="audio/mpeg"></audio>`

Graceful fallback: If

GOOGLE_AI_API_KEY
is not set, return immediately with no error. The writing workflow continues without audio. Never block blog-write because audio generation is unavailable.

Error Handling

ErrorResolution
GOOGLE_AI_API_KEY not setGet key at https://aistudio.google.com/apikey
FFmpeg not foundInstall:
sudo apt install ffmpeg
. Falls back to WAV output.
Rate limitedWait and retry. Check limits at https://aistudio.google.com/rate-limit
Text too long (>32k tokens)Split into sections, generate separately
Unknown voice nameRun
/blog audio voices
to see valid options
API errorCheck key validity, model availability (preview models)
API key missing (internal call)Return silently -- writing workflow continues

Reference Documentation

Load on-demand -- do NOT load all at startup:

  • references/voices.md
    -- Full 30-voice catalog, recommendations by content type, dialogue pairings