Claude-blog blog-audio
git clone https://github.com/AgriciDaniel/claude-blog
T=$(mktemp -d) && git clone --depth=1 https://github.com/AgriciDaniel/claude-blog "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/blog-audio" ~/.claude/skills/agricidaniel-claude-blog-blog-audio && rm -rf "$T"
skills/blog-audio/SKILL.mdBlog Audio -- Gemini TTS Narration for Blog Posts
Generate professional audio narration of blog content using Google's Gemini TTS. Three modes: summary (200-300 word spoken overview), full article read-aloud, or two-speaker podcast dialogue. 30 voices, 80+ languages, HTML5 embed output.
Quick Reference
| Command | What it does |
|---|---|
| Generate audio narration of a blog post |
| Show available voices with characteristics |
| Check/configure API key for Gemini TTS |
Prerequisites
- Python 3.11+ (venv managed automatically by
)run.py
environment variable (same key used by blog-image)GOOGLE_AI_API_KEY- FFmpeg (for WAV-to-MP3 conversion; falls back to WAV if missing)
Always Use run.py Wrapper
# CORRECT: python3 scripts/run.py generate_audio.py --text "..." --voice Charon --json # WRONG: python3 scripts/generate_audio.py --text "..." # Fails without venv
API Key Check (Gate Pattern)
Before generating audio, check for the API key:
echo $GOOGLE_AI_API_KEY
- If set: proceed with generation
- If not set: guide the user:
"Audio generation requires a Google AI API key. Get one free at https://aistudio.google.com/apikey
Then set it:
This is the same key used byexport GOOGLE_AI_API_KEY=your-key
-- if image generation works, audio works too."/blog image - When called internally (from blog-write): return silently if key is missing. Never block the writing workflow.
Setup
For
/blog audio setup:
- Check if
is set in environmentGOOGLE_AI_API_KEY - If blog-image is configured (check
), the key is already available.mcp.json - If not, guide user to https://aistudio.google.com/apikey
- Verify with a dry run:
python3 scripts/run.py generate_audio.py --text "Test" --dry-run --json
Voice Selection
For
/blog audio voices:
Load
references/voices.md and present the voice catalog to the user.
Ask the user which voice they prefer, or recommend based on content type:
- Article narration: Charon (Informative) or Sadaltager (Knowledgeable)
- Tutorial/how-to: Achird (Friendly) or Sulafat (Warm)
- News/analysis: Rasalgethi (Informative) or Schedar (Even)
- Lifestyle/wellness: Aoede (Breezy) or Vindemiatrix (Gentle)
- Dialogue host: Puck (Upbeat) or Laomedeia (Upbeat)
- Dialogue expert: Kore (Firm) or Charon (Informative)
Generation Workflow
For
/blog audio generate <file>:
Step 1: Read the Blog Post
Read the file and extract:
- Title (from H1 or frontmatter)
- Full content (markdown body)
- Approximate word count
Step 2: Choose Mode
Ask the user (or auto-select if they specified
--mode):
| Mode | When to use | Output |
|---|---|---|
| Summary | Quick audio overview (1-2 min) | 200-300 word spoken summary |
| Full | Complete read-aloud (5-15 min) | Full article as natural speech |
| Dialogue | Podcast-style (3-8 min) | Two-person conversation about the article |
Step 3: Prepare Text
CRITICAL: Claude prepares the text. The script does TTS only.
Summary mode: Write a 200-300 word spoken summary of the article. Rules:
- Write as natural speech, not written text
- Open with the article's key finding or answer
- Cover 3-5 main takeaways
- Close with actionable advice
- No markdown, no "In this article...", no meta-commentary
- Use conversational transitions ("Here's what matters...", "The key finding is...")
Full mode: Strip the markdown content to clean spoken text:
- Headings become natural transitions ("Next, let's look at...")
- Links become plain text (remove URLs, keep anchor text)
- Images and charts: omit or briefly describe ("As the data shows...")
- Code blocks: describe verbally ("The code uses a for-loop to...")
- Lists: convert to natural sentences
- Remove frontmatter, schema markup, HTML tags
- Add brief intro: "This is [title], published on [date]."
Dialogue mode: Write a 2-person conversation script about the article:
- Speaker1 = Host (curious, asks good questions)
- Speaker2 = Expert (knowledgeable, gives clear answers)
- Format each line as:
[Speaker1] What's the key takeaway here? - Cover the article's main points conversationally
- 15-25 exchanges (produces ~3-8 minutes)
- Natural, not stilted ("That's a great point" over "Indeed, as the research indicates")
Step 4: Select Voice
If the user chose a voice, use it. Otherwise, recommend based on mode:
- Summary/Full: default to Charon (Informative)
- Dialogue: default to Puck (Host) + Kore (Expert)
Step 5: Generate Audio
Write the prepared text to a temp file, then call:
# Single voice (summary or full mode) python3 scripts/run.py generate_audio.py \ --text-file /tmp/blog_audio_prepared.txt \ --voice Charon \ --model flash \ --output /path/to/audio/post-slug.mp3 \ --json # Two voices (dialogue mode) python3 scripts/run.py generate_audio.py \ --text-file /tmp/blog_audio_dialogue.txt \ --voice Puck \ --voice2 Kore \ --model pro \ --output /path/to/audio/post-slug-dialogue.mp3 \ --json
Model selection:
(default): Fast, cheap. Good for summaries and standard narration.flash
: Higher quality. Use for dialogue mode or premium content.pro
Step 6: Deliver
Present the result to the user:
- File path -- where the audio was saved
- Duration -- human-readable (e.g., "3:42")
- Embed code -- ready-to-paste HTML5 audio tag
- Cost -- estimated API cost
- Placement suggestion -- where to insert the embed in the blog post
Embedding Guide
Standard HTML (Hugo, Jekyll, static sites)
<audio controls preload="metadata"> <source src="audio/post-slug.mp3" type="audio/mpeg"> Your browser does not support the audio element. </audio>
MDX (Next.js, Gatsby)
<audio controls preload="metadata"> <source src="/audio/post-slug.mp3" type="audio/mpeg" /> </audio>
WordPress
[audio src="audio/post-slug.mp3"]
Placement
Insert the audio player after the introduction (below the first H2) or at the very top of the article with a label: "Listen to this article" or "Audio version".
Internal API (for blog-write)
When invoked internally from blog-write:
Input:
: Prepared text (already cleaned by Claude)text
: Voice name (default: Charon)voice
: Second voice for dialogue (optional)voice2
: flash or promodel
: Where to save the fileoutput_path
Output:
### Audio Narration - **Path:** /path/to/audio/post-slug.mp3 - **Duration:** 3:42 - **Voice:** Charon - **Embed:** `<audio controls preload="metadata"><source src="audio/post-slug.mp3" type="audio/mpeg"></audio>`
Graceful fallback: If
GOOGLE_AI_API_KEY is not set, return immediately
with no error. The writing workflow continues without audio. Never block
blog-write because audio generation is unavailable.
Error Handling
| Error | Resolution |
|---|---|
| GOOGLE_AI_API_KEY not set | Get key at https://aistudio.google.com/apikey |
| FFmpeg not found | Install: . Falls back to WAV output. |
| Rate limited | Wait and retry. Check limits at https://aistudio.google.com/rate-limit |
| Text too long (>32k tokens) | Split into sections, generate separately |
| Unknown voice name | Run to see valid options |
| API error | Check key validity, model availability (preview models) |
| API key missing (internal call) | Return silently -- writing workflow continues |
Reference Documentation
Load on-demand -- do NOT load all at startup:
-- Full 30-voice catalog, recommendations by content type, dialogue pairingsreferences/voices.md