Skills audio-script-writer
Convert written medical content into podcast or video scripts optimized
git clone https://github.com/openclaw/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/aipoch-ai/audio-script-writer" ~/.claude/skills/clawdbot-skills-audio-script-writer && rm -rf "$T"
skills/aipoch-ai/audio-script-writer/SKILL.mdAudio Script Writer
Overview
Content transformation tool that converts written medical and scientific materials into professionally structured audio scripts suitable for podcasts, educational videos, audiobooks, and voiceover narration.
Key Capabilities:
- Format Conversion: Research papers → podcast scripts
- Spoken Word Optimization: Sentence restructuring for listening
- Pronunciation Guides: Medical terminology phonetic spelling
- Timing Estimation: Duration calculations for production planning
- Multi-Format Output: Podcast, video, lecture, audiobook templates
- Voice Direction: Tone, pace, and emphasis cues for narrators
When to Use
✅ Use this skill when:
- Creating medical education podcasts from journal articles
- Converting conference presentations to video scripts
- Developing audiobook versions of medical textbooks
- Scripting patient education audio materials
- Producing research summary videos for social media
- Adapting written case reports for audio case studies
- Creating voiceover scripts for e-learning modules
❌ Do NOT use when:
- Live presentation without script → Use improvisation
- Highly visual content (surgery videos) → Use visual-focused tools
- Interactive audio (Q&A format) → Use dialogue scripting tools
- Music or sound design planning → Use audio production software
- Voice recording itself → This creates scripts, not audio
Integration:
- Upstream:
(content condensation),abstract-summarizer
(patient-friendly language)lay-summary-gen - Downstream:
(multi-language scripts),medical-translation
(AI narration)voice-cloning-tool
Core Capabilities
1. Spoken Word Transformation
Convert written text to conversational audio style:
from scripts.audio_writer import AudioScriptWriter writer = AudioScriptWriter() # Transform written content script = writer.convert_to_audio( source_text=research_paper, format="podcast", # podcast, video, lecture, audiobook target_audience="medical_students", duration_minutes=15 ) print(script.spoken_text) # Converts: "The pathophysiology of diabetes mellitus involves..." # To: "So what exactly happens in diabetes? Well, it all starts when..."
Transformation Rules:
| Written Style | Audio Style | Example |
|---|---|---|
| "Furthermore" | "Plus" | Less formal transitions |
| " et al." | "and their colleagues" | Expand abbreviations |
| Numbers in text | Spoken numbers | "15%" → "15 percent" |
| Long sentences | 15-20 word max | Break into digestible chunks |
| Passive voice | Active voice | "was observed" → "we saw" |
| Citations | Omit or footnote | "(Smith et al., 2024)" → [reference tone] |
2. Pronunciation Guide Generation
Create phonetic spelling for medical terms:
# Generate pronunciation guide pronunciation = writer.create_pronunciation_guide( text=script, include_phonetic=True, include_syllables=True ) # Output: # "Hyperlipidemia: hi-per-lip-i-DEE-mee-uh" # "Metformin: met-FOR-min" # "Atherosclerosis: ath-er-oh-skleh-ROH-sis"
Guide Elements:
- Phonetic Spelling: IPA or simplified phonetics
- Syllable Breaks: hy-per-ten-sion
- Emphasis Marking: Primary stress (CAPS), secondary stress
- Alternative Pronunciations: Regional variations (UK vs US)
- Sound-Alikes: "rhymes with..." for difficult terms
3. Timing and Pacing
Calculate speaking duration and mark pacing cues:
# Analyze timing timing = writer.calculate_timing( script=script, speaking_rate="conversational", # slow, conversational, fast include_pauses=True ) print(f"Estimated duration: {timing.duration_minutes} minutes") print(f"Word count: {timing.word_count}") print(f"Pace: {timing.words_per_minute} WPM")
Speaking Rates:
| Style | WPM | Use Case |
|---|---|---|
| Slow/Educational | 120-130 | Patient education, complex topics |
| Conversational | 140-160 | Podcasts, general audience |
| Fast/News | 170-190 | Time-constrained content |
| Variable | Varies | Dynamic pacing with pauses |
Pacing Cues:
[BREATHE] - Brief pause for narrator [PAUSE 2s] - Two-second pause for emphasis [SLOW DOWN] - Reduce pace for key point [SPEED UP] - Increase energy/excitement [BEAT] - Dramatic pause
4. Multi-Format Templates
Generate scripts for different audio formats:
# Podcast episode podcast = writer.create_podcast_script( content=article, episode_format="interview", # solo, interview, panel include_intro_music=True, ad_breaks=[5, 12] # minutes ) # Educational video video = writer.create_video_script( content=lecture_slides, visual_cues=True, # Mark where visuals change b_roll_notes=True # Suggest supplemental footage )
Format Types:
| Format | Characteristics | Best For |
|---|---|---|
| Podcast | Conversational, segments, ads | Long-form content, interviews |
| Video | Visual cues, B-roll notes | YouTube, educational platforms |
| Lecture | Structured, Q&A breaks | Online courses, training |
| Audiobook | Chapter markers, consistent tone | Textbooks, memoirs |
| News | Tight, factual, quick | Research briefs, updates |
Common Patterns
Pattern 1: Research Paper to Podcast
Scenario: Convert published study to 15-minute podcast episode.
# Convert paper to podcast script python scripts/main.py \ --input paper.pdf \ --format podcast \ --duration 15 \ --style conversational \ --include-intro-outro \ --output podcast_script.txt # Generate pronunciation guide python scripts/main.py \ --input podcast_script.txt \ --generate-pronunciation \ --output pronunciation_guide.txt
Structure:
[INTRO MUSIC 5s] HOST: Welcome to Medical Research Today. I'm your host... [BREATHE] HOST: Today we're diving into a fascinating study about... [PAUSE] HOST: So what did the researchers find? Well... [BREATHE] HOST: Dr. Smith, one of the study authors, explains... [SOUND BITE: Interview clip] ... [OUTRO MUSIC]
Pattern 2: Medical Lecture Recording
Scenario: Convert lecture notes to video script for online course.
# Create lecture script lecture = writer.create_lecture_script( notes=lecture_content, duration=45, # minutes break_intervals=[15, 30], # minutes for student breaks interaction_points=True # "Pause and think..." prompts ) # Add visual cues script = writer.add_visual_cues( script=lecture, slide_transitions=True, animation_notes=True )
Lecture Elements:
- Learning objectives at start
- Periodic comprehension checks
- Break reminders
- Transition phrases between topics
- Summary and key takeaways
Pattern 3: Patient Education Audio
Scenario: Create audio guide for diabetes management.
# Patient-friendly script patient_script = writer.create_patient_script( medical_content=diabetes_guide, reading_level=6, # 6th grade empathetic_tone=True, key_points_highlighted=True ) # Slow, clear pacing patient_script.adjust_pacing( wpm=130, pause_after_sentences=1.5 # seconds )
Patient Script Features:
- Simple language (avoid medical jargon)
- Empathetic tone
- Clear action steps
- Reassuring statements
- Repetition of key points
Pattern 4: Conference Presentation to Video
Scenario: Adapt live presentation to YouTube video format.
# Convert presentation script python scripts/main.py \ --input presentation_transcript.txt \ --format video \ --platform youtube \ --include-hooks true \ --engagement-cues true \ --output youtube_script.txt
YouTube Optimization:
- Hook in first 30 seconds
- Engagement questions for comments
- Call to action (subscribe, like)
- Timestamp markers for chapters
- B-roll suggestions for visual interest
Complete Workflow Example
From research paper to published podcast:
# Step 1: Extract and summarize content python scripts/main.py \ --input paper.pdf \ --extract-key-points \ --output key_points.txt # Step 2: Convert to audio script python scripts/main.py \ --input key_points.txt \ --format podcast \ --duration 20 \ --output raw_script.txt # Step 3: Add production elements python scripts/main.py \ --input raw_script.txt \ --add-music-cues \ --add-sound-effects \ --add-pacing-marks \ --output production_script.txt # Step 4: Generate pronunciation guide python scripts/main.py \ --input production_script.txt \ --generate-pronunciation \ --output pronunciations.txt # Step 5: Create timing breakdown python scripts/main.py \ --input production_script.txt \ --calculate-timing \ --output timing_breakdown.txt
Python API:
from scripts.audio_writer import AudioScriptWriter from scripts.pronunciation import PronunciationGuide from scripts.timing import TimingCalculator # Initialize writer = AudioScriptWriter() pronouncer = PronunciationGuide() timing = TimingCalculator() # Read source material with open("research_article.txt", "r") as f: content = f.read() # Step 1: Convert to spoken format script = writer.convert_to_audio( text=content, format="podcast", target_duration=15, # minutes audience="general_medical" ) # Step 2: Add production elements script_with_cues = writer.add_production_cues( script=script, music_stings=True, transition_effects=True ) # Step 3: Generate pronunciation guide medical_terms = pronouncer.extract_terms(script_with_cues) pronunciation_guide = pronouncer.create_guide(medical_terms) # Step 4: Calculate timing timing_analysis = timing.calculate( script=script_with_cues, speaking_rate=150 # WPM ) # Export complete production package writer.export_production_package( script=script_with_cues, pronunciation=pronunciation_guide, timing=timing_analysis, output_dir="podcast_production/" )
Quality Checklist
Content Quality:
- Written content accurate and current
- Sources cited (even if not spoken)
- Medical facts verified by expert
- Appropriate for target audience level
- No confidential patient information
Audio Optimization:
- Sentences 15-20 words maximum
- Abbreviations expanded on first use
- Complex terms have pronunciation guides
- Active voice preferred over passive
- Transitions smooth and conversational
Production Quality:
- Timing realistic for content density
- Pacing cues appropriate for subject
- Music/sound cues marked clearly
- Pronunciation guide comprehensive
- Script formatted for easy reading
Before Recording:
- CRITICAL: Script read aloud for flow
- Difficult pronunciations practiced
- Timing tested with stopwatch
- Technical terms confirmed with subject expert
- Copyright cleared for any quoted material
Common Pitfalls
Content Issues:
-
❌ Too dense → Information overload for listeners
- ✅ Break complex topics into multiple episodes
-
❌ Visual dependencies → "As shown in Figure 3..."
- ✅ Describe visuals or omit visual-dependent content
-
❌ Citation overload → Every sentence has reference
- ✅ Save citations for show notes, not narration
Audio Issues:
-
❌ Written-style language → "Furthermore, the aforementioned..."
- ✅ Conversational: "Plus, this thing we talked about..."
-
❌ No pauses → Relentless information delivery
- ✅ Build in breathing room; let points sink in
-
❌ Ignoring pronunciation → Mispronounced medical terms
- ✅ Research and practice all technical terms
Production Issues:
-
❌ Underestimating time → 10 minutes of script takes 12+ to record
- ✅ Add 20% buffer for retakes and natural pacing
-
❌ Complex sentence structures → Tongue twisters for narrator
- ✅ Short sentences; avoid nested clauses
References
Available in
references/ directory:
- Broadcast writing guidelinesaudio_writing_best_practices.md
- Common terms phoneticsmedical_pronunciation_guide.md
- Industry format standardspodcast_production_standards.md
- Inclusive audio contentaccessibility_guidelines.md
- YouTube, Spotify, Apple specsplatform_requirements.md
- Narrator health and performancevoice_care_tips.md
Scripts
Located in
scripts/ directory:
- CLI interface for script conversionmain.py
- Core text-to-audio transformationaudio_writer.py
- Medical terminology phoneticspronunciation.py
- Duration calculation and pacingtiming.py
- Podcast, video, lecture templatesformat_templates.py
- Narrator cues and directionvoice_direction.py
- Alternative format generationaccessibility.py
Limitations
- Voice Performance: Script is text only; actual delivery varies by narrator
- Accent Variations: Pronunciation guides may not match all dialects
- Cultural Context: Humor and references may not translate across cultures
- Copyright: Cannot use copyrighted material without permission
- Technical Accuracy: Does not verify medical content (input-dependent)
- Live Elements: Cannot script unscripted interviews or Q&A
Parameters
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
, | string | - | No | Input text file path |
, | string | - | No | Output JSON file path (default: stdout) |
| string | - | No | Direct text input (alternative to --input) |
, | int | 5 | No | Target duration in minutes |
, | string | normal | No | Speaking pace (slow, normal, fast) |
, | string | conversational | No | Script style (conversational, formal, educational) |
Usage
Basic Usage
# Convert from file python scripts/main.py --input article.txt --duration 5 --output script.json # Direct text input python scripts/main.py --text "Medical research findings..." --duration 3 # From stdin cat article.txt | python scripts/main.py --duration 5 --style conversational # With specific style and pace python scripts/main.py --input paper.txt --style educational --pace slow
Risk Assessment
| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python script executed locally | Low |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Low |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output saved only to specified location | Low |
Security Checklist
- No hardcoded credentials or API keys
- No unauthorized file system access
- Output does not expose sensitive information
- Prompt injection protections in place
- Input validation for file paths
- Output directory restricted to workspace
- Script execution in sandboxed environment
Prerequisites
# Python 3.7+ # No additional packages required (uses standard library)
Evaluation Criteria
Success Metrics
- Successfully converts text to audio-optimized script
- Expands abbreviations and converts numbers to words
- Calculates estimated duration based on word count
- Applies style-specific formatting
- Provides pronunciation notes for medical terms
Test Cases
- Basic Conversion: Convert text file → Returns audio script with metadata
- Abbreviation Handling: Text with "e.g., i.e., etc." → All expanded in output
- Number Conversion: Input with "1 in 4" → Output with "one in four"
Lifecycle Status
- Current Stage: Draft
- Next Review Date: 2026-03-06
- Known Issues: None
- Planned Improvements:
- Add support for custom abbreviation dictionaries
- Integrate with text-to-speech engines
- Add multilingual support
🎙️ Pro Tip: The best audio scripts sound natural when spoken. Always read your script aloud before finalizing—if you stumble over a sentence, your narrator will too. Revise for the ear, not the eye.