Awesome-claude ppt-speech-script
Generate a speech script from an existing PowerPoint presentation. Use this skill when the user has a .pptx file and wants a speaker script, presentation notes, or talk script to accompany it. Triggers on: 'speech script', 'speaker notes', 'talk script', 'presentation script', '演讲稿', '讲稿'
git clone https://github.com/tsaol/awesome-claude
T=$(mktemp -d) && git clone --depth=1 https://github.com/tsaol/awesome-claude "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/ppt-speech-script" ~/.claude/skills/tsaol-awesome-claude-ppt-speech-script && rm -rf "$T"
skills/ppt-speech-script/SKILL.md
PPT Speech Script Generator
Generate a structured, oral-style speech script from an existing PowerPoint presentation.
When to Use
- User has an existing .pptx file and wants a speech script to go with it
- User says: "generate a speech script", "write speaker notes", "写演讲稿", "生成讲稿"
- This is NOT for creating PPT files; use the pptx skill for that
Workflow
Step 1: Extract Content
Extract text, visual layout, and embedded images from the presentation using three methods in parallel:
Text extraction:
python3 -m markitdown <pptx_file>
Visual layout (thumbnail grid):
python3 skills/pptx/scripts/thumbnail.py <pptx_file> <output_dir>/thumbnails --cols 5
Image extraction (unpack media files):
python3 skills/pptx/ooxml/scripts/unpack.py <pptx_file> <output_dir>/unpacked
Then:
- Read the thumbnail grid image(s) to understand the visual structure of each slide
- Build a slide-to-image mapping by reading the relationship files:
  - Parse ppt/slides/_rels/slideN.xml.rels for each slide
  - Extract Target="../media/imageX.ext" references to find which images belong to which slide
- For slides with important images (charts, architecture diagrams, screenshots), read the original image from ppt/media/ using the Read tool for detailed visual analysis
- Skip generic/decorative images (backgrounds, logos, icons); focus on content-carrying images
- Prioritize: architecture diagrams, data charts, comparison tables, workflow diagrams, product screenshots
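The slide-to-image mapping described above can be sketched in Python. This is a minimal illustration, not part of the skill: the function name `build_slide_image_map` is hypothetical, and it assumes the unpacked directory keeps the standard package layout with `ppt/` at its root.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path, PurePosixPath

# Namespace used by OOXML package relationship (.rels) files.
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
# Relationship type URI that marks an embedded image.
IMAGE_REL_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
)


def build_slide_image_map(unpacked_dir):
    """Map slide file number -> sorted list of media file names it references.

    Assumes `unpacked_dir` is the root of an unpacked .pptx (contains `ppt/`).
    """
    rels_dir = Path(unpacked_dir) / "ppt" / "slides" / "_rels"
    mapping = {}
    for rels_file in rels_dir.glob("slide*.xml.rels"):
        match = re.fullmatch(r"slide(\d+)\.xml\.rels", rels_file.name)
        if match is None:
            continue
        images = []
        root = ET.parse(str(rels_file)).getroot()
        for rel in root.iter(f"{RELS_NS}Relationship"):
            if rel.get("Type") == IMAGE_REL_TYPE:
                # Target looks like "../media/image3.png"; keep the file name.
                images.append(PurePosixPath(rel.get("Target", "")).name)
        mapping[int(match.group(1))] = sorted(images)
    return dict(sorted(mapping.items()))
```

Note that the N in slideN.xml is a file number, which is not guaranteed to equal the slide's position in the deck; the display order is defined in ppt/presentation.xml, so cross-check against the thumbnail grid.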
Image analysis guidelines:
- For architecture/flow diagrams: describe the components, data flow direction, and key relationships
- For data charts: read exact numbers, axis labels, and trends
- For comparison tables: extract the key differentiators
- For product screenshots: describe what the user interface shows
- For photos/decorative images: brief description only, don't over-analyze
This image analysis enables the speech script to accurately describe visual content that
markitdown cannot capture (since markitdown only extracts text, not image content).
Step 2: Analyze Presentation Structure
Before writing, analyze the deck:
- Identify sections — group slides into logical sections (intro, body sections, conclusion)
- Identify hidden slides — thumbnail.py reports hidden slides; exclude them from the script
- Identify slide types — title, content, data, diagram, comparison, section divider, closing
- Note visual elements — charts, images, diagrams that need verbal explanation
- Estimate timing — allocate time per slide based on content density
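As a cross-check on what thumbnail.py reports, hidden slides can also be detected from the unpacked XML. In PresentationML a hidden slide carries `show="0"` on its root `<p:sld>` element. The sketch below assumes the unpacked layout from Step 1; `find_hidden_slides` is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path


def find_hidden_slides(unpacked_dir):
    """Return sorted slide file numbers whose root element has show="0"."""
    slides_dir = Path(unpacked_dir) / "ppt" / "slides"
    hidden = []
    for slide_file in slides_dir.glob("slide*.xml"):
        match = re.fullmatch(r"slide(\d+)\.xml", slide_file.name)
        if match is None:
            continue
        root = ET.parse(str(slide_file)).getroot()
        # `show` is an XML boolean; a missing attribute means the slide is visible.
        if root.get("show", "1") in ("0", "false"):
            hidden.append(int(match.group(1)))
    return sorted(hidden)
```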
Timing guidelines:
| Slide type | Suggested time |
|---|---|
| Title/cover | 30s - 1min |
| Agenda | 30s - 1min |
| Section divider | 15 - 30s |
| Content (light) | 1 - 1.5min |
| Content (dense) | 1.5 - 2.5min |
| Data/chart | 1.5 - 2min |
| Diagram/architecture | 2 - 3min |
| Demo/code | 2 - 3min |
| Summary | 1 - 2min |
| Closing/Q&A | 30s |
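To turn the table into a total estimate, sum the lower and upper bounds over the visible slides. The sketch below is one way to do that; the type keys (`content_light`, `data_chart`, and so on) and both function names are illustrative labels, not identifiers used by the skill.

```python
# Suggested time per slide type in seconds (min, max), copied from the table.
TIMING_SECONDS = {
    "title": (30, 60),
    "agenda": (30, 60),
    "section_divider": (15, 30),
    "content_light": (60, 90),
    "content_dense": (90, 150),
    "data_chart": (90, 120),
    "diagram": (120, 180),
    "demo": (120, 180),
    "summary": (60, 120),
    "closing": (30, 30),
}


def estimate_duration(slide_types):
    """Return (low, high) total seconds for a list of slide type keys."""
    low = sum(TIMING_SECONDS[t][0] for t in slide_types)
    high = sum(TIMING_SECONDS[t][1] for t in slide_types)
    return low, high


def format_range(low, high):
    """Format seconds as 'A-B minutes', rounding low down and high up."""
    low_min = low // 60
    high_min = -(-high // 60)  # ceiling division
    return f"{low_min}-{high_min} minutes"


deck = ["title", "agenda", "content_light", "content_dense",
        "data_chart", "diagram", "summary", "closing"]
low, high = estimate_duration(deck)
print(format_range(low, high))  # prints "8-14 minutes"
```

Rounding the lower bound down and the upper bound up keeps the stated range conservative.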
Step 3: Write the Speech Script
Generate a complete speech script following these principles:
Output Format
---
title: "PPT Title — Speech Script"
slides: <total visible slides>
estimated_time: "XX-XX minutes"
audience: "<target audience>"
---

# PPT Title — Speech Script

> Audience: <target audience> | Date: <date if available>

Estimated duration: XX minutes (including Q&A). Suggested time per slide is noted in brackets.

---

## Slide 0 — Slide Title [30s]

Speech content here...

---

## Slide 1 — Slide Title [1min]

Speech content here...

---

...

## Predicted Q&A

### Q1: <likely question>?

Answer key points...

### Q2: <likely question>?

Answer key points...

### Q3: <likely question>?

Answer key points...
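Once the visible slides and their time labels are known, the skeleton of this format can be produced mechanically before the speech content is written. The following is a rough sketch only: `render_skeleton` and its parameters are hypothetical, and the Date field and the duration sentence are left out for brevity.

```python
EM_DASH = "\u2014"  # separator used in the template's title and headings


def render_skeleton(title, audience, estimated_time, slides):
    """Build the script skeleton.

    `slides` is a list of (slide_number, slide_title, time_label) tuples for
    visible slides only, so numbering still matches the PPT when hidden
    slides are skipped.
    """
    lines = [
        "---",
        f'title: "{title} {EM_DASH} Speech Script"',
        f"slides: {len(slides)}",
        f'estimated_time: "{estimated_time}"',
        f'audience: "{audience}"',
        "---",
        "",
        f"# {title} {EM_DASH} Speech Script",
        "",
        f"> Audience: {audience}",
        "",
    ]
    for number, slide_title, time_label in slides:
        lines += [
            "---",
            "",
            f"## Slide {number} {EM_DASH} {slide_title} [{time_label}]",
            "",
            "Speech content here...",
            "",
        ]
    lines += ["---", "", "## Predicted Q&A", ""]
    return "\n".join(lines)
```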
Writing Principles
- Oral style: Write as if speaking to the audience, not reading a document
  - Use conversational connectors: "Let's look at...", "The key takeaway here is...", "Now, moving on to..."
  - Avoid academic or written-style phrasing
- Don't read the bullets: The script should EXPLAIN and EXPAND on slide content, not repeat it
  - Slide says "Cost reduced 34.8%" → Script says "We brought per-query cost down by over a third, from 9 cents to under 6 cents. At million-user scale, that's millions of dollars saved per month."
- Per-slide length: 80-200 words (Chinese) or 60-150 words (English)
  - Section dividers and title pages: shorter (30-60 words)
  - Dense data or architecture slides: longer (150-250 words)
- Smooth transitions: Each slide's script should naturally flow from the previous one
  - End of previous slide's conclusion → Beginning of next slide's topic
  - Use transitional phrases: "That brings us to...", "With that context in mind...", "So how do we solve this?"
- Highlight key points: Use bold or verbal cues for emphasis
  - "The most important number on this slide is..."
  - "If you remember one thing from today..."
- Explain visuals: For charts, diagrams, and images, guide the audience through what they're seeing
  - "Looking at this architecture diagram, data flows from left to right..."
  - "The blue bars represent the baseline, and the orange bars are our optimized results..."
- Audience awareness: Tailor depth and terminology to the stated audience
  - CTO audience → focus on strategic impact, cost, and scalability
  - Developer audience → focus on implementation details and code
  - Business audience → focus on ROI, user impact, and market context
- Q&A section: Prepare 3-5 predicted questions
  - Include "tough but fair" questions the audience is likely to ask
  - Provide concise answer key points (not full scripts)
  - Consider the audience's perspective and concerns
Language
- Default: Match the language of the PPT content
- User override: If user specifies a language (e.g., "in Chinese", "in English"), use that
- Mixed content: If PPT has mixed languages, use the dominant language unless told otherwise
Step 4: Save Output
Save the speech script to the same directory as the PPT file:
# Output path: same directory as input, named speech-script.md
<pptx_dir>/speech-script.md
If a speech-script.md already exists, ask the user before overwriting.
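The save rule can be sketched as below. The helper name `save_script` and the `overwrite` flag are assumptions for illustration; in the skill itself, the confirmation comes from asking the user rather than from a function argument.

```python
from pathlib import Path


def save_script(pptx_path, script_text, overwrite=False):
    """Write speech-script.md next to the .pptx file.

    Raises FileExistsError if the file is already there and `overwrite`
    is False, so the caller can ask the user before replacing it.
    """
    target = Path(pptx_path).parent / "speech-script.md"
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists; confirm before overwriting")
    target.write_text(script_text, encoding="utf-8")
    return target
```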
Dependencies
These should already be available from the pptx skill:
- markitdown: pip install "markitdown[pptx]" (text extraction)
- thumbnail.py: skills/pptx/scripts/thumbnail.py (visual layout analysis)
- unpack.py: skills/pptx/ooxml/scripts/unpack.py (PPTX unpacking for media extraction)
- LibreOffice: for PDF conversion (used by thumbnail.py)
- Poppler: for PDF-to-image conversion (used by thumbnail.py)
Edge Cases
Very large presentations (50+ slides):
- Group consecutive similar slides into sections
- Summarize repetitive slides rather than scripting each individually
- Note in the script: "Slides X-Y cover [topic] — walk through highlights"
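One possible way to group consecutive similar slides is to label each slide with a topic during Step 2 and collapse runs that share a label. The sketch below assumes such labels already exist; `group_consecutive` and the topic strings are hypothetical.

```python
from itertools import groupby


def group_consecutive(slides):
    """Collapse runs of adjacent slides that share a topic.

    `slides` is a list of (slide_number, topic) in deck order.
    Returns a list of (first_number, last_number, topic).
    """
    groups = []
    for topic, run in groupby(slides, key=lambda item: item[1]):
        numbers = [number for number, _ in run]
        groups.append((numbers[0], numbers[-1], topic))
    return groups


slides = [(10, "pricing"), (11, "pricing"), (12, "pricing"), (13, "roadmap")]
for first, last, topic in group_consecutive(slides):
    if first != last:
        print(f"Slides {first}-{last} cover {topic}")  # prints "Slides 10-12 cover pricing"
```

Only adjacent slides are merged, so a topic that reappears later in the deck forms a separate group.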
Image-heavy / text-light slides:
- Use the slide-to-image mapping to read original images from ppt/media/ at full resolution
- Describe what the audience sees based on direct image analysis
- For images that cannot be read (e.g., unsupported format), fall back to thumbnail grid analysis
- Flag slides where content is still unclear: "[Note: This slide contains a visual element — verify description against actual slide]"
Hidden slides:
- Exclude from the main script
- Optionally note them at the end: "Note: Slides X, Y, Z are hidden and not included in this script"
No text content (pure image deck):
- Extract all images via unpack and read each one directly for full-resolution analysis
- Use thumbnail grid for overall slide layout understanding
- Generate descriptive narration based on per-image analysis
- Clearly note which descriptions are based on visual interpretation
Quality Checklist
After generating the script, verify:
- Every visible slide has a corresponding section
- Slide numbering matches the PPT (0-indexed)
- Transitions between slides are smooth
- No bullet points are simply repeated verbatim
- Time estimates per slide are reasonable
- Total estimated time is realistic for the slide count
- Q&A section includes 3-5 relevant questions
- Language matches user's request or PPT's dominant language
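The mechanical items on this checklist (slide coverage and Q&A count) can be checked automatically; the remaining items still need a read-through. A minimal sketch, assuming the headings follow the Output Format above ("## Slide N ..." and "### Qn: ..."); `check_script` is a hypothetical name.

```python
import re

SLIDE_HEADING = re.compile(r"^## Slide (\d+)\b", re.MULTILINE)
QUESTION_HEADING = re.compile(r"^### Q\d+:", re.MULTILINE)


def check_script(script_text, visible_slides):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    scripted = {int(n) for n in SLIDE_HEADING.findall(script_text)}
    expected = set(visible_slides)
    missing = sorted(expected - scripted)
    extra = sorted(scripted - expected)
    if missing:
        problems.append(f"missing sections for slides: {missing}")
    if extra:
        problems.append(f"sections for unknown or hidden slides: {extra}")
    questions = len(QUESTION_HEADING.findall(script_text))
    if not 3 <= questions <= 5:
        problems.append(f"expected 3-5 Q&A entries, found {questions}")
    return problems
```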