Skills explainer
git clone https://github.com/openclaw/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/0xfango/explainer-video" ~/.claude/skills/clawdbot-skills-explainer && rm -rf "$T"
skills/0xfango/explainer-video/SKILL.md
When to Use
- User wants to create an explainer or tutorial video
- User asks to "explain" something in video form
- User wants narrated content with AI-generated visuals
- User says "explainer video", "解说视频", "tutorial video"
When NOT to Use
- User wants audio-only content without visuals (use /speech or /podcast)
- User wants a podcast-style discussion (use /podcast)
- User wants to generate a standalone image (use /image-gen)
- User wants to read text aloud without video (use /speech)
Purpose
Generate explainer videos that combine a single narrator's voiceover with AI-generated visuals. Ideal for product introductions, concept explanations, and tutorials. Supports text-only script generation or full text + video output.
Hard Constraints
- No shell scripts. Construct curl commands from the API reference files listed in Resources
- Always read shared/authentication.md for API key and headers
- Follow shared/common-patterns.md for polling, errors, and interaction patterns
- Always read config following shared/config-pattern.md before any interaction
- Never hardcode speaker IDs; always fetch from the speakers API
- Never save files to ~/Downloads/; use .listenhub/explainer/ from config
- Explainer uses exactly 1 speaker
- Mode must be info (for Info style) or story (for Story style), never slides (use /slides skill instead)
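The mode constraint above can be checked before any request is built. The following is a minimal sketch rather than part of the skill itself; the function name `style_to_mode` is hypothetical, and it is written as a function only for readability.

```shell
# Hypothetical helper: map the user's style choice to the API "mode" value.
# Only "info" and "story" are accepted; "slides" belongs to the /slides skill.
style_to_mode() {
  # Lowercase the input so "Info" and "info" are treated the same.
  choice=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$choice" in
    info|story)
      printf '%s\n' "$choice"
      ;;
    slides)
      echo "mode 'slides' is not valid here; use the /slides skill instead" >&2
      return 1
      ;;
    *)
      echo "unknown style: $1" >&2
      return 1
      ;;
  esac
}
```

Calling `style_to_mode "Story"` prints `story`, while `style_to_mode slides` prints nothing on stdout and returns a non-zero status.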
Step -1: API Key Check
Follow shared/config-pattern.md § API Key Check. If the key is missing, stop immediately.
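The authoritative check lives in shared/config-pattern.md, which is not reproduced here. As an assumption about its simplest form, the "stop immediately" rule might look like the sketch below. `LISTENHUB_API_KEY` is the variable used by the curl commands in this skill; the function name `require_api_key` is hypothetical.

```shell
# Sketch of the "stop immediately" rule, assuming the key is supplied
# through the LISTENHUB_API_KEY environment variable.
require_api_key() {
  if [ -z "${LISTENHUB_API_KEY:-}" ]; then
    echo "LISTENHUB_API_KEY is not set; stopping before any API call." >&2
    return 1
  fi
  return 0
}
```

A caller would run `require_api_key || exit 1` before Step 0, so that no configuration prompt or request happens without a key.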
Step 0: Config Setup
Follow shared/config-pattern.md Step 0.
If the file doesn't exist, ask for the location, then create it immediately:
    mkdir -p ".listenhub/explainer"
    echo '{"outputDir":".listenhub","outputMode":"inline","language":null,"defaultStyle":null,"defaultSpeakers":{}}' > ".listenhub/explainer/config.json"
    CONFIG_PATH=".listenhub/explainer/config.json"
    # (or $HOME/.listenhub/explainer/config.json for global)
Then run Setup Flow below.
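If the file exists, read the config, display a summary, and confirm:
    Current config (explainer):
    Output mode: {inline / download / both}
    Language preference: {zh / en / not set}
    Default style: {info / story / not set}
    Default speaker: {speakerName / not set}
Ask: "Use the saved config?" → Confirm and continue / Reconfigure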
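The saved-config summary can be produced with jq's alternative operator (`//`), which substitutes a fallback when a field is null. This is a sketch with English labels, assuming the field names from the default config created in Step 0; the function name `print_config_summary` is hypothetical, and the default speaker line is left out because its lookup depends on the selected language.

```shell
# Sketch: print the saved-config summary, showing "not set" for null fields.
# $1 is the path to config.json.
print_config_summary() {
  jq -r '
    "Current config (explainer):",
    "  Output mode: \(.outputMode // "not set")",
    "  Language: \(.language // "not set")",
    "  Default style: \(.defaultStyle // "not set")"
  ' "$1"
}
```

For example, `print_config_summary "$CONFIG_PATH"` prints one line per field, with `not set` wherever the stored value is `null`.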
Setup Flow (first run or reconfigure)
Ask these questions in order, then save all answers to config at once:
- outputMode: Follow shared/output-mode.md § Setup Flow Question.
- Language (optional): "Default language?"
  - "Chinese (zh)"
  - "English (en)"
  - "Choose manually each time" → keep null
- Style (optional): "Default style?"
  - "Info: informational presentation"
  - "Story: narrative storytelling"
  - "Choose manually each time" → keep null
After collecting answers, save immediately:
    # Follow shared/output-mode.md § Save to Config
    NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
    echo "$NEW_CONFIG" > "$CONFIG_PATH"
    CONFIG=$(cat "$CONFIG_PATH")
Note: defaultSpeakers are saved after generation (see After Successful Generation section).
Interaction Flow
Step 1: Topic / Content
Free text input. Ask the user:
What would you like to explain or introduce?
Accept: topic description, text content, or concept to explain.
Step 2: Language
If config.language is set, pre-fill it, show it in the summary, and skip this question.
Otherwise ask:
    Question: "What language?"
    Options:
    - "Chinese (zh)": Content in Mandarin Chinese
    - "English (en)": Content in English
Step 3: Style
If config.defaultStyle is set, pre-fill it, show it in the summary, and skip this question.
Otherwise ask:
    Question: "What style of explainer?"
    Options:
    - "Info": Informational, factual presentation style
    - "Story": Narrative, storytelling approach
Step 4: Speaker Selection
Follow shared/speaker-selection.md for the full selection flow, including:
- Default from config.defaultSpeakers.{language} (skip step if set)
- Text table + free-text input
- Input matching and re-prompt on no match
Only 1 speaker is supported for explainer videos.
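The exact matching rules are defined in shared/speaker-selection.md, which is not shown here. As an illustration of "input matching and re-prompt on no match", the sketch below resolves free text against a speaker table by case-insensitive exact match. The function name, the `speakerId|display name` table format, and the display names are all assumptions; in real use the table would be built from the speakers API response, never hardcoded.

```shell
# Hypothetical helper: resolve free-text input against a speaker table.
# $1 is the user's input; $2 holds one "speakerId|display name" pair per line.
# Prints the matching speakerId, or returns non-zero so the caller can re-prompt.
match_speaker() {
  input=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf '%s\n' "$2" | while IFS='|' read -r id name; do
    lc_id=$(printf '%s' "$id" | tr '[:upper:]' '[:lower:]')
    lc_name=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
    if [ "$input" = "$lc_id" ] || [ "$input" = "$lc_name" ]; then
      printf '%s\n' "$id"
      break
    fi
  done | grep .   # grep fails when nothing was printed, signalling "no match"
}
```

Because the function prints at most one ID, it also respects the single-speaker rule: the first match wins and the loop stops.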
Step 5: Output Type
    Question: "What output do you want?"
    Options:
    - "Text script only": Generate narration script, no video
    - "Text + Video": Generate full explainer video with AI visuals
Step 6: Confirm & Generate
Summarize all choices:
    Ready to generate explainer:
    Topic: {topic}
    Language: {language}
    Style: {info/story}
    Speaker: {speaker name}
    Output: {text only / text + video}
    Proceed?
Wait for explicit confirmation before calling any API.
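One way to read "explicit confirmation" is that only a clear affirmative unlocks the API call, and anything ambiguous does not. The sketch below encodes that reading; the function name and the list of accepted words are assumptions, not something the skill specifies.

```shell
# Hypothetical helper: treat only an explicit affirmative as confirmation.
# Anything else (empty input, "maybe", "no") blocks the API call.
is_confirmed() {
  answer=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$answer" in
    y|yes|proceed|confirm) return 0 ;;
    *) return 1 ;;
  esac
}
```

Defaulting to "not confirmed" means a typo or an empty reply leads to another prompt rather than an unintended, credit-consuming request.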
Workflow
- Submit (foreground): POST /storybook/episodes with content, speaker, language, mode → extract episodeId
- Tell the user the task is submitted
- Poll (background): Run the following exact bash command with run_in_background: true and timeout: 600000. Do NOT use python3, awk, or any other JSON parser; use jq as shown:
      EPISODE_ID="<id-from-step-1>"
      for i in $(seq 1 30); do
        RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/storybook/episodes/$EPISODE_ID" \
          -H "Authorization: Bearer $LISTENHUB_API_KEY" 2>/dev/null)
        STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.processStatus // "pending"')
        case "$STATUS" in
          success|completed) echo "$RESULT"; exit 0 ;;
          failed|error) echo "FAILED: $RESULT" >&2; exit 1 ;;
          *) sleep 10 ;;
        esac
      done
      echo "TIMEOUT" >&2; exit 2
- When notified, download and present script:
  Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.
  inline or both: Present the script inline.
  Present:
      Explainer script generated!
      "{title}"
      View online: https://listenhub.ai/app/explainer/{episodeId}
  download or both: Also save the script file.
  - Create .listenhub/explainer/YYYY-MM-DD-{episodeId}/
  - Write {episodeId}.md from the generated script content
  - Present the download path in addition to the above summary.
- If video requested: POST /storybook/episodes/{episodeId}/video (foreground) → poll again (background) using the exact bash command below with run_in_background: true and timeout: 600000. Poll for videoStatus, not processStatus:
      EPISODE_ID="<id-from-step-1>"
      for i in $(seq 1 30); do
        RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/storybook/episodes/$EPISODE_ID" \
          -H "Authorization: Bearer $LISTENHUB_API_KEY" 2>/dev/null)
        STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.videoStatus // "pending"')
        case "$STATUS" in
          success|completed) echo "$RESULT"; exit 0 ;;
          failed|error) echo "FAILED: $RESULT" >&2; exit 1 ;;
          *) sleep 10 ;;
        esac
      done
      echo "TIMEOUT" >&2; exit 2
- When notified, download and present result:
Present result
Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.
inline or both: Display video URL and audio URL as clickable links.
Present:
    Explainer video generated!
    Video link: {videoUrl}
    Audio link: {audioUrl}
    Duration: {duration}s
    Credits used: {credits}
download or both: Also download the audio file.
    DATE=$(date +%Y-%m-%d)
    JOB_DIR=".listenhub/explainer/${DATE}-{jobId}"
    mkdir -p "$JOB_DIR"
    curl -sS -o "${JOB_DIR}/{jobId}.mp3" "{audioUrl}"
Present the download path in addition to the above summary.
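The inline / download / both rule and the script-saving steps in the workflow can be sketched as two small helpers. Both function names are hypothetical, and shared/output-mode.md remains the authoritative description of the behavior; the base directory argument would be `.listenhub/explainer` in this skill.

```shell
# Hypothetical helpers illustrating the inline / download / both rule.

# Succeeds when the given output mode requires writing files to disk.
needs_download() {
  case "$1" in
    download|both) return 0 ;;
    *) return 1 ;;
  esac
}

# Writes the script text to <base>/YYYY-MM-DD-<episodeId>/<episodeId>.md
# and prints the resulting path so it can be shown to the user.
save_script() {
  base=$1
  episode_id=$2
  content=$3
  job_dir="${base}/$(date +%Y-%m-%d)-${episode_id}"
  mkdir -p "$job_dir" || return 1
  printf '%s\n' "$content" > "${job_dir}/${episode_id}.md" || return 1
  printf '%s\n' "${job_dir}/${episode_id}.md"
}
```

A caller might write `needs_download "$OUTPUT_MODE" && save_script ".listenhub/explainer" "$EPISODE_ID" "$SCRIPT_TEXT"`, where `SCRIPT_TEXT` is a placeholder for the generated script content.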
After Successful Generation
Update config with the choices made this session:
    NEW_CONFIG=$(echo "$CONFIG" | jq \
      --arg lang "{language}" \
      --arg style "{info/story}" \
      --arg speakerId "{speakerId}" \
      '. + {"language": $lang, "defaultStyle": $style, "defaultSpeakers": (.defaultSpeakers + {($lang): [$speakerId]})}')
    echo "$NEW_CONFIG" > "$CONFIG_PATH"
Estimated times:
- Text script only: 2-3 minutes
- Text + Video: 3-5 minutes
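These estimates can be compared with the polling loops in the Workflow section. The arithmetic below uses only the values that appear in those commands (30 attempts, a 10-second sleep, a 600000 ms tool timeout); the variable names are illustrative.

```shell
# Values taken from the polling commands in the Workflow section.
ATTEMPTS=30             # for i in $(seq 1 30)
SLEEP_SECONDS=10        # sleep 10 between attempts
TOOL_TIMEOUT_MS=600000  # timeout passed to the background command

# Upper bound on time spent sleeping (curl time is extra and not counted).
POLL_BUDGET_SECONDS=$((ATTEMPTS * SLEEP_SECONDS))
TOOL_TIMEOUT_SECONDS=$((TOOL_TIMEOUT_MS / 1000))

echo "poll budget: ${POLL_BUDGET_SECONDS}s, tool timeout: ${TOOL_TIMEOUT_SECONDS}s"
```

This prints `poll budget: 300s, tool timeout: 600s`. Each loop therefore gives up after roughly 5 minutes, well inside the 10-minute tool timeout, but a video job at the upper end of the 3-5 minute estimate could reach the `TIMEOUT` exit before it finishes.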
API Reference
- Speaker list: shared/api-speakers.md
- Speaker selection guide: shared/speaker-selection.md
- Episode creation: shared/api-storybook.md
- Polling: shared/common-patterns.md § Async Polling
- Config pattern: shared/config-pattern.md
Composability
- Invokes: speakers API (for speaker selection); may invoke /speech for voiceover
- Invoked by: content-planner (Phase 3)
Example
User: "Create an explainer video introducing Claude Code"
Agent workflow:
- Topic: "Claude Code introduction"
- Ask language → "English"
- Ask style → "Info"
- Fetch speakers, user picks "cozy-man-english"
- Ask output → "Text + Video"
    curl -sS -X POST "https://api.marswave.ai/openapi/v1/storybook/episodes" \
      -H "Authorization: Bearer $LISTENHUB_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "sources": [{"type": "text", "content": "Introduce Claude Code: what it is, key features, and how to get started"}],
        "speakers": [{"speakerId": "cozy-man-english"}],
        "language": "en",
        "mode": "info"
      }'
Poll until text is ready, then generate video if requested.
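When the topic comes from free-text user input, quotes or newlines in it can break a hand-written JSON body. One option, sketched here as an assumption rather than as the skill's prescribed method, is to assemble the body with `jq -n` and `--arg`, which escapes each value. The field names mirror the example request; the function name `build_episode_payload` is hypothetical.

```shell
# Sketch: assemble the episode request body with jq so that quotes or
# newlines in the topic cannot break the JSON.
# Arguments: content, speakerId, language, mode.
build_episode_payload() {
  jq -n \
    --arg content "$1" \
    --arg speakerId "$2" \
    --arg language "$3" \
    --arg mode "$4" \
    '{
      sources: [{type: "text", content: $content}],
      speakers: [{speakerId: $speakerId}],
      language: $language,
      mode: $mode
    }'
}
```

The result can be passed to the request with `-d "$(build_episode_payload "$TOPIC" "$SPEAKER_ID" "en" "info")"`, where `TOPIC` and `SPEAKER_ID` are placeholders for the values collected in Steps 1 and 4. The `speakers` array always has exactly one entry, matching the single-speaker constraint.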