# yt2bb — YouTube to Bilibili Video Repurposing

Use when the user wants to repurpose a YouTube video for Bilibili, add bilingual (English-Chinese) subtitles to a video, or create hardcoded subtitle versions for Chinese platforms.

Install (the skill file lives at `skills/agents365-ai/yt2bb/SKILL.md`):

```bash
# Clone the full repo
git clone https://github.com/openclaw/skills

# Or install just this skill into ~/.claude/skills (Claude Code)
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/agents365-ai/yt2bb" ~/.claude/skills/openclaw-skills-yt2bb && rm -rf "$T"

# Or into ~/.openclaw/skills (OpenClaw)
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/agents365-ai/yt2bb" ~/.openclaw/skills/openclaw-skills-yt2bb && rm -rf "$T"
```
## Overview

Six-step pipeline: download → transcribe → translate → merge → burn subtitles → generate publish info. Produces a video with hardcoded bilingual (EN/ZH) subtitles and a `publish_info.md` with Bilibili upload metadata.
## When to Use
- User provides a YouTube URL (single video or playlist) and wants a Bilibili-ready version
- User needs bilingual EN-ZH subtitles burned into video
- User wants to repurpose English video content for Chinese audience
## Quick Reference

| Step | Tool | Command | Output |
|---|---|---|---|
| 0. Update | `git` | Auto-check for skill updates | — |
| 1. Download | `yt-dlp` | Download with mp4 format selection | `{slug}.mp4` |
| 2. Transcribe | `whisper` | `check-whisper`, then transcribe | `{slug}_{src_lang}.srt` |
| 2.5 Validate | `srt_utils.py` | `validate` / `fix` | (fixed) |
| 3. Translate | AI | SRT-aware batch translation | `{slug}_zh.srt` |
| 4. Merge | `srt_utils.py` | `merge` | `{slug}_bilingual.srt` |
| 4.25 Lint | `srt_utils.py` | `lint` (Netflix style checks) | (report) |
| 4.5 Style | `srt_utils.py` | `to_ass` | `{slug}_bilingual.ass` |
| 5. Burn | `ffmpeg` | Burn with `ass=` filter | `{slug}_bilingual.mp4` |
| 6. Publish | AI | Analyze content, generate metadata | `publish_info.md` |
## Pre-flight: Auto Update

Run this BEFORE any pipeline step. It locates the skill directory and checks for updates; the `SKILL_DIR` variable is reused by later steps for script paths.

```bash
# Find skill directory (works across Claude Code, OpenClaw, Hermes, Pi)
SKILL_DIR="$(find ~/.claude/skills ~/.openclaw/skills ~/.hermes/skills ~/.pi/agent/skills ~/.agents/skills ~/myagents/myskills -maxdepth 2 -name 'yt2bb' -type d 2>/dev/null | head -1)"
echo "yt2bb: SKILL_DIR=$SKILL_DIR"

if [ -n "$SKILL_DIR" ] && [ -d "$SKILL_DIR/.git" ]; then
  git -C "$SKILL_DIR" fetch --quiet origin main 2>/dev/null
  LOCAL=$(git -C "$SKILL_DIR" rev-parse HEAD)
  REMOTE=$(git -C "$SKILL_DIR" rev-parse origin/main 2>/dev/null)
  if [ "$LOCAL" != "$REMOTE" ]; then
    echo "yt2bb: new version available. Run: git -C $SKILL_DIR pull origin main"
  else
    echo "yt2bb: up to date."
  fi
fi
```
Note: Does not auto-pull — the current session already loaded the old SKILL.md. Notify the user and let them update between sessions.
## Pipeline Details

### Step 1: Download
Single video:
```bash
slug="video-name"  # or: slug=$(python3 "$SKILL_DIR/srt_utils.py" slugify "Video Title")
mkdir -p "${slug}"
yt-dlp --cookies-from-browser chrome \
  -f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]" \
  -o "${slug}/${slug}.mp4" "https://www.youtube.com/watch?v=VIDEO_ID"
```
Playlist / series:
```bash
yt-dlp --cookies-from-browser chrome \
  -f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]" \
  -o "%(playlist_index)03d-%(title)s/%(playlist_index)03d-%(title)s.mp4" \
  "https://www.youtube.com/playlist?list=PLAYLIST_ID"
```
After downloading, rename each folder to a clean slug (a rename sketch follows the notes below) and run Steps 2–6 for each video sequentially.

- `-f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]"`: ensure mp4 output, avoid webm
- `%(playlist_index)03d-`: zero-padded index to preserve playlist order
- If `--cookies-from-browser` fails, export cookies first — see Troubleshooting
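A minimal rename sketch for the playlist case, assuming `slugify` prints the cleaned slug to stdout (as in the single-video example above):

```bash
# Rename each downloaded folder (and the mp4 inside) to a clean slug,
# keeping the zero-padded index so playlist order is preserved
for dir in [0-9][0-9][0-9]-*/; do
  dir="${dir%/}"
  idx="${dir%%-*}"
  slug="${idx}-$(python3 "$SKILL_DIR/srt_utils.py" slugify "${dir#*-}")"
  if [ "$dir" != "$slug" ]; then
    mv "$dir" "$slug"
    mv "$slug/$dir.mp4" "$slug/$slug.mp4"
  fi
done
```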
### Step 2: Transcribe

First run the environment check to detect your platform and get a tailored whisper command:

```bash
python3 "$SKILL_DIR/srt_utils.py" check-whisper
```
This auto-detects OS, GPU (CUDA/Metal/CPU), memory, and installed backends, then recommends the best backend + model for your hardware. If memory detection is unavailable, it falls back conservatively instead of assuming a low-memory machine. Use the command it prints.
Manual fallback (openai-whisper, works everywhere):
```bash
src_lang="en"           # Change to ja/ko/es/etc. based on source video
whisper_model="medium"  # check-whisper recommends the best model for your hardware
whisper "${slug}/${slug}.mp4" \
  --model "$whisper_model" \
  --language "$src_lang" \
  --word_timestamps True \
  --condition_on_previous_text False \
  --output_format srt \
  --max_line_width 40 --max_line_count 1 \
  --output_dir "${slug}"
mv "${slug}/${slug}.srt" "${slug}/${slug}_${src_lang}.srt"
```
Supported backends:

| Backend | Best for | Install |
|---|---|---|
| `mlx-whisper` | macOS Apple Silicon (fastest) | `pip install mlx-whisper` |
| `faster-whisper` | Windows/Linux CUDA, or CPU (~4x faster) | `pip install faster-whisper` |
| `openai-whisper` | Universal fallback | `pip install openai-whisper` |
Model selection (auto-recommended by `check-whisper`):

- `tiny` — fast draft, low accuracy, CPU-friendly (~1 GB)
- `medium` — default, good balance (~5 GB)
- `large-v3` — best accuracy, recommended for JA/KO/ZH source (~10 GB)
Notes:

- `--language`: explicitly set to avoid misdetection; supports `en`, `ja`, `ko`, `es`, etc.
- `--word_timestamps True`: more precise subtitle timing
- `--condition_on_previous_text False`: prevent hallucination loops
- If output is garbled or repeated, add anti-hallucination flags — see Troubleshooting
### Step 2.5: Validate & Fix (optional)

```bash
python3 "$SKILL_DIR/srt_utils.py" validate "${slug}/${slug}_${src_lang}.srt"

# If issues found:
python3 "$SKILL_DIR/srt_utils.py" fix "${slug}/${slug}_${src_lang}.srt" "${slug}/${slug}_${src_lang}.srt"
```
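If you want the fix pass to run only when needed, a minimal gate (assuming `validate` exits non-zero when it reports issues; check your copy's behavior):

```bash
if ! python3 "$SKILL_DIR/srt_utils.py" validate "${slug}/${slug}_${src_lang}.srt"; then
  python3 "$SKILL_DIR/srt_utils.py" fix \
    "${slug}/${slug}_${src_lang}.srt" "${slug}/${slug}_${src_lang}.srt"
fi
```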
### Step 3: Translate

Read `{slug}_{src_lang}.srt` and translate to Chinese. These rules are modeled on the Netflix Simplified Chinese Timed Text Style Guide; follow them to produce broadcast-grade subtitles. Critical rules:

- Keep SRT format intact — preserve index numbers and timestamp (`-->`) lines exactly as-is
- 1:1 entry mapping — every source entry must produce exactly one translated entry (same count)
- Optimize for bottom subtitles — keep each Chinese entry to 1 line whenever possible so the final bilingual subtitle stays compact near the bottom of the frame
- Max 16 full-width characters per line (Netflix SC spec). Prefer 12–16; if a cue is very short (< 1 s), compress further so reading speed stays ≤ 9 characters/second
- Shorten with judgment, not mechanically — remove filler words, repeated subjects, weak interjections, and redundant politeness before dropping key meaning
- Match subtitle duration — the line must feel readable within the time on screen; if the cue is very short, compress more aggressively
- No trailing punctuation on Chinese cues — drop ending `。`, `!`, `?`; keep mid-sentence `,`, `、` only when they add clarity
- Use full-width Chinese punctuation inside cues (`。!?、;:`); use 「」 for inner quotes, not `""` or `''`
- Half-width digits and Latin — numbers, units, product names, and code identifiers stay half-width (`GPT-4`, `30fps`, `2026`); only punctuation is full-width
- Line-break discipline — never break after function words (`的`, `了`, `吗`, `呢`, `吧`, `啊`); never split an English phrasal unit across a line break; keep modifiers with their heads
- Keep terminology consistent — technical terms, names, product names, and recurring phrases should be translated the same way across batches. Maintain an inline glossary if needed
- Adapt, don't transliterate — preserve register, tone, and intent over literal word matching; idioms become natural Chinese equivalents
- Translate in batches of 10 entries — output each batch in valid SRT format, then continue
- Do NOT merge or split entries — maintain original segmentation
- Save as `{slug}/{slug}_zh.srt`
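A before/after sketch on a hypothetical cue, showing the 1:1 mapping, filler removal, single-line ZH, and no trailing punctuation:

```
12
00:01:02,000 --> 00:01:04,500
So basically what we're doing here is caching the results
```

```
12
00:01:02,000 --> 00:01:04,500
我们在这里做的是缓存结果
```

Index and timestamp are untouched; "So basically" is dropped as filler, and the 12-character line reads at roughly 4.8 CPS, well under the 9 CPS limit.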
### Step 4: Merge

```bash
python3 "$SKILL_DIR/srt_utils.py" merge \
  "${slug}/${slug}_${src_lang}.srt" "${slug}/${slug}_zh.srt" "${slug}/${slug}_bilingual.srt"
```
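To catch entry-count mismatches before writing anything, the utility's `--dry-run` mode (see the reference below) can be run first:

```bash
python3 "$SKILL_DIR/srt_utils.py" merge --dry-run \
  "${slug}/${slug}_${src_lang}.srt" "${slug}/${slug}_zh.srt" "${slug}/${slug}_bilingual.srt"
```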
### Step 4.25: Netflix Lint (recommended)

Run `lint` on the merged bilingual SRT to catch Netflix Timed Text Style Guide violations that `validate` doesn't cover — reading speed (CPS), per-line length, inter-cue gaps, and line count.

```bash
python3 "$SKILL_DIR/srt_utils.py" lint "${slug}/${slug}_bilingual.srt"
```
Defaults (all overridable via flags):

| Rule | Threshold | Flag |
|---|---|---|
| Reading speed (English) | ≤ 17 CPS | |
| Reading speed (Simplified Chinese) | ≤ 9 CPS | |
| Min cue duration | 833 ms (5/6 s) | |
| Max cue duration | 7000 ms | |
| Min inter-cue gap | 83 ms (2 frames @ 24 fps) | |
| Max chars/line (English) | 42 | |
| Max chars/line (Chinese, full-width) | 16 | |
| Max lines per cue | 2 | — |
Severity model:
- Errors (exit code 2): duration out of bounds, CPS over limit, > 2 lines per cue. These break Netflix acceptance and should be fixed before burning.
- Warnings (exit 0 unless errors also exist): per-line length, tight gaps. These are recommendations — address if feasible, but they don't block delivery.
When CPS errors fire, the fix is almost always upstream — go back to Step 3 and rewrite the offending Chinese entry to fit the time window. Do not solve CPS by extending the cue past the source's spoken duration.
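For example, a 16-character Chinese line in a 1.5 s cue reads at 10.7 CPS; trimming it to 13 characters brings it down to 8.7 CPS, within the limit.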
Agent-friendly output:

```bash
python3 "$SKILL_DIR/srt_utils.py" lint "${slug}/${slug}_bilingual.srt" --format json
```

Returns `{ok, error_count, warning_count, issues: [{index, code, severity, message}, ...]}` for programmatic filtering.
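A minimal gating sketch, assuming `jq` is installed; the field names follow the JSON shape above:

```bash
report=$(python3 "$SKILL_DIR/srt_utils.py" lint "${slug}/${slug}_bilingual.srt" --format json)

# Print only the blocking errors, one per line
echo "$report" | jq -r '.issues[] | select(.severity == "error") | "cue \(.index): [\(.code)] \(.message)"'

# Gate the pipeline on the error count
if [ "$(echo "$report" | jq -r '.error_count')" -gt 0 ]; then
  echo "lint errors found: rewrite the offending ZH entries (Step 3) before burning"
fi
```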
### Step 4.5: Style — Convert to ASS

Convert the bilingual SRT to an ASS file. ASS enables per-line color, font size, and glow effects that are impossible with SRT `force_style`. Layout rule: subtitles always stay at the bottom. Default stack: ZH on the upper line of the bottom stack, EN on the lower line. The presets are tuned to keep the block readable while reducing overlap risk with lower-screen content.

IMPORTANT — Ask before proceeding. Present the preset table below to the user and ask which style they prefer. Do NOT silently pick a default. If the user has no preference, use `clean`.
Available presets:

| Preset | Look | Best for |
|---|---|---|
| `netflix` | Pure white text, thin black outline, soft drop shadow, no box — modeled on the Netflix Timed Text Style Guide | Professional, broadcast-grade look. Best default for documentaries, interviews, long-form content, and anything that should feel "streaming-platform native". Pair with the platform font from the table below for the closest Netflix Sans feel |
| `clean` | Yellow text on gray box — golden ZH + light yellow EN, semi-transparent light gray background | Readability safety net for busy or mixed-brightness footage where `netflix`'s outline-only text could get visually lost. The gray box guarantees a readable contrast pad |
| `glow` | Yellow ZH + white EN with colored glow — bright yellow ZH + white EN, blurred outer glow, no background box | Entertainment, vlogs, energetic edits. Most eye-catching, but weakest on bright or busy backgrounds |
Example prompt to user:

字幕有三套样式可选:

- `netflix` — 纯白字体 + 细黑描边 + 柔和阴影(默认推荐,Netflix 专业观感,适合纪录片/访谈/长内容)
- `clean` — 黄色字体 + 灰色半透明底框(亮背景或花背景的兜底选项,底框保证对比度)
- `glow` — 黄色/白色字体 + 彩色外发光(更抢眼,适合娱乐/Vlog)
- 自定义 — 提供 `.ass` 样式文件,完全控制字体、颜色、大小(可用 Aegisub 可视化编辑)

选哪个?默认推荐 `netflix`;如果画面特别花哨或底部信息多,可改用 `clean`。
```bash
# Netflix-grade default (white + outline + soft shadow), ZH on top
python3 "$SKILL_DIR/srt_utils.py" to_ass \
  "${slug}/${slug}_bilingual.srt" "${slug}/${slug}_bilingual.ass" \
  --preset netflix

# Gray-box fallback for busy backgrounds, EN on top
python3 "$SKILL_DIR/srt_utils.py" to_ass \
  "${slug}/${slug}_bilingual.srt" "${slug}/${slug}_bilingual.ass" \
  --preset clean --top en

# Vibrant glow (B站 entertainment style)
python3 "$SKILL_DIR/srt_utils.py" to_ass \
  "${slug}/${slug}_bilingual.srt" "${slug}/${slug}_bilingual.ass" \
  --preset glow
```
Custom style file — for full control, provide an external `.ass` file with your own `[V4+ Styles]` section. It must contain styles named `EN` and `ZH`, or `to_ass` will fail early with a validation error. You can design styles visually with Aegisub and export.

```bash
python3 "$SKILL_DIR/srt_utils.py" to_ass \
  "${slug}/${slug}_bilingual.srt" "${slug}/${slug}_bilingual.ass" \
  --style-file my_styles.ass
```
Optionally add `; en_tag=` and `; zh_tag={\blur5}` comment lines in the `.ass` file to inject ASS override tags per language.
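A minimal custom style file sketch. The `[V4+ Styles]` section and the `EN`/`ZH` style names are required by `to_ass`; the fonts, sizes, and colors here are illustrative only:

```
[Script Info]
ScriptType: v4.00+
PlayResX: 1920
PlayResY: 1080

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: ZH,PingFang SC,54,&H00FFFFFF,&H000000FF,&H00000000,&H80000000,0,0,0,0,100,100,0,0,1,2,1,2,60,60,40,1
Style: EN,PingFang SC,36,&H00FFFFFF,&H000000FF,&H00000000,&H80000000,0,0,0,0,100,100,0,0,1,2,1,2,60,60,20,1

; en_tag=
; zh_tag={\blur5}
```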
Font by platform (pass with `--font`; ignored when using `--style-file`):

| Platform | Flag |
|---|---|
| macOS | `--font "PingFang SC"` (default) |
| Linux | `--font "Noto Sans CJK SC"` |
| Windows | `--font "Microsoft YaHei"` |
Other options:

- `--top zh|en` — which language goes on the upper line of the bottom stack (default: `zh`)
- `--res WxH` — video resolution (default: `1920x1080`)

Readability notes for all presets:

- Presets stay bottom-aligned at all times; they do not move to the top automatically
- Font size, outline, and vertical margins scale with `--res`, so 720p and 1080p keep similar visual balance
- `clean` is the safest choice when you must keep subtitles at the bottom in every shot
### Step 5: Burn Subtitles

Use the `ass=` filter (not `subtitles=`) — all styling comes from the ASS file.

```bash
ffmpeg -i "${slug}/${slug}.mp4" \
  -vf "ass='${slug}/${slug}_bilingual.ass'" \
  -c:v libx264 -crf 23 -preset medium \
  -c:a copy "${slug}/${slug}_bilingual.mp4"
```

- `-c:v libx264 -crf 23`: good quality with reasonable file size
- `-preset medium`: balance between speed and compression (use `fast` for a quicker encode)
- No `force_style` needed — styles are embedded in the ASS file
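A quick post-encode sanity check (`ffprobe` ships with ffmpeg): confirm the output duration matches the source before uploading.

```bash
ffprobe -v error -show_entries format=duration -of csv=p=0 "${slug}/${slug}.mp4"
ffprobe -v error -show_entries format=duration -of csv=p=0 "${slug}/${slug}_bilingual.mp4"
```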
### Step 6: Generate Publish Info

Based on the video content (from `{slug}_{src_lang}.srt` and `{slug}_zh.srt`), generate `{slug}/publish_info.md`. All output in this file must be in Chinese (targeting the Bilibili audience).

```markdown
# Publish Info

## Source
{YouTube URL}

## Titles (5 variants)
1. {Suspense/question style — spark curiosity}
2. {Data/achievement driven — emphasize results}
3. {Controversial/opinion style — spark discussion}
4. {Tutorial/practical style — emphasize utility}
5. {Emotional/relatable style — connect with audience}

## Tags
{~10 comma-separated keywords covering topic, technology, domain}

## Description
{3-5 sentences summarizing core content and highlights}

## Chapter Timestamps
00:00 {chapter name}
...
```
Generation rules:
- Title style must match Bilibili conventions: conversational tone, suspense hooks, liberal use of symbols (【】, ?, !)
- Tags should cover both Chinese and English keywords for discoverability
- Timestamps extracted from `{slug}_bilingual.srt` at topic transition points
- Description needs a strong hook — first two sentences determine whether users expand to read
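A hedged helper for the timestamp rule: list every cue's start time as an `mm:ss` stamp so chapter picks anchor to real timestamps (the agent still chooses the actual topic transitions):

```bash
grep -oE '^[0-9]{2}:[0-9]{2}:[0-9]{2}' "${slug}/${slug}_bilingual.srt" \
  | awk -F: '{ printf "%02d:%s\n", $1 * 60 + $2, $3 }' | uniq
```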
## Output Structure

```
{slug}/
├── {slug}.mp4               # Source video
├── {slug}_{src_lang}.srt    # Source language subtitles
├── {slug}_zh.srt            # Chinese subtitles
├── {slug}_bilingual.srt     # Merged bilingual
├── {slug}_bilingual.ass     # Styled subtitles (Step 4.5)
├── {slug}_bilingual.mp4     # Final output
└── publish_info.md          # Bilibili upload metadata
```
## Utility: srt_utils.py

```bash
python3 "$SKILL_DIR/srt_utils.py" merge en.srt zh.srt output.srt            # Merge bilingual
python3 "$SKILL_DIR/srt_utils.py" merge --dry-run en.srt zh.srt output.srt  # Pre-check without writing
python3 "$SKILL_DIR/srt_utils.py" validate input.srt                        # Check timing issues
python3 "$SKILL_DIR/srt_utils.py" fix input.srt output.srt                  # Fix timing/overlaps (multi-pass)
python3 "$SKILL_DIR/srt_utils.py" slugify "Video Title"                     # Generate slug
python3 "$SKILL_DIR/srt_utils.py" to_ass input.srt output.ass               # Convert to styled ASS (default: clean, ZH on top)
python3 "$SKILL_DIR/srt_utils.py" to_ass --dry-run input.srt output.ass     # Pre-check without writing
python3 "$SKILL_DIR/srt_utils.py" to_ass input.srt output.ass --preset glow --top en
python3 "$SKILL_DIR/srt_utils.py" to_ass input.srt output.ass --style-file custom.ass  # User-defined styles
python3 "$SKILL_DIR/srt_utils.py" check-whisper                             # Detect platform, recommend whisper backend + model
```
## Common Mistakes

- Mismatched entry counts: merge fails by default — fix the translation or use `--pad-missing` to pad
- Font not found: ensure PingFang SC is installed (macOS default) or substitute another CJK font (see Troubleshooting)
## Troubleshooting

### yt-dlp: Cookie Auth Failure

`--cookies-from-browser chrome` requires Chrome to be closed (or uses a snapshot of the profile). If it fails:

```bash
# Export cookies once, then reuse the file
yt-dlp --cookies-from-browser chrome --cookies cookies.txt --skip-download "URL"
yt-dlp --cookies cookies.txt -f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]" -o "${slug}/${slug}.mp4" "URL"
```

For 429 / rate-limit errors, add `--sleep-interval 3 --max-sleep-interval 8`.
### whisper: Wrong Language or Hallucination Loops

Symptoms: repeated phrases, garbled characters, or a near-empty SRT despite clear audio.

```bash
whisper "${slug}/${slug}.mp4" \
  --model medium \
  --language "$src_lang" \
  --condition_on_previous_text False \
  --no_speech_threshold 0.6 \
  --logprob_threshold -1.0 \
  --compression_ratio_threshold 2.0 \
  --output_format srt \
  --output_dir "${slug}"
```

If the language is still misdetected, the audio likely has long silence or non-speech segments — add `--vad_filter True` to suppress them.
### ffmpeg: Font Not Found / CJK Boxes

Pass the correct font via `--font` in the `to_ass` step (Step 4.5). The ASS file embeds the font name, so ffmpeg needs that font installed at burn time.
| Platform | Font | Install |
|---|---|---|
| macOS | PingFang SC | pre-installed |
| Linux | Noto Sans CJK SC | `apt install fonts-noto-cjk` |
| Linux (alt) | WenQuanYi Micro Hei | `apt install fonts-wqy-microhei` |
| Windows | Microsoft YaHei | pre-installed |
Regenerate the ASS file with the correct `--font` flag, then re-run the burn step.
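For example, on Linux (the font name is illustrative; pass whichever installed CJK font you chose above):

```bash
python3 "$SKILL_DIR/srt_utils.py" to_ass \
  "${slug}/${slug}_bilingual.srt" "${slug}/${slug}_bilingual.ass" \
  --preset netflix --font "Noto Sans CJK SC"
ffmpeg -i "${slug}/${slug}.mp4" -vf "ass='${slug}/${slug}_bilingual.ass'" \
  -c:v libx264 -crf 23 -preset medium -c:a copy "${slug}/${slug}_bilingual.mp4"
```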
## Privacy & Data Flow

- Browser cookies: Step 1 uses `yt-dlp --cookies-from-browser chrome` to access age-gated or private videos. This reads Chrome cookies locally — no cookies are transmitted beyond YouTube's own servers. To avoid this, export cookies to a file first (see Troubleshooting above).
- Transcripts & translation: Step 3 (translate) and Step 6 (publish info) are performed by the AI agent in the conversation. Transcripts are sent to whatever model/service the agent uses (e.g. Claude API). If the video contains sensitive content, use a local model for those steps.
- Auto-update check: The pre-flight step runs `git fetch` to check for skill updates. It does not auto-pull or execute remote code.
- No telemetry: `srt_utils.py` makes no network requests. All processing (SRT parsing, merging, ASS generation, hardware detection) is fully local.