Skills yt2bb

Name: yt2bb
Author: openclaw

Use when the user wants to repurpose a YouTube video for Bilibili, add bilingual (English-Chinese) subtitles to a video, or create hardcoded subtitle versions for Chinese platforms.

install

source · Clone the upstream repo

git clone https://github.com/openclaw/skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/agents365-ai/yt2bb" ~/.claude/skills/openclaw-skills-yt2bb && rm -rf "$T"

OpenClaw · Install into ~/.openclaw/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/agents365-ai/yt2bb" ~/.openclaw/skills/openclaw-skills-yt2bb && rm -rf "$T"

manifest: skills/agents365-ai/yt2bb/SKILL.md

yt2bb — YouTube to Bilibili Video Repurposing

Overview

Six-step pipeline: download → transcribe → translate → merge → burn subtitles → generate publish info. Produces a video with hardcoded bilingual (EN/ZH) subtitles and a

publish_info.md

with Bilibili upload metadata.

When to Use

User provides a YouTube URL (single video or playlist) and wants a Bilibili-ready version
User needs bilingual EN-ZH subtitles burned into video
User wants to repurpose English video content for Chinese audience

Quick Reference

Step	Tool	Command	Output
0. Update	`git`	Auto-check for skill updates	—
1. Download	`yt-dlp`	`yt-dlp --cookies-from-browser chrome -f ... -o ...`	`{slug}.mp4`
2. Transcribe	`whisper` *	`srt_utils.py check-whisper` then transcribe	`{slug}_{lang}.srt`
2.5 Validate	`srt_utils.py`	`srt_utils.py validate / fix`	`{slug}_{lang}.srt` (fixed)
3. Translate	AI	SRT-aware batch translation	`{slug}_zh.srt`
4. Merge	`srt_utils.py`	`srt_utils.py merge ...`	`{slug}_bilingual.srt`
4.5 Style	`srt_utils.py`	`srt_utils.py to_ass --preset netflix\|clean\|glow`	`{slug}_bilingual.ass`
5. Burn	`ffmpeg`	`ffmpeg -c:v libx264 -vf ass=...`	`{slug}_bilingual.mp4`
6. Publish	AI	Analyze content, generate metadata	`publish_info.md`

Pre-flight: Auto Update

Run this BEFORE any pipeline step. Locates the skill directory and checks for updates. The

SKILL_DIR

variable is reused by later steps for script paths.

# Find skill directory (works across Claude Code, OpenClaw, Hermes, Pi)
SKILL_DIR="$(find ~/.claude/skills ~/.openclaw/skills ~/.hermes/skills ~/.pi/agent/skills ~/.agents/skills ~/myagents/myskills -maxdepth 2 -name 'yt2bb' -type d 2>/dev/null | head -1)"
echo "yt2bb: SKILL_DIR=$SKILL_DIR"
if [ -n "$SKILL_DIR" ] && [ -d "$SKILL_DIR/.git" ]; then
  git -C "$SKILL_DIR" fetch --quiet origin main 2>/dev/null
  LOCAL=$(git -C "$SKILL_DIR" rev-parse HEAD)
  REMOTE=$(git -C "$SKILL_DIR" rev-parse origin/main 2>/dev/null)
  if [ "$LOCAL" != "$REMOTE" ]; then
    echo "yt2bb: new version available. Run: git -C $SKILL_DIR pull origin main"
  else
    echo "yt2bb: up to date."
  fi
fi

Note: Does not auto-pull — the current session already loaded the old SKILL.md. Notify the user and let them update between sessions.

Pipeline Details

Step 1: Download

Single video:

slug="video-name"  # or: slug=$(python3 "$SKILL_DIR/srt_utils.py" slugify "Video Title")
mkdir -p "${slug}"
yt-dlp --cookies-from-browser chrome \
  -f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]" \
  -o "${slug}/${slug}.mp4" "https://www.youtube.com/watch?v=VIDEO_ID"

Playlist / series:

yt-dlp --cookies-from-browser chrome \
  -f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]" \
  -o "%(playlist_index)03d-%(title)s/%(playlist_index)03d-%(title)s.mp4" \
  "https://www.youtube.com/playlist?list=PLAYLIST_ID"

After downloading, rename each folder to a clean slug and run Steps 2–6 for each video sequentially.

-f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]"

: ensure mp4 output, avoid webm

```
%(playlist_index)03d
```
: zero-padded index to preserve playlist order
If
```
--cookies-from-browser
```
fails, export cookies first — see Troubleshooting

Step 2: Transcribe

First run the environment check to detect your platform and get a tailored whisper command:

python3 "$SKILL_DIR/srt_utils.py" check-whisper

This auto-detects OS, GPU (CUDA/Metal/CPU), memory, and installed backends, then recommends the best backend + model for your hardware. If memory detection is unavailable, it falls back conservatively instead of assuming a low-memory machine. Use the command it prints.

Manual fallback (openai-whisper, works everywhere):

src_lang="en"      # Change to ja/ko/es/etc. based on source video
whisper_model="medium"  # check-whisper recommends the best model for your hardware
whisper "${slug}/${slug}.mp4" \
  --model "$whisper_model" \
  --language "$src_lang" \
  --word_timestamps True \
  --condition_on_previous_text False \
  --output_format srt \
  --max_line_width 40 --max_line_count 1 \
  --output_dir "${slug}"
mv "${slug}/${slug}.srt" "${slug}/${slug}_${src_lang}.srt"

Supported backends:

Backend	Best for	Install
`mlx-whisper`	macOS Apple Silicon (fastest)	`pip install mlx-whisper`
`whisper-ctranslate2`	Windows/Linux CUDA, or CPU (~4x faster)	`pip install whisper-ctranslate2`
`openai-whisper`	Universal fallback	`pip install openai-whisper`

Model selection (auto-recommended by

check-whisper

```
tiny
```
— fast draft, low accuracy, CPU-friendly (~1 GB)
```
medium
```
— default, good balance (~5 GB)
```
large-v3
```
— best accuracy, recommended for JA/KO/ZH source (~10 GB)

Notes:

```
--language
```
: explicitly set to avoid misdetection; supports
```
en
```
,
```
ja
```
,
```
ko
```
,
```
es
```
, etc.
```
--word_timestamps True
```
: more precise subtitle timing
```
--condition_on_previous_text False
```
: prevent hallucination loops
If output is garbled or repeated, add anti-hallucination flags — see Troubleshooting

Step 2.5: Validate & Fix (optional)

python3 "$SKILL_DIR/srt_utils.py" validate "${slug}/${slug}_${src_lang}.srt"
# If issues found:
python3 "$SKILL_DIR/srt_utils.py" fix "${slug}/${slug}_${src_lang}.srt" "${slug}/${slug}_${src_lang}.srt"

Step 3: Translate

Read

{slug}_{src_lang}.srt

and translate to Chinese. Critical rules:

These rules are modeled on the Netflix Simplified Chinese Timed Text Style Guide; follow them to produce broadcast-grade subtitles.

Keep SRT format intact — preserve index numbers, timestamps (
```
-->
```
lines) exactly as-is
1:1 entry mapping — every source entry must produce exactly one translated entry (same count)
Optimize for bottom subtitles — keep each Chinese entry to 1 line whenever possible so the final bilingual subtitle stays compact near the bottom of the frame
Max 16 full-width characters per line (Netflix SC spec). Prefer 12–16; if a cue is very short (< 1 s) compress further so reading speed stays ≤ 9 characters/second
Shorten with judgment, not mechanically — remove filler words, repeated subjects, weak interjections, and redundant politeness before dropping key meaning
Match subtitle duration — the line must feel readable within the time on screen; if the cue is very short, compress more aggressively
No trailing punctuation on Chinese cues — drop ending
```
。
```
,
```
！
```
,
```
？
```
; keep mid-sentence
```
，
```
,
```
、
```
,
```
；
```
only when they add clarity
Use full-width Chinese punctuation inside cues (
```
，。！？、；：
```
); use 「」 for inner quotes, not
```
""
```
or
```
''
```
Half-width digits and Latin — numbers, units, product names, and code identifiers stay half-width (
```
GPT-4
```
,
```
30fps
```
,
```
2026
```
); only punctuation is full-width
Line-break discipline — never break after function words (
```
的
```
,
```
了
```
,
```
吗
```
,
```
呢
```
,
```
吧
```
,
```
啊
```
); never split an English phrasal unit across a line break; keep modifiers with their heads
Keep terminology consistent — technical terms, names, product names, and recurring phrases should be translated the same way across batches. Maintain an inline glossary if needed
Adapt, don't transliterate — preserve register, tone, and intent over literal word matching; idioms become natural Chinese equivalents
Translate in batches of 10 entries — output each batch in valid SRT format, then continue
Do NOT merge or split entries — maintain original segmentation
Save as
```
{slug}/{slug}_zh.srt
```

Step 4: Merge

python3 "$SKILL_DIR/srt_utils.py" merge \
  "${slug}/${slug}_${src_lang}.srt" "${slug}/${slug}_zh.srt" "${slug}/${slug}_bilingual.srt"

Step 4.25: Netflix Lint (recommended)

Run

lint

on the merged bilingual SRT to catch Netflix Timed Text Style Guide violations that

validate

doesn't cover — reading speed (CPS), per-line length, inter-cue gaps, and line count.

python3 "$SKILL_DIR/srt_utils.py" lint "${slug}/${slug}_bilingual.srt"

Defaults (all overridable via flags):

Rule	Threshold	Flag
Reading speed (English)	≤ 17 CPS	`--max-cps-en`
Reading speed (Simplified Chinese)	≤ 9 CPS	`--max-cps-zh`
Min cue duration	833 ms (5/6 s)	`--min-duration-ms`
Max cue duration	7000 ms	`--max-duration-ms`
Min inter-cue gap	83 ms (2 frames @ 24 fps)	`--min-gap-ms`
Max chars/line (English)	42	`--max-chars-en`
Max chars/line (Chinese, full-width)	16	`--max-chars-zh`
Max lines per cue	2	—

Severity model:

Errors (exit code 2): duration out of bounds, CPS over limit, > 2 lines per cue. These break Netflix acceptance and should be fixed before burning.
Warnings (exit 0 unless errors also exist): per-line length, tight gaps. These are recommendations — address if feasible, but they don't block delivery.

When CPS errors fire, the fix is almost always upstream — go back to Step 3 and rewrite the offending Chinese entry to fit the time window. Do not solve CPS by extending the cue past the source's spoken duration.

Agent-friendly output:

python3 "$SKILL_DIR/srt_utils.py" lint "${slug}/${slug}_bilingual.srt" --format json

Returns

{ok, error_count, warning_count, issues: [{index, code, severity, message}, ...]}

for programmatic filtering.

Step 4.5: Style — Convert to ASS

Convert the bilingual SRT to an ASS file. ASS enables per-line color, font size, and glow effects that are impossible with SRT

force_style

. Layout rule: subtitles always stay at the bottom. Default stack: ZH on the upper line of the bottom stack, EN on the lower line. The presets are tuned to keep the block readable while reducing overlap risk with lower-screen content.

IMPORTANT — Ask before proceeding. Present the preset table below to the user and ask which style they prefer. Do NOT silently pick a default. If the user has no preference, use
clean
.

Available presets:

Preset Look Best for

netflix

Pure white text, thin black outline, soft drop shadow, no box — modeled on the Netflix Timed Text Style Guide

Professional, broadcast-grade look. Best default for documentaries, interviews, long-form content, and anything that should feel "streaming-platform native". Use with

--font "Source Han Sans SC"

on Linux /

"PingFang SC"

on macOS for closest Netflix Sans feel

clean

Yellow text on gray box — golden ZH + light yellow EN, semi-transparent light gray background

Readability safety net for busy or mixed-brightness footage where

netflix

's outline-only text could get visually lost. The gray box guarantees a readable contrast pad

glow

Yellow ZH + white EN with colored glow — bright yellow ZH + white EN, blurred outer glow, no background box

Entertainment, vlogs, energetic edits. Most eye-catching, but weakest on bright or busy backgrounds

Example prompt to user:

字幕有三套样式可选：
netflix
— 纯白字体 + 细黑描边 + 柔和阴影（默认推荐，Netflix 专业观感，适合纪录片/访谈/长内容）
clean
— 黄色字体 + 灰色半透明底框（亮背景或花背景的兜底选项，底框保证对比度）
glow
— 黄色/白色字体 + 彩色外发光（更抢眼，适合娱乐/Vlog）
自定义 — 提供
.ass
样式文件，完全控制字体、颜色、大小（可用 Aegisub 可视化编辑）
选哪个？默认推荐
netflix
；如果画面特别花哨或底部信息多，可改用
clean
。

# Netflix-grade default (white + outline + soft shadow), ZH on top
python3 "$SKILL_DIR/srt_utils.py" to_ass \
  "${slug}/${slug}_bilingual.srt" "${slug}/${slug}_bilingual.ass" \
  --preset netflix

# Gray-box fallback for busy backgrounds, EN on top
python3 "$SKILL_DIR/srt_utils.py" to_ass \
  "${slug}/${slug}_bilingual.srt" "${slug}/${slug}_bilingual.ass" \
  --preset clean --top en

# Vibrant glow (B站 entertainment style)
python3 "$SKILL_DIR/srt_utils.py" to_ass \
  "${slug}/${slug}_bilingual.srt" "${slug}/${slug}_bilingual.ass" \
  --preset glow

Custom style file — for full control, provide an external

.ass

file with your own

[V4+ Styles]

section. It must contain styles named

EN

and

ZH

, or

to_ass

will fail early with a validation error. You can design styles visually with Aegisub and export.

python3 "$SKILL_DIR/srt_utils.py" to_ass \
  "${slug}/${slug}_bilingual.srt" "${slug}/${slug}_bilingual.ass" \
  --style-file my_styles.ass

Optionally add

; en_tag=

and

; zh_tag={\blur5}

comment lines in the

.ass

file to inject ASS override tags per language.

Font by platform (pass with

--font

, ignored when using

--style-file

Platform	Flag
macOS	`--font "PingFang SC"` (default)
Linux	`--font "Noto Sans CJK SC"`
Windows	`--font "Microsoft YaHei"`

Other options:

```
--top zh|en
```
— which language on the upper line of the bottom stack (default:
```
zh
```
)
```
--res WxH
```
— video resolution (default:
```
1920x1080
```
)

Readability notes for all presets:

Presets stay bottom-aligned at all times; they do not move to the top automatically
Font size, outline, and vertical margins scale with
```
--res
```
so 720p and 1080p keep similar visual balance
```
clean
```
is the safest choice when you must keep subtitles at the bottom in every shot

Step 5: Burn Subtitles

Use the

ass=

filter (not

subtitles=

) — all styling comes from the ASS file.

ffmpeg -i "${slug}/${slug}.mp4" \
  -vf "ass='${slug}/${slug}_bilingual.ass'" \
  -c:v libx264 -crf 23 -preset medium \
  -c:a copy "${slug}/${slug}_bilingual.mp4"

```
-c:v libx264 -crf 23
```
: good quality with reasonable file size
```
-preset medium
```
: balance between speed and compression (use
```
fast
```
for quicker encode)
No
```
force_style
```
needed — styles are embedded in the ASS file

Step 6: Generate Publish Info

Based on the video content (from

{slug}_{src_lang}.srt

and

{slug}_zh.srt

), generate

{slug}/publish_info.md

All output in this file must be in Chinese (targeting Bilibili audience).

# Publish Info

## Source
{YouTube URL}

## Titles (5 variants)
1. {Suspense/question style — spark curiosity}
2. {Data/achievement driven — emphasize results}
3. {Controversial/opinion style — spark discussion}
4. {Tutorial/practical style — emphasize utility}
5. {Emotional/relatable style — connect with audience}

## Tags
{~10 comma-separated keywords covering topic, technology, domain}

## Description
{3-5 sentences summarizing core content and highlights}

## Chapter Timestamps
00:00 {chapter name}
...

Generation rules:

Title style must match Bilibili conventions: conversational tone, suspense hooks, liberal use of symbols (【】, ?, !)
Tags should cover both Chinese and English keywords for discoverability
Timestamps extracted from
```
{slug}_bilingual.srt
```
at topic transition points
Description needs a strong hook — first two sentences determine whether users expand to read

Output Structure

{slug}/
├── {slug}.mp4              # Source video
├── {slug}_{src_lang}.srt   # Source language subtitles
├── {slug}_zh.srt           # Chinese subtitles
├── {slug}_bilingual.srt    # Merged bilingual
├── {slug}_bilingual.mp4    # Final output
└── publish_info.md         # Bilibili upload metadata

Utility: srt_utils.py

python3 "$SKILL_DIR/srt_utils.py" merge en.srt zh.srt output.srt          # Merge bilingual
python3 "$SKILL_DIR/srt_utils.py" merge --dry-run en.srt zh.srt output.srt # Pre-check without writing
python3 "$SKILL_DIR/srt_utils.py" validate input.srt                       # Check timing issues
python3 "$SKILL_DIR/srt_utils.py" fix input.srt output.srt                 # Fix timing/overlaps (multi-pass)
python3 "$SKILL_DIR/srt_utils.py" slugify "Video Title"                    # Generate slug
python3 "$SKILL_DIR/srt_utils.py" to_ass input.srt output.ass              # Convert to styled ASS (default: clean, ZH on top)
python3 "$SKILL_DIR/srt_utils.py" to_ass --dry-run input.srt output.ass    # Pre-check without writing
python3 "$SKILL_DIR/srt_utils.py" to_ass input.srt output.ass --preset glow --top en
python3 "$SKILL_DIR/srt_utils.py" to_ass input.srt output.ass --style-file custom.ass  # User-defined styles
python3 "$SKILL_DIR/srt_utils.py" check-whisper                    # Detect platform, recommend whisper backend + model

Common Mistakes

Mismatched entry counts: Merge fails by default — fix translation or use
```
--pad-missing
```
to pad
Font not found: Ensure PingFang SC is installed (macOS default) or substitute (see Troubleshooting)

Troubleshooting

yt-dlp: Cookie Auth Failure

--cookies-from-browser chrome

requires Chrome to be closed (or uses a snapshot of the profile). If it fails:

# Export cookies once, then reuse the file
yt-dlp --cookies-from-browser chrome --cookies cookies.txt --skip-download "URL"
yt-dlp --cookies cookies.txt -f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]" -o "${slug}/${slug}.mp4" "URL"

For 429 / rate-limit errors, add

--sleep-interval 3 --max-sleep-interval 8

whisper: Wrong Language or Hallucination Loops

Symptoms: repeated phrases, garbled characters, or near-empty SRT despite clear audio.

whisper "${slug}/${slug}.mp4" \
  --model medium \
  --language "$src_lang" \
  --condition_on_previous_text False \
  --no_speech_threshold 0.6 \
  --logprob_threshold -1.0 \
  --compression_ratio_threshold 2.0 \
  --output_format srt \
  --output_dir "${slug}"

If language is still misdetected, the audio likely has long silence or non-speech segments — add

--vad_filter True

to suppress them.

ffmpeg: Font Not Found / CJK Boxes

Pass the correct font via

--font

in the

to_ass

step (Step 4.5). The ASS file embeds the font name, so ffmpeg needs it installed at burn time.

Platform	Font	Install
macOS	`PingFang SC`	pre-installed
Linux	`Noto Sans CJK SC`	`sudo apt install fonts-noto-cjk`
Linux (alt)	`WenQuanYi Micro Hei`	`sudo apt install fonts-wqy-microhei`
Windows	`Microsoft YaHei`	pre-installed

Regenerate the ASS file with the correct

--font

flag, then re-run the burn step.

Privacy & Data Flow

Browser cookies: Step 1 uses
```
yt-dlp --cookies-from-browser chrome
```
to access age-gated or private videos. This reads Chrome cookies locally — no cookies are transmitted beyond YouTube's own servers. To avoid this, export cookies to a file first (see Troubleshooting above).
Transcripts & translation: Step 3 (translate) and Step 6 (publish info) are performed by the AI agent in the conversation. Transcripts are sent to whatever model/service the agent uses (e.g. Claude API). If the video contains sensitive content, use a local model for those steps.
Auto-update check: The pre-flight step runs
```
git fetch
```
to check for skill updates. It does not auto-pull or execute remote code.
No telemetry:
```
srt_utils.py
```
makes no network requests. All processing (SRT parsing, merging, ASS generation, hardware detection) is fully local.