Autosearch video-to-text-bcut
Transcribes a video or audio URL to text with word-level timestamps using Bilibili's Bcut ASR API (free, no API key). Preferred for Chinese content, where Bcut gives character-level timestamps versus Whisper's word-level ones. Returns the text plus segments [{start, end, text}]. Requires yt-dlp and ffmpeg.
install
source · Clone the upstream repo
git clone https://github.com/0xmariowu/Autosearch
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/0xmariowu/Autosearch "$T" && mkdir -p ~/.claude/skills && cp -r "$T/autosearch/skills/tools/video-to-text-bcut" ~/.claude/skills/0xmariowu-autosearch-video-to-text-bcut && rm -rf "$T"
manifest:
autosearch/skills/tools/video-to-text-bcut/SKILL.md
source content
Uses Bilibili's public Bcut ASR API (no login required):
- yt-dlp extracts audio from video URL
- ffmpeg converts to 16kHz mono WAV
- Bcut API: upload → create task → poll → get word-level timestamps
- SegmentBuilder aggregates chars into sentence-level segments
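The local steps in this pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the skill's actual code: `audio_commands` and `build_segments` are hypothetical names, the sentence-ending punctuation set is a guess, and the Bcut upload, task, and poll calls are left out because the source does not document the endpoints. The yt-dlp and ffmpeg argument lists are only built here, not executed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def audio_commands(url: str, workdir: str) -> List[List[str]]:
    """Build (but do not run) the yt-dlp and ffmpeg argument lists."""
    extracted = f"{workdir}/audio.wav"
    converted = f"{workdir}/audio_16k.wav"
    return [
        # -x extracts audio only; --audio-format picks the container.
        ["yt-dlp", "-x", "--audio-format", "wav",
         "-o", f"{workdir}/audio.%(ext)s", url],
        # -ar 16000 sets a 16 kHz sample rate; -ac 1 downmixes to mono.
        ["ffmpeg", "-y", "-i", extracted, "-ar", "16000", "-ac", "1", converted],
    ]


# Assumption: a segment closes on common Chinese/English sentence punctuation.
SENTENCE_END = set("。！？!?.")


def build_segments(tokens: Iterable[Tuple[float, float, str]]) -> List[Segment]:
    """Group (start, end, text) tokens into sentence-level segments."""
    segments: List[Segment] = []
    buf: List[str] = []
    seg_start = None
    seg_end = None
    for start, end, text in tokens:
        if seg_start is None:
            seg_start = start
        seg_end = end
        buf.append(text)
        if text and text[-1] in SENTENCE_END:
            segments.append(Segment(seg_start, seg_end, "".join(buf)))
            buf, seg_start = [], None
    if buf:  # trailing text without closing punctuation
        segments.append(Segment(seg_start, seg_end, "".join(buf)))
    return segments
```

Each segment takes its start from the first token and its end from the last, so the segment boundaries stay aligned with the underlying timestamps.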
Output
{ "ok": true, "text": "完整转录文本", "segments": [{"start": 0.0, "end": 2.5, "text": "第一句话。"}], "duration_seconds": 120.5, "source": "https://..." }
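As one way to consume this output, the sketch below turns the `segments` list into SubRip (SRT) subtitle text. It assumes only the fields shown in the example above; `to_srt_time` and `result_to_srt` are hypothetical helper names, not part of the skill.

```python
def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def result_to_srt(result: dict) -> str:
    """Render the skill's result dict as SRT; raises if ok is not true."""
    if not result.get("ok"):
        raise ValueError("transcription result is not ok")
    blocks = []
    for index, seg in enumerate(result["segments"], start=1):
        blocks.append(
            f"{index}\n"
            f"{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    return "\n".join(blocks)
```

Because `start` and `end` are plain floats in seconds, the conversion is a single rounding to milliseconds followed by integer division.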
Quality Bar
- Returns segments with start/end in seconds
- Handles Bcut polling timeout gracefully (returns partial text)
- Falls back gracefully if Bcut fails
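The timeout and fallback behaviors in this list could look roughly like the sketch below. Everything here is an assumption made for illustration: `fetch_status` stands in for whatever call reads the Bcut task state, the `done`, `partial`, and `fallback_reason` keys are invented, and the source does not say what the skill falls back to, so `fallback` is just a caller-supplied function.

```python
import time


def poll_with_timeout(fetch_status, timeout=60.0, interval=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll until the task is done or the deadline passes.

    On timeout, return whatever text the last status carried,
    flagged as partial, instead of raising.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status.get("done"):
            return {"ok": True, "partial": False, "text": status.get("text", "")}
        if clock() >= deadline:
            return {"ok": True, "partial": True, "text": status.get("text", "")}
        sleep(interval)


def transcribe_with_fallback(primary, fallback, url):
    """Try the primary transcriber; on any error, use the fallback."""
    try:
        return primary(url)
    except Exception as exc:
        result = fallback(url)
        # Record why the fallback ran, without overwriting an existing key.
        result.setdefault("fallback_reason", str(exc))
        return result
```

Passing `clock` and `sleep` as parameters keeps the loop testable without real waiting; production callers can rely on the defaults.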