Autosearch video-to-text-bcut

Transcribes a video or audio URL to text with word-level timestamps using the Bilibili Bcut ASR API (free, no API key). Preferred for Chinese content: Bcut's timestamps are character-level, whereas Whisper's are word-level. Returns the full text plus segments [{start, end, text}]. Requires yt-dlp and ffmpeg.

install
source · Clone the upstream repo
git clone https://github.com/0xmariowu/Autosearch
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/0xmariowu/Autosearch "$T" && mkdir -p ~/.claude/skills && cp -r "$T/autosearch/skills/tools/video-to-text-bcut" ~/.claude/skills/0xmariowu-autosearch-video-to-text-bcut && rm -rf "$T"
manifest: autosearch/skills/tools/video-to-text-bcut/SKILL.md
source content

Uses Bilibili's public Bcut ASR API (no login required):

  1. yt-dlp extracts audio from video URL
  2. ffmpeg converts to 16kHz mono WAV
  3. Bcut API: upload → create task → poll → get word-level timestamps
  4. SegmentBuilder aggregates chars into sentence-level segments
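As a rough illustration of steps 1, 2 and 4 above, the following minimal sketch builds the yt-dlp and ffmpeg command lines and groups per-character timestamps into sentences. The names `audio_commands`, `extract_audio` and `build_segments`, the file names, and the punctuation-based splitting rule are assumptions made for this example. They are not taken from the skill's SegmentBuilder, which may split differently. The Bcut upload and polling calls (step 3) are left out because their request format is not described here.

```python
import subprocess
from pathlib import Path

# Assumed sentence-ending punctuation (Chinese and Latin); hypothetical rule.
SENTENCE_END = "。！？.!?"


def audio_commands(url, workdir):
    """Return (download_cmd, convert_cmd) for steps 1 and 2."""
    work = Path(workdir)
    raw_template = str(work / "audio.%(ext)s")  # yt-dlp output template
    raw_wav = str(work / "audio.wav")
    final_wav = str(work / "audio_16k.wav")
    # Step 1: yt-dlp extracts the audio track (-x) and saves it as WAV.
    download = ["yt-dlp", "-x", "--audio-format", "wav", "-o", raw_template, url]
    # Step 2: ffmpeg resamples to 16 kHz (-ar) and downmixes to mono (-ac 1).
    convert = ["ffmpeg", "-y", "-i", raw_wav, "-ar", "16000", "-ac", "1", final_wav]
    return download, convert


def extract_audio(url, workdir):
    """Run both commands; needs yt-dlp and ffmpeg on PATH."""
    for cmd in audio_commands(url, workdir):
        subprocess.run(cmd, check=True)
    return str(Path(workdir) / "audio_16k.wav")


def _merge(items):
    # A segment spans from its first character's start to its last one's end.
    return {
        "start": items[0]["start"],
        "end": items[-1]["end"],
        "text": "".join(item["text"] for item in items),
    }


def build_segments(chars):
    """Step 4: group [{start, end, text}] character items into sentences."""
    segments, current = [], []
    for item in chars:
        current.append(item)
        if item["text"] and item["text"][-1] in SENTENCE_END:
            segments.append(_merge(current))
            current = []
    if current:  # trailing characters without closing punctuation
        segments.append(_merge(current))
    return segments
```

Building the commands separately from running them keeps the sketch easy to inspect without touching the network.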

Output

{
  "ok": true,
  "text": "完整转录文本",
  "segments": [{"start": 0.0, "end": 2.5, "text": "第一句话。"}],
  "duration_seconds": 120.5,
  "source": "https://..."
}
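One way to consume this output is to turn the segments into subtitles. The sketch below converts a result of the shape shown above into SRT text. The helper names `format_timestamp` and `to_srt` are hypothetical and are not part of the skill.

```python
def format_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(result):
    """Build SRT text from a result dict containing a 'segments' list."""
    blocks = []
    for index, seg in enumerate(result["segments"], start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    # Subtitle blocks are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = {
        "ok": True,
        "text": "第一句话。",
        "segments": [{"start": 0.0, "end": 2.5, "text": "第一句话。"}],
    }
    print(to_srt(sample))
```

Working in integer milliseconds avoids floating-point drift when splitting the value into hours, minutes and seconds.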

Quality Bar

  • Returns segments with start/end in seconds
  • Handles Bcut polling timeout gracefully (returns partial text)
  • Falls back gracefully if Bcut fails
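
The polling-timeout behaviour could look something like the sketch below. It polls a caller-supplied status function until the task reports completion or a deadline passes, and on timeout returns whatever text has arrived so far. The function name `poll_with_timeout`, the status keys `done` and `text`, and the `"ok": false` plus `"error"` result shape are all assumptions made for illustration. The skill's actual return value on timeout is not documented here.

```python
import time


def poll_with_timeout(fetch_status, timeout_s=60.0, interval_s=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_status() until it reports done or timeout_s elapses.

    fetch_status is a hypothetical callable returning a dict such as
    {"done": bool, "text": str}; clock and sleep are injectable for testing.
    """
    deadline = clock() + timeout_s
    partial = ""
    while True:
        status = fetch_status()
        # Keep the most recent text so a timeout can still return it.
        partial = status.get("text", partial)
        if status.get("done"):
            return {"ok": True, "text": partial}
        if clock() >= deadline:
            # Assumed result shape: flag the failure, keep the partial text.
            return {"ok": False, "text": partial, "error": "timeout"}
        sleep(interval_s)
```

Passing `clock` and `sleep` as parameters lets the timeout path be exercised with a fake clock instead of waiting in real time.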