Autosearch video-to-text-bcut

Transcribe a video or audio URL to text with timestamps using the Bilibili Bcut ASR API (free, no API key). Preferred for Chinese content: Bcut gives character-level timestamps, whereas Whisper gives word-level ones. Returns text plus segments [{start, end, text}]. Requires yt-dlp and ffmpeg.

install

source · Clone the upstream repo

git clone https://github.com/0xmariowu/Autosearch

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/0xmariowu/Autosearch "$T" && mkdir -p ~/.claude/skills && cp -r "$T/autosearch/skills/tools/video-to-text-bcut" ~/.claude/skills/0xmariowu-autosearch-video-to-text-bcut && rm -rf "$T"

manifest: autosearch/skills/tools/video-to-text-bcut/SKILL.md

source content

Uses Bilibili's public Bcut ASR API (no login required):

yt-dlp extracts audio from video URL
ffmpeg converts to 16kHz mono WAV
Bcut API: upload → create task → poll → get word-level timestamps
SegmentBuilder aggregates chars into sentence-level segments
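
The audio-preparation and aggregation steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: `audio_commands`, `build_segments`, the file names, and the per-character input shape ({start, end, text} in seconds) are all assumptions made for the example, and the real SegmentBuilder may split sentences differently. The Bcut upload/poll calls are left out because their request format is not described here.

```python
def audio_commands(url: str, workdir: str = ".") -> list[list[str]]:
    """Build argv lists for steps 1-2 (hypothetical helper).

    Run each with subprocess.run(cmd, check=True).
    """
    raw = f"{workdir}/audio.wav"
    mono = f"{workdir}/audio_16k.wav"
    return [
        # yt-dlp: extract the audio track (-x) and convert it to WAV
        ["yt-dlp", "-x", "--audio-format", "wav",
         "-o", f"{workdir}/audio.%(ext)s", url],
        # ffmpeg: resample to 16 kHz (-ar) and downmix to mono (-ac 1)
        ["ffmpeg", "-y", "-i", raw, "-ar", "16000", "-ac", "1", mono],
    ]


# Assumed sentence terminators; the real builder may use other rules.
SENTENCE_END = tuple("。！？.!?")


def _merge(items: list[dict]) -> dict:
    """Collapse timestamped units into one segment."""
    return {
        "start": items[0]["start"],
        "end": items[-1]["end"],
        "text": "".join(item["text"] for item in items),
    }


def build_segments(chars: list[dict]) -> list[dict]:
    """Group timestamped characters into sentence-level segments."""
    segments, current = [], []
    for ch in chars:
        current.append(ch)
        if ch["text"].endswith(SENTENCE_END):
            segments.append(_merge(current))
            current = []
    if current:  # trailing text without a terminator
        segments.append(_merge(current))
    return segments
```

Each segment takes its start from the first character and its end from the last, so gaps between sentences are preserved rather than absorbed.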

Output

{
  "ok": true,
  "text": "完整转录文本",
  "segments": [{"start": 0.0, "end": 2.5, "text": "第一句话。"}],
  "duration_seconds": 120.5,
  "source": "https://..."
}
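
As one way to consume this output, the sketch below turns the segments into SRT subtitle text. The helper names `srt_time` and `to_srt` are hypothetical and not part of the skill; the only assumption about the input is the JSON shape shown above.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(result: dict) -> str:
    """Render the skill's output dict as SRT text."""
    if not result.get("ok"):
        raise ValueError("transcription did not succeed")
    blocks = []
    for index, seg in enumerate(result["segments"], start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    # Blank line between cues, as SRT expects
    return "\n".join(blocks)
```

For the sample above, the single segment becomes one cue running from 00:00:00,000 to 00:00:02,500.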

Quality Bar

Returns segments with `start`/`end` in seconds
Handles Bcut polling timeout gracefully (returns partial text)
Falls back gracefully if Bcut fails
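
The polling-timeout behaviour could look something like the sketch below. It is an illustration under assumptions, not the skill's implementation: `poll_transcript`, the injected `fetch` callable, and the `done`/`text`/`timed_out` keys are hypothetical names, and whether Bcut exposes partial text while a task is still running is not confirmed here.

```python
import time
from typing import Callable


def poll_transcript(
    fetch: Callable[[], dict],
    timeout: float = 60.0,
    interval: float = 1.0,
) -> dict:
    """Poll until the task is done or the deadline passes.

    `fetch` is a hypothetical callable returning {"done": bool, "text": str}.
    On timeout, the most recent text seen so far is returned instead of raising.
    """
    deadline = time.monotonic() + timeout
    last_text = ""
    while True:
        status = fetch()
        # Keep the latest non-empty text so a timeout can still return it
        last_text = status.get("text") or last_text
        if status.get("done"):
            return {"text": last_text, "timed_out": False}
        if time.monotonic() >= deadline:
            return {"text": last_text, "timed_out": True}
        time.sleep(interval)
```

Passing the status call in as a function keeps the loop independent of the HTTP client and makes the timeout path easy to exercise with a stub.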