Autosearch video-to-text-local
Transcribe video/audio URL or local file to text + SRT using yt-dlp + local mlx-whisper (Apple Silicon). Free, offline, fastest on M-series Macs. Opt-in advanced path for users with Apple Silicon + mlx-whisper installed. Returns raw text and segments; summary is caller's responsibility.
install
source · Clone the upstream repo
git clone https://github.com/0xmariowu/Autosearch
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/0xmariowu/Autosearch "$T" && mkdir -p ~/.claude/skills && cp -r "$T/autosearch/skills/tools/video-to-text-local" ~/.claude/skills/0xmariowu-autosearch-video-to-text-local && rm -rf "$T"
manifest:
autosearch/skills/tools/video-to-text-local/SKILL.md
Transcribe video or audio entirely on-device using yt-dlp audio extraction plus local mlx-whisper inference. Works offline, costs nothing per run, and is fastest on Apple Silicon M-series with MLX Metal acceleration.
Input Fit
- YouTube URLs.
- Bilibili URLs.
- Douyin URLs.
- Xiaoyuzhou podcast URLs.
- Local audio files such as MP3, M4A, WAV, AAC, FLAC, OGG, and OPUS.
- Local video files such as MP4, MOV, MKV, AVI, WEBM, FLV, and M4V.
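The accepted inputs above can be routed with a small check. This is an illustrative sketch only — `classify_input` and the extension sets are hypothetical helpers mirroring the Input Fit list, not part of the skill's actual code:

```python
from pathlib import Path

# Extension sets mirror the Input Fit list above (illustrative, not the skill's code).
AUDIO_EXTS = {".mp3", ".m4a", ".wav", ".aac", ".flac", ".ogg", ".opus"}
VIDEO_EXTS = {".mp4", ".mov", ".mkv", ".avi", ".webm", ".flv", ".m4v"}

def classify_input(url_or_path: str) -> str:
    """Return 'url', 'audio', 'video', or 'unsupported' for a candidate input."""
    if url_or_path.startswith(("http://", "https://")):
        return "url"  # YouTube, Bilibili, Douyin, Xiaoyuzhou, etc.
    ext = Path(url_or_path).suffix.lower()
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    return "unsupported"
```

Anything classified `unsupported` is better rejected up front than handed to yt-dlp.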
Invocation
Call transcribe.py's sync transcribe(url_or_path: str) function with the original URL or local file path:
result = transcribe("https://www.youtube.com/watch?v=example")
Successful calls return:
{
  "ok": True,
  "raw_txt": "...",
  "subtitle_srt": "1\n00:00:00,000 --> 00:00:03,100\n...\n",
  "meta": {
    "language": "en",
    "duration_sec": 123.4,
    "model": "mlx-community/whisper-large-v3-turbo",
    "backend": "mlx-whisper",
  },
  "audio_path": "/tmp/autosearch-video-to-text-local-.../audio.mp3",
  "source": "https://www.youtube.com/watch?v=example",
}
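A caller might persist a successful result like this. A minimal sketch assuming only the documented return shape: the sample dict stands in for a real transcribe() call (which needs mlx-whisper on Apple Silicon), and `save_transcript` is a hypothetical helper, not part of the skill:

```python
import os
import tempfile

# Sample result mirroring the documented success shape; stands in for a
# real transcribe() call (assumption: same keys as the README shows).
result = {
    "ok": True,
    "raw_txt": "Hello world.",
    "subtitle_srt": "1\n00:00:00,000 --> 00:00:03,100\nHello world.\n",
    "meta": {
        "language": "en",
        "duration_sec": 123.4,
        "model": "mlx-community/whisper-large-v3-turbo",
        "backend": "mlx-whisper",
    },
    "source": "https://www.youtube.com/watch?v=example",
}

def save_transcript(result: dict, stem: str) -> list[str]:
    """Write <stem>.txt and <stem>.srt from a successful result; return the paths."""
    if not result.get("ok"):
        raise RuntimeError(f"transcription failed: {result.get('reason')}")
    written = []
    for ext, key in ((".txt", "raw_txt"), (".srt", "subtitle_srt")):
        path = stem + ext
        with open(path, "w", encoding="utf-8") as f:
            f.write(result[key])
        written.append(path)
    return written
```

Keeping text and SRT side by side preserves both the quotable transcript and the timestamps for citation.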
Failure Modes
- Missing mlx-whisper package returns reason: mlx_whisper_unavailable (install with pip install mlx-whisper, Apple Silicon only).
- Unsupported URLs, network failures, and non-zero yt-dlp exits return reason: yt_dlp_failed.
- Missing local ffmpeg for video conversion returns reason: ffmpeg_missing.
- Model download failure (no network on first run) returns reason: model_download_failed.
- Other MLX / transcription runtime errors return reason: mlx_whisper_runtime_error.
Downgrade chain: use video-to-text-groq as the free cloud fallback (requires GROQ_API_KEY), then video-to-text-openai as the paid cloud fallback.
Limits
- Requires Apple Silicon (M1 / M2 / M3 / M4). Not compatible with Intel Macs, Linux, or Windows.
- Requires the mlx-whisper package, installed separately: pip install mlx-whisper (not a default autosearch dependency because it's macOS/ARM-only).
- First run downloads the Whisper model to ~/.cache/huggingface/hub/ (~3 GB for large-v3-turbo).
- Default model: mlx-community/whisper-large-v3-turbo. Override via the AUTOSEARCH_MLX_WHISPER_MODEL env var to any HuggingFace repo or local model path.
This tool does not produce a summary. It returns raw_txt and subtitles so the runtime AI can decide how to process, summarize, quote, or cite the transcript.
Quality Bar
- Evidence items have non-empty title and url.
- No crash on empty or malformed API response.
- Source channel field matches the channel name.