Learn-skills.dev speech-to-text
Transcribe audio to text with Whisper models via inference.sh CLI. Models: Fast Whisper Large V3, Whisper V3 Large. Capabilities: transcription, translation, multi-language, timestamps. Use for: meeting transcription, subtitles, podcast transcripts, voice notes. Triggers: speech to text, transcription, whisper, audio to text, transcribe audio, voice to text, stt, automatic transcription, subtitles generation, transcribe meeting, audio transcription, whisper ai
install
source · Clone the upstream repo
git clone https://github.com/NeverSight/learn-skills.dev
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/1nfsh/s/speech-to-text" ~/.claude/skills/neversight-learn-skills-dev-speech-to-text-7fc9bd && rm -rf "$T"
manifest:
data/skills-md/1nfsh/s/speech-to-text/SKILL.mdsource content
Speech-to-Text
Transcribe audio to text via inference.sh CLI.

Quick Start
curl -fsSL https://cli.inference.sh | sh && infsh login infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://audio.mp3"}'
Available Models
| Model | App ID | Best For |
|---|---|---|
| Fast Whisper V3 | | Fast transcription |
| Whisper V3 Large | | Highest accuracy |
Examples
Basic Transcription
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://meeting.mp3"}'
With Timestamps
infsh app sample infsh/fast-whisper-large-v3 --save input.json # { # "audio_url": "https://podcast.mp3", # "timestamps": true # } infsh app run infsh/fast-whisper-large-v3 --input input.json
Translation (to English)
infsh app run infsh/whisper-v3-large --input '{ "audio_url": "https://french-audio.mp3", "task": "translate" }'
From Video
# Extract audio from video first infsh app run infsh/video-audio-extractor --input '{"video_url": "https://video.mp4"}' > audio.json # Transcribe the extracted audio infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "<audio-url>"}'
Workflow: Video Subtitles
# 1. Transcribe video audio infsh app run infsh/fast-whisper-large-v3 --input '{ "audio_url": "https://video.mp4", "timestamps": true }' > transcript.json # 2. Use transcript for captions infsh app run infsh/caption-videos --input '{ "video_url": "https://video.mp4", "captions": "<transcript-from-step-1>" }'
Supported Languages
Whisper supports 99+ languages including: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, and many more.
Use Cases
- Meetings: Transcribe recordings
- Podcasts: Generate transcripts
- Subtitles: Create captions for videos
- Voice Notes: Convert to searchable text
- Interviews: Transcription for research
- Accessibility: Make audio content accessible
Output Format
Returns JSON with:
: Full transcriptiontext
: Timestamped segments (if requested)segments
: Detected languagelanguage
Related Skills
# Full platform skill (all 150+ apps) npx skills add inference-sh/skills@inference-sh # Text-to-speech (reverse direction) npx skills add inference-sh/skills@text-to-speech # Video generation (add captions) npx skills add inference-sh/skills@ai-video-generation # AI avatars (lipsync with transcripts) npx skills add inference-sh/skills@ai-avatar-video
Browse all audio apps:
infsh app list --category audio
Documentation
- Running Apps - How to run apps via CLI
- Audio Transcription Example - Complete transcription guide
- Apps Overview - Understanding the app ecosystem