Skilllibrary transcribe
Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.
install
source · Clone the upstream repo
git clone https://github.com/merceralex397-collab/skilllibrary
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/skilllibrary "$T" && mkdir -p ~/.claude/skills && cp -r "$T/11-ai-llm-runtime-and-integration/transcribe" ~/.claude/skills/merceralex397-collab-skilllibrary-transcribe && rm -rf "$T"
manifest: 11-ai-llm-runtime-and-integration/transcribe/SKILL.md
source content
Source: https://github.com/openai/skills/tree/main/skills/.curated/transcribe
Audio Transcribe
Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.
Workflow
- Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
- Verify OPENAI_API_KEY is set. If missing, ask the user to set it locally (do not ask them to paste the key).
- Run the bundled transcribe_diarize.py CLI with sensible defaults (fast text transcription).
- Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
- Save outputs under output/transcribe/ when working in this repo.
Decision rules
- Default to gpt-4o-mini-transcribe with --response-format text for fast transcription.
- If the user wants speaker labels or diarization, use --model gpt-4o-transcribe-diarize --response-format diarized_json.
- If audio is longer than ~30 seconds, keep --chunking-strategy auto.
- Prompting is not supported for gpt-4o-transcribe-diarize.
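The decision rules above can be sketched as a small shell helper. This is illustrative only: WANT_DIARIZATION is a hypothetical variable standing in for "the user asked for speaker labels", and the flags are the ones this skill documents.

```shell
# Sketch of the decision rules. WANT_DIARIZATION is a hypothetical
# stand-in for the user's diarization request.
WANT_DIARIZATION="${WANT_DIARIZATION:-no}"
if [ "$WANT_DIARIZATION" = "yes" ]; then
  MODEL_ARGS="--model gpt-4o-transcribe-diarize --response-format diarized_json"
else
  # gpt-4o-mini-transcribe is the fast default, so only the format flag is needed.
  MODEL_ARGS="--response-format text"
fi
# Clips longer than ~30 seconds keep automatic chunking.
MODEL_ARGS="$MODEL_ARGS --chunking-strategy auto"
echo "$MODEL_ARGS"
```

The resulting string is then appended to the python3 "$TRANSCRIBE_CLI" invocation shown in the quick start below.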
Output conventions
- Use output/transcribe/<job-id>/ for evaluation runs.
- Use --out-dir for multiple files to avoid overwriting.
Dependencies (install if missing)
Prefer uv for dependency management.
uv pip install openai
If uv is unavailable:
python3 -m pip install openai
Environment
- OPENAI_API_KEY must be set for live API calls.
- If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
- Never ask the user to paste the full key in chat.
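A minimal presence check that never prints the key's value might look like this; check_key is a helper name introduced here for illustration, not part of the skill.

```shell
# Sketch: fail fast when the key is missing, without ever echoing
# its value. check_key is a hypothetical helper name.
check_key() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set; export it in your shell first." >&2
    return 1
  fi
}
```

Running check_key before any live API call keeps the error local and avoids the key ever appearing in chat or logs.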
Skill path (set once)
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"
User-scoped skills install under $CODEX_HOME/skills (default: ~/.codex/skills).
CLI quick start
Single file (fast text default):
python3 "$TRANSCRIBE_CLI" \
  path/to/audio.wav \
  --out transcript.txt
Diarization with known speakers (up to 4):
python3 "$TRANSCRIBE_CLI" \
  meeting.m4a \
  --model gpt-4o-transcribe-diarize \
  --known-speaker "Alice=refs/alice.wav" \
  --known-speaker "Bob=refs/bob.wav" \
  --response-format diarized_json \
  --out-dir output/transcribe/meeting
Plain text output (explicit):
python3 "$TRANSCRIBE_CLI" \
  interview.mp3 \
  --response-format text \
  --out interview.txt
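For diarized runs, the JSON can be flattened into readable speaker-labelled lines. This sketch assumes the output carries a segments array whose entries have speaker and text fields, shown here on a tiny inline sample; confirm the actual schema in references/api.md before relying on it.

```shell
# Sketch: render diarized JSON as "Speaker: text" lines with jq.
# Assumes a top-level "segments" array with "speaker"/"text" fields;
# verify the real schema in references/api.md.
OUT="$(jq -r '.segments[] | "\(.speaker): \(.text)"' <<'JSON'
{"segments":[{"speaker":"Alice","text":"Morning."},{"speaker":"Bob","text":"Hi."}]}
JSON
)"
echo "$OUT"
```

Swap the inline sample for the file the CLI wrote under output/transcribe/ to post-process a real run.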
Reference map
references/api.md: supported formats, limits, response formats, and known-speaker notes.