Commonly-used-high-value-skills transcribe

Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.

install
source · Clone the upstream repo
git clone https://github.com/seaworld008/Commonly-used-high-value-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/seaworld008/Commonly-used-high-value-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/openclaw-skills/transcribe" ~/.claude/skills/seaworld008-commonly-used-high-value-skills-transcribe && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/seaworld008/Commonly-used-high-value-skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/openclaw-skills/transcribe" ~/.openclaw/skills/seaworld008-commonly-used-high-value-skills-transcribe && rm -rf "$T"
manifest: openclaw-skills/transcribe/SKILL.md
source content

Audio Transcribe

Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.

Workflow

  1. Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
  2. Verify `OPENAI_API_KEY` is set. If missing, ask the user to set it locally (do not ask them to paste the key).
  3. Run the bundled `transcribe_diarize.py` CLI with sensible defaults (fast text transcription).
  4. Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
  5. Save outputs under `output/transcribe/` when working in this repo.

Decision rules

  • Default to `gpt-4o-mini-transcribe` with `--response-format text` for fast transcription.
  • If the user wants speaker labels or diarization, use `--model gpt-4o-transcribe-diarize --response-format diarized_json`.
  • If audio is longer than ~30 seconds, keep `--chunking-strategy auto`.
  • Prompting is not supported for `gpt-4o-transcribe-diarize`.
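The first two decision rules above can be encoded as a tiny flag picker (a hypothetical helper; the model names and flags are the ones documented on this page):

```shell
# Map the user's request ("diarize" vs. anything else) to the flag
# set recommended by the decision rules above.
pick_flags() {
  case "${1:-text}" in
    diarize) printf '%s\n' "--model gpt-4o-transcribe-diarize --response-format diarized_json" ;;
    *)       printf '%s\n' "--model gpt-4o-mini-transcribe --response-format text" ;;
  esac
}
```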

Output conventions

  • Use `output/transcribe/<job-id>/` for evaluation runs.
  • Use `--out-dir` for multiple files to avoid overwriting.
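Both conventions above can be combined in one helper: a fresh directory per job id, which you can then pass to `--out-dir` so multi-file batches never overwrite each other (helper name and timestamp-based job id are illustrative assumptions):

```shell
# Create and print the per-job output directory used by this skill's
# output conventions. Defaults the job id to a timestamp.
job_dir() {
  job_id="${1:-$(date +%Y%m%d-%H%M%S)}"
  dir="output/transcribe/$job_id"
  mkdir -p "$dir"
  printf '%s\n' "$dir"
}
```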

Dependencies (install if missing)

Prefer `uv` for dependency management.

uv pip install openai

If `uv` is unavailable:

python3 -m pip install openai

Environment

  • `OPENAI_API_KEY` must be set for live API calls.
  • If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
  • Never ask the user to paste the full key in chat.

Skill path (set once)

export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"

User-scoped skills install under `$CODEX_HOME/skills` (default: `~/.codex/skills`).
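A quick sanity check (a sketch; the function name is illustrative) confirms the CLI actually landed at the path those exports expect before you start a job:

```shell
# Resolve the expected CLI location under $CODEX_HOME and warn if the
# script is not there (e.g. the skill was not installed yet).
cli_path() {
  printf '%s\n' "${CODEX_HOME:-$HOME/.codex}/skills/transcribe/scripts/transcribe_diarize.py"
}

[ -f "$(cli_path)" ] || echo "note: transcribe_diarize.py not found at $(cli_path)" >&2
```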

CLI quick start

Single file (fast text default):

python3 "$TRANSCRIBE_CLI" \
  path/to/audio.wav \
  --out transcript.txt

Diarization with known speakers (up to 4):

python3 "$TRANSCRIBE_CLI" \
  meeting.m4a \
  --model gpt-4o-transcribe-diarize \
  --known-speaker "Alice=refs/alice.wav" \
  --known-speaker "Bob=refs/bob.wav" \
  --response-format diarized_json \
  --out-dir output/transcribe/meeting
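Once the diarized run finishes, you may want plain "Speaker: text" lines for review. The sketch below assumes the diarized JSON carries a `segments` array with `speaker` and `text` fields; confirm the real schema in `references/api.md` before relying on it:

```shell
# Flatten diarized JSON (read from stdin) into one "Speaker: text"
# line per segment. Field names are assumptions -- see references/api.md.
to_lines() {
  python3 -c '
import json, sys

doc = json.load(sys.stdin)
for seg in doc.get("segments", []):
    spk = seg.get("speaker", "?")
    txt = seg.get("text", "")
    print(spk + ": " + txt)
'
}
```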

Plain text output (explicit):

python3 "$TRANSCRIBE_CLI" \
  interview.mp3 \
  --response-format text \
  --out interview.txt

Reference map

  • `references/api.md`: supported formats, limits, response formats, and known-speaker notes.