AGENTS-COLLECTION add-voice-transcription

Add voice message transcription to NanoClaw using OpenAI's Whisper API. Automatically transcribes WhatsApp voice notes so the agent can read and respond to them.

install

source · Clone the upstream repo

git clone https://github.com/mk-knight23/AGENTS-COLLECTION

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/mk-knight23/AGENTS-COLLECTION "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SKILLS/NANOCLAW/ADD-VOICE-TRANSCRIPTION" ~/.claude/skills/mk-knight23-agents-collection-add-voice-transcription && rm -rf "$T"

manifest: SKILLS/NANOCLAW/ADD-VOICE-TRANSCRIPTION/SKILL.md

source content

Add Voice Transcription

This skill adds automatic voice message transcription to NanoClaw's WhatsApp channel using OpenAI's Whisper API. When a voice note arrives, it is downloaded, transcribed, and delivered to the agent as `[Voice: <transcript>]`.
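The delivery format can be sketched as follows. This is an illustration, not the skill's actual code: the message shape (`audioMessage.ptt`) is an assumption about how a push-to-talk voice note is flagged, and `IncomingMessage`, `TranscriptionResult`, and `toAgentText` are hypothetical names. The two fallback strings are the placeholders described under Troubleshooting.

```typescript
// Hypothetical minimal message shape; the real WhatsApp message type is richer.
interface IncomingMessage {
  text?: string;
  audioMessage?: { ptt?: boolean };
}

// Outcome of one transcription attempt.
type TranscriptionResult =
  | { status: "ok"; transcript: string }
  | { status: "unavailable" } // e.g. no API key configured
  | { status: "failed" };     // e.g. API or network error

// Assumption: voice notes are audio messages flagged as push-to-talk.
function isVoiceMessage(msg: IncomingMessage): boolean {
  return msg.audioMessage?.ptt === true;
}

// Render the text the agent sees in place of the voice note.
function toAgentText(result: TranscriptionResult): string {
  switch (result.status) {
    case "ok":
      return `[Voice: ${result.transcript.trim()}]`;
    case "unavailable":
      return "[Voice Message - transcription unavailable]";
    case "failed":
      return "[Voice Message - transcription failed]";
  }
}
```

Keeping the formatting in one pure function makes the three outcomes easy to test without touching the network.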

Phase 1: Pre-flight

Check if already applied

Read `.nanoclaw/state.yaml`. If `voice-transcription` is in `applied_skills`, skip to Phase 3 (Configure). The code changes are already in place.
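A rough sketch of that check, assuming `state.yaml` has a top-level `applied_skills:` key whose entries mention each skill by name. The exact schema is not shown here, so the line-based scan below is a stand-in for a real YAML parser, and both function names are hypothetical.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Assumption: `applied_skills:` is a top-level key followed by indented or
// "-" prefixed entries (or an inline list) that name each applied skill.
function isSkillApplied(stateYaml: string, skill: string): boolean {
  const lines = stateYaml.split(/\r?\n/);
  const start = lines.findIndex((line) => /^applied_skills\s*:/.test(line));
  if (start === -1) return false;

  // Collect the key line plus every line that belongs to its block.
  const block: string[] = [lines[start]];
  for (const line of lines.slice(start + 1)) {
    if (line.trim() === "") continue;
    // A line that is neither indented nor a list item starts the next key.
    if (!/^(\s|-)/.test(line)) break;
    block.push(line);
  }

  // Match the skill name as a whole token, not as part of a longer name.
  return block.some((line) => line.split(/[\s:,"'\[\]]+/).includes(skill));
}

// Hypothetical wrapper: a missing state file means nothing has been applied.
function shouldSkipToConfigure(statePath = ".nanoclaw/state.yaml"): boolean {
  if (!existsSync(statePath)) return false;
  return isSkillApplied(readFileSync(statePath, "utf8"), "voice-transcription");
}
```

If `shouldSkipToConfigure()` returns `true`, continue at Phase 3; otherwise proceed with the rest of Phase 1.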

Ask the user

Use `AskUserQuestion` to collect information:

AskUserQuestion: Do you have an OpenAI API key for Whisper transcription?

If yes, collect it now. If no, direct them to create one at https://platform.openai.com/api-keys.

Phase 2: Apply Code Changes

Run the skills engine to apply this skill's code package.

Initialize skills system (if needed)

If the `.nanoclaw/` directory doesn't exist yet:

npx tsx scripts/apply-skill.ts --init

Apply the skill

npx tsx scripts/apply-skill.ts .claude/skills/add-voice-transcription

This deterministically:

- Adds `src/transcription.ts` (voice transcription module using OpenAI Whisper)
- Three-way merges voice handling into `src/channels/whatsapp.ts` (`isVoiceMessage` check, `transcribeAudioMessage` call)
- Three-way merges transcription tests into `src/channels/whatsapp.test.ts` (mock + 3 test cases)
- Installs the `openai` npm dependency
- Updates `.env.example` with `OPENAI_API_KEY`
- Records the application in `.nanoclaw/state.yaml`
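To give a feel for what a transcription module does, here is a self-contained sketch of a Whisper request. It is not the contents of `src/transcription.ts`: the skill installs the `openai` npm package, whereas this sketch swaps in Node's built-in `fetch`, `FormData`, and `Blob` (Node 18+) so that it has no dependencies. The function names, the `voice.ogg` file name, and the assumption that the note is Ogg audio are illustrative only.

```typescript
// Endpoint and model name of OpenAI's audio transcription API.
const WHISPER_URL = "https://api.openai.com/v1/audio/transcriptions";
const WHISPER_MODEL = "whisper-1";

interface WhisperRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: FormData };
}

// Build the multipart request for one voice note.
// Assumption: the note is Ogg audio, hence the file name and MIME type.
function buildWhisperRequest(audio: ArrayBuffer, apiKey: string): WhisperRequest {
  const body = new FormData();
  body.append("file", new Blob([audio], { type: "audio/ogg" }), "voice.ogg");
  body.append("model", WHISPER_MODEL);
  return {
    url: WHISPER_URL,
    init: { method: "POST", headers: { Authorization: `Bearer ${apiKey}` }, body },
  };
}

// Send the request and return the transcript, or null on any failure.
async function transcribeVoiceNote(audio: ArrayBuffer, apiKey: string): Promise<string | null> {
  const { url, init } = buildWhisperRequest(audio, apiKey);
  try {
    const res = await fetch(url, init);
    if (!res.ok) return null; // e.g. invalid key or rate limit
    const data = (await res.json()) as { text?: string };
    return data.text?.trim() || null;
  } catch {
    return null; // network error
  }
}
```

Separating request construction from sending keeps the part that can be unit-tested free of network calls, which is similar in spirit to the mock used by the merged tests.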

If the apply reports merge conflicts, read the intent files:

- `modify/src/channels/whatsapp.ts.intent.md`: what changed and invariants for whatsapp.ts
- `modify/src/channels/whatsapp.test.ts.intent.md`: what changed for whatsapp.test.ts

Validate code changes

npm test
npm run build

All tests must pass (including the 3 new voice transcription tests) and the build must be clean before proceeding.

Phase 3: Configure

Get OpenAI API key (if needed)

If the user doesn't have an API key:

I need you to create an OpenAI API key:

1. Go to https://platform.openai.com/api-keys
2. Click "Create new secret key"
3. Give it a name (e.g., "NanoClaw Transcription")
4. Copy the key (starts with `sk-`)

Cost: about $0.006 per minute of audio (about $0.003 per typical 30-second voice note)

Wait for the user to provide the key.
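The cost figures quoted to the user follow from simple arithmetic, sketched below. The per-minute rate is the one stated above; the assumption that billing scales linearly with duration and the monthly volume in the example are illustrative.

```typescript
// Rate quoted above, in USD per minute of audio.
const USD_PER_MINUTE = 0.006;

// Estimated cost of transcribing `seconds` of audio.
// Assumption: cost scales linearly with duration.
function estimateCostUsd(seconds: number): number {
  if (!Number.isFinite(seconds) || seconds < 0) {
    throw new RangeError("seconds must be a non-negative number");
  }
  return (seconds / 60) * USD_PER_MINUTE;
}

// Hypothetical volume: 20 thirty-second notes a day for 30 days.
const monthlyUsd = estimateCostUsd(30) * 20 * 30;
console.log(estimateCostUsd(30).toFixed(4)); // "0.0030"
console.log(monthlyUsd.toFixed(2));          // "1.80"
```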

Add to environment

Add to `.env`:

OPENAI_API_KEY=<their-key>

Sync to container environment:

mkdir -p data/env && cp .env data/env/env

The container reads its environment from `data/env/env`, not from `.env` directly.
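Because the two files can drift apart, a small check that the key is present in both and identical can save a debugging round. The sketch below is a hypothetical helper, not part of NanoClaw; it assumes both files use plain `KEY=VALUE` lines.

```typescript
// Parse KEY=VALUE lines, skipping blank lines and # comments.
function parseEnv(text: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=", or no "=" at all
    const key = line.slice(0, eq).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    vars.set(key, value);
  }
  return vars;
}

type SyncStatus = "ok" | "missing-in-env" | "missing-in-container" | "mismatch";

// Compare one variable between .env and the container's env file.
function checkKeySync(
  envText: string,
  containerEnvText: string,
  name = "OPENAI_API_KEY",
): SyncStatus {
  const local = parseEnv(envText).get(name);
  const container = parseEnv(containerEnvText).get(name);
  if (!local) return "missing-in-env";
  if (!container) return "missing-in-container";
  return local === container ? "ok" : "mismatch";
}
```

In practice the two arguments would be the contents of `.env` and `data/env/env`, for example read with `readFileSync(path, "utf8")` from `node:fs`. A `mismatch` or `missing-in-container` result means the `cp` step needs to be run again.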

Build and restart

npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw  # macOS
# Linux: systemctl --user restart nanoclaw
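If the restart is scripted rather than typed by hand, the platform branch can be made explicit. A minimal sketch, assuming the service label and unit name are exactly those in the commands above; the function name is hypothetical.

```typescript
// Service identifiers taken from the commands above.
const MACOS_LABEL = "com.nanoclaw";
const LINUX_UNIT = "nanoclaw";

// Return the restart command for a platform, or null if it is not covered.
function restartCommand(platform: string, uid: number): string | null {
  switch (platform) {
    case "darwin":
      return `launchctl kickstart -k gui/${uid}/${MACOS_LABEL}`;
    case "linux":
      return `systemctl --user restart ${LINUX_UNIT}`;
    default:
      return null;
  }
}
```

A caller would typically pass `process.platform` and the current user's numeric id (the value `id -u` prints), then treat a `null` result as "restart manually".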

Phase 4: Verify

Test with a voice note

Tell the user:

Send a voice note in any registered WhatsApp chat. The agent should receive it as `[Voice: <transcript>]` and respond to its content.

Check logs if needed

tail -f logs/nanoclaw.log | grep -i voice

Look for:

- `Transcribed voice message`: successful transcription with character count
- `OPENAI_API_KEY not set`: key missing from `.env`
- `OpenAI transcription failed`: API error (check key validity, billing)
- `Failed to download audio message`: media download issue
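The same triage can be applied to a saved log in one pass. The sketch below pairs each of the four messages listed above with its meaning and counts occurrences; the function names are hypothetical, and the full format of a log line is not assumed beyond it containing the message text.

```typescript
// Log messages from the list above, paired with what each one indicates.
const VOICE_LOG_HINTS: ReadonlyArray<{ needle: string; meaning: string }> = [
  { needle: "Transcribed voice message", meaning: "transcription succeeded" },
  { needle: "OPENAI_API_KEY not set", meaning: "key missing from .env" },
  { needle: "OpenAI transcription failed", meaning: "API error: check key validity and billing" },
  { needle: "Failed to download audio message", meaning: "media download issue" },
];

// Explain a single log line, or return null if it is not voice-related.
function explainLogLine(line: string): string | null {
  const hit = VOICE_LOG_HINTS.find(({ needle }) => line.includes(needle));
  return hit ? hit.meaning : null;
}

// Count how often each diagnosis appears in a whole log.
function summarizeLog(logText: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of logText.split(/\r?\n/)) {
    const meaning = explainLogLine(line);
    if (meaning !== null) counts.set(meaning, (counts.get(meaning) ?? 0) + 1);
  }
  return counts;
}
```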

Troubleshooting

Voice notes show "[Voice Message - transcription unavailable]"

Check `OPENAI_API_KEY` is set in `.env` AND synced to `data/env/env`

Verify key works:

curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 200

Check OpenAI billing — Whisper requires a funded account
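The curl check can also be run from Node. This sketch issues the same `GET /v1/models` request with the built-in `fetch` (Node 18+) and maps the HTTP status to a rough diagnosis. The type and function names are hypothetical, and the status mapping is a simplification: a 429 can mean either rate limiting or exhausted quota, and a successful model listing does not by itself prove the account is funded.

```typescript
type KeyCheck = "valid" | "invalid-key" | "rate-limited-or-no-quota" | "unknown";

// Map the HTTP status of GET /v1/models to a rough diagnosis.
function interpretStatus(status: number): KeyCheck {
  if (status >= 200 && status < 300) return "valid";
  if (status === 401) return "invalid-key";
  if (status === 429) return "rate-limited-or-no-quota";
  return "unknown";
}

// Same request as the curl command above.
async function checkOpenAiKey(apiKey: string): Promise<KeyCheck> {
  const res = await fetch("https://api.openai.com/v1/models", {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  return interpretStatus(res.status);
}
```

Calling `checkOpenAiKey(process.env.OPENAI_API_KEY ?? "")` from a script gives a one-word answer instead of the raw JSON that curl prints.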

Voice notes show "[Voice Message - transcription failed]"

Check logs for the specific error. Common causes:

Network timeout — transient, will work on next message
Invalid API key — regenerate at https://platform.openai.com/api-keys
Rate limiting — wait and retry

Agent doesn't respond to voice notes

Verify the chat is registered and the agent is running. Voice transcription only runs for registered groups.