Skills gemini-voice-assistant

Voice-to-voice AI assistant using Gemini Live API. Speak to the AI and get spoken responses. Use when you want to have natural voice conversations with an AI assistant powered by Google's Gemini models.

install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/alimostafaradwan/gemini-voice-assistant" ~/.claude/skills/openclaw-skills-gemini-voice-assistant && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/alimostafaradwan/gemini-voice-assistant" ~/.openclaw/skills/openclaw-skills-gemini-voice-assistant && rm -rf "$T"
manifest: skills/alimostafaradwan/gemini-voice-assistant/SKILL.md

Gemini Voice Assistant

A voice-to-voice AI assistant powered by Google's Gemini Live API. Speak to the AI and it responds with natural-sounding voice.

Usage

Text Mode

cd ~/.openclaw/agents/kashif/skills/gemini-assistant && python3 handler.py "Your question or message"

Voice Mode

cd ~/.openclaw/agents/kashif/skills/gemini-assistant && python3 handler.py --audio /path/to/audio.ogg "optional context"
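The two invocations above can also be driven from Python. The following is a minimal sketch, not part of the skill itself: the helper names build_command and run_handler are hypothetical, and SKILL_DIR simply copies the path used in the commands above, which may differ from where you installed the skill.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional

# Assumption: this mirrors the path in the usage commands; adjust it to
# wherever the skill actually lives on your machine.
SKILL_DIR = Path.home() / ".openclaw/agents/kashif/skills/gemini-assistant"


def build_command(text: str = "", audio: Optional[str] = None) -> List[str]:
    """Build the argv list for handler.py in text or voice mode."""
    cmd = [sys.executable, "handler.py"]
    if audio is not None:
        # Voice mode: pass the audio file first, then the optional context.
        cmd += ["--audio", audio]
    if text:
        cmd.append(text)
    return cmd


def run_handler(text: str = "", audio: Optional[str] = None) -> str:
    """Run handler.py inside the skill directory and return its stdout."""
    result = subprocess.run(
        build_command(text, audio),
        cwd=SKILL_DIR,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Calling run_handler("Your question or message") corresponds to text mode, and run_handler("optional context", audio="/path/to/audio.ogg") corresponds to voice mode.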

Response Format

The handler returns a JSON response:

{
  "message": "[[audio_as_voice]]\nMEDIA:/tmp/gemini_voice_xxx.ogg",
  "text": "Text response from Gemini"
}
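
A caller will usually want the audio file path and the text separately. The sketch below shows one way to split the response, assuming the "message" field always carries the path on a line starting with MEDIA: as in the example above. The function name parse_response is hypothetical.

```python
import json
from typing import Optional, Tuple


def parse_response(raw: str) -> Tuple[Optional[str], str]:
    """Return (media_path, text) from the handler's JSON output.

    media_path is None when no "MEDIA:" line is present in "message".
    """
    data = json.loads(raw)
    media_path = None
    for line in data.get("message", "").splitlines():
        if line.startswith("MEDIA:"):
            media_path = line[len("MEDIA:"):].strip()
            break
    return media_path, data.get("text", "")
```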

Configuration

Set your Gemini API key:

export GEMINI_API_KEY="your-api-key-here"

Or create a .env file in the skill directory:

GEMINI_API_KEY=your-api-key-here
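
How handler.py resolves the key is not shown here. As an illustration only, the sketch below checks the environment variable first and then falls back to a simple KEY=value parse of the .env file. The function name load_api_key and the lookup order are assumptions.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(env_file: str = ".env") -> Optional[str]:
    """Return GEMINI_API_KEY from the environment, else from a .env file."""
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        return key

    path = Path(env_file)
    if not path.is_file():
        return None

    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "GEMINI_API_KEY":
            # Tolerate optional surrounding quotes around the value.
            return value.strip().strip('"').strip("'") or None
    return None
```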

Model Options

The default model is gemini-2.5-flash-native-audio-preview-12-2025, for audio support.

To use a different model, edit handler.py:

MODEL = "gemini-2.0-flash-exp"  # For text-only
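
If you would rather not edit the file each time you switch, one option is to choose the model at runtime. This is a sketch of that idea, not something handler.py is known to do: the GEMINI_MODEL environment variable and the pick_model helper are hypothetical, and only the two model names come from this document.

```python
import os

AUDIO_MODEL = "gemini-2.5-flash-native-audio-preview-12-2025"  # default, audio support
TEXT_MODEL = "gemini-2.0-flash-exp"  # text-only


def pick_model(needs_audio: bool = True) -> str:
    """Pick a model name, letting a hypothetical GEMINI_MODEL variable override it."""
    override = os.environ.get("GEMINI_MODEL")
    if override:
        return override
    return AUDIO_MODEL if needs_audio else TEXT_MODEL


MODEL = pick_model()
```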

Requirements

  • google-genai>=1.0.0
  • numpy>=1.24.0
  • soundfile>=0.12.0
  • librosa>=0.10.0 (for audio input)
  • FFmpeg (for audio conversion)
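
Before the first run it can help to confirm that everything in the list above is present. The sketch below reports what is missing. It assumes the import names google.genai, numpy, soundfile, and librosa and an ffmpeg executable on PATH, and it does not check version numbers. The function name missing_requirements is hypothetical.

```python
import importlib.util
import shutil
from typing import List, Sequence

PYTHON_MODULES = ("google.genai", "numpy", "soundfile", "librosa")
SYSTEM_TOOLS = ("ffmpeg",)


def missing_requirements(
    modules: Sequence[str] = PYTHON_MODULES,
    tools: Sequence[str] = SYSTEM_TOOLS,
) -> List[str]:
    """Return the names of modules and tools that cannot be found."""
    missing = []
    for module in modules:
        try:
            found = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            # find_spec raises if a parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(module)
    for tool in tools:
        if shutil.which(tool) is None:
            missing.append(tool)
    return missing


if __name__ == "__main__":
    gaps = missing_requirements()
    print("All requirements found." if not gaps else "Missing: " + ", ".join(gaps))
```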

Features

  • 🎙️ Voice input/output support
  • 💬 Text conversations
  • 🔧 Configurable system instructions
  • ⚡ Fast responses with Gemini Flash