Antigravity-awesome-skills pipecat-friday-agent

Build a low-latency, Iron Man-inspired tactical voice assistant (F.R.I.D.A.Y.) using Pipecat, Gemini, and OpenAI.

install

source · Clone the upstream repo

git clone https://github.com/sickn33/antigravity-awesome-skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/sickn33/antigravity-awesome-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/antigravity-awesome-skills-claude/skills/pipecat-friday-agent" ~/.claude/skills/sickn33-antigravity-awesome-skills-pipecat-friday-agent && rm -rf "$T"

manifest: plugins/antigravity-awesome-skills-claude/skills/pipecat-friday-agent/SKILL.md

Pipecat Friday Agent

Overview

This skill provides a blueprint for building F.R.I.D.A.Y. (Female Replacement Intelligent Digital Assistant Youth), a voice assistant with local microphone and speaker I/O, inspired by the tactical AI from the Iron Man films. It uses the Pipecat framework to orchestrate a low-latency pipeline:

STT: OpenAI Whisper (`whisper-1`) or `gpt-4o-transcribe`
LLM: Google Gemini 2.5 Flash (via a compatibility shim)
TTS: OpenAI TTS (`nova` voice)
Transport: Local Audio (Hardware Mic/Speakers)

When to Use This Skill

Use when you want to build a real-time, conversational voice agent.
Use when working with the Pipecat framework for pipeline-based AI.
Use when you need to integrate multiple providers (Google and OpenAI) into a single voice loop.
Use when building Iron Man-themed or tactical-themed voice applications.

How It Works

Step 1: Install Dependencies

You will need the Pipecat framework and its service providers installed:

pip install "pipecat-ai[openai,google,silero]" python-dotenv

Step 2: Configure Environment

Create a `.env` file with your API keys:

OPENAI_API_KEY=your_openai_key
GOOGLE_API_KEY=your_google_key
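
Before launching the agent, it can help to fail fast when a key is missing. The following is a minimal sketch using only the standard library; `missing_keys` is a hypothetical helper, not part of the skill's script. It assumes the `.env` values have already been loaded into the process environment, for example with `load_dotenv()` from python-dotenv.

```python
import os
from typing import Mapping

# Keys named in the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required keys that are absent or blank in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

Calling `missing_keys()` at startup and exiting with a clear message when the list is non-empty avoids a less obvious authentication error later in the pipeline.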

Step 3: Run the Agent

Execute the provided Python script to start the interface:

python scripts/friday_agent.py

Core Concepts

Pipeline Architecture

The agent follows a linear pipeline: `Mic -> VAD -> STT -> LLM -> TTS -> Speaker`. This allows for granular control over each stage, unlike end-to-end speech-to-speech models.
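
To make the "linear pipeline" idea concrete, here is a toy sketch in plain Python. It is not Pipecat's API: Pipecat streams frames asynchronously between processors, whereas this example passes a single string through stand-in stages. `run_pipeline` and `make_stage` are hypothetical names used only for illustration.

```python
from typing import Callable, Iterable

Stage = Callable[[str], str]


def run_pipeline(stages: Iterable[Stage], frame: str) -> str:
    """Pass `frame` through each stage in order and return the final value."""
    for stage in stages:
        frame = stage(frame)
    return frame


def make_stage(name: str) -> Stage:
    """Build a stand-in stage that tags the data so the order is visible."""
    return lambda frame: f"{frame} -> {name}"


stages = [make_stage(n) for n in ("VAD", "STT", "LLM", "TTS", "Speaker")]
result = run_pipeline(stages, "Mic")
print(result)
```

Because each stage is a separate unit, any one of them (for example the STT model) can be swapped or instrumented without touching the others, which is the control the text refers to.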

Google Compatibility Shim

Google's Gemini API uses a different message format from the OpenAI-style format that Pipecat's aggregators expect, so the script includes `GoogleSafeContext` and `GoogleSafeMessage` classes to bridge the gap.
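
The kind of translation such a shim performs can be sketched as follows. This is an illustration under stated assumptions, not the script's actual `GoogleSafeContext` code: `to_gemini` is a hypothetical function, it handles text-only messages, and it targets the Gemini request shape in which turns live in `contents` with `user`/`model` roles and system text is passed separately.

```python
from typing import Any


def to_gemini(messages: list[dict[str, str]]) -> dict[str, Any]:
    """Convert OpenAI-style chat messages to a Gemini-style request body."""
    system_parts: list[dict[str, str]] = []
    contents: list[dict[str, Any]] = []
    for message in messages:
        role, text = message["role"], message["content"]
        if role == "system":
            # Gemini takes system text separately from the turn list.
            system_parts.append({"text": text})
        else:
            # Gemini names the assistant role "model".
            gemini_role = "model" if role == "assistant" else "user"
            contents.append({"role": gemini_role, "parts": [{"text": text}]})
    body: dict[str, Any] = {"contents": contents}
    if system_parts:
        body["system_instruction"] = {"parts": system_parts}
    return body
```

The two differences it covers, the `assistant` to `model` rename and the `content` string becoming a `parts` list, are common sources of message-format validation errors when OpenAI-style dicts are sent to Gemini unchanged.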

Best Practices

✅ Use Silero VAD: It is robust for local hardware and prevents background noise from triggering the LLM.
✅ Concise Prompts: Tactical agents should give short, data-dense responses to minimize latency.
✅ Sample Rate Match: OpenAI TTS outputs at 24kHz; ensure your `audio_out_sample_rate` matches to avoid high-pitched or slowed audio.
❌ No Polite Fillers: Avoid "Hello, how can I help you today?" Instead, use "Systems nominal. Ready for commands."
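
The sample-rate point above comes down to simple arithmetic: if samples produced at one rate are played back at another without resampling, speed and pitch scale by the ratio of the two rates. A minimal sketch, with `playback_speed` as a hypothetical helper:

```python
def playback_speed(source_rate_hz: int, device_rate_hz: int) -> float:
    """Speed factor heard when audio produced at source_rate plays at device_rate."""
    if source_rate_hz <= 0 or device_rate_hz <= 0:
        raise ValueError("sample rates must be positive")
    return device_rate_hz / source_rate_hz


TTS_RATE_HZ = 24_000  # OpenAI TTS output rate stated above

for device_rate in (16_000, 24_000, 48_000):
    factor = playback_speed(TTS_RATE_HZ, device_rate)
    print(f"{device_rate} Hz output -> {factor:.2f}x speed")
```

A 48 kHz output plays the 24 kHz stream at twice the speed and an octave higher, while a 16 kHz output plays it at about two-thirds speed; only a matching 24 kHz output gives a factor of 1.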

Troubleshooting

Problem: Audio is choppy or delayed.
- Solution: Check your `OUTPUT_DEVICE` index. Run a script like `test_audio_output.py` to find the correct hardware index for your OS.
Problem: "Validation error" for message format.
- Solution: Ensure the `GoogleSafeContext` shim is correctly translating OpenAI-style dicts to the Gemini-style schema.
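
The contents of `test_audio_output.py` are not shown here, so the following is only a sketch of the device-filtering step such a script might perform. It assumes device entries shaped like the dicts returned by PyAudio's `get_device_info_by_index`; `output_devices` and the sample data are hypothetical.

```python
from typing import Any


def output_devices(devices: list[dict[str, Any]]) -> list[tuple[int, str]]:
    """Return (index, name) for every device that can play audio."""
    return [
        (d["index"], d["name"])
        for d in devices
        if d.get("maxOutputChannels", 0) > 0
    ]


# Hypothetical sample data shaped like PyAudio device-info dicts.
sample = [
    {"index": 0, "name": "Built-in Microphone", "maxInputChannels": 2, "maxOutputChannels": 0},
    {"index": 1, "name": "Built-in Output", "maxInputChannels": 0, "maxOutputChannels": 2},
    {"index": 2, "name": "USB Headset", "maxInputChannels": 1, "maxOutputChannels": 2},
]
for index, name in output_devices(sample):
    print(index, name)
```

With PyAudio installed, the real list could be built from `get_device_count()` and `get_device_info_by_index(i)` on a `pyaudio.PyAudio()` instance, and the index printed next to your speakers or headset is the candidate value for `OUTPUT_DEVICE`.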

Related Skills

`@voice-agents` - General principles of voice AI.
`@agent-tool-builder` - Add tools (Search, Lights, etc.) to your Friday agent.
`@llm-architect` - Optimizing the LLM layer.

Limitations

Use this skill only when the task clearly matches the scope described above.
Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.