Awesome-omni-skill gemini-api

Google Gemini API integration for building AI-powered applications. Use when working with Google's Gemini API, Python SDK (google-genai), TypeScript SDK (@google/genai), multimodal inputs (image, video, audio, PDF), thinking/reasoning features, streaming responses, structured outputs with JSON schemas, multi-turn chat, system instructions, image generation (Nano Banana), video generation (Veo), music generation (Lyria), embeddings, document/PDF processing, or any Gemini API integration task. Triggers on mentions of Gemini, Gemini 3, Gemini 2.5, Google AI, Nano Banana, Veo, Lyria, google-genai, or @google/genai SDK usage.

install

source · Clone the upstream repo

git clone https://github.com/diegosouzapw/awesome-omni-skill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/gemini-api-diskd-ai" ~/.claude/skills/diegosouzapw-awesome-omni-skill-gemini-api-b1b4f4 && rm -rf "$T"

manifest: skills/development/gemini-api-diskd-ai/SKILL.md

source content

Gemini API

Generate text from text, images, video, and audio using Google's Gemini API.

Models

Model	Code	I/O	Context (input/output tokens)	Thinking
Gemini 3 Pro	`gemini-3-pro-preview`	Text/Image/Video/Audio/PDF -> Text	1M/64K	Yes
Gemini 3 Flash	`gemini-3-flash-preview`	Text/Image/Video/Audio/PDF -> Text	1M/64K	Yes
Gemini 2.5 Pro	`gemini-2.5-pro`	Text/Image/Video/Audio/PDF -> Text	1M/65K	Yes
Gemini 2.5 Flash	`gemini-2.5-flash`	Text/Image/Video/Audio -> Text	1M/65K	Yes
Nano Banana	`gemini-2.5-flash-image`	Text/Image -> Image	-	No
Nano Banana Pro	`gemini-3-pro-image-preview`	Text/Image -> Image (up to 4K)	65K/32K	Yes
Veo 3.1	`veo-3.1-generate-preview`	Text/Image/Video -> Video+Audio	-	-
Veo 3	`veo-3-generate-preview`	Text/Image -> Video+Audio	-	-
Veo 2	`veo-2.0-generate-001`	Text/Image -> Video (silent)	-	-
Lyria RealTime	`lyria-realtime-exp`	Text -> Music (streaming)	-	-
Embeddings	`gemini-embedding-001`	Text -> Embeddings	2K	No

Free Tier: Flash models only (no free tier for `gemini-3-pro-preview` in the API).

Default Temperature: 1.0 (do not change for Gemini 3).

Pricing (USD per 1M tokens, input/output):

Gemini 3 Pro: $2/$12 (prompts <200k tokens), $4/$18 (prompts >200k tokens)
Gemini 3 Flash: $0.50/$3
Nano Banana Pro: $2 (text) / $0.134 (image)
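
As a rough illustration of how the listed prices translate into a per-request cost, the sketch below hard-codes the Gemini 3 Pro and Flash figures from the list above. It assumes each pair is an input/output price in USD per 1M tokens and that the 200k boundary is decided by the prompt (input) token count. `PRICES` and `estimate_cost` are hypothetical names, not part of any SDK.

```python
# Prices copied from the list above, as (input, output) USD per 1M tokens.
# Assumption: the 200k boundary is decided by the prompt (input) token count.
PRICES = {
    "gemini-3-pro-preview": {"standard": (2.00, 12.00), "long": (4.00, 18.00)},
    "gemini-3-flash-preview": {"standard": (0.50, 3.00)},
}
LONG_CONTEXT_THRESHOLD = 200_000


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return an approximate request cost in USD (hypothetical helper)."""
    tiers = PRICES[model]
    # Only models with a "long" tier change price above the threshold.
    use_long = input_tokens > LONG_CONTEXT_THRESHOLD and "long" in tiers
    in_price, out_price = tiers["long" if use_long else "standard"]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

For example, `estimate_cost("gemini-3-pro-preview", 300_000, 10_000)` applies the higher tier to both the input and output tokens of that request.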

Basic Text Generation

Python

from google import genai

client = genai.Client()
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="How does AI work?"
)
print(response.text)

JavaScript

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({});
const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "How does AI work?",
});
console.log(response.text);

REST

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"contents": [{"parts": [{"text": "How does AI work?"}]}]}'
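
When the SDK is not available, the same REST call can be made with the Python standard library. The sketch below builds the request shown in the curl command and pulls the text out of a response. The helper names (`build_generate_request`, `extract_text`, `generate`) are hypothetical, and the response layout (`candidates[0].content.parts[].text`) is an assumption about the shape of the JSON returned by `generateContent`.

```python
import json
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def build_generate_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl command."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-goog-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_json: dict) -> str:
    """Join the text parts of the first candidate (assumed response layout)."""
    parts = response_json["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


def generate(model: str, prompt: str, api_key: str) -> str:
    """Send the request; needs a valid API key and network access."""
    request = build_generate_request(model, prompt, api_key)
    with urllib.request.urlopen(request) as resp:
        return extract_text(json.load(resp))
```

Calling `generate("gemini-3-flash-preview", "How does AI work?", os.environ["GEMINI_API_KEY"])` would perform the request; the builder and parser are kept separate so they can be checked without network access.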

System Instructions

from google.genai import types

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    config=types.GenerateContentConfig(
        system_instruction="You are a helpful assistant."
    ),
    contents="Hello"
)

const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "Hello",
  config: { systemInstruction: "You are a helpful assistant." },
});

Streaming

for chunk in client.models.generate_content_stream(
    model="gemini-3-flash-preview",
    contents="Tell me a story"
):
    print(chunk.text, end="")

const response = await ai.models.generateContentStream({
  model: "gemini-3-flash-preview",
  contents: "Tell me a story",
});
for await (const chunk of response) {
  console.log(chunk.text);
}

Multi-turn Chat

chat = client.chats.create(model="gemini-3-flash-preview")
response = chat.send_message("I have 2 dogs.")
print(response.text)
response = chat.send_message("How many paws total?")
print(response.text)

const chat = ai.chats.create({ model: "gemini-3-flash-preview" });
const response = await chat.sendMessage({ message: "I have 2 dogs." });
console.log(response.text);

Multimodal (Image)

from PIL import Image

image = Image.open("/path/to/image.png")
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[image, "Describe this image"]
)

import { createUserContent, createPartFromUri } from "@google/genai";

const image = await ai.files.upload({ file: "/path/to/image.png" });
const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: [
    createUserContent([
      "Describe this image",
      createPartFromUri(image.uri, image.mimeType),
    ]),
  ],
});

Document Processing (PDF)

Process PDFs with native vision understanding (up to 1000 pages).

from google.genai import types
import pathlib

filepath = pathlib.Path('document.pdf')
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[
        types.Part.from_bytes(data=filepath.read_bytes(), mime_type='application/pdf'),
        "Summarize this document"
    ]
)

import * as fs from 'fs';

const response = await ai.models.generateContent({
    model: "gemini-3-flash-preview",
    contents: [
        { text: "Summarize this document" },
        {
            inlineData: {
                mimeType: 'application/pdf',
                data: Buffer.from(fs.readFileSync("document.pdf")).toString("base64")
            }
        }
    ]
});

For large PDFs, use Files API (stored 48 hours):

uploaded_file = client.files.upload(file=pathlib.Path('large.pdf'))
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[uploaded_file, "Summarize this document"]
)

See references/documents.md for Files API, multiple PDFs, and best practices.

Image Generation (Nano Banana)

Generate and edit images conversationally.

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents="Create a picture of a sunset over mountains",
)

for part in response.parts:
    if part.inline_data is not None:
        part.as_image().save("generated.png")

import * as fs from "fs";

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash-image",
  contents: "Create a picture of a sunset over mountains",
});

for (const part of response.candidates[0].content.parts) {
  if (part.inlineData) {
    const buffer = Buffer.from(part.inlineData.data, "base64");
    fs.writeFileSync("generated.png", buffer);
  }
}

Nano Banana Pro (`gemini-3-pro-image-preview`): 4K output, Google Search grounding, up to 14 reference images, conversational editing with thought signatures.

See references/image-generation.md for editing, multi-turn, and advanced features. See references/gemini-3.md for Gemini 3 image capabilities.

Video Generation (Veo)

Generate 8-second 720p, 1080p, or 4K videos with native audio using Veo.

import time
from google import genai

client = genai.Client()

operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    prompt="A cinematic shot of a majestic lion in the savannah at golden hour",
)

# Poll until complete (video generation is async)
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download the video
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("lion.mp4")

let operation = await ai.models.generateVideos({
    model: "veo-3.1-generate-preview",
    prompt: "A cinematic shot of a majestic lion in the savannah at golden hour",
});

while (!operation.done) {
    await new Promise(resolve => setTimeout(resolve, 10000));
    operation = await ai.operations.getVideosOperation({ operation });
}

await ai.files.download({
    file: operation.response.generatedVideos[0].video,
    downloadPath: "lion.mp4",
});

Veo 3.1 features: Portrait (9:16), video extension (up to 148s), 4K resolution, native audio with dialogue/SFX.

See references/veo.md for image-to-video, reference images, video extension, and prompting guide.
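
The polling loops above run until the operation reports completion, with no upper bound. A minimal sketch of the same loop with a timeout follows. `poll_until_done` is a hypothetical helper, and the 600-second default is an arbitrary assumption rather than a documented limit.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until_done(
    operation: T,
    refresh: Callable[[T], T],
    is_done: Callable[[T], bool],
    interval_s: float = 10.0,
    timeout_s: float = 600.0,  # assumption: pick a bound that suits your workload
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Refresh `operation` until `is_done` is true or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while not is_done(operation):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation not done after {timeout_s} seconds")
        sleep(interval_s)
        operation = refresh(operation)
    return operation
```

With the Python example above, this would presumably be called as `operation = poll_until_done(operation, client.operations.get, lambda op: op.done)`. The `sleep` parameter exists only so the loop can be exercised without waiting.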

Music Generation (Lyria RealTime)

Generate continuous instrumental music in real-time with dynamic steering.

import asyncio
from google import genai
from google.genai import types

client = genai.Client()

async def main():
    async with client.aio.live.music.connect(model='models/lyria-realtime-exp') as session:
        # Set prompts and config
        await session.set_weighted_prompts(
            prompts=[types.WeightedPrompt(text='minimal techno', weight=1.0)]
        )
        await session.set_music_generation_config(
            config=types.LiveMusicGenerationConfig(bpm=90, temperature=1.0)
        )

        # Start streaming
        await session.play()

        # Receive audio chunks
        async for message in session.receive():
            if message.server_content and message.server_content.audio_chunks:
                audio_data = message.server_content.audio_chunks[0].data
                # Process audio...

asyncio.run(main())

const session = await ai.live.music.connect({
    model: "models/lyria-realtime-exp",
    callbacks: {
        onmessage: (message) => {
            if (message.serverContent?.audioChunks) {
                for (const chunk of message.serverContent.audioChunks) {
                    const audioBuffer = Buffer.from(chunk.data, "base64");
                    // Process audio...
                }
            }
        },
    },
});

await session.setWeightedPrompts({
    weightedPrompts: [{ text: "minimal techno", weight: 1.0 }],
});

await session.setMusicGenerationConfig({
    musicGenerationConfig: { bpm: 90, temperature: 1.0 },
});

await session.play();

Output: 48kHz stereo 16-bit PCM. Instrumental only. Configurable BPM, scale, density, brightness.
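
Because the output is raw PCM, it needs a container before most players can open it. The sketch below wraps received chunks in a WAV file using the standard library, with the header values taken from the format stated above. It assumes each chunk is a `bytes` object of interleaved little-endian samples; `write_pcm_to_wav` is a hypothetical helper.

```python
import wave
from typing import Iterable

SAMPLE_RATE_HZ = 48_000  # 48kHz
CHANNELS = 2             # stereo
SAMPLE_WIDTH_BYTES = 2   # 16-bit samples


def write_pcm_to_wav(chunks: Iterable[bytes], path: str) -> int:
    """Write raw PCM chunks to a WAV file and return the number of frames written."""
    frames = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        for chunk in chunks:
            # Assumption: chunks are raw interleaved little-endian PCM bytes.
            wav.writeframes(chunk)
            frames += len(chunk) // (CHANNELS * SAMPLE_WIDTH_BYTES)
    return frames
```

In the Python receive loop above, the `audio_data` values could be collected into a list and passed to this function once the session ends.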

See references/lyria.md for steering music, configuration, and prompting guide.

Embeddings

Generate text embeddings for semantic similarity, search, and classification.

result = client.models.embed_content(
    model="gemini-embedding-001",
    contents="What is the meaning of life?"
)
print(result.embeddings)

const response = await ai.models.embedContent({
    model: 'gemini-embedding-001',
    contents: 'What is the meaning of life?',
});
console.log(response.embeddings);

Task types: `SEMANTIC_SIMILARITY`, `CLASSIFICATION`, `CLUSTERING`, `RETRIEVAL_DOCUMENT`, `RETRIEVAL_QUERY`

Output dimensions: 768, 1536, 3072 (default)

See references/embeddings.md for batch processing, task types, and normalization.
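
Semantic similarity between two embeddings is typically measured with cosine similarity. The following is a minimal pure-Python sketch that normalizes each vector and then takes the dot product. `l2_normalize` and `cosine_similarity` are hypothetical helpers, and the vectors are assumed to be plain lists of floats (for instance, whatever list of values the SDK exposes for each embedding).

```python
import math
from typing import Sequence


def l2_normalize(vector: Sequence[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))
```

Both vectors must come from the same model and output dimension; comparing a 768-dimension embedding with a 3072-dimension one raises `ValueError` here.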

Thinking (Gemini 3)

Control reasoning depth with `thinking_level`: `minimal` (Flash only), `low`, `medium` (Flash only), `high` (default).

from google.genai import types

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Solve this math problem...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_level="high")
    ),
)

import { ThinkingLevel } from "@google/genai";

const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "Solve this math problem...",
  config: { thinkingConfig: { thinkingLevel: ThinkingLevel.HIGH } },
});

Note: Cannot mix `thinking_level` with legacy `thinking_budget` (returns 400 error).

For Gemini 2.5, use `thinking_budget` (0-32768) instead. See references/thinking.md.

For complete Gemini 3 features (thought signatures, media resolution, etc.), see references/gemini-3.md.
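
The rules in this section (levels for Gemini 3, budgets for Gemini 2.5, never both) can be checked locally before a request is sent. The sketch below returns a plain dict fragment for `config`, assuming the SDK accepts a dict there as the Structured Outputs example does. `build_thinking_config` is a hypothetical helper, and matching models by name prefix is a simplification.

```python
from typing import Optional

LEVELS = ("minimal", "low", "medium", "high")
FLASH_ONLY_LEVELS = ("minimal", "medium")
MAX_BUDGET = 32768


def build_thinking_config(
    model: str, level: Optional[str] = None, budget: Optional[int] = None
) -> dict:
    """Return a `config` dict fragment for the given model (hypothetical helper)."""
    if level is not None and budget is not None:
        raise ValueError("thinking_level and thinking_budget cannot be combined")
    if level is None and budget is None:
        return {}  # leave the model's default behaviour untouched
    if model.startswith("gemini-3"):
        # Policy of this sketch: Gemini 3 requests only ever carry thinking_level.
        if budget is not None:
            raise ValueError("use thinking_level with Gemini 3 models")
        if level not in LEVELS:
            raise ValueError(f"unknown thinking_level: {level!r}")
        if level in FLASH_ONLY_LEVELS and "flash" not in model:
            raise ValueError(f"{level!r} is listed as Flash only")
        return {"thinking_config": {"thinking_level": level}}
    if model.startswith("gemini-2.5"):
        if level is not None:
            raise ValueError("use thinking_budget with Gemini 2.5 models")
        if not 0 <= budget <= MAX_BUDGET:
            raise ValueError(f"thinking_budget must be within 0-{MAX_BUDGET}")
        return {"thinking_config": {"thinking_budget": budget}}
    raise ValueError(f"no thinking rules known for model {model!r}")
```

The returned fragment can be merged into a larger config, for example `config={**build_thinking_config(model, level="low"), "response_mime_type": "application/json"}`.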

Structured Outputs

Generate JSON responses adhering to a schema.

from pydantic import BaseModel
from typing import List

class Recipe(BaseModel):
    name: str
    ingredients: List[str]

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Extract: chocolate chip cookies need flour, sugar, chips",
    config={
        "response_mime_type": "application/json",
        "response_json_schema": Recipe.model_json_schema(),
    },
)
recipe = Recipe.model_validate_json(response.text)

import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const recipeSchema = z.object({
  name: z.string(),
  ingredients: z.array(z.string()),
});

const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "Extract: chocolate chip cookies need flour, sugar, chips",
  config: {
    responseMimeType: "application/json",
    responseJsonSchema: zodToJsonSchema(recipeSchema),
  },
});

See references/structured-outputs.md for advanced patterns.

Built-in Tools (Gemini 3)

Available: Google Search, File Search, Code Execution, URL Context, Function Calling

Not supported: Google Maps grounding, Computer Use (use Gemini 2.5 for these)

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="What's the latest news on AI?",
    config={"tools": [{"google_search": {}}]},
)

const response = await ai.models.generateContent({
  model: "gemini-3-pro-preview",
  contents: "What's the latest news on AI?",
  config: { tools: [{ googleSearch: {} }] },
});

Structured outputs + tools: Gemini 3 supports combining JSON schemas with built-in tools (Google Search, URL Context, Code Execution). See references/gemini-3.md.

See references/tools.md for all tool patterns.

Function Calling

Connect models to external tools and APIs. The model determines when to call functions and provides parameters.

from google.genai import types

# Define function
get_weather = {
    "name": "get_weather",
    "description": "Get weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
        },
        "required": ["location"],
    },
}

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="What's the weather in Tokyo?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[get_weather])]
    ),
)

# Check for function call
if response.function_calls:
    fc = response.function_calls[0]
    print(f"Call {fc.name} with {fc.args}")

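import { Type } from "@google/genai";

// Function declaration mirroring the Python example
const getWeather = {
  name: "get_weather",
  description: "Get weather for a location",
  parameters: {
    type: Type.OBJECT,
    properties: { location: { type: Type.STRING, description: "City name" } },
    required: ["location"],
  },
};

const response = await ai.models.generateContent({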
  model: "gemini-3-flash-preview",
  contents: "What's the weather in Tokyo?",
  config: {
    tools: [{ functionDeclarations: [getWeather] }],
  },
});

if (response.functionCalls) {
  const { name, args } = response.functionCalls[0];
  // Execute function and send result back
}

Automatic function calling (Python): Pass functions directly as tools for automatic execution.

See references/function-calling.md for execution modes, compositional calling, multimodal responses, MCP integration, and best practices.
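
With manual function calling, the application has to map the returned function name and arguments onto real code. A minimal dispatch sketch follows. The `get_weather` implementation returns canned data purely for illustration, and `HANDLERS` and `dispatch_function_call` are hypothetical names.

```python
from typing import Any, Callable, Mapping


def get_weather(location: str) -> dict:
    """Hypothetical local implementation; returns canned data for illustration."""
    return {"location": location, "forecast": "sunny", "temperature_c": 22}


# Maps declared function names to the local callables that implement them.
HANDLERS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_function_call(name: str, args: Mapping[str, Any]) -> dict:
    """Run the handler for `name` and wrap the outcome in a JSON-friendly dict."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown function: {name}"}
    try:
        return {"result": handler(**args)}
    except TypeError as exc:
        # Covers missing or unexpected arguments; in this sketch it also
        # catches TypeErrors raised inside the handler itself.
        return {"error": f"bad arguments for {name}: {exc}"}
```

With the Python example above, this would be called as `dispatch_function_call(fc.name, fc.args)`, and the returned dict sent back to the model as the function response. Returning errors as data rather than raising lets the model see what went wrong and retry with different arguments.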

Quick Reference

Feature	Python	JavaScript
Generate	`generate_content()`	`generateContent()`
Stream	`generate_content_stream()`	`generateContentStream()`
Chat	`chats.create()`	`chats.create()`
Structured	`response_json_schema=`	`responseJsonSchema:`
Image Gen	`gemini-2.5-flash-image`	`gemini-2.5-flash-image`
Video Gen	`generate_videos()`	`generateVideos()`
Music Gen	`live.music.connect()`	`live.music.connect()`
Function Call	`function_declarations`	`functionDeclarations`
Embeddings	`embed_content()`	`embedContent()`
Files API	`files.upload()`	`files.upload()`

Gemini 3 Specific Features

For advanced Gemini 3 features, see references/gemini-3.md:

Thinking levels: Control reasoning depth (`minimal`, `low`, `medium`, `high`)
Media resolution: Fine-grained multimodal processing (`media_resolution_low` to `ultra_high`)
Thought signatures: Required for function calling and image editing context
Structured outputs + tools: Combine JSON schemas with Google Search, URL Context
Multimodal function responses: Return images in tool responses