Skills ai-provider-google-gemini-sdk
Official TypeScript SDK for Google Gemini — client setup, text generation, multimodal input, function calling, structured output, streaming, embeddings, context caching, and chat sessions
```bash
git clone https://github.com/agents-inc/skills
```

Or install directly into Claude's skills directory:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/agents-inc/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/src/skills/ai-provider-google-gemini-sdk" ~/.claude/skills/agents-inc-skills-ai-provider-google-gemini-sdk-518428 && rm -rf "$T"
```

Source: `src/skills/ai-provider-google-gemini-sdk/SKILL.md`

Google Gemini SDK Patterns
Quick Guide: Use the `@google/genai` package (the unified SDK, NOT the deprecated `@google/generative-ai`) for all Gemini API interactions. All operations flow through a central `GoogleGenAI` client with service accessors: `ai.models` for generation, `ai.chats` for multi-turn, `ai.files` for uploads, `ai.caches` for context caching. Use `responseMimeType: "application/json"` with `responseJsonSchema` for structured output. Access response text via `response.text` (property, not method). Streaming uses `generateContentStream`, which returns an async iterable -- iterate with `for await`.
<critical_requirements>
CRITICAL: Before Using This Skill
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, `import type`, named constants)
- You MUST use `@google/genai` (the new unified SDK) -- NOT the deprecated `@google/generative-ai` package
- You MUST access response text via `response.text` (a property) -- NOT `response.text()` (the old SDK used a method call)
- You MUST pass `model` as a string parameter in every API call -- there is no `getGenerativeModel()` step
- You MUST use `config` for all generation parameters (temperature, safetySettings, tools, systemInstruction) -- NOT top-level properties
- You MUST never hardcode API keys -- use environment variables via `process.env.GEMINI_API_KEY` or `GOOGLE_API_KEY`
</critical_requirements>
Auto-detection: Gemini, gemini, GoogleGenAI, @google/genai, ai.models.generateContent, generateContentStream, ai.chats, ai.files, ai.caches, gemini-2.5-flash, gemini-2.5-pro, gemini-2.0-flash, gemini-3-flash, gemini-embedding, GEMINI_API_KEY, GOOGLE_API_KEY, FunctionCallingConfigMode, createUserContent, createPartFromUri, responseMimeType, responseJsonSchema
When to use:
- Building applications that call Google Gemini models directly (Gemini 2.x, 2.5, 3.x)
- Processing multimodal input: images, video, audio, PDFs
- Implementing function calling / tool use with custom functions or built-in tools (Google Search, code execution)
- Extracting structured JSON data from LLM responses using response schemas
- Streaming text generation for user-facing output
- Creating embeddings for RAG pipelines or semantic search (text and multimodal)
- Caching large context (documents, code) to reduce cost and latency across multiple requests
- Multi-turn chat sessions with automatic history management
Key patterns covered:
- Client initialization and environment-based configuration
- Text generation with `ai.models.generateContent()`
- Streaming with `ai.models.generateContentStream()` and `for await`
- Multimodal input (inline base64, file upload, URIs)
- Function calling with `FunctionDeclaration` and manual tool loops
- Structured output with `responseMimeType` + `responseJsonSchema` + Zod
- Chat sessions with `ai.chats.create()` and `sendMessage()`
- Embeddings with `ai.models.embedContent()` (text and multimodal)
- Context caching with `ai.caches.create()`
- Safety settings per-request via `config.safetySettings`
When NOT to use:
- Multi-provider applications requiring provider switching -- use a unified provider SDK
- React-specific chat UI hooks (`useChat`) -- use a framework-integrated AI SDK
- When you need features unique to another provider's API -- use that provider's SDK directly
Examples Index
- Core: Setup & Configuration -- Client init, text generation, system instructions, error handling
- Multimodal Input -- Inline images, file upload, video, audio, PDF, `createPartFromUri`
- Streaming -- `generateContentStream`, `sendMessageStream`, abort patterns
- Function Calling / Tools -- `FunctionDeclaration`, `FunctionCallingConfigMode`, manual tool loop, built-in tools
- Structured Output -- JSON mode, Zod schemas, `responseJsonSchema`, enum extraction
- Chat Sessions -- `ai.chats.create()`, multi-turn, streaming chat, history
- Advanced: Embeddings, Caching & Safety -- Embeddings, context caching, safety settings, token counting
- Quick API Reference -- Model IDs, method signatures, config parameters, safety enums
<philosophy>
Philosophy
The `@google/genai` SDK is Google's unified client for the Gemini API and Vertex AI. It replaces the deprecated `@google/generative-ai` package with a cleaner, centralized architecture.
Core principles:
- Centralized client -- A single `GoogleGenAI` instance provides all API services via `ai.models`, `ai.chats`, `ai.files`, `ai.caches`. No scattered manager classes.
- Model-per-call -- Pass the model ID string in every API call rather than binding to a model instance. This simplifies multi-model usage.
- Config object pattern -- All generation parameters (`temperature`, `systemInstruction`, `tools`, `safetySettings`) go inside a `config` object, keeping the top-level call clean.
- Native multimodal -- Images, video, audio, and PDFs are first-class inputs via inline data or file upload. Gemini models handle all modalities natively.
- Response as property -- Access `response.text` as a property (not a method). Access `response.functionCalls` for tool calls.
When to use the Gemini SDK directly:
- You primarily use Google Gemini models
- You need multimodal input (images, video, audio, PDF) as a core feature
- You want built-in tools like Google Search and code execution
- You need context caching for large documents
- You want the simplest path to Gemini API features
When NOT to use:
- You need to switch between multiple providers -- use a unified SDK
- You want React-specific chat hooks -- use a framework-integrated AI SDK
- You need features unique to another provider's API -- use that provider's SDK directly
<patterns>
Core Patterns
Pattern 1: Client Setup
Initialize the `GoogleGenAI` client. It can auto-read `GOOGLE_API_KEY` from the environment.

```typescript
// lib/gemini.ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

export { ai };
```

```typescript
// Auto-reads GOOGLE_API_KEY from environment
const ai = new GoogleGenAI({});
```

Why good: Minimal setup, env var auto-detected, named export

```typescript
// BAD: Using the old deprecated SDK
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI("hardcoded-key"); // WRONG
const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash" });
```

Why bad: Old deprecated package, hardcoded API key, model binding step no longer needed
See: examples/core.md for Vertex AI setup, environment variables, error handling
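As a quick sketch of the Vertex AI setup mentioned above: the same client class accepts project/location options instead of an API key. This assumes Application Default Credentials are configured; the env var name `GOOGLE_CLOUD_PROJECT` and the file path are illustrative.

```typescript
// lib/gemini-vertex.ts -- hypothetical Vertex AI variant of the client module
import { GoogleGenAI } from "@google/genai";

// Uses Application Default Credentials instead of an API key
const ai = new GoogleGenAI({
  vertexai: true,
  project: process.env.GOOGLE_CLOUD_PROJECT,
  location: "us-central1",
});

export { ai };
```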
Pattern 2: Text Generation
Pass model and contents directly -- no `getGenerativeModel()` step.

```typescript
const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Explain TypeScript generics briefly.",
  config: {
    systemInstruction: "You are a concise coding tutor.",
    temperature: 0.3,
  },
});
console.log(response.text);
```

Why good: Model specified per-call, system instruction in config, `response.text` as property

```typescript
// BAD: Old SDK patterns that don't work
const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash" });
const result = await model.generateContent("Hello");
console.log(result.response.text()); // text() was a method in old SDK
```

Why bad: `getGenerativeModel()` doesn't exist in the new SDK, and `text` is a property, not a method
See: examples/core.md for system instructions, temperature, thinking config
Pattern 3: Streaming
Use `generateContentStream` and iterate with `for await`.

```typescript
const response = await ai.models.generateContentStream({
  model: "gemini-2.5-flash",
  contents: "Write a short story about a robot.",
});
for await (const chunk of response) {
  process.stdout.write(chunk.text ?? "");
}
```

Why good: Async iteration, progressive output for better UX, null-safe text access

```typescript
// BAD: Not consuming the stream
const stream = await ai.models.generateContentStream({
  model: "gemini-2.5-flash",
  contents: "Hello",
});
// Stream object returned but never iterated -- response lost
```

Why bad: The stream must be consumed via `for await`, otherwise the response is silently lost
See: examples/streaming.md for chat streaming, abort patterns
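One abort sketch before examples/streaming.md: breaking out of the `for await` loop stops client-side consumption, and recent SDK versions also document an `abortSignal` field in `config` -- treat that field as an assumption and verify it exists in your installed version.

```typescript
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 5_000); // illustrative 5s budget

const stream = await ai.models.generateContentStream({
  model: "gemini-2.5-flash",
  contents: "Write a very long essay.",
  config: { abortSignal: controller.signal }, // assumption: supported by your SDK version
});

let received = "";
for await (const chunk of stream) {
  received += chunk.text ?? "";
  if (received.length > 2_000) break; // client-side early stop always works
}
clearTimeout(timeout);
```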
Pattern 4: Multimodal Input
Gemini natively handles images, video, audio, and PDFs. Use inline base64 or file upload.
```typescript
import * as fs from "node:fs";

const imageBase64 = fs.readFileSync("photo.jpg", { encoding: "base64" });

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: [
    { text: "Describe this image in detail." },
    { inlineData: { mimeType: "image/jpeg", data: imageBase64 } },
  ],
});
console.log(response.text);
```

Why good: Inline base64 for small files, explicit MIME type, mixed text + media content

For larger files (over ~20 MB), use `ai.files.upload()` and reference via URI:

```typescript
import { createUserContent, createPartFromUri } from "@google/genai";

const uploadedFile = await ai.files.upload({
  file: "large-video.mp4",
  config: { mimeType: "video/mp4" },
});

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: createUserContent([
    "Summarize this video.",
    createPartFromUri(uploadedFile.uri, uploadedFile.mimeType),
  ]),
});
```
See: examples/multimodal.md for audio, PDF, video timestamps, file lifecycle
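As a quick taste of the other modalities covered in examples/multimodal.md: the shape is identical, only the MIME type changes. A minimal sketch assuming a local `report.pdf` (a hypothetical file) under the inline size limit:

```typescript
import * as fs from "node:fs";

const pdfBase64 = fs.readFileSync("report.pdf", { encoding: "base64" });

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: [
    { text: "List the key findings in this report." },
    { inlineData: { mimeType: "application/pdf", data: pdfBase64 } },
  ],
});
console.log(response.text);
```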
Pattern 5: Function Calling / Tool Use
Define function declarations with `parametersJsonSchema`, then handle tool calls in a loop.

```typescript
import { GoogleGenAI, FunctionCallingConfigMode } from "@google/genai";

const getWeatherDeclaration = {
  name: "get_weather",
  description: "Get current weather for a city",
  parametersJsonSchema: {
    type: "object",
    properties: {
      location: { type: "string", description: "City name" },
    },
    required: ["location"],
  },
};

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "What is the weather in Tokyo?",
  config: {
    tools: [{ functionDeclarations: [getWeatherDeclaration] }],
    toolConfig: {
      functionCallingConfig: { mode: FunctionCallingConfigMode.AUTO },
    },
  },
});

if (response.functionCalls && response.functionCalls.length > 0) {
  const call = response.functionCalls[0];
  console.log(`Call: ${call.name}`, call.args);
}
```
Why good: Typed declarations, explicit mode, args directly accessible
See: examples/tools.md for multi-turn tool loop, parallel calls, built-in Google Search
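To close the loop, execute the function locally and send the result back as a `functionResponse` part. A minimal sketch continuing from the Pattern 5 code above (`response` and `getWeatherDeclaration` are reused); `getWeather` is a hypothetical local implementation:

```typescript
// Hypothetical local implementation of the declared function
const getWeather = async (location: string) => ({ tempC: 18, summary: "Cloudy" });

const call = response.functionCalls![0];
const result = await getWeather(String(call.args?.location));

// Send back: original prompt, the model's functionCall turn, then our functionResponse
const followUp = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: [
    { role: "user", parts: [{ text: "What is the weather in Tokyo?" }] },
    response.candidates![0].content!, // the model's tool-call turn
    { role: "user", parts: [{ functionResponse: { name: call.name, response: result } }] },
  ],
  config: { tools: [{ functionDeclarations: [getWeatherDeclaration] }] },
});
console.log(followUp.text); // natural-language answer grounded in the tool result
```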
Pattern 6: Structured Output with Zod
Use `responseMimeType: "application/json"` with `responseJsonSchema` for guaranteed JSON output.

```typescript
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const EventSchema = z.object({
  name: z.string(),
  date: z.string(),
  participants: z.array(z.string()),
});

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Alice and Bob meet next Tuesday for lunch.",
  config: {
    responseMimeType: "application/json",
    responseJsonSchema: zodToJsonSchema(EventSchema),
  },
});

const event = EventSchema.parse(JSON.parse(response.text ?? "{}"));
```

Why good: Zod schema for type safety, `zodToJsonSchema` converts automatically, parse validates response
See: examples/structured-output.md for enum output, complex schemas, raw JSON schema
Pattern 7: Chat Sessions
Use `ai.chats.create()` for multi-turn with automatic history management.

```typescript
const chat = ai.chats.create({
  model: "gemini-2.5-flash",
  config: {
    systemInstruction: "You are a helpful coding assistant.",
  },
});

const r1 = await chat.sendMessage({ message: "What is TypeScript?" });
console.log(r1.text);

const r2 = await chat.sendMessage({ message: "How do generics work?" });
console.log(r2.text);
// Chat automatically maintains conversation history
```

Why good: Automatic history tracking, system instruction in config, simple `sendMessage` API
See: examples/chat.md for pre-loaded history, streaming chat, function calling in chat
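Streaming works inside chat sessions too: `sendMessageStream()` returns the same async iterable shape as `generateContentStream`. A minimal sketch reusing the `chat` from Pattern 7:

```typescript
const stream = await chat.sendMessageStream({ message: "Show a generics example." });
for await (const chunk of stream) {
  process.stdout.write(chunk.text ?? "");
}
// History is maintained once the stream has been fully consumed
```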
Pattern 8: Error Handling
Check `response.text` for null and inspect `finishReason` for safety blocks.

```typescript
const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: userPrompt,
});

// Check if content was blocked by safety filters
const candidate = response.candidates?.[0];
if (!candidate || candidate.finishReason === "SAFETY") {
  console.warn("Response blocked by safety filters");
  console.warn("Safety ratings:", candidate?.safetyRatings);
  return null;
}

// Check for truncation
if (candidate.finishReason === "MAX_TOKENS") {
  console.warn("Response was truncated -- consider increasing maxOutputTokens");
}

console.log(response.text);
```
Why good: Checks safety filter blocks, handles truncation, inspects finish reason
See: examples/core.md for try/catch error handling, HTTP error codes
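For transport-level failures (rate limits, auth, network), wrap calls in try/catch. A minimal retry sketch; the `status` field read below is an assumption about how your installed SDK version surfaces HTTP errors, so adapt it to what actually gets thrown:

```typescript
const MAX_RETRIES = 3;

async function generateWithRetry(prompt: string): Promise<string | null> {
  for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
    try {
      const response = await ai.models.generateContent({
        model: "gemini-2.5-flash",
        contents: prompt,
      });
      return response.text ?? null;
    } catch (error) {
      // Assumption: thrown errors carry an HTTP status code (429 = rate limited)
      const status = (error as { status?: number }).status;
      if (status === 429 || status === 503) {
        await new Promise((r) => setTimeout(r, 2 ** attempt * 1_000)); // exponential backoff
        continue;
      }
      throw error; // non-retryable: auth, bad request, etc.
    }
  }
  return null;
}
```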
</patterns>
<performance>
Performance Optimization
Model Selection for Cost/Speed
```
General purpose          -> gemini-2.5-flash (best price/performance balance)
Complex reasoning        -> gemini-2.5-pro (deep reasoning, 1M context)
Budget / high-volume     -> gemini-2.5-flash-lite (cheapest, fastest)
Latest frontier          -> gemini-3-flash-preview (most capable, preview)
Embeddings (text)        -> gemini-embedding-001 (text only, stable)
Embeddings (multimodal)  -> gemini-embedding-2-preview (text, image, video, audio, PDF)
```
Key Optimization Patterns
- Use context caching for repeated queries on the same large document -- reduces cost by up to 75%
- Batch embeddings by passing an array to `ai.models.embedContent()` -- one API call for multiple inputs (see the sketch after this list)
- Set `temperature: 0` for deterministic output (enables server-side caching)
- Use `gemini-2.5-flash-lite` for high-volume, latency-sensitive workloads
- Reduce embedding dimensions via `config.outputDimensionality` for faster similarity search
- Use streaming for user-facing responses to improve perceived latency
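A batch-embedding sketch for the bullet above; the response shape (`embeddings`, each with `values`) follows the SDK docs, but verify against your installed version:

```typescript
const response = await ai.models.embedContent({
  model: "gemini-embedding-001",
  contents: ["What is TypeScript?", "How do generics work?"], // one call, many inputs
  config: { outputDimensionality: 768 }, // smaller vectors -> cheaper similarity search
});

const vectors = (response.embeddings ?? []).map((e) => e.values ?? []);
console.log(vectors.length, vectors[0]?.length); // expect: 2 768
```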
</performance>
<decision_framework>
Decision Framework
Which Model to Choose
```
What is your task?
+-- General text generation     -> gemini-2.5-flash (best balance)
+-- Complex reasoning / code    -> gemini-2.5-pro (deep thinking, 1M context)
+-- High-volume / budget        -> gemini-2.5-flash-lite (cheapest)
+-- Cutting-edge / experimental -> gemini-3-flash-preview
+-- Text embeddings             -> gemini-embedding-001
+-- Multimodal embeddings       -> gemini-embedding-2-preview
```
File Input Method
```
How large is your file?
+-- Under 20 MB                   -> Inline base64 (inlineData)
+-- 20 MB - 2 GB                  -> ai.files.upload() + createPartFromUri()
+-- Need to reuse across requests -> ai.files.upload() (persists 48 hours)
+-- External URL (Gemini 3+)      -> createPartFromUri() with HTTP URL
```
Structured Output vs Free Text
```
Do you need a specific output shape?
+-- YES -> Set responseMimeType: "application/json"
|   +-- Have a Zod schema? -> zodToJsonSchema() into responseJsonSchema
|   +-- Raw JSON Schema?   -> Pass directly to responseJsonSchema
+-- NO -> Default text output (no config needed)
```
Chat vs Single-Turn
```
Is this a multi-turn conversation?
+-- YES -> ai.chats.create() with sendMessage()
|   +-- Need streaming?        -> chat.sendMessageStream()
|   +-- Need function calling? -> Pass tools in chat config
+-- NO -> ai.models.generateContent() (single call)
```
When to Use Context Caching
```
Are you sending the same large prefix repeatedly?
+-- YES -> Is it 1,024+ tokens (Flash) or 4,096+ (Pro)?
|   +-- YES -> Use ai.caches.create() to cache the prefix
|   +-- NO  -> Too small to cache, just send inline
+-- NO -> Standard generateContent() calls
```
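A sketch of the caching branch above, pairing `ai.models.countTokens()` (to confirm the minimum) with `ai.caches.create()`. The config field names (`ttl`, `cachedContent`) follow the SDK docs as I read them; treat this as a starting point rather than a definitive recipe:

```typescript
const LARGE_DOCUMENT = "..."; // e.g. a long contract; real content elided

// Confirm the cache minimum first (1,024 tokens for Flash models)
const { totalTokens } = await ai.models.countTokens({
  model: "gemini-2.5-flash",
  contents: LARGE_DOCUMENT,
});

if ((totalTokens ?? 0) >= 1_024) {
  const cache = await ai.caches.create({
    model: "gemini-2.5-flash",
    config: {
      contents: [{ role: "user", parts: [{ text: LARGE_DOCUMENT }] }],
      ttl: "3600s", // cache lifetime; billed while alive
    },
  });

  // Later calls reference the cached prefix instead of resending it
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: "Summarize section 3.",
    config: { cachedContent: cache.name },
  });
  console.log(response.text);
}
```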
</decision_framework>
<red_flags>
RED FLAGS
High Priority Issues:
- Using `@google/generative-ai` (the deprecated package) -- migrate to `@google/genai`
- Calling `response.text()` as a method -- it is a property (`response.text`) in the new SDK
- Using `getGenerativeModel()` -- the new SDK passes model as a string in each API call
- Hardcoding API keys instead of using environment variables
- Not checking `finishReason` for `"SAFETY"` or `"MAX_TOKENS"` -- safety blocks silently return no content
Medium Priority Issues:
- Putting `temperature`, `tools`, or `systemInstruction` at the top level instead of inside `config`
- Not setting `config.mimeType` when uploading files -- the API may reject or misinterpret the file
- Using `model.startChat()` (old SDK) -- use `ai.chats.create()` in the new SDK
- Not consuming `generateContentStream()` output -- the async iterable must be iterated
- Ignoring uploaded file lifecycle -- files expire after 48 hours
Common Mistakes:
- Confusing `responseMimeType` (for structured output) with `mimeType` (for file upload config)
- Using `responseSchema` (old SDK parameter name) instead of `responseJsonSchema` (new SDK)
- Mixing old SDK patterns (`model.generateContent(prompt)`) with new SDK (`ai.models.generateContent({ model, contents })`)
- Passing plain Zod schemas to `responseJsonSchema` -- you must convert with `zodToJsonSchema()` first
- Using `content` (singular) instead of `contents` (plural) in `generateContent()`
Gotchas & Edge Cases:
- `response.text` returns `null` (not empty string) when content is blocked by safety filters -- always null-check
- Uploaded files via `ai.files.upload()` expire after 48 hours -- there is no way to extend this; re-upload if needed
- Context caching has minimum token requirements: 1,024 tokens for Flash models, 4,096 for Pro models -- smaller content cannot be cached
- Inline data has a 100 MB payload limit (50 MB for PDFs) -- use file upload for larger content
- `ai.files.upload()` supports up to 2 GB per file, 20 GB per project
- Embedding dimensions default to 3,072 -- use `config.outputDimensionality` to reduce for cost/speed
- Gemini embedding models are NOT compatible with each other -- switching models requires re-embedding all data
- `FunctionCallingConfigMode.ANY` forces the model to always call a function -- use `AUTO` to let the model decide
- Chat history uses `role: "model"` (not `"assistant"`) for Gemini responses -- differs from other providers' convention
- Safety filter defaults changed: Gemini 2.5+ and 3.x models default to `OFF` (no blocking) unlike earlier models
- The `sendMessage()` method takes `{ message: string }` (an object with a `message` key), not a plain string
- When function calling returns multiple calls, process ALL of them before sending results back -- partial responses cause errors
- `zodToJsonSchema()` comes from the `zod-to-json-schema` package (separate from `zod`) -- requires `npm install zod-to-json-schema`
</red_flags>
<critical_reminders>
CRITICAL REMINDERS
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, `import type`, named constants)
- You MUST use `@google/genai` (the new unified SDK) -- NOT the deprecated `@google/generative-ai` package
- You MUST access response text via `response.text` (a property) -- NOT `response.text()` (the old SDK used a method call)
- You MUST pass `model` as a string parameter in every API call -- there is no `getGenerativeModel()` step
- You MUST use `config` for all generation parameters (temperature, safetySettings, tools, systemInstruction) -- NOT top-level properties
- You MUST never hardcode API keys -- use environment variables via `process.env.GEMINI_API_KEY` or `GOOGLE_API_KEY`
Failure to follow these rules will produce broken integrations due to old SDK patterns or silent safety filter blocks.
</critical_reminders>