Claude-skill-registry google-gemini-file-search

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/google-gemini-file-search-dennislee928-smart-zone" ~/.claude/skills/majiayu000-claude-skill-registry-google-gemini-file-search-a7ba4b && rm -rf "$T"

manifest: skills/data/google-gemini-file-search-dennislee928-smart-zone/SKILL.md

Google Gemini File Search Setup

Overview

Google Gemini File Search is a fully managed RAG system. Upload documents (100+ formats: PDF, Word, Excel, code) and query with natural language—automatic chunking, embeddings, semantic search, and citations.

What This Skill Provides:

Complete @google/genai File Search API setup
8 documented errors with prevention strategies
Chunking best practices for optimal retrieval
Cost optimization ($0.15/1M tokens indexing, 3x storage multiplier)
Cloudflare Workers + Next.js integration templates

Prerequisites

1. Google AI API Key

Create an API key at https://aistudio.google.com/apikey

Free Tier Limits:

1 GB storage (total across all file search stores)
1,500 requests per day
1 million tokens per minute

Paid Tier Pricing:

Indexing: $0.15 per 1M input tokens (one-time)
Storage: Free (Tier 1: 10 GB, Tier 2: 100 GB, Tier 3: 1 TB)
Query-time embeddings: Free (retrieved context counts as input tokens)

2. Node.js Environment

Minimum Version: Node.js 18+ (v20+ recommended)

node --version  # Should be >=18.0.0

3. Install @google/genai SDK

npm install @google/genai
# or
pnpm add @google/genai
# or
yarn add @google/genai

Current Stable Version: 1.30.0+ (verify with

npm view @google/genai version

)

⚠️ Important: File Search API requires @google/genai v1.29.0 or later. Earlier versions do not support File Search. The API was added in v1.29.0 (November 5, 2025).

4. TypeScript Configuration (Optional but Recommended)

{
  "compilerOptions": {
    "target": "ES2020",
    "module": "ESNext",
    "moduleResolution": "node",
    "esModuleInterop": true,
    "strict": true,
    "skipLibCheck": true
  }
}

Common Errors Prevented

This skill prevents 12 common errors encountered when implementing File Search:

Error 1: Document Immutability

Symptom:

Error: Documents cannot be modified after indexing

Cause: Documents are immutable once indexed. There is no PATCH or UPDATE operation.

Prevention: Use the delete+re-upload pattern for updates:

// ❌ WRONG: Trying to update document (no such API)
await ai.fileSearchStores.documents.update({
  name: documentName,
  customMetadata: { version: '2.0' }
})

// ✅ CORRECT: Delete then re-upload
const docs = await ai.fileSearchStores.documents.list({
  parent: fileStore.name
})

const oldDoc = docs.documents.find(d => d.displayName === 'manual.pdf')
if (oldDoc) {
  await ai.fileSearchStores.documents.delete({
    name: oldDoc.name,
    force: true
  })
}

await ai.fileSearchStores.uploadToFileSearchStore({
  name: fileStore.name,
  file: fs.createReadStream('manual-v2.pdf'),
  config: { displayName: 'manual.pdf' }
})

Source: https://ai.google.dev/api/file-search/documents

Error 2: Storage Quota Exceeded

Symptom:

Error: Quota exceeded. Expected 1GB limit, but 3.2GB used.

Cause: Storage calculation includes input files + embeddings + metadata. Total storage ≈ 3x input size.

Prevention: Calculate storage before upload:

// ❌ WRONG: Assuming storage = file size
const fileSize = fs.statSync('data.pdf').size // 500 MB
// Expect 500 MB usage → WRONG

// ✅ CORRECT: Account for 3x multiplier
const fileSize = fs.statSync('data.pdf').size // 500 MB
const estimatedStorage = fileSize * 3 // 1.5 GB (embeddings + metadata)
console.log(`Estimated storage: ${estimatedStorage / 1e9} GB`)

// Check if within quota before upload
if (estimatedStorage > 1e9) {
  console.warn('⚠️ File may exceed free tier 1 GB limit')
}

Source: https://blog.google/technology/developers/file-search-gemini-api/

Error 3: Incorrect Chunking Configuration

Symptom: Poor retrieval quality, irrelevant results, or context cutoff mid-sentence.

Cause: Default chunking may not be optimal for your content type.

Prevention: Use recommended chunking strategy:

// ❌ WRONG: Using defaults without testing
await ai.fileSearchStores.uploadToFileSearchStore({
  name: fileStore.name,
  file: fs.createReadStream('docs.pdf')
  // Default chunking may be too large or too small
})

// ✅ CORRECT: Configure chunking for precision
await ai.fileSearchStores.uploadToFileSearchStore({
  name: fileStore.name,
  file: fs.createReadStream('docs.pdf'),
  config: {
    chunkingConfig: {
      whiteSpaceConfig: {
        maxTokensPerChunk: 500,  // Smaller chunks = more precise retrieval
        maxOverlapTokens: 50     // 10% overlap prevents context loss
      }
    }
  }
})

Chunking Guidelines:

Technical docs/code: 500 tokens/chunk, 50 overlap
Prose/articles: 800 tokens/chunk, 80 overlap
Legal/contracts: 300 tokens/chunk, 30 overlap (high precision)

Source: https://www.philschmid.de/gemini-file-search-javascript

Error 4: Metadata Limits Exceeded

Symptom:

Error: Maximum 20 custom metadata key-value pairs allowed

Cause: Each document can have at most 20 metadata fields.

Prevention: Design compact metadata schema:

// ❌ WRONG: Too many metadata fields
await ai.fileSearchStores.uploadToFileSearchStore({
  name: fileStore.name,
  file: fs.createReadStream('doc.pdf'),
  config: {
    customMetadata: {
      doc_type: 'manual',
      version: '1.0',
      author: 'John Doe',
      department: 'Engineering',
      created_date: '2025-01-01',
      // ... 18 more fields → Error!
    }
  }
})

// ✅ CORRECT: Use hierarchical keys or JSON strings
await ai.fileSearchStores.uploadToFileSearchStore({
  name: fileStore.name,
  file: fs.createReadStream('doc.pdf'),
  config: {
    customMetadata: {
      doc_type: 'manual',
      version: '1.0',
      author_dept: 'John Doe|Engineering',  // Combine related fields
      dates: JSON.stringify({                // Or use JSON for complex data
        created: '2025-01-01',
        updated: '2025-01-15'
      })
    }
  }
})

Source: https://ai.google.dev/api/file-search/documents

Error 5: Indexing Cost Surprises

Symptom: Unexpected bill for $375 after uploading 10 GB of documents.

Cause: Indexing costs are one-time but calculated per input token ($0.15/1M tokens).

Prevention: Estimate costs before indexing:

// ❌ WRONG: No cost estimation
await uploadAllDocuments(fileStore.name, './data') // 10 GB uploaded → $375 surprise

// ✅ CORRECT: Calculate costs upfront
const totalSize = getTotalDirectorySize('./data') // 10 GB
const estimatedTokens = (totalSize / 4) // Rough estimate: 1 token ≈ 4 bytes
const indexingCost = (estimatedTokens / 1e6) * 0.15

console.log(`Estimated indexing cost: $${indexingCost.toFixed(2)}`)
console.log(`Estimated storage: ${(totalSize * 3) / 1e9} GB`)

// Confirm before proceeding
const proceed = await confirm(`Proceed with indexing? Cost: $${indexingCost.toFixed(2)}`)
if (proceed) {
  await uploadAllDocuments(fileStore.name, './data')
}

Cost Examples:

1 GB text ≈ 250M tokens = $37.50 indexing
100 MB PDF ≈ 25M tokens = $3.75 indexing
10 MB code ≈ 2.5M tokens = $0.38 indexing

Source: https://ai.google.dev/pricing

Error 6: Not Polling Operation Status

Symptom: Query returns no results immediately after upload, or incomplete indexing.

Cause: File uploads are processed asynchronously. Must poll operation until

done: true

Prevention: Always poll operation status with timeout and fallback:

// ❌ WRONG: Assuming upload is instant
const operation = await ai.fileSearchStores.uploadToFileSearchStore({
  name: fileStore.name,
  file: fs.createReadStream('large.pdf')
})
// Immediately query → No results!

// ✅ CORRECT: Poll until indexing complete with timeout
const operation = await ai.fileSearchStores.uploadToFileSearchStore({
  name: fileStore.name,
  file: fs.createReadStream('large.pdf')
})

// Poll with timeout and fallback
const MAX_POLL_TIME = 60000 // 60 seconds
const POLL_INTERVAL = 1000
let elapsed = 0

while (!operation.done && elapsed < MAX_POLL_TIME) {
  await new Promise(resolve => setTimeout(resolve, POLL_INTERVAL))
  elapsed += POLL_INTERVAL

  try {
    operation = await ai.operations.get({ name: operation.name })
    console.log(`Indexing progress: ${operation.metadata?.progress || 'processing...'}`)
  } catch (error) {
    console.warn('Polling failed, assuming complete:', error)
    break
  }
}

if (operation.error) {
  throw new Error(`Indexing failed: ${operation.error.message}`)
}

// ⚠️ Warning: operations.get() can be unreliable for large files
// If timeout reached, verify document exists manually
if (elapsed >= MAX_POLL_TIME) {
  console.warn('Polling timeout - verifying document manually')
  const docs = await ai.fileSearchStores.documents.list({ parent: fileStore.name })
  const uploaded = docs.documents?.find(d => d.displayName === 'large.pdf')
  if (uploaded) {
    console.log('✅ Document found despite polling timeout')
  } else {
    throw new Error('Upload failed - document not found')
  }
}

console.log('✅ Indexing complete:', operation.response?.displayName)

Source: https://ai.google.dev/api/file-search/file-search-stores#uploadtofilesearchstore, GitHub Issue #1211

Error 7: Forgetting Force Delete

Symptom:

Error: Cannot delete store with documents. Set force=true.

Cause: Stores with documents require

force: true

to delete (prevents accidental deletion).

Prevention: Always use

force: true

when deleting non-empty stores:

// ❌ WRONG: Trying to delete store with documents
await ai.fileSearchStores.delete({
  name: fileStore.name
})
// Error: Cannot delete store with documents

// ✅ CORRECT: Use force delete
await ai.fileSearchStores.delete({
  name: fileStore.name,
  force: true  // Deletes store AND all documents
})

// Alternative: Delete documents first
const docs = await ai.fileSearchStores.documents.list({ parent: fileStore.name })
for (const doc of docs.documents || []) {
  await ai.fileSearchStores.documents.delete({
    name: doc.name,
    force: true
  })
}
await ai.fileSearchStores.delete({ name: fileStore.name })

Source: https://ai.google.dev/api/file-search/file-search-stores#delete

Error 8: Using Unsupported Models

Symptom:

Error: File Search is only supported for Gemini 3 Pro and Flash models

Cause: File Search requires Gemini 3 Pro or Gemini 3 Flash. Gemini 2.x and 1.5 models are not supported.

Prevention: Always use Gemini 3 models:

// ❌ WRONG: Using Gemini 1.5 model
const response = await ai.models.generateContent({
  model: 'gemini-1.5-pro',  // Not supported!
  contents: 'What is the installation procedure?',
  config: {
    tools: [{
      fileSearch: { fileSearchStoreNames: [fileStore.name] }
    }]
  }
})

// ✅ CORRECT: Use Gemini 3 models
const response = await ai.models.generateContent({
  model: 'gemini-3-flash',  // ✅ Supported (fast, cost-effective)
  // OR
  // model: 'gemini-3-pro',   // ✅ Supported (higher quality)
  contents: 'What is the installation procedure?',
  config: {
    tools: [{
      fileSearch: { fileSearchStoreNames: [fileStore.name] }
    }]
  }
})

Source: https://ai.google.dev/gemini-api/docs/file-search

Error 9: displayName Not Preserved for Blob Sources (Fixed v1.34.0+)

Symptom:

groundingChunks[0].title === null  // No document source shown

Cause: In @google/genai versions prior to v1.34.0, when uploading files as

Blob

objects (not file paths), the SDK dropped the

displayName

and

customMetadata

configuration fields.

Prevention:

// ✅ CORRECT: Upgrade to v1.34.0+ for automatic fix
npm install @google/genai@latest  // v1.34.0+

await ai.fileSearchStores.uploadToFileSearchStore({
  name: storeName,
  file: new Blob([arrayBuffer], { type: 'application/pdf' }),
  config: {
    displayName: 'Safety Manual.pdf',  // ✅ Now preserved
    customMetadata: { version: '1.0' }  // ✅ Now preserved
  }
})

// ⚠️ WORKAROUND for v1.33.0 and earlier: Use resumable upload
const uploadUrl = `https://generativelanguage.googleapis.com/upload/v1beta/${storeName}:uploadToFileSearchStore?key=${API_KEY}`

// Step 1: Initiate with displayName in body
const initResponse = await fetch(uploadUrl, {
  method: 'POST',
  headers: {
    'X-Goog-Upload-Protocol': 'resumable',
    'X-Goog-Upload-Command': 'start',
    'X-Goog-Upload-Header-Content-Length': numBytes.toString(),
    'X-Goog-Upload-Header-Content-Type': 'application/pdf',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    displayName: 'Safety Manual.pdf'  // ✅ Works with resumable upload
  })
})

// Step 2: Upload file bytes
const uploadUrl2 = initResponse.headers.get('X-Goog-Upload-URL')
await fetch(uploadUrl2, {
  method: 'PUT',
  headers: {
    'Content-Length': numBytes.toString(),
    'X-Goog-Upload-Offset': '0',
    'X-Goog-Upload-Command': 'upload, finalize',
    'Content-Type': 'application/pdf'
  },
  body: fileBytes
})

Source: GitHub Issue #1078

Error 10: Grounding Metadata Ignored with JSON Response Mode

Symptom:

response.candidates[0].groundingMetadata === undefined
// Even though fileSearch tool is configured

Cause: When using

responseMimeType: 'application/json'

for structured output, the API ignores the

fileSearch

tool and returns no grounding metadata, even with Gemini 3 models.

Prevention:

// ❌ WRONG: Structured output overrides grounding
const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'Summarize guidelines',
  config: {
    responseMimeType: 'application/json',  // Loses grounding
    tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }]
  }
})

// ✅ CORRECT: Two-step approach
// Step 1: Get grounded text response
const textResponse = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'Summarize guidelines',
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }]
  }
})

const grounding = textResponse.candidates[0].groundingMetadata

// Step 2: Convert to structured format in prompt
const jsonResponse = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: `Convert to JSON: ${textResponse.text}

Format:
{
  "summary": "...",
  "key_points": ["..."]
}`,
  config: {
    responseMimeType: 'application/json',
    responseSchema: {
      type: 'object',
      properties: {
        summary: { type: 'string' },
        key_points: { type: 'array', items: { type: 'string' } }
      }
    }
  }
})

// Combine results
const result = {
  data: JSON.parse(jsonResponse.text),
  sources: grounding.groundingChunks
}

Source: GitHub Issue #829

Error 11: Google Search and File Search Tools Are Mutually Exclusive

Symptom:

Error: "Search as a tool and file search tool are not supported together"
Status: INVALID_ARGUMENT

Cause: The Gemini API does not allow using

googleSearch

and

fileSearch

tools in the same request.

Prevention:

// ❌ WRONG: Combining search tools
const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'What are the latest industry guidelines?',
  config: {
    tools: [
      { googleSearch: {} },
      { fileSearch: { fileSearchStoreNames: [storeName] } }
    ]
  }
})

// ✅ CORRECT: Use separate specialist agents
async function searchWeb(query: string) {
  return ai.models.generateContent({
    model: 'gemini-3-flash',
    contents: query,
    config: { tools: [{ googleSearch: {} }] }
  })
}

async function searchDocuments(query: string) {
  return ai.models.generateContent({
    model: 'gemini-3-flash',
    contents: query,
    config: { tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }] }
  })
}

// Orchestrate based on query type
const needsWeb = query.includes('latest') || query.includes('current')
const response = needsWeb
  ? await searchWeb(query)
  : await searchDocuments(query)

Source: GitHub Issue #435, Google Codelabs

Error 12: Batch API Missing Response Metadata (Community-sourced)

Symptom: Cannot correlate batch responses with requests when using metadata field.

Cause: When using Batch API with

InlinedRequest

that includes a

metadata

field, the corresponding

InlinedResponse

does not return the metadata.

Prevention:

// ❌ WRONG: Expecting metadata in response
const batchRequest = {
  metadata: { key: 'my-request-id' },
  contents: [{ parts: [{ text: 'Question?' }], role: 'user' }],
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }]
  }
}

const batchResponse = await ai.batch.create({ requests: [batchRequest] })
console.log(batchResponse.responses[0].metadata)  // ❌ undefined

// ✅ CORRECT: Use array index to correlate
const requests = [
  { metadata: { id: 'req-1' }, contents: [...] },
  { metadata: { id: 'req-2' }, contents: [...] }
]

const responses = await ai.batch.create({ requests })

// Map by index (not ideal but works)
responses.responses.forEach((response, i) => {
  const requestMetadata = requests[i].metadata
  console.log(`Response for ${requestMetadata.id}:`, response)
})

Community Verification: Maintainer confirmed, internal bug filed.

Source: GitHub Issue #1191

Setup Instructions

Step 1: Initialize Client

import { GoogleGenAI } from '@google/genai'
import fs from 'fs'

// Initialize client with API key
const ai = new GoogleGenAI({
  apiKey: process.env.GOOGLE_API_KEY
})

// Verify API key is set
if (!process.env.GOOGLE_API_KEY) {
  throw new Error('GOOGLE_API_KEY environment variable is required')
}

Step 2: Create File Search Store

// Create a store (container for documents)
const fileStore = await ai.fileSearchStores.create({
  config: {
    displayName: 'my-knowledge-base',  // Human-readable name
    // Optional: Add store-level metadata
    customMetadata: {
      project: 'customer-support',
      environment: 'production'
    }
  }
})

console.log('Created store:', fileStore.name)
// Output: fileSearchStores/abc123xyz...

Finding Existing Stores:

// List all stores (paginated)
const stores = await ai.fileSearchStores.list({
  pageSize: 20  // Max 20 per page
})

// Find by display name
let targetStore = null
let pageToken = null

do {
  const page = await ai.fileSearchStores.list({ pageToken })
  targetStore = page.fileSearchStores.find(
    s => s.displayName === 'my-knowledge-base'
  )
  pageToken = page.nextPageToken
} while (!targetStore && pageToken)

if (targetStore) {
  console.log('Found existing store:', targetStore.name)
} else {
  console.log('Store not found, creating new one...')
}

Step 3: Upload Documents

Single File Upload:

const operation = await ai.fileSearchStores.uploadToFileSearchStore({
  name: fileStore.name,
  file: fs.createReadStream('./docs/manual.pdf'),
  config: {
    displayName: 'Installation Manual',
    customMetadata: {
      doc_type: 'manual',
      version: '1.0',
      language: 'en'
    },
    chunkingConfig: {
      whiteSpaceConfig: {
        maxTokensPerChunk: 500,
        maxOverlapTokens: 50
      }
    }
  }
})

// Poll until indexing complete
while (!operation.done) {
  await new Promise(resolve => setTimeout(resolve, 1000))
  operation = await ai.operations.get({ name: operation.name })
}

console.log('✅ Indexed:', operation.response.displayName)

Batch Upload (Concurrent):

const filePaths = [
  './docs/manual.pdf',
  './docs/faq.md',
  './docs/troubleshooting.docx'
]

// Upload all files concurrently
const uploadPromises = filePaths.map(filePath =>
  ai.fileSearchStores.uploadToFileSearchStore({
    name: fileStore.name,
    file: fs.createReadStream(filePath),
    config: {
      displayName: filePath.split('/').pop(),
      customMetadata: {
        doc_type: 'support',
        source_path: filePath
      },
      chunkingConfig: {
        whiteSpaceConfig: {
          maxTokensPerChunk: 500,
          maxOverlapTokens: 50
        }
      }
    }
  })
)

const operations = await Promise.all(uploadPromises)

// Poll all operations
for (const operation of operations) {
  let op = operation
  while (!op.done) {
    await new Promise(resolve => setTimeout(resolve, 1000))
    op = await ai.operations.get({ name: op.name })
  }
  console.log('✅ Indexed:', op.response.displayName)
}

Step 4: Query with File Search

Basic Query:

const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'What are the safety precautions for installation?',
  config: {
    tools: [{
      fileSearch: {
        fileSearchStoreNames: [fileStore.name]
      }
    }]
  }
})

console.log('Answer:', response.text)

// Access citations
const grounding = response.candidates[0].groundingMetadata
if (grounding?.groundingChunks) {
  console.log('\nSources:')
  grounding.groundingChunks.forEach((chunk, i) => {
    console.log(`${i + 1}. ${chunk.retrievedContext?.title || 'Unknown'}`)
    console.log(`   URI: ${chunk.retrievedContext?.uri || 'N/A'}`)
  })
}

Query with Metadata Filtering:

const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'How do I reset the device?',
  config: {
    tools: [{
      fileSearch: {
        fileSearchStoreNames: [fileStore.name],
        // Filter to only search troubleshooting docs in English, version 1.0
        metadataFilter: 'doc_type="troubleshooting" AND language="en" AND version="1.0"'
      }
    }]
  }
})

console.log('Answer:', response.text)

Metadata Filter Syntax:

AND:
```
key1="value1" AND key2="value2"
```
OR:
```
key1="value1" OR key1="value2"
```
Parentheses:
```
(key1="a" OR key1="b") AND key2="c"
```

Step 5: List and Manage Documents

// List all documents in store
const docs = await ai.fileSearchStores.documents.list({
  parent: fileStore.name,
  pageSize: 20
})

console.log(`Total documents: ${docs.documents?.length || 0}`)

docs.documents?.forEach(doc => {
  console.log(`- ${doc.displayName} (${doc.name})`)
  console.log(`  Metadata:`, doc.customMetadata)
})

// Get specific document details
const docDetails = await ai.fileSearchStores.documents.get({
  name: docs.documents[0].name
})

console.log('Document details:', docDetails)

// Delete document
await ai.fileSearchStores.documents.delete({
  name: docs.documents[0].name,
  force: true
})

Step 6: Cleanup

// Delete entire store (force deletes all documents)
await ai.fileSearchStores.delete({
  name: fileStore.name,
  force: true
})

console.log('✅ Store deleted')

Recommended Chunking Strategies

Chunking configuration significantly impacts retrieval quality. Adjust based on content type:

Technical Documentation

chunkingConfig: {
  whiteSpaceConfig: {
    maxTokensPerChunk: 500,   // Smaller chunks for precise code/API lookup
    maxOverlapTokens: 50      // 10% overlap
  }
}

Best for: API docs, SDK references, code examples, configuration guides

Prose and Articles

chunkingConfig: {
  whiteSpaceConfig: {
    maxTokensPerChunk: 800,   // Larger chunks preserve narrative flow
    maxOverlapTokens: 80      // 10% overlap
  }
}

Best for: Blog posts, news articles, product descriptions, marketing materials

Legal and Contracts

chunkingConfig: {
  whiteSpaceConfig: {
    maxTokensPerChunk: 300,   // Very small chunks for high precision
    maxOverlapTokens: 30      // 10% overlap
  }
}

Best for: Legal documents, contracts, regulations, compliance docs

FAQ and Support

chunkingConfig: {
  whiteSpaceConfig: {
    maxTokensPerChunk: 400,   // Medium chunks (1-2 Q&A pairs)
    maxOverlapTokens: 40      // 10% overlap
  }
}

Best for: FAQs, troubleshooting guides, how-to articles

General Rule: Maintain 10% overlap (overlap = chunk size / 10) to prevent context loss at chunk boundaries.

Metadata Best Practices

Design metadata schema for filtering and organization:

Example: Customer Support Knowledge Base

customMetadata: {
  doc_type: 'faq' | 'manual' | 'troubleshooting' | 'guide',
  product: 'widget-pro' | 'widget-lite',
  version: '1.0' | '2.0',
  language: 'en' | 'es' | 'fr',
  category: 'installation' | 'configuration' | 'maintenance',
  priority: 'critical' | 'normal' | 'low',
  last_updated: '2025-01-15',
  author: 'support-team'
}

Query Example:

metadataFilter: 'product="widget-pro" AND (doc_type="troubleshooting" OR doc_type="faq") AND language="en"'

Example: Legal Document Repository

customMetadata: {
  doc_type: 'contract' | 'regulation' | 'case-law' | 'policy',
  jurisdiction: 'US' | 'EU' | 'UK',
  practice_area: 'employment' | 'corporate' | 'ip' | 'tax',
  effective_date: '2025-01-01',
  status: 'active' | 'archived',
  confidentiality: 'public' | 'internal' | 'privileged'
}

Example: Code Documentation

customMetadata: {
  doc_type: 'api-reference' | 'tutorial' | 'example' | 'changelog',
  language: 'javascript' | 'python' | 'java' | 'go',
  framework: 'react' | 'nextjs' | 'express' | 'fastapi',
  version: '1.2.0',
  difficulty: 'beginner' | 'intermediate' | 'advanced'
}

Tips:

Use consistent key naming (
```
snake_case
```
or
```
camelCase
```
)
Limit to most important filterable fields (20 max)
Use enums/constants for values (easier filtering)
Include version and date fields for time-based filtering

Cost Optimization

1. Deduplicate Before Upload

// Track uploaded file hashes to avoid duplicates
const uploadedHashes = new Set<string>()

async function uploadWithDeduplication(filePath: string) {
  const fileHash = await getFileHash(filePath)

  if (uploadedHashes.has(fileHash)) {
    console.log(`Skipping duplicate: ${filePath}`)
    return
  }

  await ai.fileSearchStores.uploadToFileSearchStore({
    name: fileStore.name,
    file: fs.createReadStream(filePath)
  })

  uploadedHashes.add(fileHash)
}

2. Compress Large Files

// Convert images to text before indexing (OCR)
// Compress PDFs (remove images, use text-only)
// Use markdown instead of Word docs (smaller size)

3. Use Metadata Filtering to Reduce Query Scope

// ❌ EXPENSIVE: Search all 10GB of documents
const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'Reset procedure?',
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [fileStore.name] } }]
  }
})

// ✅ CHEAPER: Filter to only troubleshooting docs (subset)
const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'Reset procedure?',
  config: {
    tools: [{
      fileSearch: {
        fileSearchStoreNames: [fileStore.name],
        metadataFilter: 'doc_type="troubleshooting"'  // Reduces search scope
      }
    }]
  }
})

4. Choose Flash Over Pro for Cost Savings

// Gemini 3 Flash is 10x cheaper than Pro for queries
// Use Flash unless you need Pro's advanced reasoning

// Development/testing: Use Flash
model: 'gemini-3-flash'

// Production (high-stakes answers): Use Pro
model: 'gemini-3-pro'

5. Monitor Storage Usage

// List stores and estimate storage
const stores = await ai.fileSearchStores.list()

for (const store of stores.fileSearchStores || []) {
  const docs = await ai.fileSearchStores.documents.list({
    parent: store.name
  })

  console.log(`Store: ${store.displayName}`)
  console.log(`Documents: ${docs.documents?.length || 0}`)
  // Estimate storage (3x input size)
  console.log(`Estimated storage: ~${(docs.documents?.length || 0) * 10} MB`)
}

Testing & Verification

Verify Store Creation

const store = await ai.fileSearchStores.get({
  name: fileStore.name
})

console.assert(store.displayName === 'my-knowledge-base', 'Store name mismatch')
console.log('✅ Store created successfully')

Verify Document Indexing

const docs = await ai.fileSearchStores.documents.list({
  parent: fileStore.name
})

console.assert(docs.documents?.length > 0, 'No documents indexed')
console.log(`✅ ${docs.documents?.length} documents indexed`)

Verify Query Functionality

const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'What is this knowledge base about?',
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [fileStore.name] } }]
  }
})

console.assert(response.text.length > 0, 'Empty response')
console.log('✅ Query successful:', response.text.substring(0, 100) + '...')

Verify Citations

const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'Provide a specific answer with citations.',
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [fileStore.name] } }]
  }
})

const grounding = response.candidates[0].groundingMetadata
console.assert(
  grounding?.groundingChunks?.length > 0,
  'No grounding/citations returned'
)
console.log(`✅ ${grounding?.groundingChunks?.length} citations returned`)

Integration Examples

Streaming Support

File Search supports streaming responses with

generateContentStream()

// ✅ Streaming works with File Search (v1.34.0+)
const stream = await ai.models.generateContentStream({
  model: 'gemini-3-flash',
  contents: 'Summarize the document',
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }]
  }
})

for await (const chunk of stream) {
  process.stdout.write(chunk.text)
}

// Access grounding after stream completes
const grounding = stream.candidates[0].groundingMetadata

Note: Early SDK versions (pre-v1.34.0) may have had streaming issues. Use v1.34.0+ for reliable streaming support.

Source: GitHub Issue #1221

Working Templates

This skill includes 3 working templates in the

templates/

directory:

Template 1: basic-node-rag

Minimal Node.js/TypeScript example demonstrating:

Create file search store
Upload multiple documents
Query with natural language
Display citations

Use when: Learning File Search, prototyping, simple CLI tools

Run:

cd templates/basic-node-rag
npm install
npm run dev

Template 2: cloudflare-worker-rag

Cloudflare Workers integration showing:

Edge API for document upload
Edge API for semantic search
Integration with R2 for document storage
Hybrid architecture (Gemini File Search + Cloudflare edge)

Use when: Building global edge applications, integrating with Cloudflare stack

Deploy:

cd templates/cloudflare-worker-rag
npm install
npx wrangler deploy

Template 3: nextjs-docs-search

Full-stack Next.js application featuring:

Document upload UI with drag-and-drop
Real-time search interface
Citation rendering with source links
Metadata filtering UI

Use when: Building production documentation sites, knowledge bases

Run:

cd templates/nextjs-docs-search
npm install
npm run dev

References

Official Documentation:

File Search Overview: https://ai.google.dev/gemini-api/docs/file-search
API Reference (Stores): https://ai.google.dev/api/file-search/file-search-stores
API Reference (Documents): https://ai.google.dev/api/file-search/documents
Blog Announcement: https://blog.google/technology/developers/file-search-gemini-api/
Pricing: https://ai.google.dev/pricing

Tutorials:

JavaScript/TypeScript Guide: https://www.philschmid.de/gemini-file-search-javascript
SDK Repository: https://github.com/googleapis/js-genai

Bundled Resources in This Skill:

```
references/api-reference.md
```
- Complete API documentation
```
references/chunking-best-practices.md
```
- Detailed chunking strategies
```
references/pricing-calculator.md
```
- Cost estimation guide
```
references/migration-from-openai.md
```
- Migration guide from OpenAI Files API
```
scripts/create-store.ts
```
- CLI tool to create stores
```
scripts/upload-batch.ts
```
- Batch upload script
```
scripts/query-store.ts
```
- Interactive query tool
```
scripts/cleanup.ts
```
- Cleanup script

Working Templates:

```
templates/basic-node-rag/
```
- Minimal Node.js example
```
templates/cloudflare-worker-rag/
```
- Edge deployment example
```
templates/nextjs-docs-search/
```
- Full-stack Next.js app

Skill Version: 1.1.0 Last Verified: 2026-01-21 Package Version: @google/genai ^1.38.0 (minimum 1.29.0 required) Token Savings: ~67% Errors Prevented: 12 Changes: Added 4 new errors from community research (displayName Blob issue, grounding with JSON mode, tool conflicts, batch API metadata), enhanced polling timeout pattern with fallback verification, added streaming support note