Claude-skill-registry gemini-token-optimization
Optimize token usage when delegating to Gemini CLI. Covers token caching, batch queries, model selection (Flash vs Pro), and cost tracking. Use when planning bulk Gemini operations.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/gemini-token-optimization" ~/.claude/skills/majiayu000-claude-skill-registry-gemini-token-optimization && rm -rf "$T"
skills/data/gemini-token-optimization/SKILL.mdGemini Token Optimization
🚨 MANDATORY: Invoke gemini-cli-docs First
STOP - Before providing ANY response about Gemini token usage:
- INVOKE
skillgemini-cli-docs- QUERY for the specific token or pricing topic
- BASE all responses EXCLUSIVELY on official documentation loaded
Overview
Skill for optimizing cost and token usage when delegating to Gemini CLI. Essential for efficient bulk operations and cost-conscious workflows.
When to Use This Skill
Keywords: token usage, cost optimization, gemini cost, model selection, flash vs pro, caching, batch queries, reduce tokens
Use this skill when:
- Planning bulk Gemini operations
- Optimizing costs for large-scale analysis
- Choosing between Flash and Pro models
- Understanding token caching benefits
- Tracking usage across sessions
Token Caching
Gemini CLI automatically caches context to reduce costs by reusing previously processed content.
Availability
| Auth Method | Caching Available |
|---|---|
| API key (Gemini API) | YES |
| Vertex AI | YES |
| OAuth (personal/enterprise) | NO |
How It Works
- System instructions and repeated context are cached
- Cached tokens don't count toward billing
- View savings via
command or JSON output/stats
Maximizing Cache Hits
- Use consistent system prompts - Same prefix increases cache reuse
- Batch similar queries - Group related analysis together
- Reuse context files - Same files in same order
Monitoring Cache Usage
result=$(gemini "query" --output-format json) total=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0') cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0') billable=$((total - cached)) savings=$((cached * 100 / total)) echo "Total: $total tokens" echo "Cached: $cached tokens ($savings% savings)" echo "Billable: $billable tokens"
Model Selection
Model Comparison
| Model | Context Window | Speed | Cost | Quality |
|---|---|---|---|---|
| gemini-2.5-flash | Large | Fast | Lower | Good |
| gemini-2.5-pro | Very large | Slower | Higher | Best |
Selection Criteria
Use Flash (
) when:-m gemini-2.5-flash
- Processing large files (bulk analysis)
- Simple extraction tasks
- Cost is a primary concern
- Speed is critical
- Task is straightforward
Use Pro (
) when:-m gemini-2.5-pro
- Complex reasoning required
- Quality is critical
- Nuanced analysis needed
- Task requires deep understanding
- Context exceeds 1M tokens
Model Selection Examples
# Bulk file analysis - use Flash for file in src/*.ts; do gemini "List all exports" -m gemini-2.5-flash --output-format json < "$file" done # Security audit - use Pro for quality gemini "Deep security analysis" -m gemini-2.5-pro --output-format json < critical-auth.ts # Cost tracking with model info result=$(gemini "query" --output-format json) model=$(echo "$result" | jq -r '.stats.models | keys[0]') tokens=$(echo "$result" | jq '.stats.models | to_entries[0].value.tokens.total') echo "Used $model: $tokens tokens"
Batching Strategy
Why Batch?
- Reduces API overhead
- Increases cache hit rate
- Provides consistent context
Batching Patterns
Pattern 1: Concatenate Files
# Instead of N separate calls # Do one call with all files cat src/*.ts | gemini "Analyze all TypeScript files for patterns" --output-format json
Pattern 2: Batch Prompts
# Combine related questions gemini "Answer these questions about the codebase: 1. What is the main architecture pattern? 2. How is authentication handled? 3. What database is used?" --output-format json
Pattern 3: Staged Analysis
# First pass: Quick overview with Flash overview=$(cat src/*.ts | gemini "List all modules" -m gemini-2.5-flash --output-format json) # Second pass: Deep dive critical areas with Pro echo "$overview" | jq -r '.response' | grep "auth\|security" | while read module; do gemini "Deep analysis of $module" -m gemini-2.5-pro --output-format json done
Cost Tracking
Per-Query Tracking
result=$(gemini "query" --output-format json) # Extract all cost-relevant stats total_tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0') cached_tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0') models_used=$(echo "$result" | jq -r '.stats.models | keys | join(", ")') tool_calls=$(echo "$result" | jq '.stats.tools.totalCalls // 0') latency=$(echo "$result" | jq '.stats.models | to_entries | map(.value.api.totalLatencyMs) | add // 0') echo "$(date): tokens=$total_tokens cached=$cached_tokens models=$models_used tools=$tool_calls latency=${latency}ms" >> usage.log
Session Tracking
# Track cumulative usage across a session total_session_tokens=0 total_session_cached=0 total_session_calls=0 track_usage() { local result="$1" local tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0') local cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0') total_session_tokens=$((total_session_tokens + tokens)) total_session_cached=$((total_session_cached + cached)) total_session_calls=$((total_session_calls + 1)) } # Use in workflow result=$(gemini "query 1" --output-format json) track_usage "$result" result=$(gemini "query 2" --output-format json) track_usage "$result" echo "Session total: $total_session_tokens tokens ($total_session_cached cached) in $total_session_calls calls"
Optimization Checklist
Before Large Operations
- Choose appropriate model (Flash vs Pro)
- Check if caching is available (API key or Vertex)
- Plan batching strategy
- Set up usage tracking
During Operations
- Monitor cache hit rates
- Track per-query costs
- Adjust model if quality insufficient
- Batch similar queries
After Operations
- Review total usage
- Calculate effective cost
- Identify optimization opportunities
- Document learnings
Quick Reference
Cost-Saving Commands
# Use Flash for bulk gemini "query" -m gemini-2.5-flash --output-format json # Check cache effectiveness gemini "query" --output-format json | jq '{total: .stats.models | to_entries | map(.value.tokens.total) | add, cached: .stats.models | to_entries | map(.value.tokens.cached) | add}' # Minimal output (fewer output tokens) gemini "Answer in one sentence: {question}" --output-format json
Cost Estimation
Rough token estimates:
- 1 token ~ 4 characters (English)
- 1 page of code ~ 500-1000 tokens
- Typical source file ~ 200-2000 tokens
Keyword Registry (Delegates to gemini-cli-docs)
| Topic | Query Keywords |
|---|---|
| Caching | , , |
| Model selection | , , |
| Costs | , , |
| Output control | , |
Test Scenarios
Scenario 1: Check Token Usage
Query: "How do I see how many tokens Gemini used?" Expected Behavior:
- Skill activates on "token usage" or "gemini cost"
- Provides JSON stats extraction pattern Success Criteria: User receives jq commands to extract token counts
Scenario 2: Reduce Costs
Query: "How do I reduce Gemini CLI costs for bulk analysis?" Expected Behavior:
- Skill activates on "cost optimization" or "reduce tokens"
- Recommends Flash model and batching Success Criteria: User receives cost optimization strategies
Scenario 3: Model Selection
Query: "Should I use Flash or Pro for this task?" Expected Behavior:
- Skill activates on "flash vs pro" or "model selection"
- Provides decision criteria table Success Criteria: User receives model comparison and recommendation
References
Query
gemini-cli-docs for official documentation on:
- "token caching"
- "model selection"
- "quota and pricing"
Version History
- v1.1.0 (2025-12-01): Added Test Scenarios section
- v1.0.0 (2025-11-25): Initial release