Claude-code-plugins-plus-skills vertex-agent-builder
Build and deploy production-ready generative AI agents using Vertex AI, Gemini models, and Google Cloud infrastructure with RAG, function calling, and multi-modal capabilities
install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/backups/skills-batch-20251204-000554/plugins/jeremy-vertex-ai/skills/vertex-agent-builder" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-vertex-agent-builder && rm -rf "$T"
manifest: backups/skills-batch-20251204-000554/plugins/jeremy-vertex-ai/skills/vertex-agent-builder/SKILL.md
Vertex AI Agent Builder Skill
Overview
This skill provides production-ready scaffolding and deployment patterns for generative AI agents on Google Cloud's Vertex AI platform. It is based on source code from the GoogleCloudPlatform/generative-ai and GoogleCloudPlatform/agent-starter-pack repositories and covers:
- Gemini Model Integration (1.5 Pro, 1.5 Flash, experimental models)
- Multi-modal Capabilities (text, image, video, audio processing)
- RAG Implementation (Retrieval Augmented Generation with Vector Search)
- Function Calling (Tool use with Gemini models)
- Production Deployment (Cloud Run, Vertex AI Endpoints)
- Evaluation Framework (AutoSxS, ROUGE, custom metrics)
- Observability (Cloud Logging, Monitoring, Trace)
Installation
    # Install Google Cloud SDK
    curl https://sdk.cloud.google.com | bash
    exec -l $SHELL
    gcloud init

    # Install Python SDK (quote the specifiers so the shell does not treat ">" as a redirect)
    pip install "google-cloud-aiplatform>=1.38.0"
    pip install "vertexai>=1.46.0"
    pip install "google-generativeai>=0.3.2"

    # Clone source repositories
    git clone https://github.com/GoogleCloudPlatform/generative-ai.git
    git clone https://github.com/GoogleCloudPlatform/agent-starter-pack.git
    git clone https://github.com/GoogleCloudPlatform/vertex-ai-samples.git

    # Install this plugin (run inside Claude Code)
    /plugin install jeremy-vertex-ai@jeremylongshore
Quick Start (5 Minutes)
Create Your First Vertex AI Agent
    import vertexai
    from vertexai.generative_models import (
        GenerativeModel,
        HarmBlockThreshold,
        HarmCategory,
        Tool,
        grounding,
    )

    # Initialize Vertex AI
    vertexai.init(project="your-project-id", location="us-central1")

    # Create Gemini model
    model = GenerativeModel(
        "gemini-1.5-pro-002",
        system_instruction="""You are a helpful AI assistant that can:
        - Search the web for information
        - Analyze documents and images
        - Execute Python code
        - Call external APIs
        """,
    )

    # Start chat session
    chat = model.start_chat()

    # Send message with Google Search grounding
    response = chat.send_message(
        "What are the latest developments in quantum computing?",
        generation_config={
            "temperature": 0.3,
            "max_output_tokens": 2048,
            "top_p": 0.95,
        },
        # Safety settings are keyed by enum members, not strings
        safety_settings={
            HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
            HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        },
        # Wrap the retrieval config in a Tool via the factory method
        tools=[Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())],
    )

    print(response.text)
Agent Patterns
1. Production Agent with Agent Builder
Based on GoogleCloudPlatform/agent-starter-pack:
    import asyncio

    import functions_framework
    from vertexai.preview import agents
    from vertexai.preview.generative_models import FunctionDeclaration, Tool


    class ProductionAgent:
        """Production-ready agent using Vertex AI Agent Builder"""

        def __init__(self, project_id: str, location: str = "us-central1"):
            self.project_id = project_id
            self.location = location

            # Initialize agent (assumes the installed SDK exposes
            # vertexai.preview.agents; verify against your SDK version)
            self.agent = agents.Agent.create(
                project=project_id,
                location=location,
                display_name="production-agent",
                description="Production-ready agent with tools and RAG",
                model="gemini-1.5-pro-002",
                tools=self._create_tools(),
                system_instruction=self._get_system_instruction(),
            )

        def _create_tools(self):
            """Create function calling tools"""
            return [
                Tool(function_declarations=[
                    FunctionDeclaration(
                        name="search_knowledge_base",
                        description="Search internal knowledge base",
                        parameters={
                            "type": "object",
                            "properties": {
                                "query": {"type": "string"},
                                "top_k": {"type": "integer", "default": 5},
                            },
                        },
                    ),
                    FunctionDeclaration(
                        name="execute_sql",
                        description="Execute SQL query on BigQuery",
                        parameters={
                            "type": "object",
                            "properties": {
                                "query": {"type": "string"},
                                "dataset": {"type": "string"},
                            },
                        },
                    ),
                    FunctionDeclaration(
                        name="send_email",
                        description="Send email notification",
                        parameters={
                            "type": "object",
                            "properties": {
                                "to": {"type": "string"},
                                "subject": {"type": "string"},
                                "body": {"type": "string"},
                            },
                        },
                    ),
                ])
            ]

        def _get_system_instruction(self):
            """Get system instruction for agent"""
            return """You are a production AI agent that helps users with:
            1. Information retrieval from knowledge bases
            2. Data analysis using BigQuery
            3. Automated communications

            Always verify user intent before executing critical operations.
            Provide clear explanations of what you're doing and why.
            """

        async def process_request(self, user_input: str):
            """Process user request"""
            session = self.agent.start_session()
            response = await session.send_message(user_input)

            # Handle function calls
            if response.function_calls:
                for func_call in response.function_calls:
                    result = await self._execute_function(
                        func_call.name, func_call.args
                    )
                    # Send function result back to model
                    response = await session.send_message(
                        content=None,
                        function_responses=[{
                            "name": func_call.name,
                            "response": result,
                        }],
                    )

            return response.text

        async def _execute_function(self, name: str, args: dict):
            """Execute function call"""
            # _search_kb, _execute_bigquery and _send_email are
            # application-specific and not shown here
            if name == "search_knowledge_base":
                return await self._search_kb(args["query"], args.get("top_k", 5))
            elif name == "execute_sql":
                return await self._execute_bigquery(args["query"], args["dataset"])
            elif name == "send_email":
                return await self._send_email(args["to"], args["subject"], args["body"])
            # Fail loudly instead of silently returning None
            raise ValueError(f"Unknown function: {name}")


    # Deploy as Cloud Function
    @functions_framework.http
    def agent_endpoint(request):
        """HTTP Cloud Function for agent"""
        agent = ProductionAgent(
            project_id="your-project",
            location="us-central1",
        )
        request_json = request.get_json()
        # process_request is a coroutine, so run it to completion here
        response = asyncio.run(agent.process_request(request_json["message"]))
        return {"response": response}
2. RAG-Enhanced Agent
Based on GoogleCloudPlatform/generative-ai/gemini/use-cases/retrieval-augmented-generation:
    from vertexai.generative_models import GenerativeModel
    from vertexai.language_models import TextEmbeddingModel
    from vertexai.preview import rag


    class RAGAgent:
        """Agent with Retrieval Augmented Generation using a Vertex AI RAG corpus"""

        def __init__(self, project_id: str, corpus_name: str):
            self.project_id = project_id

            # Create RAG corpus
            self.corpus = rag.create_corpus(
                display_name=corpus_name,
                description="Knowledge base for RAG",
            )

            # Initialize embedding model
            self.embedding_model = TextEmbeddingModel.from_pretrained(
                "text-embedding-004"
            )

            # Initialize Gemini model
            self.model = GenerativeModel("gemini-1.5-pro-002")

        async def import_documents(self, gcs_uris: list):
            """Import documents into RAG corpus"""
            # import_files blocks until the import finishes and returns a summary
            return rag.import_files(
                self.corpus.name,
                gcs_uris,
                chunk_size=512,
                chunk_overlap=100,
            )

        async def query_with_rag(self, query: str, similarity_top_k: int = 5):
            """Query with RAG retrieval"""
            # Retrieve relevant chunks
            retrieval_response = rag.retrieval_query(
                rag_resources=[rag.RagResource(rag_corpus=self.corpus.name)],
                text=query,
                similarity_top_k=similarity_top_k,
            )
            contexts = retrieval_response.contexts.contexts

            # Build context from retrieved chunks
            context = "\n\n".join([
                f"Source: {chunk.source_uri}\nContent: {chunk.text}"
                for chunk in contexts
            ])

            # Generate response with context
            prompt = f"""Based on the following context, answer the question.

    Context:
    {context}

    Question: {query}

    Answer:"""

            response = self.model.generate_content(
                prompt,
                generation_config={
                    "temperature": 0.2,
                    "max_output_tokens": 1024,
                },
            )

            return {
                "answer": response.text,
                "sources": [chunk.source_uri for chunk in contexts],
            }
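The `chunk_size=512` and `chunk_overlap=100` settings above control how documents are split before embedding. The following is a minimal standard-library sketch of overlapping chunking. It counts whitespace-separated words, which is an assumption made for illustration; the managed service applies its own tokenization, and `chunk_words` is a hypothetical helper, not part of the SDK.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 100) -> List[str]:
    """Split text into word windows where consecutive windows share chunk_overlap words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks
```

A larger overlap keeps sentences that straddle a boundary retrievable from either side, at the cost of storing and embedding more text.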
3. Multi-Modal Agent
Based on GoogleCloudPlatform/generative-ai/gemini/multimodality:
    from vertexai.generative_models import GenerativeModel, Part


    class MultiModalAgent:
        """Agent that processes text, images, video, and audio"""

        def __init__(self):
            self.model = GenerativeModel("gemini-1.5-flash-002")

        async def analyze_image(self, image_path: str, prompt: str):
            """Analyze image with text prompt"""
            image = Part.from_uri(
                uri=f"gs://your-bucket/{image_path}",
                mime_type="image/jpeg",
            )
            response = self.model.generate_content([prompt, image])
            return response.text

        async def analyze_video(self, video_path: str, prompt: str):
            """Analyze video content"""
            video = Part.from_uri(
                uri=f"gs://your-bucket/{video_path}",
                mime_type="video/mp4",
            )
            response = self.model.generate_content(
                [prompt, video],
                generation_config={
                    "temperature": 0.4,
                    "max_output_tokens": 2048,
                },
            )
            return response.text

        async def analyze_document(self, pdf_path: str):
            """Extract and analyze PDF document"""
            document = Part.from_uri(
                uri=f"gs://your-bucket/{pdf_path}",
                mime_type="application/pdf",
            )
            prompt = """Extract all key information from this document:
            1. Main topics covered
            2. Key findings or conclusions
            3. Important data or statistics
            4. Actionable recommendations
            """
            response = self.model.generate_content([prompt, document])
            return response.text

        async def generate_code(self, description: str, language: str = "python"):
            """Generate code from natural language"""
            prompt = f"""Generate {language} code for the following requirement:
            {description}

            Requirements:
            - Include proper error handling
            - Add comprehensive comments
            - Follow best practices for {language}
            - Make it production-ready
            """
            response = self.model.generate_content(
                prompt,
                generation_config={
                    "temperature": 0.2,
                    "max_output_tokens": 4096,
                },
            )
            return response.text
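The methods above hard-code both the bucket name and the MIME type. One way to derive the `uri` and `mime_type` arguments from the object path is sketched below using only the standard library. `gcs_part_args` is a hypothetical helper introduced for illustration; its result is meant to be unpacked into `Part.from_uri(**...)`.

```python
import mimetypes


def gcs_part_args(bucket: str, object_path: str) -> dict:
    """Build the uri and mime_type keyword arguments for a Cloud Storage object."""
    mime_type, _ = mimetypes.guess_type(object_path)
    if mime_type is None:
        # Refuse to guess: an incorrect MIME type produces confusing model errors
        raise ValueError(f"Cannot infer MIME type for {object_path!r}")
    return {"uri": f"gs://{bucket}/{object_path}", "mime_type": mime_type}
```

For example, `Part.from_uri(**gcs_part_args("your-bucket", "docs/report.pdf"))` would replace the hand-written arguments in `analyze_document`.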
Deployment Patterns
1. Cloud Run Deployment
    # cloudbuild.yaml
    steps:
      # Build container
      - name: 'gcr.io/cloud-builders/docker'
        args: ['build', '-t', 'gcr.io/$PROJECT_ID/vertex-agent:$COMMIT_SHA', '.']

      # Push to Container Registry
      - name: 'gcr.io/cloud-builders/docker'
        args: ['push', 'gcr.io/$PROJECT_ID/vertex-agent:$COMMIT_SHA']

      # Deploy to Cloud Run
      - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
        entrypoint: gcloud
        args:
          - 'run'
          - 'deploy'
          - 'vertex-agent'
          - '--image'
          - 'gcr.io/$PROJECT_ID/vertex-agent:$COMMIT_SHA'
          - '--region'
          - 'us-central1'
          - '--platform'
          - 'managed'
          - '--allow-unauthenticated'
          - '--set-env-vars'
          - 'GCP_PROJECT=$PROJECT_ID'
    # Dockerfile
    FROM python:3.11-slim

    WORKDIR /app

    # Install dependencies
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    # Copy application
    COPY . .

    # Run as non-root user
    RUN useradd -m -u 1000 agent && chown -R agent:agent /app
    USER agent

    EXPOSE 8080

    CMD ["gunicorn", "--bind", ":8080", "--workers", "1", "--threads", "8", "--timeout", "0", "main:app"]
2. Vertex AI Endpoint Deployment
    from google.cloud import aiplatform


    def deploy_to_vertex_endpoint():
        """Deploy model to Vertex AI Endpoint"""
        aiplatform.init(project="your-project", location="us-central1")

        # Upload model
        model = aiplatform.Model.upload(
            display_name="vertex-agent-model",
            artifact_uri="gs://your-bucket/model",
            serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest",
        )

        # Create endpoint
        endpoint = aiplatform.Endpoint.create(
            display_name="vertex-agent-endpoint",
        )

        # Deploy model to endpoint
        endpoint.deploy(
            model=model,
            deployed_model_display_name="vertex-agent-v1",
            machine_type="n1-standard-4",
            min_replica_count=1,
            max_replica_count=5,
            accelerator_type="NVIDIA_TESLA_T4",
            accelerator_count=1,
        )

        return endpoint
Evaluation Framework
Based on GoogleCloudPlatform/generative-ai/gemini/evaluation:
    import pandas as pd
    # PointwiseMetric requires a recent SDK release
    from vertexai.preview.evaluation import EvalTask, PointwiseMetric


    class AgentEvaluator:
        """Evaluate agent performance using Vertex AI evaluation framework"""

        def __init__(self):
            self.eval_dataset = self._load_eval_dataset()

        def _load_eval_dataset(self):
            """Load evaluation dataset"""
            return pd.DataFrame([
                {
                    "prompt": "What is the capital of France?",
                    "reference": "The capital of France is Paris.",
                    "context": "France is a country in Western Europe.",
                },
                # Add more test cases
            ])

        async def evaluate_accuracy(self, agent):
            """Evaluate agent accuracy"""
            # Collect the agent's response for every prompt
            responses = []
            for _, row in self.eval_dataset.iterrows():
                responses.append(await agent.process_request(row["prompt"]))

            # Attach the responses so the metrics score pre-computed output
            scored_dataset = self.eval_dataset.assign(response=responses)

            eval_task = EvalTask(
                dataset=scored_dataset,
                metrics=[
                    "rouge_l_sum",
                    "bleu",
                    "exact_match",
                    PointwiseMetric(
                        metric="custom_factuality",
                        metric_prompt_template="""Rate the factual accuracy of the response.
    Reference: {reference}
    Response: {response}
    Score (0-1):""",
                    ),
                ],
            )

            # Run evaluation against the "response" column
            eval_result = eval_task.evaluate()
            return eval_result.summary_metrics
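To make the `exact_match` and `rouge_l_sum` metric names above more concrete, here is a simplified standard-library sketch of what such metrics measure. It is an illustration only: the evaluation service applies its own tokenization and summary-level sentence splitting, so scores from this sketch are not expected to match the service's output. `exact_match`, `lcs_length` and `rouge_l_f1` are names chosen for this example.

```python
from typing import List


def exact_match(reference: str, response: str) -> float:
    """1.0 when the two strings are identical after trimming whitespace, else 0.0."""
    return 1.0 if reference.strip() == response.strip() else 0.0


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, response: str) -> float:
    """F1 of LCS-based precision and recall over lowercased whitespace tokens."""
    ref, hyp = reference.lower().split(), response.lower().split()
    if not ref or not hyp:
        return 0.0
    lcs = lcs_length(ref, hyp)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(hyp), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Exact match is strict and suits short factual answers; the LCS-based score rewards responses that preserve the reference's word order without requiring identical wording.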
Monitoring & Observability
    import time

    from google.cloud import logging
    from google.cloud import monitoring_v3


    class AgentMonitor:
        """Monitor agent performance and usage"""

        def __init__(self, project_id: str):
            self.project_id = project_id
            self.monitoring_client = monitoring_v3.MetricServiceClient()
            self.logging_client = logging.Client()
            self.logger = self.logging_client.logger("vertex-agent")

        def log_request(self, request_id: str, user_input: str, response: str, latency: float):
            """Log agent request"""
            self.logger.log_struct({
                "request_id": request_id,
                "timestamp": time.time(),
                "user_input": user_input,
                "response": response[:500],  # Truncate long responses
                "latency_ms": latency * 1000,
                "model": "gemini-1.5-pro-002",
                "success": True,
            })

        def create_custom_metric(self, metric_name: str, value: float):
            """Write a data point to a custom metric in Cloud Monitoring"""
            project_name = f"projects/{self.project_id}"

            series = monitoring_v3.TimeSeries()
            series.metric.type = f"custom.googleapis.com/agent/{metric_name}"

            # Split the current time into whole seconds and nanoseconds
            now = time.time()
            seconds = int(now)
            nanos = int((now - seconds) * 10 ** 9)
            interval = monitoring_v3.TimeInterval(
                {"end_time": {"seconds": seconds, "nanos": nanos}}
            )
            point = monitoring_v3.Point({
                "interval": interval,
                "value": {"double_value": value},
            })
            series.points = [point]

            self.monitoring_client.create_time_series(
                name=project_name,
                time_series=[series],
            )

        def create_dashboard(self):
            """Build the monitoring dashboard configuration"""
            dashboard_config = {
                "displayName": "Vertex Agent Dashboard",
                "widgets": [
                    {
                        "title": "Request Latency",
                        "xyChart": {
                            "dataSets": [{
                                "timeSeriesQuery": {
                                    "timeSeriesFilter": {
                                        "filter": 'metric.type="custom.googleapis.com/agent/latency"'
                                    }
                                }
                            }]
                        },
                    },
                    {
                        "title": "Error Rate",
                        "xyChart": {
                            "dataSets": [{
                                "timeSeriesQuery": {
                                    "timeSeriesFilter": {
                                        "filter": 'metric.type="custom.googleapis.com/agent/errors"'
                                    }
                                }
                            }]
                        },
                    },
                ],
            }
            return dashboard_config
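`log_request` above expects the caller to supply a latency and always records `"success": True`. A small context manager can measure the latency and capture the real outcome. This is a minimal sketch using only the standard library; `timed` is a hypothetical helper, and the record it fills in would then be passed to whatever logger is in use.

```python
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def timed(record: dict) -> Iterator[dict]:
    """Fill record with latency_ms and success for the wrapped block."""
    start = time.perf_counter()
    try:
        yield record
        record["success"] = True
    except Exception:
        record["success"] = False
        raise  # Let the caller's own error handling see the failure
    finally:
        # Recorded on both the success and the failure path
        record["latency_ms"] = (time.perf_counter() - start) * 1000
```

Wrapping the model call in `with timed({"request_id": request_id}) as record:` leaves `record` ready for structured logging whether or not the call raised.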
Cost Optimization
    import hashlib

    from vertexai.generative_models import GenerativeModel


    class CostOptimizedAgent:
        """Cost-optimized agent configuration"""

        def __init__(self):
            # Model selection based on task complexity
            self.models = {
                "simple": "gemini-1.5-flash-002",   # Fastest and cheapest
                "standard": "gemini-1.5-pro-002",   # Balanced
                "complex": "gemini-1.5-pro-002",    # Most capable
            }

            # Caching for repeated queries
            self.response_cache = {}

        def select_model(self, query: str) -> str:
            """Select appropriate model based on query complexity"""
            # Simple heuristic based on query length and keywords
            complexity_score = len(query) / 100

            if "analyze" in query or "explain" in query:
                complexity_score += 0.3
            if "code" in query or "implement" in query:
                complexity_score += 0.4

            if complexity_score < 0.3:
                return self.models["simple"]
            elif complexity_score < 0.7:
                return self.models["standard"]
            else:
                return self.models["complex"]

        async def process_with_caching(self, query: str):
            """Process query with response caching"""
            # Check cache first (MD5 is used only as a cache key, not for security)
            cache_key = hashlib.md5(query.encode()).hexdigest()
            if cache_key in self.response_cache:
                return self.response_cache[cache_key]

            # Select optimal model
            model_name = self.select_model(query)
            model = GenerativeModel(model_name)

            # Process query
            response = model.generate_content(
                query,
                generation_config={
                    "temperature": 0.1,         # Lower temperature for consistency
                    "max_output_tokens": 1024,  # Limit output size
                },
            )

            # Cache response
            self.response_cache[cache_key] = response.text
            return response.text
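The `response_cache` dictionary above grows without limit and is keyed by the query alone, so two models would share an entry. A bounded least-recently-used cache is one way to address both points. The sketch below uses only the standard library; `LRUResponseCache` and its default of 128 entries are assumptions made for illustration.

```python
import hashlib
from collections import OrderedDict
from typing import Optional


class LRUResponseCache:
    """In-memory response cache that evicts the least recently used entry."""

    def __init__(self, max_entries: int = 128):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    @staticmethod
    def key_for(model_name: str, query: str) -> str:
        """Key on both model and query so different models never share entries."""
        return hashlib.sha256(f"{model_name}\n{query}".encode("utf-8")).hexdigest()

    def get(self, key: str) -> Optional[str]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # Mark as most recently used
        return self._entries[key]

    def put(self, key: str, value: str) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # Drop the oldest entry

    def __len__(self) -> int:
        return len(self._entries)
```

An in-process cache like this is lost on restart and is not shared across Cloud Run instances, so it mainly helps with bursts of identical queries hitting one instance.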
Integration with Google Cloud Services
BigQuery Integration
    from google.cloud import bigquery
    from vertexai.generative_models import GenerativeModel


    class BigQueryAgent:
        """Agent with BigQuery data access"""

        def __init__(self, project_id: str):
            self.bq_client = bigquery.Client(project=project_id)
            self.model = GenerativeModel("gemini-1.5-pro-002")

        async def nl2sql(self, natural_language_query: str, dataset: str):
            """Convert natural language to SQL"""
            # Get table schemas
            tables = self.bq_client.list_tables(dataset)
            schema_info = []
            for table in tables:
                table_ref = self.bq_client.get_table(table.reference)
                schema_info.append(f"Table: {table.table_id}\nSchema: {table_ref.schema}")

            prompt = f"""Convert this natural language query to SQL:
            Query: {natural_language_query}

            Available tables and schemas:
            {chr(10).join(schema_info)}

            Return only the SQL query, no explanation.
            """

            response = self.model.generate_content(prompt)
            sql_query = response.text.strip()

            # Execute query (the SQL is model-generated, so run it with
            # read-only credentials)
            query_job = self.bq_client.query(sql_query)
            results = query_job.result()

            return {
                "sql": sql_query,
                "results": [dict(row) for row in results],
                "total_rows": results.total_rows,
            }
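`nl2sql` above executes the model's text as-is. Models often wrap SQL in a Markdown code fence, and nothing stops the text from being a write statement. The sketch below shows one coarse way to unwrap the fence and reject anything that is not a single read-only statement. It is a first-keyword check written for illustration, not a SQL parser, and it does not replace IAM permissions; `extract_sql` and `is_read_only` are hypothetical helpers.

```python
import re

FENCE = "`" * 3  # A Markdown code fence, built here to keep this example readable
_FENCED = re.compile(
    r"^" + FENCE + r"[a-zA-Z]*\s*\n(.*?)\n?" + FENCE + r"\s*$",
    re.DOTALL,
)


def extract_sql(model_text: str) -> str:
    """Strip an optional Markdown fence and a trailing semicolon."""
    text = model_text.strip()
    match = _FENCED.match(text)
    if match:
        text = match.group(1).strip()
    return text.rstrip(";").strip()


def is_read_only(sql: str) -> bool:
    """Accept a single statement that starts with SELECT or WITH."""
    if ";" in sql:
        return False  # Coarse: also rejects semicolons inside string literals
    words = sql.split(None, 1)
    return bool(words) and words[0].upper() in {"SELECT", "WITH"}
```

A caller would run `sql = extract_sql(response.text)` and only pass it to `bq_client.query` when `is_read_only(sql)` is true.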
Cloud Storage Integration
    from google.cloud import storage


    class StorageAgent:
        """Agent with Cloud Storage access"""

        def __init__(self, bucket_name: str):
            self.storage_client = storage.Client()
            self.bucket = self.storage_client.bucket(bucket_name)

        async def process_documents(self, prefix: str):
            """Process all documents in a GCS prefix"""
            blobs = self.bucket.list_blobs(prefix=prefix)
            results = []

            for blob in blobs:
                if blob.name.endswith(('.pdf', '.txt', '.docx')):
                    # Download and process. download_as_text() only suits text
                    # formats; PDF and DOCX files need a binary download and a parser.
                    content = blob.download_as_text()
                    # analyze_document is application-specific and not shown here
                    analysis = await self.analyze_document(content)
                    results.append({
                        "file": blob.name,
                        "analysis": analysis,
                    })

            return results
Best Practices
1. Security
    from google.cloud import secretmanager


    class SecureAgent:
        """Agent with security best practices"""

        def __init__(self, project_id: str):
            self.secret_client = secretmanager.SecretManagerServiceClient()
            self.project_id = project_id

        def get_secret(self, secret_id: str) -> str:
            """Get secret from Secret Manager"""
            name = f"projects/{self.project_id}/secrets/{secret_id}/versions/latest"
            response = self.secret_client.access_secret_version(request={"name": name})
            return response.payload.data.decode("UTF-8")

        def sanitize_input(self, user_input: str) -> str:
            """Sanitize user input"""
            # Remove potential injection attempts
            sanitized = user_input.replace("```", "")
            sanitized = sanitized.replace("<script>", "")
            # Add more sanitization as needed
            return sanitized
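The "add more sanitization as needed" comment above leaves the next steps open. Two common additions are removing invisible control characters and capping input length, sketched below with the standard library only. The 4000-character limit is an assumed value chosen for illustration, and this standalone `sanitize_input` is not the method from the class above. Input cleaning of this kind reduces noise but does not by itself prevent prompt injection.

```python
import unicodedata

MAX_INPUT_CHARS = 4000  # Assumed limit for illustration; tune per application


def sanitize_input(user_input: str, max_chars: int = MAX_INPUT_CHARS) -> str:
    """Normalize Unicode, drop control/format characters, trim, and cap length."""
    normalized = unicodedata.normalize("NFKC", user_input)
    cleaned = "".join(
        ch
        for ch in normalized
        # Keep newlines and tabs; drop every other character in a "C" category
        if ch in "\n\t" or unicodedata.category(ch)[0] != "C"
    )
    return cleaned.strip()[:max_chars]
```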
2. Error Handling
    import logging

    from tenacity import retry, stop_after_attempt, wait_exponential


    class ResilientAgent:
        """Agent with robust error handling"""

        @retry(
            stop=stop_after_attempt(3),
            wait=wait_exponential(multiplier=1, min=4, max=60),
        )
        async def process_with_retry(self, request):
            """Process with automatic retry"""
            try:
                # _process is application-specific and not shown here
                response = await self._process(request)
                return response
            except Exception as e:
                logging.error(f"Processing failed: {e}")
                raise
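To show what a clamped exponential backoff does with settings like the ones above (three attempts, waits bounded between 4 and 60 seconds), here is a standard-library sketch. It follows the general pattern of doubling the delay and clamping it to a range; it is an illustration written for this document, not tenacity's implementation, and `backoff_delays` and `call_with_retry` are hypothetical names.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, multiplier: float = 1.0,
                   min_wait: float = 4.0, max_wait: float = 60.0) -> List[float]:
    """Delay before each retry: multiplier * 2**n, clamped to [min_wait, max_wait]."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return [max(min_wait, min(max_wait, multiplier * 2 ** n))
            for n in range(attempts - 1)]


def call_with_retry(func: Callable[[], T], attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func until it succeeds or attempts are exhausted, sleeping in between."""
    delays = backoff_delays(attempts)
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

With a minimum wait of 4 seconds, the first few delays are all clamped to 4 because the raw values 1 and 2 fall below the floor; the doubling only becomes visible from the fourth retry onward.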
Examples Repository
Complete working examples from Google Cloud repositories:
- Text Generation - /gemini/function-calling/
- RAG Implementation - /gemini/use-cases/retrieval-augmented-generation/
- Multi-modal Processing - /gemini/multimodality/
- Agent Builder - /agent-builder/
- Production Templates - From agent-starter-pack
Resources
- Vertex AI Documentation
- Gemini API Reference
- Agent Builder Guide
- GitHub: GoogleCloudPlatform/generative-ai
- GitHub: GoogleCloudPlatform/agent-starter-pack
Version: 1.0.0
Last Updated: October 2025
Author: Jeremy Longshore
Based on: Official Google Cloud repositories
License: MIT