Antigravity-awesome-skills gcp-cloud-run
git clone https://github.com/sickn33/antigravity-awesome-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/sickn33/antigravity-awesome-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/antigravity-awesome-skills-claude/skills/gcp-cloud-run" ~/.claude/skills/sickn33-antigravity-awesome-skills-gcp-cloud-run && rm -rf "$T"
GCP Cloud Run
Specialized skill for building production-ready serverless applications on GCP. Covers Cloud Run services (containerized), Cloud Run Functions (event-driven), cold start optimization, and event-driven architecture with Pub/Sub.
Principles
- Cloud Run for containers, Functions for simple event handlers
- Optimize for cold starts with startup CPU boost and min instances
- Set concurrency based on workload (start with the default of 80, then adjust)
- Memory includes /tmp filesystem - plan accordingly
- Use VPC Connector only when needed (adds latency)
- Containers should start fast and be stateless
- Handle signals gracefully for clean shutdown
Patterns
Cloud Run Service Pattern
Containerized web service on Cloud Run
When to use: Web applications and APIs, services that need any runtime or library, complex services with multiple endpoints, stateless containerized workloads
# Dockerfile - Multi-stage build for smaller image
FROM node:20-slim AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production

FROM node:20-slim
WORKDIR /app

# Copy only production dependencies
COPY --from=builder /app/node_modules ./node_modules
COPY src ./src
COPY package.json ./

# Cloud Run uses PORT env variable
ENV PORT=8080
EXPOSE 8080

# Run as non-root user
USER node
CMD ["node", "src/index.js"]
// src/index.js
const express = require('express');
const app = express();

app.use(express.json());

// Health check endpoint
app.get('/health', (req, res) => {
  res.status(200).send('OK');
});

// API routes
app.get('/api/items/:id', async (req, res) => {
  try {
    const item = await getItem(req.params.id);
    res.json(item);
  } catch (error) {
    console.error('Error:', error);
    res.status(500).json({ error: 'Internal server error' });
  }
});

// Graceful shutdown
process.on('SIGTERM', () => {
  console.log('SIGTERM received, shutting down gracefully');
  server.close(() => {
    console.log('Server closed');
    process.exit(0);
  });
});

const PORT = process.env.PORT || 8080;
const server = app.listen(PORT, () => {
  console.log(`Server listening on port ${PORT}`);
});
# cloudbuild.yaml
steps:
  # Build the container image
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-service:$COMMIT_SHA', '.']

  # Push the container image
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/my-service:$COMMIT_SHA']

  # Deploy to Cloud Run
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args:
      - 'run'
      - 'deploy'
      - 'my-service'
      - '--image=gcr.io/$PROJECT_ID/my-service:$COMMIT_SHA'
      - '--region=us-central1'
      - '--platform=managed'
      - '--allow-unauthenticated'
      - '--memory=512Mi'
      - '--cpu=1'
      - '--min-instances=1'
      - '--max-instances=100'
      - '--concurrency=80'
      - '--cpu-boost'

images:
  - 'gcr.io/$PROJECT_ID/my-service:$COMMIT_SHA'
Structure
project/
├── Dockerfile
├── .dockerignore
├── src/
│   ├── index.js
│   └── routes/
├── package.json
└── cloudbuild.yaml
gcloud deploy
Direct gcloud deployment
gcloud run deploy my-service \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --memory 512Mi \
  --cpu 1 \
  --min-instances 1 \
  --max-instances 100 \
  --concurrency 80 \
  --cpu-boost
Cloud Run Functions Pattern
Event-driven functions (formerly Cloud Functions)
When to use: Simple event handlers, Pub/Sub message processing, Cloud Storage triggers, HTTP webhooks
// HTTP Function
// index.js
const functions = require('@google-cloud/functions-framework');

functions.http('helloHttp', (req, res) => {
  const name = req.query.name || req.body.name || 'World';
  res.send(`Hello, ${name}!`);
});

// Pub/Sub Function
const functions = require('@google-cloud/functions-framework');

functions.cloudEvent('processPubSub', (cloudEvent) => {
  // Decode Pub/Sub message
  const message = cloudEvent.data.message;
  const data = message.data
    ? JSON.parse(Buffer.from(message.data, 'base64').toString())
    : {};

  console.log('Received message:', data);

  // Process message
  processMessage(data);
});

// Cloud Storage Function
const functions = require('@google-cloud/functions-framework');

functions.cloudEvent('processStorageEvent', async (cloudEvent) => {
  const file = cloudEvent.data;

  console.log(`Event: ${cloudEvent.type}`);
  console.log(`Bucket: ${file.bucket}`);
  console.log(`File: ${file.name}`);

  if (cloudEvent.type === 'google.cloud.storage.object.v1.finalized') {
    await processUploadedFile(file.bucket, file.name);
  }
});
# --entry-point must match the function name registered in code;
# without it, gcloud looks for a function named after the deployed resource

# Deploy HTTP function
gcloud functions deploy hello-http \
  --gen2 \
  --runtime nodejs20 \
  --entry-point helloHttp \
  --trigger-http \
  --allow-unauthenticated \
  --region us-central1

# Deploy Pub/Sub function
gcloud functions deploy process-messages \
  --gen2 \
  --runtime nodejs20 \
  --entry-point processPubSub \
  --trigger-topic my-topic \
  --region us-central1

# Deploy Cloud Storage function
gcloud functions deploy process-uploads \
  --gen2 \
  --runtime nodejs20 \
  --entry-point processStorageEvent \
  --trigger-event-filters="type=google.cloud.storage.object.v1.finalized" \
  --trigger-event-filters="bucket=my-bucket" \
  --region us-central1
Cold Start Optimization Pattern
Minimize cold start latency for Cloud Run
When to use: Latency-sensitive applications, user-facing APIs, high-traffic services
1. Enable Startup CPU Boost
gcloud run deploy my-service \
  --cpu-boost \
  --region us-central1
2. Set Minimum Instances
gcloud run deploy my-service \
  --min-instances 1 \
  --region us-central1
3. Optimize Container Image
# Use distroless for minimal image
FROM node:20-slim AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production

FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY src ./src
CMD ["src/index.js"]
4. Lazy Initialize Heavy Dependencies
// Lazy load heavy libraries
let bigQueryClient = null;

function getBigQueryClient() {
  if (!bigQueryClient) {
    const { BigQuery } = require('@google-cloud/bigquery');
    bigQueryClient = new BigQuery();
  }
  return bigQueryClient;
}

// Only initialize when needed
app.get('/api/analytics', async (req, res) => {
  const client = getBigQueryClient();
  const results = await client.query({...});
  res.json(results);
});
5. Increase CPU and Memory
# On Cloud Run, CPU and memory are set with separate flags;
# more CPU shortens startup, more memory leaves headroom for loading
gcloud run deploy my-service \
  --memory 1Gi \
  --cpu 2 \
  --region us-central1
Optimization impact
- Startup CPU boost: up to 50% faster cold starts, depending on the workload
- Min instances: avoids cold starts as long as traffic fits on the warm instances
- Distroless image: smaller attack surface, faster pull
- Lazy init: defers heavy loading to first request
Concurrency Configuration Pattern
Proper concurrency settings for Cloud Run
When to use: Optimizing instance utilization, handling traffic spikes efficiently, reducing cold starts
Understanding Concurrency
# Default concurrency is 80
# Adjust based on your workload

# For I/O-bound workloads (most web apps)
gcloud run deploy my-service \
  --concurrency 80 \
  --cpu 1

# For CPU-bound workloads
gcloud run deploy my-service \
  --concurrency 1 \
  --cpu 1

# For memory-intensive workloads
gcloud run deploy my-service \
  --concurrency 10 \
  --memory 2Gi
Node.js Concurrency
// Node.js is single-threaded but handles I/O concurrently
// Use async/await for all I/O operations

// GOOD - async I/O
app.get('/api/data', async (req, res) => {
  const [users, products] = await Promise.all([
    fetchUsers(),
    fetchProducts()
  ]);
  res.json({ users, products });
});

// BAD - blocking operation
app.get('/api/compute', (req, res) => {
  const result = heavyCpuOperation(); // Blocks other requests!
  res.json(result);
});
Python Concurrency with Gunicorn
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# 4 workers for concurrency
CMD exec gunicorn --bind :$PORT --workers 4 --threads 2 main:app

# main.py
from flask import Flask

app = Flask(__name__)

@app.route('/api/data')
def get_data():
    return {'status': 'ok'}
Concurrency guidelines
- Concurrency=1: Only for CPU-bound or unsafe code
- Concurrency=8-20: Memory-intensive workloads
- Concurrency=80: Default, good for I/O-bound
- Concurrency=250+: Very lightweight handlers (the configurable maximum is 1000)
Pub/Sub Integration Pattern
Event-driven processing with Cloud Pub/Sub
When to use: Asynchronous message processing, decoupled microservices, event-driven architecture
Push Subscription to Cloud Run
# Create topic
gcloud pubsub topics create orders

# Create push subscription to Cloud Run
gcloud pubsub subscriptions create orders-push \
  --topic orders \
  --push-endpoint https://my-service-xxx.run.app/pubsub \
  --ack-deadline 600
// Handle Pub/Sub push messages
const express = require('express');
const app = express();
app.use(express.json());

app.post('/pubsub', async (req, res) => {
  // Basic shape check only; this does not authenticate the sender
  if (!req.body || !req.body.message) {
    return res.status(400).send('Invalid Pub/Sub message');
  }

  try {
    // Decode message data
    const message = req.body.message;
    const data = message.data
      ? JSON.parse(Buffer.from(message.data, 'base64').toString())
      : {};

    console.log('Processing order:', data);
    await processOrder(data);

    // Return 200 to acknowledge
    res.status(200).send('OK');
  } catch (error) {
    console.error('Processing failed:', error);
    // Return 500 to trigger retry
    res.status(500).send('Processing failed');
  }
});
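The decoding step in the handler above can be isolated and checked on its own. The following is a minimal Python sketch of the same logic, assuming the push envelope shape used in the handler (a `message` object with a base64-encoded `data` field). The function name `decode_push_envelope` and the sample payload are illustrative, not part of any Pub/Sub library.

```python
import base64
import json


def decode_push_envelope(envelope):
    """Decode the JSON payload carried in a Pub/Sub push envelope.

    Raises ValueError when the envelope has no message object, which a
    handler would typically map to an HTTP 400 response.
    """
    message = envelope.get("message") if isinstance(envelope, dict) else None
    if not isinstance(message, dict):
        raise ValueError("Invalid Pub/Sub message")

    raw = message.get("data")
    if not raw:
        # A message may carry only attributes and no data
        return {}
    return json.loads(base64.b64decode(raw).decode("utf-8"))


# Hypothetical envelope, built the way a publisher's JSON payload would arrive
sample = {
    "message": {
        "data": base64.b64encode(json.dumps({"order_id": 42}).encode()).decode(),
        "messageId": "1",
    }
}
print(decode_push_envelope(sample))
```

Keeping the decoding in a pure function like this makes it possible to unit test malformed envelopes without starting a web server.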
Publishing Messages
const { PubSub } = require('@google-cloud/pubsub');
const pubsub = new PubSub();

async function publishOrder(order) {
  const topic = pubsub.topic('orders');
  const messageBuffer = Buffer.from(JSON.stringify(order));

  const messageId = await topic.publishMessage({
    data: messageBuffer,
    attributes: {
      type: 'order_created',
      priority: 'high'
    }
  });

  console.log(`Published message ${messageId}`);
  return messageId;
}
Dead Letter Queue
# Create DLQ topic
gcloud pubsub topics create orders-dlq

# Update subscription with DLQ
gcloud pubsub subscriptions update orders-push \
  --dead-letter-topic orders-dlq \
  --max-delivery-attempts 5
Cloud SQL Connection Pattern
Connect Cloud Run to Cloud SQL securely
When to use: Relational database needs, migrating existing applications, complex queries and transactions
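# Deploy with Cloud SQL connection
# The application code reads DB_PASS; it is assumed here to come from a
# Secret Manager secret named db-pass (hypothetical name)
gcloud run deploy my-service \
  --add-cloudsql-instances PROJECT:REGION:INSTANCE \
  --set-env-vars INSTANCE_CONNECTION_NAME=PROJECT:REGION:INSTANCE,DB_NAME=mydb,DB_USER=myuser \
  --set-secrets DB_PASS=db-pass:latest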
// Using Unix socket connection
const { Pool } = require('pg');

const pool = new Pool({
  user: process.env.DB_USER,
  password: process.env.DB_PASS,
  database: process.env.DB_NAME,
  // Cloud SQL connector uses Unix socket
  host: `/cloudsql/${process.env.INSTANCE_CONNECTION_NAME}`,
  max: 5, // Connection pool size
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 10000,
});

app.get('/api/users', async (req, res) => {
  const client = await pool.connect();
  try {
    const result = await client.query('SELECT * FROM users LIMIT 100');
    res.json(result.rows);
  } finally {
    client.release();
  }
});

# Python with SQLAlchemy
import os
from sqlalchemy import create_engine

def get_engine():
    instance_connection_name = os.environ["INSTANCE_CONNECTION_NAME"]
    db_user = os.environ["DB_USER"]
    db_pass = os.environ["DB_PASS"]
    db_name = os.environ["DB_NAME"]

    engine = create_engine(
        f"postgresql+pg8000://{db_user}:{db_pass}@/{db_name}",
        connect_args={
            "unix_sock": f"/cloudsql/{instance_connection_name}/.s.PGSQL.5432"
        },
        pool_size=5,
        max_overflow=2,
        pool_timeout=30,
        pool_recycle=1800,
    )
    return engine
Best practices
- Use connection pooling (max 5-10 connections per instance)
- Set appropriate idle timeouts
- Handle connection errors gracefully
- Consider the Cloud SQL Auth Proxy for local development
Secret Manager Integration
Securely manage secrets in Cloud Run
When to use: API keys, database passwords, service account keys, any sensitive configuration
# Create secret
echo -n "my-secret-value" | gcloud secrets create my-secret --data-file=-

# Mount as environment variable
gcloud run deploy my-service \
  --update-secrets=API_KEY=my-secret:latest

# Mount as file volume
gcloud run deploy my-service \
  --update-secrets=/secrets/api-key=my-secret:latest
// Option 1: access a secret mounted as an environment variable
const apiKeyFromEnv = process.env.API_KEY;

// Option 2: access a secret mounted as a file
const fs = require('fs');
const apiKeyFromFile = fs.readFileSync('/secrets/api-key', 'utf8');

// Option 3: access via Secret Manager API (when not mounted)
const { SecretManagerServiceClient } = require('@google-cloud/secret-manager');
const client = new SecretManagerServiceClient();

async function getSecret(name) {
  // Resolve the project from the client's credentials and environment
  const projectId = await client.getProjectId();
  const [version] = await client.accessSecretVersion({
    name: `projects/${projectId}/secrets/${name}/versions/latest`
  });
  return version.payload.data.toString('utf8');
}
Sharp Edges
/tmp Filesystem Counts Against Memory
Severity: HIGH
Situation: Writing files to /tmp directory in Cloud Run
Symptoms: Container killed with OOM error. Memory usage spikes unexpectedly. File operations cause container restarts. "Container memory limit exceeded" in logs.
Why this breaks: Cloud Run uses an in-memory filesystem for /tmp. Any files written to /tmp consume memory from your container's allocation.
Common scenarios:
- Downloading files temporarily
- Creating temp processing files
- Libraries caching to /tmp
- Large log buffers
A 512MB container that downloads a 200MB file to /tmp only has ~300MB left for the application.
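The arithmetic behind that example can be sketched as a small helper. The function name and the choice of MiB as the unit are illustrative assumptions; nothing here is a Cloud Run API.

```python
def memory_left_for_app(limit_mib, tmp_usage_mib):
    """Return the memory (in MiB) left for the process once /tmp files are counted."""
    if tmp_usage_mib > limit_mib:
        raise ValueError("/tmp usage alone exceeds the container memory limit")
    return limit_mib - tmp_usage_mib


# The scenario from the text: a 512 MiB limit with a 200 MiB file in /tmp
print(memory_left_for_app(512, 200))  # 312
```

Running the same calculation with the largest file a service is expected to handle gives a first estimate for the `--memory` setting.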
Recommended fix:
Calculate memory including /tmp usage
# cloudbuild.yaml
steps:
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - 'run'
      - 'deploy'
      - 'my-service'
      - '--memory=1Gi'  # Include /tmp overhead
      - '--image=gcr.io/$PROJECT_ID/my-service'
Stream instead of buffering
from google.cloud import storage

client = storage.Client()

# BAD - buffers entire file in /tmp
def process_large_file(bucket_name, blob_name):
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.download_to_filename('/tmp/large_file')
    with open('/tmp/large_file', 'rb') as f:
        process(f.read())

# GOOD - stream processing
def process_large_file(bucket_name, blob_name):
    blob = client.bucket(bucket_name).blob(blob_name)
    with blob.open('rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            process_chunk(chunk)
Use Cloud Storage for large files
from google.cloud import storage

def process_with_gcs(bucket_name, input_blob, output_blob):
    client = storage.Client()
    bucket = client.bucket(bucket_name)

    # Process directly to/from GCS
    input_blob = bucket.blob(input_blob)
    output_blob = bucket.blob(output_blob)

    with input_blob.open('rb') as reader:
        with output_blob.open('wb') as writer:
            for chunk in iter(lambda: reader.read(65536), b''):
                processed = transform(chunk)
                writer.write(processed)
Monitor memory usage
import psutil
import logging

def log_memory():
    memory = psutil.virtual_memory()
    logging.info(f"Memory: {memory.percent}% used, "
                 f"{memory.available / 1024 / 1024:.0f}MB available")
Concurrency=1 Causes Scaling Bottlenecks
Severity: HIGH
Situation: Setting concurrency to 1 for request isolation
Symptoms: Auto-scaling creates many container instances. High latency during traffic spikes. Increased cold starts. Higher costs from more instances.
Why this breaks: Setting concurrency to 1 means each container handles only one request at a time. During traffic spikes:
- 100 concurrent requests = 100 container instances
- Each instance has cold start overhead
- More instances = higher costs
- Scaling takes time, requests queue up
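The relationship between concurrency and instance count can be sketched with a rough lower bound. This assumes every instance is filled to its concurrency limit, which the real autoscaler does not guarantee. The function name is illustrative.

```python
import math


def instances_needed(concurrent_requests, concurrency):
    """Lower bound on instances needed to serve the given concurrent requests."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return math.ceil(concurrent_requests / concurrency)


# 100 concurrent requests at three different concurrency settings
for setting in (1, 10, 80):
    print(setting, instances_needed(100, setting))
```

With concurrency 1 the sketch yields 100 instances for 100 concurrent requests, compared with 2 at the default of 80.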
This should only be used when:
- Processing is truly single-threaded
- Memory-heavy per-request processing
- Using thread-unsafe libraries
Recommended fix:
Set appropriate concurrency
# For I/O-bound workloads (most web apps)
gcloud run deploy my-service \
  --concurrency=80 \
  --max-instances=100

# For CPU-bound workloads
gcloud run deploy my-service \
  --concurrency=4 \
  --cpu=2

# Only use 1 when absolutely necessary
gcloud run deploy my-service \
  --concurrency=1 \
  --max-instances=1000  # Be prepared for many instances
Node.js - use async properly
// With high concurrency, ensure async operations
const express = require('express');
const app = express();

app.get('/api/data', async (req, res) => {
  // All I/O should be async
  const data = await fetchFromDatabase();
  const enriched = await enrichData(data);
  res.json(enriched);
});

// Concurrency 80+ is safe for async I/O workloads
Python - use async framework
from fastapi import FastAPI
import httpx

app = FastAPI()

@app.get("/api/data")
async def get_data():
    # Async I/O allows high concurrency
    async with httpx.AsyncClient() as client:
        response = await client.get("https://api.example.com/data")
        return response.json()

# Concurrency 80+ safe with async framework
Calculate concurrency
concurrency = memory_limit / per_request_memory

Example:
- 512MB container
- 20MB per request overhead
- Safe concurrency: ~25
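The formula above can be turned into a small calculator. The sketch below adds two assumptions that are not in the original formula: a `baseline_mib` parameter for memory the application uses before serving any request, and a `cap` that defaults to the platform default of 80. Both names are illustrative.

```python
def safe_concurrency(limit_mib, per_request_mib, baseline_mib=0, cap=80):
    """Estimate a concurrency setting from a memory budget.

    baseline_mib and cap are illustrative additions to the formula
    concurrency = memory_limit / per_request_memory.
    """
    usable = limit_mib - baseline_mib
    if usable <= 0 or per_request_mib <= 0:
        raise ValueError("memory budget and per-request memory must be positive")
    return max(1, min(cap, usable // per_request_mib))


# The example from the text: 512 MiB container, 20 MiB per request
print(safe_concurrency(512, 20))                    # 25
# Same container once a 112 MiB application baseline is subtracted
print(safe_concurrency(512, 20, baseline_mib=112))  # 20
```

Treat the result as a starting point for load testing, not as a guarantee, since per-request memory usually varies with payload size.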
CPU Throttled When Not Handling Requests
Severity: HIGH
Situation: Running background tasks or processing between requests
Symptoms: Background tasks run extremely slowly. Scheduled work doesn't complete. Metrics collection fails. Connection keep-alive breaks.
Why this breaks: By default, Cloud Run throttles CPU to near-zero when not actively handling a request. This is "CPU only during requests" mode.
Affected operations:
- Background threads
- Connection pool maintenance
- Metrics/telemetry emission
- Scheduled tasks within container
- Cleanup operations after response
Recommended fix:
Enable CPU always allocated
# CPU allocated even outside requests
gcloud run deploy my-service \
  --no-cpu-throttling \
  --min-instances=1

# Note: This increases costs but enables background work
Use startup CPU boost for initialization
# Boost CPU during cold start only
# --cpu-throttling is the default: CPU is throttled outside request handling
gcloud run deploy my-service \
  --cpu-boost \
  --cpu-throttling
Move background work to Cloud Tasks
from google.cloud import tasks_v2
import json

def create_background_task(payload):
    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path(
        "my-project", "us-central1", "my-queue"
    )

    task = {
        "http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            "url": "https://my-service.run.app/process",
            "body": json.dumps(payload).encode(),
            "headers": {"Content-Type": "application/json"}
        }
    }

    client.create_task(parent=parent, task=task)

# Handle response immediately, background via Cloud Tasks
@app.post("/api/order")
async def create_order(order: Order):
    order_id = await save_order(order)

    # Queue background processing
    create_background_task({"order_id": order_id})

    return {"order_id": order_id, "status": "processing"}
Use Pub/Sub for async processing
# Move heavy processing to separate service
steps:
  # Main service - responds quickly
  - name: 'gcr.io/cloud-builders/gcloud'
    args: ['run', 'deploy', 'api-service', '--cpu-throttling']

  # Worker service - processes messages
  - name: 'gcr.io/cloud-builders/gcloud'
    args: ['run', 'deploy', 'worker-service', '--no-cpu-throttling', '--min-instances=1']
VPC Connector 10-Minute Idle Timeout
Severity: MEDIUM
Situation: Cloud Run service connecting to VPC resources
Symptoms: Connection errors after period of inactivity. "Connection reset" or "Connection refused" errors. Sporadic failures to VPC resources. Database connections drop unexpectedly.
Why this breaks: Cloud Run's VPC connector has a 10-minute idle timeout on connections. If a connection is idle for 10 minutes, it's silently closed.
Affects:
- Database connection pools
- Redis connections
- Internal API connections
- Any persistent VPC connection
Recommended fix:
Configure connection pool with keep-alive
# SQLAlchemy with connection recycling
from sqlalchemy import create_engine

engine = create_engine(
    DATABASE_URL,
    pool_size=5,
    max_overflow=2,
    pool_recycle=300,    # Recycle connections every 5 minutes
    pool_pre_ping=True   # Validate connection before use
)
TCP keep-alive for custom connections
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
Redis with connection validation
import socket
import redis

pool = redis.ConnectionPool(
    host=REDIS_HOST,
    port=6379,
    socket_keepalive=True,
    socket_keepalive_options={
        socket.TCP_KEEPIDLE: 60,
        socket.TCP_KEEPINTVL: 60,
        socket.TCP_KEEPCNT: 5
    },
    health_check_interval=30
)

client = redis.Redis(connection_pool=pool)
Use the Cloud SQL Python Connector
# Use Cloud SQL connector which handles reconnection
# requirements.txt
cloud-sql-python-connector[pg8000]

from google.cloud.sql.connector import Connector
import sqlalchemy

connector = Connector()

def getconn():
    # Placeholder values; load real credentials from Secret Manager
    return connector.connect(
        "project:region:instance",
        "pg8000",
        user="user",
        password="password",
        db="database"
    )

engine = sqlalchemy.create_engine(
    "postgresql+pg8000://",
    creator=getconn
)
Container Startup Timeout (4 minutes max)
Severity: HIGH
Situation: Deploying containers with slow initialization
Symptoms: Deployment fails with "Container failed to start". Service never becomes healthy. "Revision failed to become ready" errors. Works locally but fails on Cloud Run.
Why this breaks: Cloud Run expects your container to start listening on PORT within 4 minutes (240 seconds). If it doesn't, the instance is killed.
Common causes:
- Heavy framework initialization (ML models, etc.)
- Waiting for external dependencies at startup
- Large dependency loading
- Database migrations on startup
Recommended fix:
Enable startup CPU boost
gcloud run deploy my-service \
  --cpu-boost
Lazy initialization
from functools import lru_cache
from fastapi import FastAPI

app = FastAPI()

# Don't load at import time
model = None

@lru_cache()
def get_model():
    global model
    if model is None:
        # Load on first request, not at startup
        model = load_heavy_model()
    return model

@app.get("/predict")
async def predict(data: dict):
    model = get_model()  # Loads on first call only
    return model.predict(data)

# Startup is fast - model loads on first request
Start listening immediately
import asyncio
from fastapi import FastAPI, HTTPException
import uvicorn

app = FastAPI()

# Global state for async initialization
initialized = asyncio.Event()

@app.on_event("startup")
async def startup():
    # Start background initialization
    asyncio.create_task(async_init())

async def async_init():
    # Heavy initialization happens after server starts
    await load_models()
    await warm_up_connections()
    initialized.set()

@app.get("/ready")
async def ready():
    if not initialized.is_set():
        raise HTTPException(503, "Still initializing")
    return {"status": "ready"}

@app.get("/health")
async def health():
    # Always respond - health check passes
    return {"status": "healthy"}
Use multi-stage builds
# Build stage - slow
FROM python:3.11 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Runtime stage - fast startup
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
Run migrations separately
# Don't migrate on startup - use Cloud Build
steps:
  # Run migrations first
  - name: 'gcr.io/cloud-builders/gcloud'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        gcloud run jobs execute migrate-job --wait

  # Then deploy
  - name: 'gcr.io/cloud-builders/gcloud'
    args: ['run', 'deploy', 'my-service', ...]
Second Generation Execution Environment Differences
Severity: MEDIUM
Situation: Migrating to or using Cloud Run second-gen execution environment
Symptoms: Network behavior changes. Different syscall support. File system behavior differences. Container behaves differently than in first-gen.
Why this breaks: Cloud Run's second-generation execution environment provides full Linux compatibility instead of the gVisor-based sandbox used by the first generation, so it has different characteristics:
- More Linux syscalls supported
- Full /proc and /sys access
- Different network stack
- No automatic HTTPS redirect
- Different tmp filesystem behavior
Recommended fix:
Explicitly set execution environment
# First generation (legacy)
gcloud run deploy my-service \
  --execution-environment=gen1

# Second generation (recommended for most)
gcloud run deploy my-service \
  --execution-environment=gen2
Handle network differences
# Second-gen doesn't auto-redirect HTTP to HTTPS
from fastapi import FastAPI, Request
from fastapi.responses import RedirectResponse

app = FastAPI()

@app.middleware("http")
async def redirect_https(request: Request, call_next):
    # Check X-Forwarded-Proto header
    if request.headers.get("X-Forwarded-Proto") == "http":
        url = request.url.replace(scheme="https")
        return RedirectResponse(url, status_code=301)
    return await call_next(request)
GPU access (second-gen only)
# GPUs only available in second-gen
gcloud run deploy ml-service \
  --execution-environment=gen2 \
  --gpu=1 \
  --gpu-type=nvidia-l4
Check execution environment
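import os

def get_execution_environment():
    # Heuristic only: gVisor is the first-generation sandbox, so a gVisor
    # marker indicates gen1. /proc/version may not expose such a marker,
    # so treat the result as a hint and prefer the value set at deploy time.
    try:
        with open('/proc/version', 'r') as f:
            version = f.read()
        if 'gVisor' in version:
            return 'gen1'
    except OSError:
        pass
    return 'unknown'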
Request Timeout Configuration Mismatch
Severity: MEDIUM
Situation: Long-running requests or background processing
Symptoms: Requests terminated before completion. 504 Gateway Timeout errors. Processing stops unexpectedly. Inconsistent timeout behavior.
Why this breaks: Cloud Run has multiple timeout configurations that must align:
- Request timeout (default 300s, max 3600s, i.e. 60 minutes, for both HTTP and gRPC)
- Client timeout
- Downstream service timeouts
- Load balancer timeout (for external access)
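One way to catch a mismatch before deploying is to compare the values in a small check. The ordering rules below are an assumption for illustration (client and load balancer should wait at least as long as the Cloud Run request timeout, and downstream calls should finish within it); the function name is hypothetical.

```python
MAX_REQUEST_TIMEOUT_S = 3600  # maximum request timeout stated in the list above


def find_timeout_mismatches(client_s, load_balancer_s, request_s, downstream_s):
    """Return descriptions of mismatches; an empty list means the values line up."""
    problems = []
    if request_s > MAX_REQUEST_TIMEOUT_S:
        problems.append("request timeout exceeds the 3600s maximum")
    if client_s < request_s:
        problems.append("client gives up before the Cloud Run request timeout")
    if load_balancer_s < request_s:
        problems.append("load balancer cuts the request before Cloud Run does")
    if downstream_s > request_s:
        problems.append("downstream call can outlive the request that made it")
    return problems


# A 15-minute request timeout behind a client and load balancer that wait 30s
for problem in find_timeout_mismatches(
    client_s=30, load_balancer_s=30, request_s=900, downstream_s=60
):
    print(problem)
```

In this example the check reports the client and the load balancer, which matches the 504 symptom described above.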
Recommended fix:
Set consistent timeouts
# Increase request timeout (max 3600s for HTTP)
gcloud run deploy my-service \
  --timeout=900  # 15 minutes
Handle long-running with webhooks
from fastapi import FastAPI, BackgroundTasks
import httpx

app = FastAPI()

@app.post("/process")
async def process(data: dict, background_tasks: BackgroundTasks):
    task_id = create_task_id()

    # Start background processing
    background_tasks.add_task(
        long_running_process,
        task_id,
        data,
        data.get("callback_url")
    )

    # Return immediately
    return {"task_id": task_id, "status": "processing"}

async def long_running_process(task_id, data, callback_url):
    result = await heavy_computation(data)

    # Callback when done
    if callback_url:
        async with httpx.AsyncClient() as client:
            await client.post(callback_url, json={
                "task_id": task_id,
                "result": result
            })
Use Cloud Tasks for reliable long-running
import json
from google.cloud import tasks_v2

# PROJECT and REGION are placeholders for the project ID and queue location
def create_long_running_task(data):
    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path(PROJECT, REGION, "long-tasks")

    task = {
        "http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            "url": "https://worker.run.app/process",
            "body": json.dumps(data).encode(),
            "headers": {"Content-Type": "application/json"}
        },
        "dispatch_deadline": {"seconds": 1800}  # 30 min
    }

    return client.create_task(parent=parent, task=task)
Streaming for long responses
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/large-report")
async def large_report():
    async def generate():
        for chunk in process_large_data():
            yield chunk

    return StreamingResponse(generate(), media_type="text/plain")
Validation Checks
Hardcoded GCP Credentials
Severity: ERROR
GCP credentials must never be hardcoded in source code
Message: Hardcoded GCP service account credentials. Use Secret Manager or Workload Identity.
GCP API Key in Source Code
Severity: ERROR
API keys should use Secret Manager
Message: Hardcoded GCP API key. Use Secret Manager.
Credentials JSON File in Repository
Severity: ERROR
Service account JSON files should not be in source control
Message: Credentials file detected. Add to .gitignore and use Secret Manager.
Running as Root User
Severity: WARNING
Containers should not run as root for security
Message: Dockerfile runs as root. Add USER directive for security.
Missing Health Check in Dockerfile
Severity: INFO
Cloud Run uses HTTP health checks, Dockerfile HEALTHCHECK is optional
Message: No HEALTHCHECK in Dockerfile. Cloud Run uses its own health checks.
Hardcoded Port in Application
Severity: WARNING
Port should come from PORT environment variable
Message: Hardcoded port. Use PORT environment variable for Cloud Run.
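A minimal Python sketch of code that passes this check reads the port from the environment and falls back to 8080 only for local runs. The helper name `get_port` and the injectable `environ` parameter are illustrative choices that make the function easy to test.

```python
import os


def get_port(environ=None, default=8080):
    """Read the listening port from PORT, falling back to a local default."""
    environ = os.environ if environ is None else environ
    port = int(environ.get("PORT", default))
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return port


print(get_port({"PORT": "9090"}))  # 9090
print(get_port({}))                # 8080
```

The returned value is what the server should bind to, for example as the `port` argument of whichever web server the application uses.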
Large File Writes to /tmp
Severity: WARNING
/tmp uses container memory, large writes can cause OOM
Message: /tmp writes consume memory. Consider Cloud Storage for large files.
Synchronous File Operations
Severity: WARNING
Sync file ops block the event loop in async apps
Message: Synchronous file operations. Use async versions for better concurrency.
Global Mutable State
Severity: WARNING
Global state issues with concurrent requests
Message: Global mutable state may cause issues with concurrent requests.
Thread-Unsafe Singleton Pattern
Severity: WARNING
Singletons need thread safety for concurrency > 1
Message: Singleton pattern - ensure thread safety if using concurrency > 1.
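One common way to satisfy this check in threaded Python code is double-checked locking around the lazy initializer. The sketch below uses a hypothetical `ExpensiveClient` class as a stand-in for a real client such as a database or API client.

```python
import threading


class ExpensiveClient:
    """Hypothetical stand-in for a client that is costly to construct."""

    instances_created = 0

    def __init__(self):
        ExpensiveClient.instances_created += 1


_client = None
_client_lock = threading.Lock()


def get_client():
    """Return the shared client, creating it at most once across threads."""
    global _client
    if _client is None:              # fast path, no lock once initialized
        with _client_lock:
            if _client is None:      # re-check while holding the lock
                _client = ExpensiveClient()
    return _client


# Simulate concurrent requests all asking for the client at once
threads = [threading.Thread(target=get_client) for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(ExpensiveClient.instances_created)  # 1
```

The lock is only contended during the first initialization; later calls return the existing instance without acquiring it.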
Collaboration
Delegation Triggers
- user needs AWS serverless -> aws-serverless (Lambda, API Gateway, SAM)
- user needs Azure containers -> azure-functions (Azure Container Apps, Functions)
- user needs database design -> postgres-wizard (Cloud SQL design, AlloyDB)
- user needs authentication -> auth-specialist (Firebase Auth, Identity Platform)
- user needs AI integration -> llm-architect (Vertex AI, Cloud Run + LLM)
- user needs workflow orchestration -> workflow-automation (Cloud Workflows, Eventarc)
When to Use
Use this skill when the request clearly matches the capabilities and patterns described above.
Limitations
- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.