Use the `bigquery` CLI (instead of `bq`) for all Google BigQuery and GCP data warehouse operations, including SQL query execution, data ingestion (streaming insert, bulk load, JSONL/CSV/Parquet), data extraction/export, dataset/table/view management, external tables, schema operations, query templates, cost estimation with dry-run, authentication with gcloud, data pipelines, ETL workflows, and MCP/LSP server integration for AI-assisted querying and editor support. It is a modern Rust-based replacement for the Python `bq` CLI with faster startup, better cost awareness, and streaming support. It handles both small-scale streaming inserts (<1000 rows) and large-scale bulk loading (>10MB files), with support for Cloud Storage integration.
```bash
# Clone the full registry
git clone https://github.com/majiayu000/claude-skill-registry

# Or install just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/bigquery" ~/.claude/skills/majiayu000-claude-skill-registry-bigquery && rm -rf "$T"
```
skills/data/bigquery/SKILL.md

BigQuery CLI Skill
You are a BigQuery specialist using the `bigquery` CLI tool. This skill provides comprehensive guidance for working with Google BigQuery through a unified Rust-based CLI with query execution, template management, and server modes.
Core Capabilities
The `bigquery` CLI provides:
- Authentication: Check status and login with gcloud
- Query Execution: Run SQL queries with cost awareness and confirmation prompts
- Dry Run: Estimate query costs without execution
- Dataset Operations: List datasets in a project
- Table Operations: List, describe, insert, load, and manage external tables
- Template System: Named query templates with parameter substitution
- MCP Server: Semantic search via stdio or HTTP modes
- LSP Server: SQL language server for editor integration
Authentication
Check Authentication Status
```bash
# Check if authenticated and verify required scopes
bigquery auth check

# Will show:
# - Authentication status
# - Active account
# - BigQuery scopes availability
```
Login with gcloud
```bash
# Authenticate with gcloud including all required BigQuery scopes
bigquery auth login

# This will:
# 1. Run gcloud auth login
# 2. Ensure all necessary BigQuery scopes are granted
# 3. Verify authentication succeeded
```
Best Practice: Always run `bigquery auth check` first to verify authentication before operations.
Query Operations
Running Queries
```bash
# Basic query execution (interactive cost confirmation)
bigquery query "SELECT * FROM dataset.table LIMIT 10"

# Skip cost confirmation for automation
bigquery query --yes "SELECT COUNT(*) FROM dataset.table"

# JSON output (default)
bigquery query "SELECT * FROM dataset.table LIMIT 5"

# Text/table output
bigquery query --format text "SELECT * FROM dataset.table LIMIT 5"
```
Cost Awareness: The query command automatically:
- Estimates query cost before execution
- Displays bytes to be processed
- Prompts for confirmation (unless `--yes` is used)
- Prevents accidental expensive queries
Query Output Formats
```bash
# JSON output (default, machine-readable)
bigquery query "SELECT * FROM dataset.table"
bigquery query --format json "SELECT * FROM dataset.table"

# Text output (human-readable table)
bigquery query --format text "SELECT * FROM dataset.table"
```
Dry Run (Cost Estimation)
```bash
# Estimate cost without executing
bigquery dry-run "SELECT * FROM large_dataset.table WHERE date >= '2025-01-01'"

# Returns:
# - Bytes that would be processed
# - Estimated cost
# - No actual data
```
Use dry-run to:
- Estimate costs before running expensive queries
- Validate query syntax
- Check partition pruning effectiveness
- Test queries in CI/CD pipelines
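For the CI/CD case above, a minimal sketch is shown below. It assumes `bigquery dry-run` exits non-zero when a query is invalid and that your checked-in queries live in a hypothetical `queries/` directory:

```bash
#!/bin/bash
# validate-queries.sh - fail the build if any checked-in query does not dry-run cleanly.
# Assumptions: dry-run returns a non-zero exit code on invalid SQL; queries/ holds .sql files.
set -euo pipefail

for f in queries/*.sql; do
  echo "Validating $f"
  bigquery dry-run "$(cat "$f")" || { echo "Dry run failed for $f"; exit 1; }
done
echo "All queries validated"
```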
Dataset Operations
Listing Datasets
```bash
# List datasets in current project (text format, default)
bigquery datasets list my-project

# JSON output
bigquery datasets list my-project --format json

# Example output shows:
# - Dataset ID
# - Location
# - Creation time
# - Labels (if any)
```
Note: Dataset reference format is `project.dataset`, or just `project` to list all datasets.
Table Operations
Listing Tables
```bash
# List tables in a dataset (text format, first 10)
bigquery tables list my-project.my-dataset

# JSON output
bigquery tables list my-project.my-dataset --format json

# Limit results
bigquery tables list my-project.my-dataset --limit 20

# Maximum limit is 100
bigquery tables list my-project.my-dataset --limit 100
```
Describing Table Schema
```bash
# Show table schema and metadata (text format)
bigquery tables describe my-project.my-dataset.my-table

# JSON output
bigquery tables describe my-project.my-dataset.my-table --format json

# Output includes:
# - Column names and types
# - Nullability (NULLABLE, REQUIRED, REPEATED)
# - Mode information
# - Table metadata
```
Inserting Rows (Small Datasets)
Best for <1000 rows. Uses streaming insert API for immediate availability.
JSONL (Newline-Delimited JSON) Format
From JSONL File:
```bash
# Create sample JSONL file
cat > users.jsonl <<EOF
{"id": "1", "name": "Alice Johnson", "email": "alice@example.com", "age": 30}
{"id": "2", "name": "Bob Smith", "email": "bob@example.com", "age": 25}
{"id": "3", "name": "Charlie Brown", "email": "charlie@example.com", "age": 35}
EOF

# Insert from JSONL file
bigquery tables insert my-project.dataset.users \
  --data users.jsonl --format json
```
From JSONL Stream (stdin):
```bash
# Stream from command output
echo '{"id": "1", "name": "Alice", "email": "alice@example.com"}' | \
  bigquery tables insert my-project.dataset.users --data - --format json

# Stream from multiple sources (heredoc)
cat << EOF | bigquery tables insert my-project.dataset.users --data - --format json
{"id": "1", "name": "Alice", "email": "alice@example.com", "age": 30}
{"id": "2", "name": "Bob", "email": "bob@example.com", "age": 25}
{"id": "3", "name": "Charlie", "email": "charlie@example.com", "age": 35}
EOF

# Stream from application output
my-etl-tool --output jsonl | bigquery tables insert my-project.dataset.events --data -

# Stream from compressed file
gunzip -c logs.jsonl.gz | bigquery tables insert my-project.dataset.logs --data -

# Stream from jq transformation
cat raw_data.json | jq -c '.records[]' | \
  bigquery tables insert my-project.dataset.processed --data -
```
JSONL Format Requirements:
- Each line is a separate JSON object
- Empty lines are automatically skipped
- No commas between objects
- Ideal for streaming and large datasets
- Format: `{"field1":"value1","field2":"value2"}\n`
CSV Format
From CSV File:
```bash
# Create sample CSV file
cat > users.csv <<EOF
id,name,email,age
1,Alice Johnson,alice@example.com,30
2,Bob Smith,bob@example.com,25
3,"Charlie Brown, Jr.",charlie@example.com,35
EOF

# Insert from CSV file
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv
```
From CSV Stream (stdin):
```bash
# Stream from heredoc
cat << EOF | bigquery tables insert my-project.dataset.users --data - --format csv
id,name,email,age
1,Alice Johnson,alice@example.com,30
2,Bob Smith,bob@example.com,25
3,Charlie Brown,charlie@example.com,35
EOF

# Stream from application output
./generate_report.sh | bigquery tables insert my-project.dataset.reports --data - --format csv

# Stream from compressed CSV
gunzip -c data.csv.gz | bigquery tables insert my-project.dataset.imports --data - --format csv

# Stream from curl/API response
curl -s https://api.example.com/export.csv | \
  bigquery tables insert my-project.dataset.api_data --data - --format csv

# Transform and stream CSV
cat raw.csv | tail -n +2 | awk '{print tolower($0)}' | \
  bigquery tables insert my-project.dataset.cleaned --data - --format csv
```
CSV Format Requirements:
- First row must contain column headers matching BigQuery table schema
- Values are inserted as strings (BigQuery will coerce types)
- Supports quoted fields, escaped quotes, and newlines (RFC 4180 compliant)
- Headers are case-sensitive and must match table column names
Additional Insert Options
```bash
# Insert inline JSON (single object)
bigquery tables insert my-project.dataset.users \
  --json '{"id": "1", "name": "Alice", "email": "alice@example.com"}'

# Insert inline JSON array
bigquery tables insert my-project.dataset.users \
  --json '[{"id": "1", "name": "Alice"}, {"id": "2", "name": "Bob"}]'

# Dry-run validation (no data inserted)
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv --dry-run

# Skip invalid rows instead of failing
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv --skip-invalid

# Ignore unknown fields in data
bigquery tables insert my-project.dataset.users \
  --data users.csv --format csv --ignore-unknown

# Combine options for production pipelines
cat production_data.jsonl | \
  bigquery tables insert my-project.dataset.production \
  --data - --format json \
  --skip-invalid \
  --ignore-unknown
```
Insert Options:
- `--json <JSON>`: Inline JSON data (object or array)
- `--data <PATH>`: Path to data file, or `-` for stdin
- `--format <FORMAT>`: Data format (json or csv, default: json)
- `--dry-run`: Validate without inserting
- `--skip-invalid`: Skip invalid rows instead of failing
- `--ignore-unknown`: Ignore unknown fields in data
- `--yes`: Skip confirmation prompts
Loading Data (Large Datasets)
Best for >10MB files or >1000 rows. Uses BigQuery load jobs.
⚠️ IMPORTANT: Local file loading requires GCS staging bucket configuration.
- If you get "The specified bucket does not exist" error, use `tables insert` for datasets <1000 rows instead
- For larger datasets, upload to GCS first, then use `bigquery tables load gs://...`
```bash
# Load from Cloud Storage URI (RECOMMENDED - no bucket config needed)
bigquery tables load my-project.dataset.users \
  gs://my-bucket/data.csv --format csv

# Load from local CSV file (requires GCS staging bucket configured)
bigquery tables load my-project.dataset.users data.csv --format csv

# Load with schema auto-detection
bigquery tables load my-project.dataset.new_table data.csv \
  --format csv --autodetect

# Load with replace write disposition (truncates table first)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --write-disposition replace

# Load JSON file
bigquery tables load my-project.dataset.events events.json \
  --format json

# Supported formats: csv, json, avro, parquet, orc
bigquery tables load my-project.dataset.table data.parquet \
  --format parquet

# Dry-run validation (no data loaded)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --dry-run

# Allow some bad records (skip up to 100 invalid rows)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --max-bad-records 100

# Ignore unknown fields
bigquery tables load my-project.dataset.users data.csv \
  --format csv --ignore-unknown

# Skip confirmation prompts (for automation/CI)
bigquery tables load my-project.dataset.users data.csv \
  --format csv --write-disposition replace --yes
```
Load Job Features:
- GCS Staging Bucket Required: Local file loading needs GCS bucket configuration
- Real-time progress tracking with exponential backoff
- Automatic cleanup of temporary files after completion
- Write modes: `append` (default) or `replace` (truncate first)
- Safety confirmations for destructive operations
- Configurable error tolerance with `--max-bad-records`
When to Use:
- Large datasets (>1000 rows or >10MB)
- Data already in Cloud Storage
- Bulk data migrations
When NOT to Use:
- Small datasets (<1000 rows) → Use `tables insert` instead (no GCS required)
- No GCS staging bucket configured → Use `tables insert` or upload to GCS first
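One way to encode the insert-vs-load decision in a pipeline script is sketched below, assuming GNU coreutils (`stat -c%s`), a JSONL input file, and a hypothetical staging bucket; the thresholds mirror the guidance above:

```bash
#!/bin/bash
# ingest.sh FILE TABLE - pick streaming insert vs load job based on data size (sketch).
set -euo pipefail

FILE="$1"                          # local JSONL file
TABLE="$2"                         # project.dataset.table
BUCKET="gs://my-staging-bucket"    # hypothetical staging bucket

rows=$(wc -l < "$FILE")
bytes=$(stat -c%s "$FILE")         # GNU stat; use `stat -f%z` on macOS

if [ "$rows" -lt 1000 ] && [ "$bytes" -lt 10000000 ]; then
  # Small: streaming insert, no GCS required
  bigquery tables insert "$TABLE" --data "$FILE" --format json --yes
else
  # Large: stage the file in GCS, then run a load job
  gsutil cp "$FILE" "$BUCKET/"
  bigquery tables load "$TABLE" "$BUCKET/$(basename "$FILE")" --format json --yes
fi
```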
Load Options:
- `--format <FORMAT>`: csv, json, avro, parquet, orc (default: csv)
- `--write-disposition <DISPOSITION>`: append or replace (default: append)
- `--autodetect`: Auto-detect schema from source files
- `--dry-run`: Validate without loading
- `--max-bad-records <N>`: Maximum bad records before failing
- `--ignore-unknown`: Ignore unknown fields
- `--yes`: Skip confirmation prompts
Extracting Data
Export table data to Cloud Storage in various formats:
```bash
# Extract table to Cloud Storage as CSV
bigquery tables extract my-project.dataset.users \
  gs://my-bucket/exports/users.csv --format csv

# Extract as JSON
bigquery tables extract my-project.dataset.events \
  gs://my-bucket/exports/events-*.json --format json

# Extract with compression
bigquery tables extract my-project.dataset.large_table \
  gs://my-bucket/exports/data-*.csv.gz --format csv --compression gzip

# Extract as Avro with Snappy compression
bigquery tables extract my-project.dataset.events \
  gs://my-bucket/exports/events-*.avro --format avro --compression snappy

# Extract as Parquet
bigquery tables extract my-project.dataset.analytics \
  gs://my-bucket/exports/analytics.parquet --format parquet

# CSV with custom delimiter and header
bigquery tables extract my-project.dataset.data \
  gs://my-bucket/data.csv \
  --format csv \
  --field-delimiter "|" \
  --print-header

# Dry-run to validate configuration
bigquery tables extract my-project.dataset.users \
  gs://my-bucket/users.csv --format csv --dry-run

# Skip confirmation prompt
bigquery tables extract my-project.dataset.large \
  gs://my-bucket/export.csv --format csv --yes
```
Supported Formats: CSV, JSON (newline-delimited), Avro, Parquet
Compression: none, gzip, snappy (Avro/Parquet only)
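Extracts always land in Cloud Storage; if you need the data locally, a small follow-up step with `gsutil` can pull the exported files down. A sketch (bucket and paths hypothetical):

```bash
# Export the table to GCS as gzipped CSV shards, then copy them locally
bigquery tables extract my-project.dataset.users \
  gs://my-bucket/exports/users-*.csv.gz --format csv --compression gzip --yes

# Download the shards produced by the extract job
mkdir -p ./exports
gsutil cp "gs://my-bucket/exports/users-*.csv.gz" ./exports/
```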
External Tables
External tables reference data in Cloud Storage without copying it to BigQuery.
Creating External Tables
```bash
# Create CSV external table
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/data.csv \
  --format csv \
  --schema "id:INTEGER,name:STRING,created_at:TIMESTAMP"

# Create with auto-detected schema
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/data.csv \
  --format csv \
  --autodetect

# Multiple source URIs (comma-separated)
bigquery tables create-external my-project.dataset.external_table \
  --source-uri "gs://bucket/file1.csv,gs://bucket/file2.csv" \
  --format csv \
  --autodetect

# Multiple source URIs (multiple flags)
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/file1.csv \
  --source-uri gs://bucket/file2.csv \
  --format csv \
  --autodetect

# CSV-specific options
bigquery tables create-external my-project.dataset.external_table \
  --source-uri gs://bucket/data.csv \
  --format csv \
  --schema "id:INTEGER,name:STRING" \
  --field-delimiter "," \
  --skip-leading-rows 1

# Other formats (Parquet, JSON, Avro, ORC)
bigquery tables create-external my-project.dataset.parquet_table \
  --source-uri gs://bucket/data.parquet \
  --format parquet \
  --autodetect

bigquery tables create-external my-project.dataset.json_table \
  --source-uri gs://bucket/data.jsonl \
  --format json \
  --autodetect
```
External Table Options:
- `--source-uri <URI>`: Cloud Storage URI(s) - required
- `--format <FORMAT>`: csv, json, avro, parquet, orc - required
- `--schema <SCHEMA>`: Schema definition (column:type,column:type,...)
- `--autodetect`: Auto-detect schema from source files
- `--field-delimiter <DELIMITER>`: CSV field delimiter (default: ,)
- `--skip-leading-rows <N>`: CSV header rows to skip
Updating External Tables
```bash
# Update source URIs
bigquery tables update-external my-project.dataset.external_table \
  --source-uri gs://bucket/new-data.csv

# Update schema
bigquery tables update-external my-project.dataset.external_table \
  --schema "id:INTEGER,name:STRING,email:STRING"

# Update CSV options
bigquery tables update-external my-project.dataset.external_table \
  --field-delimiter "|" \
  --skip-leading-rows 2

# Update multiple properties
bigquery tables update-external my-project.dataset.external_table \
  --source-uri gs://bucket/new-data.csv \
  --schema "id:INTEGER,name:STRING,updated_at:TIMESTAMP" \
  --skip-leading-rows 1
```
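A common pattern is repointing an external table at the newest objects on a schedule. A sketch that could run from cron, with a hypothetical table name and date-partitioned object names:

```bash
#!/bin/bash
# refresh-external.sh - repoint the external table at today's files in GCS (sketch)
set -euo pipefail

TODAY=$(date +%Y-%m-%d)

bigquery tables update-external my-project.staging.raw_logs \
  --source-uri "gs://logs-bucket/${TODAY}-*.json"
```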
Template System
Named query templates allow you to save frequently-used queries with parameter placeholders.
Listing Templates
```bash
# List all available templates (text format)
bigquery templates list

# JSON output
bigquery templates list --format json

# Shows:
# - Template name
# - Description
# - Parameters
# - Query preview
```
Searching Templates
```bash
# Search by name or description
bigquery templates search "customer"
bigquery templates search "daily metrics"

# JSON output
bigquery templates search "analytics" --format json
```
Validating Templates
```bash
# Validate template for parameter consistency
bigquery templates validate my-template

# Checks:
# - Parameter definitions match query placeholders
# - Required parameters are defined
# - Parameter types are valid
```
Running Templates
```bash
# Run template with default parameters
bigquery templates run my-template

# Override parameters
bigquery templates run daily-report \
  --param date=2025-01-15 \
  --param region=US

# Multiple parameters
bigquery templates run customer-analysis \
  --param customer_id=CUST123 \
  --param start_date=2025-01-01 \
  --param end_date=2025-01-31

# JSON output
bigquery templates run my-template --format json

# Skip cost confirmation
bigquery templates run expensive-query --yes
```
Template Run Options:
- `--param <KEY=VALUE>`: Parameter override (can be used multiple times)
- `--format <FORMAT>`: Output format (json or text, default: json)
- `--yes`: Skip cost confirmation prompt
Template Workflow Example
```bash
# 1. Search for templates
bigquery templates search "revenue"

# 2. Validate template before running
bigquery templates validate monthly-revenue

# 3. Run with parameters
bigquery templates run monthly-revenue \
  --param month=2025-01 \
  --param min_amount=1000

# 4. Run in automation (skip confirmation)
bigquery templates run monthly-revenue \
  --param month=2025-01 \
  --yes \
  --format json > output.json
```
Use templates for:
- Standardized reporting queries
- Common analytics patterns
- Scheduled data pipelines
- Team query sharing
- Reducing query errors
MCP Server Integration
The BigQuery MCP server provides semantic search and natural language query capabilities via Model Context Protocol.
Starting MCP Server
STDIO Mode (for local clients):
```bash
# Start MCP server in stdio mode
bigquery mcp stdio

# Server will:
# - Accept MCP protocol messages on stdin
# - Send responses on stdout
# - Expose BigQuery tools to MCP clients
```
HTTP Mode (for network clients):
```bash
# Start HTTP MCP server on default port 8080
bigquery mcp http

# Specify custom port
bigquery mcp http --port 3000

# Server provides:
# - HTTP endpoint for MCP protocol
# - JSON-RPC over HTTP
# - Remote access to BigQuery tools
```
MCP Server Capabilities
The MCP server exposes these tools through the Model Context Protocol:
- semantic_search: Search tables using natural language
- execute_query: Run SQL queries with automatic formatting
- get_schema: Retrieve table schemas
- list_tables: List available tables
- list_datasets: List available datasets
- explain_query: Get query execution plan
- optimize_query: Suggest query optimizations
- run_template: Execute named templates with parameters
MCP Configuration
Configure in Claude Code or other MCP-enabled applications:
STDIO Mode (`.claude/mcp.json` or similar):
{ "mcpServers": { "bigquery": { "command": "bigquery", "args": ["mcp", "stdio"], "env": { "GOOGLE_CLOUD_PROJECT": "my-project" } } } }
HTTP Mode:
{ "mcpServers": { "bigquery": { "url": "http://localhost:8080", "transport": "http" } } }
MCP Usage Patterns
When using BigQuery MCP through clients:
Semantic Search:
"Find all tables containing customer purchase data from the last 30 days" → MCP translates to appropriate SQL query
Schema Discovery:
"What columns are in the analytics.events table?" → MCP returns schema information
Natural Language Queries:
"Show me total revenue by region for Q1 2025" → MCP generates and executes SQL
Template Execution:
"Run the monthly revenue template for January 2025" → MCP executes template with parameters
LSP Integration
The BigQuery LSP provides SQL language features in text editors.
Starting LSP Server
```bash
# Start LSP server
bigquery lsp

# Server provides:
# - Language Server Protocol communication
# - SQL syntax validation
# - Schema-aware completions
# - Query formatting
# - Hover documentation
```
LSP Features
- SQL syntax highlighting: Proper tokenization and highlighting
- Schema completion: Table and column suggestions based on project schema
- Query validation: Real-time syntax and semantic checks
- Hover documentation: Table and column info on hover
- Go to definition: Navigate to table definitions
- Query formatting: Auto-format SQL queries
- Diagnostics: Show errors and warnings inline
Editor Configuration
Neovim:
```lua
-- In nvim/lua/bigquery-lsp.lua or init.lua
vim.api.nvim_create_autocmd("FileType", {
  pattern = { "sql", "bq", "bigquery" },
  callback = function()
    vim.lsp.start({
      name = "bigquery-lsp",
      cmd = { "bigquery", "lsp" },
      root_dir = vim.fn.getcwd(),
    })
  end,
})
```
VS Code (in `settings.json` or language server config):
{ "bigquery-lsp": { "command": "bigquery", "args": ["lsp"], "filetypes": ["sql", "bq", "bigquery"] } }
Helix (in `languages.toml`):
```toml
[[language]]
name = "sql"
language-servers = ["bigquery-lsp"]

[language-server.bigquery-lsp]
command = "bigquery"
args = ["lsp"]
```
Common Workflows
Workflow 1: Exploratory Data Analysis
```bash
# 1. Verify authentication
bigquery auth check

# 2. List available datasets
bigquery datasets list my-project

# 3. List tables in dataset
bigquery tables list my-project.analytics

# 4. Check table schema
bigquery tables describe my-project.analytics.events

# 5. Preview data (text format for readability)
bigquery query --format text \
  "SELECT * FROM my-project.analytics.events LIMIT 10"

# 6. Get row count
bigquery query "SELECT COUNT(*) as total FROM my-project.analytics.events"

# 7. Check data distribution
bigquery query --format text "
  SELECT DATE(timestamp) as date, COUNT(*) as events
  FROM my-project.analytics.events
  GROUP BY date
  ORDER BY date DESC
  LIMIT 30
"
```
Workflow 2: Cost-Aware Query Development
```bash
# 1. Dry run to estimate cost
bigquery dry-run "
  SELECT *
  FROM my-project.large_dataset.table
  WHERE date >= '2025-01-01'
"

# 2. If cost is acceptable, run query
bigquery query "
  SELECT *
  FROM my-project.large_dataset.table
  WHERE date >= '2025-01-01'
"

# 3. For automation, skip confirmation
bigquery query --yes "
  SELECT *
  FROM my-project.large_dataset.table
  WHERE date >= '2025-01-01'
" > results.json
```
Workflow 3: Template-Based Reporting
```bash
# 1. Search for relevant templates
bigquery templates search "daily"

# 2. Validate template
bigquery templates validate daily-metrics

# 3. Run template with parameters
bigquery templates run daily-metrics \
  --param date=$(date +%Y-%m-%d) \
  --param region=US \
  --format json > daily-report.json

# 4. Schedule in cron or CI/CD
# 0 1 * * * bigquery templates run daily-metrics --param date=$(date +%Y-%m-%d) --yes
```
Workflow 4: External Data Analysis
```bash
# 1. Create external table pointing to GCS
bigquery tables create-external my-project.staging.raw_logs \
  --source-uri gs://logs-bucket/2025-01-*.json \
  --format json \
  --autodetect

# 2. Query external table
bigquery query "
  SELECT timestamp, user_id, action
  FROM my-project.staging.raw_logs
  WHERE action = 'purchase'
  LIMIT 100
"

# 3. Update external table when new files arrive
bigquery tables update-external my-project.staging.raw_logs \
  --source-uri gs://logs-bucket/2025-02-*.json
```
Workflow 5: Data Loading Pipeline
```bash
# 1. Load initial data
bigquery tables load my-project.dataset.events \
  gs://bucket/events-2025-01-01.csv \
  --format csv \
  --write-disposition replace

# 2. Append incremental data
bigquery tables load my-project.dataset.events \
  gs://bucket/events-2025-01-02.csv \
  --format csv \
  --write-disposition append

# 3. Verify data loaded
bigquery query "
  SELECT DATE(timestamp) as date, COUNT(*) as count
  FROM my-project.dataset.events
  GROUP BY date
  ORDER BY date
"
```
Workflow 6: Real-Time Data Insertion
```bash
# 1. Insert single event (inline JSON)
bigquery tables insert my-project.dataset.events \
  --json '{"user_id": "U123", "event": "click", "timestamp": "2025-01-15T10:00:00Z"}'

# 2. Stream JSONL from application
my-app --output jsonl | bigquery tables insert my-project.dataset.events --data - --format json

# 3. Insert batch from JSONL file
bigquery tables insert my-project.dataset.events \
  --data events.jsonl --format json

# 4. Stream with transformation and error handling
cat raw_events.json | jq -c '.events[]' | \
  bigquery tables insert my-project.dataset.events \
  --data - --format json \
  --skip-invalid \
  --ignore-unknown
```
Best Practices
Query Development
- Always dry-run first: Use `bigquery dry-run` to estimate costs
- Use templates: Create templates for repeated queries
- Validate before running: Check syntax and cost before execution
- Use text format for exploration: `--format text` for human-readable tables
- Use JSON for automation: `--format json` for machine processing
- Skip confirmations in scripts: Use the `--yes` flag for automation
Cost Management
- Dry run expensive queries: Always estimate with `bigquery dry-run`
- Monitor bytes processed: Check query cost estimates before running
- Use partition pruning: Filter on partitioned columns in WHERE clauses
- Limit result sets: Use LIMIT for exploratory queries
- Use templates: Standardize queries to avoid mistakes
- Leverage external tables: Avoid copying data when querying directly from GCS
Authentication
- Check auth first: Run `bigquery auth check` before operations
- Use service accounts: For automation and CI/CD
- Verify scopes: Ensure all required BigQuery scopes are granted
- Re-authenticate when needed: `bigquery auth login` if check fails
Template Management
- Use descriptive names: Make templates easy to find
- Document parameters: Include parameter descriptions in templates
- Validate before use: Run `bigquery templates validate` before execution
- Search before creating: Check if a similar template exists
- Version control templates: Store template definitions in git
Data Loading
- Choose the right method:
  - Use `insert` for <1000 rows (streaming insert API, immediate availability)
  - Use `load` for >10MB files or >1000 rows (load jobs with GCS upload)
- Use JSONL for streaming: Newline-delimited JSON is ideal for streaming pipelines
- Stream from stdin: Use `--data -` to pipe data from applications or transformations
- Validate before loading: Use the `--dry-run` flag to test configurations
- Handle bad records: Set `--max-bad-records` for messy data
- Choose write disposition: `replace` for full refresh, `append` for incremental
- Use external tables: For data that changes frequently in GCS (no data copying)
- Use appropriate formats: CSV for simple data, JSON/JSONL for complex, Parquet/Avro for large datasets
MCP Server
- Use stdio for local: Prefer stdio mode for local MCP clients
- Use HTTP for remote: Use HTTP mode for networked deployments
- Secure HTTP endpoints: Put HTTP server behind authentication/firewall
- Monitor server logs: Check for errors and performance issues
- Set appropriate port: Choose non-conflicting port for HTTP mode
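One lightweight way to keep the HTTP endpoint off the open network is to block the port from outside (firewall) and reach it over an SSH tunnel. A sketch, with hypothetical host names:

```bash
# On the server: run the MCP HTTP endpoint on its default port
# (restrict inbound access to this port with a firewall)
bigquery mcp http --port 8080

# On the client: forward local port 8080 to the server's localhost:8080 over SSH
ssh -N -L 8080:localhost:8080 user@mcp-host.example.com

# The client can now point its MCP HTTP transport at http://localhost:8080
```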
LSP Integration
- Configure per-project: Set up LSP for SQL files in your editor
- Use schema completion: Leverage auto-complete for table/column names
- Check diagnostics: Fix errors and warnings shown inline
- Format queries: Use LSP formatting for consistent style
Configuration
Environment Variables
```bash
# Set default project
export GOOGLE_CLOUD_PROJECT=my-project

# Set credentials (for service accounts)
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Add to ~/.zshrc or ~/.bashrc for persistence
echo 'export GOOGLE_CLOUD_PROJECT=my-project' >> ~/.zshrc
```
Authentication Methods
User Credentials (interactive):
```bash
bigquery auth login
# Opens browser for Google authentication
```
Service Account (automation):
```bash
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/sa-key.json
bigquery auth check
```
Application Default Credentials (gcloud):
```bash
gcloud auth application-default login
bigquery auth check
```
Troubleshooting
Issue: "Not authenticated" or "Permission denied"
Solution: Check authentication and scopes
```bash
# Check current auth status
bigquery auth check

# Re-authenticate if needed
bigquery auth login

# Verify gcloud is set to correct project
gcloud config get-value project

# Set project if needed
gcloud config set project my-project
```
Issue: "Table not found"
Solution: Use fully qualified table names
```bash
# Wrong - missing project/dataset
bigquery query "SELECT * FROM table"

# Correct - fully qualified
bigquery query "SELECT * FROM my-project.my-dataset.my-table"

# Or use backticks for reserved words
bigquery query "SELECT * FROM \`my-project.my-dataset.my-table\`"
```
Issue: "Query too expensive"
Solution: Check cost with dry-run and optimize
```bash
# Check estimated cost
bigquery dry-run "SELECT * FROM large_table WHERE date >= '2025-01-01'"

# Optimize with partition filters
bigquery dry-run "
  SELECT *
  FROM large_table
  WHERE _PARTITIONDATE = '2025-01-15'
"
```
Issue: "Template not found"
Solution: Search for templates and verify name
```bash
# List all templates
bigquery templates list

# Search for template
bigquery templates search "keyword"

# Use exact template name
bigquery templates run exact-template-name
```
Issue: "The specified bucket does not exist"
Cause: `bigquery tables load` with a local file requires GCS staging bucket configuration.
Solutions:
- Preferred for small datasets (<1000 rows): Use `tables insert` instead (no GCS required)

  ```bash
  bigquery tables insert my-project.dataset.table \
    --data /tmp/data.jsonl \
    --format json
  ```

- For larger datasets: Upload to GCS first, then load

  ```bash
  gsutil cp /tmp/large-file.jsonl gs://my-bucket/
  bigquery tables load my-project.dataset.table \
    gs://my-bucket/large-file.jsonl \
    --format json
  ```

- Last resort: Configure a GCS staging bucket in the BigQuery CLI config (requires additional setup)
Issue: "Invalid schema"
Solution: Check schema format for external tables
```bash
# Schema format: column:type,column:type,...
bigquery tables create-external my-project.dataset.table \
  --source-uri gs://bucket/file.csv \
  --format csv \
  --schema "id:INTEGER,name:STRING,created_at:TIMESTAMP"

# Or use autodetect
bigquery tables create-external my-project.dataset.table \
  --source-uri gs://bucket/file.csv \
  --format csv \
  --autodetect
```
Issue: "MCP server not responding"
Solution: Check server mode and connectivity
```bash
# For stdio mode, ensure client is using stdio transport
bigquery mcp stdio

# For HTTP mode, check port and firewall
bigquery mcp http --port 8080

# Test HTTP endpoint
curl http://localhost:8080
```
Issue: "LSP not starting in editor"
Solution: Verify LSP configuration and binary path
```bash
# Check bigquery is in PATH
which bigquery

# Test LSP manually
bigquery lsp

# Verify editor configuration points to correct command
# Neovim: check cmd = { "bigquery", "lsp" }
# VS Code: check "command": "bigquery", "args": ["lsp"]
```
Quick Reference
```bash
# Authentication
bigquery auth check                                      # Check auth status
bigquery auth login                                      # Login with gcloud

# Queries
bigquery query "SELECT ..."                              # Execute query
bigquery query --yes "SELECT ..."                        # Skip confirmation
bigquery query --format text "SELECT ..."                # Table output
bigquery dry-run "SELECT ..."                            # Estimate cost

# Datasets
bigquery datasets list PROJECT                           # List datasets

# Tables
bigquery tables list PROJECT.DATASET                     # List tables
bigquery tables describe PROJECT.DATASET.TABLE           # Show schema
bigquery tables insert TABLE --json '{"id": 1}'          # Insert rows (inline)
bigquery tables insert TABLE --data file.jsonl --format json  # Insert from JSONL
cat data.jsonl | bigquery tables insert TABLE --data -   # Stream insert
bigquery tables load TABLE file.csv                      # Load data (bulk)
bigquery tables load TABLE gs://bucket/file.csv          # Load from GCS
bigquery tables extract TABLE gs://bucket/output.csv     # Extract to GCS
bigquery tables create-external TABLE --source-uri ...   # External table
bigquery tables update-external TABLE --source-uri ...   # Update external

# Templates
bigquery templates list                                  # List templates
bigquery templates search "keyword"                      # Search templates
bigquery templates validate TEMPLATE                     # Validate template
bigquery templates run TEMPLATE --param key=value        # Run template

# MCP Server
bigquery mcp stdio                                       # Start MCP (stdio mode)
bigquery mcp http                                        # Start MCP (HTTP mode)
bigquery mcp http --port 3000                            # Custom port

# LSP Server
bigquery lsp                                             # Start LSP server
```
Integration Examples
CI/CD Pipeline
```bash
#!/bin/bash
# daily-etl.sh

# Authenticate with service account
export GOOGLE_APPLICATION_CREDENTIALS=/secrets/sa-key.json
bigquery auth check || exit 1

# Run daily ETL template
bigquery templates run daily-etl \
  --param date=$(date +%Y-%m-%d) \
  --yes \
  --format json > /tmp/etl-result.json

# Check result
if [ $? -eq 0 ]; then
  echo "ETL completed successfully"
else
  echo "ETL failed"
  exit 1
fi
```
Data Quality Checks
```bash
#!/bin/bash
# check-data-quality.sh

# Run data quality template
RESULT=$(bigquery templates run data-quality-check \
  --param table=my-project.dataset.table \
  --yes \
  --format json)

# Parse result and check quality metrics (quote the variable so JSON is not word-split)
INVALID_ROWS=$(echo "$RESULT" | jq '.invalid_rows')

if [ "$INVALID_ROWS" -gt 100 ]; then
  echo "Data quality check failed: $INVALID_ROWS invalid rows"
  exit 1
else
  echo "Data quality check passed"
fi
```
Scheduled Reporting
```bash
#!/bin/bash
# generate-report.sh

# Generate weekly report
bigquery templates run weekly-revenue-report \
  --param week_start=$(date -d "last monday" +%Y-%m-%d) \
  --param week_end=$(date -d "next sunday" +%Y-%m-%d) \
  --yes \
  --format json > /reports/weekly-$(date +%Y-%m-%d).json

# Upload to GCS
gsutil cp /reports/weekly-*.json gs://reports-bucket/
```
Summary
Primary commands:
- `bigquery auth {check,login}` - Authentication management
- `bigquery query` - Execute SQL with cost awareness
- `bigquery dry-run` - Estimate query costs
- `bigquery datasets list` - List datasets
- `bigquery tables {list,describe,insert,load,extract,create-external,update-external}` - Table operations
- `bigquery templates {list,search,validate,run}` - Named templates
- `bigquery mcp {stdio,http}` - MCP server modes
- `bigquery lsp` - LSP server
Key features:
- Cost-aware query execution with confirmation prompts
- Named query templates with parameter substitution
- Streaming insert API for real-time data (<1000 rows)
- Bulk load jobs for large datasets (>10MB or >1000 rows)
- JSONL streaming support with stdin (`--data -`)
- Data extraction to Cloud Storage (CSV, JSON, Avro, Parquet)
- External table support for GCS data
- MCP server with stdio and HTTP modes
- LSP integration for editor support
Best practices:
- Always check authentication first with `auth check`
- Use `dry-run` to estimate costs before expensive queries
- Create templates for frequently-used queries
- Use the `--yes` flag for automation and CI/CD
- Use `insert` for <1000 rows, `load` for larger datasets
- Use JSONL format for streaming pipelines
- Stream from stdin with `--data -` for data transformations
- Use external tables to avoid data duplication
- Configure MCP for natural language query capabilities
- Set up LSP in your editor for SQL development
MCP Integration:
- Semantic search across datasets
- Natural language to SQL translation
- Schema discovery and exploration
- Template execution via MCP tools
- Available in both stdio and HTTP modes