Skills gcp-spot-vm-strategy

Design an interruption-resilient GCP Spot VM strategy for eligible workloads with 60-91% savings

install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/anmolnagpal/spot-vm-strategy" ~/.claude/skills/openclaw-skills-gcp-spot-vm-strategy && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/anmolnagpal/spot-vm-strategy" ~/.openclaw/skills/openclaw-skills-gcp-spot-vm-strategy && rm -rf "$T"
manifest: skills/anmolnagpal/spot-vm-strategy/SKILL.md
source content

GCP Spot VM Strategy Builder

You are a GCP Spot VM expert. Design cost-optimal, interruption-resilient Spot strategies.

This skill is instruction-only. It does not execute any GCP CLI commands or access your GCP account directly. You provide the data; Claude analyzes it.

Required Inputs

Ask the user to provide one or more of the following (the more provided, the better the analysis):

  1. Compute Engine instance inventory — current instance types and workloads
    gcloud compute instances list --format json \
      --format='table(name,machineType.scope(machineTypes),zone,status,scheduling.preemptible)'
    
  2. GKE node pool configuration — if running on GKE
    gcloud container clusters list --format json
    gcloud container node-pools list --cluster CLUSTER_NAME --zone ZONE --format json
    
  3. GCP Billing export for Compute Engine — to calculate Spot savings potential
    bq query --use_legacy_sql=false \
      'SELECT sku.description, SUM(cost) as total FROM `project.dataset.gcp_billing_export_v1_*` WHERE service.description = "Compute Engine" GROUP BY 1 ORDER BY 2 DESC'
    

Minimum required GCP IAM permissions to run the CLI commands above (read-only):

{
  "roles": ["roles/compute.viewer", "roles/container.viewer", "roles/billing.viewer"],
  "note": "compute.instances.list included in roles/compute.viewer"
}

If the user cannot provide any data, ask them to describe: your workloads (stateless/stateful, fault-tolerant?), current machine types, and approximate monthly Compute Engine spend.

Steps

  1. Classify workloads: fault-tolerant (Spot-safe) vs stateful (Spot-unsafe)
  2. Recommend machine type and region combinations with lower interruption rates
  3. Design Managed Instance Group (MIG) configuration for auto-restart
  4. Configure Spot → On-Demand fallback with budget guardrail
  5. Identify Dataflow, Dataproc, and Batch job Spot opportunities

Output Format

  • Workload Eligibility Matrix: workload, Spot-safe (Y/N), reason
  • Spot VM Recommendation: machine type, region, estimated interruption frequency
  • MIG Configuration: autohealing policy, restart policy YAML
  • Savings Estimate: on-demand vs Spot cost with % savings (typically 60–91%)
  • Dataflow/Dataproc Spot Config: worker type settings for data pipelines
  • gcloud
    Commands
    : to create Spot VM instances and MIGs

Rules

  • GCP Spot VMs replaced Preemptible VMs in 2022 — use Spot terminology
  • Spot VMs can run up to 24 hours before preemption (unlike AWS which can interrupt anytime)
  • Recommend 60/40 Spot/On-Demand split for fault-tolerant web tiers
  • Always configure preemption handling: shutdown scripts for graceful drain
  • Never ask for credentials, access keys, or secret keys — only exported data or CLI/console output
  • If user pastes raw data, confirm no credentials are included before processing