Claude-code-plugins-plus vastai-cost-tuning
install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/vastai-pack/skills/vastai-cost-tuning" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-vastai-cost-tuning && rm -rf "$T"
manifest:
plugins/saas-packs/vastai-pack/skills/vastai-cost-tuning/SKILL.mdsource content
Vast.ai Cost Tuning
Overview
Minimize Vast.ai GPU cloud costs by choosing the right GPU for your workload, leveraging interruptible (spot) instances, eliminating idle compute, and implementing auto-destroy safeguards. Vast.ai pricing is dynamic and varies significantly: RTX 4090 ($0.15-0.30/hr), A100 80GB ($1.00-2.00/hr), H100 SXM ($2.50-4.00/hr).
Prerequisites
- Vast.ai account with billing history
- Understanding of your workload's GPU requirements
CLI installedvastai
Instructions
Step 1: GPU Selection by Cost-Efficiency
# Compare cost-per-TFLOP across GPU types GPU_SPECS = { "RTX_4090": {"fp16_tflops": 82.6, "vram": 24}, "A100": {"fp16_tflops": 77.97, "vram": 80}, "H100_SXM": {"fp16_tflops": 267, "vram": 80}, "RTX_3090": {"fp16_tflops": 35.6, "vram": 24}, "A6000": {"fp16_tflops": 38.7, "vram": 48}, } def cost_per_tflop(gpu_name, dph): specs = GPU_SPECS.get(gpu_name, {"fp16_tflops": 1}) return dph / specs["fp16_tflops"] # Often RTX 4090 is the best value for inference # A100 is best for training large models needing >24GB VRAM # H100 is best only when wall-clock time justifies 10x price premium
Step 2: Spot vs On-Demand Analysis
# Interruptible (spot) instances are 30-60% cheaper vastai search offers 'num_gpus=1 gpu_name=RTX_4090 rentable=true' \ --order dph_total --limit 5 # Compare interruptible vs on-demand pricing # Use interruptible for: batch inference, checkpointed training # Use on-demand for: final training epochs, production inference
Step 3: Auto-Destroy Safeguards
import time, subprocess, json def auto_destroy_after(instance_id, max_hours=4): """Destroy instance after max_hours to prevent cost overruns.""" max_seconds = max_hours * 3600 time.sleep(max_seconds) subprocess.run(["vastai", "destroy", "instance", str(instance_id)], check=True) print(f"Instance {instance_id} auto-destroyed after {max_hours}h") # Run in background thread when provisioning import threading watchdog = threading.Thread(target=auto_destroy_after, args=(inst_id, 4), daemon=True) watchdog.start()
Step 4: Idle Instance Detection
#!/bin/bash # Find and destroy idle instances (GPU util < 10% for >10 min) vastai show instances --raw | python3 -c " import sys, json for inst in json.load(sys.stdin): if inst.get('actual_status') == 'running': gpu_util = inst.get('gpu_util', 0) if gpu_util < 10: print(f'IDLE: Instance {inst[\"id\"]} GPU util={gpu_util}% ' f'(\${inst.get(\"dph_total\", 0):.3f}/hr)') "
Step 5: Cost Reporting
def daily_cost_report(): """Calculate current daily burn rate from running instances.""" result = subprocess.run( ["vastai", "show", "instances", "--raw"], capture_output=True, text=True) instances = json.loads(result.stdout) total_hourly = 0 for inst in instances: if inst.get("actual_status") == "running": dph = inst.get("dph_total", 0) total_hourly += dph print(f" {inst['id']}: {inst.get('gpu_name')} ${dph:.3f}/hr") print(f"\nTotal: ${total_hourly:.3f}/hr = ${total_hourly * 24:.2f}/day")
Cost Optimization Checklist
- Always search with
to find cheapest offers--order dph_total - Use interruptible instances for checkpointed workloads
- Implement auto-destroy timeout on all instances
- Monitor GPU utilization; destroy idle instances
- Use RTX 4090 for workloads that fit in 24GB VRAM
- Only use H100 when wall-clock time savings justify cost premium
- Pre-install dependencies in Docker images (avoid paying for pip install)
Output
- GPU cost-efficiency analysis by model
- Spot vs on-demand comparison
- Auto-destroy watchdog for cost protection
- Idle instance detection script
- Daily cost burn rate report
Error Handling
| Error | Cause | Solution |
|---|---|---|
| Unexpected $50+ bill | Forgot to destroy instances | Implement auto-destroy watchdog |
| GPU idle at $2/hr | Waiting for data download | Pre-stage data before provisioning GPU |
| Spot preemption mid-job | Cheapest instance reclaimed | Checkpoint frequently; auto-recover |
Resources
Next Steps
For reference architecture, see
vastai-reference-architecture.
Examples
Budget cap: Set
dph_total<=0.25 in search queries and auto_destroy_after(inst_id, 4) to cap any single job at $1.00.
GPU comparison: Run the same workload on RTX 4090 ($0.20/hr) vs A100 ($1.50/hr). If the A100 finishes in less than 1/7th the time, it's cheaper overall.