# Skillshub · coreweave-cost-tuning

## Install

Clone the upstream repo:

```sh
git clone https://github.com/ComeOnOliver/skillshub
```

Claude Code: install into `~/.claude/skills/`:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/jeremylongshore/claude-code-plugins-plus-skills/coreweave-cost-tuning" ~/.claude/skills/comeonoliver-skillshub-coreweave-cost-tuning && rm -rf "$T"
```

Manifest: `skills/jeremylongshore/claude-code-plugins-plus-skills/coreweave-cost-tuning/SKILL.md`
## CoreWeave Cost Tuning

### GPU Pricing Reference (approximate)
| GPU | Per GPU/hour | Best For |
|---|---|---|
| A100 40GB PCIe | ~$1.50 | Development, smaller models |
| A100 80GB PCIe | ~$2.21 | Production inference |
| H100 80GB PCIe | ~$4.76 | High-throughput inference |
| H100 SXM5 (8x) | ~$6.15/GPU | Training, multi-GPU |
| L40 | ~$1.10 | Image generation, light inference |
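To compare tiers, it helps to translate hourly rates into an always-on monthly bill. A minimal sketch using the approximate rates from the table above (730 hours/month; CoreWeave list prices change, so treat these as estimates, and the `monthly_cost` helper is illustrative, not a CoreWeave API):

```python
# Approximate monthly cost per tier, using the rates from the table above.
RATES = {  # $/GPU-hour (approximate)
    "A100 40GB PCIe": 1.50,
    "A100 80GB PCIe": 2.21,
    "H100 80GB PCIe": 4.76,
    "H100 SXM5": 6.15,
    "L40": 1.10,
}

def monthly_cost(gpu: str, n_gpus: int = 1, hours: float = 730) -> float:
    """Always-on cost in dollars for `n_gpus` of a tier over `hours`."""
    return RATES[gpu] * n_gpus * hours

print(f"1x L40, always on:       ${monthly_cost('L40'):,.0f}/mo")
print(f"8x H100 SXM5, always on: ${monthly_cost('H100 SXM5', 8):,.0f}/mo")
```

An 8x H100 SXM5 node left running around the clock costs roughly 45x a single L40, which is why the right-sizing and scale-to-zero strategies below matter.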
## Cost Optimization Strategies

### Scale-to-Zero for Dev/Staging
```yaml
autoscaling.knative.dev/minScale: "0"
autoscaling.knative.dev/scaleDownDelay: "5m"
```
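A back-of-envelope sketch of what scale-to-zero saves on a dev endpoint, using the A100-80GB rate from the table above. The 40 active hours/week figure is a hypothetical utilization assumption:

```python
# Savings from scale-to-zero on a dev A100-80GB endpoint.
HOURS_PER_MONTH = 730
RATE_A100_80GB = 2.21      # $/GPU-hour, approximate (from the table above)
busy_hours = 40 * 4.33     # hypothetical: ~40 active hours/week

always_on = HOURS_PER_MONTH * RATE_A100_80GB
scale_to_zero = busy_hours * RATE_A100_80GB
print(f"always-on:     ${always_on:,.0f}/mo")
print(f"scale-to-zero: ${scale_to_zero:,.0f}/mo")
print(f"savings:       ${always_on - scale_to_zero:,.0f}/mo")
```

Under these assumptions a dev endpoint that only bills while in use costs roughly a quarter of an always-on one; the `scaleDownDelay` of 5 minutes trades a little extra billing for fewer cold starts.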
### Right-Size GPU Selection

```python
def recommend_gpu(model_size_b: float, inference_only: bool = True) -> str:
    """Pick the cheapest GPU tier that fits a model of `model_size_b` billion params."""
    if model_size_b <= 7:
        return "L40" if inference_only else "A100_PCIE_80GB"
    elif model_size_b <= 13:
        return "A100_PCIE_80GB"
    elif model_size_b <= 70:
        return "A100_PCIE_80GB (4x tensor parallel)"
    else:
        return "H100_SXM5 (8x tensor parallel)"
```
### Quantization to Use Smaller GPUs

Use AWQ or GPTQ quantization to fit larger models on smaller GPUs:

```sh
# A 70B model at 4-bit fits on a single A100-80GB instead of 4x
vllm serve meta-llama/Llama-3.1-70B-Instruct-AWQ --quantization awq
```
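A rough way to see why 4-bit works: weight memory scales with bytes per parameter. A minimal sketch, where the 1.2 overhead factor for activations/KV cache is an assumption, not a measured value:

```python
def est_vram_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: params (billions) * bytes/param, plus ~20%
    assumed overhead for activations and KV cache."""
    return params_b * (bits / 8) * overhead

for bits in (16, 8, 4):
    print(f"70B @ {bits}-bit: ~{est_vram_gb(70, bits):.0f} GB")
```

At 16-bit a 70B model needs well over 80 GB (hence 4x tensor parallel above), while the 4-bit estimate lands comfortably under a single A100-80GB.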
## Next Steps

For architecture patterns, see `coreweave-reference-architecture`.