Skillshub coreweave-cost-tuning

install
source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/jeremylongshore/claude-code-plugins-plus-skills/coreweave-cost-tuning" ~/.claude/skills/comeonoliver-skillshub-coreweave-cost-tuning && rm -rf "$T"
manifest: skills/jeremylongshore/claude-code-plugins-plus-skills/coreweave-cost-tuning/SKILL.md
source content

CoreWeave Cost Tuning

GPU Pricing Reference (approximate)

GPU                 Per GPU/hour   Best For
A100 40GB PCIe      ~$1.50         Development, smaller models
A100 80GB PCIe      ~$2.21         Production inference
H100 80GB PCIe      ~$4.76         High-throughput inference
H100 SXM5 (8x)      ~$6.15/GPU     Training, multi-GPU
L40                 ~$1.10         Image generation, light inference
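The rates above can be turned into a rough monthly comparison. A minimal sketch, using the approximate table prices; the helper name is hypothetical, and real CoreWeave billing is metered more finely and varies by region and commitment:

```python
# Approximate hourly rates from the table above (USD per GPU).
HOURLY_RATE = {
    "A100_PCIE_40GB": 1.50,
    "A100_PCIE_80GB": 2.21,
    "H100_PCIE_80GB": 4.76,
    "H100_SXM5": 6.15,
    "L40": 1.10,
}

def monthly_cost(gpu: str, count: int = 1, hours: float = 730) -> float:
    """Approximate cost for `count` GPUs running `hours` per month (~730 h/mo)."""
    return HOURLY_RATE[gpu] * count * hours

# An always-on 4x A100-80GB pool vs. a single L40 dev box:
print(round(monthly_cost("A100_PCIE_80GB", count=4)))  # ~6453
print(round(monthly_cost("L40")))                      # ~803
```

Numbers like these make the case for scale-to-zero below: an idle always-on pool costs the same as a busy one.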

Cost Optimization Strategies

Scale-to-Zero for Dev/Staging

autoscaling.knative.dev/minScale: "0"
autoscaling.knative.dev/scaleDownDelay: "5m"
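These annotations go on the revision template of a Knative Service. A minimal sketch for context; the service name and image are placeholders:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: dev-inference                  # placeholder name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"         # allow scale to zero when idle
        autoscaling.knative.dev/scaleDownDelay: "5m"  # keep pods 5m after traffic stops
    spec:
      containers:
        - image: registry.example.com/inference:latest  # placeholder image
```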

Right-Size GPU Selection

def recommend_gpu(model_size_b: float, inference_only: bool = True) -> str:
    if model_size_b <= 7:
        return "L40" if inference_only else "A100_PCIE_80GB"
    elif model_size_b <= 13:
        return "A100_PCIE_80GB"
    elif model_size_b <= 70:
        return "A100_PCIE_80GB (4x tensor parallel)"
    else:
        return "H100_SXM5 (8x tensor parallel)"

Quantization to Use Smaller GPUs

Use AWQ or GPTQ quantization to fit larger models on smaller GPUs:

# 70B model at 4-bit fits on single A100-80GB instead of 4x
vllm serve meta-llama/Llama-3.1-70B-Instruct-AWQ --quantization awq
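Why 4-bit makes the difference: weight memory is roughly params times bytes per parameter. A back-of-envelope sketch (the helper is for illustration and ignores KV cache, activations, and runtime overhead, so treat its result as a lower bound):

```python
def weight_gb(params_b: float, bits: int) -> float:
    """Approximate weight memory in GB for a model with params_b billion parameters."""
    return params_b * 1e9 * bits / 8 / 1e9

print(weight_gb(70, 16))  # 140.0 GB -> FP16 needs multi-GPU (e.g. 4x A100-80GB)
print(weight_gb(70, 4))   # 35.0 GB  -> 4-bit fits a single A100-80GB with headroom
```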

Next Steps

For architecture patterns, see coreweave-reference-architecture.