Skillshub coreweave-prod-checklist
install
source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/jeremylongshore/claude-code-plugins-plus-skills/coreweave-prod-checklist" ~/.claude/skills/comeonoliver-skillshub-coreweave-prod-checklist && rm -rf "$T"
manifest:
skills/jeremylongshore/claude-code-plugins-plus-skills/coreweave-prod-checklist/SKILL.mdsource content
CoreWeave Production Checklist
Inference Services
- GPU type and count validated for model size
- Autoscaling configured (KServe or HPA)
- Health and readiness probes set
- Resource requests AND limits specified
- Node affinity targeting correct GPU class
-
for production (no cold starts)minReplicas >= 1
Storage
- Model weights in PVC (not downloaded at startup)
- Checkpoints saved to persistent storage
- Storage class appropriate (SSD for inference, HDD for archival)
Security
- Secrets for model tokens and registry access
- Network policies applied
- Container images from trusted registries
Monitoring
- GPU utilization metrics collected
- Inference latency and throughput tracked
- Alert on pod restarts and OOM events
- Log aggregation configured
Rollback
kubectl rollout undo deployment/my-inference kubectl rollout status deployment/my-inference
Resources
Next Steps
For upgrades, see
coreweave-upgrade-migration.