Claude-code-plugins-plus-skills coreweave-debug-bundle

install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/coreweave-pack/skills/coreweave-debug-bundle" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-coreweave-debug-bundle && rm -rf "$T"
manifest: plugins/saas-packs/coreweave-pack/skills/coreweave-debug-bundle/SKILL.md
source content

CoreWeave Debug Bundle

Overview

Collect GPU node health, Kubernetes pod status, event logs, and API connectivity into a single diagnostic archive for CoreWeave support tickets. This bundle captures cluster-level resource allocation, failed pod logs, GPU device plugin state, and network reachability so support engineers can diagnose infrastructure issues without requesting additional information. Useful when GPU pods are stuck pending, inference workloads OOM, or node autoscaling behaves unexpectedly.

Debug Collection Script

#!/bin/bash
set -euo pipefail
BUNDLE="debug-coreweave-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE"

# Environment check
echo "=== CoreWeave Debug Bundle ===" | tee "$BUNDLE/summary.txt"
echo "Generated: $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$BUNDLE/summary.txt"
echo "COREWEAVE_API_KEY: ${COREWEAVE_API_KEY:+[SET]}" >> "$BUNDLE/summary.txt"
echo "KUBECONFIG: ${KUBECONFIG:-default}" >> "$BUNDLE/summary.txt"
echo "kubectl: $(kubectl version --client --short 2>/dev/null || echo 'not found')" >> "$BUNDLE/summary.txt"

# API connectivity
HTTP=$(curl -s -o /dev/null -w "%{http_code}" -H "Authorization: Bearer ${COREWEAVE_API_KEY}" \
  https://api.coreweave.com/v1/namespaces 2>/dev/null || echo "000")
echo "API Status: HTTP $HTTP" >> "$BUNDLE/summary.txt"

# Cluster state
kubectl get nodes -o wide > "$BUNDLE/nodes.txt" 2>&1 || true
kubectl get pods --all-namespaces -o wide > "$BUNDLE/pods.txt" 2>&1 || true
kubectl get events --sort-by=.lastTimestamp > "$BUNDLE/events.txt" 2>&1 || true

# GPU allocation and device plugin status
kubectl describe nodes | grep -A10 "Allocated resources" > "$BUNDLE/gpu-allocation.txt" 2>&1 || true
kubectl get pods -n kube-system -l k8s-app=nvidia-device-plugin -o wide > "$BUNDLE/gpu-plugin.txt" 2>&1 || true

# Failed pod logs
for pod in $(kubectl get pods --field-selector=status.phase=Failed -o name 2>/dev/null); do
  kubectl logs "$pod" --tail=200 > "$BUNDLE/$(basename "$pod")-logs.txt" 2>&1 || true
done

# Rate limit headers
curl -s -D "$BUNDLE/rate-headers.txt" -o /dev/null \
  -H "Authorization: Bearer ${COREWEAVE_API_KEY}" \
  https://api.coreweave.com/v1/namespaces 2>/dev/null || true

tar -czf "$BUNDLE.tar.gz" "$BUNDLE" && rm -rf "$BUNDLE"
echo "Bundle: $BUNDLE.tar.gz"

Analyzing the Bundle

tar -xzf debug-coreweave-*.tar.gz
cat debug-coreweave-*/summary.txt          # API + env status at a glance
grep -i "error\|fail\|oom" debug-coreweave-*/events.txt  # Critical events
cat debug-coreweave-*/gpu-allocation.txt   # GPU resource pressure

Common Issues

SymptomCheck in BundleFix
GPU pods stuck Pending
gpu-allocation.txt
shows 0 allocatable GPUs
Request quota increase or switch to available GPU type
OOMKilled on inference pod
events.txt
for OOMKilled entries
Increase memory limits in pod spec; check model size vs allocated RAM
Node NotReady
nodes.txt
status column
Check
events.txt
for kubelet issues; contact CoreWeave if persistent
API returns 401
summary.txt
shows HTTP 401
Regenerate API key at CoreWeave dashboard; verify
COREWEAVE_API_KEY
is set
NVIDIA device plugin missing
gpu-plugin.txt
empty or error
Verify namespace
kube-system
has device plugin DaemonSet; redeploy if missing

Automated Health Check

async function checkCoreWeave(): Promise<void> {
  const key = process.env.COREWEAVE_API_KEY;
  if (!key) { console.error("[FAIL] COREWEAVE_API_KEY not set"); return; }

  const res = await fetch("https://api.coreweave.com/v1/namespaces", {
    headers: { Authorization: `Bearer ${key}` },
  });
  console.log(`[${res.ok ? "OK" : "FAIL"}] API: HTTP ${res.status}`);

  const limit = res.headers.get("x-ratelimit-remaining");
  if (limit) console.log(`[INFO] Rate limit remaining: ${limit}`);
}
checkCoreWeave();

Resources

Next Steps

See

coreweave-common-errors
for GPU scheduling and Kubernetes troubleshooting patterns.