Claude-code-plugins coreweave-deploy-integration
install
source · Clone the upstream repo

```bash
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
```

Claude Code · Install into ~/.claude/skills/

```bash
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/plugins/saas-packs/coreweave-pack/skills/coreweave-deploy-integration" \
       ~/.claude/skills/jeremylongshore-claude-code-plugins-coreweave-deploy-integration \
  && rm -rf "$T"
```
manifest: plugins/saas-packs/coreweave-pack/skills/coreweave-deploy-integration/SKILL.md
CoreWeave Deploy Integration
Overview
Deploy GPU-accelerated inference services on CoreWeave Kubernetes (CKS). This skill covers containerizing inference workloads with NVIDIA CUDA base images, configuring GPU resource limits and node affinity for A100/H100 scheduling, setting up health checks that validate GPU availability and model loading, and executing rolling updates that respect GPU node draining. CoreWeave's scheduler requires explicit GPU resource requests to place pods on the correct hardware tier.
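Before deploying, it helps to confirm which GPU classes the cluster actually exposes to the scheduler. A minimal check, assuming nodes carry the gpu.nvidia.com/class label used in the nodeSelector in Step 2:

```bash
# Show each node's GPU class label (same label key as the nodeSelector in Step 2).
kubectl get nodes -L gpu.nvidia.com/class

# Show allocatable GPU counts per node.
kubectl describe nodes | grep -A 2 'nvidia.com/gpu'
```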
Docker Configuration
```dockerfile
FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04 AS base

RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-pip curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY requirements.txt ./
RUN pip3 install --no-cache-dir -r requirements.txt

FROM base

RUN groupadd -r app && useradd -r -g app app

COPY --chown=app:app src/ ./src/
COPY --chown=app:app models/ ./models/

USER app
EXPOSE 8080

HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1

CMD ["python3", "src/server.py"]
```
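To smoke-test the image locally before pushing, it can be run with GPU access; this assumes the NVIDIA Container Toolkit is installed on the host and uses the image tag from Step 1:

```bash
# Attach all host GPUs and map the service port.
docker run --rm --gpus all -p 8080:8080 registry.coreweave.com/my-org/inference-svc:latest

# In another shell, exercise the same endpoint the HEALTHCHECK uses.
curl -f http://localhost:8080/health
```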
Environment Variables
```bash
export COREWEAVE_API_KEY="cw_xxxxxxxxxxxx"
export COREWEAVE_NAMESPACE="tenant-my-org"
export MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct"
export GPU_TYPE="A100_PCIE_80GB"
export GPU_COUNT="1"
export LOG_LEVEL="info"
export PORT="8080"
```
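In-cluster, the pod needs these values too; one common pattern (a sketch, not something this skill prescribes) is to store them as a Kubernetes Secret and reference it from the Deployment via envFrom:

```bash
# Values are illustrative; mirror whichever variables the container reads.
kubectl create secret generic inference-env \
  --from-literal=COREWEAVE_API_KEY="cw_xxxxxxxxxxxx" \
  --from-literal=MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct" \
  --from-literal=PORT="8080" \
  -n tenant-my-org
```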
Health Check Endpoint
```typescript
import express from 'express';
import { execSync } from 'child_process';

const app = express();

app.get('/health', async (req, res) => {
  try {
    // Query the GPU via nvidia-smi; throws if no GPU is visible to the container.
    const gpuInfo = execSync(
      'nvidia-smi --query-gpu=name,memory.used --format=csv,noheader'
    ).toString().trim();

    // The model loader is expected to set this flag once weights are in memory.
    const modelLoaded = (globalThis as any).modelReady === true;
    if (!modelLoaded) throw new Error('Model not loaded');

    res.json({
      status: 'healthy',
      gpu: gpuInfo,
      model: process.env.MODEL_NAME,
      timestamp: new Date().toISOString(),
    });
  } catch (error) {
    res.status(503).json({ status: 'unhealthy', error: (error as Error).message });
  }
});

// Not in the original snippet: start the server so the endpoint is reachable.
app.listen(Number(process.env.PORT ?? 8080));
```
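The endpoint checks a modelReady flag that nothing in the snippet sets; a minimal sketch of the loader side, where loadModel is a hypothetical stand-in for whatever actually loads the weights:

```typescript
// Hypothetical loader: flip the readiness flag only once the weights are usable.
async function loadModel(name: string): Promise<void> {
  // ... load `name` onto the GPU here ...
}

loadModel(process.env.MODEL_NAME ?? 'meta-llama/Llama-3.1-8B-Instruct')
  .then(() => { (globalThis as any).modelReady = true; })
  .catch((err) => {
    // Leave modelReady unset so /health keeps returning 503.
    console.error('model load failed', err);
  });
```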
Deployment Steps
Step 1: Build
```bash
docker build -t registry.coreweave.com/my-org/inference-svc:latest .
docker push registry.coreweave.com/my-org/inference-svc:latest
```
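The push assumes an authenticated session against the registry; if the push is rejected, log in first:

```bash
docker login registry.coreweave.com
```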
Step 2: Run
```yaml
# k8s/deployment.yaml (container spec excerpt)
resources:
  limits:
    nvidia.com/gpu: 1
    cpu: "4"
    memory: "48Gi"
nodeSelector:
  gpu.nvidia.com/class: A100_PCIE_80GB
```
kubectl apply -f k8s/deployment.yaml -n tenant-my-org
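The excerpt above shows only the GPU-relevant fields; a fuller sketch of the surrounding Deployment, with names taken from this skill's other examples and everything else illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-svc
  namespace: tenant-my-org
spec:
  replicas: 1
  selector:
    matchLabels:
      app: inference-svc
  template:
    metadata:
      labels:
        app: inference-svc
    spec:
      nodeSelector:
        gpu.nvidia.com/class: A100_PCIE_80GB
      containers:
        - name: inference   # matches the container name used in Step 4
          image: registry.coreweave.com/my-org/inference-svc:latest
          ports:
            - containerPort: 8080
          resources:
            limits:
              nvidia.com/gpu: 1
              cpu: "4"
              memory: "48Gi"
          # Gate traffic on the /health endpoint defined earlier.
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 15
```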
Step 3: Verify
```bash
kubectl get pods -n tenant-my-org -l app=inference-svc
curl -s http://inference-svc.tenant-my-org.svc.cluster.local:8080/health | jq .
```
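The curl above uses the cluster-internal Service DNS name, which only resolves from inside the cluster; from a workstation, port-forward instead (assuming a Service named inference-svc fronts the pods):

```bash
# In one shell: forward the Service port to localhost.
kubectl port-forward svc/inference-svc 8080:8080 -n tenant-my-org

# In another shell:
curl -s http://localhost:8080/health | jq .
```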
Step 4: Rolling Update
```bash
kubectl set image deployment/inference-svc \
  inference=registry.coreweave.com/my-org/inference-svc:v2 \
  -n tenant-my-org
kubectl rollout status deployment/inference-svc -n tenant-my-org --timeout=600s
```
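If the new image never passes its health checks, the rollout stalls at the timeout; the standard escape hatch is rolling back to the previous ReplicaSet:

```bash
kubectl rollout undo deployment/inference-svc -n tenant-my-org
```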
Error Handling
| Issue | Cause | Fix |
|---|---|---|
| Pod stuck in Pending | No GPU nodes available for requested type | Check for allocatable GPUs or switch GPU tier |
| CUDA out-of-memory at startup | Model exceeds GPU memory | Reduce model size, enable quantization, or request larger GPU |
| nvidia.com/gpu resource not found | Missing NVIDIA device plugin | Verify CoreWeave namespace has GPU operator installed |
| 401 Unauthorized from the API | Invalid API key or expired credentials | Regenerate key in CoreWeave dashboard |
| Slow rolling update | GPU nodes take time to drain | Tune the update strategy (e.g. maxUnavailable) and rollout timeout in the deployment spec |
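For the scheduling and memory rows, the scheduler's events and the container's logs are the quickest diagnostics:

```bash
# Pending pods: events explain why scheduling failed (e.g. insufficient nvidia.com/gpu).
kubectl describe pod -l app=inference-svc -n tenant-my-org | tail -n 20

# OOM and model-load failures surface in the container logs.
kubectl logs deployment/inference-svc -n tenant-my-org --tail=50
```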
Next Steps
See coreweave-webhooks-events.