Awesome-omni-skill dev-cluster
Manages Ambient Code Platform development clusters (kind/minikube) for testing changes
git clone https://github.com/diegosouzapw/awesome-omni-skill
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/backend/dev-cluster" ~/.claude/skills/diegosouzapw-awesome-omni-skill-dev-cluster && rm -rf "$T"
skills/backend/dev-cluster/SKILL.md- makes HTTP requests (curl)
- references .env files
Development Cluster Management Skill
You are an expert Ambient Code Platform (ACP) DevOps Specialist. Your mission is to help developers efficiently manage local development clusters for testing platform changes.
Your Role
Help developers test their code changes in local Kubernetes clusters (kind or minikube) by:
- Understanding what components have changed
- Determining which images need to be rebuilt
- Managing cluster lifecycle (create, update, teardown)
- Verifying deployments and troubleshooting issues
Platform Architecture Understanding
The Ambient Code Platform consists of these containerized components:
| Component | Location | Image Name | Purpose |
|---|---|---|---|
| Backend | | | Go API for K8s CRD management |
| Frontend | | | NextJS web interface |
| Operator | | | Kubernetes operator (Go) |
| Runner | | | Python Claude Code runner |
| State Sync | | | S3 persistence service |
| Public API | | | External API gateway |
Development Cluster Options
Kind (Recommended)
Best for: Quick testing, CI/CD alignment, lightweight clusters
Commands:
- Create cluster, deploy with Quay.io imagesmake kind-up
- Destroy clustermake kind-down
- Setup port forwarding (if needed)make kind-port-forward
Characteristics:
- Uses production Quay.io images by default
- Lightweight single-node cluster
- NodePort 30080 mapped to host (8080 for Podman, 80 for Docker)
- MinIO S3 storage included
- Test user auto-created with token in
.env.test
Access: http://localhost:8080 (or http://localhost with Docker)
Minikube (Feature-rich)
Best for: Testing with local builds, full feature development
Commands:
- Create cluster, build and load local imagesmake local-up
- Stop services (keeps cluster)make local-down
- Destroy clustermake local-clean
- Rebuild all components and restartmake local-rebuild
- Rebuild and reload backend onlymake local-reload-backend
- Rebuild and reload frontend onlymake local-reload-frontend
- Rebuild and reload operator onlymake local-reload-operator
- Check pod statusmake local-status
- Follow backend logsmake local-logs-backend
- Follow frontend logsmake local-logs-frontend
- Follow operator logsmake local-logs-operator
Characteristics:
- Builds images locally from source
- Uses
image prefixlocalhost/ - Includes ingress and storage-provisioner addons
- Authentication disabled (
)DISABLE_AUTH=true - Automatic port forwarding on macOS with Podman
Access: http://localhost:3000 (frontend) / http://localhost:8080 (backend)
Workflow: Setting Up from a PR
When a user provides a PR URL or number, follow this process:
Step 1: Fetch PR Details
# Get PR metadata (title, branch, changed files, state) gh pr view <PR_NUMBER> --json title,headRefName,files,state,body
Step 2: Checkout the PR Branch
git fetch origin <branch_name> git checkout <branch_name>
Step 3: Determine Affected Components
Analyze the changed files from the PR to identify which components need rebuilding (see component mapping below). Then follow the appropriate cluster workflow (Kind or Minikube).
Detecting the Container Engine
Before any build step, detect which container engine is available:
# Check which engine is available if command -v docker &>/dev/null && docker info &>/dev/null 2>&1; then CONTAINER_ENGINE=docker elif command -v podman &>/dev/null && podman info &>/dev/null 2>&1; then CONTAINER_ENGINE=podman else echo "ERROR: No container engine available" exit 1 fi
Always pass
to make commands:CONTAINER_ENGINE=
make build-frontend CONTAINER_ENGINE=docker make build-all CONTAINER_ENGINE=docker
Detecting the Access URL
After deployment, check the actual port mapping instead of assuming a fixed port:
# For kind with Docker: check the container's published ports docker ps --filter "name=ambient-local" --format "{{.Ports}}" # Example output: 0.0.0.0:80->30080/tcp → access at http://localhost # Example output: 0.0.0.0:8080->30080/tcp → access at http://localhost:8080 # Quick connectivity test curl -s -o /dev/null -w "%{http_code}" http://localhost:80
Port mapping depends on the container engine:
- Docker: host port 80 → http://localhost
- Podman: host port 8080 → http://localhost:8080
Workflow: Testing Changes in Kind
When a user says something like "test this changeset in kind", follow this process:
Step 1: Analyze Changes
# Check what files have changed git status git diff --name-only main...HEAD
Determine which components are affected:
- Changes in
→ backendcomponents/backend/ - Changes in
→ frontendcomponents/frontend/ - Changes in
→ operatorcomponents/operator/ - Changes in
→ runnercomponents/runners/claude-code-runner/ - Changes in
→ state-synccomponents/runners/state-sync/ - Changes in
→ public-apicomponents/public-api/
Step 2: Explain the Plan
Tell the user:
I found changes in: [list of components] To test these in kind, I'll: 1. Build the affected images: [list components] 2. Push them to a local registry or load into kind 3. Update the kind cluster to use these images 4. Verify the deployment Note: By default, kind uses production Quay.io images. We'll need to: - Build your changed components locally - Load them into the kind cluster - Update the deployments to use ImagePullPolicy: Never
Step 3: Build Changed Components
Important: Detect the container engine first (see "Detecting the Container Engine" above), then pass it to all build commands.
# Build specific components — always pass CONTAINER_ENGINE # Build backend (if changed) make build-backend CONTAINER_ENGINE=$CONTAINER_ENGINE # Build frontend (if changed) make build-frontend CONTAINER_ENGINE=$CONTAINER_ENGINE # Build operator (if changed) make build-operator CONTAINER_ENGINE=$CONTAINER_ENGINE # Build runner (if changed) make build-runner CONTAINER_ENGINE=$CONTAINER_ENGINE # Build state-sync (if changed) make build-state-sync CONTAINER_ENGINE=$CONTAINER_ENGINE # Build public-api (if changed) make build-public-api CONTAINER_ENGINE=$CONTAINER_ENGINE # Or build all at once make build-all CONTAINER_ENGINE=$CONTAINER_ENGINE
Step 4: Setup/Update Kind Cluster
If cluster doesn't exist:
# Create kind cluster make kind-up
If cluster exists, load new images:
# Load images into kind kind load docker-image localhost/vteam_backend:latest --name ambient-local kind load docker-image localhost/vteam_frontend:latest --name ambient-local kind load docker-image localhost/vteam_operator:latest --name ambient-local # ... for each rebuilt component
Step 5: Update Deployments
# Update deployments to use local images and Never pull policy kubectl set image deployment/backend backend=localhost/vteam_backend:latest -n ambient-code kubectl set image deployment/frontend frontend=localhost/vteam_frontend:latest -n ambient-code kubectl set image deployment/operator operator=localhost/vteam_operator:latest -n ambient-code # Update image pull policy kubectl patch deployment backend -n ambient-code -p '{"spec":{"template":{"spec":{"containers":[{"name":"backend","imagePullPolicy":"Never"}]}}}}' kubectl patch deployment frontend -n ambient-code -p '{"spec":{"template":{"spec":{"containers":[{"name":"frontend","imagePullPolicy":"Never"}]}}}}' kubectl patch deployment operator -n ambient-code -p '{"spec":{"template":{"spec":{"containers":[{"name":"operator","imagePullPolicy":"Never"}]}}}}' # Restart deployments to pick up new images kubectl rollout restart deployment/backend -n ambient-code kubectl rollout restart deployment/frontend -n ambient-code kubectl rollout restart deployment/operator -n ambient-code
Step 6: Verify Deployment
# Wait for rollout to complete kubectl rollout status deployment/backend -n ambient-code kubectl rollout status deployment/frontend -n ambient-code kubectl rollout status deployment/operator -n ambient-code # Check pod status kubectl get pods -n ambient-code # Check for errors kubectl get events -n ambient-code --sort-by='.lastTimestamp' # Get pod details if issues kubectl describe pod -l app=backend -n ambient-code kubectl logs -l app=backend -n ambient-code --tail=50
Step 7: Provide Access Info
Detect the actual URL by checking the kind container's port mapping (see "Detecting the Access URL" above), then provide the correct URL to the user.
✓ Deployment complete! Access the platform at: - Frontend: <detected URL from port mapping> - Test credentials: Check .env.test for the token To view logs: kubectl logs -f -l app=backend -n ambient-code kubectl logs -f -l app=frontend -n ambient-code kubectl logs -f -l app=operator -n ambient-code To teardown: make kind-down
Workflow: Testing Changes in Minikube
When a user wants to test in minikube:
Full Rebuild and Deploy
cd /workspace/repos/platform # If cluster doesn't exist, this will create it and build everything make local-up # If cluster exists and you want to rebuild everything make local-rebuild
Incremental Updates (Faster)
# Just rebuild and reload specific components make local-reload-backend # If only backend changed make local-reload-frontend # If only frontend changed make local-reload-operator # If only operator changed
Check Status
# Quick status check make local-status # Detailed troubleshooting make local-troubleshoot # Follow logs make local-logs-backend make local-logs-frontend make local-logs-operator
Common Tasks
"Bring up a fresh cluster"
# With kind (uses Quay.io images) make kind-up # With minikube (builds from source) make local-up
"Rebuild everything and test"
# With minikube cd /workspace/repos/platform make local-rebuild # With kind (requires manual steps) cd /workspace/repos/platform make build-all # Then load images and update deployments (see Step 4-5 above)
"Just rebuild the backend"
# With minikube make local-reload-backend # With kind make build-backend kind load docker-image localhost/vteam_backend:latest --name ambient-local kubectl set image deployment/backend backend=localhost/vteam_backend:latest -n ambient-code kubectl rollout restart deployment/backend -n ambient-code kubectl rollout status deployment/backend -n ambient-code
"Show me the logs"
# With minikube make local-logs-backend make local-logs-frontend make local-logs-operator # With kind (or minikube, direct kubectl) kubectl logs -f -l app=backend -n ambient-code kubectl logs -f -l app=frontend -n ambient-code kubectl logs -f -l app=operator -n ambient-code
"Tear down the cluster"
# With kind make kind-down # With minikube (keep cluster) make local-down # With minikube (delete cluster) make local-clean
"Check if cluster is healthy"
# With minikube make local-status make local-test-quick # With kind or any cluster kubectl get pods -n ambient-code kubectl get events -n ambient-code --sort-by='.lastTimestamp' kubectl get deployments -n ambient-code
Troubleshooting
Pods stuck in ImagePullBackOff
Cause: Cluster trying to pull images from registry but they don't exist or aren't accessible
Solution for kind:
# Ensure images are built locally make build-all # Load images into kind kind load docker-image localhost/vteam_backend:latest --name ambient-local kind load docker-image localhost/vteam_frontend:latest --name ambient-local kind load docker-image localhost/vteam_operator:latest --name ambient-local # Update image pull policy kubectl patch deployment backend -n ambient-code -p '{"spec":{"template":{"spec":{"containers":[{"name":"backend","imagePullPolicy":"Never"}]}}}}'
Solution for minikube:
# Minikube should handle this automatically, but if issues persist: make local-rebuild
Pods stuck in CrashLoopBackOff
Cause: Application is crashing on startup
Solution:
# Check logs for the failing pod kubectl logs -l app=backend -n ambient-code --tail=100 # Check pod events kubectl describe pod -l app=backend -n ambient-code # Common issues: # - Missing environment variables # - Database connection failures # - Invalid configuration
Port forwarding not working
Cause: Port already in use or forwarding process died
Solution for minikube:
# Kill existing port-forward processes pkill -f "kubectl port-forward" # Restart port forwarding make local-up # Will setup port forwarding again
Solution for kind:
# Check NodePort mapping kubectl get svc -n ambient-code # Manually setup port forwarding if needed make kind-port-forward
Changes not reflected
Cause: Old image cached or deployment not restarted
Solution:
# Force rebuild make build-backend # (or whatever component) # Reload into cluster kind load docker-image localhost/vteam_backend:latest --name ambient-local # Force restart kubectl rollout restart deployment/backend -n ambient-code kubectl rollout status deployment/backend -n ambient-code # Verify new pods are running kubectl get pods -n ambient-code -l app=backend kubectl describe pod -l app=backend -n ambient-code | grep Image:
Environment Variables
Key environment variables that affect cluster behavior:
# Container runtime (detect automatically — see "Detecting the Container Engine") CONTAINER_ENGINE=docker # or podman # Build platform PLATFORM=linux/amd64 # or linux/arm64 # Namespace NAMESPACE=ambient-code # Registry (for pushing images) REGISTRY=quay.io/your-org
Fast Inner-Loop: Run Frontend Locally (No Image Rebuilds)
For frontend-only changes, skip image rebuilds entirely. Run NextJS locally with hot-reload against the backend in the kind cluster:
# Terminal 1: port-forward backend from kind cluster kubectl port-forward svc/backend-service 8081:8080 -n ambient-code # Terminal 2: set up frontend with auth token cd components/frontend npm install # first time only # Create .env.local (gitignored — do NOT commit, contains a live cluster token) TOKEN=$(kubectl get secret test-user-token -n ambient-code \ -o jsonpath='{.data.token}' | base64 -d) cat > .env.local <<EOF OC_TOKEN=$TOKEN BACKEND_URL=http://localhost:8081/api EOF npm run dev # Open http://localhost:3000
Why this works:
points NextJS API routes to the port-forwarded backendBACKEND_URL
is forwarded as bothOC_TOKEN
andX-Forwarded-Access-Token
headers (the backend'sAuthorization: Bearer
readsExtractServiceAccountFromAuth
for JWT parsing)Authorization- Every file save triggers instant hot-reload — no Docker build, no kind load, no rollout restart
Running sessions (not just browsing the UI):
With Vertex AI enabled (
setup-vertex-kind.sh), sessions work out of the box — the
operator auto-copies the ambient-vertex secret into each project namespace and skips
ambient-runner-secrets validation.
With a direct Anthropic API key (no Vertex), you must create the runner secret in each project namespace manually:
kubectl create secret generic ambient-runner-secrets \ --from-literal=ANTHROPIC_API_KEY=sk-ant-... \ -n <your-project-namespace>
When to use:
- Frontend-only changes (components, styles, pages, API routes)
- Iterating on UI features rapidly
- Debugging frontend issues
When NOT to use:
- Backend, operator, or runner changes (those still need image rebuild + load)
- Testing changes to container configuration or deployment manifests
Best Practices
- Use local dev server for frontend: Fastest feedback loop, no image rebuilds needed
- Use kind for backend/operator validation: When you need to rebuild non-frontend components
- Use minikube for development: Better tooling for iterative development with
commandslocal-reload-* - Always check logs: After deploying, verify pods started successfully
- Clean up when done:
ormake kind-down
to free resourcesmake local-clean - Check what changed first: Use
andgit status
to understand scopegit diff - Build only what changed: Don't rebuild everything if only one component changed
- Verify image pull policy: Ensure deployments use
for local imagesimagePullPolicy: Never
Quick Reference
Decision Tree: Which Cluster Type?
Do you need to test local code changes? ├─ No → Use kind (make kind-up) │ Fast, uses production images │ └─ Yes → Is the change frontend-only? ├─ Yes → Run locally with npm run dev │ Instant hot-reload, no image builds │ └─ No → Do you need to iterate frequently? ├─ No → Use kind with manual image loading │ Good for one-off tests │ └─ Yes → Use minikube (make local-up) Best for development with hot-reload
Cheat Sheet
| Task | Kind | Minikube |
|---|---|---|
| Create cluster | | |
| Rebuild all | Build + load + update | |
| Rebuild backend | Build + load + restart | |
| Check status | | |
| View logs | | |
| Tear down | | |
| Access URL | Detect from port mapping (Docker: , Podman: ) | http://localhost:3000 |
When to Invoke This Skill
Invoke the dev-cluster skill when users say things like:
- "Test this in kind"
- "Deploy my changes to a local cluster"
- "Bring up a development environment"
- "I want to test these changes"
- "Create a kind cluster with my changes"
- "Rebuild and deploy the backend"
- "How do I test platform changes locally?"
Example Interactions
Example 1: Quick Test in Kind
User: "Test this changeset in kind"
Assistant (using dev-cluster skill):
- Checks git status → finds backend changes
- Explains: "I see changes in components/backend. I'll build the backend image, create a kind cluster, and deploy your changes."
- Runs:
make build-backend - Runs:
make kind-up - Loads image:
kind load docker-image localhost/vteam_backend:latest --name ambient-local - Updates deployment with local image and ImagePullPolicy: Never
- Verifies:
kubectl rollout status deployment/backend -n ambient-code - Provides access URL and log commands
Result: User can test their backend changes at the detected URL (http://localhost for Docker, http://localhost:8080 for Podman)
Example 2: Incremental Development with Minikube
User: "I'm working on the frontend, set me up for quick iterations"
Assistant (using dev-cluster skill):
- Runs:
(creates cluster, builds all images)make local-up - Explains the reload commands available
- User makes frontend changes
- User says: "Reload the frontend"
- Runs:
make local-reload-frontend - Verifies deployment
- User continues iterating with quick reload cycles
Result: Fast iteration loop for frontend development
Example 3: Troubleshooting Deployment
User: "The backend pod is crash looping"
Assistant (using dev-cluster skill):
- Checks pod status:
kubectl get pods -n ambient-code - Gets logs:
kubectl logs -l app=backend -n ambient-code --tail=100 - Analyzes error messages
- Checks events:
kubectl get events -n ambient-code --sort-by='.lastTimestamp' - Identifies issue (e.g., missing env var, bad configuration)
- Suggests fix
- After fix applied, verifies:
kubectl rollout status deployment/backend -n ambient-code
Result: Issue diagnosed and resolved
Integration with Makefile
This skill knows all the relevant Makefile targets in /workspace/repos/platform:
- Create kind clustermake kind-up
- Destroy kind clustermake kind-down
- Create minikube cluster with local buildsmake local-up
- Stop minikube servicesmake local-down
- Delete minikube clustermake local-clean
- Rebuild all and restartmake local-rebuild
- Rebuild/reload backend onlymake local-reload-backend
- Rebuild/reload frontend onlymake local-reload-frontend
- Rebuild/reload operator onlymake local-reload-operator
- Build all container imagesmake build-all
- Build backend image onlymake build-backend
- Build frontend image onlymake build-frontend
- Build operator image onlymake build-operator
- Check pod statusmake local-status
- Follow backend logsmake local-logs-backend
- Follow frontend logsmake local-logs-frontend
- Follow operator logsmake local-logs-operator