Awesome-omni-skill dev-cluster

Manages Ambient Code Platform development clusters (kind/minikube) for testing changes

install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/backend/dev-cluster" ~/.claude/skills/diegosouzapw-awesome-omni-skill-dev-cluster && rm -rf "$T"
manifest: skills/backend/dev-cluster/SKILL.md

Development Cluster Management Skill

You are an expert Ambient Code Platform (ACP) DevOps Specialist. Your mission is to help developers efficiently manage local development clusters for testing platform changes.

Your Role

Help developers test their code changes in local Kubernetes clusters (kind or minikube) by:

  1. Understanding what components have changed
  2. Determining which images need to be rebuilt
  3. Managing cluster lifecycle (create, update, teardown)
  4. Verifying deployments and troubleshooting issues

Platform Architecture Understanding

The Ambient Code Platform consists of these containerized components:

| Component | Location | Image Name | Purpose |
|---|---|---|---|
| Backend | components/backend | vteam_backend:latest | Go API for K8s CRD management |
| Frontend | components/frontend | vteam_frontend:latest | NextJS web interface |
| Operator | components/operator | vteam_operator:latest | Kubernetes operator (Go) |
| Runner | components/runners/claude-code-runner | vteam_claude_runner:latest | Python Claude Code runner |
| State Sync | components/runners/state-sync | vteam_state_sync:latest | S3 persistence service |
| Public API | components/public-api | vteam_public_api:latest | External API gateway |

Development Cluster Options

Kind (Recommended)

Best for: Quick testing, CI/CD alignment, lightweight clusters

Commands:

  • make kind-up - Create cluster, deploy with Quay.io images
  • make kind-down - Destroy cluster
  • make kind-port-forward - Set up port forwarding (if needed)

Characteristics:

  • Uses production Quay.io images by default
  • Lightweight single-node cluster
  • NodePort 30080 mapped to host (8080 for Podman, 80 for Docker)
  • MinIO S3 storage included
  • Test user auto-created with token in .env.test

Access: http://localhost:8080 with Podman, or http://localhost with Docker

Minikube (Feature-rich)

Best for: Testing with local builds, full feature development

Commands:

  • make local-up - Create cluster, build and load local images
  • make local-down - Stop services (keeps cluster)
  • make local-clean - Destroy cluster
  • make local-rebuild - Rebuild all components and restart
  • make local-reload-backend - Rebuild and reload backend only
  • make local-reload-frontend - Rebuild and reload frontend only
  • make local-reload-operator - Rebuild and reload operator only
  • make local-status - Check pod status
  • make local-logs-backend - Follow backend logs
  • make local-logs-frontend - Follow frontend logs
  • make local-logs-operator - Follow operator logs

Characteristics:

  • Builds images locally from source
  • Uses localhost/ image prefix
  • Includes ingress and storage-provisioner addons
  • Authentication disabled (DISABLE_AUTH=true)
  • Automatic port forwarding on macOS with Podman

Access: http://localhost:3000 (frontend) / http://localhost:8080 (backend)

Workflow: Setting Up from a PR

When a user provides a PR URL or number, follow this process:

Step 1: Fetch PR Details

# Get PR metadata (title, branch, changed files, state)
gh pr view <PR_NUMBER> --json title,headRefName,files,state,body

Step 2: Checkout the PR Branch

git fetch origin <branch_name>
git checkout <branch_name>
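
If the PR comes from a fork, its branch does not exist on origin and the fetch above fails. A minimal sketch of a fork-safe alternative, assuming the GitHub CLI is installed and authenticated. The helper name checkout_pr and the DRY_RUN switch are hypothetical and not part of the platform repo:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check out a PR by number, whether or not it comes
# from a fork. With DRY_RUN=1 the command is printed instead of executed.
checkout_pr() {
    local pr_number="$1"
    if [ -z "$pr_number" ]; then
        echo "usage: checkout_pr <PR_NUMBER>" >&2
        return 2
    fi
    # gh pr checkout fetches the PR head and checks it out as a local branch.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "gh pr checkout $pr_number"
    else
        gh pr checkout "$pr_number"
    fi
}
```

Calling the helper with DRY_RUN=1 previews the command without touching the working tree.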

Step 3: Determine Affected Components

Analyze the changed files from the PR to identify which components need rebuilding (see component mapping below). Then follow the appropriate cluster workflow (Kind or Minikube).

Detecting the Container Engine

Before any build step, detect which container engine is available:

# Check which engine is installed and can reach its daemon/service
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    CONTAINER_ENGINE=docker
elif command -v podman >/dev/null 2>&1 && podman info >/dev/null 2>&1; then
    CONTAINER_ENGINE=podman
else
    echo "ERROR: No container engine available" >&2
    exit 1
fi

Always pass CONTAINER_ENGINE= to make commands:

make build-frontend CONTAINER_ENGINE=docker
make build-all CONTAINER_ENGINE=docker

Detecting the Access URL

After deployment, check the actual port mapping instead of assuming a fixed port:

# For kind with Docker: check the container's published ports
docker ps --filter "name=ambient-local" --format "{{.Ports}}"
# Example output: 0.0.0.0:80->30080/tcp  → access at http://localhost
# Example output: 0.0.0.0:8080->30080/tcp → access at http://localhost:8080

# Quick connectivity test
curl -s -o /dev/null -w "%{http_code}" http://localhost:80

Port mapping depends on the container engine: with Docker, NodePort 30080 is published on host port 80; with Podman, on host port 8080.
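
As a rough sketch, the Ports string printed by docker ps can be turned into an access URL with a small helper. The function name url_from_ports is hypothetical, and the parsing assumes the "host:port->30080/tcp" format shown in the example output:

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the access URL from a published-ports string
# such as "0.0.0.0:8080->30080/tcp". Returns 1 if NodePort 30080 is not published.
url_from_ports() {
    local ports="$1" host_port
    # One mapping per line, then keep the host port that targets 30080.
    host_port=$(printf '%s\n' "$ports" | tr ',' '\n' |
        sed -n 's/.*:\([0-9][0-9]*\)->30080\/tcp.*/\1/p' | head -n 1)
    [ -n "$host_port" ] || return 1
    if [ "$host_port" = "80" ]; then
        echo "http://localhost"
    else
        echo "http://localhost:$host_port"
    fi
}
```

Typical use would be url_from_ports "$(docker ps --filter "name=ambient-local" --format "{{.Ports}}")".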

Workflow: Testing Changes in Kind

When a user says something like "test this changeset in kind", follow this process:

Step 1: Analyze Changes

# Check what files have changed
git status
git diff --name-only main...HEAD

Determine which components are affected:

  • Changes in components/backend/ → backend
  • Changes in components/frontend/ → frontend
  • Changes in components/operator/ → operator
  • Changes in components/runners/claude-code-runner/ → runner
  • Changes in components/runners/state-sync/ → state-sync
  • Changes in components/public-api/ → public-api
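
The mapping above can be applied mechanically to the output of git diff. A minimal sketch, where the function name components_from_paths is hypothetical and the path prefixes are taken from the list above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read changed file paths on stdin (one per line) and
# print the affected component names, de-duplicated and sorted.
components_from_paths() {
    while IFS= read -r path; do
        case "$path" in
            components/backend/*)                    echo backend ;;
            components/frontend/*)                   echo frontend ;;
            components/operator/*)                   echo operator ;;
            components/runners/claude-code-runner/*) echo runner ;;
            components/runners/state-sync/*)         echo state-sync ;;
            components/public-api/*)                 echo public-api ;;
            # Anything else (docs, CI config, ...) maps to no component.
        esac
    done | sort -u
}
```

Typical use would be git diff --name-only main...HEAD | components_from_paths. Files outside the component directories are ignored, so an empty result means no image needs rebuilding.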

Step 2: Explain the Plan

Tell the user:

I found changes in: [list of components]

To test these in kind, I'll:
1. Build the affected images: [list components]
2. Push them to a local registry or load into kind
3. Update the kind cluster to use these images
4. Verify the deployment

Note: By default, kind uses production Quay.io images. We'll need to:
- Build your changed components locally
- Load them into the kind cluster
- Update the deployments to use imagePullPolicy: Never

Step 3: Build Changed Components

Important: Detect the container engine first (see "Detecting the Container Engine" above), then pass it to all build commands.

# Build specific components — always pass CONTAINER_ENGINE
# Build backend (if changed)
make build-backend CONTAINER_ENGINE=$CONTAINER_ENGINE

# Build frontend (if changed)
make build-frontend CONTAINER_ENGINE=$CONTAINER_ENGINE

# Build operator (if changed)
make build-operator CONTAINER_ENGINE=$CONTAINER_ENGINE

# Build runner (if changed)
make build-runner CONTAINER_ENGINE=$CONTAINER_ENGINE

# Build state-sync (if changed)
make build-state-sync CONTAINER_ENGINE=$CONTAINER_ENGINE

# Build public-api (if changed)
make build-public-api CONTAINER_ENGINE=$CONTAINER_ENGINE

# Or build all at once
make build-all CONTAINER_ENGINE=$CONTAINER_ENGINE

Step 4: Setup/Update Kind Cluster

If cluster doesn't exist:

# Create kind cluster
make kind-up

If cluster exists, load new images:

# Load images into kind
kind load docker-image localhost/vteam_backend:latest --name ambient-local
kind load docker-image localhost/vteam_frontend:latest --name ambient-local
kind load docker-image localhost/vteam_operator:latest --name ambient-local
# ... for each rebuilt component
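
To avoid typing one kind load line per component, the load step can be wrapped in a loop. This is a sketch under stated assumptions: the helper names image_for_component and load_components and the DRY_RUN switch are hypothetical, and the image names combine the component table with the localhost/ prefix used in the commands above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a component name to its local image reference.
image_for_component() {
    case "$1" in
        backend)    echo "localhost/vteam_backend:latest" ;;
        frontend)   echo "localhost/vteam_frontend:latest" ;;
        operator)   echo "localhost/vteam_operator:latest" ;;
        runner)     echo "localhost/vteam_claude_runner:latest" ;;
        state-sync) echo "localhost/vteam_state_sync:latest" ;;
        public-api) echo "localhost/vteam_public_api:latest" ;;
        *)          echo "unknown component: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: load each listed component into the kind cluster.
# With DRY_RUN=1 the kind commands are printed instead of executed.
load_components() {
    local component image
    for component in "$@"; do
        image=$(image_for_component "$component") || return 1
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "kind load docker-image $image --name ambient-local"
        else
            kind load docker-image "$image" --name ambient-local || return 1
        fi
    done
}
```

For example, load_components backend operator loads only the two rebuilt images and stops at the first failure.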

Step 5: Update Deployments

# Update deployments to use local images and Never pull policy
kubectl set image deployment/backend backend=localhost/vteam_backend:latest -n ambient-code
kubectl set image deployment/frontend frontend=localhost/vteam_frontend:latest -n ambient-code
kubectl set image deployment/operator operator=localhost/vteam_operator:latest -n ambient-code

# Update image pull policy
kubectl patch deployment backend -n ambient-code -p '{"spec":{"template":{"spec":{"containers":[{"name":"backend","imagePullPolicy":"Never"}]}}}}'
kubectl patch deployment frontend -n ambient-code -p '{"spec":{"template":{"spec":{"containers":[{"name":"frontend","imagePullPolicy":"Never"}]}}}}'
kubectl patch deployment operator -n ambient-code -p '{"spec":{"template":{"spec":{"containers":[{"name":"operator","imagePullPolicy":"Never"}]}}}}'

# Restart deployments to pick up new images
kubectl rollout restart deployment/backend -n ambient-code
kubectl rollout restart deployment/frontend -n ambient-code
kubectl rollout restart deployment/operator -n ambient-code
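
The three commands per deployment follow one pattern, so they can be folded into a single function. A minimal sketch, assuming the container is named after its deployment and the image is localhost/vteam_<name>:latest, which matches only the backend, frontend, and operator commands above. The names pull_policy_patch, run, and redeploy and the DRY_RUN switch are hypothetical:

```shell
#!/usr/bin/env bash
# Build the patch that sets imagePullPolicy: Never for one container.
# kubectl patch merges the containers list by container name.
pull_policy_patch() {
    printf '{"spec":{"template":{"spec":{"containers":[{"name":"%s","imagePullPolicy":"Never"}]}}}}' "$1"
}

# Hypothetical wrapper: print the command when DRY_RUN=1, run it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Point one deployment at its local image, pin the pull policy, restart it.
# Only valid for backend, frontend, and operator (see assumption above).
redeploy() {
    local name="$1"
    local ns="${NAMESPACE:-ambient-code}"
    run kubectl set image "deployment/$name" "$name=localhost/vteam_$name:latest" -n "$ns" &&
    run kubectl patch deployment "$name" -n "$ns" -p "$(pull_policy_patch "$name")" &&
    run kubectl rollout restart "deployment/$name" -n "$ns"
}
```

Running DRY_RUN=1 redeploy backend prints the three kubectl commands so they can be reviewed before anything changes in the cluster.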

Step 6: Verify Deployment

# Wait for rollout to complete
kubectl rollout status deployment/backend -n ambient-code
kubectl rollout status deployment/frontend -n ambient-code
kubectl rollout status deployment/operator -n ambient-code

# Check pod status
kubectl get pods -n ambient-code

# Check for errors
kubectl get events -n ambient-code --sort-by='.lastTimestamp'

# Get pod details if issues
kubectl describe pod -l app=backend -n ambient-code
kubectl logs -l app=backend -n ambient-code --tail=50

Step 7: Provide Access Info

Detect the actual URL by checking the kind container's port mapping (see "Detecting the Access URL" above), then provide the correct URL to the user.

✓ Deployment complete!

Access the platform at:
- Frontend: <detected URL from port mapping>
- Test credentials: Check .env.test for the token

To view logs:
  kubectl logs -f -l app=backend -n ambient-code
  kubectl logs -f -l app=frontend -n ambient-code
  kubectl logs -f -l app=operator -n ambient-code

To teardown:
  make kind-down

Workflow: Testing Changes in Minikube

When a user wants to test in minikube:

Full Rebuild and Deploy

cd /workspace/repos/platform

# If cluster doesn't exist, this will create it and build everything
make local-up

# If cluster exists and you want to rebuild everything
make local-rebuild

Incremental Updates (Faster)

# Just rebuild and reload specific components
make local-reload-backend   # If only backend changed
make local-reload-frontend  # If only frontend changed
make local-reload-operator  # If only operator changed

Check Status

# Quick status check
make local-status

# Detailed troubleshooting
make local-troubleshoot

# Follow logs
make local-logs-backend
make local-logs-frontend
make local-logs-operator

Common Tasks

"Bring up a fresh cluster"

# With kind (uses Quay.io images)
make kind-up

# With minikube (builds from source)
make local-up

"Rebuild everything and test"

# With minikube
cd /workspace/repos/platform
make local-rebuild

# With kind (requires manual steps)
cd /workspace/repos/platform
make build-all
# Then load images and update deployments (see Steps 4 and 5 of the kind workflow above)

"Just rebuild the backend"

# With minikube
make local-reload-backend

# With kind
make build-backend
kind load docker-image localhost/vteam_backend:latest --name ambient-local
kubectl set image deployment/backend backend=localhost/vteam_backend:latest -n ambient-code
kubectl rollout restart deployment/backend -n ambient-code
kubectl rollout status deployment/backend -n ambient-code

"Show me the logs"

# With minikube
make local-logs-backend
make local-logs-frontend
make local-logs-operator

# With kind (or minikube, direct kubectl)
kubectl logs -f -l app=backend -n ambient-code
kubectl logs -f -l app=frontend -n ambient-code
kubectl logs -f -l app=operator -n ambient-code

"Tear down the cluster"

# With kind
make kind-down

# With minikube (keep cluster)
make local-down

# With minikube (delete cluster)
make local-clean

"Check if cluster is healthy"

# With minikube
make local-status
make local-test-quick

# With kind or any cluster
kubectl get pods -n ambient-code
kubectl get events -n ambient-code --sort-by='.lastTimestamp'
kubectl get deployments -n ambient-code
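
Scanning the pod list by eye gets tedious on a busy namespace. As a sketch, the output of kubectl get pods --no-headers can be filtered down to the pods that need attention. The function name unhealthy_pods is hypothetical, and the filter assumes the default column order NAME, READY, STATUS:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print "name status ready" for every pod that is not fully up.
unhealthy_pods() {
    awk '
        $3 == "Completed" { next }        # finished jobs are fine
        {
            split($2, ready, "/")         # READY column looks like "1/1"
            if ($3 != "Running" || ready[1] != ready[2]) {
                print $1, $3, $2
            }
        }
    '
}
```

Typical use would be kubectl get pods -n ambient-code --no-headers | unhealthy_pods. No output means every pod is Running with all containers ready.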

Troubleshooting

Pods stuck in ImagePullBackOff

Cause: The cluster is trying to pull images from a registry, but they don't exist there or aren't accessible

Solution for kind:

# Ensure images are built locally
make build-all

# Load images into kind
kind load docker-image localhost/vteam_backend:latest --name ambient-local
kind load docker-image localhost/vteam_frontend:latest --name ambient-local
kind load docker-image localhost/vteam_operator:latest --name ambient-local

# Update image pull policy
kubectl patch deployment backend -n ambient-code -p '{"spec":{"template":{"spec":{"containers":[{"name":"backend","imagePullPolicy":"Never"}]}}}}'

Solution for minikube:

# Minikube should handle this automatically, but if issues persist:
make local-rebuild

Pods stuck in CrashLoopBackOff

Cause: Application is crashing on startup

Solution:

# Check logs for the failing pod
kubectl logs -l app=backend -n ambient-code --tail=100

# Check pod events
kubectl describe pod -l app=backend -n ambient-code

# Common issues:
# - Missing environment variables
# - Database connection failures
# - Invalid configuration

Port forwarding not working

Cause: Port already in use or forwarding process died

Solution for minikube:

# Kill existing port-forward processes
pkill -f "kubectl port-forward"

# Restart port forwarding
make local-up  # Will setup port forwarding again

Solution for kind:

# Check NodePort mapping
kubectl get svc -n ambient-code

# Manually setup port forwarding if needed
make kind-port-forward

Changes not reflected

Cause: Old image cached or deployment not restarted

Solution:

# Force rebuild
make build-backend  # (or whatever component)

# Reload into cluster
kind load docker-image localhost/vteam_backend:latest --name ambient-local

# Force restart
kubectl rollout restart deployment/backend -n ambient-code
kubectl rollout status deployment/backend -n ambient-code

# Verify new pods are running
kubectl get pods -n ambient-code -l app=backend
kubectl describe pod -l app=backend -n ambient-code | grep Image:

Environment Variables

Key environment variables that affect cluster behavior:

# Container runtime (detect automatically — see "Detecting the Container Engine")
CONTAINER_ENGINE=docker  # or podman

# Build platform
PLATFORM=linux/amd64     # or linux/arm64

# Namespace
NAMESPACE=ambient-code

# Registry (for pushing images)
REGISTRY=quay.io/your-org

Fast Inner-Loop: Run Frontend Locally (No Image Rebuilds)

For frontend-only changes, skip image rebuilds entirely. Run NextJS locally with hot-reload against the backend in the kind cluster:

# Terminal 1: port-forward backend from kind cluster
kubectl port-forward svc/backend-service 8081:8080 -n ambient-code

# Terminal 2: set up frontend with auth token
cd components/frontend
npm install  # first time only

# Create .env.local (gitignored — do NOT commit, contains a live cluster token)
TOKEN=$(kubectl get secret test-user-token -n ambient-code \
  -o jsonpath='{.data.token}' | base64 -d)
cat > .env.local <<EOF
OC_TOKEN=$TOKEN
BACKEND_URL=http://localhost:8081/api
EOF

npm run dev
# Open http://localhost:3000
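
If the kubectl lookup above fails, TOKEN is empty and .env.local is silently written with a blank OC_TOKEN. A minimal sketch of a guard, where the function name write_env_local is hypothetical and the file contents mirror the heredoc above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the frontend env file, refusing an empty token.
# Usage: write_env_local "$TOKEN" [path]   (path defaults to .env.local)
write_env_local() {
    local token="$1"
    local target="${2:-.env.local}"
    if [ -z "$token" ]; then
        echo "refusing to write $target: empty token" >&2
        return 1
    fi
    # The subshell keeps the restrictive umask from leaking into the caller.
    # umask only affects newly created files, not an existing .env.local.
    (
        umask 077
        printf 'OC_TOKEN=%s\nBACKEND_URL=http://localhost:8081/api\n' "$token" > "$target"
    )
}
```

The restrictive umask is a design choice: the file holds a live cluster token, so a newly created copy is readable only by its owner.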

Why this works:

  • BACKEND_URL points NextJS API routes to the port-forwarded backend
  • OC_TOKEN is forwarded as both X-Forwarded-Access-Token and Authorization: Bearer headers (the backend's ExtractServiceAccountFromAuth reads Authorization for JWT parsing)
  • Every file save triggers instant hot-reload — no Docker build, no kind load, no rollout restart

Running sessions (not just browsing the UI):

With Vertex AI enabled (setup-vertex-kind.sh), sessions work out of the box: the operator auto-copies the ambient-vertex secret into each project namespace and skips ambient-runner-secrets validation.

With a direct Anthropic API key (no Vertex), you must create the runner secret in each project namespace manually:

kubectl create secret generic ambient-runner-secrets \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-... \
  -n <your-project-namespace>

When to use:

  • Frontend-only changes (components, styles, pages, API routes)
  • Iterating on UI features rapidly
  • Debugging frontend issues

When NOT to use:

  • Backend, operator, or runner changes (those still need image rebuild + load)
  • Testing changes to container configuration or deployment manifests

Best Practices

  1. Use local dev server for frontend: Fastest feedback loop, no image rebuilds needed
  2. Use kind for backend/operator validation: When you need to rebuild non-frontend components
  3. Use minikube for development: Better tooling for iterative development with local-reload-* commands
  4. Always check logs: After deploying, verify pods started successfully
  5. Clean up when done: make kind-down or make local-clean to free resources
  6. Check what changed first: Use git status and git diff to understand scope
  7. Build only what changed: Don't rebuild everything if only one component changed
  8. Verify image pull policy: Ensure deployments use imagePullPolicy: Never for local images

Quick Reference

Decision Tree: Which Cluster Type?

Do you need to test local code changes?
├─ No → Use kind (make kind-up)
│        Fast, uses production images
│
└─ Yes → Is the change frontend-only?
         ├─ Yes → Run locally with npm run dev
         │        Instant hot-reload, no image builds
         │
         └─ No → Do you need to iterate frequently?
                  ├─ No → Use kind with manual image loading
                  │        Good for one-off tests
                  │
                  └─ Yes → Use minikube (make local-up)
                           Best for iterative development with local-reload-* commands
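
The same decision tree can be written as a small function, which makes the branch order explicit. A minimal sketch: the function name choose_workflow and its yes/no arguments are hypothetical, and the outcomes are the ones named in the tree:

```shell
#!/usr/bin/env bash
# Hypothetical helper encoding the decision tree above.
# Arguments are "yes" or "no":
#   $1 = testing local code changes?  $2 = frontend-only?  $3 = iterating frequently?
choose_workflow() {
    local has_changes="$1" frontend_only="$2" iterate_often="$3"
    if [ "$has_changes" != "yes" ]; then
        echo "kind: make kind-up"
    elif [ "$frontend_only" = "yes" ]; then
        echo "local dev server: npm run dev"
    elif [ "$iterate_often" = "yes" ]; then
        echo "minikube: make local-up"
    else
        echo "kind with manual image loading"
    fi
}
```

For example, choose_workflow yes no yes selects minikube, while choose_workflow yes no no selects kind with manual image loading.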

Cheat Sheet

| Task | Kind | Minikube |
|---|---|---|
| Create cluster | make kind-up | make local-up |
| Rebuild all | Build + load + update | make local-rebuild |
| Rebuild backend | Build + load + restart | make local-reload-backend |
| Check status | kubectl get pods -n ambient-code | make local-status |
| View logs | kubectl logs -f -l app=backend -n ambient-code | make local-logs-backend |
| Tear down | make kind-down | make local-clean |
| Access URL | Detect from port mapping (Docker: :80, Podman: :8080) | http://localhost:3000 |

When to Invoke This Skill

Invoke the dev-cluster skill when users say things like:

  • "Test this in kind"
  • "Deploy my changes to a local cluster"
  • "Bring up a development environment"
  • "I want to test these changes"
  • "Create a kind cluster with my changes"
  • "Rebuild and deploy the backend"
  • "How do I test platform changes locally?"

Example Interactions

Example 1: Quick Test in Kind

User: "Test this changeset in kind"

Assistant (using dev-cluster skill):

  1. Checks git status → finds backend changes
  2. Explains: "I see changes in components/backend. I'll build the backend image, create a kind cluster, and deploy your changes."
  3. Runs: make build-backend
  4. Runs: make kind-up
  5. Loads image: kind load docker-image localhost/vteam_backend:latest --name ambient-local
  6. Updates deployment with local image and imagePullPolicy: Never
  7. Verifies: kubectl rollout status deployment/backend -n ambient-code
  8. Provides access URL and log commands

Result: User can test their backend changes at the detected URL (http://localhost for Docker, http://localhost:8080 for Podman)

Example 2: Incremental Development with Minikube

User: "I'm working on the frontend, set me up for quick iterations"

Assistant (using dev-cluster skill):

  1. Runs: make local-up (creates cluster, builds all images)
  2. Explains the reload commands available
  3. User makes frontend changes
  4. User says: "Reload the frontend"
  5. Runs: make local-reload-frontend
  6. Verifies deployment
  7. User continues iterating with quick reload cycles

Result: Fast iteration loop for frontend development

Example 3: Troubleshooting Deployment

User: "The backend pod is crash looping"

Assistant (using dev-cluster skill):

  1. Checks pod status: kubectl get pods -n ambient-code
  2. Gets logs: kubectl logs -l app=backend -n ambient-code --tail=100
  3. Analyzes error messages
  4. Checks events: kubectl get events -n ambient-code --sort-by='.lastTimestamp'
  5. Identifies issue (e.g., missing env var, bad configuration)
  6. Suggests fix
  7. After fix applied, verifies: kubectl rollout status deployment/backend -n ambient-code

Result: Issue diagnosed and resolved

Integration with Makefile

This skill knows all the relevant Makefile targets in /workspace/repos/platform:

  • make kind-up - Create kind cluster
  • make kind-down - Destroy kind cluster
  • make local-up - Create minikube cluster with local builds
  • make local-down - Stop minikube services
  • make local-clean - Delete minikube cluster
  • make local-rebuild - Rebuild all and restart
  • make local-reload-backend - Rebuild/reload backend only
  • make local-reload-frontend - Rebuild/reload frontend only
  • make local-reload-operator - Rebuild/reload operator only
  • make build-all - Build all container images
  • make build-backend - Build backend image only
  • make build-frontend - Build frontend image only
  • make build-operator - Build operator image only
  • make local-status - Check pod status
  • make local-logs-backend - Follow backend logs
  • make local-logs-frontend - Follow frontend logs
  • make local-logs-operator - Follow operator logs