Skillshub castai-upgrade-migration
install
source · Clone the upstream repo
git clone https://github.com/ComeOnOliver/skillshub
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/jeremylongshore/claude-code-plugins-plus-skills/castai-upgrade-migration" ~/.claude/skills/comeonoliver-skillshub-castai-upgrade-migration && rm -rf "$T"
manifest:
skills/jeremylongshore/claude-code-plugins-plus-skills/castai-upgrade-migration/SKILL.mdsource content
CAST AI Upgrade & Migration
Overview
Upgrade CAST AI components: Helm charts for the agent/controller/evictor, Terraform provider version, and workload autoscaler. Includes rollback procedures for each component.
Prerequisites
- Current CAST AI components installed
- Staging cluster for testing upgrades first
- Helm and kubectl access
- Change management approval for production
Instructions
Step 1: Check Current Versions
# Helm chart versions helm list -n castai-agent -o json | jq '.[] | {name: .name, chart: .chart, appVersion: .app_version}' # Available versions helm repo update helm search repo castai-helm --versions | head -20 # Terraform provider version grep "castai/castai" .terraform.lock.hcl terraform providers
Step 2: Review Changelog
# Check CAST AI changelog for breaking changes # https://docs.cast.ai/changelog/ # Check Terraform provider releases # https://github.com/castai/terraform-provider-castai/releases
Step 3: Upgrade on Staging First
# Upgrade agent helm upgrade castai-agent castai-helm/castai-agent \ -n castai-agent --reuse-values # Upgrade cluster controller helm upgrade cluster-controller castai-helm/castai-cluster-controller \ -n castai-agent --reuse-values # Upgrade evictor helm upgrade castai-evictor castai-helm/castai-evictor \ -n castai-agent --reuse-values # Upgrade workload autoscaler helm upgrade castai-workload-autoscaler castai-helm/castai-workload-autoscaler \ -n castai-agent --reuse-values # Verify all pods restarted cleanly kubectl get pods -n castai-agent -w
Step 4: Upgrade Terraform Provider
# Update version constraint in versions.tf terraform { required_providers { castai = { source = "castai/castai" version = "~> 7.5" # Update to target version } } }
# Upgrade provider terraform init -upgrade terraform plan -var-file=environments/staging.tfvars # Review plan carefully for resource recreation # Apply if plan looks safe terraform apply -var-file=environments/staging.tfvars
Step 5: Validate After Upgrade
# Verify agent is online curl -s -H "X-API-Key: ${CASTAI_API_KEY}" \ "https://api.cast.ai/v1/kubernetes/external-clusters/${CASTAI_CLUSTER_ID}" \ | jq '{agentStatus, name}' # Verify autoscaler policies still applied curl -s -H "X-API-Key: ${CASTAI_API_KEY}" \ "https://api.cast.ai/v1/kubernetes/clusters/${CASTAI_CLUSTER_ID}/policies" \ | jq '.enabled' # Check for errors in new agent version kubectl logs -n castai-agent deployment/castai-agent --tail=50 | grep -i error
Rollback Procedure
# Helm rollback to previous release helm rollback castai-agent -n castai-agent helm rollback cluster-controller -n castai-agent # Terraform rollback terraform plan -var-file=environments/staging.tfvars # Review # If needed, pin previous provider version: # version = "= 7.4.2" terraform init -upgrade terraform apply -var-file=environments/staging.tfvars
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Agent CrashLoop after upgrade | Breaking chart change | to previous |
| Terraform plan shows destroy | Major provider version jump | Pin intermediate version |
| Policies reset after upgrade | Chart default values changed | Pass |
| Spot handler incompatible | Node format changed | Upgrade all components together |
Resources
Next Steps
For CI pipeline integration, see
castai-ci-integration.