Claude-code-plugins castai-performance-tuning
install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/castai-pack/skills/castai-performance-tuning" ~/.claude/skills/jeremylongshore-claude-code-plugins-castai-performance-tuning && rm -rf "$T"
manifest:
plugins/saas-packs/castai-pack/skills/castai-performance-tuning/SKILL.mdsource content
CAST AI Performance Tuning
Overview
Tune CAST AI for faster node provisioning, more responsive autoscaling, and efficient API usage. Covers headroom configuration, instance family selection, and API caching for multi-cluster dashboards.
Prerequisites
- CAST AI Phase 2 (full automation) enabled
- Understanding of workload scheduling patterns
- Access to autoscaler policy configuration
Instructions
Step 1: Optimize Node Provisioning Speed
# Configure headroom for proactive scaling (avoids waiting for pending pods) curl -X PUT -H "X-API-Key: ${CASTAI_API_KEY}" \ -H "Content-Type: application/json" \ "https://api.cast.ai/v1/kubernetes/clusters/${CASTAI_CLUSTER_ID}/policies" \ -d '{ "enabled": true, "unschedulablePods": { "enabled": true, "headroom": { "enabled": true, "cpuPercentage": 15, "memoryPercentage": 15 } } }'
Headroom pre-provisions spare capacity so pods schedule immediately instead of waiting 2-5 minutes for new nodes.
Step 2: Instance Family Optimization
# Terraform: Prefer instance families with fast launch times resource "castai_node_template" "fast_launch" { cluster_id = castai_eks_cluster.this.id name = "fast-launch-workers" constraints { spot = true use_spot_fallbacks = true fallback_restore_rate_seconds = 300 # Newer instance types launch faster and have better availability instance_families { include = ["m6i", "m7i", "c6i", "c7i", "r6i", "r7i"] } # Enable spot diversity for faster provisioning spot_diversity_price_increase_limit_percent = 25 architectures = ["amd64"] } }
Step 3: Evictor Tuning for Faster Consolidation
# Reduce empty node delay for dev/staging (faster downscale) helm upgrade castai-evictor castai-helm/castai-evictor \ -n castai-agent \ --reuse-values \ --set evictor.aggressiveMode=true \ --set evictor.cycleInterval=120 # For production, use non-aggressive with longer intervals # --set evictor.aggressiveMode=false # --set evictor.cycleInterval=600
Step 4: API Performance for Multi-Cluster Dashboards
import { LRUCache } from "lru-cache"; const cache = new LRUCache<string, unknown>({ max: 100, ttl: 60_000 }); interface ClusterSummary { id: string; name: string; savings: number; savingsPercent: number; nodeCount: number; spotPercent: number; } async function getClusterSummary(clusterId: string): Promise<ClusterSummary> { const cacheKey = `summary:${clusterId}`; const cached = cache.get(cacheKey) as ClusterSummary | undefined; if (cached) return cached; const [cluster, savings, nodes] = await Promise.all([ castaiGet(`/v1/kubernetes/external-clusters/${clusterId}`), castaiGet(`/v1/kubernetes/clusters/${clusterId}/savings`), castaiGet(`/v1/kubernetes/external-clusters/${clusterId}/nodes`), ]); const spotNodes = nodes.items.filter( (n: { lifecycle: string }) => n.lifecycle === "spot" ).length; const summary: ClusterSummary = { id: clusterId, name: cluster.name, savings: savings.monthlySavings, savingsPercent: savings.savingsPercentage, nodeCount: nodes.items.length, spotPercent: nodes.items.length > 0 ? (spotNodes / nodes.items.length) * 100 : 0, }; cache.set(cacheKey, summary); return summary; } // Aggregate across all clusters async function getDashboardData( clusterIds: string[] ): Promise<ClusterSummary[]> { return Promise.all(clusterIds.map(getClusterSummary)); }
Step 5: Workload Autoscaler Tuning
# Faster resource adjustment with shorter cooldown # (use with caution in production) metadata: annotations: autoscaling.cast.ai/cpu-headroom: "10" # Lower headroom = tighter fit autoscaling.cast.ai/memory-headroom: "15" autoscaling.cast.ai/apply-type: "immediate" # Apply without waiting
Performance Benchmarks
| Metric | Default | Tuned |
|---|---|---|
| Node provision time | 3-5 min | 1-3 min (with headroom) |
| Empty node removal | 5 min | 2 min (aggressive evictor) |
| Workload resize | 5 min cooldown | Immediate |
| API response (cached) | 200ms | <5ms |
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Headroom over-provisioning | Percentage too high | Reduce to 5-10% |
| Aggressive evictor causing disruptions | PDB not set | Add PodDisruptionBudgets |
| Cache stale data | TTL too long | Reduce cache TTL to 30s |
| Instance type unavailable | Too narrow constraints | Add more instance families |
Resources
Next Steps
For cost optimization strategies, see
castai-cost-tuning.