Claude-skill-registry autoscaling-configuration
Configure autoscaling for Kubernetes, VMs, and serverless workloads based on metrics, schedules, and custom indicators.
Install

Source · Clone the upstream repo:

```bash
git clone https://github.com/majiayu000/claude-skill-registry
```

Claude Code · Install into ~/.claude/skills/:

```bash
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/skills/data/autoscaling-configuration" \
       ~/.claude/skills/majiayu000-claude-skill-registry-autoscaling-configuration \
  && rm -rf "$T"
```

Manifest: skills/data/autoscaling-configuration/SKILL.md
Autoscaling Configuration
Overview
Implement autoscaling strategies to automatically adjust resource capacity based on demand, ensuring cost efficiency while maintaining performance and availability.
When to Use
- Traffic-driven workload scaling
- Time-based scheduled scaling
- Resource utilization optimization
- Cost reduction
- High-traffic event handling
- Batch processing optimization
- Database connection pooling
Implementation Examples
1. Kubernetes Horizontal Pod Autoscaler
```yaml
# hpa-configuration.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 15
        - type: Pods
          value: 2
          periodSeconds: 60
      selectPolicy: Min
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
---
# Vertical Pod Autoscaler for resource optimization.
# Note: avoid updateMode "Auto" alongside an HPA that scales on the
# same CPU/memory metrics, or the two controllers will fight.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: myapp
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 1000m
          memory: 512Mi
        controlledResources:
          - cpu
          - memory
```
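To try this out, apply the manifests and watch the autoscaler's view of the workload. A minimal verification sequence, assuming the manifests above are saved as hpa-configuration.yaml and that the metrics server and VPA components are installed in the cluster:

```bash
# Apply the HPA and VPA manifests
kubectl apply -f hpa-configuration.yaml

# Current replica counts, targets, and utilization as the HPA sees them
kubectl get hpa myapp-hpa -n production

# Scaling events and per-metric status
kubectl describe hpa myapp-hpa -n production

# VPA recommendations (requires the VPA recommender to be running)
kubectl describe vpa myapp-vpa -n production
```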
2. AWS Auto Scaling
```yaml
# aws-autoscaling.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: autoscaling-config
  namespace: production
data:
  setup-asg.sh: |
    #!/bin/bash
    set -euo pipefail

    ASG_NAME="myapp-asg"
    MIN_SIZE=2
    MAX_SIZE=10
    DESIRED_CAPACITY=3
    TARGET_CPU=70

    echo "Creating Auto Scaling Group..."

    # Launch template user data must be base64-encoded
    USER_DATA=$(printf '%s\n' '#!/bin/bash' 'cd /app && docker-compose up -d' | base64 -w0)

    # Create launch template (ignore the error if it already exists)
    aws ec2 create-launch-template \
      --launch-template-name myapp-template \
      --version-description "Production version" \
      --launch-template-data '{
        "ImageId": "ami-0c55b159cbfafe1f0",
        "InstanceType": "t3.medium",
        "KeyName": "myapp-key",
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "UserData": "'"$USER_DATA"'",
        "TagSpecifications": [{
          "ResourceType": "instance",
          "Tags": [{"Key": "Name", "Value": "myapp-instance"}]
        }]
      }' || true

    # Create Auto Scaling Group
    aws autoscaling create-auto-scaling-group \
      --auto-scaling-group-name "$ASG_NAME" \
      --launch-template LaunchTemplateName=myapp-template \
      --min-size "$MIN_SIZE" \
      --max-size "$MAX_SIZE" \
      --desired-capacity "$DESIRED_CAPACITY" \
      --availability-zones us-east-1a us-east-1b us-east-1c \
      --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/myapp/abcdef123456 \
      --health-check-type ELB \
      --health-check-grace-period 300 \
      --tags "Key=Name,Value=myapp,PropagateAtLaunch=true"

    # Create a target-tracking CPU scaling policy.
    # EC2 ASG target tracking has no per-policy cooldown fields;
    # use estimated instance warmup instead.
    aws autoscaling put-scaling-policy \
      --auto-scaling-group-name "$ASG_NAME" \
      --policy-name myapp-cpu-scaling \
      --policy-type TargetTrackingScaling \
      --estimated-instance-warmup 60 \
      --target-tracking-configuration '{
        "TargetValue": '"$TARGET_CPU"',
        "PredefinedMetricSpecification": {
          "PredefinedMetricType": "ASGAverageCPUUtilization"
        }
      }'

    echo "Auto Scaling Group created: $ASG_NAME"
---
# Scheduled scaling: a CronJob has a single schedule field,
# so scale-up and scale-down are two separate CronJobs.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-scale-up
  namespace: production
spec:
  schedule: "0 8 * * 1-5"  # scale up at 8 AM on weekdays
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: autoscale
              image: amazon/aws-cli:latest
              command:
                - sh
                - -c
                - |
                  aws autoscaling set-desired-capacity \
                    --auto-scaling-group-name myapp-asg \
                    --desired-capacity 10
          restartPolicy: OnFailure
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-scale-down
  namespace: production
spec:
  schedule: "0 18 * * 1-5"  # scale down at 6 PM on weekdays
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: autoscale
              image: amazon/aws-cli:latest
              command:
                - sh
                - -c
                - |
                  aws autoscaling set-desired-capacity \
                    --auto-scaling-group-name myapp-asg \
                    --desired-capacity 3
          restartPolicy: OnFailure
```
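After the setup script runs, the group and its policies can be checked from the CLI. A quick sanity check, assuming the names used above:

```bash
# Confirm the group exists with the expected min/max/desired sizes
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names myapp-asg

# List attached scaling policies (should include myapp-cpu-scaling)
aws autoscaling describe-policies \
  --auto-scaling-group-name myapp-asg

# Recent scaling activity, useful when debugging unexpected scale events
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name myapp-asg --max-items 10
```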
3. Custom Metrics Autoscaling
```yaml
# custom-metrics-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 50
  metrics:
    # Queue depth from custom metrics
    - type: Pods
      pods:
        metric:
          name: job_queue_depth
        target:
          type: AverageValue
          averageValue: "100"
    # Request rate from custom metrics
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
    # Custom business metric
    - type: Pods
      pods:
        metric:
          name: active_connections
        target:
          type: AverageValue
          averageValue: "500"
---
# Prometheus ServiceMonitor for custom metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-metrics
  namespace: production
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
```
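The ServiceMonitor only gets the metrics into Prometheus; for the HPA to consume them, a metrics adapter (commonly prometheus-adapter) must also serve the custom.metrics.k8s.io API. A quick way to confirm the pipeline end to end, assuming such an adapter is installed and jq is available:

```bash
# Is anything serving the custom metrics API?
kubectl get apiservices | grep custom.metrics

# List the metrics the adapter exposes
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq '.resources[].name'

# Read one metric for the pods the HPA targets (same name as in the manifest)
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" | jq .
```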
4. Autoscaling Script
```bash
#!/bin/bash
# autoscaling-setup.sh - Complete autoscaling configuration
set -euo pipefail

ENVIRONMENT="${1:-production}"
DEPLOYMENT="${2:-myapp}"

echo "Setting up autoscaling for $DEPLOYMENT in $ENVIRONMENT"

# Create HPA
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ${DEPLOYMENT}-hpa
  namespace: ${ENVIRONMENT}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${DEPLOYMENT}
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
EOF

echo "HPA created successfully"

# Monitor autoscaling
echo "Monitoring autoscaling events..."
kubectl get hpa "${DEPLOYMENT}-hpa" -n "$ENVIRONMENT" -w
```
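Usage is `./autoscaling-setup.sh production myapp`. To actually see the HPA react, generate some load against the service; a throwaway load generator in the style of the Kubernetes HPA walkthrough, assuming the Deployment is exposed through a Service named myapp:

```bash
# Run a temporary pod that hammers the service in a tight loop
kubectl run load-generator -n production --rm -i --tty \
  --image=busybox:1.28 --restart=Never -- \
  /bin/sh -c "while sleep 0.01; do wget -q -O- http://myapp; done"

# In another terminal, watch the replica count change
kubectl get hpa myapp-hpa -n production -w
```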
5. Monitoring Autoscaling
```yaml
# autoscaling-monitoring.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: autoscaling-alerts
  namespace: monitoring
data:
  alerts.yaml: |
    groups:
      - name: autoscaling
        rules:
          # Metric names below are for kube-state-metrics v1.x;
          # v2.x renames them to kube_horizontalpodautoscaler_*.
          - alert: HpaMaxedOut
            expr: |
              kube_hpa_status_current_replicas == kube_hpa_status_desired_replicas
              and kube_hpa_status_desired_replicas == kube_hpa_spec_max_replicas
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "HPA {{ $labels.hpa }} is at maximum replicas"
          - alert: HpaMinedOut
            expr: |
              kube_hpa_status_current_replicas == kube_hpa_status_desired_replicas
              and kube_hpa_status_desired_replicas == kube_hpa_spec_min_replicas
            for: 30m
            labels:
              severity: info
            annotations:
              summary: "HPA {{ $labels.hpa }} is at minimum replicas"
          - alert: AsgCapacityLow
            expr: |
              aws_autoscaling_group_desired_capacity
                / aws_autoscaling_group_max_size < 0.2
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "ASG {{ $labels.auto_scaling_group_name }} has low capacity"
```
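Prometheus rule files are easy to break with indentation mistakes, so it is worth linting them before they reach the cluster. A minimal check with promtool, assuming the ConfigMap above has been applied:

```bash
# Extract the rules from the ConfigMap and validate them
kubectl get configmap autoscaling-alerts -n monitoring \
  -o jsonpath='{.data.alerts\.yaml}' > /tmp/alerts.yaml
promtool check rules /tmp/alerts.yaml
```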
Best Practices
✅ DO
- Set appropriate min/max replicas
- Monitor metric aggregation window
- Implement cooldown periods
- Use multiple metrics
- Test scaling behavior (see the load-test sketch after these lists)
- Monitor scaling events
- Plan for peak loads
- Implement fallback strategies
❌ DON'T
- Set min replicas to 1
- Scale too aggressively
- Ignore cooldown periods
- Use single metric only
- Forget to test scaling
- Scale below resource needs
- Neglect monitoring
- Deploy without capacity tests
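For "test scaling behavior" and "deploy without capacity tests", a simple pre-production drill is to push sustained load and confirm the HPA scales up and, after the stabilization window, back down without flapping. A sketch using the hey load generator; the tool being installed and the service URL are assumptions:

```bash
# Sustained load: 5 minutes, 50 concurrent workers (adjust to your capacity plan)
hey -z 5m -c 50 http://myapp.production.example.com/

# Watch replicas climb during the test and settle afterwards
kubectl get hpa myapp-hpa -n production -w

# Afterwards, review the scaling events for flapping
kubectl describe hpa myapp-hpa -n production | grep -A 20 Events
```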
Scaling Metrics
- CPU Utilization: the most common scaling signal
- Memory Utilization: useful for heap-bound applications
- Request Rate: API-driven scaling
- Queue Depth: async job processing (see the external-metric sketch below)
- Custom Metrics: business-specific indicators
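As an illustration of queue-depth scaling, here is a sketch of an HPA driven by an External metric; the metric name rabbitmq_queue_messages_ready, its labels, and its availability through a metrics adapter are assumptions, not something this skill configures:

```bash
cat <<'EOF' | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-queue-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 2
  maxReplicas: 30
  metrics:
    - type: External
      external:
        metric:
          name: rabbitmq_queue_messages_ready   # assumed adapter-exposed metric
          selector:
            matchLabels:
              queue: jobs
        target:
          type: AverageValue
          averageValue: "50"   # roughly 50 pending messages per worker pod
EOF
```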