Claude-skill-registry infrastructure-cost-optimization
Optimize cloud infrastructure costs through resource rightsizing, reserved instances, spot instances, and waste reduction strategies.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/infrastructure-cost-optimization" ~/.claude/skills/majiayu000-claude-skill-registry-infrastructure-cost-optimization && rm -rf "$T"
manifest:
skills/data/infrastructure-cost-optimization/SKILL.md · source content
Infrastructure Cost Optimization
Overview
Reduce infrastructure costs through intelligent resource allocation, reserved instances, spot instances, and continuous optimization without sacrificing performance.
When to Use
- Cloud cost reduction
- Budget management and tracking
- Resource utilization optimization
- Multi-environment cost allocation
- Waste identification and elimination
- Reserved instance planning
- Spot instance integration
Implementation Examples
1. AWS Cost Optimization Configuration
```yaml
# cost-optimization-setup.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cost-optimization-scripts
  namespace: operations
data:
  analyze-costs.sh: |
    #!/bin/bash
    set -euo pipefail

    echo "=== AWS Cost Analysis ==="

    # Get daily cost trend (GNU date syntax)
    echo "Daily costs for last 7 days:"
    aws ce get-cost-and-usage \
      --time-period Start=$(date -d '7 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
      --granularity DAILY \
      --metrics "BlendedCost" \
      --query 'ResultsByTime[*].[TimePeriod.Start,Total.BlendedCost.Amount]' \
      --output table

    # Find unattached resources
    echo -e "\n=== Unattached EBS Volumes ==="
    aws ec2 describe-volumes \
      --filters Name=status,Values=available \
      --query 'Volumes[*].[VolumeId,Size,CreateTime]' \
      --output table

    echo -e "\n=== Unassociated Elastic IPs ==="
    aws ec2 describe-addresses \
      --query 'Addresses[?AssociationId==`null`].[PublicIp,AllocationId]' \
      --output table

    echo -e "\n=== Running RDS Instances (review utilization) ==="
    aws rds describe-db-instances \
      --query 'DBInstances[?DBInstanceStatus==`available`].[DBInstanceIdentifier,DBInstanceClass,Engine,AllocatedStorage]' \
      --output table

    # Estimate savings with Reserved Instances
    echo -e "\n=== Reserved Instance Savings Potential ==="
    aws ce get-reservation-purchase-recommendation \
      --service "Amazon Elastic Compute Cloud - Compute" \
      --lookback-period-in-days THIRTY_DAYS \
      --query 'Recommendations[0].[RecommendationSummary.TotalEstimatedMonthlySavingsAmount,RecommendationSummary.TotalEstimatedMonthlySavingsPercentage]' \
      --output table

  optimize-resources.sh: |
    #!/bin/bash
    set -euo pipefail

    echo "Starting resource optimization..."

    # Remove unattached volumes (text output is tab-separated, so split it)
    echo "Removing unattached volumes..."
    aws ec2 describe-volumes \
      --filters Name=status,Values=available \
      --query 'Volumes[*].VolumeId' \
      --output text | tr '\t' '\n' | \
    while read -r volume_id; do
      echo "Deleting volume: $volume_id"
      aws ec2 delete-volume --volume-id "$volume_id" 2>/dev/null || true
    done

    # Release unassociated Elastic IPs
    echo "Releasing unused Elastic IPs..."
    aws ec2 describe-addresses \
      --query 'Addresses[?AssociationId==`null`].AllocationId' \
      --output text | tr '\t' '\n' | \
    while read -r alloc_id; do
      echo "Releasing EIP: $alloc_id"
      aws ec2 release-address --allocation-id "$alloc_id" 2>/dev/null || true
    done

    # Downsize over-provisioned RDS instances
    echo "Analyzing RDS for downsizing..."
    # TODO: check CloudWatch metrics and call modify-db-instance if underutilized

    echo "Optimization complete"
```

```hcl
# Terraform cost optimization

# Spot instance for non-critical workloads
resource "aws_instance" "spot" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"

  instance_market_options {
    market_type = "spot"
    spot_options {
      max_price                      = "0.05" # Cap the hourly price
      spot_instance_type             = "persistent"
      instance_interruption_behavior = "stop" # persistent requests require stop or hibernate
      valid_until                    = "2025-12-31T23:59:59Z"
    }
  }

  tags = {
    Name       = "spot-instance"
    CostCenter = "engineering"
  }
}

# On-demand instance covered by a Reserved Instance purchase for baseline capacity
resource "aws_instance" "reserved" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"

  tags = {
    Name            = "reserved-instance"
    ReservationType = "reserved"
  }
}

# EC2 Fleet mixing on-demand (RI-covered) and spot capacity
resource "aws_ec2_fleet" "mixed" {
  type = "maintain"

  launch_template_config {
    launch_template_specification {
      launch_template_id = aws_launch_template.app.id
      version            = "$Latest"
    }
    override {
      instance_type     = "t3.medium"
      weighted_capacity = 1
      priority          = 1 # On-demand / reserved
    }
    override {
      instance_type     = "t3.large"
      weighted_capacity = 2
      priority          = 2 # On-demand / reserved
    }
    override {
      instance_type     = "t3a.medium"
      weighted_capacity = 1
      priority          = 3 # Spot
    }
    override {
      instance_type     = "t3a.large"
      weighted_capacity = 2
      priority          = 4 # Spot
    }
  }

  target_capacity_specification {
    total_target_capacity        = 10
    on_demand_target_capacity    = 6
    spot_target_capacity         = 4
    default_target_capacity_type = "on-demand"
  }

  tags = {
    Name = "mixed-capacity"
  }
}
```
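As a sanity check on the fleet split above (6 on-demand units, 4 spot), a quick back-of-the-envelope script can estimate the blended hourly cost. The prices below are illustrative placeholders, not current AWS rates:

```python
# Rough blended-cost estimate for a 6 on-demand / 4 spot fleet.
# Both prices are assumptions -- look up current rates for your region.
ON_DEMAND_HOURLY = 0.0416  # assumed t3.medium on-demand $/hr
SPOT_HOURLY = 0.0125       # assumed t3a.medium spot $/hr

def blended_hourly_cost(on_demand_units, spot_units,
                        on_demand_price=ON_DEMAND_HOURLY,
                        spot_price=SPOT_HOURLY):
    """Blended hourly cost of a mixed on-demand/spot fleet."""
    return on_demand_units * on_demand_price + spot_units * spot_price

mixed = blended_hourly_cost(6, 4)
all_on_demand = blended_hourly_cost(10, 0)
savings_pct = 100 * (1 - mixed / all_on_demand)
print(f"mixed: ${mixed:.4f}/hr  all on-demand: ${all_on_demand:.4f}/hr  "
      f"savings: {savings_pct:.0f}%")
```

With these placeholder rates the mixed fleet comes out roughly 28% cheaper than running everything on-demand; the real figure depends entirely on your instance types and region.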
2. Kubernetes Cost Optimization
```yaml
# k8s-cost-optimization.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cost-optimization-policies
  namespace: kube-system
data:
  policies.yaml: |
    # Resource quotas per namespace
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-quota
      namespace: production
    spec:
      hard:
        requests.cpu: "100"
        requests.memory: "200Gi"
        limits.cpu: "200"
        limits.memory: "400Gi"
        pods: "500"
      scopeSelector:
        matchExpressions:
          - operator: In
            scopeName: PriorityClass
            values: ["high", "medium"]
    ---
    # Pod Disruption Budget for cost-effective scaling
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: cost-optimized-pdb
      namespace: production
    spec:
      minAvailable: 1
      selector:
        matchLabels:
          tier: backend
    ---
    # Taint spot/preemptible nodes so only tolerant workloads land there
    apiVersion: v1
    kind: Node
    metadata:
      name: spot-node-1
    spec:
      taints:
        - key: cloud.google.com/gke-preemptible
          value: "true"
          effect: NoSchedule
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cost-optimized-app
      namespace: production
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: myapp
      template:
        metadata:
          labels:
            app: myapp
        spec:
          # Tolerate spot instances
          tolerations:
            - key: cloud.google.com/gke-preemptible
              operator: Equal
              value: "true"
              effect: NoSchedule
          # Prefer lower-cost (spot) capacity
          affinity:
            nodeAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  preference:
                    matchExpressions:
                      - key: karpenter.sh/capacity-type
                        operator: In
                        values: ["spot"]
          containers:
            - name: app
              image: myapp:latest
              resources:
                requests:
                  cpu: 100m
                  memory: 128Mi
                limits:
                  cpu: 500m
                  memory: 512Mi
```
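The fixed requests and limits above are only a starting point; right-sizing means deriving them from observed usage. A minimal sketch of that calculation (the 95th-percentile-plus-20%-headroom rule is an assumption, not a Kubernetes default):

```python
# Rightsizing sketch: derive a CPU request from observed usage in millicores.
# Percentile choice and headroom factor are tuning assumptions.

def recommend_cpu_request(samples_millicores, percentile=0.95, headroom=1.2):
    """Recommend a CPU request: the 95th-percentile usage plus 20% headroom."""
    ordered = sorted(samples_millicores)
    idx = min(int(percentile * len(ordered)), len(ordered) - 1)
    return int(ordered[idx] * headroom)

# Hourly samples for a pod currently requesting 1000m:
usage = [80, 90, 110, 120, 95, 100, 130, 105, 85, 115]
print(recommend_cpu_request(usage))  # 156 -- far below the 1000m request
```

In practice the samples would come from your metrics backend (e.g. Prometheus `container_cpu_usage_seconds_total` rates), and tools like the Vertical Pod Autoscaler automate this loop.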
3. Cost Monitoring Dashboard
```python
# cost-monitoring.py
import boto3
from datetime import datetime, timedelta


class CostOptimizer:
    def __init__(self):
        self.ce_client = boto3.client('ce')
        self.ec2_client = boto3.client('ec2')
        self.rds_client = boto3.client('rds')

    def get_daily_costs(self, days=30):
        """Get daily costs per service for the past N days."""
        end_date = datetime.now().date()
        start_date = end_date - timedelta(days=days)
        response = self.ce_client.get_cost_and_usage(
            TimePeriod={'Start': str(start_date), 'End': str(end_date)},
            Granularity='DAILY',
            Metrics=['BlendedCost'],
            GroupBy=[{'Type': 'DIMENSION', 'Key': 'SERVICE'}]
        )
        return response

    def find_underutilized_instances(self):
        """Find EC2 instances averaging under 10% CPU over the last week."""
        cloudwatch = boto3.client('cloudwatch')
        instances = []
        ec2_instances = self.ec2_client.describe_instances()
        for reservation in ec2_instances['Reservations']:
            for instance in reservation['Instances']:
                instance_id = instance['InstanceId']
                # Check CPU utilization
                response = cloudwatch.get_metric_statistics(
                    Namespace='AWS/EC2',
                    MetricName='CPUUtilization',
                    Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
                    StartTime=datetime.now() - timedelta(days=7),
                    EndTime=datetime.now(),
                    Period=3600,
                    Statistics=['Average']
                )
                if response['Datapoints']:
                    avg_cpu = (sum(d['Average'] for d in response['Datapoints'])
                               / len(response['Datapoints']))
                    if avg_cpu < 10:  # Less than 10% average
                        instances.append({
                            'InstanceId': instance_id,
                            'Type': instance['InstanceType'],
                            'AverageCPU': avg_cpu,
                            'Recommendation': 'Downsize or terminate'
                        })
        return instances

    def estimate_reserved_instance_savings(self):
        """Estimate potential monthly savings from Reserved Instances."""
        response = self.ce_client.get_reservation_purchase_recommendation(
            Service='Amazon Elastic Compute Cloud - Compute',
            LookbackPeriodInDays='THIRTY_DAYS',
            PageSize=100
        )
        total_savings = 0.0
        for recommendation in response.get('Recommendations', []):
            summary = recommendation['RecommendationSummary']
            total_savings += float(summary['TotalEstimatedMonthlySavingsAmount'])
        return total_savings

    def generate_report(self):
        """Generate a comprehensive cost optimization report."""
        print("=== Cost Optimization Report ===\n")

        # Daily costs (with GroupBy set, amounts live under Groups, not Total)
        print("Daily Costs:")
        costs = self.get_daily_costs(7)
        for result in costs['ResultsByTime']:
            date = result['TimePeriod']['Start']
            total = sum(float(g['Metrics']['BlendedCost']['Amount'])
                        for g in result.get('Groups', []))
            print(f"  {date}: ${total:.2f}")

        # Underutilized instances
        print("\nUnderutilized Instances:")
        for instance in self.find_underutilized_instances():
            print(f"  {instance['InstanceId']}: {instance['AverageCPU']:.1f}% CPU"
                  f" - {instance['Recommendation']}")

        # Reserved instance savings
        print("\nReserved Instance Savings Potential:")
        savings = self.estimate_reserved_instance_savings()
        print(f"  Estimated Monthly Savings: ${savings:.2f}")


# Usage
if __name__ == '__main__':
    optimizer = CostOptimizer()
    optimizer.generate_report()
```
Cost Optimization Strategies
✅ DO
- Use reserved instances for baseline
- Leverage spot instances
- Right-size resources
- Monitor cost trends
- Implement auto-scaling
- Compare pricing across regions
- Tag resources consistently
- Schedule non-essential resources
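"Schedule non-essential resources" can be as simple as a `Schedule` tag plus a cron job that stops instances outside their window. A minimal sketch of the decision logic, assuming a hypothetical `Schedule` tag like `"08:00-20:00"` (same-day windows only); the actual boto3 stop/start calls are omitted:

```python
# Decide from a Schedule tag whether a dev instance should be running now.
# The tag convention is an assumption; overnight windows are not handled.
from datetime import time

def should_be_running(schedule_tag, now):
    """Return True if `now` (a datetime.time) falls inside the tag's window."""
    start_s, end_s = schedule_tag.split('-')
    start = time(*map(int, start_s.split(':')))
    end = time(*map(int, end_s.split(':')))
    return start <= now < end

print(should_be_running("08:00-20:00", time(14, 30)))  # True
print(should_be_running("08:00-20:00", time(23, 0)))   # False
```

A scheduled Lambda would iterate tagged instances, call this check, and invoke `ec2.stop_instances` / `ec2.start_instances` accordingly.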
❌ DON'T
- Over-provision resources
- Ignore unused resources
- Neglect cost monitoring
- Run all on-demand
- Forget to release EIPs
- Mix cost centers
- Ignore savings opportunities
- Deploy without budgets
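On "deploy without budgets": even before wiring up AWS Budgets alerts, a simple linear burn-rate forecast can flag an overrun mid-month. A sketch, with illustrative figures:

```python
# Forecast month-end spend from month-to-date cost and flag budget overruns.
# All dollar amounts here are illustrative assumptions.
import calendar

def forecast_overrun(mtd_cost, day_of_month, year, month, budget):
    """Linearly project month-end spend; return (projection, over_budget)."""
    days_in_month = calendar.monthrange(year, month)[1]
    projected = mtd_cost / day_of_month * days_in_month
    return projected, projected > budget

projected, over = forecast_overrun(mtd_cost=4500.0, day_of_month=15,
                                   year=2025, month=6, budget=8000.0)
print(f"projected: ${projected:.2f}, over budget: {over}")
```

Linear projection ignores spiky or seasonal spend, so treat it as an early-warning signal rather than a forecast.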
Cost Saving Opportunities
- Reserved Instances: 40-70% savings
- Spot Instances: 70-90% savings
- Committed Use Discounts: 25-55% savings
- Right-sizing: 10-30% savings
- Resource cleanup: 5-20% savings
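These ranges can be combined into a rough estimate of total savings by weighting each lever by the portion of spend it applies to. A sketch using assumed midpoint rates and illustrative spend figures:

```python
# Blend the savings ranges above into one estimate. Rates are midpoints of
# the listed ranges; the spend breakdown is an illustrative assumption.
SAVINGS_RATE = {
    'reserved': 0.55,       # Reserved Instances: 40-70%
    'spot': 0.80,           # Spot Instances: 70-90%
    'committed_use': 0.40,  # Committed Use Discounts: 25-55%
    'rightsizing': 0.20,    # Right-sizing: 10-30%
    'cleanup': 0.125,       # Resource cleanup: 5-20%
}

def estimate_savings(spend_by_lever):
    """Sum estimated savings over the spend each lever applies to."""
    return sum(spend * SAVINGS_RATE[lever]
               for lever, spend in spend_by_lever.items())

monthly = estimate_savings({'reserved': 6000, 'spot': 2000, 'rightsizing': 3000})
print(f"estimated monthly savings: ${monthly:.2f}")
```

The levers overlap in practice (spend moved to spot can no longer be right-sized for savings), so treat the sum as an upper bound, not a commitment.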