Claude-skill-registry cloud-infrastructure

Cloud platforms (AWS, Cloudflare, GCP, Azure), containerization (Docker), Kubernetes, Infrastructure as Code (Terraform), CI/CD, and observability.

Install

Source: clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code: install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/cloud-infrastructure" ~/.claude/skills/majiayu000-claude-skill-registry-cloud-infrastructure && rm -rf "$T"

manifest: skills/data/cloud-infrastructure/SKILL.md

Cloud Infrastructure Skill

Quick Reference

Platform	Market share (approx.)	Best for	Time to learn
AWS	~32%	General purpose	3-6 mo
Azure	~24%	Microsoft stack	3-6 mo
GCP	~11%	Data, ML	3-6 mo
Cloudflare	n/a (edge network)	CDN, edge compute (Workers)	2-4 wk

Learning Paths

AWS

[1] IAM + VPC (1-2 wk)
 │  └─ Roles, policies, networking
 │
 ▼
[2] Compute: EC2, Lambda (2-3 wk)
 │
 ▼
[3] Storage: S3, EBS (1-2 wk)
 │
 ▼
[4] Database: RDS, DynamoDB (2-3 wk)
 │
 ▼
[5] Containers: ECS, EKS (3-4 wk)
 │
 ▼
[6] Monitoring: CloudWatch (1-2 wk)
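Step [1] (IAM) is mostly about how policies combine. The core rule is that an explicit Deny always beats an Allow, and anything not explicitly allowed is implicitly denied. Below is a minimal sketch of that rule for identity-policy statements in a single account. It is a simplification: SCPs, permission boundaries, resource policies and conditions are ignored, and the helper names and the example policy are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return value if isinstance(value, list) else [value]


def _matches(patterns, value):
    # Approximates IAM wildcards with fnmatch: "*" matches any run of
    # characters and "?" a single one. (fnmatch also treats "[...]" as a
    # character class, which IAM does not; acceptable for this sketch.)
    return any(fnmatchcase(value, pattern) for pattern in _as_list(patterns))


def is_allowed(statements, action, resource):
    """Simplified single-account evaluation of identity-policy statements."""
    matching = [
        s for s in statements
        if _matches(s["Action"], action) and _matches(s["Resource"], resource)
    ]
    # 1. Any matching explicit Deny wins, regardless of Allows.
    if any(s["Effect"] == "Deny" for s in matching):
        return False
    # 2. Otherwise a matching Allow grants access; with none, the
    #    request falls through to the implicit (default) deny.
    return any(s["Effect"] == "Allow" for s in matching)


# Hypothetical policy: full access to one bucket, but never delete objects.
policy = [
    {"Effect": "Allow", "Action": "s3:*", "Resource": "arn:aws:s3:::app-bucket/*"},
    {"Effect": "Deny", "Action": "s3:DeleteObject", "Resource": "*"},
]

print(is_allowed(policy, "s3:GetObject", "arn:aws:s3:::app-bucket/report.csv"))     # True
print(is_allowed(policy, "s3:DeleteObject", "arn:aws:s3:::app-bucket/report.csv"))  # False
```

An unrelated action such as `ec2:StartInstances` matches no statement, so it is denied by default.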

Docker & Containers

[1] Docker Basics (1 wk)
 │  └─ Images, containers, Dockerfile
 │
 ▼
[2] Multi-stage Builds (1 wk)
 │  └─ Optimization, layer caching
 │
 ▼
[3] Docker Compose (1 wk)
 │  └─ Multi-container apps
 │
 ▼
[4] Registry & Security (1 wk)
    └─ Push/pull, scanning, non-root
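Step [2] relies on layer caching. A layer is reused only if its parent layer was reused and its instruction is unchanged. For `COPY`, the checksums of the copied files also have to be unchanged. That is why dependency manifests are usually copied before application source. The sketch below models that rule with chained hashes. It is a simplified illustration, not Docker's actual cache implementation, and the function names and file digests are hypothetical.

```python
from hashlib import sha256


def layer_keys(instructions, file_digests):
    """Chain a cache key per layer: parent key + instruction (+ COPY inputs)."""
    keys, parent = [], ""
    for instruction in instructions:
        h = sha256(f"{parent}\n{instruction}".encode())
        if instruction.startswith("COPY "):
            source = instruction.split()[1]
            # Changing copied files changes this key and every key after it.
            h.update(file_digests.get(source, "").encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_layers(instructions, before, after):
    """Indices of layers whose cache key changed between two builds."""
    old = layer_keys(instructions, before)
    new = layer_keys(instructions, after)
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


deps_first = [
    "FROM python:3.12-slim",
    "COPY requirements.txt .",
    "RUN pip install -r requirements.txt",
    "COPY src .",
]
src_first = [
    "FROM python:3.12-slim",
    "COPY . .",
    "RUN pip install -r requirements.txt",
]
# Only application code changed between the two builds.
before = {"requirements.txt": "r1", "src": "s1", ".": "r1+s1"}
after = {"requirements.txt": "r1", "src": "s2", ".": "r1+s2"}

print(rebuilt_layers(deps_first, before, after))  # [3]: pip install stays cached
print(rebuilt_layers(src_first, before, after))   # [1, 2]: pip install reruns
```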

Kubernetes

[1] Pods & Deployments (2 wk)
 │
 ▼
[2] Services & Networking (1-2 wk)
 │
 ▼
[3] ConfigMaps & Secrets (1 wk)
 │
 ▼
[4] Helm Charts (2 wk)
 │
 ▼
[5] Production Patterns (ongoing)
    └─ HPA, PDB, resource limits
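For the HPA in step [5], the Kubernetes docs give the scaling rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. Scaling is skipped when the ratio is within a tolerance of 1.0, which defaults to 0.1. A minimal sketch of that calculation follows. The `min_replicas`/`max_replicas` defaults stand in for the HPA's `minReplicas`/`maxReplicas` fields and are example values. Stabilization windows and scaling policies are not modeled.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         tolerance=0.1, min_replicas=1, max_replicas=10):
    """Replica count per the documented HPA formula (simplified)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: keep the current count to avoid flapping.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


print(hpa_desired_replicas(4, current_metric=90, target_metric=60))  # 6
print(hpa_desired_replicas(4, current_metric=63, target_metric=60))  # 4 (within tolerance)
```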

Terraform (IaC)

[1] Resources & State (1 wk)
 │
 ▼
[2] Variables & Outputs (1 wk)
 │
 ▼
[3] Modules (1-2 wk)
 │
 ▼
[4] Remote State (1 wk)
 │
 ▼
[5] Workspaces & Environments (1 wk)

Kubernetes Quick Reference

Resource	Purpose	Example
Pod	Smallest deployable unit	One or more co-located containers
Deployment	Manage replicas	Web app
Service	Network access	ClusterIP, LoadBalancer
Ingress	HTTP routing	Path-based routing
ConfigMap	Configuration	Environment variables
Secret	Sensitive data	Credentials
StatefulSet	Stateful apps	Databases
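A Service finds its Pods through an equality-based label selector. A Pod is selected only if it carries every selector key with the same value. A one-character label typo is a common reason a Service ends up with no endpoints. The sketch below shows the matching rule. Function names and Pod data are hypothetical, and real endpoints also exclude Pods that are not Ready.

```python
def selector_matches(selector, labels):
    """Equality-based selector: every key must be present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


def service_endpoints(selector, pods):
    """Pod names a Service with this selector would route to (readiness ignored)."""
    if not selector:
        # A Service without a selector gets no automatic endpoints.
        return []
    return [name for name, labels in pods.items() if selector_matches(selector, labels)]


pods = {
    "web-1": {"app": "web", "tier": "frontend"},
    "web-2": {"app": "web"},
    "db-0": {"app": "db"},
}
print(service_endpoints({"app": "web"}, pods))  # ['web-1', 'web-2']
print(service_endpoints({"app": "Web"}, pods))  # [] (labels are case-sensitive)
```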

Terraform Structure

project/
├── main.tf           # Resources
├── variables.tf      # Inputs
├── outputs.tf        # Outputs
├── providers.tf      # Provider config
├── versions.tf       # Version constraints
├── modules/
│   ├── vpc/
│   ├── eks/
│   └── rds/
└── environments/
    ├── dev.tfvars
    ├── staging.tfvars
    └── prod.tfvars
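With this layout, each environment's inputs are selected at plan time with Terraform's `-var-file` flag. The plan is saved with `-out` so that `terraform apply <file>` applies exactly what was reviewed. Below is a minimal wrapper sketch. The function names are hypothetical, and it assumes the `terraform` CLI is on PATH and `terraform init` has already been run.

```python
import subprocess

ENVIRONMENTS = ("dev", "staging", "prod")  # matches the *.tfvars files above


def terraform_plan_args(env):
    """Build a `terraform plan` command that loads environments/<env>.tfvars."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return [
        "terraform", "plan",
        f"-var-file=environments/{env}.tfvars",  # per-environment inputs
        f"-out={env}.tfplan",                     # saved plan for a later apply
    ]


def plan(env):
    # Requires the terraform CLI; raises CalledProcessError on failure.
    subprocess.run(terraform_plan_args(env), check=True)
```

A var file only changes inputs, not where state is stored. To keep state separate per environment, pair it with a workspace (`terraform workspace select prod`) or a per-environment backend configuration.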

CI/CD Pipeline Template

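# GitHub Actions
name: CI/CD
on:
  push:
    branches: [main]
jobs:
  build-test-deploy:
    runs-on: ubuntu-latest
    env:
      # "registry" is a placeholder for your registry host/namespace
      IMAGE: registry/app:${{ github.sha }}
    steps:
      - uses: actions/checkout@v4
      - name: Build
        # Tag with the full registry name so the push below finds the image
        run: docker build -t "$IMAGE" .
      - name: Test
        run: docker run --rm "$IMAGE" pytest
      # Assumes an earlier registry login step (e.g. docker/login-action)
      - name: Push
        run: docker push "$IMAGE"
      # Assumes kubectl is already configured with cluster credentials
      - name: Deploy
        if: github.ref == 'refs/heads/main'
        run: kubectl set image deployment/app app="$IMAGE"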

Monitoring Stack

┌─────────────────────────────────────────┐
│           OBSERVABILITY STACK           │
├─────────────────────────────────────────┤
│  Metrics:  Prometheus → Grafana         │
│  Logs:     Loki / ELK                   │
│  Traces:   Jaeger / Tempo               │
│  Alerts:   Alertmanager → PagerDuty     │
└─────────────────────────────────────────┘

Troubleshooting

Container not starting?
├─► docker logs <container>
├─► Check port conflicts
├─► Check image name/tag
└─► Check resource limits

Pod in CrashLoopBackOff?
├─► kubectl describe pod <name>
├─► kubectl logs <pod> --previous
├─► Check resource limits
├─► Check probes configuration
└─► Check image pull secrets
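A Pod in CrashLoopBackOff can look idle because the kubelet waits longer between each restart. The pod lifecycle docs describe the default delay as exponential (10s, 20s, 40s, ...), capped at five minutes. The timer resets after the container runs for 10 minutes without problems. A small sketch of that default schedule follows. The function name is hypothetical, and newer Kubernetes releases may make the backoff configurable.

```python
def crashloop_delays(restarts, base=10, cap=300):
    """Default restart delays in seconds: 10, 20, 40, ... capped at 300."""
    return [min(base * 2 ** i, cap) for i in range(restarts)]


delays = crashloop_delays(7)
print(delays)       # [10, 20, 40, 80, 160, 300, 300]
print(sum(delays))  # 910 seconds spent waiting across 7 restarts
```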

Terraform apply fails?
├─► Run terraform plan first
├─► Check state lock
├─► terraform import resources that already exist
└─► Restore state from backup

High cloud bill?
├─► Enable cost alerts
├─► Right-size instances
├─► Use spot instances for interruption-tolerant workloads
├─► Delete unused resources
└─► Storage lifecycle policies
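"Right-size instances" and "Delete unused resources" usually start with utilization data. The sketch below classifies an instance from CPU-utilization samples. An instance that never gets busy is a deletion candidate, and one whose 95th percentile stays low is a downsizing candidate. Both thresholds and the function names are hypothetical, and it uses a simple nearest-rank percentile.

```python
def p95(samples):
    """Nearest-rank 95th percentile (integer ceiling avoids float rounding)."""
    ordered = sorted(samples)
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n)
    return ordered[rank - 1]


def classify_instance(cpu_percent_samples, idle_max=5.0, rightsize_p95=40.0):
    # Thresholds are hypothetical defaults; tune them to your workloads.
    if max(cpu_percent_samples) < idle_max:
        return "idle: candidate for deletion"
    if p95(cpu_percent_samples) < rightsize_p95:
        return "underused: candidate for a smaller instance size"
    return "ok"


print(classify_instance([1, 2, 3, 2]))          # idle: candidate for deletion
print(classify_instance([10, 20, 15, 30, 25]))  # underused: candidate for a smaller instance size
```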

Common Failure Modes

Symptom	Root Cause	Recovery
Pod CrashLoopBackOff	App error or OOMKilled	Check logs (--previous), raise memory limits
ImagePullBackOff	Wrong image or auth	Verify image, check secrets
Terraform drift	Manual changes outside Terraform	Review with terraform plan, then re-apply or update config / import
Slow deploys	Large images	Multi-stage builds, layer caching
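To tell "app error" from "OOM" in the first row, look at the container's exit code. By shell convention, a process killed by signal N exits with 128 + N. So 137 means SIGKILL, which is what the OOM killer sends, and 143 means SIGTERM. A minimal sketch of that interpretation follows. The function name is hypothetical, the signal table covers only a few common Linux signals, and the authoritative signal is `Reason: OOMKilled` in `kubectl describe pod`.

```python
# A few common Linux signal numbers and their POSIX names.
SIGNALS = {2: "SIGINT", 6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def describe_exit_code(code):
    """Interpret a container exit code using the 128 + N signal convention."""
    if code == 0:
        return "exited successfully"
    if code > 128:
        signum = code - 128
        name = SIGNALS.get(signum, f"signal {signum}")
        if signum == 9:
            # SIGKILL: the OOM killer is the usual suspect; confirm it
            # via "Reason: OOMKilled" in `kubectl describe pod`.
            return f"killed by {name} (check for OOMKilled)"
        return f"killed by {name}"
    return f"application error (exit code {code})"


print(describe_exit_code(137))  # killed by SIGKILL (check for OOMKilled)
print(describe_exit_code(1))    # application error (exit code 1)
```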

Best Practices

Docker

Use multi-stage builds
Run as non-root user
Use .dockerignore
Pin base image versions
Scan for vulnerabilities
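"Pin base image versions" can be checked mechanically. The sketch below flags `FROM` images that use no tag or `:latest`, since tags are mutable and a `@sha256:` digest is the strictest pin. It skips `scratch` and earlier build-stage names, so multi-stage Dockerfiles are handled. It is a hypothetical lint helper, not a Docker feature. It parses only the simple `FROM [--flag...] image [AS name]` form.

```python
def is_pinned(image):
    """True if the reference carries a digest or a tag other than 'latest'."""
    if "@sha256:" in image:
        return True
    colon, slash = image.rfind(":"), image.rfind("/")
    # A colon before the last slash belongs to a registry port, not a tag.
    if colon > slash:
        return image[colon + 1:] != "latest"
    return False


def unpinned_bases(dockerfile):
    """Base images in FROM lines that are not pinned."""
    stages, unpinned = set(), []
    for line in dockerfile.splitlines():
        parts = line.split()
        if not parts or parts[0].upper() != "FROM":
            continue
        args = [p for p in parts[1:] if not p.startswith("--")]  # drop --platform etc.
        image = args[0]
        if image != "scratch" and image not in stages and not is_pinned(image):
            unpinned.append(image)
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2])  # later FROM <stage> lines refer to this build stage
    return unpinned


dockerfile = """FROM golang:1.22 AS build
RUN go build -o /app .
FROM alpine
COPY --from=build /app /app
"""
print(unpinned_bases(dockerfile))  # ['alpine']
```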

Kubernetes

Set resource requests/limits
Use readiness/liveness probes
Store config in ConfigMaps, credentials in Secrets
Use namespaces for isolation
Enable network policies

Terraform

Use remote state (S3, GCS)
Enable state locking
Use modules for reuse
Plan before apply
Tag all resources
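"Tag all resources" is easier to enforce when the effective tags are checked. With the AWS provider's `default_tags`, provider-level tags apply to every resource, and resource-level tags win on key conflicts. The sketch below applies that merge rule and reports resources missing required keys. The required-tag set, resource addresses and function names are hypothetical.

```python
REQUIRED_TAGS = {"Environment", "Owner", "CostCenter"}  # hypothetical policy


def effective_tags(default_tags, resource_tags):
    # Resource-level tags override provider default tags with the same key.
    return {**default_tags, **resource_tags}


def missing_tags(resources, default_tags, required=REQUIRED_TAGS):
    """Map resource address -> sorted list of required tag keys it lacks."""
    report = {}
    for address, tags in resources.items():
        missing = required - set(effective_tags(default_tags, tags))
        if missing:
            report[address] = sorted(missing)
    return report


resources = {
    "aws_s3_bucket.logs": {"Owner": "platform"},
    "aws_instance.web": {"Owner": "web", "CostCenter": "42"},
}
print(missing_tags(resources, default_tags={"Environment": "dev"}))
# {'aws_s3_bucket.logs': ['CostCenter']}
```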

Next Actions

Specify your cloud platform and focus area for detailed guidance.