Claude-skill-registry cloud-infrastructure

Cloud platforms (AWS, Cloudflare, GCP, Azure), containerization (Docker), Kubernetes, Infrastructure as Code (Terraform), CI/CD, and observability.

Install

Source (clone the upstream repo):
git clone https://github.com/majiayu000/claude-skill-registry

Claude Code (install into ~/.claude/skills/):
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/cloud-infrastructure" ~/.claude/skills/majiayu000-claude-skill-registry-cloud-infrastructure && rm -rf "$T"

Manifest: skills/data/cloud-infrastructure/SKILL.md

Cloud Infrastructure Skill

Quick Reference

| Platform | Market share | Best for | Learning time |
|---|---|---|---|
| AWS | 32% | Everything | 3-6 mo |
| Azure | 24% | Microsoft stack | 3-6 mo |
| GCP | 11% | Data, ML | 3-6 mo |
| Cloudflare | Edge | CDN, Workers | 2-4 wk |

Learning Paths

AWS

[1] IAM + VPC (1-2 wk)
 │  └─ Roles, policies, networking
 │
 ▼
[2] Compute: EC2, Lambda (2-3 wk)
 │
 ▼
[3] Storage: S3, EBS (1-2 wk)
 │
 ▼
[4] Database: RDS, DynamoDB (2-3 wk)
 │
 ▼
[5] Containers: ECS, EKS (3-4 wk)
 │
 ▼
[6] Monitoring: CloudWatch (1-2 wk)

Docker & Containers

[1] Docker Basics (1 wk)
 │  └─ Images, containers, Dockerfile
 │
 ▼
[2] Multi-stage Builds (1 wk)
 │  └─ Optimization, layer caching
 │
 ▼
[3] Docker Compose (1 wk)
 │  └─ Multi-container apps
 │
 ▼
[4] Registry & Security (1 wk)
    └─ Push/pull, scanning, non-root
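
As a rough illustration of step [3], a Compose file for a two-service app might look like the sketch below. The service names, ports, image tag, and credentials are all hypothetical, and the hard-coded password is only acceptable for local development.

```yaml
# compose.yaml (hypothetical web + database app, local development only)
services:
  web:
    build: .                      # build from the Dockerfile in this directory
    user: "1000:1000"             # run as a non-root UID:GID
    ports:
      - "8000:8000"
    environment:
      DATABASE_URL: postgres://app:example@db:5432/app
    depends_on:
      db:
        condition: service_healthy   # wait until the db healthcheck passes
  db:
    image: postgres:16.4          # pinned tag rather than "latest"
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: example  # placeholder; use a secret outside local dev
      POSTGRES_DB: app
    volumes:
      - db-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 5s
      timeout: 3s
      retries: 5
volumes:
  db-data:
```

Running `docker compose up --build` in the same directory should start both containers, with `web` reaching the database at the hostname `db`.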

Kubernetes

[1] Pods & Deployments (2 wk)
 │
 ▼
[2] Services & Networking (1-2 wk)
 │
 ▼
[3] ConfigMaps & Secrets (1 wk)
 │
 ▼
[4] Helm Charts (2 wk)
 │
 ▼
[5] Production Patterns (ongoing)
    └─ HPA, PDB, resource limits
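
The production patterns in step [5] could be sketched as follows, assuming a Deployment named `app` whose pods carry the label `app: app` (both names are hypothetical). The HPA needs a metrics source such as metrics-server, and CPU utilization is computed against the container's CPU request.

```yaml
# Scale between 2 and 10 replicas, targeting 70% average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
# Keep at least one pod running during voluntary disruptions (e.g. node drains).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: app
```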

Terraform (IaC)

[1] Resources & State (1 wk)
 │
 ▼
[2] Variables & Outputs (1 wk)
 │
 ▼
[3] Modules (1-2 wk)
 │
 ▼
[4] Remote State (1 wk)
 │
 ▼
[5] Workspaces & Environments (1 wk)

Kubernetes Quick Reference

| Resource | Purpose | Example |
|---|---|---|
| Pod | Smallest unit | Single container |
| Deployment | Manage replicas | Web app |
| Service | Network access | ClusterIP, LoadBalancer |
| Ingress | HTTP routing | Path-based routing |
| ConfigMap | Configuration | Environment variables |
| Secret | Sensitive data | Credentials |
| StatefulSet | Stateful apps | Databases |
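
To show how several of these resources fit together, here is a minimal sketch that wires a ConfigMap, a Deployment, and a ClusterIP Service. The image name, port, `/healthz` path, and all object names are assumptions for illustration.

```yaml
# ConfigMap: plain configuration, injected below as environment variables.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: info
---
# Deployment: manages three replicas of the pod template.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: registry.example.com/app:1.0.0   # hypothetical image
          ports:
            - containerPort: 8000
          envFrom:
            - configMapRef:
                name: app-config
          readinessProbe:
            httpGet:
              path: /healthz                      # assumed health endpoint
              port: 8000
---
# Service: stable in-cluster address that load-balances across ready pods.
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  type: ClusterIP
  selector:
    app: app
  ports:
    - port: 80
      targetPort: 8000
```

Applying the file with `kubectl apply -f` creates all three objects; the Service selects pods by the same `app: app` label the Deployment stamps on them.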

Terraform Structure

project/
├── main.tf           # Resources
├── variables.tf      # Inputs
├── outputs.tf        # Outputs
├── providers.tf      # Provider config
├── versions.tf       # Version constraints
├── modules/
│   ├── vpc/
│   ├── eks/
│   └── rds/
└── environments/
    ├── dev.tfvars
    ├── staging.tfvars
    └── prod.tfvars

CI/CD Pipeline Template

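# GitHub Actions
# "registry" is a placeholder; the runner must already be authenticated to the
# image registry and to the cluster for the push and deploy steps to work.
name: CI/CD
on:
  push:
    branches: [main]
env:
  IMAGE: registry/app:${{ github.sha }}
jobs:
  build-test-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build
        # Tag with the full registry path so the push step can find the image
        run: docker build -t "$IMAGE" .
      - name: Test
        run: docker run --rm "$IMAGE" pytest
      - name: Push
        run: docker push "$IMAGE"
      - name: Deploy
        if: github.ref == 'refs/heads/main'
        run: kubectl set image deployment/app app="$IMAGE"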
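
The push step in the template only succeeds if the runner is logged in to the registry. One possible way to handle that, assuming GitHub Container Registry (`ghcr.io`) as the target, is the `docker/login-action` step sketched below. The registry choice and image naming are assumptions, not part of the original template.

```yaml
name: Build and push
on:
  push:
    branches: [main]
jobs:
  push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write          # lets GITHUB_TOKEN push to ghcr.io
    steps:
      - uses: actions/checkout@v4
      - name: Log in to the registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push
        # Image references must be lowercase; this assumes the
        # owner/repository name contains no uppercase letters.
        run: |
          IMAGE="ghcr.io/${{ github.repository }}:${{ github.sha }}"
          docker build -t "$IMAGE" .
          docker push "$IMAGE"
```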

Monitoring Stack

┌─────────────────────────────────────────┐
│         OBSERVABILITY STACK              │
├─────────────────────────────────────────┤
│  Metrics:  Prometheus → Grafana         │
│  Logs:     Loki / ELK                   │
│  Traces:   Jaeger / Tempo               │
│  Alerts:   Alertmanager → PagerDuty     │
└─────────────────────────────────────────┘
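
As one example of the alerting path, a Prometheus rule file along these lines would fire an alert that Alertmanager can then route to PagerDuty. It assumes kube-state-metrics is being scraped (it provides `kube_pod_container_status_restarts_total`); the group name, threshold, and `severity` label are illustrative choices.

```yaml
# Prometheus alerting rules (loaded via rule_files in prometheus.yml)
groups:
  - name: app-alerts
    rules:
      - alert: PodRestartingFrequently
        # More than 3 container restarts within the last 15 minutes
        expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
        for: 5m                 # condition must hold for 5 minutes before firing
        labels:
          severity: page        # Alertmanager routes can match on this label
        annotations:
          summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} is restarting repeatedly"
```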

Troubleshooting

Container not starting?
├─► docker logs <container>
├─► Check port conflicts
├─► Check image name/tag
└─► Check resource limits

Pod in CrashLoopBackOff?
├─► kubectl describe pod <pod>
├─► kubectl logs <pod> --previous
├─► Check resource limits
├─► Check probes configuration
└─► Check image pull secrets

Terraform apply fails?
├─► Run terraform plan first and read the error
├─► Check for a stale state lock
├─► terraform import resources that already exist
└─► Restore state from backup (last resort)

High cloud bill?
├─► Enable cost alerts
├─► Right-size instances
├─► Use spot instances for interruptible workloads
├─► Delete unused resources
└─► Apply storage lifecycle policies

Common Failure Modes

| Symptom | Root cause | Recovery |
|---|---|---|
| Pod CrashLoopBackOff | App error or OOM | Check logs, increase limits |
| ImagePullBackOff | Wrong image or auth | Verify image, check secrets |
| Terraform drift | Manual changes | Import or terraform apply |
| Slow deploys | Large images | Multi-stage builds, layer caching |
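
The first two recoveries in the table usually come down to a few fields in the pod spec. The sketch below shows where they live; the secret name `regcred`, the image, and the memory figures are placeholders to adapt.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  # ImagePullBackOff: reference a docker-registry Secret that holds
  # credentials for the private registry.
  imagePullSecrets:
    - name: regcred
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # verify name and tag exist
      resources:
        requests:
          memory: "256Mi"
          cpu: "100m"
        limits:
          # OOM-driven CrashLoopBackOff: raise this if the container
          # is killed for exceeding its memory limit.
          memory: "512Mi"
```

A secret like `regcred` can be created with `kubectl create secret docker-registry`; whether a crash was an OOM kill shows up as `OOMKilled` in the `kubectl describe pod` output.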

Best Practices

Docker

  • Use multi-stage builds
  • Run as non-root user
  • Use .dockerignore
  • Pin base image versions
  • Scan for vulnerabilities

Kubernetes

  • Set resource requests/limits
  • Use readiness/liveness probes
  • Store config in ConfigMaps
  • Use namespaces for isolation
  • Enable network policies
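
The last two items can be combined into a default-deny setup per namespace, sketched below with a hypothetical `team-a` namespace. This only takes effect if the cluster's network plugin enforces NetworkPolicy.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
# Deny all ingress to every pod in the namespace by default.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a
spec:
  podSelector: {}          # empty selector matches all pods in the namespace
  policyTypes:
    - Ingress
---
# Then allow traffic between pods inside the same namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: team-a
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}  # any pod in this namespace
```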

Terraform

  • Use remote state (S3, GCS)
  • Enable state locking
  • Use modules for reuse
  • Plan before apply
  • Tag all resources
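
"Plan before apply" can also be enforced in CI. The workflow below is a minimal sketch, assuming the `environments/prod.tfvars` layout shown earlier and cloud credentials already available to the runner; it plans on every pull request and applies the saved plan only on pushes to `main`.

```yaml
name: Terraform
on:
  pull_request:
  push:
    branches: [main]
jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Init
        run: terraform init -input=false
      - name: Plan
        # Save the plan so that apply executes exactly what was reviewed
        run: terraform plan -input=false -var-file=environments/prod.tfvars -out=tfplan
      - name: Apply
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        run: terraform apply -input=false tfplan
```

Applying a saved plan file does not prompt for approval, so the review has to happen on the pull request's plan output.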

Next Actions

Specify your cloud platform and focus area for detailed guidance.