Claude-skill-registry infrastructure-expert
Expert infrastructure design including networking, compute, storage, and operations
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/infrastructure-expert" ~/.claude/skills/majiayu000-claude-skill-registry-infrastructure-expert && rm -rf "$T"
manifest: skills/data/infrastructure-expert/SKILL.md
Infrastructure Expert
Purpose
Design robust infrastructure including networking, compute resources, storage systems, and operational practices.
Activation Keywords
- infrastructure, infra
- networking, VPC, subnet
- compute, servers, instances
- storage, disk, volume
- operations, SRE
Core Capabilities
1. Networking
- VPC design
- Subnet planning
- Security groups
- Load balancers
- DNS/CDN
2. Compute
- Instance selection
- Container orchestration
- Serverless
- Spot/Preemptible
- Reserved capacity
3. Storage
- Block storage
- Object storage
- File storage
- Backup strategies
- Data lifecycle
4. Operations
- Monitoring
- Logging
- Alerting
- Incident response
- Runbooks
5. Disaster Recovery
- RPO/RTO definitions
- Backup verification
- Failover testing
- Multi-region design
Network Architecture
    VPC Design:
    ┌─────────────────────────────────────┐
    │ VPC (10.0.0.0/16)                   │
    │ ├─ Public Subnet (10.0.1.0/24)      │
    │ │  └─ NAT Gateway, Bastion          │
    │ ├─ Private Subnet (10.0.2.0/24)     │
    │ │  └─ Application servers           │
    │ └─ Data Subnet (10.0.3.0/24)        │
    │    └─ Databases                     │
    └─────────────────────────────────────┘
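A subnet plan like the one in the diagram can be sanity-checked before any infrastructure is created. The sketch below is a minimal illustration that uses only Python's standard `ipaddress` module. The CIDR values are copied from the diagram, while the function name `validate_subnet_plan` and the dictionary layout are assumptions made for this example.

```python
import ipaddress

# CIDR values copied from the VPC diagram; the dict layout is an assumption.
VPC_CIDR = "10.0.0.0/16"
SUBNETS = {
    "public": "10.0.1.0/24",
    "private": "10.0.2.0/24",
    "data": "10.0.3.0/24",
}


def validate_subnet_plan(vpc_cidr, subnets):
    """Return a list of problems; an empty list means the plan is consistent."""
    vpc = ipaddress.ip_network(vpc_cidr)
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = []

    # Every subnet must sit inside the VPC range.
    for name, net in parsed.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside {vpc}")

    # No two subnets may share addresses.
    names = sorted(parsed)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if parsed[first].overlaps(parsed[second]):
                problems.append(f"{first} overlaps {second}")

    return problems


if __name__ == "__main__":
    print(validate_subnet_plan(VPC_CIDR, SUBNETS))
```

A check of this kind could run in CI next to the infrastructure code, so that an overlapping or out-of-range subnet is caught at review time rather than at apply time.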
Infrastructure as Code
    # Terraform example
    module "vpc" {
      source = "./modules/vpc"

      name = "production"
      cidr = "10.0.0.0/16"

      azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
      private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
      public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

      enable_nat_gateway = true
      single_nat_gateway = false # High availability

      tags = {
        Environment = "production"
        Terraform   = "true"
      }
    }
Storage Selection Guide
| Use Case | Storage Type | Service |
|---|---|---|
| OS/App data | Block | EBS/Persistent Disk |
| Static files | Object | S3/Cloud Storage |
| Shared files | File | EFS/Filestore |
| Database | Block (high IOPS) | io2/SSD |
| Backup | Object (cold) | Glacier/Coldline |
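The table above can also be kept as data, so that tooling such as a provisioning script or a review checklist applies the same guidance. This is only a sketch: the rows are copied from the table, and the name `select_storage` and the lower-cased lookup keys are assumptions made for the example.

```python
# Rows copied from the storage selection table: use case -> (storage type, service).
STORAGE_GUIDE = {
    "os/app data": ("Block", "EBS/Persistent Disk"),
    "static files": ("Object", "S3/Cloud Storage"),
    "shared files": ("File", "EFS/Filestore"),
    "database": ("Block (high IOPS)", "io2/SSD"),
    "backup": ("Object (cold)", "Glacier/Coldline"),
}


def select_storage(use_case):
    """Look up the recommended storage type and service for a use case."""
    key = use_case.strip().lower()
    if key not in STORAGE_GUIDE:
        known = ", ".join(sorted(STORAGE_GUIDE))
        raise KeyError(f"no guidance for {use_case!r}; known use cases: {known}")
    return STORAGE_GUIDE[key]


if __name__ == "__main__":
    storage_type, service = select_storage("Database")
    print(storage_type, service)
```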
Operational Checklist
    ## Monitoring
    - [ ] System metrics (CPU, Memory, Disk)
    - [ ] Application metrics
    - [ ] Business metrics
    - [ ] Synthetic monitoring

    ## Logging
    - [ ] Centralized logging
    - [ ] Log retention policy
    - [ ] Log analysis/search
    - [ ] Audit logs

    ## Alerting
    - [ ] Critical alerts → PagerDuty
    - [ ] Warning alerts → Slack
    - [ ] Alert runbooks linked
    - [ ] On-call rotation

    ## Security
    - [ ] Security groups reviewed
    - [ ] Access logs enabled
    - [ ] Patch management
    - [ ] Vulnerability scanning

    ## Backup
    - [ ] Automated backups
    - [ ] Cross-region replication
    - [ ] Restore testing (quarterly)
    - [ ] Backup monitoring
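The alerting items in the checklist (critical alerts to PagerDuty, warning alerts to Slack, runbooks linked) amount to a small routing rule. The sketch below shows that rule in isolation and makes no calls to PagerDuty or Slack. The `Alert` class, the channel names, and the runbook URL are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass

# Severity -> destination, as listed in the checklist. Channel names are assumptions.
ROUTES = {
    "critical": "pagerduty",
    "warning": "slack",
}


@dataclass(frozen=True)
class Alert:
    name: str
    severity: str
    runbook_url: str = ""


def route_alert(alert):
    """Return the destination for an alert, rejecting alerts that break the checklist."""
    severity = alert.severity.lower()
    if severity not in ROUTES:
        raise ValueError(f"unknown severity: {alert.severity!r}")
    if not alert.runbook_url:
        # Checklist item: every alert links to a runbook.
        raise ValueError(f"alert {alert.name!r} has no linked runbook")
    return ROUTES[severity]


if __name__ == "__main__":
    # Hypothetical alert and runbook URL.
    alert = Alert("disk-full", "critical", "https://runbooks.example.com/disk-full")
    print(route_alert(alert))
```

Rejecting alerts without a runbook at definition time is one way to enforce the "Alert runbooks linked" item, as opposed to auditing it after the fact.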
Disaster Recovery Tiers
| Tier | RPO | RTO | Strategy |
|---|---|---|---|
| Tier 1 | Minutes | Minutes | Multi-region active |
| Tier 2 | Hours | Hours | Warm standby |
| Tier 3 | 24h | Days | Backup/restore |
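To show how the tiers might be applied, the sketch below picks a tier from a required RPO and RTO. The table gives only rough magnitudes ("Minutes", "Hours", "24h", "Days"), so the numeric cut-offs of one hour and 24 hours are assumptions, as is the function name `select_dr_tier`. The strategy names are copied from the table.

```python
from datetime import timedelta

# (cut-off, tier, strategy). Strategies come from the table; cut-offs are assumptions.
DR_TIERS = [
    (timedelta(hours=1), "Tier 1", "Multi-region active"),
    (timedelta(hours=24), "Tier 2", "Warm standby"),
]
FALLBACK_TIER = ("Tier 3", "Backup/restore")


def select_dr_tier(rpo, rto):
    """Pick the least demanding tier whose range still covers the tighter objective."""
    tightest = min(rpo, rto)
    for cutoff, tier, strategy in DR_TIERS:
        if tightest < cutoff:
            return tier, strategy
    return FALLBACK_TIER


if __name__ == "__main__":
    print(select_dr_tier(timedelta(hours=1), timedelta(hours=4)))
```

Under these assumed cut-offs, an RPO of 1 hour with an RTO of 4 hours lands in Tier 2 (warm standby), while anything that must recover within minutes is pushed to Tier 1.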
Example Usage
    User: "Design infrastructure for a new production environment"

    Infrastructure Expert Response:
    1. Networking
       - VPC with public/private subnets
       - Multi-AZ deployment
       - Security group design
    2. Compute
       - EKS cluster sizing
       - Node pool configuration
       - Auto-scaling setup
    3. Storage
       - EBS for databases
       - S3 for static assets
       - Backup to Glacier
    4. Operations
       - CloudWatch + Prometheus
       - Centralized logging (Loki)
       - PagerDuty integration
    5. DR Plan
       - RPO: 1 hour
       - RTO: 4 hours
       - Cross-region backup
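The DR plan in this example sets an RPO of 1 hour, which only holds if a restorable backup is never older than that. A minimal sketch of such a check follows. The function name `rpo_breached` and the timestamps are hypothetical, and the time of the last successful backup is assumed to be available from the backup tooling.

```python
from datetime import datetime, timedelta, timezone


def rpo_breached(last_backup_at, now, rpo):
    """Return True if a failure at `now` would lose more data than the RPO allows."""
    return (now - last_backup_at) > rpo


if __name__ == "__main__":
    rpo = timedelta(hours=1)  # RPO from the example DR plan
    # Hypothetical timestamps for illustration.
    last_backup_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    now = datetime(2024, 1, 1, 13, 30, tzinfo=timezone.utc)
    print(rpo_breached(last_backup_at, now, rpo))
```

A check like this could feed the backup monitoring item of the operational checklist, raising an alert when the newest backup is older than the agreed RPO.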