Ai-design-components writing-infrastructure-code
Managing cloud infrastructure using declarative and imperative IaC tools. Use when provisioning cloud resources (Terraform/OpenTofu for multi-cloud, Pulumi for developer-centric workflows, AWS CDK for AWS-native infrastructure), designing reusable modules, implementing state management patterns, or establishing infrastructure deployment workflows.
git clone https://github.com/ancoleman/ai-design-components
T=$(mktemp -d) && git clone --depth=1 https://github.com/ancoleman/ai-design-components "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/writing-infrastructure-code" ~/.claude/skills/ancoleman-ai-design-components-writing-infrastructure-code && rm -rf "$T"
skills/writing-infrastructure-code/SKILL.md
Infrastructure as Code
Provision and manage cloud infrastructure using code-based automation tools. This skill covers tool selection, state management, module design, and operational patterns across Terraform/OpenTofu, Pulumi, and AWS CDK.
When to Use
Use this skill when:
- Provisioning cloud infrastructure (compute, networking, databases, storage)
- Migrating from manual infrastructure to code-based workflows
- Designing reusable infrastructure modules
- Implementing multi-cloud or hybrid-cloud deployments
- Establishing state management and drift detection patterns
- Integrating infrastructure provisioning into CI/CD pipelines
- Evaluating IaC tools (Terraform vs Pulumi vs CDK)
Common requests:
- "Create a Terraform module for VPC provisioning"
- "Set up remote state with locking for team collaboration"
- "Compare Pulumi vs Terraform for our use case"
- "Design composable infrastructure modules"
- "Implement drift detection for existing infrastructure"
Core Concepts
Infrastructure as Code Fundamentals
Key Principles:
- Declarative vs Imperative - Describe desired state (Terraform) or program infrastructure (Pulumi)
- Idempotency - Same input produces same output, safe to re-run
- Version Control - Infrastructure changes tracked in Git
- State Management - Track actual infrastructure state
- Module Composition - Reusable, versioned infrastructure components
Benefits:
- Reproducibility (same code = same infrastructure)
- Auditability (Git history shows all changes)
- Collaboration (code reviews for infrastructure changes)
- Automation (CI/CD deploys infrastructure)
- Disaster recovery (rebuild from code)
Tool Selection Framework
Choose IaC tools based on team composition and cloud strategy:
Terraform/OpenTofu - Declarative, HCL-based
- Multi-cloud and hybrid-cloud deployments
- Operations/SRE teams prefer declarative approach
- Largest provider ecosystem (AWS, GCP, Azure, 3000+ providers)
- Mature module registry and community
Pulumi - Imperative code in general-purpose languages that declares desired state
- Developer-centric teams familiar with TypeScript/Python/Go
- Complex logic requires programming constructs (loops, conditionals, functions)
- Native unit testing using familiar test frameworks
- Strong typing and IDE support
AWS CDK - AWS-native, programming language-based
- AWS-only infrastructure
- Tight integration with AWS services
- L1/L2/L3 construct abstractions
- CloudFormation under the hood
Decision Tree:
```
Multi-cloud required?
├─ YES → Team composition?
│   ├─ Ops/SRE focused → Terraform/OpenTofu
│   └─ Developer focused → Pulumi
└─ NO → AWS only?
    ├─ YES → Language preference?
    │   ├─ HCL/declarative → Terraform
    │   ├─ TypeScript/Python → AWS CDK
    │   └─ YAML/simple → CloudFormation
    └─ NO → GCP/Azure only?
        └─ Terraform or Pulumi
```
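The decision tree above can also be expressed as a small function, which is handy if the choice needs to be recorded in onboarding scripts or checklists. This is only a sketch: `choose_iac_tool` and its argument values (`yes`/`no`, `ops`/`dev`, `hcl`/`code`/`yaml`) are hypothetical names invented for illustration, not part of any tool.

```shell
# Hypothetical helper that encodes the decision tree as a function.
# Arguments (illustrative only):
#   $1 multi_cloud: yes|no
#   $2 team:        ops|dev        (consulted when multi_cloud=yes)
#   $3 aws_only:    yes|no         (consulted when multi_cloud=no)
#   $4 style:       hcl|code|yaml  (consulted when aws_only=yes)
choose_iac_tool() {
  local multi_cloud="$1" team="${2:-}" aws_only="${3:-}" style="${4:-}"
  if [ "$multi_cloud" = "yes" ]; then
    case "$team" in
      ops) echo "Terraform/OpenTofu" ;;
      dev) echo "Pulumi" ;;
      *)   echo "unknown"; return 1 ;;
    esac
  elif [ "$aws_only" = "yes" ]; then
    case "$style" in
      hcl)  echo "Terraform" ;;
      code) echo "AWS CDK" ;;
      yaml) echo "CloudFormation" ;;
      *)    echo "unknown"; return 1 ;;
    esac
  else
    # GCP-only or Azure-only branch of the tree
    echo "Terraform or Pulumi"
  fi
}
```

For example, `choose_iac_tool no "" yes code` follows the NO → AWS only → TypeScript/Python branch.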
State Management Architecture
Remote state with locking enables team collaboration:
Backend Selection:
| Cloud Provider | Recommended Backend | Locking Mechanism |
|---|---|---|
| AWS | S3 + DynamoDB | DynamoDB table |
| GCP | Google Cloud Storage | Native |
| Azure | Azure Blob Storage | Lease-based |
| Multi-cloud | Terraform Cloud/Enterprise | Built-in |
| Pulumi | Pulumi Service | Built-in |
State Isolation Strategies:
- Directory Separation (recommended for most teams)
  - Separate directories per environment (`prod/`, `staging/`, `dev/`)
  - Complete state file isolation
  - No risk of cross-environment contamination
- Workspaces
  - Single codebase, multiple environments
  - Shared state backend, environment namespacing
  - Risk: accidental cross-environment operations
- Layered Architecture
  - Separate state files for networking, compute, data layers
  - Blast radius reduction
  - Cross-layer references via remote state data sources
Critical State Management Rules:
- Always use remote state for team environments
- Enable state file encryption at rest
- Enable versioning on state storage
- Use state locking to prevent concurrent modifications
- Never commit state files to Git
- Mark sensitive outputs as `sensitive = true`
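As a rough illustration of several of these rules together (remote state, encryption, locking, and keeping state out of Git), the sketch below writes an S3 backend block and a `.gitignore` into an environment directory. The function name `write_backend_config`, the bucket name, the lock table name, and the region are all placeholders assumed for the example. Versioning is a property of the bucket itself and is not configured here.

```shell
# Sketch: generate a per-environment S3 backend configuration.
# All resource names below are placeholders, not real infrastructure.
write_backend_config() {
  local dir="$1" env="$2"
  mkdir -p "$dir"
  cat > "$dir/backend.tf" <<EOF
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"   # placeholder bucket name
    key            = "${env}/terraform.tfstate"   # one state file per environment
    region         = "us-east-1"                 # placeholder region
    dynamodb_table = "example-terraform-locks"   # placeholder lock table
    encrypt        = true                        # encrypt state at rest
  }
}
EOF
  # Keep local state artifacts out of Git.
  # Note: this overwrites any existing .gitignore in that directory.
  printf '%s\n' '*.tfstate' '*.tfstate.*' '.terraform/' > "$dir/.gitignore"
}
```

Calling `write_backend_config environments/prod prod` would produce `environments/prod/backend.tf` with the state key `prod/terraform.tfstate`.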
Module Design Patterns
Composable Module Structure:
```
modules/
├── vpc/              # Network foundation
├── security-group/   # Reusable security group patterns
├── rds/              # Database with backups, encryption
├── ecs-cluster/      # Container orchestration base
├── ecs-service/      # Individual microservice
└── alb/              # Application load balancer
```
Module Versioning:
- Pin module versions in production (`version = "5.1.0"`)
- Use semantic versioning for internal modules
- Test module updates in non-prod first
- Maintain CHANGELOG for module releases
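To make the semantic-versioning rule concrete, here is a minimal sketch of two helpers a release script might use. `is_semver` and `bump_version` are hypothetical names, and the check accepts only plain `MAJOR.MINOR.PATCH` versions (no pre-release or build suffixes), which is a simplifying assumption.

```shell
# Returns success only for plain MAJOR.MINOR.PATCH versions without leading zeros.
is_semver() {
  printf '%s\n' "$1" | grep -Eq '^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$'
}

# Prints the next version after bumping the given part (major|minor|patch).
bump_version() {
  local version="$1" part="$2" major minor patch rest
  is_semver "$version" || { echo "not MAJOR.MINOR.PATCH: $version" >&2; return 1; }
  major=${version%%.*}   # text before the first dot
  rest=${version#*.}     # text after the first dot
  minor=${rest%%.*}
  patch=${rest#*.}
  case "$part" in
    major) echo "$((major + 1)).0.0" ;;
    minor) echo "${major}.$((minor + 1)).0" ;;
    patch) echo "${major}.${minor}.$((patch + 1))" ;;
    *)     echo "unknown part: $part" >&2; return 1 ;;
  esac
}
```

A breaking change to a module's inputs or outputs would call for `bump_version 5.1.0 major`, while a backward-compatible addition would use `minor`.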
Module Design Principles:
- Clear input contract (required vs optional variables)
- Documented outputs (what consumers can reference)
- Sane defaults where possible
- Validation rules for inputs
- Examples directory showing usage
When to Create a Module:
- Resource group is reused 3+ times
- Clear boundaries and responsibilities
- Stable interface contract
- Team has module maintenance capacity
When to Keep Monolithic:
- One-off infrastructure
- Rapid prototyping phase
- High coupling between resources
- Small team, simple infrastructure
Quick Reference
Terraform/OpenTofu Commands
```bash
# Initialize providers and backend
terraform init

# Plan changes (preview)
terraform plan

# Apply changes
terraform apply

# Destroy infrastructure
terraform destroy

# Format HCL files
terraform fmt

# Validate syntax
terraform validate

# Show state
terraform state list
terraform state show <resource>

# Import existing resources
terraform import <resource.name> <id>

# Workspace management
terraform workspace list
terraform workspace new staging
terraform workspace select prod
```
Pulumi Commands
```bash
# Initialize new project
pulumi new aws-typescript

# Preview changes
pulumi preview

# Apply changes
pulumi up

# Destroy infrastructure
pulumi destroy

# Show stack outputs
pulumi stack output

# Manage stacks
pulumi stack ls
pulumi stack select prod

# Import existing resources
pulumi import <type> <name> <id>

# Export/import state
pulumi stack export > state.json
pulumi stack import < state.json
```
AWS CDK Commands
```bash
# Initialize new app
cdk init app --language typescript

# Synthesize CloudFormation
cdk synth

# Preview changes
cdk diff

# Deploy stack
cdk deploy

# Destroy stack
cdk destroy

# Bootstrap account/region
cdk bootstrap

# List stacks
cdk list
```
Common Patterns Checklist
Infrastructure Provisioning:
- Remote state configured with locking
- State file encryption enabled
- Provider versions pinned
- Module versions pinned (production)
- Variables have descriptions and types
- Sensitive outputs marked as sensitive
- Tagging strategy implemented
- Cost allocation tags applied
Module Development:
- Clear README with usage examples
- Required vs optional variables documented
- Outputs documented with descriptions
- Validation rules for critical inputs
- Examples directory with working code
- Tests for module behavior (Terratest/CDK assertions)
- CHANGELOG for version tracking
- Semantic versioning followed
Operational Readiness:
- Drift detection scheduled
- CI/CD pipeline for plan/apply
- State backup strategy
- Disaster recovery documented
- Team access controls configured (IAM/RBAC)
- Cost estimation integrated (Infracost)
- Security scanning integrated (Checkov/tfsec)
- Documentation kept current
Detailed Documentation
For comprehensive patterns and implementation details:
Tool-Specific Patterns:
- `references/terraform-patterns.md` - Terraform/OpenTofu best practices, HCL patterns
- `references/pulumi-patterns.md` - Pulumi across TypeScript/Python/Go
Architecture and Design:
- `references/state-management.md` - Remote state, locking, isolation strategies
- `references/module-design.md` - Composable modules, versioning, registries
Operations:
- `references/drift-detection.md` - Detecting and remediating infrastructure drift
Working Examples
Practical implementations demonstrating IaC patterns:
Terraform Examples:
- `examples/terraform/vpc-module/` - Multi-AZ VPC with public/private subnets
- `examples/terraform/ecs-service/` - ECS service with ALB, autoscaling
- `examples/terraform/rds-cluster/` - Aurora cluster with backups, encryption
- `examples/terraform/state-backend/` - S3 + DynamoDB backend setup
Pulumi Examples:
- `examples/pulumi/typescript/vpc/` - TypeScript VPC component
- `examples/pulumi/python/ecs-service/` - Python ECS service
- `examples/pulumi/go/rds-cluster/` - Go RDS cluster
- `examples/pulumi/testing/` - Unit tests for Pulumi programs
AWS CDK Examples:
- `examples/cdk/typescript/vpc-stack/` - VPC using L2 constructs
- `examples/cdk/typescript/ecs-fargate/` - Fargate service with ALB
- `examples/cdk/typescript/pipeline-stack/` - Self-mutating CDK pipeline
- `examples/cdk/testing/` - CDK assertions and snapshot tests
Utility Scripts
Automated validation and operational tools:
- `scripts/validate-terraform.sh` - Terraform fmt, validate, tflint
- `scripts/cost-estimate.sh` - Infracost wrapper for cost analysis
- `scripts/drift-check.sh` - Scheduled drift detection
- `scripts/security-scan.sh` - Checkov/tfsec security scanning
- `scripts/state-backup.sh` - State file backup automation
- `scripts/module-release.sh` - Module versioning and publishing
Integration with Other Skills
Deployment Pipeline:
- `building-ci-pipelines` - Automate terraform plan/apply in CI/CD
- `gitops-workflows` - GitOps-based infrastructure deployment
Platform Engineering:
- `kubernetes-operations` - Provision EKS, GKE, AKS clusters
- `platform-engineering` - Internal developer platform infrastructure
Security:
- `secret-management` - Provision Vault, External Secrets Operator
- `security-hardening` - Implement infrastructure security controls
- `compliance-frameworks` - Policy-as-code for compliance
Operations:
- `observability` - Provision monitoring infrastructure (Prometheus, Grafana)
- `disaster-recovery` - Infrastructure rebuild procedures
- `cost-optimization` - Implement cost controls via IaC
Data Platform:
- `data-architecture` - Provision data lakes, warehouses
- `streaming-data` - Provision Kafka, Kinesis infrastructure
Best Practices
Development Workflow:
- Write infrastructure code in feature branches
- Run `terraform plan`/`pulumi preview` locally
- Submit pull request with plan output
- Code review focuses on security, cost, blast radius
- CI runs automated tests and security scans
- Apply only after approval and CI passes
- Monitor for drift post-deployment
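The "plan locally, attach plan output to the pull request" steps might look like the following for Terraform. This is a sketch under assumptions: `pr_plan` is a hypothetical helper name, the plan file name `tfplan` is arbitrary, and the backend is assumed to be configured already so that `terraform init` can run non-interactively.

```shell
# Sketch: produce a human-readable plan file suitable for pasting into a PR.
#   $1 = directory containing the Terraform configuration
#   $2 = output file for the rendered plan (relative to the caller's cwd)
pr_plan() {
  local dir="$1" out="$2"
  # Each step runs only if the previous one succeeded.
  terraform -chdir="$dir" init -input=false &&
    terraform -chdir="$dir" fmt -check -recursive &&
    terraform -chdir="$dir" validate &&
    terraform -chdir="$dir" plan -input=false -out=tfplan &&
    terraform -chdir="$dir" show -no-color tfplan > "$out"
}
```

Saving the plan with `-out` and rendering that same file means reviewers see exactly the plan that a later `terraform apply tfplan` would execute.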
State Management:
- Use remote state from day one (never local state for teams)
- Separate state files per environment
- Enable state locking to prevent concurrent modifications
- Version state storage for rollback capability
- Encrypt state at rest (contains sensitive data)
- Regular state backups to separate location
Module Development:
- Start with monolithic code, extract modules when patterns emerge
- Design for reusability but avoid premature abstraction
- Document all inputs and outputs
- Provide working examples in `examples/` directory
- Pin provider versions in modules
- Test modules before publishing
- Use semantic versioning for releases
Security:
- Scan IaC for security issues before apply (Checkov, tfsec)
- Never commit secrets to code (use secret references)
- Mark sensitive outputs as `sensitive = true`
- Implement least-privilege IAM policies
- Enable resource encryption by default
- Use private module registries for internal modules
Cost Management:
- Estimate costs before applying changes (Infracost)
- Tag all resources for cost allocation
- Review cost impact in pull requests
- Set up cost alerts for drift
- Rightsize resources based on usage
Operational Excellence:
- Schedule regular drift detection
- Document disaster recovery procedures
- Maintain runbooks for common operations
- Monitor state file access logs
- Practice infrastructure rebuilds periodically
- Keep provider versions current with testing
Common Pitfalls
State File Issues:
- Manual state editing - Use terraform state commands, not direct edits
- No state locking - Race conditions corrupt state
- Local state for teams - State divergence across team members
- Large state files - Break into multiple state files by layer
Module Design:
- Over-abstraction - Too generic, hard to understand
- Under-abstraction - Copy-paste code everywhere
- No version pinning - Unexpected breaking changes
- No examples - Users don't know how to consume module
Operations:
- No drift detection - Manual changes go unnoticed
- Direct resource modification - Bypassing IaC creates drift
- No rollback plan - Can't recover from failed apply
- Ignoring plan output - Surprises during apply
Security:
- Secrets in code - Hard-coded credentials
- No security scanning - Vulnerabilities in production
- Overly permissive IAM - Excessive privileges
- No state encryption - Sensitive data exposed
Troubleshooting Guide
State Lock Issues:
```bash
# Use only if certain no other process is running
terraform force-unlock <lock-id>
```
Import Existing Resources:
```bash
terraform import aws_vpc.main vpc-12345678
pulumi import aws:ec2/vpc:Vpc main vpc-12345678
```
Drift Detection:
```bash
terraform plan -detailed-exitcode  # Exit 2 = drift detected
pulumi preview --diff
```
For detailed drift remediation, see `references/drift-detection.md`.
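For a scheduled drift check, the exit code of `terraform plan -detailed-exitcode` needs to be interpreted rather than treated as plain success or failure: 0 means no changes, 1 means the plan itself failed, and 2 means the plan contains changes. A minimal sketch, with `classify_plan_exit` and `check_drift` as hypothetical helper names:

```shell
# Map a `terraform plan -detailed-exitcode` status to a label.
classify_plan_exit() {
  case "$1" in
    0) echo "in-sync" ;;   # no changes
    2) echo "drift" ;;     # plan succeeded and contains changes
    *) echo "error" ;;     # 1 (or anything else): the plan itself failed
  esac
}

# Run a plan in the given directory and print in-sync, drift, or error.
check_drift() {
  local dir="$1" status=0
  terraform -chdir="$dir" plan -detailed-exitcode -input=false >/dev/null 2>&1 || status=$?
  classify_plan_exit "$status"
}
```

A scheduler could alert only when `check_drift` prints `drift` or `error`, and stay quiet on `in-sync`.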
State Recovery:
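```bash
# Terraform: restore a backup copy of the state file from S3
# (with bucket versioning, an earlier object version can be fetched
#  with aws s3api get-object --version-id <version-id>)
aws s3 cp s3://bucket/backup/terraform.tfstate terraform.tfstate

# Pulumi: restore from an earlier checkpoint
# (--version takes a previous stack version)
pulumi stack export --version <version> | pulumi stack import
```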
Related Skills
For cloud-specific implementations:
- `aws-patterns` - AWS-specific resource patterns
- `gcp-patterns` - GCP-specific resource patterns
- `azure-patterns` - Azure-specific resource patterns
For infrastructure operations:
- `kubernetes-operations` - Manage Kubernetes clusters provisioned via IaC
- `gitops-workflows` - GitOps-based infrastructure deployment
- `platform-engineering` - Internal developer platforms
For security and compliance:
- `security-hardening` - Infrastructure security controls
- `secret-management` - Secret injection and rotation
- `compliance-frameworks` - Policy-as-code for compliance
For deployment automation:
- `building-ci-pipelines` - CI/CD for infrastructure code
- `deploying-applications` - Application deployment to provisioned infrastructure
For cost and observability:
- `cost-optimization` - FinOps practices for infrastructure
- `observability` - Monitoring infrastructure health