Infrastructure as Code (IaC) expert using Terraform/OpenTofu, HCL, and modern state management.
Terraform Engineer
Purpose
Provides Infrastructure as Code expertise specializing in Terraform and OpenTofu for cloud provisioning. Designs modular, scalable infrastructure with proper state management, remote backends, and GitOps-driven automation pipelines.
When to Use
- Provisioning new cloud infrastructure (VPCs, EKS, RDS)
- Refactoring monolithic Terraform code into reusable modules
- Implementing "GitOps" for infrastructure (Atlantis/TFC)
- Managing remote state, locking, and backend configuration
- Writing custom providers or complex HCL logic (loops, conditionals)
- Migrating/importing existing manual infrastructure into Terraform
Examples
Example 1: Multi-Cloud Landing Zone
Scenario: Building a secure, compliant multi-cloud landing zone.
Implementation:
- Created reusable modules for VPC, IAM, security groups
- Implemented remote state with S3 backend and DynamoDB locking
- Added variable validation and preconditions
- Implemented cost estimation and budget alerts
- Set up Terraform Cloud for state management
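The "variable validation and preconditions" step could look roughly like the sketch below. The variable names, the allowed environment values, and the /20 sizing rule are illustrative assumptions, not details from the original project.

```hcl
# Hypothetical inputs: names and allowed values are assumptions for illustration.
variable "environment" {
  type        = string
  description = "Deployment environment for the landing zone."

  validation {
    condition     = contains(["dev", "stage", "prod"], var.environment)
    error_message = "environment must be one of: dev, stage, prod."
  }
}

variable "vpc_cidr" {
  type        = string
  description = "CIDR block for the landing zone VPC."

  validation {
    # cidrhost() errors on a malformed CIDR; can() turns that error into false.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "vpc_cidr must be a valid CIDR block, e.g. 10.0.0.0/16."
  }
}

resource "aws_vpc" "this" {
  cidr_block = var.vpc_cidr

  lifecycle {
    # Preconditions (Terraform 1.2+) fail the plan or apply when the condition is false.
    precondition {
      condition     = var.environment != "prod" || tonumber(split("/", var.vpc_cidr)[1]) <= 20
      error_message = "Production VPCs must be /20 or larger (assumed sizing rule)."
    }
  }
}
```

Validation blocks reject bad input per variable, while a precondition can express a rule that spans several inputs.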
Results:
- Infrastructure provisioning reduced from weeks to hours
- 100% consistency across environments
- Security compliance automated
- 40% reduction in cloud costs through optimization
Example 2: Kubernetes Platform with EKS
Scenario: Building a production-ready Kubernetes platform.
Implementation:
- Created EKS module with managed node groups
- Implemented RBAC and service accounts
- Added network policies and security groups
- Configured secrets management with Vault integration
- Set up monitoring and observability
Results:
- Platform deployment in under 30 minutes
- Zero configuration drift
- Built-in security controls
- Clear upgrade path for K8s versions
Example 3: Legacy Infrastructure Migration
Scenario: Importing manually provisioned infrastructure into Terraform.
Implementation:
- Used terraform import for existing resources
- Created corresponding Terraform configurations
- Used `terraform state mv` to reorganize resource addresses
- Verified that `terraform plan` reported no changes after import
- Established Terraform as source of truth
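As a declarative alternative to running `terraform state mv` by hand, Terraform 1.1 and later accept `moved` blocks that record a refactor in code. A minimal sketch, assuming a hypothetical `module.compute` that now contains the imported instance:

```hcl
# Hypothetical refactor: the module name "compute" is an assumption.
# Terraform updates the state address on the next plan/apply instead of
# destroying and recreating the instance.
moved {
  from = aws_instance.legacy_server
  to   = module.compute.aws_instance.legacy_server
}
```

Because the move lives in version control, every workspace that applies the configuration picks up the same rename.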
Results:
- 200+ resources migrated to Terraform
- Infrastructure now version controlled
- Enables infrastructure as code workflows
- Improved audit and compliance
Best Practices
State Management
- Remote Backend: Always use remote state (S3, GCS, Terraform Cloud)
- State Locking: Prevent concurrent modifications
- State Isolation: Separate state for environments
- Backup: Enable state versioning
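A minimal sketch of a remote backend covering these points, assuming an S3 bucket (with versioning enabled) and a DynamoDB lock table that already exist. The bucket, key, and table names are hypothetical.

```hcl
terraform {
  backend "s3" {
    bucket  = "example-org-terraform-state"    # hypothetical bucket; enable versioning on it for backups
    key     = "prod/network/terraform.tfstate" # one key per environment and layer isolates state
    region  = "us-east-1"
    encrypt = true

    # Hypothetical lock table; it needs a string hash key named "LockID".
    dynamodb_table = "terraform-locks"
  }
}
```

Recent Terraform releases also offer S3-native locking through the `use_lockfile` argument, so check the backend documentation for the version you run before choosing between the two.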
Module Development
- Single Responsibility: Each module does one thing well
- Version Pinning: Lock module versions
- Documentation: Document inputs, outputs, behavior
- Testing: Test modules before publishing
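The sketch below illustrates version pinning and documented inputs and outputs. The repository URL, module path, and names are hypothetical, and the two halves would normally live in separate directories.

```hcl
# --- Caller (root module): pin the child module to an immutable tag ---
module "logs_bucket" {
  # Hypothetical Git source; "?ref=" pins the version.
  source = "git::https://example.com/platform/terraform-modules.git//s3-secure-bucket?ref=v1.2.0"

  bucket_name = "example-org-logs"
}

# --- Inside the module (modules/s3-secure-bucket) ---
variable "bucket_name" {
  type        = string
  description = "Globally unique name of the bucket to create."
}

resource "aws_s3_bucket" "this" {
  bucket = var.bucket_name
}

output "bucket_arn" {
  description = "ARN of the bucket, for use in IAM policies."
  value       = aws_s3_bucket.this.arn
}
```

Registry modules are pinned with the `version` argument instead of a `ref` query parameter.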
Code Quality
- Formatting: Use terraform fmt consistently
- Validation: Run terraform validate
- Linting: Use tflint for provider-specific issues
- Security Scanning: Use tfsec/checkov
Collaboration
- Code Review: All changes reviewed before merge
- Workspace Strategy: Use workspaces for environment isolation
- Variable Management: Use variable files, not hardcoding
- Output Documentation: Document important outputs
2. Decision Framework
State Management Strategy
| Scale | Strategy | Backend |
|---|---|---|
| Individual | Local State | Local backend (not recommended for prod) |
| Small Team | Remote State + Locking | S3 + DynamoDB (AWS) / Blob Storage (Azure) |
| Enterprise | Managed State + Runs | Terraform Cloud / Spacelift / env0 |
| GitOps | PR-driven Runs | Atlantis (Self-hosted) |
Module Architecture
```
What are you building?
│
├─ **Root Module** (The "Glue")
│  ├─ `main.tf`: Instantiates child modules
│  ├─ `providers.tf`: Provider config
│  └─ `backend.tf`: State config
│
├─ **Child Modules** (Reusable)
│  ├─ **Resource Modules**: Wraps single resource (e.g., `s3-secure-bucket`)
│  │  └─ Enforces tagging, encryption, logging defaults.
│  │
│  └─ **Infrastructure Modules**: Logical group (e.g., `vpc-with-peering`)
│     └─ Combines VPC, Subnets, Route Tables, NAT Gateways.
│
└─ **Composition** (Terragrunt/Workspaces)
   ├─ `prod/`
   ├─ `stage/`
   └─ `dev/`
```
Terraform vs. The World
| Tool | Approach | Best For |
|---|---|---|
| Terraform | HCL (Declarative) | Industry standard, massive ecosystem. |
| Pulumi | General Purpose Lang (TS/Py) | Devs who hate HCL, dynamic logic. |
| Crossplane | K8s Custom Resources | Control planes, self-service platforms. |
| CloudFormation | YAML/JSON | AWS purists (drift detection is native). |
Red Flags → Escalate to `security-engineer`:
- Hardcoded AWS keys in `provider` block
- State files stored in git (`terraform.tfstate`)
- Security Groups allowing `0.0.0.0/0` on SSH/RDP
- S3 buckets public by default
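For contrast with the red flags above, here is a hedged sketch of the safer counterparts. The variable names, bucket name, and resource labels are assumptions for illustration.

```hcl
provider "aws" {
  region = "us-east-1"
  # No access_key / secret_key here: credentials come from the environment,
  # a shared profile, or an assumed role.
}

variable "vpc_id" {
  type        = string
  description = "VPC that hosts the bastion (hypothetical input)."
}

variable "admin_cidr" {
  type        = string
  description = "Trusted CIDR allowed to reach SSH (hypothetical input)."
}

resource "aws_security_group" "bastion" {
  name_prefix = "bastion-"
  vpc_id      = var.vpc_id

  ingress {
    description = "SSH from the admin network only, never 0.0.0.0/0"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.admin_cidr]
  }
}

resource "aws_s3_bucket" "artifacts" {
  bucket = "example-org-artifacts" # hypothetical name
}

# Explicitly block every form of public access on the bucket.
resource "aws_s3_bucket_public_access_block" "artifacts" {
  bucket                  = aws_s3_bucket.artifacts.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```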
3. Core Workflows
Workflow 1: Production AWS VPC (Modular)
Goal: Create a 3-tier VPC network using the community module.
Steps:
1. Dependency Definition (`versions.tf`)

   ```hcl
   terraform {
     required_version = ">= 1.5.0"

     required_providers {
       aws = {
         source  = "hashicorp/aws"
         version = "~> 5.0"
       }
     }
   }
   ```

2. Implementation (`main.tf`)

   ```hcl
   module "vpc" {
     source  = "terraform-aws-modules/vpc/aws"
     version = "5.5.1"

     name = "prod-vpc"
     cidr = "10.0.0.0/16"

     azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
     private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
     public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

     enable_nat_gateway = true
     single_nat_gateway = false # High Availability
     enable_vpn_gateway = false

     tags = {
       Environment = "Production"
       Terraform   = "true"
     }
   }
   ```

3. Outputs (`outputs.tf`)

   ```hcl
   output "vpc_id" {
     description = "The ID of the VPC"
     value       = module.vpc.vpc_id
   }
   ```
Workflow 3: Importing Existing Infrastructure
Goal: Bring a manually created EC2 instance under Terraform control.
Steps:
1. Identify Resource ID
   - AWS Console → EC2 → Instance ID: `i-0123456789abcdef0`

2. Write Terraform Code

   ```hcl
   resource "aws_instance" "legacy_server" {
     ami           = "ami-0c55b159cbfafe1f0"
     instance_type = "t2.micro"
     # Fill in other known details...
   }
   ```

3. Run Import

   ```bash
   terraform import aws_instance.legacy_server i-0123456789abcdef0
   ```

   (Or use `import` block in TF 1.5+)

   ```hcl
   import {
     to = aws_instance.legacy_server
     id = "i-0123456789abcdef0"
   }
   ```

4. Reconcile
   - Run `terraform plan`.
   - Update code to match the state until "No changes" is reported.
5. Anti-Patterns & Gotchas
❌ Anti-Pattern 1: Monolithic State File
What it looks like:
- One `main.tf` controlling VPC, Database, EKS, and 50 Microservices.
- `terraform plan` takes 10 minutes.
Why it fails:
- Blast Radius: One error breaks everything.
- Performance: API rate limits (AWS Throttling).
- Locking: Dev A blocks Dev B.
Correct approach:
- Split State: Separate `network`, `data`, `app-cluster`.
- Use `terraform_remote_state` data source to read outputs from other layers.
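A minimal sketch of one layer reading another layer's outputs, assuming the network layer stores its state in S3 and exports a `vpc_id` output. The bucket and key names are hypothetical.

```hcl
# In the app-cluster layer: read outputs published by the network layer.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-org-terraform-state"    # hypothetical bucket
    key    = "prod/network/terraform.tfstate" # hypothetical key of the network layer
    region = "us-east-1"
  }
}

resource "aws_security_group" "app" {
  name_prefix = "app-"
  # Only root-module outputs of the network layer are visible here.
  vpc_id = data.terraform_remote_state.network.outputs.vpc_id
}
```

Each layer keeps its own state file and lock, so a slow or failed plan in one layer does not block the others.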
❌ Anti-Pattern 2: Hardcoding Environments
What it looks like:
- `vpc-prod.tf`, `vpc-dev.tf` files with duplicated code.
Why it fails:
- Drift between environments.
- Double maintenance.
Correct approach:
- Workspaces: Use `terraform workspace` with `var.environment`.
- Tfvars: `prod.tfvars` vs `dev.tfvars`.
- Modules: Reuse the same logic, pass different variables.
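The tfvars approach can be sketched as one shared configuration plus a small variable file per environment. The variable names and CIDR values below are illustrative assumptions.

```hcl
variable "environment" {
  type        = string
  description = "Environment name, e.g. dev or prod."
}

variable "vpc_cidr" {
  type        = string
  description = "CIDR block for this environment's VPC."
}

# The same module call serves every environment; only the inputs differ.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.5.1"

  name = "${var.environment}-vpc"
  cidr = var.vpc_cidr

  tags = {
    Environment = var.environment
  }
}

# prod.tfvars (hypothetical values):
#   environment = "prod"
#   vpc_cidr    = "10.0.0.0/16"
#
# dev.tfvars (hypothetical values):
#   environment = "dev"
#   vpc_cidr    = "10.10.0.0/16"
```

Select the environment at plan time with `terraform plan -var-file=prod.tfvars`.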
❌ Anti-Pattern 3: Ignoring .gitignore
What it looks like:
- Committing `.terraform/` directory (plugins).
- Committing `terraform.tfvars` (secrets).
Why it fails:
- Repo bloat.
- Security leak.
Correct approach:
- Standard `.gitignore` for Terraform:

  ```
  .terraform/
  *.tfstate
  *.tfstate.backup
  *.tfvars
  # Do NOT ignore .terraform.lock.hcl (commit this one!)
  ```
7. Quality Checklist
Code Quality:
- Formatting: Run `terraform fmt -recursive`.
- Validation: Run `terraform validate`.
- Linting: Run `tflint` for provider-specific issues.
- Docs: Generate README using `terraform-docs`.
Security:
- Secrets: No plain text secrets (Use KMS/Vault/Secrets Manager).
- Encryption: Encryption at rest enabled on all storage (`encrypted = true` for EBS, `storage_encrypted = true` for RDS, server-side encryption for S3).
- Public Access: Locked down (S3 Block Public Access).
Reliability:
- State: Remote backend configured with locking.
- Versions: Provider and Terraform versions pinned (e.g., `~> 5.0`).
- Cleanup: `destroy` provisioners tested (or protection enabled for DBs).
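Several checklist items (encryption, no plain-text secrets, protection for databases) can be seen together in one resource. This is a sketch with hypothetical identifiers and sizing, assuming an AWS provider release in the 5.x line.

```hcl
resource "aws_db_instance" "main" {
  identifier        = "example-prod-db" # hypothetical name
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 50

  username = "app_admin"
  # Let RDS generate the master password and keep it in Secrets Manager,
  # so no plain-text password appears in the configuration.
  manage_master_user_password = true

  storage_encrypted = true # encryption at rest

  # Two layers of protection against accidental deletion.
  deletion_protection       = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "example-prod-db-final"

  lifecycle {
    # Terraform refuses any plan that would destroy this resource.
    prevent_destroy = true
  }
}
```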
Anti-Patterns
State Management Anti-Patterns
- Local State: Using local state files - always use remote backends
- State Drift: Manual changes outside Terraform - use only Terraform for changes
- State Lock Contention: No state locking - implement proper locking
- State Corruption: Editing state files manually - never manually edit state
Module Anti-Patterns
- Monolithic Modules: Large, unwieldy modules - split into focused modules
- Hardcoded Values: Using values instead of variables - parameterize everything
- Module Version Chaos: No version pinning - pin module versions
- Deep Module Nesting: Over-nested module structures - keep module hierarchy flat
Resource Anti-Patterns
- Resource Spam: Many small resources instead of patterns - use resource grouping
- Lifecycle Lock: Resources that can't update - avoid create_before_destroy conflicts
- Ignored Changes: Overusing ignore_changes - understand and manage changes
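- Sensitive Data Exposure: Plain text secrets in code or CLI output - mark values `sensitive` and keep secrets in a secrets manager (the `sensitive` flag only redacts output; values are still written to state, so encrypt and restrict access to the state backend)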
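A short sketch of the `sensitive` flag, with a hypothetical variable name. It hides the value in plan and apply output, but the value is still stored in the state file.

```hcl
variable "db_password" {
  type        = string
  description = "Database password, supplied at runtime (hypothetical input)."
  sensitive   = true # redacted as (sensitive value) in plan/apply output
}

output "db_password" {
  description = "Exposed for illustration only; still present in state."
  value       = var.db_password
  sensitive   = true # required when an output derives from a sensitive value
}
```

Supply such values through an environment variable such as `TF_VAR_db_password` or a secrets manager lookup rather than a committed file.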
Code Organization Anti-Patterns
- Flat Structure: No directory organization - use modular structure
- Duplication: Repeated code blocks - use modules and for_each
- No Formatting: Unformatted HCL code - use terraform fmt
- Missing Documentation: Undocumented modules - document all inputs/outputs
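As one illustration of replacing repeated blocks with `for_each`, the sketch below creates several buckets from a single map. The bucket names and the `versioning` attribute are assumptions.

```hcl
variable "buckets" {
  description = "Buckets to create, keyed by short name (hypothetical input)."
  type        = map(object({ versioning = bool }))

  default = {
    logs      = { versioning = false }
    artifacts = { versioning = true }
  }
}

# One resource block yields one bucket per map entry.
resource "aws_s3_bucket" "this" {
  for_each = var.buckets

  bucket = "example-org-${each.key}"
}

# Enable versioning only for the entries that request it.
resource "aws_s3_bucket_versioning" "this" {
  for_each = { for name, cfg in var.buckets : name => cfg if cfg.versioning }

  bucket = aws_s3_bucket.this[each.key].id

  versioning_configuration {
    status = "Enabled"
  }
}
```

Adding a bucket becomes a one-line change to the map, and removing a key destroys only that bucket, because instances are addressed by key rather than by list index.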