Claude-skill-registry iac-terraform
Infrastructure as Code with Terraform and Terragrunt. Use for creating, validating, troubleshooting, and managing Terraform configurations, modules, and state. Covers Terraform workflows, best practices, module development, state management, Terragrunt patterns, and common issue resolution.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/iac-terraform" ~/.claude/skills/majiayu000-claude-skill-registry-iac-terraform && rm -rf "$T"
Infrastructure as Code - Terraform & Terragrunt
Comprehensive guidance for infrastructure as code using Terraform and Terragrunt, from development through production deployment.
When to Use This Skill
Use this skill when:
- Writing or refactoring Terraform configurations
- Creating reusable Terraform modules
- Troubleshooting Terraform/Terragrunt errors
- Managing Terraform state
- Implementing IaC best practices
- Setting up Terragrunt project structure
- Reviewing infrastructure code
- Debugging plan/apply issues
Core Workflows
1. New Infrastructure Development
Workflow Decision Tree:
Is this reusable across environments/projects?
├─ Yes → Create a Terraform module
│   └─ See "Creating Terraform Modules" below
└─ No → Create environment-specific configuration
    └─ See "Environment Configuration" below
Creating Terraform Modules
When building reusable infrastructure:
- Scaffold new module with script:
python3 scripts/init_module.py my-module-name
This automatically creates:
- Standard module file structure
- Template files with proper formatting
- Examples directory
- README with documentation
- Use module template structure:
- See assets/templates/MODULE_TEMPLATE.md for complete structure
- Required files: main.tf, variables.tf, outputs.tf, versions.tf, README.md
- Recommended: examples/ directory with working examples
- Follow module best practices:
- Single responsibility - one module, one purpose
- Sensible defaults for optional variables
- Complete descriptions for all variables and outputs
- Input validation using validation blocks
- Mark sensitive values with sensitive = true
- Validate module:
python3 scripts/validate_module.py /path/to/module
This checks for:
- Required files present
- Variables have descriptions and types
- Outputs have descriptions
- README exists and is complete
- Naming conventions followed
- Sensitive values properly marked
- Test module:
cd examples/complete
terraform init
terraform plan
- Document module:
- Use terraform-docs to auto-generate: terraform-docs markdown . > README.md
- Include usage examples
- Document all inputs and outputs
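The module best practices above (descriptions, sensible defaults, validation blocks, and sensitive = true) can be sketched as follows. This is a minimal illustration, not the output of init_module.py; the variable names, the allowed environment values, and the database host are assumptions made for the example.

```hcl
# variables.tf (illustrative names; adapt to your module)

variable "environment" {
  description = "Deployment environment name (dev, staging, or prod)."
  type        = string

  validation {
    # Reject anything outside the assumed set of environments.
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "The environment must be one of: dev, staging, prod."
  }
}

variable "vpc_cidr" {
  description = "CIDR block for the VPC."
  type        = string
  default     = "10.0.0.0/16" # sensible default for an optional variable

  validation {
    # cidrhost() fails on malformed CIDR strings; can() turns that failure into false.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "The vpc_cidr value must be a valid CIDR block."
  }
}

variable "db_password" {
  description = "Master password for the database."
  type        = string
  sensitive   = true # redacted in plan and apply output
}

# outputs.tf

output "db_connection_string" {
  description = "Connection string for the application database (hypothetical host)."
  value       = "postgres://app:${var.db_password}@db.example.internal:5432/app"
  sensitive   = true # required because the value is derived from a sensitive variable
}
```

Marking a value as sensitive only hides it from CLI output; it is still stored in plain text in the state file, which is one reason the state backend should be encrypted and access-controlled.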
Key Module Patterns:
See the "Module Design" section of references/best_practices.md for:
- Composability patterns
- Variable organization
- Output design
- Module versioning strategies
Environment Configuration
For environment-specific infrastructure:
- Structure by environment:
environments/
├── dev/
├── staging/
└── prod/
- Use consistent file organization:
environment/
├── main.tf              # Resource definitions
├── variables.tf         # Variable declarations
├── terraform.tfvars     # Default values (committed)
├── secrets.auto.tfvars  # Sensitive values (.gitignore)
├── backend.tf           # State configuration
├── outputs.tf           # Output values
└── versions.tf          # Version constraints
- Reference modules:
module "vpc" {
  source = "git::https://github.com/company/terraform-modules.git//vpc?ref=v1.2.0"

  name        = "${var.environment}-vpc"
  vpc_cidr    = var.vpc_cidr
  environment = var.environment
}
2. State Management & Inspection
When to inspect state:
- Before major changes
- Investigating drift
- Debugging resource issues
- Auditing infrastructure
Inspect state and check health:
python3 scripts/inspect_state.py /path/to/terraform/directory
Check for drift:
python3 scripts/inspect_state.py /path/to/terraform/directory --check-drift
The script provides:
- Resource count and types
- Backend configuration
- Provider versions
- Issues with resources (tainted, etc.)
- Drift detection (if requested)
Manual state operations:
# List all resources
terraform state list

# Show specific resource
terraform state show aws_instance.web

# Remove from state (doesn't destroy)
terraform state rm aws_instance.web

# Move/rename resource
terraform state mv aws_instance.web aws_instance.web_server

# Import existing resource
terraform import aws_instance.web i-1234567890abcdef0
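Recent Terraform releases also offer declarative counterparts to some of the manual state commands above, which have the advantage of going through plan and code review. The fragment below is a sketch under assumptions: the resource names legacy and retired are hypothetical, the version notes reflect when each block type was introduced, and each block is an independent alternative rather than something to combine for the same address.

```hcl
# Alternative to `terraform state mv` (Terraform 1.1+):
# records the rename so the next plan moves the state entry instead of
# destroying and recreating the instance.
moved {
  from = aws_instance.web
  to   = aws_instance.web_server
}

# Alternative to `terraform import` (Terraform 1.5+):
# assumes a matching `resource "aws_instance" "legacy"` block exists in the
# configuration (hypothetical name).
import {
  to = aws_instance.legacy
  id = "i-1234567890abcdef0"
}

# Alternative to `terraform state rm` (Terraform 1.7+):
# assumes the `resource "aws_instance" "retired"` block (hypothetical name)
# has been deleted from the configuration; destroy = false forgets the
# resource without destroying the real infrastructure.
removed {
  from = aws_instance.retired

  lifecycle {
    destroy = false
  }
}
```

Once a moved, import, or removed block has been applied in every environment, it can usually be deleted from the configuration.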
State best practices: See the "State Management" section of references/best_practices.md for:
- Remote backend setup (S3 + DynamoDB)
- State file organization strategies
- Encryption and security
- Backup and recovery procedures
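As a rough illustration of the S3 + DynamoDB remote backend mentioned above, a backend.tf might look like the following. The bucket name, state key, region, and table name are placeholders, and the bucket and table are assumed to exist already.

```hcl
# backend.tf (sketch; all names are hypothetical)
terraform {
  backend "s3" {
    bucket  = "example-terraform-state"            # pre-existing bucket, ideally with versioning enabled
    key     = "environments/dev/terraform.tfstate" # one state file per environment
    region  = "us-east-1"
    encrypt = true                                 # server-side encryption of the state object

    # DynamoDB table used for state locking; it needs a string partition key named "LockID".
    dynamodb_table = "example-terraform-locks"
  }
}
```

Backend blocks cannot reference variables, so values are either hard-coded per environment or supplied at init time with terraform init -backend-config=... . Newer Terraform releases also provide S3-native locking through the use_lockfile argument; check the backend documentation for the version you run before choosing between the two.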
3. Standard Terraform Workflow
# 1. Initialize (first time or after module changes)
terraform init

# 2. Format code
terraform fmt -recursive

# 3. Validate syntax
terraform validate

# 4. Plan changes (always review!)
terraform plan -out=tfplan

# 5. Apply changes
terraform apply tfplan

# 6. Verify outputs
terraform output
With Terragrunt:
# Run for single module
terragrunt plan
terragrunt apply

# Run for all modules in directory tree
terragrunt run-all plan
terragrunt run-all apply
4. Troubleshooting Issues
When encountering errors:
- Read the complete error message. Don't skip details.
- Check common issues. See references/troubleshooting.md for:
- State lock errors
- State drift/corruption
- Provider authentication failures
- Resource errors (already exists, dependency errors, timeouts)
- Module source issues
- Terragrunt-specific issues (dependency cycles, hooks)
- Performance problems
- Enable debug logging if needed:
export TF_LOG=DEBUG
export TF_LOG_PATH=terraform-debug.log
terraform plan
- Isolate the problem:
# Test specific resource
terraform plan -target=aws_instance.web
terraform apply -target=aws_instance.web
- Common quick fixes:
State locked:
# Verify that no one else is running Terraform against this state, then:
terraform force-unlock <lock-id>
Provider cache issues:
rm -rf .terraform
terraform init -upgrade
Module cache issues:
rm -rf .terraform/modules
terraform init
5. Code Review & Quality
Before committing:
- Format code:
terraform fmt -recursive
- Validate syntax:
terraform validate
- Lint with tflint:
tflint --module   # on TFLint v0.50+, use: tflint --call-module-type=all
- Security scan with checkov:
checkov -d .
- Validate modules:
python3 scripts/validate_module.py modules/vpc
- Generate documentation:
terraform-docs markdown modules/vpc > modules/vpc/README.md
Review checklist:
- All variables have descriptions
- Sensitive values marked as sensitive
- Outputs have descriptions
- Resources follow naming conventions
- No hardcoded values (use variables)
- README is complete and current
- Examples directory exists and works
- Version constraints specified
- Security best practices followed
See references/best_practices.md for comprehensive guidelines.
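Two of the checklist items above, "Version constraints specified" and "No hardcoded values (use variables)", might look like this in practice. The version ranges, variable names, and tag keys are illustrative assumptions rather than recommendations from the referenced guide.

```hcl
# versions.tf: pin Terraform and provider versions (example ranges only)
terraform {
  required_version = ">= 1.5.0, < 2.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # any 5.x release, but never 6.0
    }
  }
}

# variables.tf: values that would otherwise be hardcoded in resources
variable "project" {
  description = "Short project name used as a prefix in resource names."
  type        = string
}

variable "environment" {
  description = "Deployment environment name, for example dev or prod."
  type        = string
}

variable "instance_type" {
  description = "EC2 instance type for application servers."
  type        = string
  default     = "t3.micro"
}

# locals.tf: derive names and tags once and reuse them everywhere
locals {
  name_prefix = "${var.project}-${var.environment}"

  common_tags = {
    Project     = var.project
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}
```

Resources can then use local.name_prefix and local.common_tags instead of repeating literal strings, which keeps naming conventions consistent across a module.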
Terragrunt Patterns
Project Structure
terragrunt-project/
├── terragrunt.hcl              # Root config
├── account.hcl                 # Account-level vars
├── region.hcl                  # Region-level vars
└── environments/
    ├── dev/
    │   ├── env.hcl             # Environment vars
    │   └── us-east-1/
    │       ├── vpc/
    │       │   └── terragrunt.hcl
    │       └── eks/
    │           └── terragrunt.hcl
    └── prod/
        └── us-east-1/
            ├── vpc/
            └── eks/
Dependency Management
# In eks/terragrunt.hcl
dependency "vpc" {
  config_path = "../vpc"

  # Mock outputs for plan/validate; keys must match the outputs referenced in inputs below
  mock_outputs = {
    vpc_id             = "vpc-mock"
    private_subnet_ids = ["subnet-mock"]
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  vpc_id     = dependency.vpc.outputs.vpc_id
  subnet_ids = dependency.vpc.outputs.private_subnet_ids
}
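A dependency block normally sits alongside an include block and a terraform source in the same module-level file. The sketch below shows one possible vpc/terragrunt.hcl for the project structure above; it assumes that env.hcl defines a locals block with an environment value, and the CIDR range is a placeholder.

```hcl
# environments/dev/us-east-1/vpc/terragrunt.hcl (sketch)

# Pull in the root configuration (remote state, provider generation).
# With no argument, find_in_parent_folders() looks for terragrunt.hcl in
# parent directories; newer Terragrunt versions prefer an explicit file name.
include "root" {
  path = find_in_parent_folders()
}

# Assumption: env.hcl contains `locals { environment = "dev" }`.
locals {
  env = read_terragrunt_config(find_in_parent_folders("env.hcl"))
}

# Versioned module source, matching the Git reference style used earlier.
terraform {
  source = "git::https://github.com/company/terraform-modules.git//vpc?ref=v1.2.0"
}

inputs = {
  name        = "${local.env.locals.environment}-vpc"
  vpc_cidr    = "10.0.0.0/16" # placeholder value
  environment = local.env.locals.environment
}
```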
Common Patterns
See assets/templates/MODULE_TEMPLATE.md for complete Terragrunt configuration templates including:
- Root terragrunt.hcl with provider generation
- Remote state configuration
- Module-level terragrunt.hcl patterns
- Dependency handling
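For orientation, a root terragrunt.hcl with remote state and provider generation could be sketched as below. This is not the template from MODULE_TEMPLATE.md; the bucket and table names are hypothetical, and region.hcl is assumed to define a local named aws_region.

```hcl
# terragrunt.hcl at the project root (sketch)

locals {
  # Assumption: region.hcl contains `locals { aws_region = "us-east-1" }`.
  region_vars = read_terragrunt_config(find_in_parent_folders("region.hcl"))
  aws_region  = local.region_vars.locals.aws_region
}

# Generate backend.tf in every module so state configuration is defined once.
remote_state {
  backend = "s3"

  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket         = "example-terraform-state" # hypothetical bucket
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = local.aws_region
    encrypt        = true
    dynamodb_table = "example-terraform-locks" # hypothetical lock table
  }
}

# Generate provider.tf in every module.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "${local.aws_region}"
}
EOF
}
```

Because path_relative_to_include() expands to each module's path relative to the root, every module gets its own state key (for example environments/dev/us-east-1/vpc/terraform.tfstate) without any per-module backend configuration.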
Reference Documentation
references/best_practices.md
Comprehensive best practices covering:
- Project Structure - Recommended directory layouts
- State Management - Remote state, locking, organization
- Module Design - Single responsibility, composability, versioning
- Variable Management - Declarations, files hierarchy, secrets
- Resource Naming - Conventions and standards
- Security Practices - Least privilege, encryption, secret management
- Testing & Validation - Tools and approaches
- CI/CD Integration - Pipeline patterns
Read this when:
- Setting up new Terraform projects
- Establishing team standards
- Designing reusable modules
- Implementing security controls
- Setting up CI/CD pipelines
references/troubleshooting.md
Detailed troubleshooting guide for:
- State Issues - Lock errors, drift, corruption
- Provider Issues - Version conflicts, authentication
- Resource Errors - Already exists, dependencies, timeouts
- Module Issues - Source not found, version conflicts
- Terragrunt Specific - Dependency cycles, hooks
- Performance Issues - Slow plans, optimization strategies
Read this when:
- Encountering specific error messages
- Investigating unexpected behavior
- Debugging failed deployments
- Performance tuning
Each issue includes:
- Symptom description
- Common causes
- Step-by-step resolution
- Prevention strategies
references/cost_optimization.md
Cloud cost optimization strategies for Terraform-managed infrastructure:
- Right-Sizing Resources - Compute, database, and storage optimization
- Spot and Reserved Instances - Cost-effective instance strategies
- Storage Optimization - S3 lifecycle policies, EBS volume types
- Networking Costs - VPC endpoints, data transfer optimization
- Resource Lifecycle - Scheduled shutdown, cleanup automation
- Cost Tagging - Comprehensive tagging for cost allocation
- Monitoring and Alerts - Budget alerts, anomaly detection
- Multi-Cloud - Azure, GCP cost optimization patterns
Read this when:
- Planning infrastructure to minimize costs
- Conducting cost reviews or optimization initiatives
- Implementing auto-scaling and scheduling
- Setting up cost monitoring and alerts
- Designing cost-effective architectures
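Two of the strategies listed above, cost tagging and S3 lifecycle policies, translate fairly directly into Terraform. The following is a small sketch with made-up tag values, bucket name, and retention periods; it is not taken from cost_optimization.md.

```hcl
# Cost tagging: default_tags applies these tags to every taggable resource
# created through this provider, which keeps cost allocation reports complete.
provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      Environment = "dev"      # example values only
      CostCenter  = "platform"
      ManagedBy   = "terraform"
    }
  }
}

resource "aws_s3_bucket" "logs" {
  bucket = "example-app-logs" # hypothetical bucket name
}

# Storage optimization: move objects to cheaper storage classes as they age,
# then delete them after a year.
resource "aws_s3_bucket_lifecycle_configuration" "logs" {
  bucket = aws_s3_bucket.logs.id

  rule {
    id     = "archive-then-expire"
    status = "Enabled"

    filter {} # empty filter: the rule applies to all objects in the bucket

    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    transition {
      days          = 90
      storage_class = "GLACIER"
    }

    expiration {
      days = 365
    }
  }
}
```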
CI/CD Workflows
Ready-to-use CI/CD pipeline templates in assets/workflows/:
github-actions-terraform.yml
Complete GitHub Actions workflow including:
- Terraform validation and formatting checks
- TFLint linting
- Checkov security scanning
- Terraform plan on PRs with comment posting
- Terraform apply on main branch with approval
- OIDC authentication support
github-actions-terragrunt.yml
Terragrunt-specific workflow featuring:
- Changed module detection
- Multi-module parallel planning
- Run-all commands
- Dependency-aware apply ordering
- Manual workflow dispatch with environment selection
gitlab-ci-terraform.yml
GitLab CI/CD pipeline with:
- Multi-stage pipeline (validate, lint, security, plan, apply)
- Artifact management
- Manual deployment gates
- Multi-environment configuration examples
Use these templates as starting points for your CI/CD pipelines. Customize based on your:
- Cloud provider and authentication method
- Repository structure
- Team approval workflows
- Environment promotion strategy
Scripts
init_module.py
Scaffolds a new Terraform module with proper structure and template files.
Usage:
# Create module in current directory
python3 scripts/init_module.py my-vpc

# Create in specific path
python3 scripts/init_module.py my-vpc --path ./modules

# Get JSON output
python3 scripts/init_module.py my-vpc --json
Creates:
- main.tf - Resource definitions with TODO placeholders
- variables.tf - Input variables with validation examples
- outputs.tf - Output values with descriptions
- versions.tf - Terraform and provider version constraints
- README.md - Module documentation template
- examples/complete/ - Complete usage example
Use when:
- Starting a new Terraform module
- Ensuring consistent module structure across team
- Quickly bootstrapping module development
- Teaching module best practices
inspect_state.py
Comprehensive state inspection and health check.
Usage:
# Basic inspection
python3 scripts/inspect_state.py /path/to/terraform

# Include drift detection
python3 scripts/inspect_state.py /path/to/terraform --check-drift
Provides:
- State health status
- Resource counts and types
- Provider versions
- Backend configuration
- Resource issues (tainted, etc.)
- Configuration drift detection (optional)
- Actionable recommendations
Use when:
- Before major infrastructure changes
- Investigating resource issues
- Auditing infrastructure state
- Detecting configuration drift
validate_module.py
Validates Terraform modules against best practices.
Usage:
python3 scripts/validate_module.py /path/to/module
Checks:
- Required files present (main.tf, variables.tf, outputs.tf)
- Variable descriptions and types
- Output descriptions
- Sensitive value handling
- README completeness
- Version constraints
- Example configurations
- Naming conventions
- Hard-coded values that should be variables
Returns:
- Issues (must fix)
- Warnings (should fix)
- Suggestions (consider)
Use when:
- Creating new modules
- Reviewing module code
- Before releasing module versions
- Establishing quality standards
Assets
templates/MODULE_TEMPLATE.md
Complete Terraform module template including:
- File-by-file structure and examples
- main.tf patterns
- variables.tf with validation
- outputs.tf best practices
- versions.tf constraints
- README.md template
- Example usage configurations
- Terragrunt configuration templates
Use this when:
- Creating new modules from scratch
- Standardizing module structure
- Onboarding team members
- Establishing module conventions
Quick Reference
Essential Commands
# Initialize
terraform init
terraform init -upgrade            # Update providers

# Validate
terraform validate
terraform fmt -recursive

# Plan
terraform plan
terraform plan -out=tfplan

# Apply
terraform apply
terraform apply tfplan
terraform apply -auto-approve      # CI/CD only

# State
terraform state list
terraform state show <resource>
terraform state rm <resource>
terraform state mv <old> <new>

# Import
terraform import <resource_address> <resource_id>

# Destroy
terraform destroy
terraform destroy -target=<resource>

# Outputs
terraform output
terraform output <output_name>
Terragrunt Commands
# Single module
terragrunt init
terragrunt plan
terragrunt apply

# All modules
terragrunt run-all plan
terragrunt run-all apply
terragrunt run-all destroy

# With specific modules
terragrunt run-all apply --terragrunt-include-dir vpc --terragrunt-include-dir eks
Best Practices Summary
Always:
- Use remote state with locking
- Plan before apply (review changes)
- Pin Terraform and provider versions
- Use modules for reusable components
- Mark sensitive values as sensitive
- Document everything
- Test in non-production first
Never:
- Commit secrets or credentials
- Manually edit state files
- Use root AWS credentials
- Skip code review for production changes
- Deploy without testing
- Ignore security scan findings
Key Principles:
- Infrastructure as code (everything in version control)
- DRY (Don't Repeat Yourself) - use modules
- Immutable infrastructure
- Environment parity (dev/staging/prod similar)
- Security by default
- Document for future you