Learn-skills.dev infrastructure-as-code
Use when writing Terraform for OCI, troubleshooting provider errors, managing state files, or implementing Resource Manager stacks. Covers terraform-provider-oci gotchas, resource lifecycle anti-patterns, state drift/corruption recovery, authentication precedence, and OCI Landing Zones. KEYWORDS: Terraform, OCI provider, state, Resource Manager, lifecycle, drift, import, 409, availability domain, boot volume, flex shape.
git clone https://github.com/NeverSight/learn-skills.dev
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/acedergren/agentic-tools/infrastructure-as-code" ~/.claude/skills/neversight-learn-skills-dev-infrastructure-as-code && rm -rf "$T"
data/skills-md/acedergren/agentic-tools/infrastructure-as-code/SKILL.md
OCI Infrastructure as Code - Expert Knowledge
NEVER Do This
NEVER hardcode OCIDs in Terraform (breaks portability)
    # WRONG - breaks when moving between regions/tenancies
    resource "oci_core_instance" "web" {
      compartment_id = "ocid1.compartment.oc1..aaaaaa..."  # Hardcoded!
      subnet_id      = "ocid1.subnet.oc1.phx.bbbbbb..."    # Hardcoded!
    }

    # RIGHT - variables or data sources
    resource "oci_core_instance" "web" {
      compartment_id = var.compartment_ocid
      subnet_id      = data.oci_core_subnet.existing.id
    }
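When the subnet OCID is not known ahead of time, one option is to look the subnet up by display name. A minimal sketch, assuming a hypothetical `var.vcn_ocid` input and a subnet named `app-subnet` (neither appears in the original):

```hcl
variable "compartment_ocid" {
  type        = string
  description = "Compartment that holds the network resources"
}

variable "vcn_ocid" {
  type        = string
  description = "Hypothetical input: OCID of the VCN to search"
}

# List subnets in the VCN, filtered by display name.
data "oci_core_subnets" "by_name" {
  compartment_id = var.compartment_ocid
  vcn_id         = var.vcn_ocid
  display_name   = "app-subnet" # assumption, replace with your subnet name
}

locals {
  # one() yields the single match, null when nothing matches,
  # and fails the plan when several subnets share the name.
  subnet_id = one(data.oci_core_subnets.by_name.subnets[*].id)
}
```

An instance can then set `subnet_id = local.subnet_id`, which keeps the configuration portable across tenancies as long as the naming convention holds.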
NEVER hardcode availability domain names
    # WRONG - AD names are tenant-specific (fMgC: prefix differs per tenancy)
    availability_domain = "fMgC:US-ASHBURN-AD-1"

    # RIGHT - query dynamically
    data "oci_identity_availability_domains" "ads" {
      compartment_id = var.tenancy_ocid
    }

    resource "oci_core_instance" "web" {
      availability_domain = data.oci_identity_availability_domains.ads.availability_domains[0].name
    }
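Index `[0]` always selects the first availability domain. If callers need to choose one, a sketch like the following may help; `var.ad_number` and the `ad_names`/`ad_name` locals are hypothetical names introduced here for illustration:

```hcl
variable "tenancy_ocid" { type = string }

variable "ad_number" {
  type        = number
  default     = 1
  description = "Hypothetical input: 1-based availability domain number"
}

data "oci_identity_availability_domains" "ads" {
  compartment_id = var.tenancy_ocid
}

locals {
  ad_names = data.oci_identity_availability_domains.ads.availability_domains[*].name

  # Wrap around so ad_number = 2 or 3 still resolves in single-AD regions.
  ad_name = local.ad_names[(var.ad_number - 1) % length(local.ad_names)]
}
```

The modulo keeps the same configuration usable in regions that expose only one availability domain.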
NEVER use preserve_boot_volume = true in dev/test (default behavior is true!)
    # WRONG - default! Orphans boot volumes when instance is destroyed
    resource "oci_core_instance" "dev" {
      # preserve_boot_volume not set = defaults to true
    }

    # RIGHT - explicit cleanup in dev/test
    resource "oci_core_instance" "dev" {
      preserve_boot_volume = false
    }
Cost impact: dev team with 10 instances cycling through testing = $50–500/month in silent orphaned volumes.
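One way to make the choice explicit everywhere is to derive the flag from an environment input rather than relying on any default. A minimal sketch, assuming a hypothetical `var.environment` variable:

```hcl
variable "environment" {
  type        = string
  description = "Hypothetical input: dev, test, or prod"
}

resource "oci_core_instance" "app" {
  # ... required arguments (shape, compartment_id, source_details, etc.) omitted

  # Keep the boot volume only in prod; elsewhere it is deleted with the instance.
  preserve_boot_volume = var.environment == "prod"
}
```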
NEVER skip lifecycle blocks on production databases
    # RIGHT - protect production resources from accidental destroy
    resource "oci_database_autonomous_database" "prod" {
      lifecycle {
        prevent_destroy = true
        ignore_changes  = [defined_tags]  # Ignore tag edits made via console
      }
    }
Without this: a mistyped terraform destroy -target deletes a production database permanently.
NEVER use count for resources that shouldn't be replaced on list reorder
    # WRONG - changing instance_names from ["web1","web2"] to ["web0","web1","web2"]
    # shifts every index and causes Terraform to RECREATE all instances
    resource "oci_core_instance" "web" {
      count        = length(var.instance_names)
      display_name = var.instance_names[count.index]
    }

    # RIGHT - for_each with stable keys
    resource "oci_core_instance" "web" {
      for_each     = toset(var.instance_names)
      display_name = each.value
    }
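Switching an existing resource from `count` to `for_each` changes every resource address, which on its own would also trigger replacement. A sketch of one way to migrate, assuming Terraform 1.1 or later (for `moved` blocks) and the two-instance list from the example above:

```hcl
variable "instance_names" {
  type    = list(string)
  default = ["web1", "web2"]
}

resource "oci_core_instance" "web" {
  for_each     = toset(var.instance_names)
  display_name = each.value
  # ... remaining required arguments omitted
}

# Map the old count indexes to the new for_each keys so existing instances
# are re-addressed in state instead of being destroyed and recreated.
moved {
  from = oci_core_instance.web[0]
  to   = oci_core_instance.web["web1"]
}

moved {
  from = oci_core_instance.web[1]
  to   = oci_core_instance.web["web2"]
}
```

After the change, `terraform plan` should report the moves rather than any destroy/create pairs. The `moved` blocks can be removed once every workspace has applied them.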
NEVER store Terraform state locally for team use
    # WRONG - no locking, no collaboration
    terraform {
      backend "local" {}
    }

    # RIGHT - OCI Object Storage with S3-compatible backend
    terraform {
      backend "s3" {
        bucket                      = "terraform-state"
        key                         = "prod/terraform.tfstate"
        region                      = "us-phoenix-1"
        endpoint                    = "https://<namespace>.compat.objectstorage.us-phoenix-1.oraclecloud.com"
        skip_region_validation      = true
        skip_credentials_validation = true
        skip_metadata_api_check     = true
        use_path_style              = true
      }
    }
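On recent Terraform releases the top-level `endpoint` argument is deprecated in favor of an `endpoints` map. The variant below is a sketch for Terraform 1.6 or later; the two extra `skip_*` flags are assumptions that may or may not be needed for your Terraform version. The S3-compatible API authenticates with an OCI Customer Secret Key, typically supplied through the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables.

```hcl
terraform {
  backend "s3" {
    bucket = "terraform-state"
    key    = "prod/terraform.tfstate"
    region = "us-phoenix-1"

    # Newer form of the deprecated top-level "endpoint" argument.
    endpoints = {
      s3 = "https://<namespace>.compat.objectstorage.us-phoenix-1.oraclecloud.com"
    }

    skip_region_validation      = true
    skip_credentials_validation = true
    skip_metadata_api_check     = true
    skip_requesting_account_id  = true # assumption: there is no AWS account to look up
    skip_s3_checksum            = true # assumption: only if checksum headers are rejected
    use_path_style              = true
  }
}
```

Backend blocks cannot reference variables, so the namespace placeholder has to be filled in literally or passed with `terraform init -backend-config=...`.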
OCI Provider Authentication Gotchas
Authentication precedence (silent override is a common footgun):
- Explicit provider block credentials
- TF_VAR_* environment variables
- ~/.oci/config file (DEFAULT profile)
- Instance Principal (auth = "InstancePrincipal")
Common mistake: setting env vars while an explicit provider block is also present. The provider block overrides them without any warning.
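One way to avoid the silent override is to keep exactly one credential source per provider configuration. The sketch below shows a default configuration that reads only the config file, plus a hypothetical aliased configuration with explicit credentials. The alias name and the variable names are illustrative assumptions.

```hcl
variable "region" { type = string }

# Default configuration: credentials come from ~/.oci/config,
# with no key material in the provider block.
provider "oci" {
  config_file_profile = "DEFAULT"
  region              = var.region
}

# Hypothetical second configuration with explicit credentials, kept apart
# under an alias so it is obvious which resources use it.
variable "tenancy_ocid" { type = string }
variable "user_ocid" { type = string }
variable "fingerprint" { type = string }
variable "private_key_path" { type = string }

provider "oci" {
  alias            = "explicit"
  tenancy_ocid     = var.tenancy_ocid
  user_ocid        = var.user_ocid
  fingerprint      = var.fingerprint
  private_key_path = var.private_key_path
  region           = var.region
}
```

Resources opt into the second configuration with `provider = oci.explicit`. Everything else uses the config-file profile.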
Instance Principal for Terraform running on OCI compute:
    provider "oci" {
      auth   = "InstancePrincipal"
      region = var.region
    }
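Instance principal auth depends on a dynamic group and a policy that already exist. A sketch of both follows, meant to be applied ahead of time from a configuration that uses user credentials. The group and policy names and the compartment-wide matching rule are assumptions; tenancies with identity domains may need a domain prefix on the group name in the policy statement.

```hcl
variable "tenancy_ocid" { type = string }
variable "compartment_ocid" { type = string }

# Dynamic groups are created in the tenancy (root compartment).
resource "oci_identity_dynamic_group" "terraform_runners" {
  compartment_id = var.tenancy_ocid
  name           = "terraform-runners" # hypothetical name
  description    = "Compute instances allowed to run Terraform"
  matching_rule  = "ALL {instance.compartment.id = '${var.compartment_ocid}'}"
}

resource "oci_identity_policy" "terraform_runners" {
  compartment_id = var.compartment_ocid
  name           = "terraform-runners-policy" # hypothetical name
  description    = "Lets the dynamic group manage resources in one compartment"
  statements = [
    "Allow dynamic-group ${oci_identity_dynamic_group.terraform_runners.name} to manage all-resources in compartment id ${var.compartment_ocid}",
  ]
}
```

`manage all-resources` is broad. Narrowing the verb and resource family to what the stack actually creates is usually preferable.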
Critical: the instance must be in the dynamic group BEFORE Terraform runs. If added after, auth fails with the cryptic error:
"authorization failed or requested resource not found".
State Management
Fixing State Drift
State drift happens when resources are modified outside Terraform (console, CLI, API).
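    terraform plan     # Shows unexpected changes, which identifies drift
    terraform refresh  # Updates state to match actual OCI resources; changes no infrastructure, but does rewrite state
                       # (deprecated in favor of: terraform apply -refresh-only, which asks for confirmation first)

    # For new resources created outside Terraform:
    terraform import oci_core_vcn.main ocid1.vcn.oc1.phx.xxxxx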
Suppress drift from console tag edits (common source of noise):
    lifecycle {
      ignore_changes = [defined_tags, freeform_tags]
    }
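Ignoring the whole tag maps also hides tag changes made in Terraform itself. If the noise comes from a few automatically applied keys, individual map keys can be ignored instead. A sketch, assuming the tenancy applies the `Oracle-Tags` automatic tag defaults:

```hcl
resource "oci_core_vcn" "main" {
  # ... required arguments omitted

  lifecycle {
    # Ignore only the keys OCI sets on its own; other tag edits still show up in plan.
    ignore_changes = [
      defined_tags["Oracle-Tags.CreatedBy"],
      defined_tags["Oracle-Tags.CreatedOn"],
    ]
  }
}
```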
Fixing "409 Conflict — Resource Already Exists"
Cause: resource exists in OCI but not in state file (e.g., created manually or previous import failure).
    terraform import oci_core_vcn.main ocid1.vcn.oc1.phx.xxxxx
    terraform plan   # Should now show no changes for that resource
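With Terraform 1.5 or later, the same import can be declared in configuration, so it is reviewed in `terraform plan` before state changes. A sketch using the placeholder OCID from the command above; the VCN arguments are assumptions and must match the existing resource:

```hcl
variable "compartment_ocid" { type = string }

import {
  to = oci_core_vcn.main
  id = "ocid1.vcn.oc1.phx.xxxxx" # placeholder OCID, as in the CLI example
}

resource "oci_core_vcn" "main" {
  compartment_id = var.compartment_ocid
  cidr_blocks    = ["10.0.0.0/16"] # assumption: must match the existing VCN
  display_name   = "main"          # assumption
}
```

The `import` block can be deleted after a successful apply.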
State File Corruption Recovery
    # 1. Backup first
    cp terraform.tfstate terraform.tfstate.backup

    # 2. Try state pull repair
    terraform state pull > recovered.tfstate
    mv recovered.tfstate terraform.tfstate

    # 3. If that fails, restore from Object Storage versioning
    # 4. Last resort: reconstruct with terraform import for each resource
Prevention: enable Object Storage bucket versioning on the state backend.
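A sketch of a versioned state bucket, assuming it is managed from a small bootstrap configuration that is separate from the stacks storing state in it:

```hcl
variable "compartment_ocid" { type = string }

data "oci_objectstorage_namespace" "ns" {
  compartment_id = var.compartment_ocid
}

resource "oci_objectstorage_bucket" "tf_state" {
  compartment_id = var.compartment_ocid
  namespace      = data.oci_objectstorage_namespace.ns.namespace
  name           = "terraform-state" # same bucket name as the backend example
  versioning     = "Enabled"         # keeps prior state versions for recovery

  lifecycle {
    prevent_destroy = true
  }
}
```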
Destroy Failures (Dependency Order)
Error: Resource still in use
OCI enforces strict dependency order on destroy: instances must be terminated before subnets, subnets before VCN, etc.
    # Visualize the dependency graph
    terraform graph | dot -Tpng > graph.png

    # Destroy in reverse dependency order
    terraform destroy -target=oci_core_instance.web
    terraform destroy -target=oci_core_subnet.private
    terraform destroy -target=oci_core_vcn.main
Timeouts for Long-Running OCI Resources
OCI resource provisioning times vary significantly. Default Terraform timeouts often cause false failures:
    # Autonomous Database: 15-30 min to provision (default 20m is borderline)
    resource "oci_database_autonomous_database" "prod" {
      timeouts {
        create = "60m"
        update = "60m"
        delete = "30m"
      }
    }

    # Compute: usually fast, but capacity issues can cause retries
    resource "oci_core_instance" "web" {
      timeouts {
        create = "30m"
      }
    }
OCI Landing Zones
Use oracle-terraform-modules/terraform-oci-landing-zones for:
- Greenfield tenancy setup requiring CIS OCI Foundations Benchmark compliance
- Multi-environment (dev/test/prod) with hub-and-spoke networking
- Centralized logging, Cloud Guard, and Security Zones
Do NOT use Landing Zone for:
- Brownfield (existing infrastructure) — too opinionated, causes state conflicts
- Simple single-app deployments — the module overhead exceeds the value
Progressive Loading Reference
Load references/oci-terraform-patterns.md when:
- Setting up provider configuration (multi-region, auth methods)
- Resource Manager stack operations via CLI
- Common resource patterns with full HCL examples (VCN, compute, ADB)
- Landing Zone module usage examples
Do NOT load for NEVER-list gotchas, lifecycle management, or state troubleshooting — this file covers those.