Terraform loses track of instances that reach the wait_for_network timeout in incus_instance #174

Open
vegardx opened this issue Nov 29, 2024 · 4 comments
Labels
Bug Confirmed to be a bug

Comments

@vegardx

vegardx commented Nov 29, 2024

When you set wait_for_network = true on incus_instance and the instance, for whatever reason, fails to obtain an IP address before the timeout, Terraform loses track of the instance. All subsequent runs return an error saying the instance already exists, but because the instance isn't in the state file, Terraform doesn't know about it.

Example code:

resource "incus_instance" "this" {
  count = 10

  type                     = "container"
  name                     = "instance-${count.index}"
  image                    = "images:debian/12/cloud"
  wait_for_network         = true

  config = {
    "boot.autostart" = true
  }
}

output "instance_ips" {
  value = { for idx, instance in incus_instance.this : instance.name => instance.ipv4_address }
}

Expected results:

instance_ips = {
  "instance-0" = "..."
...
}

Actual results:

incus_instance.this[8]: Creating...
incus_instance.this[4]: Creating...
incus_instance.this[5]: Creating...
incus_instance.this[6]: Creating...
incus_instance.this[2]: Creating...
incus_instance.this[7]: Creating...
incus_instance.this[9]: Creating...
incus_instance.this[3]: Creating...
incus_instance.this[1]: Creating...
incus_instance.this[4]: Still creating... [10s elapsed]
...
incus_instance.this[6]: Still creating... [3m10s elapsed]
╷
│ Error: Failed to wait for instance "instance-4" to get an IP address
│
│   with incus_instance.this[4],
│   on main.tf line 74, in resource "incus_instance" "this":
│   74: resource "incus_instance" "this" {
│
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network',
│ timeout: 3m0s)
╵
╷
│ Error: Failed to wait for instance "instance-9" to get an IP address
│
│   with incus_instance.this[9],
│   on main.tf line 74, in resource "incus_instance" "this":
│   74: resource "incus_instance" "this" {
│
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network',
│ timeout: 3m0s)
╵
╷
│ Error: Failed to wait for instance "instance-1" to get an IP address
│
│   with incus_instance.this[1],
│   on main.tf line 74, in resource "incus_instance" "this":
│   74: resource "incus_instance" "this" {
│
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network',
│ timeout: 3m0s)
╵
╷
│ Error: Failed to wait for instance "instance-6" to get an IP address
│
│   with incus_instance.this[6],
│   on main.tf line 74, in resource "incus_instance" "this":
│   74: resource "incus_instance" "this" {
│
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network',
│ timeout: 3m0s)
╵
╷
│ Error: Failed to wait for instance "instance-2" to get an IP address
│
│   with incus_instance.this[2],
│   on main.tf line 74, in resource "incus_instance" "this":
│   74: resource "incus_instance" "this" {
│
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network',
│ timeout: 3m0s)
╵
╷
│ Error: Failed to wait for instance "instance-3" to get an IP address
│
│   with incus_instance.this[3],
│   on main.tf line 74, in resource "incus_instance" "this":
│   74: resource "incus_instance" "this" {
│
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network',
│ timeout: 3m0s)
╵
╷
│ Error: Failed to wait for instance "instance-5" to get an IP address
│
│   with incus_instance.this[5],
│   on main.tf line 74, in resource "incus_instance" "this":
│   74: resource "incus_instance" "this" {
│
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network',
│ timeout: 3m0s)
╵
╷
│ Error: Failed to wait for instance "instance-8" to get an IP address
│
│   with incus_instance.this[8],
│   on main.tf line 74, in resource "incus_instance" "this":
│   74: resource "incus_instance" "this" {
│
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network',
│ timeout: 3m0s)
╵
╷
│ Error: Failed to wait for instance "instance-7" to get an IP address
│
│   with incus_instance.this[7],
│   on main.tf line 74, in resource "incus_instance" "this":
│   74: resource "incus_instance" "this" {
│
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network',
│ timeout: 3m0s)
╵
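
One way to bring an orphaned instance back under management, instead of deleting it by hand, is an `import` block (available in Terraform 1.5 and later, and in OpenTofu). This is a minimal sketch for one of the failed instances above. It assumes the provider accepts the bare instance name as the import ID for an instance in the default project on the local server; the provider's import documentation describes the exact ID format, including project and remote prefixes.

```hcl
# Adopt the existing, orphaned "instance-4" into the count-indexed resource
# so later runs stop failing with "instance already exists".
# Assumption: a bare instance name is a valid import ID for the default project.
import {
  to = incus_instance.this[4]
  id = "instance-4"
}
```

One block is needed per orphaned index; running a plan before applying shows what, if anything, the provider still wants to change on the adopted instances.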
@stgraber stgraber added the Bug Confirmed to be a bug label Nov 29, 2024
@maveonair
Member

So, what's the expected behavior here? We could keep writing the state even if we haven't gotten an IP address from the newly created instance. Or we could delete the instance we just created, but I'm not a big fan of that: I haven't seen this behavior in any other Terraform provider so far, and I think our users might be confused by it.

Timing out after 3 minutes while waiting for an IP address seems like an Incus misconfiguration issue rather than the usual case, right?

@stgraber
Member

It can be a user error though, like not attaching a NIC or passing bad network-data.

If the user asks us to wait for the IP, then we probably shouldn't make it succeed on timeout, but we still need enough state in the statefile that it's possible to then destroy the environment.

If we can't fail AND have the statefile in a state that allows destruction, then we probably should delete the instance.
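
Since a missing NIC is one way to hit this timeout, a configuration that attaches one explicitly makes the intent visible. The sketch below uses the newer wait_for syntax discussed in this thread. The resource name `example` and the network name `incusbr0` are hypothetical (substitute a managed network that exists on your server), and it assumes the `nic` value in wait_for refers to the `eth0` interface created by the device block.

```hcl
resource "incus_instance" "example" {
  type  = "container"
  name  = "example"
  image = "images:debian/12/cloud"

  # Attach a NIC explicitly instead of relying on a profile to provide one.
  # Assumption: "incusbr0" is an existing managed bridge on the server.
  device {
    name = "eth0"
    type = "nic"
    properties = {
      network = "incusbr0"
    }
  }

  # Wait for an IPv4 address on the NIC attached above.
  wait_for {
    type = "ipv4"
    nic  = "eth0"
  }
}
```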

@maveonair
Member

maveonair commented Mar 7, 2025

> It can be a user error though, like not attaching a NIC or passing bad network-data.
>
> If the user asks us to wait for the IP, then we probably shouldn't make it succeed on timeout, but we still need enough state in the statefile that it's possible to then destroy the environment.
>
> If we can't fail AND have the statefile in a state that allows destruction, then we probably should delete the instance.

I was curious and I tried to reproduce this issue with the new wait_for syntax:

resource "incus_instance" "this" {
  count = 10

  type  = "container"
  name  = "instance-${count.index}"
  image = "images:debian/12/cloud"

  wait_for {
    type = "ipv4"
    nic  = "eth1"
  }

  config = {
    "boot.autostart" = true
  }
}

output "instance_ips" {
  value = { for idx, instance in incus_instance.this : instance.name => instance.ipv4_address }
}
$ tofu apply

...

incus_instance.this[1]: Creating...
incus_instance.this[0]: Creating...
incus_instance.this[6]: Creating...
incus_instance.this[7]: Creating...
incus_instance.this[8]: Creating...
incus_instance.this[5]: Creating...
incus_instance.this[3]: Creating...
incus_instance.this[4]: Creating...
incus_instance.this[2]: Creating...
incus_instance.this[9]: Creating...
incus_instance.this[8]: Still creating... [3m0s elapsed]
incus_instance.this[3]: Still creating... [3m0s elapsed]
incus_instance.this[7]: Still creating... [3m0s elapsed]
incus_instance.this[1]: Still creating... [3m0s elapsed]
incus_instance.this[6]: Still creating... [3m0s elapsed]
incus_instance.this[2]: Still creating... [3m0s elapsed]
incus_instance.this[9]: Still creating... [3m0s elapsed]
incus_instance.this[0]: Still creating... [3m0s elapsed]
incus_instance.this[5]: Still creating... [3m0s elapsed]
incus_instance.this[4]: Still creating... [3m0s elapsed]
incus_instance.this[5]: Still creating... [3m10s elapsed]
incus_instance.this[9]: Still creating... [3m10s elapsed]
incus_instance.this[8]: Still creating... [3m10s elapsed]
incus_instance.this[1]: Still creating... [3m10s elapsed]
╷
│ Error: Failed to wait for instance "instance-4" to get an IP address
│ 
│   with incus_instance.this[4],
│   on issue-174.tf line 1, in resource "incus_instance" "this":
│    1: resource "incus_instance" "this" {
│ 
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network', timeout: 3m0s)
╵
╷
│ Error: Failed to wait for instance "instance-0" to get an IP address
│ 
│   with incus_instance.this[0],
│   on issue-174.tf line 1, in resource "incus_instance" "this":
│    1: resource "incus_instance" "this" {
│ 
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network', timeout: 3m0s)
╵
╷
│ Error: Failed to wait for instance "instance-7" to get an IP address
│ 
│   with incus_instance.this[7],
│   on issue-174.tf line 1, in resource "incus_instance" "this":
│    1: resource "incus_instance" "this" {
│ 
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network', timeout: 3m0s)
╵
╷
│ Error: Failed to wait for instance "instance-6" to get an IP address
│ 
│   with incus_instance.this[6],
│   on issue-174.tf line 1, in resource "incus_instance" "this":
│    1: resource "incus_instance" "this" {
│ 
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network', timeout: 3m0s)
╵
╷
│ Error: Failed to wait for instance "instance-2" to get an IP address
│ 
│   with incus_instance.this[2],
│   on issue-174.tf line 1, in resource "incus_instance" "this":
│    1: resource "incus_instance" "this" {
│ 
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network', timeout: 3m0s)
╵
╷
│ Error: Failed to wait for instance "instance-8" to get an IP address
│ 
│   with incus_instance.this[8],
│   on issue-174.tf line 1, in resource "incus_instance" "this":
│    1: resource "incus_instance" "this" {
│ 
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network', timeout: 3m0s)
╵
╷
│ Error: Failed to wait for instance "instance-9" to get an IP address
│ 
│   with incus_instance.this[9],
│   on issue-174.tf line 1, in resource "incus_instance" "this":
│    1: resource "incus_instance" "this" {
│ 
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network', timeout: 3m0s)
╵
╷
│ Error: Failed to wait for instance "instance-1" to get an IP address
│ 
│   with incus_instance.this[1],
│   on issue-174.tf line 1, in resource "incus_instance" "this":
│    1: resource "incus_instance" "this" {
│ 
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network', timeout: 3m0s)
╵
╷
│ Error: Failed to wait for instance "instance-3" to get an IP address
│ 
│   with incus_instance.this[3],
│   on issue-174.tf line 1, in resource "incus_instance" "this":
│    1: resource "incus_instance" "this" {
│ 
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network', timeout: 3m0s)
╵
╷
│ Error: Failed to wait for instance "instance-5" to get an IP address
│ 
│   with incus_instance.this[5],
│   on issue-174.tf line 1, in resource "incus_instance" "this":
│    1: resource "incus_instance" "this" {
│ 
│ timeout while waiting for state to become 'OK' (last state: 'Waiting for network', timeout: 3m0s)
╵

When I ran tofu apply again, it first destroyed the instances that had already been created and then created them again. So I think we're getting the expected behavior: the instances aren't fully recorded in the state file, but Terraform recovers by re-creating them.

@stgraber
Member

stgraber commented Mar 7, 2025

@maveonair in my case, I usually ran into problems with tofu destroy in this situation.

Basically: define a new project with a new custom volume and a new instance using that volume, run tofu apply, and have it fail on the instance. When running tofu destroy, it tries to delete the volume and fails because the volume is still in use, or tries to delete the project and fails because the project still contains an instance.
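
The scenario above can be sketched in HCL as follows. The names `issue-174`, `data`, `app` and the storage pool `default` are hypothetical, and the project config assumes the instance should reuse the default project's profiles so it still receives a root disk. The wait_for block deliberately targets a NIC named `eth1`, as in the earlier reproduction, so that instance creation times out. Because the instance depends on both the volume and the project, a destroy that doesn't know about the instance only tries to remove the volume and the project, which Incus refuses while the instance still exists.

```hcl
resource "incus_project" "repro" {
  name = "issue-174"

  # Assumption: keep custom volumes inside the project, but reuse the
  # default project's profiles so the instance gets a root disk.
  config = {
    "features.profiles"        = false
    "features.storage.volumes" = true
  }
}

resource "incus_storage_volume" "data" {
  name    = "data"
  pool    = "default" # hypothetical pool name
  project = incus_project.repro.name
}

resource "incus_instance" "app" {
  name    = "app"
  image   = "images:debian/12/cloud"
  project = incus_project.repro.name

  # Attach the custom volume; this is what blocks deleting the volume later.
  device {
    name = "data"
    type = "disk"
    properties = {
      pool   = incus_storage_volume.data.pool
      source = incus_storage_volume.data.name
      path   = "/data"
    }
  }

  # Wait on a NIC that does not exist so creation hits the timeout.
  wait_for {
    type = "ipv4"
    nic  = "eth1"
  }
}
```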
