Bug: Cluster creation race condition #10

fhke · 2022-10-09T19:07:52Z

Overview

When using the symbiosis terraform provider to create clusters, I occasionally get these errors:

│ Error: Post "https://api.symbiosis.host/rest/v1/cluster": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
│ 
│   with module.worker_clusters["test-01"].module.cluster.symbiosis_cluster.main,
│   on modules/cluster/cluster.tf line 1, in resource "symbiosis_cluster" "main":
│    1: resource "symbiosis_cluster" "main" {
│

This is because the symbiosis-go module uses a fixed 90s timeout for all API calls. Despite the request timing out, the cluster is created, causing subsequent terraform applies to fail as the newly created cluster is not recorded in the terraform state.

Questions

Is it expected that POST requests to the /rest/v1/cluster endpoint occasionally take longer than 90 seconds? If so, the symbiosis-go module should be updated to use a longer timeout for cluster create requests.
Is it acceptable for a cluster to be created even if the client request times out? With the current API behaviour, it's possible that users could experience "cluster leak" scenarios, particularly if they are trying to create short-lived clusters for CI.
If the initial POST to /rest/v1/cluster times out, should the cluster be added to the terraform state as a tainted resource, so that it gets recreated on the next terraform apply?


thecodeassassin · 2022-10-12T16:58:23Z

Hi @fhke,

Thank you for your bug report.

Is it expected that POST requests to the /rest/v1/cluster endpoint occasionally take longer than 90 seconds? If so, the symbiosis-go module should be updated to use a longer timeout for cluster create requests.

It's quite unusual for a cluster creation call to take longer than 90 seconds. Would you be able to tell me when you attempted to create this cluster? What was the full cluster config that you used?

Is it acceptable for a cluster to be created even if the client request times out? With the current API behaviour, it's possible that users could experience "cluster leak" scenarios, particularly if they are trying to create short-lived clusters for CI.

I am not sure, and I don't know what would have caused this situation to occur. Inconsistent state is never a good thing, and we'll definitely look into a good solution to this issue.

If the initial POST to /rest/v1/cluster times out, should the cluster be added to the terraform state as a tainted resource, so that it gets recreated on the next terraform apply?

Perhaps. We will definitely take this into consideration.
