
Bug: Cluster creation race condition #10

Open
fhke opened this issue Oct 9, 2022 · 1 comment

Comments

@fhke
Contributor

fhke commented Oct 9, 2022

Overview

When using the symbiosis terraform provider to create clusters, I occasionally get the following error:

│ Error: Post "https://api.symbiosis.host/rest/v1/cluster": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
│ 
│   with module.worker_clusters["test-01"].module.cluster.symbiosis_cluster.main,
│   on modules/cluster/cluster.tf line 1, in resource "symbiosis_cluster" "main":
│    1: resource "symbiosis_cluster" "main" {
│

This happens because the symbiosis-go module uses a fixed 90-second timeout for all API calls. Even though the request times out on the client side, the cluster is still created, so subsequent terraform applies fail because the newly created cluster is not recorded in the terraform state.

Questions

  • Is it expected that POST requests to the /rest/v1/cluster endpoint occasionally take longer than 90 seconds? If so, the symbiosis-go module should be updated to use a longer timeout for cluster create requests.
  • Is it acceptable for a cluster to be created even if the client request times out? With the current API behaviour, it's possible that users could experience "cluster leak" scenarios, particularly if they are trying to create short-lived clusters for CI.
  • If the initial POST to /rest/v1/cluster times out, should the cluster be added to the terraform state as a tainted resource, so that it gets recreated on the next terraform apply?
@thecodeassassin
Member

Hi @fhke,

Thank you for your bug report.

Is it expected that POST requests to the /rest/v1/cluster endpoint occasionally take longer than 90 seconds? If so, the symbiosis-go module should be updated to use a longer timeout for cluster create requests.
It's quite unusual for a cluster creation call to take longer than 90 seconds. Would you be able to tell me when you attempted to create this cluster? What was the full cluster config that you used?

Is it acceptable for a cluster to be created even if the client request times out? With the current API behaviour, it's possible that users could experience "cluster leak" scenarios, particularly if they are trying to create short-lived clusters for CI.

I am not sure, and I don't yet know what would have caused this situation to occur. Inconsistent state is never a good thing, and we'll definitely look into a good solution to this issue.

If the initial POST to /rest/v1/cluster times out, should the cluster be added to the terraform state as a tainted resource, so that it gets recreated on the next terraform apply?

Perhaps. We will definitely take this into consideration.
