resource: Retry helper exits before callback finishes #529
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
On conditions where callback supplied to resource.Retry() takes a long
time to execute, a timeout error is returned to caller while the
callback is still running.
To remind the reader, WaitForState creates a separate goroutine for
calling Refresh() callback from StateChangeConf. The execution of
Refresh() goroutine is monitored in the main thread with a timeout.
However, while timeout does issue a cancellation for the Refresh()
goroutine, it does not interrupt the already started Refresh() calls.
In addition, cancellation mechanism itself gave up after grace period
expired on timeouts. This potentially left the Refresh() callback
running in the background while returning TimeoutError to the caller.
When timeout error is returned, plugin(s) incorrectly assume that it is
safe to rerun the Refresh() content. In reality, the state after
timeout is completely unknown.
Annotated example from AWS plugin:
The fix is simply to remove the grace period and the matching select
timeout. This makes WaitForState() block until Refresh() goroutine
returns something to result channel.