Fix retry logic #149

Merged
merged 1 commit into from
Nov 30, 2023
Conversation

samtkaplan
Member

Ensure that the retry logic for failed worker startup is triggered. In recent weeks, we observed machines failing to join the cluster. Further investigation showed that one cause was the initial worker/coordinator handshake: during this handshake, the workers were seeing zeros instead of the cluster cookie. I do not understand why this happens, but this change ensures that we have a functioning workaround. In particular, when a worker receives zeros during the handshake, it abandons the attempt and retries joining the cluster.
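
The workaround described above can be illustrated with a minimal sketch. This is not the code from this pull request: the function names `read_cookie` and `join_with_retry`, the `connect_fn` callback, the retry count, and the 16-byte cookie length are all assumptions made for illustration, and the coordinator is simulated with in-memory buffers rather than real sockets.

```julia
# Hypothetical sketch: none of these names are taken from AzManagers.jl.
const COOKIE_LEN = 16  # assumed cookie length in bytes

# Read the cookie sent by the coordinator.
# Returns `nothing` when the payload is short or consists only of zeros.
function read_cookie(io::IO)
    bytes = read(io, COOKIE_LEN)
    (length(bytes) < COOKIE_LEN || all(iszero, bytes)) && return nothing
    return String(bytes)
end

# Try to join the cluster up to `max_attempts` times. `connect_fn` is a
# hypothetical callback that opens a fresh connection and returns an IO.
function join_with_retry(connect_fn; max_attempts=3, delay=0.0)
    for attempt in 1:max_attempts
        io = connect_fn()
        cookie = read_cookie(io)
        close(io)
        cookie !== nothing && return (cookie=cookie, attempts=attempt)
        # Abandon this attempt and start the join process again.
        @warn "received zeros instead of the cluster cookie; retrying" attempt
        sleep(delay)
    end
    error("failed to join the cluster after $max_attempts attempts")
end

# Simulated coordinator: the first two connections deliver zeros,
# the third delivers a valid cookie.
responses = [zeros(UInt8, COOKIE_LEN),
             zeros(UInt8, COOKIE_LEN),
             Vector{UInt8}("abcdefghijklmnop")]
calls = Ref(0)
connect_fn = () -> (calls[] += 1; IOBuffer(responses[calls[]]))

result = join_with_retry(connect_fn)
```

In this sketch each retry opens a new connection instead of re-reading the same stream, which mirrors the description of abandoning the join and starting over. Whether the actual fix reconnects or restarts the whole worker process is not stated in this conversation.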


codecov-commenter commented Nov 24, 2023

Codecov Report

Attention: 71 lines in your changes are missing coverage. Please review.

Comparison is base (799c046) 47.11% compared to head (d63f347) 45.07%.

| Files | Patch % | Lines |
|---|---|---|
| src/AzManagers.jl | 0.00% | 71 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #149      +/-   ##
==========================================
- Coverage   47.11%   45.07%   -2.04%     
==========================================
  Files           3        3              
  Lines        1696     1726      +30     
==========================================
- Hits          799      778      -21     
- Misses        897      948      +51     

Ensure retry logic on failed worker startup is triggered.
@samtkaplan samtkaplan merged commit 0e6443b into master Nov 30, 2023
@samtkaplan samtkaplan deleted the retrystart branch November 30, 2023 16:04