-
Notifications
You must be signed in to change notification settings - Fork 623
[kubectl-plugin] avoid race in jobId #4071
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[kubectl-plugin] avoid race in jobId #4071
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing it, I have tested locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
Hi, @JosefNagelschmidt
Would you like to connect on Ray Slack? I’d love to learn more about how I can help.
My name is Han-Ju Chen (ray team) (GitHub - Future-Outlier)
cc @rueian for review, I am confident that this can be merged. |
if err != nil { | ||
return fmt.Errorf("Failed to get latest version of Ray job: %w", err) | ||
} | ||
options.RayJob.Spec.JobId = rayJobID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a reason why we set the job id after creating the CR. Could you resolve the conflict of the object has been modified
by retrying the update?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is my thought:
- Before this PR, when a user specified a
submissionID
, theRayJob
CR had already been created without it. To keep the CR aligned with the actual Ray submission, we had to do a post-createGet/Update
to setspec.jobId
. - In this PR, we generate (or specified by user) the
submissionID
upfront and embed it into theRayJob
before applying/creating it (e.g., viarayJobApplyConfig.Spec.JobId
). This makes thesubmissionID
ensures consistency, and removes the need for a follow-up Get/Update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a reason why we set the job id after creating the CR. Could you resolve the conflict of
the object has been modified
by retrying the update?
It is an option, but I wonder why this would be necessary (what is the exact reason for setting the job id after creating the CR? It seems to work seamlessly for me the other way). The suggested fix looks cleaner for me at least.
Why are these changes needed?
There is currently a race condition when setting
spec.jobId
in an update call of the CR. Additionally, ray job submit is started asynchronously in a goroutine, which can lead toexec: not started
.See issue #4070.
Related issue number
Closes #4070.
Checks