-
Notifications
You must be signed in to change notification settings - Fork 2k
[scheduler] fix scheduling behavior of batch job allocs #26961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
879b348 to
3ad0ff5
Compare
3ad0ff5 to
f148392
Compare
f148392 to
c1a383c
Compare
c1a383c to
dddc7db
Compare
dddc7db to
5c370fe
Compare
5c370fe to
02dc6ec
Compare
02dc6ec to
284d3e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I've left a few very small comments, but nothing critical. We'll need a changelog entry before we can merge it.
In addition to the specific behaviors discussed here, I made a build from this and double-checked we were still getting the expected behavior when a batch allocation simply fails and gets rescheduled. That's still all working as expected. Nice work on this.
| // long pauses on this API call. | ||
| // | ||
| // BREAKING: This method will have the following signature in 1.6.0 | ||
| // func (a *Allocations) Stop(allocID string, w *WriteOptions) (*AllocStopResponse, error) { | ||
| func (a *Allocations) Stop(alloc *Allocation, q *QueryOptions) (*AllocStopResponse, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not having a versioned API makes this super painful to fix these kinds of things. You're right better just to back out that intention to change it, and live with it.
| // Wait for allocs to be replaced | ||
| finalAllocs := waitForAllocsStop(t, store, n1.ID, nil) | ||
| waitForPlacedAllocs(t, store, n2.ID, 5) | ||
| waitForPlacedAllocs(t, store, n2.ID, 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bah, no wonder this bug lurked for so long!
command/agent/alloc_endpoint.go
Outdated
| reschedule := false | ||
| if rescheduleQS := req.URL.Query().Get("reschedule"); rescheduleQS != "" { | ||
| var err error | ||
| reschedule, err = strconv.ParseBool(rescheduleQS) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("reschedule value is not a boolean: %w", err) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a parseBool helper in command/agent/http.go that you can use like
reschedule, err := parseBool(req, "reschedule")
if err != nil {
return nil, err
}(which we could use above for no_shutdown_delay as well)
scheduler/reconciler/filters.go
Outdated
| // filterServerTerminalAllocs returns a new allocSet that includes only | ||
| // batch job type that are not marked for rescheduling or non-server-terminal | ||
| // allocations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarifying the compound clause a bit:
| // filterServerTerminalAllocs returns a new allocSet that includes only | |
| // batch job type that are not marked for rescheduling or non-server-terminal | |
| // allocations. | |
| // filterServerTerminalAllocs returns a new allocSet that includes only | |
| // non-server-terminal allocations, and batch job allocs that are not marked for rescheduling. |
284d3e5 to
384a984
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Allocations of batch jobs have a few defined behaviors documented which do not work as expected: First, on node drain, the allocation is allowed to complete unless the deadline is reached at which point the allocation is killed. The allocation is note replaced. Second, when using the `alloc stop` command, the allocation is stopped and then rescheduled according to its reschedule policy. Third, on job restart if the `-reschedule` flag is used the allocation will be migrated and its reschedule policy will be ignored. This update removes the change introduced in dfa07e1 (#26025) that forced batch job allocations into a failed state when migrating. The reported issue it was attempting to resolve was itself incorrect behavior. The reconciler has been adjusted to properly handle batch job allocations as documented.
384a984 to
70f55aa
Compare
Description
Batch jobs have a few documented behaviors which are described here:
nomad alloc stop- the allocation should be rescheduledfailed- the allocation should be rescheduledjob restartis used with the-rescheduleflag - stops and migrates allocations instead of restarting in-place (this command migrates but does not reschedule allocations, so it ignores therescheduleblock)This changeset updates Nomad's behavior with batch job allocations so they behave as documented. Within this changeset, modifications introduced in dfa07e1 (#26025) that forced batch job allocations into a failed state when migrating. The reported issue it was attempting to resolve was itself incorrect behavior. The reconciler has been adjusted to properly handle batch job allocations as documented.
Changes of note
Eval trigger reasons
A new new eval trigger reason was added to provide better information to the user. It is shown and explained in the last examples below.
Allocations API
The Allocations.Stop function was using an old helper which was resulting in any defined query parameters being silently dropped. This was updated to use the newer helper when passes the full query.
The
Allocation.DesiredTransitionwas also updated to match its counterpart along with the same helper functions.New desired transition field
A new field is added to the
DesiredTransition- MigrateDisablePlacement. This is used when draining to allow the allocation to be stopped, but prevents it from being placed to achieve the desired draining behavior.Testing & Reproduction steps
batch jobspec
Behavior on main
alloc stop command
This shows the behavior of the
alloc stopcommand on a batch job allocation. The job is started and then a single allocation is stopped:Here we can see the result of the
alloc stopcommand is the allocation is stopped in a failed state and the allocation is immediately replaced. The desired behavior here is that the allocation should be stopped with acompletestatus, and the allocation should be rescheduled based on the reschedule policy.drain behavior
This shows the behavior of a node drain on batch job allocations. The job is started and then a single node is drained with a one second deadline:
The drain stops the two allocations on the node in a failed state, and immediately places two new allocations. For drains, the allocations should be stopped with a
completestatus and the allocations should not be replaced.Behavior with this changeset
alloc stop command
Now the allocation is stopped, in a complete state, and a new allocation hasn't immediately replaced it. Instead, the allocation has been rescheduled based on the reschedule policy as expected from the documented behavior. Once the delayed evaluation is executed, the new allocation is placed.
drain behavior
This shows the behavior of a node drain on batch job allocations. The job is started and then a single node is drained with a one second deadline:
The drain stops the two allocations on the node in a completed state, and the allocations are not replaced. This matches the documented expected behavior.
New evaluation trigger reason
The current behavior of nomad when rescheduling an allocation is to assume the allocation being replaced has failed. When stopping an allocation, this results in an
eval statuswith the following:The
TriggeredByinsinuates that the eval was triggered by the allocation failing, but it was triggered by the allocation being rescheduled due to thealloc stopcommand. To more correctly describe the reason, theEvalTriggerAllocRescheduleconstant was introduced and used in this situation, which gives the valuealloc-rescheduleas shown below:Links
Fixes #26929
Contributor Checklist
changelog entry using the
make clcommand.ensure regressions will be caught.
and job configuration, please update the Nomad website documentation to reflect this. Refer to
the website README for docs guidelines. Please also consider whether the
change requires notes within the upgrade guide.
Reviewer Checklist
backporting document.
in the majority of situations. The main exceptions are long-lived feature branches or merges where
history should be preserved.
within the public repository.
Changes to Security Controls
Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.