Commit 39f153d

docs: clarify reschedule, migrate, and replacement terminology
Our vocabulary around scheduler behaviors outside of the `reschedule` and `migrate` blocks leaves room for confusion around whether the reschedule tracker should be propagated between allocations. There are effectively five different behaviors we need to cover:

* restart: when the tasks of an allocation fail and we try to restart the tasks in place.
* reschedule: when the `restart` block runs out of attempts (or the allocation fails before tasks even start), and we need to move the allocation to another node to try again.
* migrate: when the user has asked to drain a node and we need to move the allocations. These are not failures, so we don't want to propagate the reschedule tracker.
* replacement: when a node is lost, we don't count that against the `reschedule` tracker for the allocations on the node (it's not the allocation's "fault", after all). We don't want to run the `migrate` machinery here either, as we can't contact the down node. To the scheduler, this is effectively the same as if we bumped the `group.count`.
* replacement for `disconnect.replace = true`: this is a replacement, but the replacement is intended to be temporary, so we propagate the reschedule tracker.

Add a section to the `reschedule`, `migrate`, and `disconnect` blocks explaining when each item applies. Update the use of the word "reschedule" in several places where "replacement" is correct, and vice-versa.

Fixes: #24918
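The five behaviors above map onto job-specification blocks roughly as follows. This is a hypothetical, minimal HCL sketch: the block and parameter names are real Nomad job-spec items, but the values are illustrative only.

```hcl
job "example" {
  group "cache" {
    count = 3

    # restart: retry failed tasks in place, on the same node
    restart {
      attempts = 2
      interval = "30m"
      delay    = "15s"
      mode     = "fail"
    }

    # reschedule: once restart attempts run out, try another node;
    # these attempts are counted by the reschedule tracker
    reschedule {
      attempts  = 3
      interval  = "24h"
      unlimited = false
    }

    # migrate: controlled movement during a node drain; not a failure,
    # so the reschedule tracker is not propagated
    migrate {
      max_parallel = 1
    }

    # disconnect.replace = true: a temporary replacement for an
    # unreachable node; the reschedule tracker is propagated
    disconnect {
      lost_after = "1h"
      replace    = true
    }
  }
}
```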
1 parent c1dc9ed commit 39f153d

8 files changed (+79, -46 lines)


command/job_restart.go

+1 -1

@@ -132,7 +132,7 @@ Usage: nomad job restart [options] <job>
 groups are restarted.
 
 When rescheduling, the current allocations are stopped triggering the Nomad
-scheduler to create replacement allocations that may be placed in different
+scheduler to create new allocations that may be placed in different
 clients. The command waits until the new allocations have client status
 'ready' before proceeding with the remaining batches. Services health checks
 are not taken into account.

scheduler/generic_sched.go

+9 -8

@@ -470,7 +470,8 @@ func (s *GenericScheduler) computeJobAllocs() error {
 return s.computePlacements(destructive, place, results.taskGroupAllocNameIndexes)
 }
 
-// downgradedJobForPlacement returns the job appropriate for non-canary placement replacement
+// downgradedJobForPlacement returns the previous stable version of the job for
+// downgrading a placement for non-canaries
 func (s *GenericScheduler) downgradedJobForPlacement(p placementResult) (string, *structs.Job, error) {
 ns, jobID := s.job.Namespace, s.job.ID
 tgName := p.TaskGroup().Name
@@ -588,8 +589,8 @@ func (s *GenericScheduler) computePlacements(destructive, place []placementResul
 }
 
 // Check if we should stop the previous allocation upon successful
-// placement of its replacement. This allow atomic placements/stops. We
-// stop the allocation before trying to find a replacement because this
+// placement of the new alloc. This allow atomic placements/stops. We
+// stop the allocation before trying to place the new alloc because this
 // frees the resources currently used by the previous allocation.
 stopPrevAlloc, stopPrevAllocDesc := missing.StopPreviousAlloc()
 prevAllocation := missing.PreviousAllocation()
@@ -740,7 +741,7 @@ func (s *GenericScheduler) computePlacements(destructive, place []placementResul
 // Track the fact that we didn't find a placement
 s.failedTGAllocs[tg.Name] = s.ctx.Metrics()
 
-// If we weren't able to find a replacement for the allocation, back
+// If we weren't able to find a placement for the allocation, back
 // out the fact that we asked to stop the allocation.
 if stopPrevAlloc {
 s.plan.PopUpdate(prevAllocation)
@@ -802,10 +803,10 @@ func needsToSetNodes(a, b *structs.Job) bool {
 a.NodePool != b.NodePool
 }
 
-// propagateTaskState copies task handles from previous allocations to
-// replacement allocations when the previous allocation is being drained or was
-// lost. Remote task drivers rely on this to reconnect to remote tasks when the
-// allocation managing them changes due to a down or draining node.
+// propagateTaskState copies task handles from previous allocations to migrated
+// or replacement allocations when the previous allocation is being drained or
+// was lost. Remote task drivers rely on this to reconnect to remote tasks when
+// the allocation managing them changes due to a down or draining node.
 //
 // The previous allocation will be marked as lost after task state has been
 // propagated (when the plan is applied), so its ClientStatus is not yet marked

website/content/docs/commands/job/restart.mdx

+4 -4

@@ -40,10 +40,10 @@ When both groups and tasks are defined only the tasks for the allocations of
 those groups are restarted.
 
 When rescheduling, the current allocations are stopped triggering the Nomad
-scheduler to create replacement allocations that may be placed in different
-clients. The command waits until the new allocations have client status `ready`
-before proceeding with the remaining batches. Services health checks are not
-taken into account.
+scheduler to create new allocations that may be placed in different clients. The
+command waits until the new allocations have client status `ready` before
+proceeding with the remaining batches. Services health checks are not taken into
+account.
 
 By default the command restarts all running tasks in-place with one allocation
 per batch.

website/content/docs/configuration/server.mdx

+4 -4

@@ -438,17 +438,17 @@ Nomad Clients periodically heartbeat to Nomad Servers to confirm they are
 operating as expected. Nomad Clients which do not heartbeat in the specified
 amount of time are considered `down` and their allocations are marked as `lost`
 or `disconnected` (if [`disconnect.lost_after`][disconnect.lost_after] is set)
-and rescheduled.
+and replaced.
 
 The various heartbeat related parameters allow you to tune the following
 tradeoffs:
 
 - The longer the heartbeat period, the longer a `down` Client's workload will
-  take to be rescheduled.
+  take to be replaced.
 - The shorter the heartbeat period, the more likely transient network issues,
   leader elections, and other temporary issues could cause a perfectly
   functional Client and its workloads to be marked as `down` and the work
-  rescheduled.
+  replaced.
 
 While Nomad Clients can connect to any Server, all heartbeats are forwarded to
 the leader for processing. Since this heartbeat processing consumes resources,
@@ -510,7 +510,7 @@ system has for a delay in noticing crashed Clients. For example a
 `failover_heartbeat_ttl` of 30 minutes may give even the slowest clients in the
 largest clusters ample time to heartbeat after an election. However if the
 election was due to a datacenter-wide failure affecting Clients, it will be 30
-minutes before Nomad recognizes that they are `down` and reschedules their
+minutes before Nomad recognizes that they are `down` and replaces their
 work.
 
 [encryption]: /nomad/tutorials/transport-security/security-gossip-encryption 'Nomad Encryption Overview'
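The heartbeat tradeoff described in this file is tuned in the agent's `server` block. A hedged sketch (the parameter names are real server options; the durations are illustrative only, not recommendations):

```hcl
server {
  enabled = true

  # Extra grace beyond the heartbeat TTL before a Client is marked down.
  heartbeat_grace = "10s"

  # Lower bound on the TTL the server hands out to Clients; raising it
  # lengthens the window before a down Client's work is replaced.
  min_heartbeat_ttl = "10s"

  # After a leader election, Clients get this long to heartbeat again
  # before being marked down and their work replaced.
  failover_heartbeat_ttl = "30m"
}
```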

website/content/docs/job-specification/disconnect.mdx

+22 -10

@@ -14,7 +14,14 @@ description: |-
 The `disconnect` block describes the system's behavior in case of a network
 partition. By default, without a `disconnect` block, if an allocation is on a
 node that misses heartbeats, the allocation will be marked `lost` and will be
-rescheduled.
+replaced.
+
+Replacement happens when a node is lost. When a node is drained, Nomad
+[migrates][] the allocations instead and the `disconnect` block does not
+apply. When a Nomad agent fails to setup the allocation or the tasks of an
+allocation fail more than their [`restart`][] block allows, Nomad
+[reschedules][] the allocations and the `disconnect` block does not apply.
+
 
 ```hcl
 job "docs" {
@@ -51,11 +58,12 @@ same `disconnect` block.
 
 Refer to [the Lost After section][lost-after] for more details.
 
-- `replace` `(bool: false)` - Specifies if the disconnected allocation should
-  be replaced by a new one rescheduled on a different node. If false and the
-  node it is running on becomes disconnected or goes down, this allocation
-  won't be rescheduled and will be reported as `unknown` until the node reconnects,
-  or until the allocation is manually stopped:
+- `replace` `(bool: false)` - Specifies if the disconnected allocation should be
+  replaced by a new one rescheduled on a different node. The replacement
+  allocation is considered a reschedule and will obey the job's [`reschedule`][]
+  block. If false and the node it is running on becomes disconnected or goes
+  down, this allocation won't be replaced and will be reported as `unknown`
+  until the node reconnects, or until the allocation is manually stopped:
 
 ```plaintext
 `nomad alloc stop <alloc ID>`
@@ -84,7 +92,7 @@ same `disconnect` block.
 - `keep_original`: Always keep the original allocation. Bear in mind
   when choosing this option, it can have crashed while the client was
   disconnected.
-- `keep_replacement`: Always keep the allocation that was rescheduled
+- `keep_replacement`: Always keep the allocation that was replaced
   to replace the disconnected one.
 - `best_score`: Keep the allocation running on the node with the best
   score.
@@ -102,17 +110,17 @@ The following examples only show the `disconnect` blocks. Remember that the
 This example shows how `stop_on_client_after` interacts with
 other blocks. For the `first` group, after the default 10 second
 [`heartbeat_grace`] window expires and 90 more seconds passes, the
-server will reschedule the allocation. The client will wait 90 seconds
+server will replace the allocation. The client will wait 90 seconds
 before sending a stop signal (`SIGTERM`) to the `first-task`
 task. After 15 more seconds because of the task's `kill_timeout`, the
 client will send `SIGKILL`. The `second` group does not have
-`stop_on_client_after`, so the server will reschedule the
+`stop_on_client_after`, so the server will replace the
 allocation after the 10 second [`heartbeat_grace`] expires. It will
 not be stopped on the client, regardless of how long the client is out
 of touch.
 
 Note that if the server's clocks are not closely synchronized with
-each other, the server may reschedule the group before the client has
+each other, the server may replace the group before the client has
 stopped the allocation. Operators should ensure that clock drift
 between servers is as small as possible.
@@ -217,3 +225,7 @@ group "second" {
 [stop-after]: /nomad/docs/job-specification/disconnect#stop-after
 [lost-after]: /nomad/docs/job-specification/disconnect#lost-after
 [`reconcile`]: /nomad/docs/job-specification/disconnect#reconcile
+[migrates]: /nomad/docs/job-specification/migrate
+[`restart`]: /nomad/docs/job-specification/restart
+[reschedules]: /nomad/docs/job-specification/reschedule
+[`reschedule`]: /nomad/docs/job-specification/reschedule
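The semantics this diff documents can be seen together in one block. A minimal sketch (parameter names are from the `disconnect` documentation above; the values are illustrative):

```hcl
group "example" {
  disconnect {
    # Allocations on a client that misses heartbeats stay "unknown"
    # for up to 6 hours before being marked "lost".
    lost_after = "6h"

    # Replace the unknown allocation on another node; per this commit,
    # the replacement counts as a reschedule and obeys the job's
    # reschedule block.
    replace = true

    # On reconnect, keep whichever allocation is on the node with the
    # better placement score.
    reconcile = "best_score"
  }
}
```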

website/content/docs/job-specification/group.mdx

+19 -19

@@ -48,9 +48,9 @@ job "docs" {
 ephemeral disk requirements of the group. Ephemeral disks can be marked as
 sticky and support live data migrations.
 
-- `disconnect` <code>([disconnect][]: nil)</code> - Specifies the disconnect
-  strategy for the server and client for all tasks in this group in case of a
-  network partition. The tasks can be left unconnected, stopped or replaced
+- `disconnect` <code>([disconnect][]: nil)</code> - Specifies the disconnect
+  strategy for the server and client for all tasks in this group in case of a
+  network partition. The tasks can be left unconnected, stopped or replaced
   when the client disconnects. The policy for reconciliation in case the client
   regains connectivity is also specified here.
@@ -65,14 +65,14 @@ job "docs" {
 requirements and configuration, including static and dynamic port allocations,
 for the group.
 
-- `prevent_reschedule_on_lost` `(bool: false)` - Defines the reschedule behaviour
-  of an allocation when the node it is running on misses heartbeats.
-  When enabled, if the node it is running on becomes disconnected
-  or goes down, this allocations wont be rescheduled and will show up as `unknown`
-  until the node comes back up or it is manually restarted.
+- `prevent_reschedule_on_lost` `(bool: false)` - Defines the replacement
+  behaviour of an allocation when the node it is running on misses heartbeats.
+  When enabled, if the node it is running on becomes disconnected or goes down,
+  this allocations wont be replaced and will show up as `unknown` until the node
+  comes back up or it is manually restarted.
 
-  This behaviour will only modify the reschedule process on the server.
-  To modify the allocation behaviour on the client, see
+  This behaviour will only modify the replacement process on the server. To
+  modify the allocation behaviour on the client, see
   [`stop_after_client_disconnect`](#stop_after_client_disconnect).
 
 The `unknown` allocation has to be manually stopped to run it again.
@@ -84,7 +84,7 @@ job "docs" {
 Setting `max_client_disconnect` and `prevent_reschedule_on_lost = true` at the
 same time requires that [rescheduling is disabled entirely][`disable_rescheduling`].
 
-This field was deprecated in favour of `replace` on the [`disconnect`] block,
+This field was deprecated in favour of `replace` on the [`disconnect`] block,
 see [example below][disconect_migration] for more details about migrating.
 
 - `reschedule` <code>([Reschedule][]: nil)</code> - Allows to specify a
@@ -299,18 +299,18 @@ issues with stateful tasks or tasks with long restart times.
 
 Instead, an operator may desire that these allocations reconnect without a
 restart. When `max_client_disconnect` or `disconnect.lost_after` is specified,
-the Nomad server will mark clients that fail to heartbeat as "disconnected"
+the Nomad server will mark clients that fail to heartbeat as "disconnected"
 rather than "down", and will mark allocations on a disconnected client as
 "unknown" rather than "lost". These allocations may continue to run on the
 disconnected client. Replacement allocations will be scheduled according to the
-allocations' `disconnect.replace` settings. until the disconnected client
-reconnects. Once a disconnected client reconnects, Nomad will compare the "unknown"
-allocations with their replacements and will decide which ones to keep according
-to the `disconnect.replace` setting. If the `max_client_disconnect` or
-`disconnect.losta_after` duration expires before the client reconnects,
+allocations' `disconnect.replace` settings. until the disconnected client
+reconnects. Once a disconnected client reconnects, Nomad will compare the "unknown"
+allocations with their replacements and will decide which ones to keep according
+to the `disconnect.replace` setting. If the `max_client_disconnect` or
+`disconnect.losta_after` duration expires before the client reconnects,
 the allocations will be marked "lost".
 Clients that contain "unknown" allocations will transition to "disconnected"
-rather than "down" until the last `max_client_disconnect` or `disconnect.lost_after`
+rather than "down" until the last `max_client_disconnect` or `disconnect.lost_after`
 duration has expired.
@@ -390,7 +390,7 @@ will remain as `unknown` and won't be rescheduled.
 #### Migration to `disconnect` block
 
 The new configuration fileds in the disconnect block work exactly the same as the
-ones they are replacing:
+ones they are replacing:
 * `stop_after_client_disconnect` is replaced by `stop_after`
 * `max_client_disconnect` is replaced by `lost_after`
 * `prevent_reschedule_on_lost` is replaced by `replace`
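The field mapping above can be sketched as a before/after migration. This is a hypothetical group; note the inversion on the last field, since `prevent_reschedule_on_lost = true` expresses the same intent as `replace = false`:

```hcl
# Before: deprecated fields set directly on the group
group "cache" {
  stop_after_client_disconnect = "90s"
  max_client_disconnect        = "6h"
  prevent_reschedule_on_lost   = true
}

# After: the equivalent disconnect block
group "cache" {
  disconnect {
    stop_after = "90s"
    lost_after = "6h"
    replace    = false # inverted sense of prevent_reschedule_on_lost
  }
}
```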

website/content/docs/job-specification/migrate.mdx

+10

@@ -22,6 +22,13 @@ If specified at the job level, the configuration will apply to all groups
 within the job. Only service jobs with a count greater than 1 support migrate
 blocks.
 
+Migrating happens when a Nomad node is drained. When a node is lost, Nomad
+[replaces][] the allocations instead and the `migrate` block does not
+apply. When the agent fails to setup the allocation or the tasks of an
+allocation more than their [`restart`][] block allows, Nomad [reschedules][] the
+allocations instead and the `migrate` block does not apply.
+
+
 ```hcl
 job "docs" {
   migrate {
@@ -78,3 +85,6 @@ on node draining.
 [count]: /nomad/docs/job-specification/group#count
 [drain]: /nomad/docs/commands/node/drain
 [deadline]: /nomad/docs/commands/node/drain#deadline
+[replaces]: /nomad/docs/job-specification/disconnect#replace
+[`restart`]: /nomad/docs/job-specification/restart
+[reschedules]: /nomad/docs/job-specification/reschedule
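A minimal `migrate` block for the drain scenario described in this file (real parameters; illustrative values):

```hcl
group "web" {
  count = 4

  migrate {
    # A drain moves at most one allocation of this group at a time.
    max_parallel = 1

    # A migrated allocation counts as healthy once its service checks
    # pass for 10 seconds; give up after 5 minutes.
    health_check     = "checks"
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }
}
```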

website/content/docs/job-specification/reschedule.mdx

+10

@@ -31,6 +31,12 @@ Nomad will attempt to schedule the allocation on another node if any of its
 task statuses become `failed`. The scheduler prefers to create a replacement
 allocation on a node that was not used by a previous allocation.
 
+Rescheduling happens when the Nomad agent fails to setup the allocation or the
+tasks of an allocation fail more than their [`restart`][] block allows. When a
+node is drained, Nomad [migrates][] the allocations instead and the `reschedule`
+block does not apply. When a node is lost, Nomad [replaces][] the allocations
+instead and the `reschedule` block does not apply.
+
 
 ```hcl
 job "docs" {
@@ -131,3 +137,7 @@ job "docs" {
 ```
 
 [`progress_deadline`]: /nomad/docs/job-specification/update#progress_deadline
+[`restart`]: /nomad/docs/job-specification/restart
+[migrates]: /nomad/docs/job-specification/migrate
+[replaces]: /nomad/docs/job-specification/disconnect#replace
+[reschedules]: /nomad/docs/job-specification/reschedule
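A sketch of a `reschedule` block for the failure scenario described in this file (real parameters; illustrative values):

```hcl
group "worker" {
  reschedule {
    # Try up to 3 placements on other nodes within a 24h window.
    attempts  = 3
    interval  = "24h"
    unlimited = false

    # Back off exponentially between attempts, capped at 1 hour.
    delay          = "30s"
    delay_function = "exponential"
    max_delay      = "1h"
  }
}
```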
