Conversation

pkazmierczak (Contributor) commented Oct 15, 2025

Canaries for system jobs are placed on tg.update.canary percent of the eligible nodes. Some of these nodes may not be feasible, and until now we removed infeasible nodes during placement computation. However, if the first eligible node we pick to place a canary on happens to be infeasible, the scheduler halts the deployment.

The solution presented here simplifies canary deployments: system jobs that use canary updates initially get allocations placed on all eligible nodes, and before we start computing actual placements, a method called evictUnneededCanaries (much like evictAndPlace, which honors MaxParallel) removes the canary placements that are not needed. We also change computePlacements so that it no longer performs node feasibility checks, since these are now performed earlier for every allocation and node. This way we get an accurate count of all feasible nodes, which lets us correctly set the deployment state fields.

Fixes: #26885
Fixes: #26886
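
As a rough sketch of the flow described above (illustrative only: the placement type, field names, and helper below are hypothetical and not Nomad's actual scheduler API), evicting unneeded canaries after a placement has been created for every feasible node could look like this:

// placement stands in for a scheduler placement decision (hypothetical type).
type placement struct {
	taskGroup string
	canary    bool
}

// evictUnneededCanaries drops canary placements beyond the number each task
// group requires and keeps everything else, assuming a placement was already
// created for every feasible node.
func evictUnneededCanaries(placements []*placement, desiredCanaries map[string]int) []*placement {
	kept := placements[:0]
	placed := map[string]int{}
	for _, p := range placements {
		if p.canary && placed[p.taskGroup] >= desiredCanaries[p.taskGroup] {
			continue // this canary is not needed; evict it
		}
		if p.canary {
			placed[p.taskGroup]++
		}
		kept = append(kept, p)
	}
	return kept
}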

pkazmierczak force-pushed the f-system-deployments-canaries-evict-refactor branch from 838bcd8 to e1234c1 on October 28, 2025 08:30
pkazmierczak force-pushed the f-system-deployments-canaries-evict-refactor branch from e1234c1 to a6cd581 on October 28, 2025 17:33
tgross added a commit that referenced this pull request Oct 28, 2025
Two groups on the same job cannot both have a static port assignment, but this
ends up getting configured in the update block test for system deployments. This
test setup bug has complicated landing the fix in #26953.
pkazmierczak force-pushed the f-system-deployments-canaries-evict-refactor branch from 697d0ff to 0c70ac8 on October 28, 2025 18:29
pkazmierczak and others added 11 commits October 31, 2025 09:07
…uting placements

Canaries for system jobs are placed on tg.update.canary percent of
eligible nodes. Some of these nodes may not be feasible, and until now
we removed infeasible nodes during placement computation. However, if
the first eligible node we pick to place a canary on happens to be
infeasible, the scheduler halts the deployment.

The solution presented here simplifies canary deployments: initially,
system jobs that use canary updates get allocations placed on all
eligible nodes, but before we start computing actual placements, a
method called `evictCanaries` is called (much like `evictAndPlace` is
for honoring MaxParallel), which performs a feasibility check on each
node up to the number of required canaries per task group. Feasibility
checks are expensive, but this way we only check all the nodes in the
worst case (with canary=100); otherwise we stop checking once we know
we can place enough canaries.
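
For illustration, the early-stopping feasibility loop this commit message describes might look roughly like the following (all names here are hypothetical):

// selectCanaryNodes runs the (expensive) feasibility check only until enough
// canary nodes have been found; the full node list is scanned only in the
// worst case, i.e. canary=100.
func selectCanaryNodes(nodes []string, required int, feasible func(node string) bool) []string {
	picked := make([]string, 0, required)
	for _, n := range nodes {
		if len(picked) == required {
			break // enough canaries; skip remaining feasibility checks
		}
		if feasible(n) {
			picked = append(picked, n)
		}
	}
	return picked
}
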
Fix some previously broken counts in the system deployments test and add
comments to most of the placement counts to make it a little easier to read the
test and verify it's correct.
If we leave empty keys in that map, the plan will not be considered a
no-op, which it otherwise should be.
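
A sketch of the cleanup this describes, assuming the map in question is the plan's per-node update map from the hunk discussed below (names are taken from that hunk; the surrounding context is assumed):

if len(s.plan.NodeUpdate[alloc.NodeID]) == 0 {
	// drop the now-empty key so an otherwise unchanged plan is still a no-op
	delete(s.plan.NodeUpdate, alloc.NodeID)
}
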
pkazmierczak force-pushed the f-system-deployments-canaries-evict-refactor branch from 54b7110 to e51cce3 on October 31, 2025 08:08
pkazmierczak marked this pull request as ready for review on October 31, 2025 08:10
pkazmierczak requested review from a team as code owners on October 31, 2025 08:10

func mergeNodeFiltered(acc, curr *structs.AllocMetric) *structs.AllocMetric {
Member

Is it worth adding a function comment here? I couldn't figure out if acc and curr were intended to have specific meanings and what the merge ordering was initially.
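
For illustration, a doc comment along the lines being asked for might read as follows, assuming acc is the running accumulator and curr the metric being merged in (that reading is an assumption, not something confirmed in this thread):

// mergeNodeFiltered merges the node-filtered metrics from curr into acc and
// returns the result. acc is the accumulator built up across nodes; curr is
// the metric for the node currently being considered.
func mergeNodeFiltered(acc, curr *structs.AllocMetric) *structs.AllocMetric {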

Comment on lines 382 to 394
// no deployment for this TG
if _, ok := s.deployment.TaskGroups[tg.Name]; !ok {
continue
}

// we can set the desired total now, it's always the amount of all
// feasible nodes
s.deployment.TaskGroups[tg.Name].DesiredTotal = len(feasibleNodes)

dstate, ok := s.deployment.TaskGroups[tg.Name]
if !ok {
continue
}
Member

The task group deployment is checked twice; do we need both?

The dstate variable (a pointer to the map entry) is used in the isCanarying check, but we then modify the map directly within the isCanarying conditional. Is there a reason for this, or could we use a single method for consistency?

Contributor Author

you're right, the check on line 383 is the only one we need. Apologies, this code evolved and I must've missed a spot while cleaning up.
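
For illustration, the single-lookup version this exchange points toward might look roughly like this (names follow the quoted hunk; the surrounding loop over task groups is assumed):

// look up the task group's deployment state once and reuse the map entry
dstate, ok := s.deployment.TaskGroups[tg.Name]
if !ok {
	// no deployment for this TG
	continue
}

// the desired total is always the number of feasible nodes
dstate.DesiredTotal = len(feasibleNodes)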

Comment on lines +863 to +868
idx := slices.IndexFunc(s.plan.NodeUpdate[alloc.NodeID], func(a *structs.Allocation) bool {
return a.ID == alloc.PreviousAllocation
})
if idx > -1 {
s.plan.NodeUpdate[alloc.NodeID] = append(s.plan.NodeUpdate[alloc.NodeID][0:idx], s.plan.NodeUpdate[alloc.NodeID][idx+1:]...)
}
Member

I wonder if we could simplify this by using slices.DeleteFunc.

Contributor Author

it would simplify it but Chris' solution here avoids allocating an additional slice I believe? whereas slices.DeleteFunc returns a copy of the slice it deletes from. I think?
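
For comparison, the slices.DeleteFunc form of the quoted snippet would be the sketch below. Note that DeleteFunc removes the matching elements in place, reusing the same backing array, and returns the shortened slice rather than an allocated copy; one behavioral difference is that it removes every match, whereas the IndexFunc version above removes only the first.

s.plan.NodeUpdate[alloc.NodeID] = slices.DeleteFunc(
	s.plan.NodeUpdate[alloc.NodeID],
	func(a *structs.Allocation) bool {
		return a.ID == alloc.PreviousAllocation
	},
)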


// if it's a canary, we only keep up to desiredCanaries amount of
// them
if alloc.DeploymentStatus.Canary {
Member

Is it safe to assume DeploymentStatus is not nil at this point?

Contributor Author

yes, canaries always have deployment status. We set it in computePlacements.
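
For completeness, a defensive form of the quoted condition, should that invariant ever change, would be the usual nil guard:

if alloc.DeploymentStatus != nil && alloc.DeploymentStatus.Canary {
	// only keep up to desiredCanaries of them
}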
