scheduler: perform feasibility checks for system canaries before computing placements #26953
Conversation
Two groups on the same job cannot both have a static port assignment, but this ends up getting configured in the update block test for system deployments. This test setup bug has complicated landing the fix in #26953.
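A minimal sketch of the shape of that setup, assuming the test builds its job from the `mock` package the way other scheduler tests do; the helper name, group name, and port value here are invented for illustration:

```go
package scheduler

import (
	"github.com/hashicorp/nomad/nomad/mock"
	"github.com/hashicorp/nomad/nomad/structs"
)

// conflictingStaticPortsJob reconstructs the problematic setup: two task
// groups on the same system job both reserve the same static host port,
// so on any single node at most one of them can ever be placed.
func conflictingStaticPortsJob() *structs.Job {
	job := mock.SystemJob()

	second := job.TaskGroups[0].Copy()
	second.Name = "second"
	job.TaskGroups = append(job.TaskGroups, second)

	for _, tg := range job.TaskGroups {
		tg.Networks = []*structs.NetworkResource{{
			ReservedPorts: []structs.Port{{Label: "http", Value: 8080}},
		}}
	}
	return job
}
```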
scheduler: perform feasibility checks for system canaries before computing placements

Canaries for system jobs are placed on a `tg.update.canary` percent of eligible nodes. Some of these nodes may not be feasible, and until now we removed infeasible nodes during placement computation. However, if the first eligible node we pick to place a canary on happens to be infeasible, the scheduler halts the deployment. The solution presented here simplifies canary deployments: initially, system jobs that use canary updates get allocations placed on all eligible nodes, but before we start computing actual placements, a method called `evictCanaries` is called (much like `evictAndPlace` is for honoring MaxParallel), performing a feasibility check on each node up to the number of required canaries per task group. Feasibility checks are expensive, but this way we only check every node in the worst case (canary=100); otherwise we stop checking once we know we can place enough canaries.
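To make the stopping condition concrete, here is a rough sketch of the idea rather than the actual `evictCanaries` code; the `pickCanaryNodes` helper and its signature are invented for illustration:

```go
package scheduler

import "github.com/hashicorp/nomad/nomad/structs"

// pickCanaryNodes walks the eligible nodes and runs the expensive
// feasibility check only until enough feasible nodes have been found to
// cover the required canaries for the task group. With canary=100 every
// node ends up checked; otherwise the loop stops early.
func pickCanaryNodes(eligible []*structs.Node, required int,
	feasible func(*structs.Node) bool) []*structs.Node {

	picked := make([]*structs.Node, 0, required)
	for _, node := range eligible {
		if len(picked) >= required {
			break // enough canaries; skip further checks
		}
		if feasible(node) {
			picked = append(picked, node)
		}
	}
	return picked
}
```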
Fix some previously broken counts in the system deployments test and add comments to most of the placement counts to make it a little easier to read the test and verify it's correct.
If we leave empty keys in that map, the plan will not be considered a no-op, which it otherwise should be.
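Assuming the map in question is `plan.NodeUpdate` (keyed by node ID), the concern is that a key mapping to an empty slice still keeps the map non-empty; a hedged sketch of the pruning, with a helper name invented for illustration:

```go
package scheduler

import "github.com/hashicorp/nomad/nomad/structs"

// pruneEmptyNodeUpdates drops node IDs whose update list ended up empty,
// so an otherwise unchanged plan still reads as a no-op.
func pruneEmptyNodeUpdates(plan *structs.Plan) {
	for nodeID, allocs := range plan.NodeUpdate {
		if len(allocs) == 0 {
			delete(plan.NodeUpdate, nodeID)
		}
	}
}
```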
```go
	}
}

func mergeNodeFiltered(acc, curr *structs.AllocMetric) *structs.AllocMetric {
```
Is it worth adding a function comment here? I couldn't figure out if acc and curr were intended to have specific meanings, or what the merge ordering was initially.
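Something along these lines might do; the described semantics of `acc` and `curr` are a guess at the intent, not taken from the implementation:

```go
// mergeNodeFiltered merges the node-filtering counters from curr into acc
// and returns the combined metric. acc is the metric accumulated across
// previously evaluated nodes (and is the value carried forward); curr is
// the metric for the node currently being evaluated.
```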
        
          
scheduler/scheduler_system.go
```go
// no deployment for this TG
if _, ok := s.deployment.TaskGroups[tg.Name]; !ok {
	continue
}

// we can set the desired total now, it's always the amount of all
// feasible nodes
s.deployment.TaskGroups[tg.Name].DesiredTotal = len(feasibleNodes)

dstate, ok := s.deployment.TaskGroups[tg.Name]
if !ok {
	continue
}
```
The task group deployment is checked twice; do we need both?
The dstate variable (a pointer to the map entry) is used in the isCanarying check, but then we modify the map directly within the isCanarying conditional; is there a reason for this, or could we use a single approach for consistency?
you're right, the check on line 383 is the only one we need. Apologies, this code evolved and I must've missed a spot while cleaning up.
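For reference, a sketch of the single-lookup shape this would collapse to; it is a fragment of the loop body above, assuming `feasibleNodes` is still in scope:

```go
// look up the task group's deployment state once and reuse the pointer
dstate, ok := s.deployment.TaskGroups[tg.Name]
if !ok {
	// no deployment for this TG
	continue
}

// the desired total is always the number of feasible nodes
dstate.DesiredTotal = len(feasibleNodes)
```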
```go
idx := slices.IndexFunc(s.plan.NodeUpdate[alloc.NodeID], func(a *structs.Allocation) bool {
	return a.ID == alloc.PreviousAllocation
})
if idx > -1 {
	s.plan.NodeUpdate[alloc.NodeID] = append(s.plan.NodeUpdate[alloc.NodeID][0:idx], s.plan.NodeUpdate[alloc.NodeID][idx+1:]...)
}
```
I wonder if we could simplify this by using slices.DeleteFunc.
it would simplify it but Chris' solution here avoids allocating an additional slice I believe? whereas slices.DeleteFunc returns a copy of the slice it deletes from. I think?
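For comparison, a sketch of the `slices.DeleteFunc` form; per the standard library docs it edits the slice in place and returns the shortened slice rather than a fresh copy, though unlike the `IndexFunc`+`append` version it removes every match, not just the first:

```go
s.plan.NodeUpdate[alloc.NodeID] = slices.DeleteFunc(
	s.plan.NodeUpdate[alloc.NodeID],
	func(a *structs.Allocation) bool {
		return a.ID == alloc.PreviousAllocation
	},
)
```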
```go
// if it's a canary, we only keep up to desiredCanaries amount of them
if alloc.DeploymentStatus.Canary {
```
Is it safe to assume DeploymentStatus is not nil at this point?
yes, canaries always have deployment status. We set it in computePlacements.
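For context, a sketch of why the nil check is unnecessary, assuming the canary flag is stamped in `computePlacements` roughly like this (the exact assignment in the PR may differ):

```go
// every canary placement gets a non-nil DeploymentStatus up front
alloc.DeploymentStatus = &structs.AllocDeploymentStatus{
	Canary: true,
}
```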
Canaries for system jobs are placed on a `tg.update.canary` percent of eligible nodes. Some of these nodes may not be feasible, and until now we removed infeasible nodes during placement computation. However, if it happens to be that the first eligible node we picked to place a canary on is infeasible, this will lead to the scheduler halting deployment.

The solution presented here simplifies canary deployments: initially, system jobs that use canary updates get allocations placed on all eligible nodes, but before we start computing actual placements, a method called `evictUnneededCanaries` is called (much like `evictAndPlace` is for honoring MaxParallel) which removes those canary placements that are not needed. We also change the behavior of `computePlacements`, which no longer performs node feasibility checks, as these are performed earlier for every allocation and node. This way we get accurate counts of all feasible nodes that let us correctly set deployment state fields.

Fixes: #26885
Fixes: #26886