-
Notifications
You must be signed in to change notification settings - Fork 256
[YUNIKORN-3140] Quota change through preemption config changes #1034
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1034 +/- ##
==========================================
+ Coverage 80.91% 80.92% +0.01%
==========================================
Files 99 99
Lines 12882 12915 +33
==========================================
+ Hits 10424 10452 +28
- Misses 2201 2206 +5
Partials 257 257 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make the config change and nothing else in this PR. Leave out the Timer inclusion because that needs to be discussed further.
I definitely don't like having a bunch of timers running & firing, doing things in parallel. I understand that this is mentioned in the design docs, but I think we should maintain our "timer" as an abstract concept - eg. calculating when the preemption should occur by refreshing a value inside the Queue object. We call ClusterContext.schedule() in every 100ms anyway, so this should not be a big deal.
Look at the link I sent about Timer.Reset(). It's much safer if we maintain delays/deadlines on our own.
cc @wilfred-s
Ok, we can discuss in a separate PR and keep only config changes here. Quota change through preemption is a one off event does not occur often, so not really sure whether clubbing this with critical core scheduling path is a reasonable thing to do, and need to ensure doing this does not impact other existing process in the cycle and vice-versa. |
|
@manirajv06 could you rebase the changes? |
edefe75 to
82d406f
Compare
Rebased the changes. Please check. |
| case resources.IsZero(newMaxResource) && !resources.IsZero(oldMaxResource): | ||
| reset = true | ||
| // Set max res now but not earlier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case is not covered by tests, please extend the coverage.
| if set { | ||
| sq.quotaChangePreemptionDelay = conf.Preemption.Delay | ||
| } | ||
|
|
||
| // Reset preemption settings | ||
| if reset { | ||
| if sq.quotaChangePreemptionDelay != 0 { | ||
| sq.quotaChangePreemptionDelay = 0 | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it's simpler to move these one-liners to their respective case branches, eg.
case resources.StrictlyGreaterThan(oldMaxResource, newMaxResource) && conf.Preemption.Delay != 0:
sq.quotaChangePreemptionDelay = conf.Preemption.Delay
and then we can get rid of set && reset vars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nits
What is this PR for?
Introduced a config "preemption.delay" to let admins to set the graceful period before triggering the preemption in case of any quota decrease.
What type of PR is it?
Todos
What is the Jira issue?
https://issues.apache.org/jira/browse/YUNIKORN-3140
How should this be tested?
Screenshots (if appropriate)
Questions: