@manirajv06
Contributor

What is this PR for?

Introduced a config "preemption.delay" that lets admins set the grace period before preemption is triggered when a quota decreases.

What type of PR is it?

  • - Feature

Todos

  • - Task

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-3140

How should this be tested?

Screenshots (if appropriate)

Questions:

  • - The license files need an update.
  • - There are breaking changes for older versions.
  • - It needs documentation.

@codecov

codecov bot commented Oct 22, 2025

Codecov Report

❌ Patch coverage is 85.41667% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.92%. Comparing base (fde27e5) to head (82d406f).

Files with missing lines | Patch % | Lines
pkg/scheduler/objects/queue.go | 84.78% | 6 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1034      +/-   ##
==========================================
+ Coverage   80.91%   80.92%   +0.01%     
==========================================
  Files          99       99              
  Lines       12882    12915      +33     
==========================================
+ Hits        10424    10452      +28     
- Misses       2201     2206       +5     
  Partials      257      257              


Contributor

@pbacsko pbacsko left a comment

Let's make the config change and nothing else in this PR. Leave out the Timer inclusion because that needs to be discussed further.

I definitely don't like having a bunch of timers running & firing, doing things in parallel. I understand that this is mentioned in the design docs, but I think we should maintain our "timer" as an abstract concept, e.g. calculating when the preemption should occur by refreshing a value inside the Queue object. We call ClusterContext.schedule() every 100ms anyway, so this should not be a big deal.

Look at the link I sent about Timer.Reset(). It's much safer if we maintain delays/deadlines on our own.

cc @wilfred-s

@manirajv06
Contributor Author

> Let's make the config change and nothing else in this PR. Leave out the Timer inclusion because that needs to be discussed further.

Ok, we can discuss in a separate PR and keep only config changes here.

A quota change that triggers preemption is a one-off event and does not occur often, so I'm not sure that coupling it with the critical core scheduling path is reasonable. We would also need to ensure that doing so does not impact the other existing processes in the cycle, and vice versa.

@manirajv06 manirajv06 requested a review from pbacsko October 23, 2025 11:53
@pbacsko
Contributor

pbacsko commented Oct 30, 2025

@manirajv06 could you rebase the changes?

@manirajv06
Contributor Author

> @manirajv06 could you rebase the changes?

Rebased the changes. Please check.

Comment on lines +402 to +404
case resources.IsZero(newMaxResource) && !resources.IsZero(oldMaxResource):
	reset = true
	// Set max res now but not earlier
Contributor

This case is not covered by tests, please extend the coverage.

Comment on lines +425 to +434
if set {
	sq.quotaChangePreemptionDelay = conf.Preemption.Delay
}

// Reset preemption settings
if reset {
	if sq.quotaChangePreemptionDelay != 0 {
		sq.quotaChangePreemptionDelay = 0
	}
}
Contributor

IMO it's simpler to move these one-liners to their respective case branches, e.g.:

case resources.StrictlyGreaterThan(oldMaxResource, newMaxResource) && conf.Preemption.Delay != 0:
	sq.quotaChangePreemptionDelay = conf.Preemption.Delay

and then we can get rid of the set and reset variables.

Contributor

@pbacsko pbacsko left a comment

Small nits
