Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add NodeResourcesFitPlus and ScarceResourceAvoidance plugin #2302

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

LY-today
Copy link
Contributor

@LY-today LY-today commented Dec 20, 2024

What is your proposal:
The NodeResourcesFit plug-in of native k8s can only adopt a type of strategy for all resources, such as MostRequestedPriority and LeastRequestedPriority. However, in industrial practice, this design does not apply to some scenarios. For example: In AI scenarios, businesses that apply for GPUs prefer to occupy the entire GPU machine first to prevent GPU fragmentation; businesses that apply for CPU & MEM are prioritized and dispersed to non-GPU machines to prevent excessive consumption of CPU & MEM on GPU machines, resulting in real tasks of applying for GPUs. Pending due to insufficient non-GPU resources
. It is therefore hoped that both strategies can be extended to address this business need.

Why is this needed:
There are related descriptions above

Is there a suggested solution, if so, please add it:

plugin-one

config:

resources: 
  nvidia.com/gpu:
    type: MostAllocated
    weight: 2
  cpu:
    type: LeastAllocated
    weight: 1
  memory:
    type: LeastAllocated
    weight: 1

config description:
image

node score:

finalScoreNode = [(weight1 * resource1) + (weight2 * resource2) + … + (weightN* resourceN)] /(weight1+weight2+ … +weightN)

plugin-two

config:

resources: 
- nvidia.com/gpu 

config description:
image

node score:

finalScoreNode = (allocatablesResourcesNum - requestsResourcesNum) * framework.MaxNodeScore / allocatablesResourcesNum

Copy link

codecov bot commented Dec 20, 2024

Codecov Report

Attention: Patch coverage is 70.08929% with 67 lines in your changes missing coverage. Please review.

Project coverage is 66.04%. Comparing base (a62dd49) to head (1b77ba5).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...ins/noderesourcefitplus/node_resources_fit_plus.go 66.96% 26 Missing and 11 partials ⚠️
...arceresourceavoidance/scarce_resource_avoidance.go 73.21% 22 Missing and 8 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2302      +/-   ##
==========================================
+ Coverage   66.02%   66.04%   +0.02%     
==========================================
  Files         458      460       +2     
  Lines       53868    54147     +279     
==========================================
+ Hits        35564    35764     +200     
- Misses      15749    15807      +58     
- Partials     2555     2576      +21     
Flag Coverage Δ
unittests 66.04% <70.08%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@songchh
Copy link

songchh commented Dec 23, 2024

Expressing benefit, NodeResourcesFit is indeed very limited in some scenarios, especially in non-traditional CPU and Memory clusters, expensive resources like GPU do require special treatment.

@LY-today LY-today changed the title feat: add noderesourcefitplus and scarceresourceavoidance scheduler p… feat: add NodeResourcesFitPlus and ScarceResourceAvoidance plugin Dec 24, 2024
Signed-off-by: LY-today <[email protected]>

func New(args runtime.Object, handle framework.Handle) (framework.Plugin, error) {

sampleArgs2, ok := args.(*config.NodeResourcesFitPlusArgs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please rename it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get

return 0, framework.NewStatus(framework.Error, fmt.Sprintf("get State node %q from PreScore: %v", nodeName, err))
}

podRequest, _ := fitsRequest(scoreState.Resource, nodeInfo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does it mean fitsRequest and why we need it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refer to:k8s.io/[email protected]/pkg/scheduler/framework/plugins/noderesources/fit.go:fitsRequest

Copy link
Contributor Author

@LY-today LY-today Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resource types used to sense pods and nodes

return 0, framework.NewStatus(framework.Error, err.Error())
}

resourceScore, status := fit.(framework.ScorePlugin).Score(ctx, state, p, nodeName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend you just copy the upstream code and update the license "Copyright 2022 The Koordinator Authors" to "Copyright 2022 The Kubernetes Authors"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to suggest duplicating MostAllocated and LeastAllocated's implementation of Score? Rather than suggesting this form of encapsulating score?

Copy link
Contributor Author

@LY-today LY-today Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of encapsulation here is to reduce the implementation of repeated logic. Don't quite understand the advantages of copying existing logic?


func New(args runtime.Object, handle framework.Handle) (framework.Plugin, error) {

sampleArgs2, ok := args.(*config.ScarceResourceAvoidanceArgs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please rename it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants