Do you want to control the upper limit of batch resources that YARN can use?
If I understand correctly, I would like to know which scenarios require setting an upper limit.
Yes, in our scenario, we want to deploy three kinds of workloads on k8s nodes:
online serving pods: use prod resources.
training pods: run as BE pods and use batch resources.
YARN tasks: managed by YARN as containers (not pods); they use the batch resources that remain after the training containers have been allocated theirs. Setting an upper limit on the batch resources that YARN can use keeps some buffer for training pods that need resources immediately.
The NodeManager runs as a pod; YARN tasks (such as Spark drivers and executors) run as containers managed by the NodeManager.
The proposal is based on this solution: https://koordinator.sh/docs/designs/koordinator-yarn
What is your proposal:
The current YARN with Koordinator solution synchronizes all batch resources to the YARN RM. We would like to improve it so that BE pods and YARN tasks can share batch resources. This requires enhancing the mechanism that synchronizes and manages batch resources between Koordinator and YARN: introduce a new configuration, thirdPartyResourceConfig, that is used to calculate the amount of batch resources YARN may use, and enforce real-time control over the cgroups of YARN tasks based on this configuration.
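To make the intended calculation concrete, here is a minimal sketch of how the amount reported to the YARN RM could be derived for one node. It assumes that YARN receives whatever batch resources remain after BE pod requests, capped by an absolute upper limit. The names `Resources` and `yarn_allocatable`, and the idea that the limit is an absolute quantity rather than a percentage, are assumptions for illustration only; they are not the actual fields of thirdPartyResourceConfig.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """CPU in millicores and memory in bytes (hypothetical helper type)."""
    cpu_milli: int
    memory_bytes: int


def yarn_allocatable(batch_total: Resources,
                     be_pod_requests: Resources,
                     yarn_upper_limit: Resources) -> Resources:
    """Batch resources that could be synchronized to the YARN RM for a node.

    Assumed rule: YARN gets what is left after BE pods have been accounted
    for, never less than zero and never more than the configured upper limit.
    """
    def share(total: int, requested: int, limit: int) -> int:
        remaining = max(total - requested, 0)  # guard against over-commit
        return min(remaining, limit)           # apply the YARN upper limit

    return Resources(
        cpu_milli=share(batch_total.cpu_milli,
                        be_pod_requests.cpu_milli,
                        yarn_upper_limit.cpu_milli),
        memory_bytes=share(batch_total.memory_bytes,
                           be_pod_requests.memory_bytes,
                           yarn_upper_limit.memory_bytes),
    )
```

With 16 batch cores of which 6 are requested by training pods and a YARN limit of 8 cores, this rule would report 8 cores to YARN and leave 2 cores as buffer for new training pods.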
Example:
Why is this needed:
Described above
Is there a suggested solution, if so, please add it:
Here is an initial draft for the detailed design