-
Notifications
You must be signed in to change notification settings - Fork 456
[KEP-2349]: Add MultiKueue support for external custom job #5981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[KEP-2349]: Add MultiKueue support for external custom job #5981
Conversation
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
|
|
Welcome @khrm! |
|
Hi @khrm. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
khrm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8a01afb to
6ce068d
Compare
|
/ok-to-test |
6ce068d to
01edd11
Compare
| kind: "PipelineRun" | ||
| multiKueueAdapter: | ||
| managedBy: ".spec.managedBy" | ||
| creationPatches: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, this feels like something that should live with the external controller. This sounds like you are asking Kueue to create a mutating webhook for your controller.
Why can't you add this logic into tekton-kueue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's say we want to remove something from manager cluster before copying to worker cluster, how will external controller handle this? The external controller itself shouldn't require the secret or mechanism to copy from manager to worker. Same during the sync from worker to manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for joining the party late. I feel the same - It looks like this is the responsibility of external controller. I am wondering what is wrong with external controller handling it? Is the cognitive load on plugin developer a concern. I felt like the user should be able to submit to the custom type -> have the logic to translate it a workload object -> Submit the workload to the management cluster -> Get the name of the cluster suggested by Multikueue -> Create and Sync the workload there. There are some workflow changes that can be tweaked but I'm wondering what is wrong with putting this burden on plugin developer and making it Multi-Kueue just a workload placement engine - similar to core k8s scheduler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it's still the responsibility of external controller to generate workload.
The only thing we have exposed or made configurable is just ability of MultiKueue adaptor to sync resource and status between clusters.
The issue isn't the code. The issue is that we will need to run a sync controller of workload in the same namespace and give it access to the same secret which MultiKueue has access to. And there's no benefit to doing all this. It's just a sync behaviour which is common for all CRs.
Of course, if someone want to create an external controller, there's nothing stopping from doing that.
It made sense to have an external controller for local workload creation due to the complexity of lifecycle involved, but I don't see that in case of MultiKueue sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it's still the responsibility of external controller to generate workload.
You mean the kueue workload CR? I thought you are a registered type, you cannot?
The issue is that we will need to run a sync controller of workload in the same namespace and give it access to the same secret which MultiKueue has access to. And there's no benefit to doing all this. It's just a sync behaviour which is common for all CRs.
What I am suggesting is much simpler, you can create your own controller in another namespace and have RBAC specific to CRD to do syncing for workload clusters, no need to rely on secret present in multi-kueue namespace. This way, it truly makes sure that Multi-kueue is just deciding on the cluster where the workload should run and rest of the work is done by controller giving us clear separation of clusters.(This also means we might have to change Multi-Adapter interface without SyncJob method).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ravisantoshgudimetla As I described in https://kubernetes.slack.com/archives/C032ZE66A2X/p1755181736825669, this KEP aims to delegate only Workload resource creation capability. In other words, the kueue-controller-manager is still responsible to synchronize Workload between manager and worker clusters.
But, as I mentioned in #5981 (comment), I would like to make an effort to generalize the Workload synchronization, but it should be handled as a separate KEP (enhancement). So, if you are interested in that, I'm happy to review your proposal.
| - group: "pipelinerun.tekton.dev" | ||
| version: "v1" | ||
| kind: "PipelineRun" | ||
| multiKueueAdapter: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like the idea of putting this information into the kueue configuration. In other KEPs we explored the idea of singleton CRs. We could explore a CRD for this.
I'm still not sure on this design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about a CRD too, but configmap could also suffice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the CRD paradigm too, but until we have it, I'm ok in the configMap.
|
Instead of just a Jsonpatch, we should use CEL format like we have for mutating-admission-policy I will modify KEP with that approach. |
I’d be careful with MAP. Are you interested in supporting this feature on older K8s clusters? I think MAP will be beta in 1.34 which is not out yet. |
|
I would prefer an alternative approach for supporting external frameworks which is similar to how the jobframework works. In short, I think that external frameworks should run their own multikueue admission check and Kueue should provide them the framework to write it. I've explained this approach in #6117 |
|
@kannon92 I don't mean supporting MAP. I meant changing our design to how MAP works. So for this KEP, changing it to use CEL instead of only Jsonpatch. |
01edd11 to
9a327d0
Compare
cee2cac to
bd17f44
Compare
| // Controller specifies which controller should manage this custom job. | ||
| // Can be "generic" or "external". | ||
| // Defaults to "generic". | ||
| // +optional | ||
| Controller ControllerType `json:"controller,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest dropping this for now, we can introduce when proven this is needed.
|
Let me summarize my view, and reasoning why I like this approach. First, over time three major alternatives are proposed:
I think (1.) is clearly inferior to (2.) and (3.) as it requires opening API to external MultiKueue controllers which is very committal, and the most complex. Writing ACs is complex, and MK AC is particularly complex, so I really want users to avoid implementing them. The downsides of (3.): I think the most likely outcome of (1.) and (3.) due to complexity is users forking or copy/pasting the Kueue's implementation and hacking on top of it. Design which essentially requires users to copy-paste Kueue code is far from ideal. A valid concern raised by @tenzen-y about this proposal is that it requires commitment to the API to be maintained long-term. However:
Yes, in the extended form the API is a more complex list of patches resembling MAP, but we don't yet have an indication it needs to be extended (AFAIK). Yes, Pods would not be supported by managedBy, but we can support them by a dedicated adapter. I expect most external jobs to be simple managedBy + status sync. |
| ..... | ||
| // ExternalFrameworks lists the external frameworks to be supported by MultiKueue. | ||
| // +optional | ||
| ExternalFrameworks []ExternalFramework `json:"externalFrameworks,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ExternalFrameworks []ExternalFramework `json:"externalFrameworks,omitempty"` | |
| ExternalFrameworks []string `json:"externalFrameworks,omitempty"` |
As I mentioned in #5981 (comment), we can extract gvk from Kind.version.group.com, which is a standardized form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the discussion in https://kubernetes.slack.com/archives/C032ZE66A2X/p1755181736825669, we decided to introduce ExternalFramework type, which has only a name field. In the future, we might introduce controllerType. But, it is not introduced in the alpha phase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed on slack, we are conservative about the API, but some degree of extensibility might be needed in the future. Some ideas:
- allows to specifiy controllerType: Generic / External to discriminate between the mechanisms
- position of the managedBy (but we should be careful about it and not plan for Alpha)
- syncPatches to customize the synchronization (but we should be careful about it and not plan for Alpha)
So, we propose to expose just
type MultiKueueExternalFramework struct {
Name string `json:"name"`
}
I also want external-controller developers to rely on the kueue-controller-manager AC approval mechanism. So, I am thinking that we should establish an interface to make those mechanisms easy for the external. That's my proposal and don't indicate that external developers implement (or copy/paste) AC approval mechanism. But, if we can introduce only As a summary, in the short-term solution, we can introduce ONLY |
Me too, this KEP is achieving this actually.
I'm not clear how to achieve it other than in the KEP. I expect this would require exposing huge amount of code as a function at compile time. I think it will be really hard to ensure the interface does not break users in the future.
The KEP didn't mention ControllerName but ControllerType. The idea of controllerType is to discriminate between "External" and "Generic" in the future if we want to support both (2.) and (3.) long term. To me, I hope (2.) would be enough long term actually. EDIT: I don't see use cases for the patches yet, but they may arise. Also, we may need to be able to specify the location of the |
|
|
||
| The `controller` field is optional and can have two values: | ||
| - `generic` (default): Use the built-in generic MultiKueue adapter. This adapter performs a default set of operations to manage the job on a worker cluster. It assumes a `.spec.managedBy` field in the custom job and will handle removing it before creation on the worker, as well as syncing the `/status` field back from the worker to the management cluster. | ||
| - `external`: Expect an external, multi-cluster aware controller to manage the job. In this mode, MultiKueue will only sync the `Workload` object to the worker cluster. The external controller is responsible for creating the job on the worker and syncing its status. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not going to be implemented as Alpha for the KEP, so we can drop and move this part to the Alternatives section as a potential extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I would move this to alternative.
We can break the interface anytime, even with a significant break. That's why I want to select the interface rather than exposed API.
Yes, |
bd17f44 to
d857028
Compare
tenzen-y
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for update.
Otherwise LGTM.
This KEP documents how we can add MultiKueue support for external custom job by using configmap and generic multikueue adaptor.
c630a98 to
135555c
Compare
tenzen-y
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
LGTM label has been added. Git tree hash: d5788a62018bed4e6778dddf116d7d296bad7f05
|
mimowo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
/unhold
Thank you! Looking forward to the implementation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: khrm, mimowo, tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This KEP documents how we can add MultiKueue support for external custom job by using configmap and generic multikueue adaptor.
What type of PR is this?
/kind documentation
What this PR does / why we need it:
This introduces a new design for adding MultiKueue support for custom job.
Which issue(s) this PR fixes:
KEP for #2349
Special notes for your reviewer:
Does this PR introduce a user-facing change?