
Conversation

@khrm
Contributor

@khrm khrm commented Jul 15, 2025

This KEP documents how we can add MultiKueue support for external custom jobs by using a ConfigMap and the generic MultiKueue adapter.

What type of PR is this?

/kind documentation

What this PR does / why we need it:

This introduces a new design for adding MultiKueue support for custom jobs.

Which issue(s) this PR fixes:

KEP for #2349

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/documentation Categorizes issue or PR as related to documentation. labels Jul 15, 2025
@netlify

netlify bot commented Jul 15, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

🔨 Latest commit: 135555c
🔍 Latest deploy log: https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/68b5de1008625a0008059dba

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 15, 2025
@k8s-ci-robot
Contributor

Welcome @khrm!

It looks like this is your first PR to kubernetes-sigs/kueue 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/kueue has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 15, 2025
@k8s-ci-robot
Contributor

Hi @khrm. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 15, 2025
Contributor Author

@khrm khrm left a comment

/assign @tenzen-y @mimowo

@khrm khrm force-pushed the f/kep-externaljob-multikueue branch 2 times, most recently from 8a01afb to 6ce068d Compare July 15, 2025 13:52
@mimowo
Contributor

mimowo commented Jul 16, 2025

/ok-to-test
I will return to reviewing this after 0.13 is out.

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 16, 2025
@khrm khrm force-pushed the f/kep-externaljob-multikueue branch from 6ce068d to 01edd11 Compare July 16, 2025 12:08
kind: "PipelineRun"
multiKueueAdapter:
managedBy: ".spec.managedBy"
creationPatches:
Contributor

To me, this feels like something that should live with the external controller. This sounds like you are asking Kueue to create a mutating webhook for your controller.

Why can't you add this logic into tekton-kueue?

Contributor Author

@khrm khrm Jul 18, 2025

Let's say we want to remove something from the manager cluster before copying to the worker cluster; how would the external controller handle this? The external controller itself shouldn't need the secret or the mechanism to copy from the manager to the worker. The same applies when syncing from the worker back to the manager.

Contributor

Apologies for joining the party late. I feel the same: this looks like the responsibility of the external controller. I am wondering what is wrong with the external controller handling it? Is the cognitive load on the plugin developer a concern? I feel the user should be able to submit the custom type -> have the logic to translate it into a Workload object -> submit the Workload to the management cluster -> get the name of the cluster suggested by MultiKueue -> create and sync the workload there. Some of the workflow steps could be tweaked, but I'm wondering what is wrong with putting this burden on the plugin developer and making MultiKueue just a workload placement engine, similar to the core k8s scheduler.

Contributor Author

@khrm khrm Aug 30, 2025

Actually, it's still the responsibility of the external controller to generate the Workload.

The only thing we have exposed or made configurable is the ability of the MultiKueue adapter to sync the resource and its status between clusters.

The issue isn't the code. The issue is that we would need to run a Workload sync controller in the same namespace and give it access to the same secret that MultiKueue has access to. And there's no benefit to doing all this: it's just sync behaviour, which is common to all CRs.

Of course, if someone wants to create an external controller, there's nothing stopping them from doing that.

It made sense to have an external controller for local workload creation due to the complexity of the lifecycle involved, but I don't see that in the case of MultiKueue sync.

Contributor

@ravisantoshgudimetla ravisantoshgudimetla Sep 1, 2025

Actually, it's still the responsibility of the external controller to generate the Workload.

You mean the Kueue Workload CR? I thought that unless you are a registered type, you cannot?

The issue is that we would need to run a Workload sync controller in the same namespace and give it access to the same secret that MultiKueue has access to. And there's no benefit to doing all this: it's just sync behaviour, which is common to all CRs.

What I am suggesting is much simpler: you can create your own controller in another namespace, with RBAC specific to the CRD, to do the syncing to worker clusters; there is no need to rely on the secret present in the MultiKueue namespace. This way, it truly ensures that MultiKueue is just deciding on the cluster where the workload should run, and the rest of the work is done by the controller, giving us a clear separation of concerns. (This also means we might have to change the MultiKueue adapter interface to drop the SyncJob method.)

Member

@ravisantoshgudimetla As I described in https://kubernetes.slack.com/archives/C032ZE66A2X/p1755181736825669, this KEP aims to delegate only the Workload resource creation capability. In other words, the kueue-controller-manager is still responsible for synchronizing Workloads between the manager and worker clusters.

As I mentioned in #5981 (comment), I would like to make an effort to generalize the Workload synchronization, but it should be handled as a separate KEP (enhancement). So, if you are interested in that, I'm happy to review your proposal.

- group: "pipelinerun.tekton.dev"
  version: "v1"
  kind: "PipelineRun"
  multiKueueAdapter:
Contributor

I really don't like the idea of putting this information into the kueue configuration. In other KEPs we explored the idea of singleton CRs. We could explore a CRD for this.

I'm still not sure on this design.

Contributor Author

@khrm khrm Jul 18, 2025

I was thinking about a CRD too, but a ConfigMap could also suffice.

Contributor

@mimowo mimowo Aug 1, 2025

I prefer the CRD paradigm too, but until we have it, I'm OK with the ConfigMap.

@kannon92
Contributor

kannon92 commented Jul 18, 2025

Maybe @mszadkow or @mwielgus could be a good reviewer here?

@khrm
Contributor Author

khrm commented Jul 21, 2025

Instead of just JSON Patch, we should use a CEL-based format like we have for MutatingAdmissionPolicy.

I will modify the KEP with that approach.
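To make the intent concrete, here is a purely illustrative sketch of what such a CEL-based creation patch could look like, borrowing the JSONPatch expression style from MutatingAdmissionPolicy; the creationPatches/expression field names are placeholders from this discussion, not a settled API:

creationPatches:
  # Hypothetical: a CEL JSONPatch expression (as in MutatingAdmissionPolicy) that
  # removes .spec.managedBy before the object is created on the worker cluster.
  - expression: '[JSONPatch{op: "remove", path: "/spec/managedBy"}]'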

@kannon92
Contributor

Instead of just JSON Patch, we should use a CEL-based format like we have for MutatingAdmissionPolicy.

I will modify the KEP with that approach.

I'd be careful with MAP. Are you interested in supporting this feature on older K8s clusters? I think MAP will be beta in 1.34, which is not out yet.

@gbenhaim

I would prefer an alternative approach for supporting external frameworks, similar to how the jobframework works. In short, I think external frameworks should run their own MultiKueue admission check, and Kueue should provide the framework for writing it. I've explained this approach in #6117

@khrm
Contributor Author

khrm commented Jul 21, 2025

@kannon92 I don't mean supporting MAP. I meant changing our design to work the way MAP works, so for this KEP, using CEL instead of only JSON Patch.

@khrm khrm force-pushed the f/kep-externaljob-multikueue branch from 01edd11 to 9a327d0 Compare July 21, 2025 17:10
@khrm khrm force-pushed the f/kep-externaljob-multikueue branch from cee2cac to bd17f44 Compare September 1, 2025 14:01
Comment on lines 97 to 101
// Controller specifies which controller should manage this custom job.
// Can be "generic" or "external".
// Defaults to "generic".
// +optional
Controller ControllerType `json:"controller,omitempty"`
Contributor

I would suggest dropping this for now; we can introduce it once it's proven to be needed.

@mimowo
Contributor

mimowo commented Sep 1, 2025

Let me summarize my view and the reasoning behind why I like this approach.

First, over time three major alternatives have been proposed:

  1. Support users creating separate MultiKueue-like ACs and managing them (KEP)
  2. A declarative API indicating how to create and sync Jobs (this KEP)
  3. De-couple AC management from Job management and delegate Job creation and syncing to an external controller

I think (1) is clearly inferior to (2) and (3), as it requires opening the API to external MultiKueue controllers, which is very committal, and it is the most complex. Writing ACs is complex, and the MK AC is particularly complex, so I really want users to avoid implementing them.

The downsides of (3):
a) an API is still needed so that the built-in MultiKueue admission check skips handling of unknown Job CRDs. This would need to be a list of GVKs which are externally handled.
b) it requires a huge effort from the developers of the external controllers to read the secret and maintain connections with all the worker clusters. I think supporting (2) for the ease of implementing MK support would remain a legitimate feature request even if we succeeded with (3).
c) connecting to external clusters will likely result in errors which need to be reported via the API, and the best place would be workload.admissionChecks[*], which is reserved for ACs, not external Job controllers.

I think the most likely outcome of (1) and (3), due to their complexity, is users forking or copy/pasting Kueue's implementation and hacking on top of it. A design which essentially requires users to copy-paste Kueue code is far from ideal.

A valid concern raised by @tenzen-y about this proposal is that it requires a commitment to maintain the API long-term. However:

  • in its minimal form the API is really small (just the list of GVKs for which the mechanism is enabled). I think this is also unavoidable if we go with (3).
  • it does not block the external solution if we ever need it. For example, as the KEP proposes, we could have a ControllerType field indicating "External" if we ever need to have it.
  • long-term does not need to be forever; there is a deprecation guideline in k8s which allows deprecating and removing beta features.

Yes, in the extended form the API is a more complex list of patches resembling MAP, but we don't yet have an indication that it needs to be extended (AFAIK). Yes, Pods would not be supported via managedBy, but we can support them with a dedicated adapter. I expect most external jobs to need only a simple managedBy + status sync.

.....
// ExternalFrameworks lists the external frameworks to be supported by MultiKueue.
// +optional
ExternalFrameworks []ExternalFramework `json:"externalFrameworks,omitempty"`
Member

Suggested change
ExternalFrameworks []ExternalFramework `json:"externalFrameworks,omitempty"`
ExternalFrameworks []string `json:"externalFrameworks,omitempty"`

As I mentioned in #5981 (comment), we can extract gvk from Kind.version.group.com, which is a standardized form.

Member

@tenzen-y tenzen-y Sep 1, 2025

Based on the discussion in https://kubernetes.slack.com/archives/C032ZE66A2X/p1755181736825669, we decided to introduce an ExternalFramework type, which has only a name field. In the future, we might introduce controllerType, but it will not be introduced in the alpha phase.

Contributor

As we discussed on Slack, we are conservative about the API, but some degree of extensibility might be needed in the future. Some ideas:

  • allow specifying controllerType: Generic / External to discriminate between the mechanisms
  • the position of the managedBy field (but we should be careful about it and not plan it for Alpha)
  • syncPatches to customize the synchronization (but we should be careful about it and not plan it for Alpha)

So, we propose to expose just:

type MultiKueueExternalFramework struct {
	Name string `json:"name"`
}
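For illustration, in the Kueue Configuration this minimal API might look roughly like the sketch below; the entry name encodes the GVK in the Kind.version.group form suggested earlier, and the Tekton values are taken from the snippet above. This is only an assumption about the eventual shape, not the final API:

multiKueue:
  externalFrameworks:
    # Hypothetical entry: the name encodes the GVK as Kind.version.group.
    - name: "PipelineRun.v1.pipelinerun.tekton.dev"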

@tenzen-y
Member

tenzen-y commented Sep 1, 2025

I think (1) is clearly inferior to (2) and (3), as it requires opening the API to external MultiKueue controllers, which is very committal, and it is the most complex. Writing ACs is complex, and the MK AC is particularly complex, so I really want users to avoid implementing them.

I also want external-controller developers to rely on the kueue-controller-manager AC approval mechanism. So, I am thinking that we should establish an interface to make those mechanisms easy for external controllers. That is my proposal, and it does not mean that external developers implement (or copy/paste) the AC approval mechanism. But if we can introduce only externalFrameworks to enable the possibilities for external Jobs, that is also acceptable. If we want more flexibility (Patch / ControllerName) as proposed in this KEP, we should consider establishing an interface as opposed to exposing some of the APIs.

In summary, as a short-term solution we can introduce ONLY .multiKueue.externalFrameworks to keep the time to ship the feature short.
In the long term, or for more flexibility, we should establish the AC approval mechanism and expose it as an interface to external controllers.

@mimowo
Contributor

mimowo commented Sep 1, 2025

I also want external-controller developers to rely on the kueue-controller-manager AC approval mechanism.

Me too; this KEP is actually achieving this.

So, I am thinking that we should establish an interface to make those mechanisms easy for external controllers. That is my proposal, and it does not mean that external developers implement (or copy/paste) the AC approval mechanism.

I'm not clear how to achieve it other than as in the KEP. I expect this would require exposing a huge amount of code as functions at compile time. I think it will be really hard to ensure the interface does not break users in the future.

If we want more flexibility (Patch / ControllerName) as proposed in this KEP, we should consider establishing an interface as opposed to exposing some of the APIs.

The KEP didn't mention ControllerName but ControllerType. The idea of controllerType is to discriminate between "External" and "Generic" in the future, if we want to support both (2) and (3) long term. Personally, I hope (2) will be enough long term.

EDIT: I don't see use cases for the patches yet, but they may arise. Also, we may need to be able to specify the location of the managedBy field. In Kubeflow 1.x it is under spec.runPolicy.managedBy. Since Kubeflow 1.x has a built-in integration, we don't need the API field for now, but it shows that some flexibility might be needed.
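Purely as an illustration of that potential flexibility (reusing the managedBy path syntax from the snippet reviewed above; not part of the proposed Alpha API), such an adapter entry might one day look like:

multiKueueAdapter:
  # Hypothetical: point the adapter at Kubeflow 1.x's managedBy location.
  managedBy: ".spec.runPolicy.managedBy"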


The `controller` field is optional and can have two values:
- `generic` (default): Use the built-in generic MultiKueue adapter. This adapter performs a default set of operations to manage the job on a worker cluster. It assumes a `.spec.managedBy` field in the custom job and will handle removing it before creation on the worker, as well as syncing the `/status` field back from the worker to the management cluster.
- `external`: Expect an external, multi-cluster aware controller to manage the job. In this mode, MultiKueue will only sync the `Workload` object to the worker cluster. The external controller is responsible for creating the job on the worker and syncing its status.
Contributor

This is not going to be implemented in the Alpha of this KEP, so we can drop this part and move it to the Alternatives section as a potential extension.

Contributor Author

Sure. I will move this to the Alternatives section.

@tenzen-y
Member

tenzen-y commented Sep 1, 2025

I'm not clear how to achieve it other than as in the KEP. I expect this would require exposing a huge amount of code as functions at compile time. I think it will be really hard to ensure the interface does not break users in the future.

We can break the interface at any time, even with a significant breaking change. That's why I want to select the interface rather than an exposed API.

The KEP didn't mention ControllerName but ControllerType. The idea of controllerType is to discriminate between "External" and "Generic" in the future, if we want to support both (2) and (3) long term. Personally, I hope (2) will be enough long term.

EDIT: I don't see use cases for the patches yet, but they may arise. Also, we may need to be able to specify the location of the managedBy field. In Kubeflow 1.x it is under spec.runPolicy.managedBy. Since Kubeflow 1.x has a built-in integration, we don't need the API field for now, but it shows that some flexibility might be needed.

Yes, controllerType is what I wanted to mention. Rather than exposing the placement of managedBy, I would like to select (3). But I agree that (3) needs significant effort. So, I would like to gather the use cases based on the baseline proposed in this KEP: which steps should be customizable, and which should not.

@khrm khrm force-pushed the f/kep-externaljob-multikueue branch from bd17f44 to d857028 Compare September 1, 2025 17:36
@khrm
Contributor Author

khrm commented Sep 1, 2025

@tenzen-y @mimowo I updated the KEP based on the slack discussion.

Member

@tenzen-y tenzen-y left a comment

Thank you for the update.
Otherwise LGTM.

This KEP documents how we can add MultiKueue support for external custom jobs
by using a ConfigMap and the generic MultiKueue adapter.
@khrm khrm force-pushed the f/kep-externaljob-multikueue branch from c630a98 to 135555c Compare September 1, 2025 17:55
Member

@tenzen-y tenzen-y left a comment

@khrm LGTM, Thank you!
/approve
/lgtm

/hold for @mimowo

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Sep 1, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: d5788a62018bed4e6778dddf116d7d296bad7f05

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 1, 2025
Contributor

@mimowo mimowo left a comment

/lgtm
/approve
/unhold
Thank you! Looking forward to the implementation

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 2, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: khrm, mimowo, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 3c3266b into kubernetes-sigs:main Sep 2, 2025
8 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.14 milestone Sep 2, 2025
@github-project-automation github-project-automation bot moved this from Design in review to Done in Kueue Release Tracking Sep 2, 2025
@khrm khrm deleted the f/kep-externaljob-multikueue branch September 2, 2025 10:13
@khrm khrm restored the f/kep-externaljob-multikueue branch September 2, 2025 10:26
@khrm khrm deleted the f/kep-externaljob-multikueue branch September 2, 2025 10:29

Labels

approved - Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/documentation - Categorizes issue or PR as related to documentation.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
ok-to-test - Indicates a non-member PR verified by an org member that is safe to test.
release-note-none - Denotes a PR that doesn't merit a release note.
size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

8 participants