Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add argo integration based on pod integration #3897

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

kannon92
Copy link
Contributor

What type of PR is this?

/kind documentation

What this PR does / why we need it:

Add an integration with Kueue and Argo Workflows

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Credit goes to sam-leitch-oxb.
Nice work!

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. release-note-none Denotes a PR that doesn't merit a release note. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 20, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 20, 2024
Copy link

netlify bot commented Dec 20, 2024

Deploy Preview for kubernetes-sigs-kueue ready!

Name Link
🔨 Latest commit 30558fc
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/679931db4dc18000087cbc54
😎 Deploy Preview https://deploy-preview-3897--kubernetes-sigs-kueue.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@trasc
Copy link
Contributor

trasc commented Jan 14, 2025

/uncc

@k8s-ci-robot k8s-ci-robot removed the request for review from trasc January 14, 2025 11:55
Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the docs!

integrations:
frameworks:
- "pod"
podOptions:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of podOptions we are migrating to use managedJobsNamespaceSelector as a more generic solution, also for plain pods. Could you update as currently in https://kueue.sigs.k8s.io/docs/tasks/run/plain_pods/

cc @dgrove-oss

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll just link to that section.

it should be specified in the `spec.podMetadata` section of the Workflow configuration.

```yaml
apiVersion: argoproj.io/v1alpha1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please more the examples to https://github.com/kubernetes-sigs/kueue/tree/main/site/static/examples/jobs for consistency as a single source of examples easy to run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

container:
image: docker/whalesay
command: [cowsay]
args: ["{{inputs.parameters.message}}"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it need to be expanded to make the example runnable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is runnable.


This guide is for [batch users](/docs/tasks#batch-user) that have a basic understanding of Kueue. For more information, see [Kueue's overview](/docs/overview).

As of the writing of this doc, Kueue doesn't support Argo Workflows [Workflow](https://argo-workflows.readthedocs.io/en/latest/workflow-concepts/) resources directly,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
As of the writing of this doc, Kueue doesn't support Argo Workflows [Workflow](https://argo-workflows.readthedocs.io/en/latest/workflow-concepts/) resources directly,
Currently Kueue doesn't support Argo Workflows [Workflow](https://argo-workflows.readthedocs.io/en/latest/workflow-concepts/) resources directly,

nit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

As of the writing of this doc, Kueue doesn't support Argo Workflows [Workflow](https://argo-workflows.readthedocs.io/en/latest/workflow-concepts/) resources directly,
but you can take advantage of the ability for Kueue to [manage plain pods](/docs/tasks/run_plain_pods) to integrate them.

1. By default, the integration for `v1/pod` is not enabled.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please wrap the step in "Before you begin" for consistency, example: https://kueue.sigs.k8s.io/docs/tasks/run/jobsets/. And please add a point about installing the Argo Workflow, along with a link indicating how to do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -44,6 +44,7 @@ As a batch user, you can learn how to:
- [Run a Kueue managed plain Pod](run/plain_pods).
- [Run a Kueue managed JobSet](run/jobsets).
- [Submit jobs to MultiKueue](run/multikueue).
- [Run an argo workflow](run/argo_workload)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. We also have a similar way of support for the Notebook integration: #3878. I think we could have a dedicated section for them like "Run workloads with Pods managed by Kueue", with this doc being the first of this kind, but Notebook a potential follow up. WDYT @kannon92 @varshaprasad96 @tenzen-y ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also have #3898. So I think this is a good idea to document examples that use pod integration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, I missed that, I would put them together in one sub-folder. We can maybe first merge this one, then align the other.

@kannon92 kannon92 requested a review from mimowo January 17, 2025 18:30
@kannon92
Copy link
Contributor Author

@mimowo PTAL

@mimowo
Copy link
Contributor

mimowo commented Jan 23, 2025

/approve
/hold
LGTM overall, but would like to get a pass from @PBundyra who is recently looking at the ways to improve our documentation.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 23, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kannon92, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 23, 2025
@@ -0,0 +1,10 @@
---
title: "Pod Based Workloads"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could change the title to: "External integrations", or something similar? Pod Based Workloads can be a little misleading, since we already have Plain Pods section

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is worth thinking about a good name, but we already have a concept of "external integrations" / custom jobs: https://kueue.sigs.k8s.io/docs/tasks/dev/integrate_a_custom_job/
So, it could be misleading too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, now thinking more about this, Plain Pods is one of the mechanisms for supporting external frameworks, as suggested here: #3897 (review).

So, I'm ok with the more generic section like "External integrations", then it should also probably group https://kueue.sigs.k8s.io/docs/tasks/run/flux_miniclusters/ and https://kueue.sigs.k8s.io/docs/tasks/run/wrapped_custom_workload/, and refactor the structure. WDYT @PBundyra @kannon92 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM.

@PBundyra
Copy link
Contributor

PBundyra commented Jan 23, 2025

Other than the one comment I left, LGTM! Feel free to unhold your PR after addressing it.
Thanks!
/lgtm

@PBundyra
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 23, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: c034e48c52ccc8eb568493ca2b67a9b23abdb2fe

Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, please additionally extend the list of options in https://kueue.sigs.k8s.io/docs/tasks/dev/integrate_a_custom_job/ by adding a point like "Leverage the Plain Pods integration", linking to the new tasks page.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 23, 2025
@k8s-ci-robot k8s-ci-robot requested a review from PBundyra January 23, 2025 15:20
@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@kannon92
Copy link
Contributor Author

/hold

Addressing #3897 (comment).

Need to confirm that this worked.

@kannon92
Copy link
Contributor Author

@PBundyra @mimowo PTAL when you get a chance.

I moved around the files as you suggested.

Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I would like to consider the following sturcture:

Run Workloads:
- Kubernetes Jobs
- Kubernetes CronJobs
- AppWrappers
- RayClusters
- RayJobs
- Deployment
- StatefulSet
- Plain Pods
- Kubeflow Jobs
- Python
- Jobsets
- External Frameworks
  * Using AppWrapper (renamed from Custom Workload)
  * Flux MiniClusters
  * Argo Workflows 
- Multi-Cluster

I don't think we need more structure under "External frameworks" at this point, because there are only three items underneath, and extra layers mean more clicking to expand.

@@ -63,6 +63,7 @@ As a platform developer, you can learn how to:
- [Integrate a custom Job with Kueue](dev/integrate_a_custom_job).
- [Enable pprof endpoints](dev/enabling_pprof_endpoints).
- [Develop a custom AdmissionCheck Controller](dev/develop-acc).
- [Integration a custom workload with existing frameworks](dev/external_frameworks)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- [Integration a custom workload with existing frameworks](dev/external_frameworks)
- [Integrate a custom workload with Kueue using built-in frameworks](dev/external_frameworks)

Also, move it up just below "Integrate a custom Job with Kueue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

site/content/en/docs/tasks/_index.md Outdated Show resolved Hide resolved
- [Submit Kueue jobs from Python](run/python_jobs).
- [Run a Kueue managed plain Pod](run/plain_pods).
- [Run a Kueue managed JobSet](run/jobsets).
- [Submit jobs to MultiKueue](run/multikueue).
- [Support external workloads](run/external_workloads).
Kueue allows one to use existing integrations to support external workloads.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Kueue allows one to use existing integrations to support external workloads.
Kueue allows one to use built-in integrations (such as Pods or Jobs) to run external workloads.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

### Job-based Workloads Integration
- [Run a Flux Miniclusters using job integration](/docs/tasks/run/external_workloads/flux_miniclusters).

### Pod-Based Workloads Integration
- [Run an Argo Workflow using pod integration](/docs/tasks/run/external_workloads/pod_based_workloads/argo_workflow).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to combine the two, because conceptually this is the same mechanism:

Suggested change
### Job-based Workloads Integration
- [Run a Flux Miniclusters using job integration](/docs/tasks/run/external_workloads/flux_miniclusters).
### Pod-Based Workloads Integration
- [Run an Argo Workflow using pod integration](/docs/tasks/run/external_workloads/pod_based_workloads/argo_workflow).
### Integrations based on built-in frameworks
- [Run a Flux Miniclusters using job integration](/docs/tasks/run/external_workloads/flux_miniclusters).
- [Run an Argo Workflow using pod integration](/docs/tasks/run/external_workloads/pod_based_workloads/argo_workflow).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -1,6 +1,5 @@
---
title: "Run An AppWrapper"
linkTitle: "AppWrappers"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to remove. Let's also keep AppWrapper where it was as it is a built-in framework currently. We have another page for showing how to use AppWrapper for custom Jobs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to remove. Let's also keep AppWrapper where it was as it is a built-in framework currently. We have another page for showing how to use AppWrapper for custom Jobs.

I think this makes sense.

Once Kueue 0.11 comes out, we plan to simplify the AppWrapper controller, remove the code that is no longer needed since it is now a built-in framework, and make a new release of AppWrapper. As part of doing that, I will revise the Kueue documention and make sure it refers to the right historic pieces of external-framework code in the AppWrapper 1.0 git.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/documentation Categorizes issue or PR as related to documentation. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants