add argo integration based on pod integration #3897

Merged
merged 4 commits into from
Jan 29, 2025
5 changes: 3 additions & 2 deletions site/content/en/docs/tasks/_index.md
@@ -35,16 +35,16 @@ batch user is a researcher, AI/ML engineer, data scientist, among others.

As a batch user, you can learn how to:
- [Run a Kueue managed batch/Job](run/jobs).
- [Run a Kueue managed Flux MiniCluster](run/flux_miniclusters).
- [Run a Kueue managed Kubeflow Job](run/kubeflow).
Kueue supports MPIJob v2beta1, PyTorchJob, TFJob, XGBoostJob and PaddleJob.
- [Run a Kueue managed KubeRay RayJob](run/rayjobs).
- [Run a Kueue managed KubeRay RayCluster](run/rayclusters).
- [Run a Kueue managed AppWrapper](run/appwrappers).
- [Submit Kueue jobs from Python](run/python_jobs).
- [Run a Kueue managed plain Pod](run/plain_pods).
- [Run a Kueue managed JobSet](run/jobsets).
- [Submit jobs to MultiKueue](run/multikueue).
- [Run external workloads](run/external_workloads).
Kueue allows one to use built-in integrations (such as Pods or Jobs) to run external workloads.

### Serving user

@@ -61,6 +61,7 @@ A _platform developer_ integrates Kueue with other software and/or contributes t

As a platform developer, you can learn how to:
- [Integrate a custom Job with Kueue](dev/integrate_a_custom_job).
- [Integrate a custom workload with Kueue using built-in frameworks](dev/external_frameworks).
- [Enable pprof endpoints](dev/enabling_pprof_endpoints).
- [Develop a custom AdmissionCheck Controller](dev/develop-acc).

10 changes: 10 additions & 0 deletions site/content/en/docs/tasks/dev/external_frameworks.md
@@ -0,0 +1,10 @@
---
title: "External Frameworks"
weight: 8
date: 2025-01-17
description: >
  How to run Kueue with external frameworks
---

See [external frameworks](/docs/tasks/run/external_workloads) for examples of using existing
integrations to run workloads from external frameworks with Kueue.
19 changes: 19 additions & 0 deletions site/content/en/docs/tasks/run/external_workloads/_index.md
@@ -0,0 +1,19 @@
---

title: "Supporting External Frameworks"
linkTitle: "External Frameworks"
weight: 9
date: 2025-01-23
description: >
  How to run Kueue with external frameworks
---

The tasks below show how to build a custom integration on top of an existing one.
You can build on AppWrappers, job-based workloads, or pod-based workloads.

### [AppWrapper](https://project-codeflare.github.io/appwrapper/) Integration
- [Run a custom workload using AppWrappers](/docs/tasks/run/external_workloads/wrapped_custom_workload).

### Integrations based on built-in frameworks
- [Run a Flux MiniCluster using the Job integration](/docs/tasks/run/external_workloads/flux_miniclusters).
- [Run an Argo Workflow using the Pod integration](/docs/tasks/run/external_workloads/argo_workflow).
51 changes: 51 additions & 0 deletions site/content/en/docs/tasks/run/external_workloads/argo_workflow.md
@@ -0,0 +1,51 @@
---
title: "Run An Argo Workflow"
Contributor comment: nit: add `linkTitle: Argo Workflow` to make the display consistent.

date: 2025-01-23
weight: 3
description: >
  Integrate Kueue with Argo Workflows.
---

This page shows how to leverage Kueue's scheduling and resource management capabilities when running [Argo Workflows](https://argo-workflows.readthedocs.io/en/latest/).

This guide is for [batch users](/docs/tasks#batch-user) who have a basic understanding of Kueue. For more information, see [Kueue's overview](/docs/overview).

Kueue does not currently support Argo Workflows [Workflow](https://argo-workflows.readthedocs.io/en/latest/workflow-concepts/) resources directly,
but you can integrate them by using Kueue's ability to [manage plain Pods](/docs/tasks/run/plain_pods).

## Before you begin

1. Learn how to [install Kueue with a custom manager configuration](/docs/installation/#install-a-custom-configured-released-version).

2. Follow the steps in [Run Plain Pods](/docs/tasks/run/plain_pods/#before-you-begin)
   to learn how to enable and configure the `v1/pod` integration.

3. Install [Argo Workflows](https://argo-workflows.readthedocs.io/en/latest/installation/#installation).
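For reference, enabling the integration comes down to listing `pod` under `integrations.frameworks` in the Kueue manager configuration. The fragment below is a minimal sketch that assumes the `config.kueue.x-k8s.io/v1beta1` Configuration API; a real configuration contains other fields, and the Run Plain Pods guide describes how to restrict which namespaces have their pods managed.

```yaml
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
integrations:
  frameworks:
  - "pod"   # enables the v1/pod integration that Argo Workflows pods rely on
  # other frameworks (for example "batch/job") can be listed alongside it
```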

## Workflow definition

### a. Targeting a single LocalQueue

If you want the entire Workflow to target a single [local queue](/docs/concepts/local_queue),
set the `kueue.x-k8s.io/queue-name` label in the `spec.podMetadata` section of the Workflow configuration.
Argo Workflows applies these labels to every pod the Workflow creates.

{{< include "examples/pod-based-workloads/workflow-single-queue.yaml" "yaml" >}}
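The example above assumes that a LocalQueue named `user-queue` already exists in the namespace where the Workflow is submitted. A minimal sketch of such a queue follows; the `default` namespace and the `cluster-queue` ClusterQueue name are assumptions, so substitute your own.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: default            # assumption: the namespace the Workflow runs in
  name: user-queue              # must match the kueue.x-k8s.io/queue-name label
spec:
  clusterQueue: cluster-queue   # hypothetical ClusterQueue that holds the quota
```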

### b. Targeting a different LocalQueue per template

If you prefer to target a different [local queue](/docs/concepts/local_queue) for each step of your Workflow,
you can set the queue label in the `spec.templates[].metadata` section of the Workflow configuration.

In this example, `hello1` and `hello2a` target `user-queue`, and `hello2b`
targets `user-queue-2`.

{{< include "examples/pod-based-workloads/workflow-queue-per-template.yaml" "yaml" >}}
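The `hello2b` step needs a second LocalQueue, `user-queue-2`, in the same namespace as the Workflow. A minimal sketch follows, assuming the `default` namespace and a hypothetical ClusterQueue named `cluster-queue-2`; pointing the two LocalQueues at different ClusterQueues is one way to give the templates separate quotas.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: default              # assumption: same namespace as the Workflow
  name: user-queue-2              # referenced by the whalesay-queue-2 template
spec:
  clusterQueue: cluster-queue-2   # hypothetical ClusterQueue with its own quota
```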

### c. Limitations

- Kueue only manages the pods created by Argo Workflows. It does not manage the Argo Workflows resources themselves in any way.
- Kueue creates a separate Workload for each pod in a Workflow, and each pod must wait for admission by Kueue.
- Kueue cannot ensure, before a Workflow starts, that enough quota is available for the whole Workflow to complete. If one step of a multi-step Workflow does not have
available quota, Argo Workflows runs all previous steps and then waits for quota to become available.
- Kueue does not understand the Argo Workflows `suspend` flag and does not manage it.
- Kueue does not manage `suspend`, `http`, or `resource` template types, since they do not create pods.
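To illustrate the last limitation, the sketch below combines a `suspend` step with a container step. Only the `run` step creates a pod, so only that step gets a Workload and waits for admission; the `approve` step pauses the Workflow until it is resumed (for example, with `argo resume`), and Kueue plays no part in it. The Workflow, step, and template names are hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: suspend-then-run-
spec:
  entrypoint: main
  podMetadata:
    labels:
      kueue.x-k8s.io/queue-name: user-queue   # applies to every pod the Workflow creates
  templates:
  - name: main
    steps:
    - - name: wait-for-approval
        template: approve     # suspend template: creates no pod, so Kueue never sees it
    - - name: run
        template: whalesay    # container template: its pod is queued and admitted by Kueue
  - name: approve
    suspend: {}               # waits until the Workflow is resumed
  - name: whalesay
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["approved"]
      resources:
        limits:
          memory: 32Mi
          cpu: 100m
```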
@@ -2,7 +2,7 @@
title: "Run A Flux MiniCluster"
linkTitle: "Flux MiniClusters"
date: 2022-02-14
weight: 6
weight: 2
description: >
  Run a Kueue scheduled Flux MiniCluster.
---
@@ -2,7 +2,7 @@
title: "Run A Wrapped Custom Workload"
linkTitle: "Custom Workload"
date: 2025-01-14
weight: 7
weight: 1
description: >
  Use an AppWrapper to Run a Custom Workload on Kueue.
---
14 changes: 7 additions & 7 deletions site/static/_redirects
@@ -15,16 +15,16 @@
/docs/tasks/enabling_pprof_endpoints /docs/tasks/dev/enabling_pprof_endpoints 301
/docs/tasks/integrate_a_custom_job /docs/tasks/dev/integrate_a_custom_job 301

/docs/tasks/run_flux_minicluster /docs/tasks/run/flux_miniclusters 301
/docs/tasks/run_jobs /docs/tasks/run/jobs 301
/docs/tasks/run_jobsets /docs/tasks/run/jobsets 301
/docs/tasks/run_kubeflow_jobs /docs/tasks/run/kubeflow 301
/docs/tasks/run_plain_pods /docs/tasks/run/plain_pods 301
/docs/tasks/run_rayclusters /docs/tasks/run/rayclusters 301
/docs/tasks/run_rayjobs /docs/tasks/run/rayjobs 301
/docs/tasks/run_jobs /docs/tasks/run/jobs 301
/docs/tasks/run_jobsets /docs/tasks/run/jobsets 301
/docs/tasks/run_kubeflow_jobs /docs/tasks/run/kubeflow 301
/docs/tasks/run_plain_pods /docs/tasks/run/plain_pods 301
/docs/tasks/run_rayclusters /docs/tasks/run/rayclusters 301
/docs/tasks/run_rayjobs /docs/tasks/run/rayjobs 301

/docs/tasks/run_kubeflow_jobs/run_mpijobs /docs/tasks/run/kubeflow/mpijobs 301
/docs/tasks/run_kubeflow_jobs/run_paddlejobs /docs/tasks/run/kubeflow/paddlejobs 301
/docs/tasks/run_kubeflow_jobs/run_pytorchjobs /docs/tasks/run/kubeflow/pytorchjobs 301
/docs/tasks/run_kubeflow_jobs/run_tfjobs /docs/tasks/run/kubeflow/tfjobs 301
/docs/tasks/run_kubeflow_jobs/run_xgboostjobs /docs/tasks/run/kubeflow/xgboostjobs 301

@@ -0,0 +1,52 @@
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: steps-
spec:
  entrypoint: hello-hello-hello

  templates:
  - name: hello-hello-hello
    steps:
    - - name: hello1            # hello1 is run before the following steps
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "hello1"
    - - name: hello2a           # double dash => run after previous step
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "hello2a"
      - name: hello2b           # single dash => run in parallel with previous step
        template: whalesay-queue-2
        arguments:
          parameters:
          - name: message
            value: "hello2b"

  - name: whalesay
    metadata:
      labels:
        kueue.x-k8s.io/queue-name: user-queue    # Pods from this template will target user-queue
    inputs:
      parameters:
      - name: message
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]

  - name: whalesay-queue-2
    metadata:
      labels:
        kueue.x-k8s.io/queue-name: user-queue-2  # Pods from this template will target user-queue-2
    inputs:
      parameters:
      - name: message
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]
@@ -0,0 +1,19 @@
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: whalesay
  podMetadata:
    labels:
      kueue.x-k8s.io/queue-name: user-queue  # All pods will target user-queue
  templates:
  - name: whalesay
    container:
      image: docker/whalesay
      command: [ cowsay ]
      args: [ "hello world" ]
      resources:
        limits:
          memory: 32Mi
          cpu: 100m