[Epic][Feature] KubeRay v1.4.0 - Operator SLI Tracking #3171

Closed
9 tasks done
win5923 opened this issue Mar 10, 2025 · 6 comments
Assignees
Labels
1.4.0 enhancement New feature or request

Comments

@win5923
Contributor

win5923 commented Mar 10, 2025

Description

https://docs.google.com/document/d/1zNiE7lVZYjhrxlTbh1UXOVpR6hh1GIeSfCfE9Lt5v6Y/edit?tab=t.0

Context

In production, SRE teams typically define Service Level Indicators (SLIs) to ensure that services meet expected performance and reliability standards. However, there are currently no dedicated SLIs for Ray Cluster, Ray Service, and Ray Job, which makes it challenging to monitor their health and performance.

Solution

We propose new metrics to enhance KubeRay's observability and provide better insight into the status and performance of Ray Cluster, Ray Service, and Ray Job.

sub-issues

@win5923 added the "enhancement" (New feature or request) and "triage" labels on Mar 10, 2025
@win5923 changed the title from "[Epic][Feature] KubeRay v1.4.0 - SLI Metrics Tracking" to "[Epic][Feature] KubeRay v1.4.0 - Operator SLI Metrics Tracking" on Mar 10, 2025
@kevin85421 added the "1.4.0" label and removed the "triage" label on Mar 10, 2025
@kevin85421
Member

/assign @troychiu

@troychiu
Contributor

I'll be working on this as well.

@win5923 changed the title from "[Epic][Feature] KubeRay v1.4.0 - Operator SLI Metrics Tracking" to "[Epic][Feature] KubeRay v1.4.0 - Operator SLI Tracking" on Mar 11, 2025
@dushulin
Contributor

I'm using v1.2.2. Does this version add KubeRay metric reporting? Also, I couldn't find a Helm chart for metric export. Will one be added in version 1.4.0? Thanks.

@win5923
Contributor Author

win5923 commented Mar 18, 2025

Hi @dushulin,

> I'm using v1.2.2. Does this version add KubeRay metric reporting?

The feature is only supported in version 1.4.0 and above.

> Also, I couldn't find a Helm chart for metric export. Will one be added in version 1.4.0? Thanks.

This work focuses on metrics for the operator itself, so by default they are exposed inside the cluster on the kuberay-operator pod at :8080/metrics.

If you're using the Prometheus Operator, you can define a PodMonitor or ServiceMonitor so that Prometheus scrapes the metrics from that endpoint.
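
For a quick manual check without Prometheus, the endpoint can also be read directly. The following is only a minimal sketch, not part of KubeRay: it assumes the operator port has been forwarded to localhost (for example with `kubectl port-forward` against the kuberay-operator pod), that the endpoint serves the standard Prometheus text format without trailing timestamps, and the helper names `fetch_metrics` and `parse_metrics` are hypothetical.

```python
import urllib.request


def fetch_metrics(url="http://localhost:8080/metrics", timeout=5.0):
    """Download the raw metrics text.

    The URL is an assumption: it presumes the operator's port 8080 has been
    forwarded to localhost.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text):
    """Parse Prometheus text-format lines into {name: [(raw_labels, value), ...]}.

    Simplification: lines are assumed to carry no trailing timestamp, and
    labels are kept as the raw string between the braces.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The sample value is the last space-separated token on the line.
        series, _, value = line.rpartition(" ")
        name, _, labels = series.partition("{")
        labels = labels.rstrip("}")
        samples.setdefault(name, []).append((labels, float(value)))
    return samples


if __name__ == "__main__":
    # Print every metric whose name starts with a given prefix.
    # The "kuberay_" prefix is an assumption; adjust it after inspecting the output.
    for name, values in sorted(parse_metrics(fetch_metrics()).items()):
        if name.startswith("kuberay_"):
            for labels, value in values:
                print(name, labels, value)
```

Running the script after forwarding the port prints one line per sample. If nothing is printed, dump the raw text from `fetch_metrics()` to see which metric names the operator actually exposes.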

We are also planning to add a flag that will allow users to enable or disable the metrics feature as needed.

@dushulin
Contributor

@win5923 Thanks for the reply. I've recently been adding a PodMonitor and ServiceMonitor for KubeRay, and I'd like to contribute this code if the community needs it.

@kevin85421
Member

Closing this issue because all metrics planned for 1.4.0 have been completed.


4 participants