[Epic][Feature] KubeRay v1.4.0 - Operator SLI Tracking #3171
Comments
/assign @troychiu
I'll be working on this together.
I'm using v1.2.2. Does this version add KubeRay metric reporting? Also, I found no Helm chart for metric export. Will one be added in version 1.4.0? Thanks.
Hi @dushulin,
The feature is only supported in version 1.4.0 and above.
This is mainly focused on the metrics for the Operator, so by default, these metrics will be exposed internally on the If you're using the Prometheus Operator, you can use We are also planning to add a flag that will allow users to enable or disable the metrics feature as needed.
@win5923 Thanks for the reply. I've recently been adding a PodMonitor and ServiceMonitor for KubeRay, and I'd like to contribute this code if the community needs it.
Closing this issue because all metrics for 1.4.0 have been completed.
Description
https://docs.google.com/document/d/1zNiE7lVZYjhrxlTbh1UXOVpR6hh1GIeSfCfE9Lt5v6Y/edit?tab=t.0
Context
In production, SRE teams typically define Service Level Indicators (SLIs) to ensure that services meet expected performance and reliability standards. However, there are currently no dedicated SLIs for Ray Cluster, Ray Service, and Ray Job, which makes it challenging to monitor their health and performance.
Solution
We propose new metrics to enhance KubeRay's observability and provide better insight into the status and performance of Ray Cluster, Ray Service, and Ray Job.
sub-issues