-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missing service definition for metrics/prometheus endpoint #121
Comments
actually it makes sense the way it is, because each weaviate replica will have it's own metric endpoitn accessible under weaviate-$X:2112/metrics . I will close the bug |
Hey @samos123 👋 I came across the same issue and I don't understand when you say that makes sense the way it is. The What solves this issue is to change the Service ports spec to this: spec:
ports:
- name: http
protocol: TCP
port: 80
targetPort: 8080
- name: metrics
protocol: TCP
port: 2112
targetPort: 2112 Am I missing something? |
Each Weaviate instance (when replicas is more than 1) needs to have it's own accessible metrics endpoint. The Weaviate replicas will each expose the metrics on 2112. So you should access the metrics for your first replica using Give it a try. It should work. |
The issue lies not in the availability of the metrics endpoint within the |
My view is that the metrics endpoint should be removed from the Service definition. That should remove the confusion that you and I both shared around thinking that we should use the Service endpoint to query metrics. Note you should not use the service endpoint, you should use the metric endpoint of each individual pod. I'm clearly not doing a good job in explaining here either. Maybe a question would help: If you have 3 replicas and you query metrics using the Service endpoint. How would I know whether the metrics are from replica 1, replica 2 or replica 3? |
To distinguish metrics from different replicas when querying through the Service endpoint, we would use Prometheus labels. Example: weaviate_requests_total{instance="weaviate-0", method="GET"} 100
weaviate_requests_total{instance="weaviate-1", method="GET"} 150
weaviate_requests_total{instance="weaviate-2", method="GET"} 120 In this example, the instance label is used to specify the unique identifier for each Weaviate replica. These metrics are typically collected at the Service level rather than directly from individual pods in the Prometheus Stack that is currently in use. |
I guess that does make sense as well if you have 3 replicas. What if you have 1000 replicas? Now you pull metrics from the service endpoint every 10 seconds. So every 10 seconds only 1 of the 1000 replicas will have it's metrics pulled into prometheus. That would be problematic. What you would want instead is that every 10 seconds, all 1000 replicas have their metrics endpoint pulled. If you use the service endpoint it would do round robin load balancing and you will probably miss out on metrics if you have 1000 replicas. |
You're correct. In a scenario with 1000 replicas, relying on the service endpoint alone for metric collection might lead to missed data due to round-robin load balancing. To ensure that metrics are collected from all replicas within the 10-second window, a more scalable approach would be needed. One solution in that case is setup the Prometheus server to scrapes metrics from each replica as you mentioned. Anyway, I've added support for this feature in the PR #197. |
Prometheus by default is exposed on port 2112 when
env.PROMETHEUS_MONITORING_ENABLED: true
. However the weaviate service definition has a hardcoded targetport of 8080. So either the weaviate service should also expose the prometheus service or we should create a new service definition for prometheus.The text was updated successfully, but these errors were encountered: