
Missing service definition for metrics/prometheus endpoint #121

Closed
samos123 opened this issue Feb 11, 2023 · 8 comments

@samos123
Contributor

Prometheus metrics are exposed on port 2112 by default when env.PROMETHEUS_MONITORING_ENABLED: true is set. However, the Weaviate Service definition has a hardcoded targetPort of 8080. So either the Weaviate Service should also expose the Prometheus port, or we should create a separate Service definition for Prometheus.
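
For reference, a minimal values.yaml sketch of how this gets enabled, based only on the env key mentioned above (everything else about the chart is omitted):

env:
  PROMETHEUS_MONITORING_ENABLED: true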

@samos123
Contributor Author

Actually, it makes sense the way it is, because each Weaviate replica will have its own metrics endpoint accessible under weaviate-$X:2112/metrics. I will close the bug.

@jfcmartins

Hey @samos123 👋 I came across the same issue and I don't understand what you mean when you say it makes sense the way it is. The targetPort is hardcoded to 8080, so the metrics will not be available for Prometheus to scrape through the Weaviate Service.

What solves this issue is to change the Service ports spec to this:

spec:
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 8080
    - name: metrics
      protocol: TCP
      port: 2112
      targetPort: 2112

Am I missing something?

@samos123
Contributor Author

samos123 commented Dec 7, 2023

Each Weaviate instance (when there is more than one replica) needs to have its own accessible metrics endpoint. The Weaviate replicas each expose the metrics on 2112. So you should access the metrics for your first replica using curl http://weaviate-0:2112/metrics

Give it a try. It should work.
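
For example, one way to check a single replica's endpoint locally (pod name and port taken from the thread above; adjust to your namespace):

kubectl port-forward pod/weaviate-0 2112:2112
curl http://localhost:2112/metrics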

@jfcmartins

Each Weaviate instance (when there is more than one replica) needs to have its own accessible metrics endpoint. The Weaviate replicas each expose the metrics on 2112. So you should access the metrics for your first replica using curl http://weaviate-0:2112/metrics

Give it a try. It should work.

The issue lies not in the availability of the metrics endpoint within the Pods but rather in the capability of the corresponding Service to expose the metrics generated by these Pods. To address this, the Service needs to be configured to expose two ports, as demonstrated in the example I provided in the initial comment.

@samos123
Contributor Author

samos123 commented Dec 12, 2023

My view is that the metrics endpoint should be removed from the Service definition. That should remove the confusion that you and I both shared about thinking we should use the Service endpoint to query metrics. Note that you should not use the Service endpoint; you should use the metrics endpoint of each individual pod.

I'm clearly not doing a good job of explaining this either. Maybe a question will help: if you have 3 replicas and you query metrics using the Service endpoint, how would you know whether the metrics are from replica 1, replica 2, or replica 3?

@jfcmartins

If you have 3 replicas and you query metrics using the Service endpoint, how would you know whether the metrics are from replica 1, replica 2, or replica 3?

To distinguish metrics from different replicas when querying through the Service endpoint, we would use Prometheus labels.

Example:

weaviate_requests_total{instance="weaviate-0", method="GET"}  100
weaviate_requests_total{instance="weaviate-1", method="GET"}  150
weaviate_requests_total{instance="weaviate-2", method="GET"}  120

In this example, the instance label identifies each Weaviate replica. In the Prometheus stack currently in use, these metrics are typically collected at the Service level rather than directly from individual pods.
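
As an illustration, a hypothetical PromQL query against the example counter above that keeps the per-replica breakdown:

sum by (instance) (rate(weaviate_requests_total[5m]))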

@samos123
Contributor Author

I guess that does make sense as well if you have 3 replicas. What if you have 1000 replicas?

Say you pull metrics from the Service endpoint every 10 seconds. Then every 10 seconds only 1 of the 1000 replicas will have its metrics pulled into Prometheus. That would be problematic. What you would want instead is that every 10 seconds all 1000 replicas have their metrics endpoints scraped. If you use the Service endpoint, it will do round-robin load balancing and you will probably miss out on metrics if you have 1000 replicas.

@jfcmartins

I guess that does make sense as well if you have 3 replicas. What if you have 1000 replicas?

Say you pull metrics from the Service endpoint every 10 seconds. Then every 10 seconds only 1 of the 1000 replicas will have its metrics pulled into Prometheus. That would be problematic. What you would want instead is that every 10 seconds all 1000 replicas have their metrics endpoints scraped. If you use the Service endpoint, it will do round-robin load balancing and you will probably miss out on metrics if you have 1000 replicas.

You're correct. In a scenario with 1000 replicas, relying on the Service endpoint alone for metric collection might lead to missed data due to round-robin load balancing. To ensure that metrics are collected from all replicas within the 10-second window, a more scalable approach would be needed. One solution in that case is to set up the Prometheus server to scrape metrics from each replica, as you mentioned; a sketch of what that could look like is below.
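
For illustration only, a minimal Prometheus scrape_configs sketch that discovers every pod and scrapes it directly; the job name, the app label, and the relabeling rules are assumptions for this example, not what the chart or the PR actually ships:

scrape_configs:
  - job_name: weaviate-pods
    kubernetes_sd_configs:
      - role: pod   # discover individual pods, not the Service VIP
    relabel_configs:
      # keep only Weaviate pods; label name/value are illustrative
      - action: keep
        source_labels: [__meta_kubernetes_pod_label_app]
        regex: weaviate
      # point the scrape target at the metrics port (2112) on each pod IP
      - source_labels: [__address__]
        regex: ([^:]+)(?::\d+)?
        replacement: $1:2112
        target_label: __address__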

Anyway, I've added support for this feature in PR #197.
