OpenCensus metric export fails sometimes #474
/kind bug
/assign @icco
Yup, we've been looking into this. Current theory is that it's a bug in OpenCensus. We have verified that we aren't losing any data, as uploads are retried. Why data is being sent more than once a minute, we have no idea.
https://github.com/census-instrumentation/opencensus-go/blob/v0.22.4/stats/view/worker.go#L117
Looks like we did not set the reporting period and are in fact reporting every 10 seconds.
Without this, the default reporting period of 10 seconds will be used [1]. This is likely the cause of [2].
[1]: https://github.com/census-instrumentation/opencensus-go/blob/v0.22.4/stats/view/worker.go#L117
[2]: google/exposure-notifications-verification-server#474
Hmm, #474 (comment) is incorrect. I was looking at an old commit; we do set the reporting period to 2 minutes right now.
* Making stackdriver exporter options configurable in environment variable. This allows me to fine-tune these values and see if adjusting them can eliminate google/exposure-notifications-verification-server#474. The values used should be backward compatible. The values for BundleDelayThreshold/BundleCountThreshold are from https://github.com/census-ecosystem/opencensus-go-exporter-stackdriver/blob/db101e30979316cca594a74b3181a9a3b6094086/trace.go#L72-L81
* fixup! Making stackdriver exporter options configurable in environment variable.
* Add document explaining the additional knobs.
* Change BundleCountThreshold type to uint.
* fixup! Change BundleCountThreshold type to uint.
/assign @yegle as he's going deep on this :D
I haven't made much progress by just reading the code. I'm adding some more debug logging, which should help me understand the bundler's behavior better.
The previous attempt to add debug logging won't work because the OpenCensus Stackdriver exporter uses gRPC instead of HTTP. Working on binary logging support to gain more insight.
After tweaking the source code and adding a lot more logging statements, I think I finally figured out why.

TL;DR: we misused the Stackdriver exporter and double-registered it to export metrics. The same data was exported multiple times, and we got throttled by the Cloud Monitoring API.

The correct way to start the exporter, per the example, is to call the […]. In our code, we called […].

Why did we call the latter? Probably because 1) the OpenCensus Agent exporter requires it (see https://github.com/google/exposure-notifications-server/blob/f0596149c8380dfc38abfbc42629e764885a7be6/pkg/observability/opencensus.go#L56), and 2) implementing […]

The Stackdriver exporter project is also considering removing the […]
This is fixed!
TL;DR
Example log:
This log is from the apiserver service, but it also happens a lot on the e2e-runner service.