[sdk-metrics] Promote overflow attribute from experimental to stable #5909
This feels very misleading to me. The measurements are still dropped. The overflow attribute is not a fix for that; it is just a detection mechanism that alerts users when they have an issue.
What's your definition of "dropped" vs. not?
That's not the right question. I'm not the audience. The question is: what is the user's definition of "dropped", and what do they expect when something is recorded? I promise you it is not "my value and dimensions get folded into a flag and thrown away" 🤣
From the specification perspective, "dropped" means the value is ignored and has no effect on the final metrics. Since the overflow attribute keeps the accurate total sum for a Counter (meaning the user always gets the correct total sum, whether or not overflow happened) and correct percentiles for a Histogram, it shouldn't be considered "dropped".
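To make the "total sum is preserved" point concrete, here is a minimal sketch (a simulation, not the actual SDK's internals; the cardinality limit, data, and the `otel.metric.overflow` attribute spelling are illustrative) of cardinality-capped Counter aggregation: once the number of distinct attribute sets reaches the limit, further measurements are folded into a single overflow series, so the grand total stays accurate even though per-attribute breakdowns are lost.

```python
# Sketch of cardinality-capped sum aggregation with an overflow series.
# This is NOT the SDK implementation, just an illustration of the semantics.

CARDINALITY_LIMIT = 2
OVERFLOW_KEY = (("otel.metric.overflow", True),)  # illustrative attribute

streams = {}  # attribute set (as a sorted tuple of pairs) -> running sum

def record(value, attributes):
    key = tuple(sorted(attributes.items()))
    # A new attribute set beyond the limit is folded into the overflow series.
    if key not in streams and len(streams) >= CARDINALITY_LIMIT:
        key = OVERFLOW_KEY
    streams[key] = streams.get(key, 0) + value

for route, v in [("a", 1), ("b", 2), ("c", 4), ("a", 8)]:
    record(v, {"route": route})

# route=c exceeded the limit and was folded into the overflow series,
# but the grand total across all series still equals 1 + 2 + 4 + 8 = 15.
print(sum(streams.values()))  # 15
print(streams)
```

Nothing about route=c survives individually, which is exactly the tension in this thread: the total is correct, but the per-dimension data for the folded series is gone.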
I see your point. My perspective is... I'm some user and I look at my backend and I see all my metrics, I'm happy. Then a day later I look and everything is gone. Or suddenly everything is empty? Certainly not what I saw before, right? So naturally I'm going to think data is missing and must be getting dropped somewhere. But I read the SDK CHANGELOG and it said it no longer drops anything! Now I'm upset and declare the SDK bugged and open an issue to vent my anger 🔥
I'm not saying the spec is wrong. But I don't think the users will understand the nuance 🤔
IMO this is going to be like lawyer jargon for users 😄 I think we should break it down in more layman/accessible terms.
I'm very confused. Would you explain more? E.g. could you give a concrete example about "everything is empty" or "everything is gone"?
Agree that the wording may not easily be understood by all end-users. The https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/metrics#cardinality-limits doc is very good, and perhaps we can make some additions to it to explain how the SDK behaves when the limit is hit, and how to interpret the overflow attribute correctly.
One important thing I'd like called out is: if a user has a query like "sum of all requests where route=foo", and an overflow exists, then that query is no longer trustworthy, as there is no way to tell whether a route=foo measurement was folded into the overflow. The only thing trustworthy when an overflow exists is the total metric (i.e. the one that does not filter on any dimensions).
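The filtered-query concern can be sketched as follows (the data and the `otel.metric.overflow` attribute spelling are illustrative assumptions, not real backend output): once an overflow series exists, a dimension-filtered sum may undercount, because some matching measurements may have been folded into the overflow series; only the unfiltered grand total remains trustworthy.

```python
# Hypothetical exported state after an overflow occurred.
# The overflow series may (or may not) contain folded route=foo data;
# there is no way to tell from the exported metrics alone.
streams = {
    (("route", "foo"),): 10,
    (("route", "bar"),): 5,
    (("otel.metric.overflow", True),): 7,
}

def filtered_sum(attr, value):
    """Backend-style query: sum of series whose attributes match."""
    return sum(v for key, v in streams.items() if (attr, value) in key)

print(filtered_sum("route", "foo"))  # 10 -- possibly an undercount
print(sum(streams.values()))         # 22 -- the only trustworthy number
```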
If no one volunteers to make this change in the doc, I can cover it. (I am implementing a similar thing for OTel Rust right now, so I can hopefully steal some wording! Ideally this should be covered on the otel docs website, so every language can benefit.)
https://github.com/utpilla/MetricOverflowAttribute?tab=readme-ov-file can be a good starting point.
Opened an issue #5939 to track this.