KAFKA-19747: Update ClientTelemetryReporter telemetry push error handling #20661
@apoorvmittal10 PTAL
LGTM, thanks for the fix.
Overall, LGTM. I am just not sure whether we are making it too complex by tracing down the full chain of "causes".
839a65f to 026728b
…g between retryable and fatal exceptions and add test cases
026728b to 900be01
@mjsax I've updated the logic for checking exceptions.
Thanks for the update. LGTM.
Merged #20661 into trunk
…ling (#20661) When a failure occurs with a push telemetry request, any exception is treated as fatal, increasing the time interval to `Integer.MAX_VALUE`, effectively turning telemetry off. This PR updates the error handling to check if the exception is a transient one with expected recovery and keeps the telemetry interval value the same in those cases since a recovery is expected. Reviewers: Apoorv Mittal, Matthias Sax
cherry-picked to 4.1 aceb32d
cherry-picked to 4.0 3243300
When a failure occurs with a push telemetry request, any exception is
treated as fatal, increasing the time interval to `Integer.MAX_VALUE`,
which effectively turns telemetry off. This PR updates the error handling
to check whether the exception is a transient one and, if so, keeps the
telemetry interval unchanged, since a recovery is expected.
Reviewers: Apoorv Mittal, Matthias Sax
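
For readers skimming the thread, here is a minimal sketch of the idea described above: classify a push failure as transient or fatal by walking its cause chain, and only raise the interval to `Integer.MAX_VALUE` in the fatal case. It is not the code from this PR. The class name `PushErrorHandlingSketch`, both method names, the choice of `TimeoutException` and `IOException` as the "transient" types, and the depth bound of 10 are all assumptions made for illustration.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: none of these names come from ClientTelemetryReporter.
final class PushErrorHandlingSketch {

    // Assumption: these two JDK exception types stand in for "transient" failures.
    static boolean isTransient(Throwable error) {
        // Walk the cause chain; the depth bound guards against cyclic causes.
        Throwable current = error;
        for (int depth = 0; current != null && depth < 10; depth++) {
            if (current instanceof TimeoutException || current instanceof IOException) {
                return true;
            }
            current = current.getCause();
        }
        return false;
    }

    // Transient failure: keep the current interval so the next push is retried.
    // Anything else: back off to Integer.MAX_VALUE, which effectively disables pushes.
    static int nextPushIntervalMs(Throwable error, int currentIntervalMs) {
        return isTransient(error) ? currentIntervalMs : Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        Throwable wrappedTimeout = new RuntimeException(new TimeoutException("broker slow"));
        Throwable fatal = new IllegalStateException("unsupported");
        System.out.println(nextPushIntervalMs(wrappedTimeout, 5000)); // prints 5000
        System.out.println(nextPushIntervalMs(fatal, 5000));          // prints 2147483647
    }
}
```

Walking the whole cause chain is the part the review questioned as possibly too complex; a simpler variant would inspect only the top-level exception, at the cost of missing transient errors that arrive wrapped.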