@bbejeck bbejeck commented Oct 8, 2025

When a push telemetry request fails, any exception is treated as fatal
and the push interval is raised to Integer.MAX_VALUE, effectively
turning telemetry off. This PR updates the error handling to check
whether the exception is a transient one. In those cases it keeps the
telemetry interval unchanged, since a recovery is expected.

Reviewers: Apoorv Mittal [email protected], Matthias Sax [email protected]
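
For readers without the diff at hand, the behavior described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: `PushFailureHandling`, `TransientPushException`, and `intervalAfterFailedPush` are hypothetical names, and which exceptions the client actually treats as transient is not stated in this conversation.

```java
// Hypothetical sketch: class, method, and exception names are illustrative,
// not taken from the Kafka code base.
final class PushFailureHandling {

    /** Stand-in for an error from which recovery is expected (assumption). */
    static class TransientPushException extends RuntimeException {
        TransientPushException(String message) {
            super(message);
        }
    }

    /**
     * Returns the interval to use after a failed push: unchanged for a
     * transient error, Integer.MAX_VALUE (telemetry effectively off) otherwise.
     */
    static int intervalAfterFailedPush(Throwable error, int currentIntervalMs) {
        if (error instanceof TransientPushException) {
            // Recovery is expected, so keep pushing at the same cadence.
            return currentIntervalMs;
        }
        // Any other failure is treated as fatal.
        return Integer.MAX_VALUE;
    }
}
```

With a current interval of 30000 ms, a transient failure would leave the interval at 30000, while any other exception would raise it to `Integer.MAX_VALUE`.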

@bbejeck bbejeck requested a review from apoorvmittal10 October 8, 2025 14:35

bbejeck commented Oct 8, 2025

@apoorvmittal10 PTAL

@apoorvmittal10 apoorvmittal10 left a comment

LGTM, thanks for the fix.

@mjsax mjsax left a comment

Overall, LGTM. I am just not sure whether we are making it too complex by tracing down the full chain of "causes"?
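
To make the trade-off in this comment concrete, here is a rough sketch contrasting a full cause-chain walk with a shallower check. It is only an illustration: `CauseChainCheck`, `TransientPushException`, and both method names are hypothetical, and the conversation does not say which approach the PR ended up with.

```java
// Hypothetical sketch: names are illustrative, not taken from the Kafka code base.
final class CauseChainCheck {

    /** Stand-in for an error from which recovery is expected (assumption). */
    static class TransientPushException extends RuntimeException {
        TransientPushException(String message) {
            super(message);
        }
    }

    /** Walks the whole cause chain looking for a transient error. */
    static boolean hasTransientCauseAnywhere(Throwable error) {
        // The depth bound guards against a cyclic cause chain.
        int depth = 0;
        for (Throwable t = error; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof TransientPushException) {
                return true;
            }
        }
        return false;
    }

    /** Simpler alternative: inspect only the error itself and its direct cause. */
    static boolean isTransientOrDirectCause(Throwable error) {
        if (error == null) {
            return false;
        }
        return error instanceof TransientPushException
                || error.getCause() instanceof TransientPushException;
    }
}
```

The full walk also catches a transient error wrapped several layers deep, at the cost of more code paths to reason about. The shallow check is easier to read but misses anything wrapped more than once.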

@bbejeck bbejeck force-pushed the KAFKA-19747_improve_failed_push_handling branch from 839a65f to 026728b Compare October 8, 2025 22:19
@bbejeck bbejeck force-pushed the KAFKA-19747_improve_failed_push_handling branch from 026728b to 900be01 Compare October 9, 2025 13:18

bbejeck commented Oct 9, 2025

@mjsax I've updated the logic for checking exceptions
@apoorvmittal10 not sure if you would like to take another look

@mjsax mjsax left a comment

Thanks for the update. LGTM.

@bbejeck bbejeck merged commit 1e95d04 into apache:trunk Oct 9, 2025
32 of 35 checks passed

bbejeck commented Oct 9, 2025

Merged #20661 into trunk

@bbejeck bbejeck deleted the KAFKA-19747_improve_failed_push_handling branch October 9, 2025 20:32
bbejeck added a commit that referenced this pull request Oct 9, 2025
…ling (#20661)

When a push telemetry request fails, any exception is treated as fatal
and the push interval is raised to `Integer.MAX_VALUE`, effectively
turning telemetry off. This PR updates the error handling to check
whether the exception is a transient one. In those cases it keeps the
telemetry interval unchanged, since a recovery is expected.

Reviewers: Apoorv Mittal <[email protected]>, Matthias Sax <[email protected]>

bbejeck commented Oct 9, 2025

cherry-picked to 4.1 aceb32d

bbejeck commented Oct 9, 2025

cherry-picked to 4.0 3243300

3 participants