
[DBZ-PGYB] Restrict retries when retry count reaches limit #166

Merged


@vaibhav-yb (Collaborator) commented Nov 27, 2024

Problem

With the current retry model, the connector retried indefinitely whenever an exception was thrown. This could give the false impression that the connector was still running even though every retry kept failing.

Solution

This PR addresses the issue by adding a check in the task layer: once the retry count reaches the configured maximum, the connector stops retrying and the task moves to a FAILED state. This lets the end user see the actual status of the task and act accordingly.
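The bounded-retry check can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the connector's actual code: `run_with_retries`, `poll`, and `TaskState` are hypothetical names, and the real task reports its state through the connector framework rather than a return value.

```python
import time
from enum import Enum


class TaskState(Enum):
    # Hypothetical task states for illustration only.
    RUNNING = "RUNNING"
    FAILED = "FAILED"


def run_with_retries(poll, max_retries, wait_ms, sleep=time.sleep):
    """Call poll() until it succeeds, giving up after max_retries retries.

    Returns (state, retries_used). `sleep` is injectable so the sketch
    can be exercised without actually waiting.
    """
    retries = 0
    while True:
        try:
            poll()
            return TaskState.RUNNING, retries
        except Exception:
            if retries >= max_retries:
                # Retry budget exhausted: fail instead of retrying forever.
                return TaskState.FAILED, retries
            retries += 1
            # Wait wait_ms milliseconds before the next attempt.
            sleep(wait_ms / 1000.0)
```

With an always-failing `poll`, the function makes one initial attempt plus `max_retries` retries and then returns `FAILED`, which is the behavior change this PR describes.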

This PR also redefines the following properties and changes their default values:

  1. errors.max.retries - new default value is 60
  2. retriable.restart.connector.wait.ms - new default value is 30000 (30s)

With these defaults, the total retry duration is now 30 minutes (60 retries × 30 s). If the connector's task still fails after exhausting all retries, it goes into a FAILED state.
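As a quick check of the arithmetic, the total retry window is roughly `errors.max.retries` × `retriable.restart.connector.wait.ms`. The sketch below assumes one wait per retry and ignores the time each failed attempt itself takes, so the real wall-clock duration will be somewhat longer; `total_retry_window_ms` is a hypothetical helper.

```python
def total_retry_window_ms(max_retries: int, wait_ms: int) -> int:
    """Approximate time spent waiting between retries before the task fails.

    Assumes each retry is preceded by exactly one wait of wait_ms and
    ignores how long each failed attempt itself takes.
    """
    return max_retries * wait_ms


# Default values introduced by this PR.
window_ms = total_retry_window_ms(max_retries=60, wait_ms=30_000)
print(window_ms / 60_000, "minutes")  # prints: 30.0 minutes
```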

For example, if the connector should keep retrying for a total of 30 minutes, this can be configured in two ways:

  1. By fixing the number of retries: to use exactly 15 retries, set the retry delay accordingly, i.e. 30 / 15 = 2 minutes = 120 s = 120000 ms, so the configuration becomes:
"retriable.restart.connector.wait.ms":"120000",
"errors.max.retries":"15"
  2. By fixing the retry delay: for a retry delay of one minute, set the number of retries accordingly, i.e. 30 / 1 = 30 retries, so the configuration becomes:
"retriable.restart.connector.wait.ms":"60000",
"errors.max.retries":"30"
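Both approaches divide the same 30-minute budget between the retry count and the delay. A small helper can derive either property from the other. The function names are hypothetical; values are emitted as strings to match the configuration examples above, and integer division means the product can fall slightly short of the target when it does not divide evenly.

```python
import json

TARGET_MS = 30 * 60 * 1000  # total retry budget: 30 minutes


def config_for_fixed_retries(target_ms: int, max_retries: int) -> dict:
    # Option 1: fix the retry count, derive the delay between retries.
    return {
        "retriable.restart.connector.wait.ms": str(target_ms // max_retries),
        "errors.max.retries": str(max_retries),
    }


def config_for_fixed_delay(target_ms: int, wait_ms: int) -> dict:
    # Option 2: fix the delay, derive how many retries fit in the budget.
    return {
        "retriable.restart.connector.wait.ms": str(wait_ms),
        "errors.max.retries": str(target_ms // wait_ms),
    }


print(json.dumps(config_for_fixed_retries(TARGET_MS, 15)))
print(json.dumps(config_for_fixed_delay(TARGET_MS, 60_000)))
```

The two printed objects carry the same property values as the two configurations listed above (120000 ms with 15 retries, and 60000 ms with 30 retries).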

@vaibhav-yb vaibhav-yb requested a review from suranjan November 27, 2024 11:46
@vaibhav-yb vaibhav-yb self-assigned this Nov 27, 2024
@suranjan (Collaborator) left a comment:

Please add a description of how to set the config for, e.g., 30 minutes of retries.
Also, let's make the default values such that it retries for 30 minutes before failing.


@vaibhav-yb vaibhav-yb merged commit 922a5c9 into yugabyte:ybdb-debezium-2.5.2 Nov 29, 2024
1 check passed