Skip to content

Conversation

@proud-parselmouth
Copy link
Collaborator

Issues

  • My PR addresses the following Helix issues and references them in the PR description:

Presently any top state downward transition message from MASTER->SLAVE or (LEADER->STANDBY) gets evaluated as a load balance message. This is because when determining message type helix checks how many present participants are SLAVE and if that is >= min active required number(almost always 1), it treats the top state downward transition message MASTER->SLAVE as a load balance because it is increasing the number of slave replicas.
This causes an issue when it is urgently required to transfer the topstate and the load balance messages are throttled due to ST throttle configs.

(apache#200 - Link your issue number here: You can write "Fixes #XXX". Please use the proper keyword so that the issue gets closed automatically. See https://docs.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue
Any of the following keywords can be used: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved)

Description

Added a new config ENABLE_RECOVERY_REBALANCE_FOR_TOPSTATE_DOWNWARD_TRANSITION to handle such cases. In Intermediate State calc if the above config is true we treat any top state downward ST message as RECOVERY_REBALANCE.

  • Here are some details about my PR, including screenshots of any UI changes:

(Write a concise description including what, why, how)

Tests

  • The following tests are written for this issue:
    TestIntermediateStateCalcStage.testEnableRecoveryRebalanceForTopStateDownwardStateTransition

(List the names of added unit/integration tests)

  • The following is the result of the "mvn test" command on the appropriate module:

(If CI test fails due to known issue, please specify the issue and test PR locally. Then copy & paste the result of "mvn test" to here.)

Changes that Break Backward Compatibility (Optional)

  • My PR contains changes that break backward compatibility or previous assumptions for certain methods or API. They include:

(Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)

Documentation (Optional)

  • In case of new functionality, my PR adds documentation in the following wiki page:

(Link the GitHub wiki you added)

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

RebalanceType rebalanceType =
getRebalanceTypePerMessage(requiredState, message, derivedCurrentStateMap);

if(recoveryRebalanceForTopStateDownwardTransition) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move the logic to getRebalanceTypePerMessage() method itself - instead of overriding the decision here.

if (secondTopState == null) {
return false;
}
return msg.getFromState() == topState && msg.getToState() == secondTopState;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return msg.getFromState() == topState && msg.getToState() == secondTopState;
return topState.equals(msg.getFromState()) && secondTopState.equals(msg.getToState());

This might not work. We should use .equals for String comparision. IIRC == compares object reference and not the content.

return getParticipantDeregistrationTimeout() > -1;
}

public boolean isRecoveryBalanceForTopStateDownwardTransitionEnabled() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public boolean isRecoveryBalanceForTopStateDownwardTransitionEnabled() {
public boolean isRecoveryRebalanceForTopStateDownwardTransitionEnabled() {

Rebalance**

return _record.getBooleanField(ClusterConfigProperty.ENABLE_RECOVERY_REBALANCE_FOR_TOPSTATE_DOWNWARD_TRANSITION.name(), false);
}

public void setRecoveryBalanceForTopStateDownwardTransitionEnabled(boolean enabled) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void setRecoveryBalanceForTopStateDownwardTransitionEnabled(boolean enabled) {
public void setRecoveryRebalanceForTopStateDownwardTransitionEnabled(boolean enabled) {

Rebalance**

}

public void setRecoveryBalanceForTopStateDownwardTransitionEnabled(boolean enabled) {
_record.setBooleanField(ClusterConfigProperty.ENABLE_RECOVERY_REBALANCE_FOR_TOPSTATE_DOWNWARD_TRANSITION.name(),enabled);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_record.setBooleanField(ClusterConfigProperty.ENABLE_RECOVERY_REBALANCE_FOR_TOPSTATE_DOWNWARD_TRANSITION.name(),enabled);
_record.setBooleanField(ClusterConfigProperty.ENABLE_RECOVERY_REBALANCE_FOR_TOPSTATE_DOWNWARD_TRANSITION.name(), enabled);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants