Skip to content

Conversation

proud-parselmouth
Copy link

Issues

  • Currently isEvacuateFinished doesn't check if there is any resource with customized rebalance mode in current state. For Pinot usecase this isEvacuateFinished is returning true even when there are resources with customized rebalance mode in current state.

(#200 - Link your issue number here: You can write "Fixes #XXX". Please use the proper keyword so that the issue gets closed automatically. See https://docs.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue
Any of the following keywords can be used: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved)

Description

  • This change returns false if the current state contains any resource with rebalance mode full auto or customized
  • Because isEvactuateFinished also checks if the instance operations is set to evacaute, added another API in HelixAdmin isInstanceDrained to check if the instance doesn't has any resource with rebalance mode full auto or customized in the current state
  • Exposed the isInstanceDrained check via API
  • Added a logic in CustomRebalancer to not update the assigned mapping of instances marked as evacuted.

(Write a concise description including what, why, how)

Tests

  • The following tests are written for this issue:

  • TestInstanceOperation.testEvacuateWithCustomizedResource()

  • TestCustomRebalancer.testDisabledBootstrappingPartitions() have been adjusted appropriately

  • The following is the result of the "mvn test" command on the appropriate module:

mvn test -o -Dtest=TestInstanceOperation -pl=helix-core


[INFO] Tests run: 28, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 597.813 s - in org.apache.helix.integration.rebalancer.TestInstanceOperation
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 28, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] 
[INFO] --- jacoco:0.8.6:report (generate-code-coverage-report) @ helix-core ---
[INFO] Loading execution data file /Users/anubagar/workspace/helix-projects/apache-helix/helix/helix-core/target/jacoco.exec
[INFO] Analyzed bundle 'Apache Helix :: Core' with 962 classes
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  10:11 min
[INFO] Finished at: 2025-04-21T22:23:14+05:30
[INFO] ------------------------------------------------------------------------

(If CI test fails due to known issue, please specify the issue and test PR locally. Then copy & paste the result of "mvn test" to here.)

Changes that Break Backward Compatibility (Optional)

  • My PR contains changes that break backward compatibility or previous assumptions for certain methods or API. They include:

(Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)

Documentation (Optional)

  • In case of new functionality, my PR adds documentation in the following wiki page:

(Link the GitHub wiki you added)

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

@xyuanlu
Copy link
Contributor

xyuanlu commented Apr 22, 2025

Thanks for contributing to open source!
I had a discussion with @MarkGaox, we feel like

  1. Lets keep the behavior of IsEvacuateFinished the same, witch only check full-auto resources
  2. Having the new API isInstanceDrained check all resources, including full-auto, semi-auto and customized. (not only aull-auto and customized)
  3. in customized rebalancer, lets rename the param isInstanceEvacuate to sth like isInstanceAssignableForCustomized and include Liveinstance with Evacuate tag. Currently it includes online and offline instances, we only want online instances.

Comment on lines 137 to 139
InstanceConfig instanceConfig = cache.getInstanceConfigMap().get(instance);
boolean isInstanceEvacuated = instanceConfig != null &&
instanceConfig.getInstanceOperation().getOperation() == InstanceConstants.InstanceOperation.EVACUATE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we rename the boolean field toisAssignableForCustomizedResource? This will make it more extensible for future use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that when an instance is offline but tagged EVACUATE, its partition isn’t dropped. This seems unintended because we should make it be consistent with the old behavior. Could we add a check to ensure nodes marked EVACUATE are live?

isAssignableForCustomizedResource = isLive && hasEvacuateTag

 - Add isInstanceDrained method in HelixAdmin
 - Expose the method via instance update rest end point
 - Change the conditional checks order in isEvacuateFinished to improve
   latency
@proud-parselmouth proud-parselmouth force-pushed the anubagar/add_custom_resources_check_in_isevacuateFinished branch from 24bd881 to 86abe3f Compare May 9, 2025 09:11
Comment on lines +137 to +140
InstanceConfig instanceConfig = cache.getInstanceConfigMap().get(instance);
boolean hasEvacuatedOp = instanceConfig != null &&
instanceConfig.getInstanceOperation().getOperation() == InstanceConstants.InstanceOperation.EVACUATE;
boolean isAssignableForCustomizedResource = cache.getLiveInstances().containsKey(instance) && hasEvacuatedOp;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we leave custom rebalancer away with this logic currently? I think this requires a comprehensive design with custom rebalance users in open source.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants