-
Notifications
You must be signed in to change notification settings - Fork 1k
PHOENIX-7566 ZK to SystemTable Sync and Event Reactor for Failover #2302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: PHOENIX-7562-feature
Are you sure you want to change the base?
Conversation
ccdad34
to
cf86560
Compare
* @param fromState the previous state before the transition | ||
* can be null for initial state. | ||
* Also, can be inaccurate in case there is | ||
* connection loss to ZK and multiple state changes happen in between. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true? Are not we using persistent watchers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in case of connection loss there can be events which are missed. Although this doesn't affect failover IMO, clients need to be aware of this limitation.
* Syncs data from ZooKeeper (source of truth) to the system table. | ||
* This method is called periodically to ensure consistency. | ||
*/ | ||
private void syncZKToSystemTable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of syncing every time, can we check if sync needed by reading the HA record from the HA group system table first, and then sync only if needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done and also added IT to check the same
…checking system table record before writing
= systemTableRecord.getClusterRole().getDefaultHAGroupState(); | ||
Long lastSyncTimeInMs | ||
= defaultHAGroupState.equals(HAGroupStoreRecord.HAGroupState.ACTIVE_NOT_IN_SYNC) | ||
? System.currentTimeMillis() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the ZNode for the HA group is not present, should we set lastSyncTimeInMs to zero always? I do not think it is safe to set to System.currentTimeMillis(). Also what does setting to null mean here? Is it equivalent to setting to zero?
Summary of changes
Event Reactor for Failover, this reactor observes state transitions and makes state updates accordingly.
a. Add peer event reactor (for reaction to peer state transitions)
b. Add local event reactor (for reaction to local state transitions)
New flow to sync between ZK and HA System Table
a. ZK is the source of truth for config and state both. But only local state is maintained in ZK.
b. Updating System table will be best effort basis
c. Regular job with jitter to sync from ZK to System Table in case of missed update.
d. Introduced adminVersion to handle any admin updates to config.
Removed Degraded for reader/writer in favor of Degraded Standby
In update calls check if the state is already updated, double checking now once with local cache and once with direct ZK. This is needed as all the HAGroupStoreManagers will try to update the state on same trigger.
Added fromState to event subscription callback. Note that fromState can be null when ZNode is initialized.
Not blocking any transitions to HA_GROUP table in IndexRegionObserver even if we are in ACTIVE_TO_STANDBY state.
Testing
a. E2E Failover Happy Path
b. E2E Failover with Abort
c. Peer Reaction when ACTIVE moves from Sync to Store&Forward mode and back
a. Existing tests passing
b. Added test for regular job with jitter to sync from ZK to System Table in case of missed update.