-
Notifications
You must be signed in to change notification settings - Fork 3.6k
avoid double DownIndirectlyConnected when indirectly-connected is between independently-known unreachable #32794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…t change overall reachability
patriknw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a good optimization with the disclaimer that it might have unforeseen consequences.
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add more tests that are covering similar thing for the other strategies? KeepOldest, DownAllNodes, LeaseMajority
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would also be good to expand or add new multi-jvm test that cover this scenario. You can run locally with:
multi-jvm:testOnly akka.cluster.sbr.IndirectlyConnected5NodeSpec
Note that we also have SBR tests in akka-cluster-sharding/src/multi-jvm/scala/akka/cluster/sbr/
| val unreachableByDirectlyConnectedObservers = | ||
| reachability.records.iterator.flatMap { r => | ||
| if (directlyConnectedObservers(r.observer)) Some(r.subject) else None | ||
| }.toSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't that a collect?
reachability.records.iterator.collect { case r if directlyConnectedObservers(r.observer) =>
r.subject
}.toSet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I tend to forget about collect...
| throw new IllegalStateException( | ||
| s"SBR double $additionalDecision decision, downing all instead. " + | ||
| s"originalReachability: [$originalReachability], filtered reachability [$reachability], " + | ||
| s"still indirectlyConnected: [$indirectlyConnected], seenBy: [$seenBy]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps we should keep this message around and include it in the final IllegalStateException, so that we have full information about both decisions?
Scenario:
Thus A is indirectly connected (both a subject and observer of unreachability) but B is not. SBR resolution will be
DownIndirectlyConnected(adding A to the downing decision) and then consider a secondary decision.In considering that decision, unreachability edges between nodes where both are in the set of observers and subjects are removed, as are all edges where both of "the subject has seen the latest gossip" and "observer has observed a subject which had seen the latest gossip" are true. If indirectly connected nodes remain, the decision effectively becomes
DownAll. These nodes can only be indirectly connected by the first criterion (intersection between observers and subjects): the definition of the second criterion (unreachable but seen latest gossip) is always removed (as any observers of that subject are also indirectly connected by the second criterion).In this scenario, the SBR on the other side from A & B will
DownAll: all nodes on either side of the partition after this are either crashed or downed.The double-down-indirectly-connected is safe, but it seems to be literal overkill.