-
Notifications
You must be signed in to change notification settings - Fork 407
Fix thread-safety problem of PrimaryKeyLoookuper and PrefixKeyLookuper #1915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix thread-safety problem of PrimaryKeyLoookuper and PrefixKeyLookuper #1915
Conversation
|
|
||
| @Override | ||
| public CompletableFuture<LookupResult> lookup(InternalRow prefixKey) { | ||
| public synchronized CompletableFuture<LookupResult> lookup(InternalRow prefixKey) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using synchronized on the entire lookup method might affect client-side lookup efficiency. May be we could narrow the lock scope to only cover thread-unsafe components instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @beryllw I'll try to optimize that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xx789633 What's your solution to optimize it? IIUC, Flink lookup join operator should call lookup synchronously.. The mutiple thread calling should happen when retry happen which is call in a callback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @luoyuxia @platinumhamburg @beryllw some members in lookuper are stateful, so I agree that there should be some lock protection for those members.
But what I'm more curious about is: in what scenarios would Flink perform concurrent lookups using the same lookuper instance? Are there any more detailed logs available for that particular bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @luoyuxia , I don't think the retry logic would cause concurrent lookups because the result future defined in asyncLookup will only be completed when the retry fails (with resultFuture.completeExceptionally) or successes (resultFuture.complete). In either case, Flink will wait for the result future to finish before its own retry logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @luoyuxia @platinumhamburg @beryllw some members in lookuper are stateful, so I agree that there should be some lock protection for those members.
But what I'm more curious about is: in what scenarios would Flink perform concurrent lookups using the same lookuper instance? Are there any more detailed logs available for that particular bug?
Hi, @xx789633. This case occurs during the tabletServer restart process (cluster upgrade). Currently, what I have observed is that it can be stably reproduced when restarting the Flink lookup job during the upgrade. I have tried to reproduce it in other scenarios, but without success.
Purpose
Linked issue: close #1914
Brief change log
Tests
API and Format
Documentation