Fix thread-safety problem of PrimaryKeyLookuper and PrefixKeyLookuper #1915

platinumhamburg · 2025-10-31T02:12:11Z

Purpose

Linked issue: close #1914

Brief change log

Tests

API and Format

Documentation

beryllw · 2025-10-31T02:31:01Z

fluss-client/src/main/java/org/apache/fluss/client/lookup/PrefixKeyLookuper.java


    @Override
-    public CompletableFuture<LookupResult> lookup(InternalRow prefixKey) {
+    public synchronized CompletableFuture<LookupResult> lookup(InternalRow prefixKey) {


Using synchronized on the entire lookup method might hurt client-side lookup efficiency. Maybe we could narrow the lock scope so that it only covers the thread-unsafe components instead?
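To make the suggestion concrete, here is a minimal sketch of a narrower lock. It is not the Fluss implementation: `NarrowLockLookuper`, `reusableEncoder`, and `encoderLock` are hypothetical names, and a `StringBuilder` stands in for whatever thread-unsafe, reusable state the real lookuper holds. Only the encoding step runs under the lock; the asynchronous part does not.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical lookuper used only to illustrate lock scope; not the Fluss API. */
class NarrowLockLookuper {
    // Stand-in for a reusable, thread-unsafe member (e.g. a key encoder).
    private final StringBuilder reusableEncoder = new StringBuilder();
    private final Object encoderLock = new Object();

    CompletableFuture<String> lookup(int key) {
        final String encodedKey;
        // Only the code touching shared mutable state is guarded.
        synchronized (encoderLock) {
            reusableEncoder.setLength(0);
            reusableEncoder.append("key-").append(key);
            encodedKey = reusableEncoder.toString();
        }
        // The asynchronous work runs outside the lock, so concurrent
        // callers do not wait on each other for it.
        return CompletableFuture.supplyAsync(() -> "value-for-" + encodedKey);
    }
}
```

Whether this helps in practice depends on how much of the real `lookup` body touches shared state; if nearly all of it does, the narrower lock would behave much like a synchronized method.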

Hi @beryllw I'll try to optimize that.

@xx789633 What's your solution for optimizing it? IIUC, the Flink lookup join operator should call lookup synchronously. Calls from multiple threads should only happen on retry, which is invoked in a callback.

Hi @luoyuxia @platinumhamburg @beryllw, some members of the lookuper are stateful, so I agree that those members need some lock protection.

But what I'm more curious about is: in what scenarios would Flink perform concurrent lookups using the same lookuper instance? Are there any more detailed logs available for that particular bug?

Hi @luoyuxia, I don't think the retry logic would cause concurrent lookups, because the result future defined in asyncLookup is only completed when the retry finally fails (with resultFuture.completeExceptionally) or succeeds (with resultFuture.complete). In either case, Flink waits for the result future to finish before running its own retry logic.
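The pattern described above can be sketched as follows. This is an illustration under assumptions, not the actual connector code: `RetryingLookup`, `lookupWithRetry`, and `tryOnce` are hypothetical names. The point is that the caller only ever sees `resultFuture`, which is completed exactly once, after the internal retries have either succeeded or been exhausted.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical retry wrapper; illustrates the pattern only. */
class RetryingLookup {

    /** Runs attempt up to maxAttempts times; the returned future completes once. */
    static <T> CompletableFuture<T> lookupWithRetry(
            Supplier<CompletableFuture<T>> attempt, int maxAttempts) {
        CompletableFuture<T> resultFuture = new CompletableFuture<>();
        tryOnce(attempt, maxAttempts, resultFuture);
        return resultFuture;
    }

    private static <T> void tryOnce(
            Supplier<CompletableFuture<T>> attempt,
            int remainingAttempts,
            CompletableFuture<T> resultFuture) {
        attempt.get().whenComplete((value, error) -> {
            if (error == null) {
                // Success: the caller is notified now.
                resultFuture.complete(value);
            } else if (remainingAttempts > 1) {
                // Retry inside the callback; the caller still sees a pending future.
                tryOnce(attempt, remainingAttempts - 1, resultFuture);
            } else {
                // Retries exhausted: the caller is notified of the failure.
                resultFuture.completeExceptionally(error);
            }
        });
    }
}
```

Note that the retries themselves run on whichever thread completes the failed attempt, so any stateful member touched inside `attempt` can still be reached from a thread other than the original caller.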

Hi @xx789633. This case occurs during the tabletServer restart process (cluster upgrade). So far, I have observed that it can be reliably reproduced by restarting the Flink lookup job during the upgrade. I have tried to reproduce it in other scenarios, but without success.

fix thread-safety problem of PrimaryKeyLookuper and PrefixKeyLookuper

6ec6e33

beryllw reviewed Oct 31, 2025

platinumhamburg commented Oct 31, 2025

beryllw Oct 31, 2025

xx789633 Oct 31, 2025

luoyuxia Oct 31, 2025

xx789633 Nov 1, 2025

xx789633 Nov 1, 2025

swuferhong Nov 3, 2025

5 participants
