
Add support for listing Kafka offsets in bulk #26168


Open
wants to merge 2 commits into master

Conversation


@pmw-rp pmw-rp commented Jul 10, 2025

Description

This PR modifies how the Trino Kafka integration translates timestamps into offsets.

The current implementation makes one Kafka API call per partition to translate a timestamp. However, the API accepts a list of partitions in a single call, allowing a bulk translation.

By switching to a bulk operation, the number of API calls is significantly reduced, improving query startup time.
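For illustration, here is a minimal sketch of the per-partition versus bulk lookup, assuming the standard KafkaConsumer.offsetsForTimes API (the class and method names below are illustrative, not the PR's actual code):

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

final class BulkOffsetLookupSketch
{
    // Before: one offsetsForTimes() network round trip per partition.
    static Map<TopicPartition, OffsetAndTimestamp> perPartition(
            KafkaConsumer<byte[], byte[]> consumer, List<TopicPartition> partitions, long timestampMillis)
    {
        Map<TopicPartition, OffsetAndTimestamp> offsets = new HashMap<>();
        for (TopicPartition partition : partitions) {
            // Each call here is a separate request to the brokers
            offsets.putAll(consumer.offsetsForTimes(Map.of(partition, timestampMillis)));
        }
        return offsets;
    }

    // After: a single offsetsForTimes() call covering all partitions.
    static Map<TopicPartition, OffsetAndTimestamp> bulk(
            KafkaConsumer<byte[], byte[]> consumer, List<TopicPartition> partitions, long timestampMillis)
    {
        Map<TopicPartition, Long> query = new HashMap<>();
        partitions.forEach(partition -> query.put(partition, timestampMillis));
        return consumer.offsetsForTimes(query);
    }
}

For a topic with N partitions, the bulk variant collapses N broker requests into a single one.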

Release notes

(X) This is not user-visible or is docs only, and no release notes are required.

Since the only impact on end users is improved query performance, release notes are probably optional.


cla-bot bot commented Jul 10, 2025

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@github-actions github-actions bot added the kafka Kafka connector label Jul 10, 2025
@findinpath findinpath requested a review from wendigo July 10, 2025 20:41
Contributor

Change the commit message from "pull all partition offsets in a single call to Kafka." to "Retrieve partition offsets in bulk".

@@ -37,6 +37,7 @@
import org.apache.kafka.common.config.ConfigResource;
Contributor

Squash the two commits into one

@@ -37,6 +37,7 @@
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
Contributor

In the description

"By changing the call to a bulk operation, the number of API calls can be significantly reduced, improving query startup time."

please add some specific numbers to help reviewers understand the impact of this change.

Contributor

findinpath commented Jul 10, 2025

https://github.com/trinodb/trino/actions/runs/16201941001/job/45742893962?pr=26168

Commit 97525136936f7faffd10b4ed3519939d170416e1 is a merge commit: https://api.github.com/repos/trinodb/trino/commits/97525136936f7faffd10b4ed3519939d170416e1
Error: PR requires a rebase. Found: 1 merge commits.
git rebase origin/master

Comment on lines +189 to +195
Map<TopicPartition, Long> topicPartitionOffsets = new HashMap<>();
topicPartitionOffsetAndTimestamps.forEach((topicPartition, offsetAndTimestamp) -> {
    if (offsetAndTimestamp != null) {
        topicPartitionOffsets.put(topicPartition, offsetAndTimestamp.offset());
    }
});
return topicPartitionOffsets;
Contributor

return topicPartitionOffsetAndTimestamps.entrySet().stream()
        .filter(entry -> entry.getValue() != null)
        .collect(Collectors.toMap(Map.Entry::getKey, entry -> entry.getValue().offset()));
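The filter step is also load-bearing here: offsetsForTimes returns a null OffsetAndTimestamp for partitions with no matching offset, and Collectors.toMap throws NullPointerException on null values.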

Comment on lines +126 to +129
Map<TopicPartition, Long> partitionBeginTimestamps = new HashMap<>();
partitionBeginOffsets.forEach((partition, partitionIndex) -> {
    partitionBeginTimestamps.put(partition, offsetTimestampRanged.get().begin());
});
Contributor

long partitionBeginTimestamp = floorDiv(offsetTimestampRanged.get().begin(), MICROSECONDS_PER_MILLISECOND);
Map<TopicPartition, Long> partitionBeginTimestamps = partitionBeginOffsets.entrySet().stream()
        .collect(Collectors.toMap(Map.Entry::getKey, _ -> partitionBeginTimestamp));
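(Presumably the floorDiv by MICROSECONDS_PER_MILLISECOND is needed because offsetsForTimes expects epoch milliseconds, while the pushed-down Trino timestamp is in microseconds.)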

Contributor

With this change there is no need to mutate the map anymore with

timestamps.replaceAll((k, v) -> floorDiv(v, MICROSECONDS_PER_MILLISECOND));

in the findOffsetsForTimestampGreaterOrEqual method, since the values are already converted to milliseconds when the map is built.

@@ -172,11 +182,17 @@ private boolean isTimestampUpperBoundPushdownEnabled(ConnectorSession session, S
return KafkaSessionProperties.isTimestampUpperBoundPushdownEnabled(session);
}

-    private static Optional<Long> findOffsetsForTimestampGreaterOrEqual(KafkaConsumer<byte[], byte[]> kafkaConsumer, TopicPartition topicPartition, long timestamp)
+    private static Map<TopicPartition, Long> findOffsetsForTimestampGreaterOrEqual(KafkaConsumer<byte[], byte[]> kafkaConsumer, Map<TopicPartition, Long> timestamps)
Contributor

Optional: maybe we could consider returning Map<TopicPartition, Optional<Long>> instead.

It is better to avoid having null values.
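As a sketch of that alternative (illustrative, not the reviewer's code), the null check could move into the collected value, since an Optional is never itself null:

return topicPartitionOffsetAndTimestamps.entrySet().stream()
        .collect(Collectors.toMap(
                Map.Entry::getKey,
                entry -> Optional.ofNullable(entry.getValue()).map(OffsetAndTimestamp::offset)));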

Labels
kafka Kafka connector