
Conversation

@xijiu (Collaborator) commented Oct 9, 2025

Add a cache for the automatic offset commit operation. If the offsets to be committed are identical between two consecutive commits, the cache is hit and a success response is returned immediately, without sending an OffsetCommit request to the broker. Note: this only applies to automatic offset commits in subscribe mode.
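To illustrate the idea, here is a minimal sketch of such a cache (the class and method names are hypothetical, not the PR's actual code): remember the offsets from the last acknowledged commit and short-circuit an auto-commit whose offsets are identical.

```java
// Hypothetical sketch of the caching idea, not the PR's implementation:
// skip the OffsetCommit RPC when the offsets to auto-commit are identical
// to the ones the broker last acknowledged.
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class AutoCommitCache {
    // Offsets from the last auto-commit that the broker acknowledged.
    private Map<TopicPartition, OffsetAndMetadata> lastCommitted = new HashMap<>();

    /** Returns true if the RPC can be skipped because nothing changed. */
    public boolean canSkipCommit(Map<TopicPartition, OffsetAndMetadata> toCommit) {
        return toCommit.equals(lastCommitted);
    }

    /** Called only after the broker confirms the commit succeeded. */
    public void onCommitSuccess(Map<TopicPartition, OffsetAndMetadata> committed) {
        lastCommitted = new HashMap<>(committed);
    }

    /** Invalidate whenever the cached value may have gone stale. */
    public void reset() {
        lastCommitted = new HashMap<>();
    }
}
```

Any real implementation also has to invalidate the cache whenever the cached offsets can go stale, which is what the assign-mode discussion below is about.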

github-actions bot added the triage, PRs from the community, consumer, and clients labels on Oct 9, 2025
@chia7712 (Member) commented Oct 9, 2025

@xijiu could you share the benchmark of your scenario with us?

@xijiu (Collaborator, Author) commented Oct 9, 2025

> @xijiu could you share the benchmark of your scenario with us?

Sure.

I ran the test with auto.commit.interval.ms = 100 under two scenarios: cache enabled and cache disabled. The test ran continuously for one minute, after which I observed the LEO of __consumer_offsets and the number of OFFSET_COMMIT RPCs.
[Screenshots: LEO of __consumer_offsets and OFFSET_COMMIT RPC counts, with the cache enabled vs. disabled]

@xijiu (Collaborator, Author) commented Oct 9, 2025

I think the cache should only take effect in subscribe mode. In assign mode, the committed offset of a TopicPartition can be modified not only by the consumer itself but also by the Admin client. Consider this scenario: the consumer has cached offset 10, so all subsequent commits of offset 10 return successfully right away without being sent to the broker. If the Admin client is then used to set the offset to 11, the consumer will not be aware of the change. As a result, the consumer keeps answering commits from a stale cache, which is inconsistent with expectations. The sketch below makes this concrete.
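To make the hazard concrete, here is a sketch using the Admin client (the bootstrap server, group, and topic names are placeholders): an external process moves the group's committed offset, and a consumer-side cache that still holds the old value would never observe the change.

```java
// Sketch of the assign-mode hazard described above (hypothetical names).
// A consumer in assign mode has cached committed offset 10 for tp and
// keeps short-circuiting commits of 10; meanwhile an external Admin
// client moves the group's offset behind the consumer's back.
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ExternalOffsetChange {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        TopicPartition tp = new TopicPartition("topic12", 0);
        try (Admin admin = Admin.create(props)) {
            // Sets the group's committed offset to 11; a consumer-side
            // cache still holding 10 is now silently stale.
            admin.alterConsumerGroupOffsets("groupXX",
                    Map.of(tp, new OffsetAndMetadata(11L))).all().get();
        }
    }
}
```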

Additionally, although we could also check the cache on manual commits, a manual commit is a deliberate user action, so it is better to send the request to the broker. Alternatively, we could consider adding cache support for manual commits later, once this PR stabilizes.

@TaiJuWu (Collaborator) commented Oct 9, 2025

Hi @xijiu, is there a comparison for the normal case (where the consumer polls and receives data from the broker)?
If there is no obvious performance degradation, I think this is a great improvement.

@xijiu (Collaborator, Author) commented Oct 9, 2025

> Hi @xijiu, is there a comparison for the normal case (where the consumer polls and receives data from the broker)? If there is no obvious performance degradation, I think this is a great improvement.

Thanks for the reply; that's a great suggestion. I will put together a comparison chart of the benchmark results and share it later, though I don't expect any impact on performance.

@xijiu (Collaborator, Author) commented Oct 9, 2025

@TaiJuWu I ran a simple benchmark. First, I launched a cluster of 3 brokers, then created a 12-partition topic named topic12 with the following command:

sh kafka-topics.sh --bootstrap-server 10.255.225.107:9092 --create --topic topic12 --partitions 12 --replication-factor 2

Next, I sent a sufficient amount of data to topic12: each message was 1 MB, roughly 100 GB in total. I then ran consumer stress tests against the trunk branch and the 19735 branch, using the command:

sh kafka-consumer-perf-test.sh --bootstrap-server 10.255.225.107:9092 --topic topic12 --command-property auto.offset.reset=earliest --num-records 1000000 --group groupXX

The aggregated consumer throughput results are as follows:

[Screenshot: aggregated consumer throughput, trunk vs. the 19735 branch]

The performance of the two is nearly identical.

@TaiJuWu (Collaborator) commented Oct 9, 2025

> @TaiJuWu I ran a simple benchmark. First, I launched a cluster of 3 brokers, then created a 12-partition topic named topic12 with the following command:
>
> sh kafka-topics.sh --bootstrap-server 10.255.225.107:9092 --create --topic topic12 --partitions 12 --replication-factor 2
>
> Next, I sent a sufficient amount of data to topic12: each message was 1 MB, roughly 100 GB in total. I then ran consumer stress tests against the trunk branch and the 19735 branch, using the command:
>
> sh kafka-consumer-perf-test.sh --bootstrap-server 10.255.225.107:9092 --topic topic12 --command-property auto.offset.reset=earliest --num-records 1000000 --group groupXX
>
> The aggregated consumer throughput results are as follows:
>
> [Screenshot: aggregated consumer throughput, trunk vs. the 19735 branch]
>
> The performance of the two is nearly identical.

The results LGTM. Thanks for sharing and for your hard work.

github-actions bot commented

A label of 'needs-attention' was automatically added to this PR in order to raise the attention of the committers. Once this issue has been triaged, the triage label should be removed to prevent this automation from happening again.

