KAFKA-19735: Add automatic commit offset caching in subscribe mode #20669
base: trunk
@xijiu could you share the benchmark of your scenario with us?
Sure. I ran the test with the configuration auto.commit.interval.ms = 100, under two scenarios: cache enabled and cache disabled. The test ran continuously for one minute, after which I observed the LEO of
I think the cache should ideally take effect only in subscribe mode. In assign mode, the current consumer is not the only one that can modify the offset of a given TopicPartition; the Admin can modify it too. Consider this scenario: the consumer has cached offset 10, so every subsequent request to commit offset 10 returns success immediately without being sent to the broker. If the Admin is then used to set the offset to 11, the consumer will not be aware of the change. The consumer ends up caching an invalid offset, which is inconsistent with expectations. Additionally, although we could also check for a cache hit in manual commit mode, a manual commit is an explicit user action, so I feel it is better to send the request to the broker. Alternatively, we can consider adding cache support for manual commits later, once this PR stabilizes.
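To make the assign-mode concern concrete, here is a minimal simulation of the scenario described above. It is only a sketch: the class name `StaleCacheScenario`, the `run` method, and the map standing in for the broker's committed offsets are all hypothetical and are not code from this PR or from the Kafka client.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleCacheScenario {

    /**
     * Replays the scenario and returns the offset the simulated broker
     * holds at the end. All names here are illustrative assumptions.
     */
    public static long run(boolean cacheEnabled) {
        // Stand-in for the broker's committed offsets, keyed by "topic-partition".
        Map<String, Long> broker = new HashMap<>();
        String tp = "topic12-0";

        // 1. The consumer commits offset 10; the request reaches the broker
        //    and the consumer remembers what it committed.
        broker.put(tp, 10L);
        Long cached = 10L;

        // 2. The Admin sets the offset to 11; the consumer is not notified.
        broker.put(tp, 11L);

        // 3. The consumer commits offset 10 again.
        long toCommit = 10L;
        boolean cacheHit = cacheEnabled && Long.valueOf(toCommit).equals(cached);
        if (!cacheHit) {
            // Without a cache hit the request is actually sent to the broker.
            broker.put(tp, toCommit);
        }
        return broker.get(tp);
    }

    public static void main(String[] args) {
        // With the cache, the commit is skipped and the consumer's view (10)
        // no longer matches the broker (11).
        System.out.println("cache enabled  -> broker offset " + run(true));
        // Without the cache, the commit is sent and the broker holds 10 again.
        System.out.println("cache disabled -> broker offset " + run(false));
    }
}
```

With the cache enabled, the consumer's cached value (10) and the broker's value (11) diverge, which is the inconsistency described in the comment above.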
Hi @xijiu, is there a comparison for the normal case (where the consumer can poll and get data from the broker)?
Thanks for the reply, that's a great suggestion. I will create a comparison chart of the benchmark results and share it later. That said, I don't expect the change to have any impact on performance.
@TaiJuWu I conducted a simple benchmark test. First, I launched a cluster consisting of 3 brokers, then created a topic with 12 partitions named topic12 using the following command:
Next, I sent a sufficient amount of data to topic12: each message was 1 MB, for a total of approximately 100 GB. After that, I ran consumer stress tests on the trunk branch and the 19735 branch respectively, using the command:
The aggregated consumer throughput results are as follows: [chart: aggregated consumer throughput, trunk vs. 19735 branch] The performance of the two is nearly identical.
The results LGTM. Thanks for sharing and for your hard work.
A label of 'needs-attention' was automatically added to this PR in order to raise the
Add a cache for the automatic offset commit operation. If the offsets to
be committed are identical between two consecutive commits, the cache
will be hit and a success response will be returned quickly. Note: This
only applies to automatic offset commit operations in subscribe mode.
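
The idea in the description can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: the class `AutoCommitCacheSketch`, its methods, the `subscribeMode` flag, and the use of a `String` key in place of a real TopicPartition are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class AutoCommitCacheSketch {
    // Offsets sent in the last automatic commit; null means "nothing cached".
    // Keys are "topic-partition" strings standing in for TopicPartition.
    private Map<String, Long> lastCommitted = null;

    // Counts simulated requests that would have gone to the broker.
    private int brokerRequests = 0;

    /**
     * Simulates one automatic commit.
     * Returns true if it was answered from the cache, false if it was "sent".
     */
    public boolean autoCommit(Map<String, Long> offsets, boolean subscribeMode) {
        // The cache is consulted only in subscribe mode.
        if (subscribeMode && offsets.equals(lastCommitted)) {
            return true; // identical to the previous commit: report success at once
        }
        brokerRequests++; // stands in for sending the commit request to the broker
        // Remember what was committed, but only when the cache may be used.
        lastCommitted = subscribeMode ? new HashMap<>(offsets) : null;
        return false;
    }

    /** Drops the cached offsets, e.g. when the set of owned partitions changes. */
    public void invalidate() {
        lastCommitted = null;
    }

    public int brokerRequests() {
        return brokerRequests;
    }

    public static void main(String[] args) {
        AutoCommitCacheSketch cache = new AutoCommitCacheSketch();
        Map<String, Long> offsets = new HashMap<>();
        offsets.put("topic12-0", 10L);

        cache.autoCommit(offsets, true);   // miss: first commit is sent
        cache.autoCommit(offsets, true);   // hit: same offsets as before
        offsets.put("topic12-0", 11L);
        cache.autoCommit(offsets, true);   // miss: offset advanced

        System.out.println("broker requests: " + cache.brokerRequests()); // prints "broker requests: 2"
    }
}
```

When the consumer is idle and its position does not advance, consecutive automatic commits carry identical offsets, so in this sketch only the first one would reach the broker.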