fix: set replication factor for kafka stability #1606
base: develop
Conversation
@Mokto Please review the changes.
Is it backward compatible?
Yes, the sentry-kafka-provisioning job creates topics using the --if-not-exists flag, so existing topics remain unaffected when the job runs; the new replication setting only applies to newly created topics.
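As a sketch, the provisioning job issues commands of roughly this shape (the topic name, partition count, and bootstrap address here are illustrative, not taken from the chart):

```shell
# Create the topic only if it does not already exist; an existing topic
# keeps its current replication factor and is left untouched.
kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --if-not-exists \
  --topic ingest-events \
  --partitions 1 \
  --replication-factor 3
```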
There are two ways to apply the new configuration to an existing cluster: delete and recreate the affected topics, or manually increase the replication factor of each existing topic via partition reassignment.
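For the manual route, a hedged sketch of raising the replication factor of one existing topic with the stock Kafka tooling (topic name and broker IDs are illustrative):

```shell
# reassign.json assigns three replicas to partition 0 of the topic;
# a real reassignment must list every partition of every topic to change.
cat > reassign.json <<'EOF'
{
  "version": 1,
  "partitions": [
    { "topic": "ingest-events", "partition": 0, "replicas": [0, 1, 2] }
  ]
}
EOF

kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file reassign.json --execute
```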
@fedeabih change code
to
and add
@Mokto I tested upgrade
force-pushed from ae1c4e5 to 27099dd
@patsevanton Done, let me know if the comment is what you were looking for. Thanks!
@fedeabih Is it backward compatible? Try installing Sentry with default parameters, then change the ReplicationFactor value to 3, then change the ReplicationFactor value back to 1.
I do not know how to properly test this pull request. I tested it like this:
sentry install:
replicationFactor=3:
replicationFactor=1:
All pods work.
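The upgrade sequence above could be sketched with helm roughly as follows; the release name, namespace, and exact values key are assumptions and not verified against the chart:

```shell
# Fresh install with chart defaults (assumed release name and namespace)
helm install sentry sentry/sentry -n sentry

# Raise the replication factor to 3, then set it back to 1
# (the values key below is an assumption, check the chart's values.yaml)
helm upgrade sentry sentry/sentry -n sentry --set kafka.replicationFactor=3
helm upgrade sentry sentry/sentry -n sentry --set kafka.replicationFactor=1
```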
@patsevanton Yes, you can see the full explanation here. If you already have a setup with a ReplicationFactor of 1 and you apply the new configuration (ReplicationFactor 3), the existing topics will stay at replication factor 1 unless you delete and recreate them, as I mentioned in the comment, or manually modify each topic. More information about Replication Factor:
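To check which replication factor an existing topic actually has after an upgrade, one can describe it with the standard Kafka CLI (topic name and bootstrap address are illustrative):

```shell
# The ReplicationFactor and Isr columns show how many replicas each
# partition currently has, regardless of the chart's new setting.
kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --topic ingest-events
```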
Description
This change resolves the issue `Failed to get watermark offsets: Local: Unknown partition`. The root cause was related to the Kafka replication configuration. By setting the replicationFactor to 3 (matching the number of Kafka brokers/controllers), this fix ensures consistent behavior when retrieving high watermark offsets. This issue was reported in sentry-kubernetes/charts#1458.

Technical Explanation
The issue arises because the replicationFactor was previously set to 1, meaning that each partition only had a single replica. In this configuration, the high watermark offset—a key value in Kafka that indicates the maximum offset successfully replicated to all in-sync replicas (ISRs)—becomes unreliable.
Without sufficient replication, the loss of a single broker or temporary unavailability can result in Kafka being unable to compute or provide the high watermark for affected partitions. This leads to the error:
Failed to get watermark offsets: Local: Unknown partition.
By increasing the replicationFactor from 1 to 3, each partition is replicated across all three brokers/controllers. This ensures that the high watermark offset remains consistently available, even if a broker becomes unavailable or experiences minor instability. Additionally, the increased replication enhances fault tolerance and improves the overall availability of partition data across the cluster.
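The high watermark described above can also be queried directly; with three in-sync replicas the query keeps succeeding even while one broker is down. A sketch with the stock Kafka tooling (topic and address are illustrative):

```shell
# --time -1 requests the latest offset, i.e. the high watermark,
# for every partition of the topic.
kafka-get-offsets.sh --bootstrap-server localhost:9092 \
  --topic ingest-events --time -1
```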
For more details on how Kafka replication works and the role of the high watermark, refer to the official documentation: Replication in Apache Kafka.