KAFKA-17999: Fix flaky test DynamicConnectionQuotaTest.testDynamicConnectionQuota #20657
[KAFKA-17999] Deflake `DynamicConnectionQuotaTest` by unifying loopback family and stabilizing connection-count waits

What
This PR eliminates nondeterminism in `DynamicConnectionQuotaTest` caused by IPv4/IPv6 resolution differences and timing sensitivity in connection accounting.

JIRA
KAFKA-17999
Reproduction of flakiness
To force the mismatch and make the original test fail locally, bias the system toward IPv6 and ensure that `"localhost"` resolves to `::1` while the test counts on `127.0.0.1`. These steps modify `/etc/hosts`; revert instructions are below.

Back up and edit `/etc/hosts` to favor IPv6 localhost

Force the JVM to prefer IPv6 and disable DNS caching

Run the test repeatedly
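Before running the test, it can help to confirm what the JVM actually resolves `localhost` to under the current settings. The probe below is a minimal sketch and not part of this PR: `LocalhostProbe` and its methods are hypothetical names, and it assumes IPv6 preference is set through the standard `java.net.preferIPv6Addresses` system property. Note that Java prints `::1` in its uncompressed form, `0:0:0:0:0:0:0:1`.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Kafka or of this PR.
class LocalhostProbe {

    // Classifies an already-resolved address by family.
    static String family(InetAddress address) {
        return (address instanceof Inet6Address) ? "IPv6" : "IPv4";
    }

    // Parses an IP literal (literals are only format-checked, not looked up)
    // and reports its family.
    static String familyOfLiteral(String literal) {
        try {
            return family(InetAddress.getByName(literal));
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Cannot parse: " + literal, e);
        }
    }

    // Lists every address the JVM returns for a host name, in the order returned.
    static List<String> resolve(String host) {
        List<String> out = new ArrayList<>();
        try {
            for (InetAddress a : InetAddress.getAllByName(host)) {
                out.add(a.getHostAddress() + " (" + family(a) + ")");
            }
        } catch (UnknownHostException e) {
            out.add("unresolvable: " + e.getMessage());
        }
        return out;
    }

    public static void main(String[] args) {
        // Both properties print "null" when they were not set on the command line.
        System.out.println("java.net.preferIPv6Addresses="
                + System.getProperty("java.net.preferIPv6Addresses"));
        System.out.println("java.net.preferIPv4Stack="
                + System.getProperty("java.net.preferIPv4Stack"));
        System.out.println("localhost -> " + resolve("localhost"));
    }
}
```

If the first address listed for `localhost` is the IPv6 loopback while the test counts on `127.0.0.1`, the mismatch described in this PR should be in effect.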
Observed: The test fails on every alternate run with `java.net.SocketException: Broken pipe` when the broker applies the per-IP limit to one literal while the client connects via another, causing unexpected disconnects at the quota boundary.

IMPORTANT (do this at the end, once you have verified the stability of the fix): Revert the host/network changes after reproducing.
Why this flakes
Environment-dependent IPv4/IPv6 mismatch
The test previously used `connectionCount(localAddress)`, where `localAddress` was `127.0.0.1` (IPv4), but `connect()` dialed `"localhost"`, which may resolve to either `127.0.0.1` or `::1` depending on the machine’s `/etc/hosts` and JVM preferences.

Because Kafka enforces connection quotas per literal remote IP, counting on `127.0.0.1` while connecting via `::1` (or vice versa) breaks the test’s assumptions and leads to intermittent failures.

How (the fix)
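To make the mismatch concrete, and to show what using a single literal changes, here is a toy per-IP counter. It is only a model of "count connections per literal remote IP": `PerIpCounter` is a hypothetical class and is not Kafka's quota implementation.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

// Toy model only: counts connections per remote IP. NOT Kafka's quota code.
class PerIpCounter {
    private final Map<InetAddress, Integer> counts = new HashMap<>();

    // Records one accepted connection from the given remote address.
    void onConnect(InetAddress remote) {
        counts.merge(remote, 1, Integer::sum);
    }

    // Returns how many connections were recorded for exactly this address.
    int connectionCount(InetAddress remote) {
        return counts.getOrDefault(remote, 0);
    }

    // Parses an IP literal without a checked exception at the call site.
    static InetAddress literal(String ip) {
        try {
            return InetAddress.getByName(ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(ip, e);
        }
    }

    public static void main(String[] args) {
        InetAddress v4 = literal("127.0.0.1");
        InetAddress v6 = literal("::1");

        PerIpCounter counter = new PerIpCounter();
        // Simulates a client that dialed "localhost" and was resolved to ::1.
        counter.onConnect(v6);

        // Counting on a different literal than the one dialed sees nothing.
        System.out.println("count on 127.0.0.1 = " + counter.connectionCount(v4)); // 0
        // Counting on the literal that was actually dialed sees the connection.
        System.out.println("count on ::1 = " + counter.connectionCount(v6)); // 1
    }
}
```

The two loopback addresses are distinct keys, so a baseline or limit taken on one says nothing about connections arriving from the other.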
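- Detect the loopback literal in use (`::1` vs `127.0.0.1`) and use that same literal everywhere in the test.
- [...] `setUp()` to avoid first-use side effects before taking connection-count baselines.
- Use the detected literal for dialing in `connect()` (client sockets), for the per-IP override (`MAX_CONNECTIONS_PER_IP_OVERRIDES_CONFIG`), and for counting (`connectionCount(localAddress)`).

Verification of stability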
With the IPv6-favored setup above, the test now consistently passes because counting, dialing, and the per-IP override all use the same detected literal.
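As a sketch of what "the same detected literal" could look like in code, the example below resolves `localhost` once and then dials that address directly, so the peer address seen by the server matches the address a test would count on. This assumes detection means a single `localhost` lookup; `LoopbackLiteral`, `detect()`, and `dialAndReportPeer()` are hypothetical names and the actual test code may differ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.UnknownHostException;

// Hypothetical sketch; names are illustrative and do not come from the PR.
class LoopbackLiteral {

    // Resolves "localhost" once; the result is then reused everywhere.
    // Falls back to the JVM's default loopback address if the lookup fails.
    static InetAddress detect() {
        try {
            return InetAddress.getByName("localhost");
        } catch (UnknownHostException e) {
            return InetAddress.getLoopbackAddress();
        }
    }

    // Binds a throwaway server on the literal, dials the literal (not the
    // host name), and returns the remote address the server observed.
    static InetAddress dialAndReportPeer(InetAddress literal) {
        try (ServerSocket server = new ServerSocket(0, 50, literal);
             Socket client = new Socket(literal, server.getLocalPort());
             Socket accepted = server.accept()) {
            return accepted.getInetAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InetAddress literal = detect();
        InetAddress peer = dialAndReportPeer(literal);
        System.out.println("detected literal:    " + literal.getHostAddress());
        System.out.println("peer seen by server: " + peer.getHostAddress());
        System.out.println("same literal:        " + literal.equals(peer));
    }
}
```

Dialing the resolved address instead of the host name removes the second, independent lookup that could otherwise pick a different address family.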
Scope