Replies: 1 comment 1 reply
-
@malmetom that's not enough information for us to conclude much. All we can tell is that a quorum queue follower is behind the leader, so it asks the leader for the delta needed to reinstall the follower's Raft log. We don't know anything about what happens on the leader. Without full logs from all cluster nodes or a reasonably reliable way to reproduce, we can only guess as to what's going on, and we do not guess in this community. Guessing is a very expensive way of troubleshooting distributed infrastructure. This could be a different manifestation of #13101, which is a known open issue for quorum queues. #14237 and #14241 are two competing solutions that we'll get to some time after
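Not an official troubleshooting procedure, but as a starting point for gathering that kind of information: a minimal Go sketch that asks the management HTTP API which replica is currently the queue's leader and which members it has. It assumes the management plugin is reachable on the default port 15672; the host, credentials and queue name are placeholders, and the `leader`/`members` fields are what recent releases report for quorum queues.

```go
// Query the management HTTP API for a quorum queue's current leader and
// members. Sketch only: host, credentials and queue name are placeholders.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/url"
)

func main() {
	vhost := url.PathEscape("/") // the default vhost, "%2F" in the API path
	queue := "xxx"

	req, err := http.NewRequest("GET",
		fmt.Sprintf("http://10.20.222.109:15672/api/queues/%s/%s", vhost, queue), nil)
	if err != nil {
		log.Fatal(err)
	}
	req.SetBasicAuth("USER", "PASSWORD")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var q map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&q); err != nil {
		log.Fatal(err)
	}

	// For quorum queues the queue object includes the current Raft leader
	// and the member nodes (field names may differ across versions).
	fmt.Println("leader: ", q["leader"])
	fmt.Println("members:", q["members"])
}
```

Running `rabbitmq-queues quorum_status` for the queue on one of the nodes gives similar per-member Raft state, which is the kind of detail worth collecting alongside full logs from every node.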
-
Getting exceptions from a client trying to consume from a quorum queue. The client error states:
Exception (541) Reason: "INTERNAL_ERROR - timed out consuming from quorum queue 'xxx' in vhost '/': {'%2F_yyy'}
The logs on one RabbitMQ node state:
2025-10-03 22:01:37.917907+02:00 [info] <0.863.0> queue 'xxx' in vhost '/': term mismatch - follower had entry at 658 with term 81 but not with term 82. Asking leader {'%2F_yyy','[email protected]'} to resend from 659
2025-10-03 22:02:07.919957+02:00 [info] <0.863.0> queue 'xxx' in vhost '/': term mismatch - follower had entry at 658 with term 81 but not with term 82. Asking leader {'%2F_yyy','[email protected]'} to resend from 659
2025-10-03 22:02:11.782388+02:00 [error] <0.188296.0> Error on AMQP connection <0.188296.0> (10.20.128.0:60716 -> 10.20.222.109:5672, vhost: '/', user: 'USER', state: running), channel 5:
2025-10-03 22:02:11.782388+02:00 [error] <0.188296.0> operation basic.consume caused a connection exception internal_error: "timed out consuming from quorum queue 'xxx' in vhost '/': {'%2F_yyy',\n '[email protected]'}"
2025-10-03 22:02:11.806517+02:00 [info] <0.188296.0> closing AMQP connection (10.20.128.0:60716 -> 10.20.222.109:5672, vhost: '/', user: 'USER', duration: '2M, 15s')
This happens again and again, and it seems like RabbitMQ cannot recover. It happens after a network partition. Any idea whether this is a potential bug or incorrect use of quorum queues?
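For context, the consumer side is an ordinary basic.consume against the quorum queue. Below is a minimal sketch (not the actual application code) of the consume-with-reconnect loop involved, using the github.com/rabbitmq/amqp091-go client; the URL, queue name and 5-second backoff are placeholders.

```go
// Consume from a quorum queue and re-dial after the connection is dropped,
// e.g. by the 541 internal_error "timed out consuming from quorum queue"
// exception shown above. Sketch only; values are placeholders.
package main

import (
	"log"
	"time"

	amqp "github.com/rabbitmq/amqp091-go"
)

func consumeWithRetry(url, queue string) {
	for {
		conn, err := amqp.Dial(url)
		if err != nil {
			log.Printf("dial failed: %v, retrying in 5s", err)
			time.Sleep(5 * time.Second)
			continue
		}

		ch, err := conn.Channel()
		if err != nil {
			log.Printf("channel open failed: %v", err)
			conn.Close()
			time.Sleep(5 * time.Second)
			continue
		}
		closed := ch.NotifyClose(make(chan *amqp.Error, 1))

		deliveries, err := ch.Consume(queue, "", false, false, false, false, nil)
		if err != nil {
			// This is where the "timed out consuming from quorum queue"
			// exception surfaces on the client; back off and try again.
			log.Printf("basic.consume failed: %v", err)
			conn.Close()
			time.Sleep(5 * time.Second)
			continue
		}

		for d := range deliveries {
			// ... process d.Body ...
			d.Ack(false)
		}

		// The deliveries channel only closes when the AMQP channel or
		// connection goes away; log the reason, then re-dial.
		if amqpErr := <-closed; amqpErr != nil {
			log.Printf("channel/connection closed: %v", amqpErr)
		}
		conn.Close()
		time.Sleep(5 * time.Second)
	}
}

func main() {
	consumeWithRetry("amqp://USER:PASSWORD@10.20.222.109:5672/", "xxx")
}
```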
Reproduction steps
Hard to say; I will add steps once I have found a way to reproduce.
Expected behavior
The cluster should be able to survive a network partition.
Additional context
Running RabbitMQ 4.1.1.