[Questions] Quorum queue setup on 2 WAN datacenters with exactly two availability zones (racks) each #13165

jonenst · 2025-01-28T08:18:27Z

jonenst
Jan 28, 2025

Community Support Policy

I have read RabbitMQ's Community Support Policy
I run RabbitMQ 4.x, the only series currently covered by community support
I promise to provide all relevant information (versions, logs from all nodes, rabbitmq-diagnostics output, detailed reproduction steps)

RabbitMQ version used

4.0.5

Erlang version used

27.2.x

Operating system (distribution) used

linux

How is RabbitMQ deployed?

Community Docker image

rabbitmq-diagnostics status output

N/A

Logs from node 1 (with sensitive values edited out)

N/A

Logs from node 2 (if applicable, with sensitive values edited out)

N/A

Logs from node 3 (if applicable, with sensitive values edited out)

N/A

rabbitmq.conf

N/A

Steps to deploy RabbitMQ cluster

N/A

Steps to reproduce the behavior in question

N/A

advanced.config

No response

Application code

No response

Kubernetes deployment file

No response

What problem are you trying to solve?

Hi,
I want to design an HA RabbitMQ cluster using only two availability zones. The cluster should tolerate the loss of a whole availability zone. Unlike #11877, which correctly explains why this is not possible, in my case a secondary datacenter is available over a WAN link. The docs state that we shouldn't simply make the cluster span the two datacenters. But is it possible to have everything (control plane + data) running in the main datacenter during normal operations, and to use the secondary datacenter only as some kind of tie breaker for the control plane, never for data, and only temporarily when one of the availability zones in the primary datacenter is lost? (The cluster would still work, but without any margin for further failures.)

If this is possible, and if the WAN network works well enough (no lost connections, only higher latencies) during the outage, is this kind of setup supported?

This could be compared to running an Elasticsearch cluster with a dedicated tie breaker, as in this post: https://discuss.elastic.co/t/tie-breaker-master-node-with-higher-latency/89683

Thanks in advance

Answered by michaelklishin

Jan 28, 2025

michaelklishin · 2025-01-28T12:54:09Z

michaelklishin
Jan 28, 2025
Maintainer

@jonenst it is not possible. Witness replicas/node types have never made it to our Raft implementation despite two attempts, and this idea does not fit the quorum queue or stream replication "model" and user-exposed features.

Four-node clusters are explicitly not recommended; you'd need 3 or 5 nodes.
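The reasoning behind preferring odd cluster sizes can be illustrated with the usual majority arithmetic for Raft-style replication. This is a minimal sketch, assuming every node is a full voting replica; the helper names `majority` and `tolerated_failures` are made up for illustration and are not part of any RabbitMQ API.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many nodes can be lost while a majority is still online."""
    return n - majority(n)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nodes: majority={majority(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Under this assumption a fourth node raises the required majority from 2 to 3 without raising the number of tolerated failures, which only increases when going from 4 to 5 nodes.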

If the latency is low (I'd say, a few ms), you can try spanning a cluster across a WAN. Some users successfully do it across availability zones on AWS, for example. However, for as long as Mnesia is supported in any way, and thus partition handling strategies still exist, you will run the risk of a latency spike causing completely unnecessary leader migrations and the like.
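To make the zone-loss question concrete, here is a rough sketch that checks whether a given node placement keeps a majority after losing any single zone. It assumes every member is a full voting replica that also carries data (there are no witness nodes, as noted above), and the zone names `az1`, `az2` and `dc2` as well as the function name are hypothetical. It says nothing about latency, which remains the practical risk of a WAN-spanning cluster.

```python
def survives_any_zone_loss(placement: dict) -> bool:
    """Return True if a majority of nodes stays online after losing any one zone.

    placement maps a zone name to the number of cluster nodes in that zone.
    """
    total = sum(placement.values())
    quorum = total // 2 + 1
    # Losing a zone removes all of its nodes; the rest must still reach quorum.
    return all(total - count >= quorum for count in placement.values())


if __name__ == "__main__":
    layouts = {
        "3 nodes over two zones": {"az1": 2, "az2": 1},
        "3 nodes over three sites": {"az1": 1, "az2": 1, "dc2": 1},
        "5 nodes over three sites": {"az1": 2, "az2": 2, "dc2": 1},
    }
    for name, placement in layouts.items():
        print(name, "->", survives_any_zone_loss(placement))
```

With only two zones, one of them always holds at least half of the nodes, so its loss leaves the remainder short of a majority. A third site fixes the arithmetic only if its node is a regular cluster member.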

0 replies
