RabbitMQ – How to Automatically Rejoin Node in Quorum Queues

I am exploring RabbitMQ quorum queues to improve HA for some services in a Kubernetes cluster. From what I have read, they are designed with data safety in mind.

However, the chapter "Managing Replicas" states:

Replicas of a quorum queue are explicitly managed by the operator.
When a new node is added to the cluster, it will host no quorum queue
replicas unless the operator explicitly adds it to a member (replica)
list of a quorum queue or a set of quorum queues.

It therefore seems that, in case of disruptions (especially involuntary ones), the following situation could arise (for a 3-node cluster):

after a disruption, a node goes down: the other two nodes still form a majority and will "keep the queue alive", possibly electing a new leader;
Kubernetes will provide a new node (pod) to replace the failed node; the new node will automatically rejoin the RabbitMQ cluster, but
unless the operator manually intervenes, the new node will not contribute to the existing quorum queues;
for a 3-node cluster, this means that there is no HA anymore: if, sometime in the future, one of the other nodes fails, the queue is effectively lost.
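
The arithmetic behind this concern can be sketched as follows. This is only an illustration, assuming the usual majority rule (more than half of the configured replicas must be online); the function names are hypothetical and not part of any RabbitMQ API.

```python
def majority(members: int) -> int:
    """Smallest number of replicas that is more than half of `members`."""
    return members // 2 + 1


def has_quorum(members: int, online: int) -> bool:
    """True if enough replicas are online for the queue to make progress."""
    return online >= majority(members)


# A quorum queue with 3 configured replicas:
print(has_quorum(3, 3))  # all replicas online
print(has_quorum(3, 2))  # one replica down, majority still present
print(has_quorum(3, 1))  # a second replica down, majority lost
```

With 3 configured replicas the majority is 2, so the queue tolerates one failed replica but not two. If the replacement node never hosts a replica, the second failure is the one that matters.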

Is there any way to mitigate this scenario? Is it, for example, possible to have nodes automatically rejoin all existing quorum queue clusters? Maybe by maintaining a list of "startup commands" (which run after RabbitMQ starts) to which we could add the rejoin commands?

Best Answer

The RabbitMQ team highly recommends the use of the official Kubernetes operator - https://www.rabbitmq.com/kubernetes/operator/operator-overview.html

Aside from that, here's what the local k8s expert has to say:

Kubernetes will not just randomly delete a persistent volume. If the node went down for some reason, it will start again with the same name and the same data.

As long as the same name and data are used, the "new" node will join just as if it were the old one.

There are probably scenarios that require manual intervention but they aren't as frequent as you'd think.
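
For those cases, the manual step can be scripted. Below is a minimal sketch, assuming the `rabbitmq-queues` CLI is on the PATH of wherever the script runs and that its `grow` command is available in your RabbitMQ version. The helper names and the node name in the usage note are hypothetical. This is not something the operator or RabbitMQ runs for you.

```python
import subprocess
from typing import List, Optional


def build_grow_command(
    node: str,
    strategy: str = "all",
    vhost_pattern: Optional[str] = None,
    queue_pattern: Optional[str] = None,
) -> List[str]:
    """Build the argv for `rabbitmq-queues grow <node> <all|even>`.

    "all" adds a replica on `node` to every matching quorum queue;
    "even" only to queues that currently have an even number of replicas.
    """
    if strategy not in ("all", "even"):
        raise ValueError("strategy must be 'all' or 'even'")
    cmd = ["rabbitmq-queues", "grow", node, strategy]
    if vhost_pattern is not None:
        cmd += ["--vhost-pattern", vhost_pattern]
    if queue_pattern is not None:
        cmd += ["--queue-pattern", queue_pattern]
    return cmd


def grow_quorum_queues(node: str, **options) -> subprocess.CompletedProcess:
    """Run the command; raises CalledProcessError if the CLI exits non-zero."""
    return subprocess.run(build_grow_command(node, **options), check=True)
```

For example, `grow_quorum_queues("rabbit@rabbitmq-server-2")` would ask the CLI to add a replica on that (hypothetical) node to all quorum queues. To add a single queue instead, `rabbitmq-queues add_member` takes a queue name and a node.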

NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
