Getting RabbitMQ Autocluster to form a RabbitMQ cluster after Consul registration

rabbitmq

Using rabbitmq-autocluster configured for Consul, I can spin up 2 RMQ instances which properly register with Consul. I can verify this by looking at the Consul web GUI to see "2 passing" on the "rabbitmq" Service button. However, rabbitmqctl cluster_status does not indicate the RMQ instances are clustered.

Below you can see an error about not being able to connect to rabbit@node1, yet nothing is configured with that name. Is that the root of my problem? Do I need to set the RMQ Docker container hostnames to make this work? I assumed rabbitmq-autocluster allowed a cluster to be set up without touching hostnames (/etc/hosts or DNS) because Consul would handle all that. Is that wrong?

Both RMQ and Consul run in Docker containers, started like this:

Containers:

RMQ 1

docker run --name rmq1 -d \
  -e AUTOCLUSTER_TYPE=consul \
  -e CONSUL_SCHEME=http \
  -e CONSUL_HOST=192.168.99.100 \
  -e CONSUL_PORT=8500 \
  -e CONSUL_SERVICE=rabbitmq \
  -e CLUSTER_NAME=rmqcluster \
  -l consul \
  -p 4369:4369 \
  -p 5672:5672 \
  -p 15672:15672 \
  -p 25672:25672 \
  gavinmroy/alpine-rabbitmq-autocluster

RMQ 2

Note the asymmetric port forwards:

docker run --name rmq2 -d \
  -e AUTOCLUSTER_TYPE=consul \
  -e CONSUL_SCHEME=http \
  -e CONSUL_HOST=192.168.99.100 \
  -e CONSUL_PORT=8500 \
  -e CONSUL_SERVICE=rabbitmq \
  -e CLUSTER_NAME=rmqcluster \
  -l consul \
  -p 4370:4369 \
  -p 5673:5672 \
  -p 15673:15672 \
  -p 25673:25672 \
  gavinmroy/alpine-rabbitmq-autocluster

Consul

docker run --name consul \
  -p 8400:8400 -p 8500:8500 -p 8600:53/udp \
  -h consul progrium/consul \
  -server -bootstrap -ui-dir /ui
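As an alternative to the web GUI, the same registrations can be inspected through Consul's HTTP API. This is a minimal sketch, assuming the agent is reachable at the CONSUL_HOST and CONSUL_PORT used in the RMQ commands above. In the catalog response, the Address and ServiceAddress fields show which address each RMQ node was registered under, which is what the node names in the logs end up being built from.

```shell
# Assumption: Consul's HTTP API is reachable at the CONSUL_HOST/CONSUL_PORT used above.
CONSUL=http://192.168.99.100:8500

# List every instance registered under the "rabbitmq" service,
# including the Address/ServiceAddress each one was registered with.
curl -s "$CONSUL/v1/catalog/service/rabbitmq"

# List only the instances whose health checks are currently passing
# (this should match the "2 passing" shown in the web GUI).
curl -s "$CONSUL/v1/health/service/rabbitmq?passing"
```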

According to the developer's comment on this issue, rabbitmqctl cluster_status should show whether the RMQ instances are truly clustered. However, when I run this command, it does not show the nodes as clustered.
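For reference, here is a sketch of running that check inside each container with docker exec, using the container names from the docker run commands above and assuming rabbitmqctl is on the container's PATH. If clustering had worked, both rabbit@... node names should appear among the running nodes in each node's output.

```shell
# Run cluster_status on each node; container names come from the docker run commands above.
docker exec rmq1 rabbitmqctl cluster_status
docker exec rmq2 rabbitmqctl cluster_status
```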

Logs

RMQ 1

=INFO REPORT==== 15-Feb-2016::19:57:56 ===
node           : rabbit@edae08d9e0bc
home dir       : /var/lib/rabbitmq
config file(s) : /usr/lib/rabbitmq/etc/rabbitmq/rabbitmq.config
cookie hash    : iqG7DCBA+lxNNLQq/Y6efg==
log            : tty
sasl log       : tty
database dir   : /var/lib/rabbitmq/mnesia
Setting default log settings
=INFO REPORT==== 15-Feb-2016::19:57:57 ===
autocluster: Registering node with consul

=INFO REPORT==== 15-Feb-2016::19:57:57 ===
autocluster: Node appears to be the first in the cluster

RMQ 2

Note that it also reports being the first node in the cluster:

=INFO REPORT==== 15-Feb-2016::19:58:07 ===
node           : rabbit@e9bd0b21c5af
home dir       : /var/lib/rabbitmq
config file(s) : /usr/lib/rabbitmq/etc/rabbitmq/rabbitmq.config
cookie hash    : iqG7DCBA+lxNNLQq/Y6efg==
log            : tty
sasl log       : tty
database dir   : /var/lib/rabbitmq/mnesia
Setting default log settings
=INFO REPORT==== 15-Feb-2016::19:58:08 ===
autocluster: Registering node with consul

=INFO REPORT==== 15-Feb-2016::19:58:08 ===
autocluster: Node appears to be the first in the cluster

RMQ 2 (after restart)

If I run docker restart rmq2, I get the output below. Note the error:

=INFO REPORT==== 15-Feb-2016::21:24:26 ===
node           : rabbit@e9bd0b21c5af
home dir       : /var/lib/rabbitmq
config file(s) : /usr/lib/rabbitmq/etc/rabbitmq/rabbitmq.config
cookie hash    : iqG7DCBA+lxNNLQq/Y6efg==
log            : tty
sasl log       : tty
database dir   : /var/lib/rabbitmq/mnesia
Setting default log settings
=INFO REPORT==== 15-Feb-2016::21:24:27 ===
autocluster: Registering node with consul

=ERROR REPORT==== 15-Feb-2016::21:24:32 ===
autocluster: Can not communicate with cluster nodes: [rabbit@node1]

=INFO REPORT==== 15-Feb-2016::21:24:32 ===

** EDIT **
The setup above used a single Consul server on a separate machine from the two RMQ machines. I've since tried again with a Consul container running on each of the RMQ machines, acting as a Consul client. The RMQ instances start and register with their co-hosted Consul client, and both Consul clients are connected to the same Consul server. When I start one RMQ instance after enough time has elapsed for the first one to fully register with Consul, I see this:

docker logs rmq2 | grep autocluster
autocluster: Registering node with consul
autocluster: Can not communicate with cluster nodes: [rabbit@192]
autocluster: Starting Consul Health Check TTL Timer

It looks like Consul is registering each RMQ instance using its IP address as the hostname, and because the address contains a ".", it is treated as an FQDN and cut down to rabbit@192. If I set RABBITMQ_USE_LONGNAME to true, RMQ fails to boot with this output.
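A minimal sketch of why the error names rabbit@192: if the host part of a short node name is cut at its first ".", an IP address collapses to its first octet. The function short_node_name and the address 192.168.99.101 are hypothetical and only mirror what the log shows; this is not rabbitmq-autocluster's actual code.

```shell
#!/bin/sh
# Hypothetical helper that mimics the truncation visible in the log:
# the host part of the node name is cut at its first ".".
# Illustration only, not rabbitmq-autocluster's implementation.
short_node_name() {
  prefix="${1%%@*}"   # everything before the "@", e.g. "rabbit"
  host="${1#*@}"      # everything after the "@", e.g. "192.168.99.101"
  printf '%s@%s\n' "$prefix" "${host%%.*}"   # keep the host only up to the first "."
}

short_node_name "rabbit@192.168.99.101"    # prints rabbit@192
short_node_name "rabbit@rmq1.example.com"  # prints rabbit@rmq1
```

A real hostname survives this truncation in a usable form, while an IP address does not.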

Best Answer

You need to give your Docker containers meaningful hostnames, and those hostnames must be resolvable by both instances.

For instance, here are the hostnames Docker generated automatically, as shown in the logs you provided:

  • Node 1:

    node           : rabbit@edae08d9e0bc
    
  • Node 2:

    node           : rabbit@e9bd0b21c5af
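One way to check whether these auto-generated names are resolvable across containers is to look one up from the other container. This is a sketch, assuming the image's BusyBox ping is available. On Docker's default bridge network, containers cannot resolve each other's names, so this lookup would be expected to fail, which is exactly the problem here.

```shell
# From rmq2, try to resolve and reach rmq1's auto-generated hostname (taken from the log above).
docker exec rmq2 ping -c 1 edae08d9e0bc
```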
    

Once you manually set a resolvable hostname for each instance, the RabbitMQ nodes will be able to communicate and form a cluster.
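Here is a sketch of one way to do that when both containers run on the same Docker host, as in the original setup. It puts them on a user-defined network, where Docker's embedded DNS resolves container names, and sets each container's hostname to match its name so the nodes become rabbit@rmq1 and rabbit@rmq2 instead of rabbit@<container id>. The network name rmqnet is hypothetical, and on older Docker releases the flag is spelled --net rather than --network. For containers on separate machines, as in the edit, --add-host entries or real DNS records would be needed instead. Whether autocluster then picks up these names depends on how it builds node names from the Consul data, so treat this as a starting point.

```shell
# Hypothetical network name; containers on a user-defined network
# can resolve each other by container name through Docker's embedded DNS.
docker network create rmqnet

# --hostname makes RabbitMQ's default node name rabbit@rmq1.
# epmd (4369) and distribution (25672) traffic stays on rmqnet, so those ports are not published here.
docker run --name rmq1 --hostname rmq1 --network rmqnet -d \
  -e AUTOCLUSTER_TYPE=consul \
  -e CONSUL_SCHEME=http \
  -e CONSUL_HOST=192.168.99.100 \
  -e CONSUL_PORT=8500 \
  -e CONSUL_SERVICE=rabbitmq \
  -e CLUSTER_NAME=rmqcluster \
  -l consul \
  -p 5672:5672 \
  -p 15672:15672 \
  gavinmroy/alpine-rabbitmq-autocluster

# Second node: same settings, its own hostname, and the host-side ports shifted by one.
docker run --name rmq2 --hostname rmq2 --network rmqnet -d \
  -e AUTOCLUSTER_TYPE=consul \
  -e CONSUL_SCHEME=http \
  -e CONSUL_HOST=192.168.99.100 \
  -e CONSUL_PORT=8500 \
  -e CONSUL_SERVICE=rabbitmq \
  -e CLUSTER_NAME=rmqcluster \
  -l consul \
  -p 5673:5672 \
  -p 15673:15672 \
  gavinmroy/alpine-rabbitmq-autocluster
```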