Nerijus is right: this issue is caused by HAProxy's client timeout. If a connection is considered idle for more than X ms, HAProxy drops it.
TCP can send keep-alive packets so that an idle connection stays open.
You can check your TCP parameter for keep-alive packets using the following command:
$ cat /proc/sys/net/ipv4/tcp_keepalive_time
By default, this value is 7200 seconds, which means TCP only starts sending keep-alive packets after a connection has been idle for more than 2 hours.
So set your HAProxy client timeout to something longer than 2 hours, e.g.:
timeout client 3h
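Also enable TCP keep-alives in HAProxy. option clitcpka applies to the client side and is only accepted in defaults, frontend, or listen sections, so it goes next to timeout client; in a backend section, the server-side equivalent is option srvtcpka:
backend rabbitmq_backend
    balance roundrobin
    mode tcp
    option srvtcpka
    server 0-rabbitmq_backend x.x.x.x:5672 maxconn 4000 check
    server 1-rabbitmq_backend x.x.x.x:5672 maxconn 4000 check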
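To sanity-check the relationship described above (the client timeout has to be longer than the kernel keep-alive time), here is a minimal Python sketch. The function names are hypothetical, not part of HAProxy or any library; it only assumes HAProxy's documented time suffixes (us, ms, s, m, h, d) and that a bare number means milliseconds.

```python
import re
from pathlib import Path

# HAProxy time-unit suffixes, expressed in microseconds to stay in integers.
_UNIT_MICROSECONDS = {
    "us": 1,
    "ms": 1_000,
    "s": 1_000_000,
    "m": 60_000_000,
    "h": 3_600_000_000,
    "d": 86_400_000_000,
}


def haproxy_time_to_us(value: str) -> int:
    """Convert an HAProxy time value such as '3h' or '50000' to microseconds.

    A bare number is treated as milliseconds, the default unit for timeouts.
    """
    match = re.fullmatch(r"(\d+)(us|ms|s|m|h|d)?", value.strip())
    if match is None:
        raise ValueError(f"not an HAProxy time value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_MICROSECONDS[unit or "ms"]


def read_keepalive_time(path: str = "/proc/sys/net/ipv4/tcp_keepalive_time") -> int:
    """Return the kernel's tcp_keepalive_time in seconds (Linux only)."""
    return int(Path(path).read_text().strip())


def timeout_outlasts_keepalive(timeout: str, keepalive_seconds: int) -> bool:
    """True if the HAProxy timeout is strictly longer than the keep-alive time."""
    return haproxy_time_to_us(timeout) > keepalive_seconds * 1_000_000
```

On a Linux host, timeout_outlasts_keepalive("3h", read_keepalive_time()) should be true with the default 7200-second setting, while a 2h timeout would not pass because it is not strictly longer.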
I also deploy RabbitMQ clusters with Puppet, and you actually don't have to spin up all nodes at exactly the same time.
What I usually do, and what has worked for me so far, is:
- install RabbitMQ (RPM or DEB)
- set up the hosts file on each node so that it contains entries for all three nodes we're clustering together (because I don't want to rely on DNS being available). Example:
192.168.1.11 dev-c1n01-rabbitmq.example.com dev-c1n01-rabbitmq
192.168.1.12 dev-c1n02-rabbitmq.example.com dev-c1n02-rabbitmq
192.168.1.13 dev-c1n03-rabbitmq.example.com dev-c1n03-rabbitmq
- deploy rabbitmq.config:
[
  {rabbit, [
    {cluster_nodes, {['rabbit@dev-c1n01-rabbitmq', 'rabbit@dev-c1n02-rabbitmq', 'rabbit@dev-c1n03-rabbitmq'], disc}},
    {cluster_partition_handling, pause_minority},
    {disk_free_limit, 2147483648},
    {heartbeat, 0},
    {tcp_listen_options, [binary, {backlog, 1024}, {nodelay, true}, {keepalive, true}]},
    {vm_memory_high_watermark, 0.6},
    {default_user, <<"admin">>},
    {default_pass, <<"somedefaultpass">>}
  ]},
  {kernel, [
  ]},
  {rabbitmq_management, [
    {listener, [
      {port, 15672}
    ]}
  ]}
].
% EOF
- create the Erlang cookie. I usually use http://passwordsgenerator.net/ and set it up to generate a 20-character string consisting only of uppercase letters. Then put this string into /var/lib/rabbitmq/.erlang.cookie, like this:
echo -n 'LCQLSHVOPZFHRUXMMAPF' > /var/lib/rabbitmq/.erlang.cookie
- start the nodes (order doesn't matter as long as they all have the same .erlang.cookie and rabbitmq.config)
This should work for you. Tested on versions 3.2, 3.3 and 3.5.
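The steps above repeat the same three node names in the hosts file and in cluster_nodes, and the cookie has to be identical on every node. A rough Python sketch of generating all three from one node list follows. The function names are made up for illustration, and using the standard secrets module instead of the website is my assumption, not part of the procedure above.

```python
import secrets
import string


def render_hosts(nodes, domain="example.com"):
    """Render hosts-file lines from (ip, short_name) pairs."""
    return "\n".join(f"{ip} {name}.{domain} {name}" for ip, name in nodes)


def render_cluster_nodes(nodes, node_type="disc"):
    """Render the cluster_nodes tuple for rabbitmq.config from the same pairs."""
    names = ", ".join(f"'rabbit@{name}'" for _, name in nodes)
    return f"{{cluster_nodes, {{[{names}], {node_type}}}}}"


def make_cookie(length=20):
    """Generate a random cookie made only of uppercase ASCII letters."""
    return "".join(secrets.choice(string.ascii_uppercase) for _ in range(length))


if __name__ == "__main__":
    nodes = [
        ("192.168.1.11", "dev-c1n01-rabbitmq"),
        ("192.168.1.12", "dev-c1n02-rabbitmq"),
        ("192.168.1.13", "dev-c1n03-rabbitmq"),
    ]
    print(render_hosts(nodes))
    print(render_cluster_nodes(nodes))
    # Generate the cookie once and distribute the same value to every node.
    print(make_cookie())
```

Generate the cookie a single time (for example in your Puppet data) rather than per node, since nodes with different cookies will not cluster.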
As I suspected, the heartbeat is solely configured client-side. This was confirmed in #rabbitmq on Freenode IRC. The particular issues I'm having with heartbeats are related to the client-side library that I'm using.