Use an Amazon Load Balancer in front of multiple GlusterFS EC2 instances

amazon ec2, glusterfs, load balancing

I'm encountering an interesting problem with setting up a high-availability file system cluster on EC2. The idea behind the setup is simple: two GlusterFS nodes sit in two separate availability zones and synchronize data between themselves. I can mount either of these two servers on any other EC2 instance without any problems.

In the interest of spreading the load and migrating off bad nodes, though, I want to put this behind a load balancer. It seemed simple enough: I opened the ports on the load balancer and pointed the mount at the load balancer instead of an individual GlusterFS node. The client, however, insists that it can't make the connection. I thought this might be a firewall issue, so to rule that out I opened ports 1024-65535. A terrible idea for sure, but I needed to rule it out.

Here's what the logs say:

[2013-04-24 21:51:03.581564] I [glusterfsd.c:1666:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.3.1
[2013-04-24 21:51:03.608884] W [socket.c:1512:__socket_proto_state_machine] 0-glusterfs: reading from socket failed. Error (Transport endpoint is not connected), peer (1.2.3.4:24007)

The strange part is, I can connect to that IP fine via telnet on the same port.

Has anyone done this before, or have any insights as to a way I can work around this?

Thanks!

Best Answer

From the sounds of it, you're attempting to load-balance the initial poll that clients do to discover the Gluster infrastructure.

mount -t glusterfs loadbalancer:Your_VOL /usr/local/specialhome

This is only for the initial connection. Once a client successfully pulls the topology for the volume it's interested in, it connects directly to the bricks it needs. At that point the LB is out of the loop.

Gluster doesn't like that, as you're learning.

Accepted Gluster practice offers a couple of ways to solve this:

  • A round-robin DNS entry
  • Different mount options

"Round-robin is not load balancing" is a phrase you hear a lot around here, but in this case it's not that bad. You're only using it for the initial connection, and that's it.
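Since only the initial connection goes through the round-robin name, it can help to confirm that the management port (24007, the one in the log above) answers on each node directly. A minimal sketch, assuming bash with /dev/tcp support and the coreutils timeout command; the function names and the host names in the usage comment are hypothetical, and a successful TCP connect says nothing about whether the Gluster handshake itself would succeed:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of GlusterFS: print the first server that
# accepts a TCP connection on the given port.

# probe_host HOST PORT: succeeds if a TCP connection opens within 3 seconds.
# Relies on bash's /dev/tcp redirection, so no extra tools are required.
probe_host() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# first_reachable PORT HOST...: echo the first host that answers, else fail.
first_reachable() {
    local port=$1 host
    shift
    for host in "$@"; do
        if probe_host "$host" "$port"; then
            echo "$host"
            return 0
        fi
    done
    return 1
}

# Usage (gluster-a and gluster-b are placeholder names for the two nodes):
#   first_reachable 24007 gluster-a gluster-b
```

The probe is kept in its own function so it can be swapped for nc or another tool where /dev/tcp is unavailable.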

The mount-option is:

mount -t glusterfs -o backupvolfile-server=rrdns-02 rrdns-01:Your_VOL /usr/local/specialhome

The backupvolfile-server option tells the mount to use a different name in case the one given directly on the mount command isn't responding. Using these two methods in combination will allow you to deal with temporarily down nodes.
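If the mount should come back after a reboot, the same options could go into /etc/fstab. This is a sketch only, reusing the placeholder names from the mount command; _netdev is the standard option that defers the mount until networking is up, and whether the defaults suit your setup is an assumption:

```
rrdns-01:Your_VOL  /usr/local/specialhome  glusterfs  defaults,_netdev,backupvolfile-server=rrdns-02  0 0
```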
