Linux – What to use for software-based shared file storage

cluster, drbd, linux, load balancing, storage

The situation: setting up a load balancer

Currently we have all of our servers (running CentOS Linux) in pairs in our data center: each server has a mirroring server. We don't employ any load balancing at the moment, so serverA gets all traffic and when it fails (hardware or software) we can quickly switch to serverB by configuring the serverA IP address on serverB. We are using MySQL master/master replication (although we could simply use master/slave replication for the current set-up) and rsync to keep the vhost files in sync (serverA syncs to serverB).

This is working well for us, but it is quite inefficient: 50% of our hardware does nothing until a machine fails. We are thinking about putting load balancers in front of the server pairs so we can spread the load across both machines and also add extra servers per cluster.

The problem: sharing file storage

Setting this up will probably not take much more than putting a load balancer in front of each server pair and having it divide traffic between the two servers. Except for one thing: file storage. Currently rsync 'pushes' changes from serverA to serverB, but not the other way around. We could also run rsync from serverB to serverA, but then rsync cannot tell what to do with a file that exists on serverA but not on serverB: it may have been newly created on serverA (so it should be copied) or deleted on serverB (so it should be removed). I looked at Unison, but that project seems to be discontinued.
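To make the ambiguity concrete, here is a minimal Python sketch of what a two-way sync needs on top of what plain rsync sees. It assumes a record of the paths that existed on both servers after the previous sync; the function name `classify_differences` and the example file names are made up for illustration. It only covers creates and deletes, not conflicting edits to the same file.

```python
def classify_differences(files_a, files_b, last_synced):
    """Decide what to do with files present on only one server.

    files_a, files_b: sets of relative paths currently on each server.
    last_synced: set of paths that existed on both servers after the
    previous sync (extra state that a plain rsync run does not have).
    """
    actions = {}
    # Symmetric difference: paths that exist on exactly one server.
    for path in sorted(files_a ^ files_b):
        if path in files_a:
            present_on, missing_on = "A", "B"
        else:
            present_on, missing_on = "B", "A"
        if path in last_synced:
            # Both servers had it last time, so one side deleted it.
            actions[path] = f"delete on {present_on}"
        else:
            # Neither server had it last time, so one side created it.
            actions[path] = f"copy to {missing_on}"
    return actions


if __name__ == "__main__":
    server_a = {"index.php", "upload/new.jpg", "old.css"}
    server_b = {"index.php"}
    previous = {"index.php", "old.css"}
    for path, action in classify_differences(server_a, server_b, previous).items():
        print(path, "->", action)
```

Without `last_synced`, both `upload/new.jpg` and `old.css` look identical to the sync tool: present on serverA, absent on serverB.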

The question: what's the best solution for software-based shared file storage?

So, I'm looking for a different solution. Please note that I don't want to add more hardware (so no NAS/SAN solution). Also note that we only need a small amount of storage (below 500GB) per cluster and that all servers are on the same local network. We have a decent back-up solution in place (back-ups run every 3 hours).

I've been looking at DRBD and that seems to fit our situation well, but I have no experience with it. Is DRBD the way to go for us? Please share your experience with this and other similar solutions. Any pitfalls to think about? Am I on the right track? Please enlighten me 🙂

Best Answer

DRBD is great.

The good things:

  • It does a magnificent job at replicating data
  • DRBD has prevented disaster for us in a couple of cases by detecting that the volume was already mounted on the other node. The raw volumes we get from a SAN cannot tell us that.
  • Heartbeat already has great support for DRBD.

The challenges:

  • Remember to monitor it properly, so you discover split brains when they happen - and can deal with them.
  • DRBD can't be mounted on both servers without a cluster-enabled filesystem on top - I don't have any experience with that part.
  • It is easy to "DoS" the servers by configuring DRBD to use all the available bandwidth for syncing disks. Just configure a lower sync rate, and you're OK.
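For the monitoring point, a check along these lines could be wired into whatever monitoring system you already run. This is a rough sketch, assuming DRBD 8.x, where `/proc/drbd` shows one status line per device with `cs:` (connection state) and `ds:` (disk states); newer DRBD versions report status differently. After a split brain the nodes typically drop the connection, so a state such as `StandAlone` or `WFConnection` is what you would expect to see. The sample text and the function name `find_unhealthy` are illustrative only.

```python
import re

# Per-device status line of DRBD 8.x /proc/drbd, for example:
#  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
STATUS_LINE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)

# Illustrative sample, not captured from a real system.
SAMPLE = """\
version: 8.3.11 (api:88/proto:86-96)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
"""


def find_unhealthy(proc_drbd_text):
    """Return (minor, connection_state, disk_states) for each device
    that is not both connected and fully up to date on both nodes."""
    problems = []
    for line in proc_drbd_text.splitlines():
        match = STATUS_LINE.match(line)
        if not match:
            continue  # version header or statistics line
        cs, ds = match.group("cs"), match.group("ds")
        if cs != "Connected" or ds != "UpToDate/UpToDate":
            problems.append((int(match.group("minor")), cs, ds))
    return problems


if __name__ == "__main__":
    for minor, cs, ds in find_unhealthy(SAMPLE):
        print(f"drbd{minor}: cs={cs} ds={ds}")
```

In practice you would pass in the contents of `/proc/drbd` instead of `SAMPLE`. The check is deliberately strict: it also flags a device that is in the middle of a normal resync, which you may want to treat as a warning instead of an error.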

For mounting "the same" filesystem on several nodes, we keep going back to NFS, even though we keep testing various alternatives. A setup I am comfortable running in production is NFS on top of EXT4 on top of DRBD. I wouldn't dare to do this with the database filesystems, but it's OK for the wwwroot.
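The layering matters mostly for the order in which things are started and stopped on the active node. Here is a small Python sketch that only builds the command lists, without running anything. The resource name `r0`, device `/dev/drbd0`, mount point `/srv/wwwroot` and client network are assumptions for illustration, and in a real cluster a manager such as Heartbeat would normally drive these steps instead of a script.

```python
def takeover_commands(resource, device, mount_point, client):
    """Commands to bring the stack up, bottom layer first:
    DRBD primary -> mount ext4 -> export over NFS."""
    return [
        ["drbdadm", "primary", resource],
        ["mount", "-t", "ext4", device, mount_point],
        ["exportfs", "-o", "rw,sync", f"{client}:{mount_point}"],
    ]


def release_commands(resource, mount_point, client):
    """Commands to tear the stack down, in the reverse order:
    unexport -> unmount -> DRBD secondary."""
    return [
        ["exportfs", "-u", f"{client}:{mount_point}"],
        ["umount", mount_point],
        ["drbdadm", "secondary", resource],
    ]


if __name__ == "__main__":
    # Hypothetical names; adjust to your own resource and paths.
    args = ("r0", "/dev/drbd0", "/srv/wwwroot", "192.168.0.0/24")
    for cmd in takeover_commands(*args):
        print(" ".join(cmd))
    for cmd in release_commands("r0", "/srv/wwwroot", "192.168.0.0/24"):
        print(" ".join(cmd))
```

Only one node has the EXT4 filesystem mounted at any time; the other web servers reach the files as NFS clients of the active node.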
