KVM+DRBD – Active-Passive Server Replication with Manual Switching

cluster · drbd · high-availability · kvm-virtualization · linux

I need to build a 2-node cluster(-like?) solution in active-passive mode; that is, one server is active while the other is passive (standby) and continuously receives the data replicated from the active one. KVM-based virtual machines would be running on the active node.

In case the active node becomes unavailable for any reason, I would like to manually switch over to the second node (it becomes active, and the other becomes passive).

I've seen this tutorial: https://www.alteeve.com/w/AN!Cluster_Tutorial_2#Technologies_We_Will_Use

However, I'm not brave enough to trust fully automatic failover, nor to build something that complex and trust it to operate correctly. There is too much risk of a split-brain situation, of the complexity failing somehow, of data corruption, etc., while my maximum downtime requirement is not so severe as to require immediate automatic failover.

I'm having trouble finding information on how to build this kind of configuration. If you have done this, please share the info / HOWTO in an answer.

Or maybe it is possible to build highly reliable automatic failover with Linux nodes? The trouble with Linux high availability is that there was a surge of interest in the concept about 8 years ago, and many tutorials are quite old by now. This suggests that there may have been substantial problems with HA in practice, and that some/many sysadmins simply dropped it.

If that is possible, please share the information on how to build it and your experiences with clusters running in production.

Best Answer

I have a very similar installation to the setup you described: a KVM server with a standby replica via DRBD active/passive. To keep the system as simple as possible (and to avoid any automatic split brain, e.g. due to my customer messing with the cluster network), I also ditched automatic cluster failover.

The system is 5+ years old and has never given me any problems. My volume setup is the following:

  • a dedicated RAID volume for VM storage;
  • a small overlay volume containing QEMU/KVM config files;
  • bigger volumes for virtual disks;
  • a DRBD resource managing the entire dedicated array block device (a sketch follows below).
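
A minimal sketch of such a resource definition, assuming placeholder host names, IP addresses, and backing device (the actual values depend on your hardware and network):

    # Hypothetical DRBD resource covering the whole dedicated RAID array.
    # Host names (node1/node2), IPs, and /dev/md127 are placeholders.
    cat > /etc/drbd.d/vmdata.res <<'EOF'
    resource vmdata {
        device    /dev/drbd0;
        disk      /dev/md127;        # the dedicated RAID array
        meta-disk internal;
        net { protocol C; }          # synchronous replication
        on node1 { address 10.0.0.1:7788; }
        on node2 { address 10.0.0.2:7788; }
    }
    EOF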

I wrote some shell scripts to help me in case of failover. You can find them here.
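
As a rough idea, here is a minimal sketch of what such a failover script might do on the node being promoted. The resource name "vmdata", the volume group, and the config path are placeholder assumptions, and it presumes the old primary has already been shut down or fenced by hand:

    #!/bin/sh
    # Hypothetical manual failover sketch, run on the node being promoted.
    # Names and paths are placeholders. WARNING: make sure the old primary
    # is really down before promoting, or you risk a split brain.
    set -e

    drbdadm primary vmdata                 # promote the local DRBD resource
    vgchange -ay vmvg                      # activate the LVM volumes on top of it
    mount /dev/vmvg/config /srv/vmconfig   # the small QEMU/KVM config volume

    # Define and start each guest from its replicated XML definition.
    for xml in /srv/vmconfig/*.xml; do
        virsh define "$xml"
        virsh start "$(basename "$xml" .xml)"
    done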

Please note that the system was architected for maximum performance, even at the expense of features such as fast snapshots and file-based (rather than volume-based) virtual disks.

If I were rebuilding a similar active/passive setup now, I would lean heavily toward using ZFS and continuous async replication via send/recv. It is not real-time, block-based replication, but it is more than sufficient for 90%+ of cases.
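
A minimal sketch of what such periodic replication could look like, assuming a hypothetical dataset name, a reachable standby host, and a state file tracking the last replicated snapshot:

    #!/bin/sh
    # Hypothetical incremental ZFS replication, e.g. run from cron.
    # Dataset "tank/vmdata", host "standby", and the state file are
    # placeholder assumptions.
    set -e

    ds="tank/vmdata"
    state="/var/lib/zfs-repl/last-snap"
    prev="$(cat "$state")"
    now="repl-$(date +%Y%m%d-%H%M%S)"

    zfs snapshot "$ds@$now"

    # Send only the delta since the last replicated snapshot; -F lets the
    # standby discard local changes so the stream applies cleanly.
    zfs send -i "$ds@$prev" "$ds@$now" | ssh standby zfs receive -F "$ds"

    # Record the new baseline and drop the old local snapshot.
    echo "$now" > "$state"
    zfs destroy "$ds@$prev"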

If real-time replication is really needed, I would use DRBD on top of a ZVOL + XFS; in fact, I tested such a setup (plus automatic Pacemaker switchover) in my lab with great satisfaction. If using third-party modules (as ZFS on Linux is) is not possible, I would use a DRBD resource on top of an lvmthin volume + XFS.
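
A rough sketch of the layering for the real-time variant; pool, volume, and resource names are placeholders, and the "r0" resource config is assumed to point its backing disk at the ZVOL:

    # Hypothetical layering: ZVOL -> DRBD -> XFS.
    zfs create -V 500G tank/drbd0    # ZVOL as the DRBD backing device
    drbdadm create-md r0             # initialize DRBD metadata
    drbdadm up r0
    drbdadm primary --force r0       # first promotion, on one node only
    mkfs.xfs /dev/drbd0              # filesystem on the replicated device

    # lvmthin alternative when third-party modules are not an option:
    # lvcreate --type thin-pool -L 500G -n pool0 vg0
    # lvcreate -V 500G --thin -n drbd0 vg0/pool0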
