Proxmox drbd configuration split brain

drbd proxmox virtual-machines

I am planning a Proxmox HA configuration with two Dell R710 machines (dual 6-core processors in each) with enterprise-level RAID arrays. I would be using DRBD with a quorum disk on a third machine. I would dedicate two 1 Gb NICs on each server to DRBD communications. We would have approximately 12 to 14 virtual machines running on this pair of servers. The Proxmox manual recommends creating two DRBD resources: one for the virtual machines that normally run on ServerA and one for the virtual machines that normally run on ServerB. This is because of the Primary/Primary state in which this configuration runs. If both servers have VMs writing to the same DRBD resource and a split-brain situation occurs, the two copies diverge and the resulting data conflict must be resolved manually.

While I understand it would take more effort to create new virtual machines, can anybody foresee any potential problems with running a separate DRBD resource for each VM instead? Does anyone have experience running a setup that way and has it worked well? It seems to me that would allow more flexibility in moving machines back and forth.

Best Answer

I have no experience with Proxmox, but I have configured a regular Pacemaker/Corosync cluster on CentOS, so I hope my observations are still useful and applicable here.

I am very wary of a Primary/Primary DRBD setup. Even with a Primary/Secondary configuration, split brain is likely if something goes wrong. I was surprised at how easily DRBD can fall into a split-brain condition in a cluster that is not well tuned.
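Since split brain can go unnoticed for a while, it may help to watch the connection state of each resource. The following is a minimal sketch, assuming the DRBD 8.x `/proc/drbd` layout (`cs:`, `ro:`, `ds:` fields); the sample text is illustrative rather than captured from a real host, and the function names are hypothetical. It flags devices that are `StandAlone` or `WFConnection`, the states a node typically ends up in after DRBD detects split brain and drops the connection.

```python
import re

# Illustrative sample in the DRBD 8.x /proc/drbd layout (not from a real host).
SAMPLE = """\
version: 8.3.13 (api:88/proto:86-96)
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:672 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:12 dr:672 al:1 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:12
"""

# Matches the status line of one device: minor number, connection state,
# local/peer roles and local/peer disk states.
STATUS_LINE = re.compile(
    r"^\s*(\d+):\s+cs:(\S+)\s+ro:([^/\s]+)/([^/\s]+)\s+ds:([^/\s]+)/([^/\s]+)"
)

# Connection states in which the node is not talking to its peer.
DISCONNECTED = {"StandAlone", "WFConnection"}


def parse_proc_drbd(text):
    """Return one dict per DRBD device found in /proc/drbd-style text."""
    devices = []
    for line in text.splitlines():
        m = STATUS_LINE.match(line)
        if m:
            minor, cs, local_role, peer_role, local_disk, peer_disk = m.groups()
            devices.append({
                "minor": int(minor),
                "cs": cs,
                "roles": (local_role, peer_role),
                "disks": (local_disk, peer_disk),
            })
    return devices


def needs_attention(devices):
    """Return minor numbers of devices that have lost their peer connection."""
    return [d["minor"] for d in devices if d["cs"] in DISCONNECTED]


if __name__ == "__main__":
    # On a real node you would read open("/proc/drbd").read() instead of SAMPLE.
    for minor in needs_attention(parse_proc_drbd(SAMPLE)):
        print(f"/dev/drbd{minor}: not connected, check for split brain")
```

A disconnected state does not prove split brain on its own (a plain network outage looks the same), so treat this as a prompt to check the kernel log rather than as a diagnosis.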

In the Primary/Primary case, special attention should be paid to fencing in order to reduce the probability of data loss. An excellent introduction to a two-node DRBD cluster is here.

A Primary/Primary setup is needed mainly for live migration. If you do not use live migration, Primary/Secondary is enough and much preferable.

Concerning your question, a dedicated DRBD resource per VM is also a workable solution. You will probably move the storage stack from DRBD/LVM (LVM on top of one DRBD device) to LVM/DRBD (a DRBD device on top of each logical volume). So clustered LVM would seem to be needed even in a Primary/Secondary setup. UPD: Clustered LVM is not needed here, and neither is the DLM that would support it.

The main disadvantage I see is the amount of careful manual work needed to prepare storage for each VM.
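Part of that manual work is keeping device minors and TCP ports unique across resources, which can be scripted. The following is a rough sketch, assuming a DRBD 8.3-style resource layout; the `vm-<id>` naming scheme, the volume group `vg0`, the host names, and the addresses are all hypothetical placeholders, and the generated stanza should be checked against the `drbd.conf` man page for your DRBD version before use.

```python
def render_resource(vmid, index, nodes, base_port=7788, vg="vg0"):
    """Render one DRBD resource stanza for a single VM.

    vmid  -- VM identifier used in the resource name (hypothetical scheme)
    index -- unique per resource; used as the DRBD minor and the port offset
    nodes -- list of (hostname, replication_ip) pairs, one per cluster node
    """
    name = f"vm-{vmid}"
    port = base_port + index  # each resource needs its own TCP port
    lines = [f"resource {name} {{", "  protocol C;"]
    for host, ip in nodes:
        lines += [
            f"  on {host} {{",
            f"    device    /dev/drbd{index};",
            f"    disk      /dev/{vg}/{name}-disk;",  # backing logical volume
            f"    address   {ip}:{port};",
            "    meta-disk internal;",
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder host names and addresses for the two servers.
    cluster = [("servera", "10.0.0.1"), ("serverb", "10.0.0.2")]
    for i, vmid in enumerate([101, 102, 103]):
        print(render_resource(vmid, i, cluster))
        print()
```

Each stanza would go into its own file under `/etc/drbd.d/`, and the backing logical volume still has to be created on both nodes before the resource is brought up.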

Another point to think about in advance is the backup strategy. With many DRBD resources it may be a bit more complicated.

I started my first cluster setup with the LVM/DRBD stack and a dedicated DRBD resource per VM, but later switched to the more common DRBD/LVM stack, since provisioning a new VM is much simpler in that case.