Recommendation for a small redundant VM infrastructure

infrastructure, redundancy, storage, virtualization

We currently have an active-active two-node cluster running virtual machines. Each node has two disks, and each disk is mirrored via DRBD to the other node. Each node runs virtual machines out of its primary DRBD device, and the Pacemaker cluster handles failover (if a node fails, the other becomes primary on both DRBD devices and runs all the VMs). This is colocated in a datacenter, so our costs (besides hardware acquisition) are driven by how many rack units we occupy.
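For reference, the per-disk mirroring described above boils down to one DRBD resource per disk, along these lines (hostnames, devices and addresses below are made-up examples, not our actual config):

```
# /etc/drbd.d/vm0.res -- illustrative resource; node-a is normally primary here
resource vm0 {
  protocol C;                     # synchronous replication
  on node-a {
    device    /dev/drbd0;
    disk      /dev/sda3;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on node-b {
    device    /dev/drbd0;
    disk      /dev/sda3;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```

A second resource (say, `vm1` on `/dev/sdb3`) is normally primary on node-b; on failure, Pacemaker promotes the survivor to primary on both.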

When you start out small this is a great solution: it fits in 2U of rack space (assuming the Ethernet switch(es) are already there) and it's 100% redundant. But it's also a slightly difficult setup to manage, and it suffers when I/O load gets too high (I guess that's just because of the low number of spindles).

I'm wondering what would be the best solution to scale beyond our hardware capacity while staying cost-effective and as redundant as is reasonable:

  • carry on adding more two-node clusters with internal storage, maybe with bigger hardware (e.g. 2U servers with more disks)
  • still use two-node clusters, but with external direct-attached storage (e.g. 1U or 2U disk enclosures with SAS links) – see note below
  • separate storage and VMs (e.g. a pair of storage nodes, mirrored via DRBD, which export iSCSI and handle failover by moving the iSCSI target IP, coupled with two or more diskless nodes that run VMs off the static iSCSI target IP) – this seems to be what others are doing?
  • use something different from standard servers for the storage part (dedicated storage appliances on gigabit Ethernet?)
  • anything else altogether?
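To make the third option concrete: with Pacemaker, the DRBD master, the iSCSI target and the floating target IP can be tied together so they fail over as a unit. A rough sketch in crm shell syntax, where all resource names, the IQN and the IP address are placeholders:

```
# crm configure -- illustrative names and addresses
primitive p_drbd_store ocf:linbit:drbd params drbd_resource=store \
  op monitor interval=30s
ms ms_drbd_store p_drbd_store meta master-max=1 notify=true
primitive p_iscsi ocf:heartbeat:iSCSITarget params iqn=iqn.2010-01.local.example:store
primitive p_ip ocf:heartbeat:IPaddr2 params ip=10.0.0.10 cidr_netmask=24
group g_storage p_iscsi p_ip
colocation c_storage inf: g_storage ms_drbd_store:Master
order o_storage inf: ms_drbd_store:promote g_storage:start
```

The VM nodes only ever see the static IP `10.0.0.10`, so failover of the storage pair is invisible to them apart from a brief I/O stall.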

Splitting storage and application servers seems the most flexible and reasonable solution to me: we could easily add more storage nodes when needed and keep the current application servers, or the other way around when we hit capacity limits.

What do you think are good/bad choices? Do you have experience with this kind of setup on smaller budgets (I tend to rule out Fibre Channel or 10,000-euro storage appliances)?

EDIT: To be clear, the idea is that by leveraging modern (and free) software you can implement redundancy just by adding more commodity hardware. It's not going to be screamin' fast or super-high-availability, but it will keep our VMs running even if a motherboard dies, for as long as it takes to get a spare to the DC and replace the part.

EDIT: I removed the USB mention because it's really not going anywhere (thanks for pointing that out in the replies). I don't know how I forgot about SAS enclosures. As an example from the Dell website, an MD1000 is 2U with SAS links. Two enclosures attached to two storage nodes via SAS could provide redundancy and export iSCSI.
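On whichever storage node is currently DRBD primary, exporting the mirrored device over iSCSI can be as simple as a few tgt commands (the IQN, target ID and backing device here are illustrative only):

```
# create the target, attach the DRBD device as LUN 1, allow initiators
tgtadm --lld iscsi --mode target --op new --tid 1 \
       --targetname iqn.2010-01.local.example:storage.disk0
tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 1 \
       --backing-store /dev/drbd0
tgtadm --lld iscsi --mode target --op bind --tid 1 --initiator-address ALL
```

In a failover, a Pacemaker resource agent would repeat this on the newly promoted primary, while the diskless VM nodes keep pointing at the floating target IP.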

Best Answer

You definitely want to separate the disks and VMs, because you will want the VM nodes to access shared storage (rather than separate mirrored storage) so that failover operations are nearly seamless.

I would deprecate OS-level clustering in favor of VM-level clustering: in my experience the data stores tend to be more vulnerable than the hardware and OS (provided the OS has been set up for stability), and OS-level problems affecting one node of a cluster tend to carry over to the other node (bad updates, network issues, etc.), rendering OS clustering ineffectual. The VM hosts should have local disks just to run the hypervisors, but the VM machine disks should be on the shared storage (and you will want that shared storage on at least hardware RAID 5).

Putting the VMs into a shared resource cluster (à la VMware) is the way to go because it allows very granular automatic load balancing. With this setup, adding new hardware becomes a matter of connecting the new VM server to the shared disk, putting the hypervisor on it, and joining it to the cluster.

I don't have any recommendations on the type of shared storage, since people who know the world of shared storage and VMs tend to have very good data and I defer to their judgement.
