Linux – Is it a bad idea to use RAID 1 and NBD to create a mirror of a disk for failover?


What I wanted to do is to create a mirror of one machine's disk on a failover machine so that, in the event the primary machine failed, I would just reboot the failover, select a different root partition and be ready to go.

I set it up like this:

  1. A primary machine and a failover machine.
  2. Both machines have a RAID partition defined.
  3. The failover machine serves its RAID partition via nbd-server.
  4. The primary machine attaches the failover's RAID partition as a network block device via nbd-client.
  5. On the primary machine, mdadm combines the two RAID partitions into a single RAID 1 device, with the --write-mostly flag set on the remote partition.
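As an illustration of steps 4 and 5, here is a minimal Python sketch that only prints (dry run) the commands the primary would run. The hostname, export name and partition paths are hypothetical placeholders, and the nbd-client invocation assumes a named export (`-N`) on a reasonably recent nbd package; check `man nbd-client` on your system for the exact argument order. `mdadm --create` is a one-time step; later boots would reassemble the existing array instead.

```python
import shlex

# Hypothetical values: replace with your own hosts and partitions.
FAILOVER_HOST = "failover.example.lan"  # hypothetical hostname
EXPORT_NAME = "mirror"                  # hypothetical nbd-server export name
LOCAL_PART = "/dev/sda3"                # hypothetical local RAID partition
NBD_DEV = "/dev/nbd0"
MD_DEV = "/dev/md0"


def primary_setup_commands(host, export, local_part, nbd_dev, md_dev):
    """Return the argv lists for steps 4 and 5, in the order they must run."""
    return [
        # Step 4: attach the failover's exported partition as a local
        # block device (assumes a named export served by nbd-server).
        ["nbd-client", "-N", export, host, nbd_dev],
        # Step 5 (first-time creation only): a two-member RAID 1. Devices
        # listed after --write-mostly (here only the NBD device) are flagged
        # write-mostly, so md prefers the local disk for reads.
        ["mdadm", "--create", md_dev, "--level=1", "--raid-devices=2",
         local_part, "--write-mostly", nbd_dev],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in primary_setup_commands(FAILOVER_HOST, EXPORT_NAME,
                                       LOCAL_PART, NBD_DEV, MD_DEV):
        print(shlex.join(argv))
```

Printing instead of executing keeps the sketch safe to run; a real startup script would also have to wait for nbd-client to succeed before touching the array.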

I wrote some scripts to start everything up automatically and configured GRUB on the failover machine so that it can boot either from a small partition used while mirroring or from the failover partition. I tested it and it works.

The problem I ran into is that, about once a week, the primary machine seems to freeze up completely. You can't ssh into it and the console won't respond. After rebooting the machine, the log entries just stop at a certain time, and there is nothing in the log that indicates an error.

I disconnected the NBD partition and ran everything with just the local disk in the RAID array, and it has run for a month without any problems.

Is NBD unstable? Could RAID decide to disconnect the local partition and run off the NBD partition at the same moment that the network fails in some way? Or is this just the wrong way to go about it?
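One way to check whether md has actually dropped either member is to read /proc/mdstat, where the kernel marks a write-mostly member with "(W)" and a faulty one with "(F)". The Python sketch below is a hypothetical helper (the function names are mine, not part of any tool); it cannot explain the freeze itself, but it shows which side of the mirror md considers failed. `mdadm --monitor` is the standard tool for getting notified of such events.

```python
import re

# Matches one member entry such as "sda3[0]" or "nbd0[1](W)(F)".
MEMBER_RE = re.compile(r"([^\s\[]+)\[(\d+)\]((?:\([A-Z]\))*)")


def parse_mdstat(text):
    """Return {array: [(device, flags), ...]} from /proc/mdstat-style text.

    flags is the set of single-letter markers printed after a member,
    e.g. {"W"} for write-mostly or {"F"} for faulty.
    """
    arrays = {}
    for line in text.splitlines():
        # Array lines look like "md0 : active raid1 nbd0[1](W) sda3[0]".
        if not line.startswith("md") or " : " not in line:
            continue
        name, rest = line.split(" : ", 1)
        members = []
        for dev, _slot, raw_flags in MEMBER_RE.findall(rest):
            members.append((dev, set(re.findall(r"\(([A-Z])\)", raw_flags))))
        arrays[name.strip()] = members
    return arrays


def faulty_members(arrays):
    """List (array, device) pairs that the kernel has marked faulty."""
    return [(md, dev) for md, members in arrays.items()
            for dev, flags in members if "F" in flags]


if __name__ == "__main__":
    with open("/proc/mdstat") as f:
        state = parse_mdstat(f.read())
    for md, dev in faulty_members(state):
        print(f"{md}: {dev} is marked faulty")
```

If the local partition ever shows up as faulty while the NBD member is still active, the array is running purely over the network, which is the scenario the question worries about.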

Thanks.

Best Answer

The problem you want to solve is a complex one.

For mirroring disk partitions over the network, DRBD seems to be the right choice. DRBD is not trivial, but it is easy enough to set up correctly in a few hours.
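To give a rough idea of what a DRBD setup involves, the Python sketch below renders a two-node resource stanza in the classic DRBD 8.x `resource`/`on host` layout. The resource name, hostnames, addresses, port and disk are hypothetical placeholders, the `on` names must match `uname -n` on each machine, and newer DRBD versions may expect additional settings, so compare against the drbd.conf man page for your version before using it.

```python
def drbd_resource(name, port, nodes, device="/dev/drbd0"):
    """Render a two-node DRBD resource stanza as text.

    nodes is a list of (hostname, ip, backing_disk) tuples. All values
    passed in are assumptions supplied by the caller.
    """
    lines = [f"resource {name} {{"]
    for host, ip, disk in nodes:
        lines += [
            f"  on {host} {{",
            f"    device    {device};",
            f"    disk      {disk};",
            f"    address   {ip}:{port};",
            # Metadata stored at the end of the backing partition.
            "    meta-disk internal;",
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical hosts and addresses for the primary and failover machines.
    print(drbd_resource("r0", 7789, [
        ("primary", "192.168.1.10", "/dev/sda3"),
        ("failover", "192.168.1.11", "/dev/sda3"),
    ]), end="")
```

Unlike the md-over-NBD setup, DRBD handles the replication link and resynchronisation itself, so neither machine needs an NBD device in its local RAID array.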

If you plan to make the services on these machines fail over automatically, you should take a look at Linux-HA.

But be aware that HA is a very complex setup with a steep learning curve. All of this should be tested carefully before it goes into production. You have been warned!