We have several hosts that each have an identical hot spare host, which is patched and updated so that its software and config stay very close to the primary's. In case of failure, the network cable is switched and the DHCP server is updated with the new MAC address. That is the best case; usually a bit more needs to be modified.
I feel it is a waste of electricity to run a hot spare host and a waste of time to maintain it, and since config modifications are needed on failover anyway, I'd like to ask the following:
Are hot spare hosts old school, and are there better ways now?
Instead of having a hot spare host, would it make sense to make it a cold spare, take its hard drives, put them in the primary host, and change the RAID from 1 to 1+1? In case of failure, all I would have to do is change network cables, update the DHCP server, take the hard drives, insert them in the cold spare, and power it on. The benefit, as I see it, is that the 2×2 disks are always in sync, so there is only one host to maintain and no config changes are needed when failing over.
Is that a good idea?
Best Answer
Sobrique explains how the manual intervention makes your proposed solution suboptimal, and ewwhite talks about the probability of failure of various components. Both, IMO, make very good points that should be strongly considered.
There is however one issue that nobody seems to have commented on at all so far, which surprises me a little. You propose to:
This doesn't protect you against anything the OS does on disk.
It only really protects you against disk failure, and moving from mirrors (RAID 1) to mirrors of mirrors (RAID 1+1) already greatly reduces the impact of that. You could get the same result by increasing the number of disks in each mirror set (going from 2-disk RAID 1 to 4-disk RAID 1, for example), quite likely also improving read performance during ordinary operations.
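To put rough numbers on the "same result" claim, here is a back-of-the-envelope sketch. It assumes each disk fails independently with the same probability during some window, which is optimistic in practice (disks from the same batch, or behind the same controller or power supply, tend to fail together). The function name and the 5% figure are made up for illustration.

```python
import math


def mirror_loss_probability(p_member: float, members: int) -> float:
    """Chance that every member of a mirror fails in the same window.

    Assumes independent failures, which real hardware does not guarantee.
    """
    if not 0.0 <= p_member <= 1.0:
        raise ValueError("p_member must be between 0 and 1")
    if members < 1:
        raise ValueError("members must be at least 1")
    return p_member ** members


p = 0.05  # hypothetical per-disk failure probability, for illustration only

two_way = mirror_loss_probability(p, 2)    # plain 2-disk RAID 1
four_way = mirror_loss_probability(p, 4)   # 4-disk RAID 1
# RAID 1+1: an outer mirror whose two members are themselves 2-disk mirrors.
nested = mirror_loss_probability(mirror_loss_probability(p, 2), 2)

print(f"2-disk mirror: {two_way:.6g}")
print(f"4-disk mirror: {four_way:.6g}")
print(f"mirror of mirrors: {nested:.6g}")
print("4-disk and nested agree:", math.isclose(four_way, nested))
```

Under that assumption, the 4-disk mirror and the mirror of mirrors lose data only when all four disks are gone, so they come out the same, and both are far ahead of a 2-disk mirror. None of this says anything about failures that are not disk failures.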
Well then, let's look at some ways this could fail.
Maybe someone accidentally runs rm -rf ../* or rm -rf /* instead of rm -rf ./*.
Maybe, maybe, maybe... (and I'm sure there are plenty more ways your proposed approach could fail.) However, in the end this boils down to your "the two sets are always in sync" "advantage". Sometimes you don't want them to be perfectly in sync.
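The "always in sync" problem can be shown with a toy model. This is only a sketch: MirroredVolume is a hypothetical class that treats each disk as a dictionary, not a model of any real RAID implementation, and the file path and contents are invented.

```python
from typing import Dict, List, Optional


class MirroredVolume:
    """Toy N-way mirror: every write and delete goes to all healthy disks."""

    def __init__(self, disks: int) -> None:
        self.disks: List[Optional[Dict[str, str]]] = [{} for _ in range(disks)]

    def write(self, path: str, data: str) -> None:
        for disk in self.disks:
            if disk is not None:
                disk[path] = data

    def delete_everything(self) -> None:
        # Stands in for an accidental rm -rf: faithfully mirrored everywhere.
        for disk in self.disks:
            if disk is not None:
                disk.clear()

    def fail_disk(self, index: int) -> None:
        self.disks[index] = None  # hardware failure of one member

    def read(self, path: str) -> Optional[str]:
        for disk in self.disks:
            if disk is not None and path in disk:
                return disk[path]
        return None

    def snapshot(self) -> Dict[str, str]:
        # Point-in-time copy kept outside the mirror (a backup or a standby).
        for disk in self.disks:
            if disk is not None:
                return dict(disk)
        return {}


volume = MirroredVolume(disks=4)
volume.write("/etc/app.conf", "listen=8080")
backup = volume.snapshot()

volume.fail_disk(0)
survived_disk_failure = volume.read("/etc/app.conf")  # still readable

volume.delete_everything()
survived_delete = volume.read("/etc/app.conf")        # gone from every disk
restored_from_backup = backup.get("/etc/app.conf")    # the old copy remains

print(survived_disk_failure, survived_delete, restored_from_backup)
```

The mirror shrugs off the failed disk, but the delete reaches every remaining copy at once. Only the copy that was deliberately not kept in sync still has the file.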
Depending on what exactly has happened, that's when you want either a hot or cold standby ready to be switched on and over to, or proper backups. Either way, RAID mirrors of mirrors (or RAID mirrors) don't help you if the failure mode involves much of anything aside from hardware storage device failure (disk crash). Something like ZFS' raidzN can likely do a little better in some regards but not at all better in others.
To me, this would make your proposed approach a no-go from the beginning if the intent is any sort of disaster failover.