How to Create a Fault-Tolerant NFS Setup

fault-tolerance, linux, nfs

Probably a FAQ but I haven't found anything useful after a while of searching:

Can I set up NFS in such a way that every single error (e.g. server CPU, hard disk, hd controller, network adapter, network cable, power supply) is masked without any need for immediate intervention?

I only have answers for parts of the problem: RAID, a redundant power supply, and redundant network adapters.

How do I address CPU failure of the NFS server so that clients fail over transparently?

Best Answer

You could buy a system that can tolerate a CPU failure, or you could implement more than one server. You can create an NFS failover cluster fairly easily on Linux (I'm sure Sun et al. have a mechanism for this too).

A fairly common and well-supported way to do it is to use heartbeat to manage the cluster (search for "NFS heartbeat"; that is how I found the first guide on Google) and to share the storage between the servers. To make the failover transparent, you also need to share the NFS state information, which is usually kept in /var/lib/nfs. You can do that by putting it on the shared storage.
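One way to put the state directory on the shared storage is to move it there and leave a symlink behind. The following is only a sketch under assumptions: the helper name relocate_nfs_state and the mount point /mnt/shared are made up for illustration, the NFS services should be stopped while it runs, and your distribution may keep its state elsewhere.

```shell
#!/bin/sh
# Hypothetical helper (not part of heartbeat or nfs-utils): relocate the NFS
# state directory onto shared storage and replace it with a symlink.
relocate_nfs_state() {
    state_dir=$1    # local state directory, usually /var/lib/nfs
    shared_dir=$2   # target on the shared volume (assumed path)

    # Already a symlink: nothing to do on this node.
    if [ -L "$state_dir" ]; then
        return 0
    fi

    # The first node to run this seeds the shared copy; later nodes reuse it.
    if [ ! -d "$shared_dir" ]; then
        mkdir -p "$shared_dir"
        cp -a "$state_dir/." "$shared_dir/"
    fi

    # Keep the old local directory as a backup, then point at the shared copy.
    mv "$state_dir" "$state_dir.local"
    ln -s "$shared_dir" "$state_dir"
}

# Example call (run as root on each node, with the shared volume mounted):
# relocate_nfs_state /var/lib/nfs /mnt/shared/nfs-state
```

Only the node that currently owns the shared volume will see real content behind the symlink, which is what you want: the standby picks up the same state when it mounts the volume during failover.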

Edit: Also, setting the fsid option to the same value on the NFS export on each server will prevent stale file handles when the cluster fails over.
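As a rough illustration of the fsid point, the export line might look like this on both nodes. The path, client network and fsid value are placeholders I picked; what matters is that the line is identical on every server in the cluster.

```
# /etc/exports (same on every cluster node; all values are placeholders)
/mnt/shared/export  192.168.1.0/24(rw,sync,no_subtree_check,fsid=1)
```

After editing the file, running exportfs -ra on the active node re-reads the exports.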
