We've got a setup involving separate web and email servers mounting various shares on the same physical NFSv3 server.
The web server uses a share for Apache vhost files; the mail server uses a share for user mail (Maildir format, courier-imap).
The mail server, suddenly and without warning, seems to be randomly 'locking up' because the NFS share becomes unresponsive. 'df' hangs when it gets to the mounted NFS share. Any attempt to access the share from within the system effectively hangs the process.
The web server has so far remained unaffected, so I'd like to think we can rule out issues on the NFS server. Both web and email use the same mount options, of which I've tried several combinations, to no avail. Sometimes it runs without issue for weeks, yet we just had it lock up less than 24 hours after increasing the number of NFS threads on the server and remounting the shares.
Any comments or suggestions?
rpcinfo output:
rpcinfo -p localhost
program vers proto port
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 32768 status
100024 1 tcp 60949 status
100021 1 udp 32769 nlockmgr
100021 3 udp 32769 nlockmgr
100021 4 udp 32769 nlockmgr
100021 1 tcp 41693 nlockmgr
100021 3 tcp 41693 nlockmgr
100021 4 tcp 41693 nlockmgr
rpcinfo -p $nfs_server
program vers proto port
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 752 status
100024 1 tcp 755 status
100011 1 udp 613 rquotad
100011 2 udp 613 rquotad
100011 1 tcp 616 rquotad
100011 2 tcp 616 rquotad
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100003 4 udp 2049 nfs
100021 1 udp 37709 nlockmgr
100021 3 udp 37709 nlockmgr
100021 4 udp 37709 nlockmgr
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100021 1 tcp 54549 nlockmgr
100021 3 tcp 54549 nlockmgr
100021 4 tcp 54549 nlockmgr
100005 1 udp 651 mountd
100005 1 tcp 654 mountd
100005 2 udp 651 mountd
100005 2 tcp 654 mountd
100005 3 udp 651 mountd
100005 3 tcp 654 mountd
Best Answer
In my experience, NFS is notorious for problems like this. Could it be related to a problem with your network switch?
Since the web server and mail server access the same NFS server, try moving the mail server's connection to a different network port (or switch port) and see if that helps.
Otherwise, try some of these options in your fstab file: increase the timeouts (timeo, retrans) and set the soft option so that a stalled request eventually returns an error instead of hanging forever. You might also like to try the fsc (FS-Cache) option, which caches file data on local disk. Note that it mainly helps reads, so I can only hope it eases the load enough to solve your problem.
See: http://linux.die.net/man/5/nfs
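As a rough sketch of what such an fstab entry might look like: the server name, export path, mount point and the specific values below are placeholders I made up for illustration, not taken from your setup, so adjust them to match your existing mount line.

```
# /etc/fstab (illustrative only; "nfs-server", "/export/mail" and "/var/mail" are hypothetical)
# soft      : return an error to the application after the retries are exhausted
# timeo=900 : wait 90 seconds before retrying (the value is in tenths of a second)
# retrans=5 : retry each request 5 times before giving up
# fsc       : use the local FS-Cache (needs the cachefilesd daemon running)
nfs-server:/export/mail  /var/mail  nfs  vers=3,proto=tcp,soft,timeo=900,retrans=5,fsc  0  0
```

Be aware that the nfs(5) man page warns that a soft timeout can cause silent data corruption in some cases, which matters more for a write-heavy mail store than for a mostly read-only web share. It may be worth trying the longer timeouts on their own first.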
On a side note, your web server is mostly reading. Your mail server is mostly performing writes.
If all that fails, I'd consider ditching NFS and using iSCSI instead.