NFS server not serving share after reboot

centos6initnfsopenvz

I have two Linode containers. Box A is our general purpose web server. It occasionally needs to access Box B, which is set up as an NFS server.

When Box B reboots, Box A is unable to access any NFS shares no matter what I do. After several hours of troubleshooting, I was finally able to narrow it down to a single step fix.

After Box B reboots:

$ sudo service nfs restart

These are both CentOS 6.8 boxes, up to date. NFS-related packages were all installed via yum, I believe. I did have some trouble getting the whole thing set up; it was not a smooth process, but after restarting the nfs service(s), everything works great.

If I

$ sudo service --status-all

there is no difference before and after issuing the restart. Maybe it's a timing issue? But I don't know how to even begin to trouble shoot this. What can I do?

Other things of note:

  • I'm using autofs to automatically mount the share on demand from Box A, but the share won't mount manually either

  • I spend my days on Windows and Mac desktops and servers, but I've been running websites on Linux for many years. I'm proficient in the things I need to do, but it's not my area of comfort and I spend a lot of time googling how to do new things.

I don't even know where to check. I didn't see anything obvious in the logs, but tell me what to look for and I'll post.

Update

On Box B:

 [shorowitz@BoxB ~]$ sudo chkconfig --list nfs
 nfs                0:off   1:off   2:on    3:on    4:on    5:on    6:off
 [shorowitz@BoxB ~]$ sudo chkconfig --list nfslock
 nfslock            0:off   1:off   2:on    3:on    4:on    5:on    6:off

Update 2

After a fresh reboot of BoxB, running

$ sudo showmount -e BoxB

from BoxA shows the expected mount points, but I'm unable to mount them. Simply restarting nfs on BoxB

 $ sudo service nfs restart
 Shutting down NFS daemon:                                  [  OK  ]
 Shutting down NFS mountd:                                  [  OK  ]
 Shutting down NFS services:                                [  OK  ]
 Shutting down RPC idmapd:                                  [  OK  ]
 FATAL: Module nfsd not found.
 FATAL: Error running install command for nfsd
 Starting NFS services:                                     [  OK  ]
 Starting NFS mountd:                                       [  OK  ]
 Starting NFS daemon:                                       [  OK  ]
 Starting RPC idmapd:                                       [  OK  ]

And the mounts are immediately available on BoxA. Those fatal errors appear on subsequent restarts as well when NFS is already working, so I don't know how relevant they are (I thought I had posted them already).

Additional Log Info

I issued the reboot command at 9:29 on Nov 15

 grep -i "nfs" /var/log/message*
 messages:Nov 15 09:29:08 BoxB kernel: nfsd: last server has exited, flushing export cache
 messages:Nov 15 09:29:54 BoxB kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
 messages:Nov 15 09:29:54 BoxB kernel: FS-Cache: Netfs 'nfs' registered for caching
 messages:Nov 15 09:29:54 BoxB kernel: NFS: Registering the id_resolver key type
 messages:Nov 15 09:29:54 BoxB kernel: nfs4filelayout_init: NFSv4 File Layout Driver Registering...
 messages:Nov 15 09:29:54 BoxB kernel: Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
 messages:Nov 15 09:29:54 BoxB kernel: xenfs: not registering filesystem on non-xen platform
 messages:Nov 15 09:29:54 BoxB rpc.mountd[2740]: NFS v4 mounts will be disabled unless fsid=0
 messages:Nov 15 09:29:54 BoxB kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
 messages:Nov 15 09:29:54 BoxB kernel: NFSD: starting 90-second grace period (net ****************)
 messages:Nov 15 09:33:39 BoxB kernel: nfsd: last server has exited, flushing export cache
 messages:Nov 15 09:33:40 BoxB kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
 messages:Nov 15 09:33:40 BoxB kernel: NFSD: starting 90-second grace period (net ****************)

Update 3:

BoxB

 [shorowitz@BoxB ~]$ sudo chkconfig --list | egrep "nfs|rpc"
 nfs                0:off   1:off   2:on    3:on    4:on    5:on    6:off
 nfslock            0:off   1:off   2:on    3:on    4:on    5:on    6:off
 rpcbind            0:off   1:off   2:on    3:on    4:on    5:on    6:off
 rpcgssd            0:off   1:off   2:off   3:on    4:on    5:on    6:off
 rpcsvcgssd         0:off   1:off   2:off   3:off   4:off   5:off   6:off

 [shorowitz@BoxB ~]$ sudo iptables --list -n -v
 Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
  pkts bytes target     prot opt in     out     source               destination         
     0     0 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0           
     0     0 REJECT     all  --  !lo    *       127.0.0.0/8          0.0.0.0/0           reject-with icmp-port-unreachable 
    18   710 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0           state NEW icmp type 8 
   471 26200 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp dpt:22 state NEW 
  204K  393M ACCEPT     all  --  *      *       {BoxA IP}            0.0.0.0/0           
  6721  754K ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED 
  2859  168K LOG        all  --  *      *       0.0.0.0/0            0.0.0.0/0           limit: avg 5/min burst 5 LOG flags 0 level 7 prefix `iptables_INPUT_denied: ' 
  9229  628K REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable 

 Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
  pkts bytes target     prot opt in     out     source               destination         
     0     0 LOG        all  --  *      *       0.0.0.0/0            0.0.0.0/0           limit: avg 5/min burst 5 LOG flags 0 level 7 prefix `iptables_FORWARD_denied: ' 
     0     0 REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable 

 Chain OUTPUT (policy ACCEPT 278K packets, 8386M bytes)
  pkts bytes target     prot opt in     out     source               destination        

 [shorowitz@BoxB ~]$ sudo rpcinfo -p
program vers proto   port  service
 100000    4   tcp    111  portmapper
 100000    3   tcp    111  portmapper
 100000    2   tcp    111  portmapper
 100000    4   udp    111  portmapper
 100000    3   udp    111  portmapper
 100000    2   udp    111  portmapper
 100024    1   udp  38148  status
 100024    1   tcp  45681  status
 100005    1   udp  37846  mountd
 100005    1   tcp  59259  mountd
 100005    2   udp  59934  mountd
 100005    2   tcp  42645  mountd
 100005    3   udp  33867  mountd
 100005    3   tcp  41823  mountd
 100003    2   tcp   2049  nfs
 100003    3   tcp   2049  nfs
 100003    4   tcp   2049  nfs
 100227    2   tcp   2049  nfs_acl
 100227    3   tcp   2049  nfs_acl
 100003    2   udp   2049  nfs
 100003    3   udp   2049  nfs
 100003    4   udp   2049  nfs
 100227    2   udp   2049  nfs_acl
 100227    3   udp   2049  nfs_acl
 100021    1   udp  37287  nlockmgr
 100021    3   udp  37287  nlockmgr
 100021    4   udp  37287  nlockmgr
 100021    1   tcp  37579  nlockmgr
 100021    3   tcp  37579  nlockmgr
 100021    4   tcp  37579  nlockmgr

This returns nothing:

 grep -v "^#" /etc/sysconfig/nfs

BoxA

 $ chkconfig --list | egrep "nfs|rpc"
 nfs                0:off   1:off   2:on    3:on    4:on    5:on    6:off
 nfslock            0:off   1:off   2:on    3:on    4:on    5:on    6:off
 rpcbind            0:off   1:off   2:on    3:on    4:on    5:on    6:off
 rpcgssd            0:off   1:off   2:off   3:on    4:on    5:on    6:off
 rpcsvcgssd         0:off   1:off   2:off   3:off   4:off   5:off   6:off

 $ iptables --list -n -v
 Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
  pkts bytes target     prot opt in     out     source               destination         
  390K   58M ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0           
     0     0 REJECT     all  --  *      *       0.0.0.0/0            127.0.0.0/8         reject-with icmp-port-unreachable 
  990K 7850M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED 
     0     0 DROP       all  --  *      *       43.255.188.145       0.0.0.0/0           
     8   388 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp dpt:587 
 11864  608K ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp dpt:25 
     1    40 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp dpt:636 
  4545  238K ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 
  9759  553K ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp dpt:443 
    24   960 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp dpt:8080 
   320 19152 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           state NEW tcp dpt:22 
    85  5681 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0           
  3254  194K LOG        all  --  *      *       0.0.0.0/0            0.0.0.0/0           limit: avg 5/min burst 5 LOG flags 0 level 7 prefix `iptables denied: ' 
  3634  227K DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0           

 Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
  pkts bytes target     prot opt in     out     source               destination         
     0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0           

 Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
  pkts bytes target     prot opt in     out     source               destination         
 1360K 1907M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           

 $ rpcinfo -p
program vers proto   port  service
 100000    4   tcp    111  portmapper
 100000    3   tcp    111  portmapper
 100000    2   tcp    111  portmapper
 100000    4   udp    111  portmapper
 100000    3   udp    111  portmapper
 100000    2   udp    111  portmapper
 100024    1   udp  55882  status
 100024    1   tcp  58283  status
 100011    1   udp    875  rquotad
 100011    2   udp    875  rquotad
 100011    1   tcp    875  rquotad
 100011    2   tcp    875  rquotad
 100005    1   udp  43136  mountd
 100005    1   tcp  55047  mountd
 100005    2   udp  51117  mountd
 100005    2   tcp  42791  mountd
 100005    3   udp  44511  mountd
 100005    3   tcp  46535  mountd
 100003    2   tcp   2049  nfs
 100003    3   tcp   2049  nfs
 100003    4   tcp   2049  nfs
 100227    2   tcp   2049  nfs_acl
 100227    3   tcp   2049  nfs_acl
 100003    2   udp   2049  nfs
 100003    3   udp   2049  nfs
 100003    4   udp   2049  nfs
 100227    2   udp   2049  nfs_acl
 100227    3   udp   2049  nfs_acl
 100021    1   udp  43509  nlockmgr
 100021    3   udp  43509  nlockmgr
 100021    4   udp  43509  nlockmgr
 100021    1   tcp  38725  nlockmgr
 100021    3   tcp  38725  nlockmgr
 100021    4   tcp  38725  nlockmgr

 $ mount | grep nfs
 nfsd on /proc/fs/nfsd type nfsd (rw)
 sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

Update 14 November

 BoxA:

 $ cat /etc/auto.master.d/nfs
 xdata  -rw boxb:/srv/nfs/xdata
 xbackup    -rw boxb:/srv/nfs/xbackup
 zbackups   -rw boxb:/srv/nfs/zbackups

 $ mount | grep nfs
 mount |grep nfs
 nfsd on /proc/fs/nfsd type nfsd (rw)
 sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
 boxb:/srv/nfs/xdata on /mnt/nfs/xdata type nfs (rw,sloppy,vers=4,addr={boxb ip},clientaddr={boxa ip})

Best Answer

Can you update your question with more information?
On NFS Server run

chkconfig --list | egrep "nfs|rpc"
iptables --list -n -v
rpcinfo -p

Next one should return nothing if your nfs server settings are not customized.

grep -v "^#" /etc/sysconfig/nfs

However if you are running iptables - it should be customized. See here
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/s2-nfs-nfs-firewall-config.html

On your client run

chkconfig --list | egrep "nfs|rpc"
iptables --list -n -v
rpcinfo -p
mount | grep nfs

If you are running NFSV2 or NFSV3 you also need nfslock running on client

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/s1-nfs-start.html

What does look strange in your output when starting NFS - is this below line

FATAL: Module nfsd not found.
 FATAL: Error running install command for nfsd

And I just realized one thing - since you are running on openvz - below link applies to your situation. See if this helps

https://openvz.org/NFS_server_inside_container

Edit 1: I ran tests today on openvz containers.
Message that you see

FATAL: Module nfsd not found.
FATAL: Error running install command for nfsd

is harmless. It's described here https://openvz.org/NFS_server_inside_container

I could not reproduce your issue. After reboot of nfs server - nfs client was still able to browse files, create new files, delete files from the nfs share.

Can you now post your auto mount configuration and also output of

mount|grep nfs

From Box A, which you already did. But you did it while auto mounted file system was unmounted. So now on Box A cd to auto mounted dir and then run above command. This will give information on which options were used during mount.

Also next time you reboot your Box B and you have this automount issue present - from Site A run this command

netstat -anp|grep ipofB

This will give information on which ports are involved.
In this kind of situations it's also good to collect tcpdumps on B and A.
I tend to think that there is nothing wrong with your config - but something weird is going on with iptables on vzhosts. Not in your containers, but on the hosts.

What you can also try to do is to install nmap on your host A and at the time of the problem - scan your host B to see which ports are open from the point of view of A (some might suggest netstat or ss, but in this case there is also openvz host firewall in front of the container and we don't know if they are blocking something or not)

Edit 2 (Nov 25) Something is very strange with your mount options

Compare your output

$ mount | grep nfs
 mount |grep nfs
 nfsd on /proc/fs/nfsd type nfsd (rw)
 sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
 boxb:/srv/nfs/xdata on /mnt/nfs/xdata type nfs (rw,sloppy,vers=4,addr={boxb ip},clientaddr={boxa ip}) 

To mine below. Here is my /etc/auto.misc. Line 6 is default. Line 16 I added

[root@vznfsclient /]# egrep -vn "^#|^$" /etc/auto.misc
6:cd            -fstype=iso9660,ro,nosuid,nodev :/dev/cdrom
16:nfs          -fstype=nfs             192.168.0.54:/nfs_export

So when I cd to /misc/nfs - my share gets mounted. But look at the default options on line 12.

[root@vznfsclient ~]# mount|egrep -n "nfs|auto"
4:nfsd on /proc/fs/nfsd type nfsd (rw,relatime)
5:sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
10:/etc/auto.misc on /misc type autofs (rw,relatime,fd=6,pgrp=768,timeout=300,minproto=5,maxproto=5,indirect)
11:-hosts on /net type autofs (rw,relatime,fd=12,pgrp=768,timeout=300,minproto=5,maxproto=5,indirect)
12:192.168.0.54:/nfs_export/ on /misc/nfs type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.54,mountvers=3,mountport=42089,mountproto=udp,local_lock=none,addr=192.168.0.54) 

First it's nfsv3 and it's using udp. Ok udp we can change to tcp by changing /etc/auto.misc to this

[root@vznfsclient /]# egrep -vn "^#|^$" /etc/auto.misc
6:cd            -fstype=iso9660,ro,nosuid,nodev :/dev/cdrom
16:nfs          -fstype=nfs,proto=tcp   192.168.0.54:/nfs_export

And mount options will change to

192.168.0.54:/nfs_export/ on /misc/nfs type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.54,mountvers=3,mountport=45378,mountproto=tcp,local_lock=none,addr=192.168.0.54)

When I try to use in /etc/auto.misc -fstype=nfs4 - I can't even cd to /misc/nfs which kind of makes sense since per openvz nfsv4 is not supported inside container https://openvz.org/NFS_server_inside_container

Notice that you have in your mount options sloppy and simple rw. That basically means that

  1. Options passed to mount.nfs are not exactly correct, so it's trying to work around it. Read man mount.nfs and search for /sloppy.
  2. I assume it's trying to use nfsv4. Don't even know how it works if it's not supported inside container.

I'd suggest to change your auto mount maps to correct syntax (see my example) or see examples here https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/s2-nfs-config-autofs.html

And test. I just ran tests with autofs in those same 2 containers running on 2 different openvz hosts - and after nfs server reboot - client is still happy.

Edit 3 I can't even reproduce your scenario. I changed my /etc/auto.misc to below, which is pretty much what you have

[root@vznfsclient nfs]# egrep -nv "^#|^$" /etc/auto.misc
6:cd            -fstype=iso9660,ro,nosuid,nodev :/dev/cdrom
16:nfs          -rw                     192.168.0.54:/nfs_export

And after reboot and cd /misc/nfs I still have this

[root@vznfsclient nfs]# mount|grep nfs
nfsd on /proc/fs/nfsd type nfsd (rw,relatime)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
192.168.0.54:/nfs_export/ on /misc/nfs type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.54,mountvers=3,mountport=46453,mountproto=udp,local_lock=none,addr=192.168.0.54)

So in my case it's correctly trying to use nfsv3.

The only thing left to do now is

Activate debug in the /etc/autofs.conf

[root@vznfsclient nfs]# grep -i debug /etc/autofs.conf|grep -v "^#"
    logging = debug

Enable debug severity to be send to /var/log/messages in the /etc/rsyslog.conf. Change this

[root@vznfsclient nfs]# grep info /etc/rsyslog.conf
*.info;mail.none;authpriv.none;cron.none                -/var/log/messages

To this (.info to .debug)

[root@vznfsclient nfs]# grep debug /etc/rsyslog.conf
*.debug;mail.none;authpriv.none;cron.none                -/var/log/messages

Restart autofs and rsyslog and then when you change to auto mounted location - you should see debug output in the /var/log/messages

Here is output from my test system

Nov 25 03:06:00 vznfsclient automount[583]: attempting to mount entry /misc/nfs
Nov 25 03:06:00 vznfsclient automount[583]: lookup_mount: lookup(file): looking up nfs
Nov 25 03:06:00 vznfsclient automount[583]: lookup_mount: lookup(file): nfs -> -rw#011#011#011192.168.0.54:/nfs_export
Nov 25 03:06:00 vznfsclient automount[583]: parse_mount: parse(sun): expanded entry: -rw#011#011#011192.168.0.54:/nfs_export
Nov 25 03:06:00 vznfsclient automount[583]: parse_mount: parse(sun): gathered options: rw
Nov 25 03:06:00 vznfsclient automount[583]: parse_mount: parse(sun): dequote("192.168.0.54:/nfs_export") -> 192.168.0.54:/nfs_export
Nov 25 03:06:00 vznfsclient automount[583]: parse_mount: parse(sun): core of entry: options=rw, loc=192.168.0.54:/nfs_export
Nov 25 03:06:00 vznfsclient automount[583]: sun_mount: parse(sun): mounting root /misc, mountpoint nfs, what 192.168.0.54:/nfs_export, fstype nfs, options rw
Nov 25 03:06:00 vznfsclient automount[583]: mount_mount: mount(nfs): root=/misc name=nfs what=192.168.0.54:/nfs_export, fstype=nfs, options=rw
Nov 25 03:06:00 vznfsclient automount[583]: mount_mount: mount(nfs): nfs options="rw", nobind=0, nosymlink=0, ro=0
Nov 25 03:06:00 vznfsclient automount[583]: get_nfs_info: called with host 192.168.0.54(192.168.0.54) proto 6 version 0x40
Nov 25 03:06:00 vznfsclient automount[583]: get_nfs_info: called with host 192.168.0.54(192.168.0.54) proto 6 version 0x70
Nov 25 03:06:00 vznfsclient automount[583]: get_nfs_info: nfs v3 rpc ping time: 0.000366
Nov 25 03:06:00 vznfsclient automount[583]: get_nfs_info: host 192.168.0.54 cost 365 weight 0
Nov 25 03:06:00 vznfsclient automount[583]: get_nfs_info: called with host 192.168.0.54(192.168.0.54) proto 17 version 0x70
Nov 25 03:06:00 vznfsclient automount[583]: get_nfs_info: nfs v3 rpc ping time: 0.000507
Nov 25 03:06:00 vznfsclient automount[583]: get_nfs_info: nfs v2 rpc ping time: 0.000692
Nov 25 03:06:00 vznfsclient automount[583]: get_nfs_info: host 192.168.0.54 cost 599 weight 0
Nov 25 03:06:00 vznfsclient automount[583]: prune_host_list: selected subset of hosts that support NFS3 over TCP
Nov 25 03:06:00 vznfsclient automount[583]: mount_mount: mount(nfs): calling mkdir_path /misc/nfs
Nov 25 03:06:00 vznfsclient automount[583]: mount_mount: mount(nfs): calling mount -t nfs -s -o rw 192.168.0.54:/nfs_export /misc/nfs
Nov 25 03:06:00 vznfsclient automount[583]: spawn_mount: mtab link detected, passing -n to mount
Nov 25 03:06:00 vznfsclient automount[583]: mount(nfs): mounted 192.168.0.54:/nfs_export on /misc/nfs
Nov 25 03:06:00 vznfsclient automount[583]: ioctl_send_ready: token = 28
Nov 25 03:06:00 vznfsclient automount[583]: mounted /misc/nfs