I have two Linode containers. Box A is our general purpose web server. It occasionally needs to access Box B, which is set up as an NFS server.
When Box B reboots, Box A is unable to access any NFS shares no matter what I do. After several hours of troubleshooting, I was finally able to narrow it down to a single step fix.
After Box B reboots:
$ sudo service nfs restart
These are both CentOS 6.8 boxes, up to date. NFS-related packages were all installed via yum, I believe. I did have some trouble getting the whole thing set up; it was not a smooth process, but after restarting the nfs service(s), everything works great.
If I
$ sudo service --status-all
there is no difference before and after issuing the restart. Maybe it's a timing issue? But I don't know how to even begin to trouble shoot this. What can I do?
Other things of note:
-
I'm using autofs to automatically mount the share on demand from Box A, but the share won't mount manually either
-
I spend my days on Windows and Mac desktops and servers, but I've been running websites on Linux for many years. I'm proficient in the things I need to do, but it's not my area of comfort and I spend a lot of time googling how to do new things.
I don't even know where to check. I didn't see anything obvious in the logs, but tell me what to look for and I'll post.
Update
On Box B:
[shorowitz@BoxB ~]$ sudo chkconfig --list nfs
nfs 0:off 1:off 2:on 3:on 4:on 5:on 6:off
[shorowitz@BoxB ~]$ sudo chkconfig --list nfslock
nfslock 0:off 1:off 2:on 3:on 4:on 5:on 6:off
Update 2
After a fresh reboot of BoxB, running
$ sudo showmount -e BoxB
from BoxA shows the expected mount points, but I'm unable to mount them. Simply restarting nfs on BoxB
$ sudo service nfs restart
Shutting down NFS daemon: [ OK ]
Shutting down NFS mountd: [ OK ]
Shutting down NFS services: [ OK ]
Shutting down RPC idmapd: [ OK ]
FATAL: Module nfsd not found.
FATAL: Error running install command for nfsd
Starting NFS services: [ OK ]
Starting NFS mountd: [ OK ]
Starting NFS daemon: [ OK ]
Starting RPC idmapd: [ OK ]
And the mounts are immediately available on BoxA. Those fatal errors appear on subsequent restarts as well when NFS is already working, so I don't know how relevant they are (I thought I had posted them already).
Additional Log Info
I issued the reboot command at 9:29 on Nov 15
grep -i "nfs" /var/log/message*
messages:Nov 15 09:29:08 BoxB kernel: nfsd: last server has exited, flushing export cache
messages:Nov 15 09:29:54 BoxB kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
messages:Nov 15 09:29:54 BoxB kernel: FS-Cache: Netfs 'nfs' registered for caching
messages:Nov 15 09:29:54 BoxB kernel: NFS: Registering the id_resolver key type
messages:Nov 15 09:29:54 BoxB kernel: nfs4filelayout_init: NFSv4 File Layout Driver Registering...
messages:Nov 15 09:29:54 BoxB kernel: Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
messages:Nov 15 09:29:54 BoxB kernel: xenfs: not registering filesystem on non-xen platform
messages:Nov 15 09:29:54 BoxB rpc.mountd[2740]: NFS v4 mounts will be disabled unless fsid=0
messages:Nov 15 09:29:54 BoxB kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
messages:Nov 15 09:29:54 BoxB kernel: NFSD: starting 90-second grace period (net ****************)
messages:Nov 15 09:33:39 BoxB kernel: nfsd: last server has exited, flushing export cache
messages:Nov 15 09:33:40 BoxB kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
messages:Nov 15 09:33:40 BoxB kernel: NFSD: starting 90-second grace period (net ****************)
Update 3:
BoxB
[shorowitz@BoxB ~]$ sudo chkconfig --list | egrep "nfs|rpc"
nfs 0:off 1:off 2:on 3:on 4:on 5:on 6:off
nfslock 0:off 1:off 2:on 3:on 4:on 5:on 6:off
rpcbind 0:off 1:off 2:on 3:on 4:on 5:on 6:off
rpcgssd 0:off 1:off 2:off 3:on 4:on 5:on 6:off
rpcsvcgssd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[shorowitz@BoxB ~]$ sudo iptables --list -n -v
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 ACCEPT all -- lo * 0.0.0.0/0 0.0.0.0/0
0 0 REJECT all -- !lo * 127.0.0.0/8 0.0.0.0/0 reject-with icmp-port-unreachable
18 710 ACCEPT icmp -- * * 0.0.0.0/0 0.0.0.0/0 state NEW icmp type 8
471 26200 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:22 state NEW
204K 393M ACCEPT all -- * * {BoxA IP} 0.0.0.0/0
6721 754K ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
2859 168K LOG all -- * * 0.0.0.0/0 0.0.0.0/0 limit: avg 5/min burst 5 LOG flags 0 level 7 prefix `iptables_INPUT_denied: '
9229 628K REJECT all -- * * 0.0.0.0/0 0.0.0.0/0 reject-with icmp-port-unreachable
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 LOG all -- * * 0.0.0.0/0 0.0.0.0/0 limit: avg 5/min burst 5 LOG flags 0 level 7 prefix `iptables_FORWARD_denied: '
0 0 REJECT all -- * * 0.0.0.0/0 0.0.0.0/0 reject-with icmp-port-unreachable
Chain OUTPUT (policy ACCEPT 278K packets, 8386M bytes)
pkts bytes target prot opt in out source destination
[shorowitz@BoxB ~]$ sudo rpcinfo -p
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 38148 status
100024 1 tcp 45681 status
100005 1 udp 37846 mountd
100005 1 tcp 59259 mountd
100005 2 udp 59934 mountd
100005 2 tcp 42645 mountd
100005 3 udp 33867 mountd
100005 3 tcp 41823 mountd
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100227 2 tcp 2049 nfs_acl
100227 3 tcp 2049 nfs_acl
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100003 4 udp 2049 nfs
100227 2 udp 2049 nfs_acl
100227 3 udp 2049 nfs_acl
100021 1 udp 37287 nlockmgr
100021 3 udp 37287 nlockmgr
100021 4 udp 37287 nlockmgr
100021 1 tcp 37579 nlockmgr
100021 3 tcp 37579 nlockmgr
100021 4 tcp 37579 nlockmgr
This returns nothing:
grep -v "^#" /etc/sysconfig/nfs
BoxA
$ chkconfig --list | egrep "nfs|rpc"
nfs 0:off 1:off 2:on 3:on 4:on 5:on 6:off
nfslock 0:off 1:off 2:on 3:on 4:on 5:on 6:off
rpcbind 0:off 1:off 2:on 3:on 4:on 5:on 6:off
rpcgssd 0:off 1:off 2:off 3:on 4:on 5:on 6:off
rpcsvcgssd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
$ iptables --list -n -v
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
390K 58M ACCEPT all -- lo * 0.0.0.0/0 0.0.0.0/0
0 0 REJECT all -- * * 0.0.0.0/0 127.0.0.0/8 reject-with icmp-port-unreachable
990K 7850M ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
0 0 DROP all -- * * 43.255.188.145 0.0.0.0/0
8 388 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:587
11864 608K ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:25
1 40 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:636
4545 238K ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:80
9759 553K ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:443
24 960 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080
320 19152 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:22
85 5681 ACCEPT icmp -- * * 0.0.0.0/0 0.0.0.0/0
3254 194K LOG all -- * * 0.0.0.0/0 0.0.0.0/0 limit: avg 5/min burst 5 LOG flags 0 level 7 prefix `iptables denied: '
3634 227K DROP all -- * * 0.0.0.0/0 0.0.0.0/0
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP all -- * * 0.0.0.0/0 0.0.0.0/0
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
1360K 1907M ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0
$ rpcinfo -p
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 55882 status
100024 1 tcp 58283 status
100011 1 udp 875 rquotad
100011 2 udp 875 rquotad
100011 1 tcp 875 rquotad
100011 2 tcp 875 rquotad
100005 1 udp 43136 mountd
100005 1 tcp 55047 mountd
100005 2 udp 51117 mountd
100005 2 tcp 42791 mountd
100005 3 udp 44511 mountd
100005 3 tcp 46535 mountd
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100227 2 tcp 2049 nfs_acl
100227 3 tcp 2049 nfs_acl
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100003 4 udp 2049 nfs
100227 2 udp 2049 nfs_acl
100227 3 udp 2049 nfs_acl
100021 1 udp 43509 nlockmgr
100021 3 udp 43509 nlockmgr
100021 4 udp 43509 nlockmgr
100021 1 tcp 38725 nlockmgr
100021 3 tcp 38725 nlockmgr
100021 4 tcp 38725 nlockmgr
$ mount | grep nfs
nfsd on /proc/fs/nfsd type nfsd (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
Update 14 November
BoxA:
$ cat /etc/auto.master.d/nfs
xdata -rw boxb:/srv/nfs/xdata
xbackup -rw boxb:/srv/nfs/xbackup
zbackups -rw boxb:/srv/nfs/zbackups
$ mount | grep nfs
mount |grep nfs
nfsd on /proc/fs/nfsd type nfsd (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
boxb:/srv/nfs/xdata on /mnt/nfs/xdata type nfs (rw,sloppy,vers=4,addr={boxb ip},clientaddr={boxa ip})
Best Answer
Can you update your question with more information?
On NFS Server run
Next one should return nothing if your nfs server settings are not customized.
However if you are running iptables - it should be customized. See here
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/s2-nfs-nfs-firewall-config.html
On your client run
If you are running NFSV2 or NFSV3 you also need nfslock running on client
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/s1-nfs-start.html
What does look strange in your output when starting NFS - is this below line
And I just realized one thing - since you are running on openvz - below link applies to your situation. See if this helps
https://openvz.org/NFS_server_inside_container
Edit 1: I ran tests today on openvz containers.
Message that you see
is harmless. It's described here https://openvz.org/NFS_server_inside_container
I could not reproduce your issue. After reboot of nfs server - nfs client was still able to browse files, create new files, delete files from the nfs share.
Can you now post your auto mount configuration and also output of
From Box A, which you already did. But you did it while auto mounted file system was unmounted. So now on Box A cd to auto mounted dir and then run above command. This will give information on which options were used during mount.
Also next time you reboot your Box B and you have this automount issue present - from Site A run this command
This will give information on which ports are involved.
In this kind of situations it's also good to collect tcpdumps on B and A.
I tend to think that there is nothing wrong with your config - but something weird is going on with iptables on vzhosts. Not in your containers, but on the hosts.
What you can also try to do is to install nmap on your host A and at the time of the problem - scan your host B to see which ports are open from the point of view of A (some might suggest netstat or ss, but in this case there is also openvz host firewall in front of the container and we don't know if they are blocking something or not)
Edit 2 (Nov 25) Something is very strange with your mount options
Compare your output
To mine below. Here is my /etc/auto.misc. Line 6 is default. Line 16 I added
So when I cd to /misc/nfs - my share gets mounted. But look at the default options on line 12.
First it's nfsv3 and it's using udp. Ok udp we can change to tcp by changing /etc/auto.misc to this
And mount options will change to
When I try to use in /etc/auto.misc
-fstype=nfs4
- I can't even cd to /misc/nfs which kind of makes sense since per openvz nfsv4 is not supported inside container https://openvz.org/NFS_server_inside_containerNotice that you have in your mount options
sloppy
and simplerw
. That basically means thatI'd suggest to change your auto mount maps to correct syntax (see my example) or see examples here https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/s2-nfs-config-autofs.html
And test. I just ran tests with autofs in those same 2 containers running on 2 different openvz hosts - and after nfs server reboot - client is still happy.
Edit 3 I can't even reproduce your scenario. I changed my /etc/auto.misc to below, which is pretty much what you have
And after reboot and cd /misc/nfs I still have this
So in my case it's correctly trying to use nfsv3.
The only thing left to do now is
Activate debug in the /etc/autofs.conf
Enable debug severity to be send to
/var/log/messages
in the/etc/rsyslog.conf
. Change thisTo this (.info to .debug)
Restart autofs and rsyslog and then when you change to auto mounted location - you should see debug output in the
/var/log/messages
Here is output from my test system