NFS – Kubernetes can’t mount NFS volumes after NFS server update and reboot

kubernetes, nfs, nfs4, opensuse

After updating the NFS server on openSUSE Leap 15.2 to the latest version with zypper patch and rebooting, nodes in a Kubernetes cluster (OpenShift 4.5) can no longer mount NFS volumes.

NFS server version: nfs-kernel-server-2.1.1-lp152.9.12.1.x86_64

/etc/exports contains:

/nfs 192.168.11.*(rw,sync,no_wdelay,root_squash,insecure,no_subtree_check,fsid=0)

Affected pods are stuck in ContainerCreating status.

kubectl describe pod/<pod_name> gives the following error:

Warning  FailedMount  31m   kubelet            MountVolume.SetUp failed for volume "volume" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/c86dee2e-f533-43c9-9a1d-c4f00a1b8eef/volumes/kubernetes.io~nfs/smart-services-http-video-stream --scope -- mount -t nfs nfs.example.invalid:/nfs/volume /var/lib/kubelet/pods/c86dee2e-f533-43c9-9a1d-c4f00a1b8eef/volumes/kubernetes.io~nfs/pv-name
Output: Running scope as unit: run-r83d4e7dba1b645aca1e4693a48f45191.scope
mount.nfs: Operation not permitted

The server is running NFSv4 only, so rpcbind is turned off and showmount commands do not work.

Mounting directly on a Kubernetes node results in the following error:

sudo mount.nfs4 nfs.example.invalid:/core tmp/ -v; echo $?
mount.nfs4: timeout set for Wed Jul 21 12:16:49 2021
mount.nfs4: trying text-based options 'vers=4.2,addr=192.168.11.2,clientaddr=192.168.11.3'
mount.nfs4: mount(2): Operation not permitted
mount.nfs4: Operation not permitted
32

firewalld rules on the NFS server:

  services: ssh dhcpv6-client nfs mountd rpc-bind samba http tftp
  ports: 2049/tcp 2049/udp

AppArmor was enabled; turning it off didn't change the outcome.

Before the NFS server update everything was working fine, and no other configuration changes were made. How can I debug this further and make the shares mountable again?

Best Answer

After trying to debug this issue with rpcdebug to no avail, I resorted to dumping traffic arriving at the NFS server from one of the nodes. The dump gave an interesting lead:

NFS reply xid 4168498669 reply ERR 20: Auth Bogus Credentials (seal broken)

So the issue was certainly not related to the network or AppArmor.

Then I changed the export to

/nfs *(rw,sync,no_wdelay,root_squash,insecure,no_subtree_check,fsid=0)

and everything worked, confirming that the issue lay in some sort of exports misconfiguration.

Rewriting the rule to

/nfs 192.168.11.0/24(rw,sync,no_wdelay,root_squash,insecure,no_subtree_check,fsid=0)

restored connectivity.

According to the Red Hat documentation on /etc/exports at https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/deployment_guide/s1-nfs-server-config-exports:

wildcards — Where a * or ? character is used to take into account a grouping of fully qualified domain names that match a particular string of letters. Wildcards should not be used with IP addresses; however, it is possible for them to work accidentally if reverse DNS lookups fail.

So using * with an IP address was a clear misconfiguration that had somehow worked for months and finally resulted in the errors described in the question.
