Redhat – systemd doesn’t unmount NFS shares before stopping network

Tags: lacp, linux-networking, nfs, redhat, systemd

Context :

  • RHEL 7.2, up to date as of October 2016
  • Physical system
  • NetworkManager disabled
  • Network configured through the teaming of 2x10G NICs (eth0 & eth1) as lacp0
  • (irrelevant) IP addresses are configured on VLAN subinterfaces lacp0.XXX and lacp0.YYY
  • (also irrelevant) These systems are destined to be Oracle 12c nodes

The network connectivity is 100% OK; benchmarks confirm that LACP is fully functional and approaches the 20 Gbps theoretical maximum.

Problem :

systemd doesn't detect that the network stack has been stopped during shutdown, so it waits until too late to unmount the NFS shares. The unmounts then fail, and systemd hangs indefinitely waiting for the NFS server to respond.

Symptom(s) :

After running "systemctl stop network.service", both network.target and network-online.target are still considered as active.
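The symptom can be checked with a few lines of shell. This is a minimal sketch, assuming bash; `report_units` is a hypothetical helper name, and the only real command it relies on is `systemctl is-active`.

```shell
#!/bin/bash
# Print the state systemd reports for each unit passed as an argument.
# report_units is a hypothetical helper name, not part of systemd.
report_units() {
    local unit state
    for unit in "$@"; do
        # is-active prints the state (active, inactive, failed, ...) and
        # exits non-zero when the unit is not active, hence the "|| true".
        state=$(systemctl is-active "$unit" || true)
        printf '%s: %s\n' "$unit" "$state"
    done
}

# Example, to be run after "systemctl stop network.service":
#   report_units network.service network.target network-online.target
```

On the affected systems, both targets would be expected to still be reported as active even though network.service is stopped.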

What I've come to so far :

NFS mounts added via the /etc/fstab file are translated into *.mount systemd units. Those units automatically depend on remote-fs.target, which depends on network-online.target.
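To see which ordering systemd actually generated for those mounts, the fstab entries can be listed and mapped to their mount units. This is a rough sketch, assuming bash and awk; `list_nfs_fstab` is a hypothetical helper name, and matching both `nfs` and `nfs4` types is an assumption about the fstab contents.

```shell
#!/bin/bash
# List the mount points of NFS entries in an fstab-style file.
# list_nfs_fstab is a hypothetical helper; the file defaults to /etc/fstab.
list_nfs_fstab() {
    local fstab="${1:-/etc/fstab}"
    # Field 1 is the device, field 2 the mount point, field 3 the type.
    # Lines whose first field starts with '#' are comments.
    awk '$1 !~ /^#/ && ($3 == "nfs" || $3 == "nfs4") { print $2 }' "$fstab"
}

# Example: print the After=/Before= ordering of each generated mount unit.
#   for mp in $(list_nfs_fstab); do
#       systemctl show -p After -p Before "$(systemd-escape -p --suffix=mount "$mp")"
#   done
```

The `systemd-escape -p --suffix=mount` call converts a path such as /mnt/data into the unit name mnt-data.mount.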

From the documentation, it seems the network*.target units depend on a network management tool to detect whether the network is up or down. This can be NetworkManager, systemd-networkd, or something else (but what?). I think my problem may lie here, as our jumpstart template appears to rely on the old init scripts for managing the interfaces. I doubt systemd can interact with them to be informed of the network going up or down, even though systemd is what stops the network stack when I run "systemctl stop network".

My second hypothesis is that network teaming using libteam/teamd, even when configured through ifcfg-* files, is outside the scope of the systemd network.target. There seems to be no dependency between the teamd systemd units (including teamd@lacp0.service) and the network units. That would explain why the only systems displaying this issue are the LACP-enabled ones, and why we did not have the problem before, when using classic bonding.

So my question: how can I make sure my NFS shares are unmounted before the network stack is brought down, typically when rebooting the system?

PS: ideally the solution would not depend on how the NFS mounts are created, so that someone who adds a share to this server does not need to be told about special steps. That seems nearly impossible considering our production process.

Best Answer

Unfortunately, the only "right" answer to this issue seems to be using a network management tool, which for now means either NetworkManager (Red Hat best practice) or systemd-networkd.

The workaround we used in order to avoid NetworkManager is this:

Edit /etc/systemd/system/teamd@.service.d/override.conf

[Unit]
Before=remote-fs.target

[Install]
WantedBy=network-online.target

[Service]
ExecStop=/bin/bash -c "while grep ' nfs ' /proc/mounts; do sleep 5; done"
TimeoutStopSec=30

This drop-in is merged into the system template for every teamd@<teamname>.service instance, since files under /etc/systemd/system/ take precedence over those under /usr/lib/systemd/system/.

On shutdown, systemd initiates the NFS unmounts first but by default doesn't wait for them to finish. The override forces teamd@.service, which is responsible for network connectivity, to wait at most 30 seconds for the NFS shares to be unmounted before the teamd daemons are killed and the shutdown process continues.
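The wait loop in ExecStop can also be written as a standalone function, which makes it easier to test outside of a shutdown. This is only a sketch under stated assumptions: `wait_for_nfs_unmount` is a hypothetical name, the timeout is enforced inside the function rather than by TimeoutStopSec, and it also matches the `nfs4` type, which goes beyond the `grep ' nfs '` used in the override above.

```shell
#!/bin/bash
# wait_for_nfs_unmount [MOUNTS_FILE] [TIMEOUT] [INTERVAL]
# Hypothetical helper: returns 0 once no nfs/nfs4 entry remains in
# MOUNTS_FILE, or 1 if TIMEOUT seconds have been waited first.
wait_for_nfs_unmount() {
    local mounts="${1:-/proc/mounts}"
    local timeout="${2:-30}"
    local interval="${3:-5}"
    local waited=0
    # Field 3 of /proc/mounts is the filesystem type; awk exits 0 while
    # at least one NFS entry is still present.
    while awk '$3 == "nfs" || $3 == "nfs4" { found = 1 } END { exit !found }' "$mounts"; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Example: wait up to 30 seconds, polling every 5 seconds.
#   wait_for_nfs_unmount /proc/mounts 30 5 || echo "NFS shares still mounted"
```

Reading the type field with awk avoids matching the string " nfs " elsewhere in a line, and the explicit return code lets the caller decide what to do when the shares are still mounted.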
