NFS – Systemd: start a unit after another unit REALLY starts

glusterfs, nfs, systemd

In my particular case I want to start the remote-fs unit only after GlusterFS has completely started.

My systemd files:

The glusterfsd service:

node04:/usr/lib/systemd/system # cat glusterfsd.service 
[Unit]
Description=GlusterFS brick processes (stopping only)
After=network.target glusterd.service

[Service]
Type=oneshot
ExecStart=/bin/true
RemainAfterExit=yes
ExecStop=/bin/sh -c "/bin/killall --wait glusterfsd || /bin/true"
ExecReload=/bin/sh -c "/bin/killall -HUP glusterfsd || /bin/true"

[Install]
WantedBy=multi-user.target

The remote-fs target:

node04:/usr/lib/systemd/system # cat remote-fs.target 
[Unit]
Description=Remote File Systems
Documentation=man:systemd.special(7)
Requires=glusterfsd.service
After=glusterfsd.service remote-fs-pre.target
DefaultDependencies=no
Conflicts=shutdown.target

[Install]
WantedBy=multi-user.target

OK, all the Gluster daemons start successfully, and I want to mount the Gluster filesystem via NFS. However, Gluster's NFS share becomes ready not immediately after glusterfs.service starts but a few seconds later, so remote-fs usually fails to mount it despite the Requires and After directives.

Let's see the log:

Apr 14 16:16:22 node04 systemd[1]: Started GlusterFS, a clustered file-system server.
Apr 14 16:16:22 node04 systemd[1]: Starting GlusterFS brick processes (stopping only)...
Apr 14 16:16:22 node04 systemd[1]: Starting Network is Online.
Apr 14 16:16:22 node04 systemd[1]: Reached target Network is Online.
Apr 14 16:16:22 node04 systemd[1]: Mounting /stor...

So far everything looks OK: the remote filesystem (/stor) is being mounted after GlusterFS started, just as the unit files dictate… But the next lines are:

//...skipped.....
Apr 14 16:16:22 node04 systemd[1]: Started GlusterFS brick processes (stopping only).

What? GlusterFS only became ready at this moment! And then we see:

//...skipped.....
Apr 14 16:16:23 node04 mount[2960]: mount.nfs: mounting node04:/stor failed, reason given by server: No such file or directory
Apr 14 16:16:23 node04 systemd[1]: stor.mount mount process exited, code=exited status=32
Apr 14 16:16:23 node04 systemd[1]: Failed to mount /stor.
Apr 14 16:16:23 node04 systemd[1]: Dependency failed for Remote File Systems.
Apr 14 16:16:23 node04 systemd[1]: Unit stor.mount entered failed state.

The mount failed because the NFS server was not ready when systemd attempted to mount the storage.

Due to the non-deterministic nature of the systemd boot process, mounting this filesystem at boot sometimes succeeds (roughly 1 boot in 10).

If the mount fails at boot, I can log in to the server and mount /stor manually, so Gluster's NFS service itself seems to work fine.

So how can I start remote-fs after glusterfsd has really started, i.e. after the "Started GlusterFS brick processes" line appears in the log?

remote-fs is already one of the very last targets, so I can't order it after some additional "workaround" target that remote-fs does not in fact require.
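A minimal sketch of one possible workaround, assuming the Gluster NFS export can be probed with showmount and that the volume is exported as /stor (both are assumptions, not verified on this setup), is a drop-in that keeps glusterfsd.service in "activating" until the export actually answers:

# /etc/systemd/system/glusterfsd.service.d/wait-nfs.conf (hypothetical drop-in)
[Service]
# ExecStartPost must finish before the unit becomes active, so units
# ordered After=glusterfsd.service will wait for the export as well.
# The /stor export name and the 30-second timeout are assumptions.
ExecStartPost=/bin/sh -c 'for i in $(seq 1 30); do showmount -e localhost | grep -q /stor && exit 0; sleep 1; done; exit 1'

With something like this in place, the "Started GlusterFS brick processes" line would only appear once the export is visible, which is exactly the ordering point remote-fs needs.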

Best Answer

You can analyze the systemd boot sequence with the following command. View the output file in an SVG-capable web browser.

systemd-analyze plot > test.svg

The plot shows the timing of every unit during the last boot, which gives a much clearer view of the problem.
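systemd-analyze also has a critical-chain verb that prints the chain of units a given unit had to wait for, which is handy when a single mount is the problem:

systemd-analyze critical-chain stor.mount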

I solved my NFS mounting problem by adding the mount commands to /etc/rc.local. I'm not sure whether it will work with the glusterd integration, but it's worth a try as a quick fix. To make systemd run rc.local, you must satisfy the following condition:

# grep Condition /usr/lib/systemd/system/rc-local.service
ConditionFileIsExecutable=/etc/rc.d/rc.local
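A minimal sketch of what such an rc.local might contain, reusing the node04:/stor mount from the question (the retry count and sleep interval are arbitrary assumptions):

#!/bin/sh
# /etc/rc.d/rc.local -- runs late in boot, so the Gluster NFS server has
# had time to come up; keep retrying the mount until it succeeds.
for i in $(seq 1 30); do
    mountpoint -q /stor && break        # already mounted, nothing to do
    mount -t nfs node04:/stor /stor && break
    sleep 1
done

Remember to make the file executable (chmod +x /etc/rc.d/rc.local); that is exactly what the ConditionFileIsExecutable line above checks.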