Fixing Systemd Unit-File Timeout After Start

Tags: bash, debian, linux, shell-scripting, systemd

I have written a bash script with the following content:

#!/bin/bash
cd /opt/ut_server/System
./ucc-bin-linux-amd64 server DM-Rankin?game=XGame.xDeathMatch?mutator=AntiTCC2009r6.MutAntiTCCFinal,utcompv17a.MutUTComp?AdminName=admin?AdminPassword=1111 ini=server.ini -port=1234 -log=s9.log -nohomedir &

sleep 10
echo "finish ucc"
exit 0

Now I want to start it with systemd, using this unit file:

[Unit]
Description=Unreal Tournament 2004 Server
After=network.target

[Service]
WorkingDirectory=/home/unreal-user/
User=unreal-user
Group=unreal-user
Type=forking
ExecStart=/home/unreal-user/start_ut_serv.sh &
ExecStartPost=/bin/bash -c "umask 022; echo $MAINPID > /home/unreal-user/ut2k4-server.pid"
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/bin/kill -9 $MAINPID
TimeoutSec=400
ExecRestart=/bin/kill -9 $MAINPID && /home/unreal-user/start_ut_serv.sh
PIDFile=/home/unreal-user/ut2k4-server.pid
RestartSec=30
Restart=on-failure
#Restart=always

[Install]
WantedBy=multi-user.target

When I start the service, I get an error like:

ut2k4-serv.service: Start operation timed out. Terminating.
systemd[1]: ut2k4-serv.service: Control process exited, code=killed, status=15/TERM

Why does the start operation time out? How can I handle this problem?

When I start the bash script manually, output still seems to show up on STDOUT, even though the command is run as a background job. I'm just curious about this.

Best Answer

This is mostly copy-paste from the systemd.service documentation, since that largely answers the question.

Type=

<...>

The exec type is similar to simple, but the service manager will consider the unit started immediately after the main service binary has been executed. The service manager will delay starting of follow-up units until that point. (Or in other words: simple proceeds with further jobs right after fork() returns, while exec will not proceed before both fork() and execve() in the service process succeeded.) Note that this means systemctl start command lines for exec services will report failure when the service's binary cannot be invoked successfully (for example because the selected User= doesn't exist, or the service binary is missing).

If set to forking, it is expected that the process configured with ExecStart= will call fork() as part of its start-up. The parent process is expected to exit when start-up is complete and all communication channels are set up. The child continues to run as the main service process, and the service manager will consider the unit started when the parent process exits. This is the behavior of traditional UNIX services. If this setting is used, it is recommended to also use the PIDFile= option, so that systemd can reliably identify the main process of the service. systemd will proceed with starting follow-up units as soon as the parent process exits.

Your process is clearly not a forking process. You've attempted multiple hacks to make it one, when systemd natively supports non-forking services.
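
As a minimal sketch of the non-forking setup, assuming the server binary stays in the foreground when it is started without the trailing &, the [Service] section could look roughly like this (the paths and command line are taken from the wrapper script in the question):

```ini
[Service]
# Assumption: started without "&", ucc-bin-linux-amd64 keeps running in
# the foreground, so the process systemd spawns is the main process.
# Type=exec would work the same way on systemd versions that support it.
Type=simple
WorkingDirectory=/opt/ut_server/System
ExecStart=/opt/ut_server/System/ucc-bin-linux-amd64 server DM-Rankin?game=XGame.xDeathMatch?mutator=AntiTCC2009r6.MutAntiTCCFinal,utcompv17a.MutUTComp?AdminName=admin?AdminPassword=1111 ini=server.ini -port=1234 -log=s9.log -nohomedir
# No PIDFile= and no ExecStartPost= PID bookkeeping: systemd already
# knows the PID of the process it started.
```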


PIDFile=

Takes a path referring to the PID file of the service. Usage of this option is recommended for services where Type= is set to forking. The path specified typically points to a file below /run/. If a relative path is specified it is hence prefixed with /run/. The service manager will read the PID of the main process of the service from this file after start-up of the service. The service manager will not write to the file configured here, although it will remove the file after the service has shut down if it still exists. The PID file does not need to be owned by a privileged user, but if it is owned by an unprivileged user additional safety restrictions are enforced: the file may not be a symlink to a file owned by a different user (neither directly nor indirectly), and the PID file must refer to a process already belonging to the service.

You're writing systemd's $MAINPID variable to that file. Not only is $MAINPID undocumented for ExecStartPost=, but more importantly, since PIDFile= is set, the main PID is actually read from that file. I'm not too sure what $MAINPID expands to at this point, since in this situation it makes no sense: the PID should have been read when ExecStart= returned, so I'd say it's empty.


This section describes command line parsing and variable and specifier substitutions for ExecStart=, ExecStartPre=, ExecStartPost=, ExecReload=, ExecStop=, and ExecStopPost= options.

<...>

This syntax is inspired by shell syntax, but only the meta-characters and expansions described in the following paragraphs are understood, and the expansion of variables is different. Specifically, redirection using "<", "<<", ">", and ">>", pipes using "|", running programs in the background using "&", and other elements of shell syntax are not supported.

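Your ExecStart= and ExecRestart= lines are clearly broken: the trailing & and the && are shell syntax, which systemd does not interpret. They are also pointless. Backgrounding gains nothing given Type=, and ExecRestart= is not a systemd.service directive at all, since a restart is simply the stop sequence followed by the start sequence.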
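
To illustrate the quoted restriction, here is a small sketch. As far as I know, the & is not interpreted but handed to the script as an ordinary argument, and shell syntax only works when a shell is invoked explicitly. The ExecStartPre= step and its log path are hypothetical, purely for illustration:

```ini
[Service]
# Not supported: systemd does not background anything here; the "&"
# is just another word on the command line of the script.
#ExecStart=/home/unreal-user/start_ut_serv.sh &

# If redirection is really needed, invoke a shell explicitly.
# Hypothetical pre-start step and log path, for illustration only.
ExecStartPre=/bin/sh -c 'date >> /home/unreal-user/ut2k4-start.log'
```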


As it stands, your bash wrapper is useless. Depending on what it is meant to do, you should remove it altogether and set the UT server directly as the ExecStart= command. The server command line can be read from an EnvironmentFile=, and you're done!
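
Put together, a unit along these lines might be a reasonable starting point. This is a sketch under assumptions, not a tested configuration: the environment file path /etc/ut2k4-server.env and the variable name UT_ARGS are made up for illustration, and it assumes the server runs in the foreground and that unreal-user can read /opt/ut_server.

```ini
# /etc/ut2k4-server.env (hypothetical path) would contain one line:
#   UT_ARGS="server DM-Rankin?game=XGame.xDeathMatch?mutator=AntiTCC2009r6.MutAntiTCCFinal,utcompv17a.MutUTComp?AdminName=admin?AdminPassword=1111 ini=server.ini -port=1234 -log=s9.log -nohomedir"

[Unit]
Description=Unreal Tournament 2004 Server
After=network.target

[Service]
Type=simple
User=unreal-user
Group=unreal-user
WorkingDirectory=/opt/ut_server/System
EnvironmentFile=/etc/ut2k4-server.env
# Unbraced $UT_ARGS is split on whitespace into separate arguments;
# ${UT_ARGS} would be passed as one single argument instead.
ExecStart=/opt/ut_server/System/ucc-bin-linux-amd64 $UT_ARGS
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target
```

After editing the unit, systemctl daemon-reload followed by systemctl start ut2k4-serv.service picks up the change.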

Furthermore, unless that server is an absolute nightmare, stopping any kind of service with SIGKILL is... bad, to say the least. The default stop behavior, used when ExecStop= is omitted, will likely be better suited.
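
For reference, the default stop behavior can be spelled out as follows. This is only a sketch: whether the UT server shuts down cleanly on SIGTERM, and the 30 second timeout, are assumptions to check against the real server.

```ini
[Service]
# No ExecStop= line: on stop, systemd sends KillSignal= to the service
# (SIGTERM is already the default, shown here only for clarity).
KillSignal=SIGTERM
# If the service is still running after this timeout, systemd falls
# back to SIGKILL on its own. 30 seconds is an assumed value.
TimeoutStopSec=30
```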

Next time you're writing a service file, try starting from the default one and changing parameters progressively, instead of changing everything and then wondering which one of the twenty parameters is broken. It's easier to debug 😉