Node.JS systemd service won’t restart

node.js, systemd

I have a simple systemd service unit that starts my Node.JS web server, but for some reason Restart=on-failure is not taking effect: the process is not restarted after it dies.

Here is my service unit file (with proprietary names removed):

[Unit]
Description=Node.JS web server
After=network.target

[Service]
User=villa
Environment=NODE_PATH=.
WorkingDirectory=/path/to/server/code
PermissionsStartOnly=true
ExecStart=/usr/local/bin/node server.js
ExecStop=/bin/killall node
Restart=on-failure
RestartSec=1

[Install]
WantedBy=multi-user.target

Next, I run systemctl daemon-reload, restart the service, and then kill the process with SIGKILL, like so:

[root@localhost ~]# ps -ef | grep node
villa    24783     1 17 10:54 ?        00:00:00 /usr/local/bin/node server.js
root     25172 26051  0 10:54 pts/1    00:00:00 grep --color=auto node
[root@localhost ~]# kill -9 24783
[root@localhost ~]# sleep 2
[root@localhost ~]# ps -ef | grep node
root     29462 26051  0 10:55 pts/1    00:00:00 grep --color=auto node

As you can see, even after waiting for longer than the RestartSec setting, the process doesn't start back up.

This is the status output after killing the process as shown above:

[root@localhost ~]# systemctl -l status webserver.service
● webserver.service - Node.JS web server
   Loaded: loaded (/etc/systemd/system/webserver.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2017-05-03 10:54:53 EDT; 7min ago
  Process: 27843 ExecStop=/bin/killall node (code=exited, status=1/FAILURE)
  Process: 24783 ExecStart=/usr/local/bin/node server.js (code=killed, signal=KILL)
 Main PID: 24783 (code=killed, signal=KILL)

May 03 10:54:31 localhost.localdomain node[24783]: <...web server's standard output, nothing abnormal at all...>
May 03 10:54:53 localhost.localdomain systemd[1]: webserver.service: main process exited, code=killed, status=9/KILL
May 03 10:54:53 localhost.localdomain systemd[1]: webserver.service: control process exited, code=exited status=1
May 03 10:54:53 localhost.localdomain systemd[1]: Unit webserver.service entered failed state.
May 03 10:54:53 localhost.localdomain systemd[1]: webserver.service failed.

The odd thing is, if I use this exact same service unit file but with the command /usr/bin/sleep 1000 instead of node server.js, the sleep process is restarted correctly and immediately after my kill -9. So there must be something odd going on with Node.JS.

Any ideas as to why my Node process isn't starting back up?

Best Answer

It turns out that my systemd service unit file was correct all along (apart from the ExecStop= line, which Mark suggested removing and which made the file more correct). The real issue was that my service unit file was located in /usr/lib/systemd/system, and another developer had placed a copy of the same file, minus the Restart= line, in /etc/systemd/system without telling me.

According to systemd.unit(5) (man systemd.unit):

Unit files are loaded from a set of paths determined during compilation, described in the two tables below. Unit files found in directories listed earlier override files with the same name in directories lower in the list.

   Table 1.  Load path when running in system mode (--system).
   ┌────────────────────────┬─────────────────────────────┐
   │Path                    │ Description                 │
   ├────────────────────────┼─────────────────────────────┤
   │/etc/systemd/system     │ Local configuration         │
   ├────────────────────────┼─────────────────────────────┤
   │/run/systemd/system     │ Runtime units               │
   ├────────────────────────┼─────────────────────────────┤
   │/usr/lib/systemd/system │ Units of installed packages │
   └────────────────────────┴─────────────────────────────┘
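The precedence rule in the table can be sketched as a small shell function. This is only an illustration, not how systemd itself resolves units: which_unit is a hypothetical helper name, it walks the directories in the order given (defaulting to the three system-mode paths above), and it ignores drop-in directories, masking, and any other paths a particular build may search.

```shell
#!/bin/sh
# Hypothetical helper (not part of systemd): given a unit name and a list of
# directories in precedence order, print which copy wins and which are shadowed.
which_unit() {
  unit=$1
  shift
  # With no directories given, fall back to the system-mode load path
  # listed in systemd.unit(5), highest precedence first.
  if [ "$#" -eq 0 ]; then
    set -- /etc/systemd/system /run/systemd/system /usr/lib/systemd/system
  fi
  winner=
  for dir in "$@"; do
    [ -f "$dir/$unit" ] || continue
    if [ -z "$winner" ]; then
      # The first match in precedence order is the copy that takes effect.
      winner=$dir/$unit
      echo "active: $winner"
    else
      # Later matches exist on disk but are overridden.
      echo "shadowed: $dir/$unit"
    fi
  done
  # Succeed only if at least one copy was found.
  [ -n "$winner" ]
}
```

Calling which_unit webserver.service on a machine set up like mine would have listed the /etc/systemd/system copy as active and the /usr/lib/systemd/system copy as shadowed. On a real system, systemd-delta reports overridden unit files as well.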

So, in short, systemd found the file in /etc/systemd/system, which lacked the Restart= line, before it ever looked at my updated file in /usr/lib/systemd/system, which had it. The Loaded: line in the status output in the question already pointed at /etc/systemd/system/webserver.service. I just had to remove the outdated file, and my problem was solved.
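A quick way to catch this kind of mix-up is to ask systemd which file it actually loaded. The sketch below pulls the path out of the Loaded: line of systemctl status output; loaded_path is a hypothetical helper name, and the pattern assumes the status layout shown in the question, which may differ between systemd versions.

```shell
#!/bin/sh
# Hypothetical helper: read `systemctl status` output on stdin and print
# the unit file path from the "Loaded: loaded (PATH; ...)" line.
loaded_path() {
  sed -n 's/^[[:space:]]*Loaded: loaded (\([^;)]*\).*/\1/p'
}

# Usage (needs a running systemd, so shown as comments only):
#   systemctl status webserver.service | loaded_path
#   systemctl show -p FragmentPath webserver.service
```

Fed the status output from the question, this prints /etc/systemd/system/webserver.service, not the /usr/lib/systemd/system path I thought was in use. systemctl cat webserver.service also prints the contents of the file that is in effect, with its path in a leading comment.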