Systemd execute command after start limit reached

I've been working on a systemd service that wraps an administration script, and I'm trying to handle the case where the script breaks completely in a graceful way.

Right now I have Restart=always so it will try again when something fails, but some failure states need manual attention (missing config file, bad SQL, etc.), so I don't want it spinning endlessly in the background in a state it can't recover from.

I found StartLimitInterval, StartLimitBurst, and StartLimitAction, which stop restarting the service after X failures within Y seconds, but it turns out the only actions available for StartLimitAction are rebooting or shutting down the machine, which is a little overkill.

I've been looking at OnFailure and wrote a small service that sends an alert email when it's triggered, but OnFailure fires every time the service dies, not just when it hits the start limit, so we get a flood of emails instead of just one.

Any ideas of what to try next?

Best Answer

From the systemd.unit man page:

OnFailure=

A space-separated list of one or more units that are activated when this unit enters the "failed" state. A service unit using Restart= enters the failed state only after the start limits are reached.

However, the second sentence appears to be a newer constraint: it is in the manual for systemd version 241 on my Arch installations, but not in version 219 on my CentOS 7 installation.

You can check your systemd version with systemctl --version.
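To script that check, here is a minimal sketch. It assumes the first line of systemctl --version output has the form "systemd 219" or "systemd 241 (241.7-2-arch)" (the suffix in parentheses varies by distribution). The helper names systemd_version and systemd_at_least are hypothetical, and 241 is only the version observed above, not necessarily the release where the OnFailure behaviour actually changed.

```shell
#!/bin/sh
# Print the systemd version number from `systemctl --version` output on stdin.
# The first line is assumed to look like "systemd 219" or
# "systemd 241 (241.7-2-arch)"; the number is the second field.
systemd_version() {
    awk 'NR == 1 { print $2; exit }'
}

# Succeed if the version read from stdin is at least $1.
# Hypothetical helper; 241 is just the version seen in the answer above.
systemd_at_least() {
    [ "$(systemd_version)" -ge "$1" ]
}
```

Usage: systemctl --version | systemd_version prints the number, and if systemctl --version | systemd_at_least 241; then ...; fi branches on it.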

I know it's an old question, but I wanted to share this for anyone else who has the same problem.
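Putting the question and the man page text together, a sketch of the setup on a newer systemd might look like the script below. It writes two unit files: the wrapped service, which keeps Restart=always but gets a start limit and an OnFailure= hook, and a template unit instantiated once per failing unit. Every name here (admin-script.service, failure-mail@.service, /usr/local/bin/admin-script, /usr/local/bin/failure-mail.sh, the limit values) is a hypothetical placeholder, with failure-mail.sh standing in for the mail service from the question. UNIT_DIR is an override I've added so the files can be written somewhere other than /etc/systemd/system.

```shell
#!/bin/sh
# Sketch: write a restarting service whose OnFailure= hook should fire once,
# after the start limit is reached (newer systemd, per the man page quote).
# All unit names, paths and limits are hypothetical placeholders.
# UNIT_DIR is an added override; it defaults to /etc/systemd/system.
set -eu

write_units() {
    unit_dir="${UNIT_DIR:-/etc/systemd/system}"

    # The wrapped administration script.
    cat > "$unit_dir/admin-script.service" <<'EOF'
[Unit]
Description=Administration script wrapper
# Allow at most 5 starts within 10 minutes; after that, restarts are refused,
# the unit enters the "failed" state and OnFailure= activates the mail unit.
StartLimitIntervalSec=600
StartLimitBurst=5
# %n expands to this unit's full name, e.g. admin-script.service.
OnFailure=failure-mail@%n.service

[Service]
ExecStart=/usr/local/bin/admin-script
Restart=always
RestartSec=10
EOF

    # Template instantiated as failure-mail@<failed unit name>.service.
    cat > "$unit_dir/failure-mail@.service" <<'EOF'
[Unit]
Description=Send failure mail for %i

[Service]
Type=oneshot
# Hypothetical script: stands in for the mail service from the question.
ExecStart=/usr/local/bin/failure-mail.sh %i
EOF
}
```

As root, run write_units, then systemctl daemon-reload and systemctl start admin-script.service. If the man page sentence quoted above applies to your version, failure-mail@admin-script.service.service should run once, when restarts are refused because the start limit was hit, rather than on every crash. StartLimitIntervalSec= in [Unit] is the spelling used by newer systemd; on 219 the equivalent settings are StartLimitInterval= and StartLimitBurst= in the [Service] section. Once the underlying problem is fixed, systemctl reset-failed admin-script.service clears the failed state so the service can be started again.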

Related Solutions

Nfs – Systemd: start a unit after another unit REALLY starts

You can analyze the systemd boot sequence with the following command, then view the output file in a web browser that supports SVG:

systemd-analyze plot > test.svg

The plot shows the timing of the last boot, which gives you a clearer view of the problem.

I solved my NFS mounting problem by adding the mount commands to /etc/rc.local. I'm not sure whether this will work with the glusterd integration, but it's worth a try as a quick fix. For systemd to run rc.local, the following condition must be satisfied:

# grep Condition /usr/lib/systemd/system/rc-local.service
ConditionFileIsExecutable=/etc/rc.d/rc.local
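If you go the rc.local route, a sketch of preparing the file might look like this. It assumes the CentOS 7 path from the condition above; the helper names ensure_rc_local and add_rc_local_command, the RC_LOCAL override, and the NFS server and mount point in the usage line are all hypothetical.

```shell
#!/bin/sh
# Sketch: prepare rc.local so that rc-local.service will run it.
# RC_LOCAL is a hypothetical override; it defaults to the path checked by
# ConditionFileIsExecutable= in rc-local.service on CentOS 7.

ensure_rc_local() {
    rc_local="${RC_LOCAL:-/etc/rc.d/rc.local}"
    # Create the file with a shebang if it does not exist yet; the condition
    # only tests that the file is executable, but the shebang lets it run.
    if [ ! -e "$rc_local" ]; then
        printf '#!/bin/sh\n' > "$rc_local"
    fi
    chmod +x "$rc_local"
}

add_rc_local_command() {
    ensure_rc_local
    rc_local="${RC_LOCAL:-/etc/rc.d/rc.local}"
    # Append the command only if the exact line is not already present.
    grep -qxF -- "$1" "$rc_local" || printf '%s\n' "$1" >> "$rc_local"
}
```

For example, as root: add_rc_local_command 'mount -t nfs fileserver:/export /mnt/export'. If your rc.local ends with exit 0, the command has to go before that line instead of being appended.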

Systemd – Start Multiple Processes with One Service File

Well, assuming that the only thing changing per unit file is the remote.example.com part, you can use an instantiated service.

From the systemd.unit man page:

Optionally, units may be instantiated from a template file at runtime. This allows creation of multiple units from a single configuration file. If systemd looks for a unit configuration file, it will first search for the literal unit name in the file system. If that yields no success and the unit name contains an "@" character, systemd will look for a unit template that shares the same name but with the instance string (i.e. the part between the "@" character and the suffix) removed. Example: if a service getty@tty3.service is requested and no file by that name is found, systemd will look for getty@.service and instantiate a service from that configuration file if it is found.

Basically, you create a single template unit file that uses a specifier (usually %i) wherever the instances differ, and each instance gets linked to that template when you "enable" it.

For example, I have a unit file called /etc/systemd/system/autossh@.service that looks like this:

[Unit]
Description=AutoSSH service for ServiceABC on %i
After=network.target

[Service]
Environment=AUTOSSH_GATETIME=30 AUTOSSH_LOGFILE=/var/log/autossh/%i.log AUTOSSH_PIDFILE=/var/run/autossh.%i.pid
PIDFile=/var/run/autossh.%i.pid
#Type=forking
ExecStart=/usr/bin/autossh -M 40000 -NR 5000:127.0.0.1:5000 -i /opt/ServiceABC/.ssh/id_rsa_ServiceABC -l ServiceABC %i

[Install]
WantedBy=multi-user.target

Which I've then enabled:

[user@anotherhost ~]$ sudo systemctl enable autossh@somehost.example.com
ln -s '/etc/systemd/system/autossh@.service' '/etc/systemd/system/multi-user.target.wants/autossh@somehost.example.com.service'

And can interact with:

[user@anotherhost ~]$ sudo systemctl start autossh@somehost.example.com
[user@anotherhost ~]$ sudo systemctl status autossh@somehost.example.com
autossh@somehost.example.service - AutoSSH service for ServiceABC on somehost.example
   Loaded: loaded (/etc/systemd/system/autossh@.service; enabled)
   Active: active (running) since Tue 2015-10-20 13:19:01 EDT; 17s ago
 Main PID: 32524 (autossh)
   CGroup: /system.slice/system-autossh.slice/autossh@somehost.example.com.service
           ├─32524 /usr/bin/autossh -M 40000 -NR 5000:127.0.0.1:5000 -i /opt/ServiceABC/.ssh/id_rsa_ServiceABC -l ServiceABC somehost.example.com
           └─32525 /usr/bin/ssh -L 40000:127.0.0.1:40000 -R 40000:127.0.0.1:40001 -NR 5000:127.0.0.1:5000 -i /opt/ServiceABC/.ssh/id_rsa_ServiceABC -l ServiceABC somehost.example.com

Oct 20 13:19:01 anotherhost.example.com systemd[1]: Started AutoSSH service for ServiceABC on somehost.example.com.
[user@anotherhost ~]$ sudo systemctl stop autossh@somehost.example.com
[user@anotherhost ~]$ sudo systemctl status autossh@somehost.example.com
autossh@somehost.example.com.service - AutoSSH service for ServiceABC on somehost.example.com
   Loaded: loaded (/etc/systemd/system/autossh@.service; enabled)
   Active: inactive (dead) since Tue 2015-10-20 13:24:10 EDT; 2s ago
  Process: 32524 ExecStart=/usr/bin/autossh -M 40000 -NR 5000:127.0.0.1:5000 -i /opt/ServiceABC/.ssh/id_rsa_ServiceABC -l ServiceABC %i (code=exited, status=0/SUCCESS)
 Main PID: 32524 (code=exited, status=0/SUCCESS)

Oct 20 13:19:01 anotherhost.example.com systemd[1]: Started AutoSSH service for ServiceABC on somehost.example.com.
Oct 20 13:24:10 anotherhost.example.com systemd[1]: Stopping AutoSSH service for ServiceABC on somehost.example.com...
Oct 20 13:24:10 anotherhost.example.com systemd[1]: Stopped AutoSSH service for ServiceABC on somehost.example.com.

As you can see, all instances of %i in the unit file get replaced with somehost.example.com.

There are plenty of other specifiers you can use in a unit file, but I find %i works best in cases like this.
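To run a tunnel to several hosts, the same enable and start steps can simply be repeated per instance. A minimal sketch; instance_unit and start_instances are hypothetical helper names, the host names in the usage line are placeholders, and SYSTEMCTL is an override I've added so the commands can be printed instead of run.

```shell
#!/bin/sh
# Sketch: enable and start one instance of a template unit per host.
# SYSTEMCTL is a hypothetical override (e.g. "echo systemctl") that prints
# the commands instead of running them; it is left unquoted on purpose.
set -eu

instance_unit() {
    # autossh + somehost.example.com -> autossh@somehost.example.com.service
    printf '%s@%s.service\n' "$1" "$2"
}

start_instances() {
    template="$1"
    shift
    for host in "$@"; do
        unit="$(instance_unit "$template" "$host")"
        # Separate calls instead of "enable --now" for older systemctl.
        ${SYSTEMCTL:-systemctl} enable "$unit"
        ${SYSTEMCTL:-systemctl} start "$unit"
    done
}
```

Run as root, e.g. start_instances autossh host1.example.com host2.example.com, or with SYSTEMCTL='echo systemctl' to see the commands first. Enable and start are kept as separate calls so the loop also works with older systemctl versions that don't support enable --now.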
