I've been working on a systemd service to wrap an administration script and I'm trying to gracefully handle it completely breaking.
Right now I have Restart set to always so it will try again when something fails, but some failure states require attention (missing config file, bad SQL, etc.), so I don't want it continuously spinning in the background in an uncorrectable state.
I found StartLimitInterval, StartLimitBurst, and StartLimitAction, which stop it from restarting after X failures in Y seconds, but it turns out that the only actions available for StartLimitAction are rebooting or shutting down the machine, which is a little overkill.
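For reference, the setup described above might look roughly like the fragment below. This is only a sketch: the unit name, the script path, and the 5-failures-in-60-seconds values are made up for illustration. As far as I know, recent systemd releases spell the interval StartLimitIntervalSec and expect both limit settings in [Unit], while older ones such as 219 use StartLimitInterval in [Service]. StartLimitAction is left out on purpose, since it defaults to none, in which case the unit simply stops being restarted.

```ini
# admin-script.service (hypothetical unit name)
[Unit]
Description=Administration script wrapper
# Stop restarting after 5 start attempts within 60 seconds (illustrative values).
# On older releases such as 219, use StartLimitInterval= and put both
# settings in [Service] instead.
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
# Hypothetical path to the wrapped script.
ExecStart=/usr/local/bin/admin-script.sh
Restart=always
# Wait 5 seconds between restart attempts.
RestartSec=5
```

With these values the fifth quick failure inside a minute leaves the unit stopped until someone runs systemctl reset-failed or starts it by hand.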
I've been looking at OnFailure and wrote a mini service to send an alert email when it's triggered, but OnFailure triggers every time the service dies, not when it hits the start limit, so we get a bunch of emails instead of just one.
Any ideas of what to try next?
Best Answer
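From the systemd.unit man page, under OnFailure=:
"A space-separated list of one or more units that are activated when this unit enters the "failed" state. A service unit using Restart= enters the failed state only after the start limits are reached."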
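However, the second sentence appears to be a newer constraint: it is in the manual for systemd 241 on my Arch installations, but not in the manual for version 219 on my CentOS 7 installation. If your version includes it, OnFailure should fire only once, after the start limit is reached, rather than on every restart.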
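On a version that behaves this way, a pair of units along these lines should produce a single alert once the restarts are exhausted. Treat it as a sketch rather than a tested setup: both unit names, both script paths, and the limit values are hypothetical, and I have only confirmed the wording of the man pages, not the behavior on every release.

```ini
# admin-script.service (hypothetical unit name)
[Unit]
Description=Administration script wrapper
# Started when this unit enters the "failed" state.
OnFailure=alert-email.service
# Give up after 5 start attempts within 60 seconds (illustrative values).
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
# Hypothetical path to the wrapped script.
ExecStart=/usr/local/bin/admin-script.sh
Restart=always
RestartSec=5
```

```ini
# alert-email.service (hypothetical unit name, separate file)
[Unit]
Description=Send an alert email about a failed service

[Service]
# Run once and exit; no restart logic is needed here.
Type=oneshot
# Hypothetical script that actually sends the mail.
ExecStart=/usr/local/bin/send-alert.sh
```

On older versions such as 219, the same files would presumably still send one email per failure, so the alert script would need its own rate limiting there.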
You can check your systemd version with:
systemctl --version
I know it's an old question, but I wanted to share this for anyone else who has the same problem.