Problem 1
I want to monitor a LibreOffice process running headless with monit version 5.25.1.
Here is my monit config for this approach:
cat /etc/monit/conf.d/libreoffice
check program lo-check-8101 with path "/bin/bash /opt/libreoffice/chkloproc.sh TestLOPort8101 8101"
with timeout 10 seconds
if status != 0 then exec "/bin/bash /opt/libreoffice/loproc_is_down.sh"
if status = 0 then exec "/bin/bash /opt/libreoffice/loproc_is_up.sh"
This LibreOffice Instance is listening on port 8101.
The check script returns 0 if everything is OK and 101 if there is an
error with that LibreOffice instance. It tests the text conversion of the
running LibreOffice process by sending HTML, requesting TEXT, and checking the
response.
The action scripts (loproc_is_down.sh / loproc_is_up.sh) add or delete
an iptables rule to announce the status to a running haproxy, which port-checks that
LibreOffice instance. If this sounds a little complicated, I'm sorry, but that is not
the problem I would like to talk about here.
The problem is that I don't understand why monit logs the following entries:
monit log after restart
[CET Oct 29 16:58:18] info : Starting Monit 5.25.1 daemon with http interface at [localhost]:2812
[CET Oct 29 16:58:18] info : Monit start delay set to 10s
[CET Oct 29 16:58:28] info : 'host1' Monit 5.25.1 started
[CET Oct 29 16:58:58] error : 'lo-check-8101' status failed (0) -- no output
[CET Oct 29 16:58:58] info : 'lo-check-8101' exec: '/bin/bash /opt/libreoffice/loproc_is_up.sh'
[CET Oct 29 16:59:28] error : 'lo-check-8101' status failed (0) -- no output
… and the following status screen from 'monit status':
monit status
Monit 5.25.1 uptime: 0m
Program 'lo-check-8101'
status Status failed
monitoring status Monitored
monitoring mode active
on reboot start
last exit value 0
last output -
data collected Tue, 29 Oct 2019 16:58:58
System 'host1'
status OK
monitoring status Monitored
monitoring mode active
on reboot start
load average [0.03] [0.02] [0.01]
cpu 0.6%us 0.6%sy 0.0%wa
memory usage 543.9 MB [7.8%]
swap usage 0 B [0.0%]
uptime 20d 1h 11m
boot time Wed, 09 Oct 2019 16:47:51
data collected Tue, 29 Oct 2019 16:58:58
To me it seems that the check script returns exit value 0, but the status is reported / interpreted as "Status failed".
I don't understand why monit reports "error: … status failed (0)" in its logfile.
What does status mean, other than the interpretation of the last exit code of the given check-script program?
Problem 2
And there is another reaction from monit that I can't understand; perhaps somebody can explain it to me?
When I fake a broken LibreOffice process by stopping it, monit recognizes this after one cycle, starts the configured action script 'loproc_is_down.sh', and correctly reports the last exit code as 101, but with the log line
"info: status succeeded (101)"
in the first cycle, and then again with
"error: status failed (101)"
monit log with faked failure
[CET Oct 29 17:14:28] info : 'lo-check-8101' status succeeded (101) -- Error: Existing listener not found. Unable start listener by parameters. Aborting.
[CET Oct 29 17:14:28] error : 'lo-check-8101' status failed (101) -- Error: Existing listener not found. Unable start listener by parameters. Aborting.
[CET Oct 29 17:14:28] info : 'lo-check-8101' exec: '/bin/bash /opt/libreoffice/loproc_is_down.sh'
[CET Oct 29 17:14:58] error : 'lo-check-8101' status failed (101) -- Error: Existing listener not found. Unable start listener by parameters. Aborting.
[CET Oct 29 17:15:28] error : 'lo-check-8101' status failed (101) -- Error: Existing listener not found. Unable start listener by parameters. Aborting.
The opposite happens when I start that LibreOffice process again:
monit log when service is running again
[CET Oct 29 17:15:58] error : 'lo-check-8101' status failed (0) -- no output
[CET Oct 29 17:15:58] info : 'lo-check-8101' exec: '/bin/bash /opt/libreoffice/loproc_is_up.sh'
[CET Oct 29 17:15:58] info : 'lo-check-8101' status succeeded (0) -- no output
[CET Oct 29 17:16:28] error : 'lo-check-8101' status failed (0) -- no output
[CET Oct 29 17:16:58] error : 'lo-check-8101' status failed (0) -- no output
It looks like monit runs the check script, which returns exit code 0, starts the action script "loproc_is_up.sh", and reports "status succeeded (0)"
… but then logs "error: status failed (0)" again in the following cycles.
I don't understand the meaning of "status" in the monit concept / documentation. Can somebody explain it to me?
Thank you for reading this long post; I hope you can help me with an answer.
Best Answer
Monit is there to catch problems on a monitored entity.
So - line by line - your config tells Monit:
Execute a binary. Store the exit code and some additional info.
A problem occurs if status is not 0. Now execute a binary.
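A problem occurs if status is 0. Now execute a binary. I don't even get what the result of this call should be. Everything's okay here, so why execute something?
In other words: with this config there is no "success" (= everything is fine) case. Every "if" rule describes a failure condition, so the rule that currently matches is logged as "status failed", and the rule that stops matching is logged once as "status succeeded". That is why exit code 0 shows up as "status failed (0)" and exit code 101 first shows up as "status succeeded (101)".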
To optimize it, you should only catch problems with Monit:
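A minimal sketch of such a config, reusing the names and paths from the question and keeping only the failure rule (an untested assumption about what is wanted, not a verified setup):

```
check program lo-check-8101 with path "/bin/bash /opt/libreoffice/chkloproc.sh TestLOPort8101 8101"
    with timeout 10 seconds
    # Only a non-zero exit code is treated as a problem.
    if status != 0 then exec "/bin/bash /opt/libreoffice/loproc_is_down.sh"
    # Assumption: if the "up" script should still run once on recovery,
    # the rule above could instead end with
    #   else if succeeded then exec "/bin/bash /opt/libreoffice/loproc_is_up.sh"
```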
This means nothing is done by Monit if status is 0.
Some more words on the config:
Consider using check process, and perhaps some send/expect magic, to verify the service is running.
If you make your .sh files executable (+x; i.e. chmod +x /opt/libreoffice/*.sh) and you have a correct shebang in those files, you can omit /bin/bash in your executes for better readability.
My config on this (not knowing what protocol is used by :8101, assuming http) would be more like this:
Getting loadavg with per core requires the latest Monit version. It might not be available in your distro, so I commented out this line ;)
Edit after response from OP (I hope you get notified):
(it's really a pain that we cannot comment < 50 Rep...)
If I get it right, you have to convert something to get the state of the application, if conversion fails the app should be restarted. Translated to Monit:
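One way this could look, as a sketch only: CONVERT_HERE stands for the conversion check executable, while the service name libreoffice-8101, the pidfile, the start/stop scripts, and the path to the monit binary are hypothetical placeholders that do not come from the question:

```
# Hypothetical process check; pidfile and start/stop scripts are placeholders.
check process libreoffice-8101 with pidfile /var/run/libreoffice-8101.pid
    start program = "/opt/libreoffice/start_lo.sh 8101"
    stop program = "/opt/libreoffice/stop_lo.sh 8101"

# Run the conversion test every cycle; a non-zero exit code triggers a restart.
# Replace CONVERT_HERE with the absolute path of the real executable.
check program lo-convert-8101 with path "CONVERT_HERE"
    with timeout 10 seconds
    if status != 0 then exec "/usr/bin/monit restart libreoffice-8101"
```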
... where the CONVERT_HERE executable exits with 0 if converting goes well and <>0 if it fails. I still feel I missed something here. ;)
Could you perhaps drop all three executables to a gist or something?