Linux – Monit: how do you stop monit from running the exec statement every time the test fails


How do you stop monit from running the exec statement every time the test fails? The statement in my monitrc is:

check filesystem tmpfs with path /var                                           
    if space > 90% then exec "/usr/bin/logger -p daemon.crit 'MAJOR: space test'"

This seems odd, because someone else asked a question in which he was using an alert and got exactly the behaviour I want. I'm ready to start choke slamming linux boxes.

Edit: here is the opposite case: "Repeat monit alerts"

Is it because he is using alert not exec?

Best Answer

I had to deal with a similar issue a while ago.

As far as I know, monit cannot do this.

With monit you can tune the condition with "X times within Y cycles" (or "for Y cycles"), but sooner or later the exec action will be triggered more than once, depending on how long it takes you to fix the issue.

So in the end I decided to write my own check script that handles all the logic using a flag file.

I'm sharing it here; whether you use it is up to you.

First: write the script that monitors filesystem usage, say /root/check_fsspace.sh:

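#!/bin/sh
# Report a failure (exit 1) only the first time usage goes over the
# threshold; a flag file suppresses repeats until usage drops back down.

myFS=/var
myThreshold=90
flagFile=/tmp/flag

# df -P prints one POSIX-format line per filesystem; column 5 is the percentage used.
# Passing the path selects only the filesystem that holds $myFS, whereas
# "df -h | grep" can match several mount points (e.g. /var/log).
spaceused=$(df -P "$myFS" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')

# Usage could not be read: exit with a distinct non-zero code
# (monit's "status != 0" test treats this as a failure too).
if [ -z "$spaceused" ]; then
  exit 2
fi

if [ "$spaceused" -gt "$myThreshold" ]; then
  # Over the threshold: fail once, then stay quiet while the flag exists.
  if [ ! -f "$flagFile" ]; then
    touch "$flagFile"
    exit 1
  fi
  exit 0
fi

# Back under the threshold: clear the flag so the next incident is reported.
rm -f "$flagFile"
exit 0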

I assume the script is self-explanatory; if not, ask and I'll explain it. Make it executable (chmod 755 /root/check_fsspace.sh), since monit runs it directly.

Second: set up your monit service definition:

check program check_fs with path "/root/check_fsspace.sh"
  if status != 0 then exec "/usr/bin/logger -p daemon.crit 'MAJOR: space test'"
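To see why the exec now fires only once per incident, the flag logic can be replayed outside monit. This is a minimal sketch, assuming each reading below stands for one monit cycle; the readings, the check_usage function and the demo flag path are hypothetical, and monit's own cycle handling is not modelled.

```shell
#!/bin/sh
# Replay the decision logic of check_fsspace.sh against made-up usage
# readings and count how often it would return a non-zero status.
# The readings and the demo flag path are hypothetical.

flagFile=${TMPDIR:-/tmp}/check_fsspace_demo.$$   # hypothetical demo flag file
myThreshold=90

# Same logic as the script, but uses return instead of exit so it can be
# called once per simulated cycle.
check_usage() {
  if [ "$1" -gt "$myThreshold" ]; then
    if [ ! -f "$flagFile" ]; then
      touch "$flagFile"
      return 1
    fi
    return 0
  fi
  rm -f "$flagFile"
  return 0
}

rm -f "$flagFile"
failures=0
# One incident (95, 96, 97), a recovery (85), then a new incident (92).
for usage in 80 95 96 97 85 92; do
  if ! check_usage "$usage"; then
    failures=$((failures + 1))
    echo "usage $usage%: status 1 (exec would fire)"
  else
    echo "usage $usage%: status 0"
  fi
done
rm -f "$flagFile"
echo "failures reported: $failures"
```

Only the first reading of each incident (95% and later 92%) yields a non-zero status; the readings in between stay quiet because the flag already exists, and the 85% reading clears it.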

Related Solutions

Monit – How to Disable Instance Start/Stop Alerts

According to the documentation, Monit can generate a number of alerts:

Event:     | Failure state:              | Success state:
---------------------------------------------------------------------
action     | "Action done"               | "Action done"
checksum   | "Checksum failed"           | "Checksum succeeded"
bytein     | "Download bytes exceeded"   | "Download bytes ok"
byteout    | "Upload bytes exceeded"     | "Upload bytes ok"
connection | "Connection failed"         | "Connection succeeded"
content    | "Content failed"            | "Content succeeded"
data       | "Data access error"         | "Data access succeeded"
exec       | "Execution failed"          | "Execution succeeded"
fsflags    | "Filesystem flags failed"   | "Filesystem flags succeeded"
gid        | "GID failed"                | "GID succeeded"
icmp       | "Ping failed"               | "Ping succeeded"
instance   | "Monit instance changed"    | "Monit instance changed not"
invalid    | "Invalid type"              | "Type succeeded"
link       | "Link down"                 | "Link up"
nonexist   | "Does not exist"            | "Exists"
packetin   | "Download packets exceeded" | "Download packets ok"
packetout  | "Upload packets exceeded"   | "Upload packets ok"
permission | "Permission failed"         | "Permission succeeded"
pid        | "PID failed"                | "PID succeeded"
ppid       | "PPID failed"               | "PPID succeeded"
resource   | "Resource limit matched"    | "Resource limit succeeded"
saturation | "Saturation exceeded"       | "Saturation ok"
size       | "Size failed"               | "Size succeeded"
speed      | "Speed failed"              | "Speed ok"
status     | "Status failed"             | "Status succeeded"
timeout    | "Timeout"                   | "Timeout recovery"
timestamp  | "Timestamp failed"          | "Timestamp succeeded"
uid        | "UID failed"                | "UID succeeded"
uptime     | "Uptime failed"             | "Uptime succeeded"

We were able to fix this on our side by setting (addresses changed to protect the innocent):

SET ALERT important-messages@projectlocker.com ON { invalid, nonexist, timeout, resource, size, timestamp}
SET ALERT less-important-messages@projectlocker.com ON {action, permission, pid, ppid, instance, status}

This routes the messages to the addresses we care about. You can set alerts globally or per service; ours are all global.

The subheadings under SERVICE TESTS at: http://mmonit.com/monit/documentation/monit.html correspond fairly neatly to the types above.

For each scheduled process or feature of your server, you should be able to come up with what matters to you in plain English, and match that desire to one of the tests mentioned in SERVICE TESTS. For example, if I'm running Apache, I know that I care about:

Is the PID in the PID file still running? (nonexist)
Did the PID change without my knowledge? (pid)
Is the service responding in a timely fashion to a restart? (timeout)
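As a rough illustration of what the first two questions (nonexist and pid) boil down to, here is a minimal shell sketch, assuming a daemon that writes its PID to a pidfile. The function names and the state file are hypothetical, and this is not how monit implements the tests internally; in monit you simply write the corresponding checks.

```shell
#!/bin/sh
# Rough shell analogue of two monit process tests, for illustration only:
#   nonexist: is the PID recorded in the pidfile still running?
#   pid:      has that PID changed since we last looked?
# Pidfile and state-file paths passed in are hypothetical.

# Print the first line of pidfile $1, or nothing (and fail) if unreadable.
read_pid() {
  [ -r "$1" ] && head -n 1 "$1"
}

# Succeed if the process whose PID is in pidfile $1 is alive.
# kill -0 sends no signal; it only checks that the PID can be signalled,
# so run this as root (like monit) or it fails for other users' processes.
pid_running() {
  pid=$(read_pid "$1")
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Succeed if the PID in pidfile $1 differs from the one saved in state file $2.
# Always saves the current PID; the first call (no saved PID) reports no change.
pid_changed() {
  new=$(read_pid "$1")
  old=$(cat "$2" 2>/dev/null)
  printf '%s\n' "$new" > "$2"
  [ -n "$old" ] && [ "$new" != "$old" ]
}
```

For example, pid_running /var/run/apache2.pid || logger -p daemon.crit 'apache is down' (the pidfile path varies by distribution). POSIX sh has no local variables, so pid, new and old are shared globals here.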

For a custom daemon that polls, I may care about whether the log file is getting updated with status messages regularly (timestamp).
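The idea behind that timestamp check can be sketched in shell as "was the log modified in the last N minutes?". This is a minimal sketch, assuming find supports -mmin (GNU, BSD and BusyBox find do, but it is not POSIX); the function name, log path and window are hypothetical.

```shell
#!/bin/sh
# Rough analogue of a monit "timestamp" test: has a file been modified
# within the last N minutes? Function name and paths are hypothetical.

# Succeed if file $1 exists and was modified less than $2 minutes ago.
# -prune stops find from descending if $1 happens to be a directory.
log_is_fresh() {
  [ -e "$1" ] && [ -n "$(find "$1" -prune -mmin "-$2" 2>/dev/null)" ]
}
```

For example, log_is_fresh /var/log/mydaemon.log 10 || logger -p daemon.warning 'mydaemon log is stale', where the log path is made up for illustration.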

Linux – Monit having trouble determining if the Bitcoin daemon is running

You are correct in surmising that the bitcoin user won't have permission to write to /var/run, however root will be able to read /home/bitcoin/.bitcoin/bitcoind.pid. I would leave the PID in the latter location, and then work some more at figuring out why monit isn't reading that second location. My bet would be a typo in the path in the monit config.
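To rule out a path typo, it can help to test the exact pidfile path copied from the monit config by hand, as root. This is a minimal diagnostic sketch; the function name and its messages are hypothetical, and it only mimics the kinds of problems (missing, unreadable, empty or stale pidfile) that would make a pidfile-based check fail.

```shell
#!/bin/sh
# Diagnose a pidfile that a monit check cannot seem to use. Run as root
# with the path copied verbatim from the monit config, so a typo in the
# path shows up as "missing". Function name and messages are hypothetical.

diagnose_pidfile() {
  f=$1
  if [ ! -e "$f" ]; then echo "missing: $f"; return 1; fi
  if [ ! -r "$f" ]; then echo "unreadable: $f"; return 1; fi
  pid=$(head -n 1 "$f")
  # Reject empty or non-numeric content.
  case $pid in
    ''|*[!0-9]*) echo "no numeric PID in $f"; return 1 ;;
  esac
  # kill -0 only checks that the process exists and can be signalled.
  if kill -0 "$pid" 2>/dev/null; then
    echo "ok: PID $pid from $f is running"
  else
    echo "stale: PID $pid from $f is not running"
    return 1
  fi
}
```

For example, diagnose_pidfile /home/bitcoin/.bitcoin/bitcoind.pid; if that reports "ok" but monit still fails, compare the path character by character with the one in the monit config.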
