How to check if a service (that listens on a given port) is up and working

monitoring, nagios, process

OK, so let's say I have a Nagios setup that monitors different services using the so-called nagios-plugins.

What would be the best practice for my Nagios plugin (probably written in Python) to determine whether a given service is running OK?

The particular service in question is a Python socket server that listens on some port. I will have Nagios check that service frequently, and if it stops responding or dies, I should restart it. How can I tell whether the socket server is alive? And beyond that, how would I check that it is actually responding?

I have control over the service, so I can change the way it works if that would help me determine its health state.

Any ideas are welcome!

Best Answer

Sticking to the standard Nagios plugins found in, say, the Ubuntu repositories, you can use the check_tcp plugin to send a string to the service and check whether it returns the expected response:

Usage: check_tcp -H host -p port [-w <warning time>] [-c <critical time>] [-s <send string>]
[-e <expect string>] [-q <quit string>] [-m <maximum bytes>] [-d <delay>]
[-t <timeout seconds>] [-r <refuse state>] [-M <mismatch state>] [-v] [-4|-6] [-j]
[-D <days to cert expiry>] [-S <use SSL>] [-E]

Since you can modify your service, you can have it answer a probe string such as "Are you OK?" with "I'm OK", then have check_tcp send the first string and expect the second. How far you take this depends on how thoroughly you want to verify that the service is up and running.
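On the service side, the probe/reply idea could look roughly like the sketch below. It is only an illustration: the class names, the make_server helper, port 9000, and the "unknown command" fallback are all made up for the example, and it assumes the probe arrives in a single short recv, which is reasonable for a string this small but not guaranteed by TCP.

```python
import socketserver

HEALTH_REQUEST = b"Are you OK?"   # probe string suggested in the answer
HEALTH_REPLY = b"I'm OK"          # reply string suggested in the answer


class HealthAwareHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler: answers the health probe before normal traffic."""

    def handle(self):
        # Assumption: the whole probe fits in one recv; strip() tolerates
        # a trailing newline if the checker happens to send one.
        data = self.request.recv(1024).strip()
        if data == HEALTH_REQUEST:
            self.request.sendall(HEALTH_REPLY + b"\n")
            return
        # Placeholder for the service's real protocol (not in the original).
        self.request.sendall(b"unknown command\n")


class HealthAwareServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True   # allow quick restarts on the same port
    daemon_threads = True        # handler threads do not block shutdown


def make_server(host="127.0.0.1", port=9000):
    """Build the server; port 9000 is an arbitrary example value."""
    return HealthAwareServer((host, port), HealthAwareHandler)
```

Calling make_server().serve_forever() starts it. With those example values, a matching check would be something like: check_tcp -H 127.0.0.1 -p 9000 -s "Are you OK?" -e "I'm OK". Because the reply only goes out after the server has accepted, read, and processed a request, a passing check says more than "the port is open".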

You can also use check_procs to see if the process for the service is there. This might be in conjunction with a check_tcp check, or as an alternative. Again, it depends on what you're doing, and how much you actually want to do. If you want to get very involved, you can write a custom Nagios check that will do all sorts of things to verify the functionality of the service and return custom state messages to the Nagios server.
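If you do go the custom-check route, a minimal sketch in Python might look like the following. It relies on the standard Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL), and 3 (UNKNOWN) plus one line of status text on stdout. The function names, the script name in the usage message, the probe strings, and the one-second warning threshold are assumptions for illustration, not part of any existing plugin.

```python
import socket
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_service(host, port, send=b"Are you OK?\n", expect=b"I'm OK",
                  warn_secs=1.0, timeout=5.0):
    """Probe the service once and return (exit_code, status_line)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(send)
            reply = sock.recv(1024)
    except OSError as exc:  # refused, unreachable, timed out, ...
        return CRITICAL, f"CRITICAL - cannot talk to {host}:{port}: {exc}"
    elapsed = time.monotonic() - start
    if expect not in reply:
        return CRITICAL, f"CRITICAL - unexpected reply {reply!r}"
    if elapsed > warn_secs:
        return WARNING, f"WARNING - replied in {elapsed:.3f}s"
    return OK, f"OK - replied in {elapsed:.3f}s"


def main(argv):
    """Hypothetical entry point: check_myservice.py HOST PORT."""
    if len(argv) != 3 or not argv[2].isdigit():
        print("UNKNOWN - usage: check_myservice.py HOST PORT")
        return UNKNOWN
    status, message = check_service(argv[1], int(argv[2]))
    print(message)  # Nagios shows this line as the service status text
    return status

# To run this as a plugin script, finish the file with:
#     if __name__ == "__main__":
#         import sys
#         sys.exit(main(sys.argv))
```

Nagios only looks at the exit code and the printed line, so any deeper functional test (a real request, a round trip through a backend) can be added inside check_service without changing how the check is wired up.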
