Automatically monitor new cloud servers using Open Monitoring Distro (OMD)

amazon-web-services, check-mk, cloud, monitoring, nagios

I've been spending some time getting my head around using Nagios, Check_mk and some other very useful tools installed as part of the OMD package.

WATO is especially useful for administering all of our static Windows and Linux based servers through a GUI once the check_mk agent has been installed manually.

What is the best way to automate this entire monitoring process, if it can be done at all?

We will be using Chef recipes to provision new servers on a regular basis and will frequently kill off others. If we are to continue using Nagios / Check_mk, it's essential that tracking and monitoring our infrastructure takes minimal admin effort.

Many thanks for your help.
Steve

Best Answer

At a high level, there are two ways:

Make Chef write valid Check_MK config files (this has already been done by now), and have it trigger inventory and reloads via the WATO automation. This is probably more transparent.
Make Check_MK read the hosts from your CMDB (should you run a professional setup, there would be one...) or from the Chef config. This is feasible because the Check_MK config allows you basically anything that Python allows. So you could read data from LDAP, some API, the Chef config, or a flat file. To me, it's the cleaner approach since it has a more direct "data" interface.
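
As a rough sketch of the second way, here is how a flat file could be turned into host entries. It assumes the classic `all_hosts` list of `"name|tag1|tag2"` strings. The file format, the function name `load_hosts`, the default `chef` tag and the path are made up for illustration.

```python
import io


def load_hosts(path, default_tags=("chef",)):
    """Return Check_MK-style host entries ("name|tag1|tag2") from a flat file.

    Assumed (hypothetical) file format: one host name per line, optionally
    followed by whitespace-separated tags. Blank lines and '#' comments
    are ignored.
    """
    entries = []
    with io.open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.split("#", 1)[0].strip()
            if not line:
                continue
            fields = line.split()
            name = fields[0]
            tags = list(default_tags) + fields[1:]
            entries.append("|".join([name] + tags))
    return entries


# Inside a Check_MK config file (for example a .mk file under conf.d)
# the result would be appended to the host list, roughly like this:
# all_hosts += load_hosts("/path/to/hosts.txt")  # path is a placeholder
```

The same function shape would work for LDAP, an API or the Chef config. Only the part that produces names and tags changes.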

I think in the long run the first way is going to work out better for you anyway, since it's more oriented towards WATO. I would still pick the second one and hook into the EC2 VM list and such.

A hybrid is also possible, e.g. some daemon that listens for events like VM creations and writes out config to the WATO read-only folder.
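
For the "trigger inventory + reloads" part of the first way, a minimal sketch using the `cmk` command line instead of the WATO automation (a swapped-in technique, not what the answer describes). It assumes it runs as the OMD site user and that `cmk -I <host>` (inventory) and `cmk -O` (reload) are available, as in classic Check_MK. The helper names are hypothetical.

```python
import subprocess


def inventory_commands(new_hosts):
    """Build one 'cmk -I' call per new host, followed by a single reload."""
    commands = [["cmk", "-I", host] for host in new_hosts]
    if commands:
        commands.append(["cmk", "-O"])  # reload the monitoring core once
    return commands


def run_all(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.check_call(command)
```

With `dry_run=True`, `run_all(inventory_commands(["web01"]))` only prints the two commands, which is a cheap way to review what a Chef run would trigger.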

Note: It would be highly stupid not to sanity-check any such data source. Just because some Infrastructure as Code nutcase introduces an (infrastructure) bug and deletes 100% of your VMs from Chef, they should not be immediately removed from monitoring.

Make sure it stays a little out of band.
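
One way such a sanity check could look, as a sketch only: refuse to drop hosts when too many disappear from the data source at once. The function name and the 20% threshold are arbitrary assumptions, not something Check_MK provides.

```python
def safe_host_list(current, proposed, max_removed_fraction=0.2):
    """Return the host list to monitor, refusing suspiciously large removals.

    current:  hosts monitored right now
    proposed: hosts reported by the data source (Chef, CMDB, EC2, ...)
    """
    current_set, proposed_set = set(current), set(proposed)
    removed = current_set - proposed_set
    if current_set and len(removed) > max_removed_fraction * len(current_set):
        # Too many hosts vanished at once: keep the old hosts and only
        # add the new ones, so a human can look at the removals first.
        return sorted(current_set | proposed_set)
    return sorted(proposed_set)
```

If the data source suddenly returns an empty list, every existing host stays in monitoring. A single decommissioned server out of five still goes through.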

A 2010-ish document about dynamic Check_MK interfacing can be found here: https://geni-orca.renci.org/trac/wiki/OMDeventhandlers

It's really old but lays out the basic ideas well.

I've made a first proof of concept for a config-mgmt-to-Check_MK interface. It's not as nice as I would like, but that's just limited by my speed/skill at writing Python. :)

I'm using it with approx. 70 non-cloud servers now: https://bitbucket.org/darkfader/nagios/src/461992c2c5452807a37838ca99fd92977fcf96e1/check_mk/ino2cmk/ino2cmk.py?at=default

Related Solutions

Sql-server – omd nagios monitoring servicestate mssql / Failed to open service

I had to find this one by trial and error. What works for me is

MSSQL\\$Instance

as in

check_command           check_nt!SERVICESTATE!-d SHOWALL -l MSSQL\\$Instance

Linux – Nagios check_mk with plugins

If you need to run this on your monitored end nodes:

1) Read the documentation (that page has likely existed since about 2012): https://mathias-kettner.de/checkmk_mrpe.html

2) Read the check man page.

OMD_user@host:~ $ cmk -M mrpe

3) If needed, also look into caching so that you don't query CloudWatch every minute, should you not want that. It depends on how often you want to poll the data.

http://lists.mathias-kettner.de/pipermail/checkmk-werks-lvl1/2016-January/002474.html
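
To illustrate the caching idea only (this is a generic file-based wrapper, not the mechanism described in the linked werk), a plugin could reuse its last output while it is still fresh. The function name, cache path and `produce` callback are hypothetical.

```python
import os
import time


def cached_output(cache_file, max_age_seconds, produce):
    """Return produce()'s text, reusing a cached copy younger than max_age_seconds."""
    try:
        age = time.time() - os.path.getmtime(cache_file)
        if age < max_age_seconds:
            with open(cache_file) as handle:
                return handle.read()
    except OSError:
        pass  # no cache yet (or unreadable): fall through and refresh
    text = produce()  # the expensive call, e.g. a CloudWatch query
    tmp = cache_file + ".tmp"
    with open(tmp, "w") as handle:
        handle.write(text)
    os.rename(tmp, cache_file)  # atomic replace on POSIX
    return text
```

Called every minute with `max_age_seconds=300`, this would hit the expensive backend roughly once every five minutes.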

If you need to run this only locally on the Nagios server:

4) Just define a legacy ("active") check, as described in section 8.2 of http://mathias-kettner.com/cms_wato_services.html

(I would recommend you actually read all of that page, though.)

5) A very in-depth howto, in case you need to write your own check and want to add a web UI for it: http://www2.steinkogler.org/steinkogler.org/2016/08/21/check-mk-write-your-own-active-check/
