I've been spending some time getting my head around using Nagios, Check_mk and some other very useful tools installed as part of the OMD package.
WATO is especially useful for administering all of our static Windows and Linux-based servers through a GUI once the check_mk agent has been installed manually.
What is the best way to automate this entire monitoring process, or can it even be done?
We will be using Chef recipes to provision new servers on a regular basis and to kill off others frequently. If we are to continue using Nagios/Check_mk, it's essential that the admin effort needed to track and monitor our infrastructure is minimal.
Many thanks for your help.
Steve
Best Answer
At a high level, there are two ways:
In the long run I think the first way is going to work out better for you anyway, since it's more oriented towards WATO. I would still pick the second one and hook into the EC2 VM list and the like.
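As a rough illustration of hooking into the EC2 VM list, here is a minimal Python sketch, assuming the data comes from a boto3 `describe_instances()` call. It only turns that response into a host-name-to-IP mapping; using the `Name` tag as the host name, using the private IP, and the function name `running_instances` are my own assumptions, not anything Check_MK prescribes.

```python
from typing import Any


def running_instances(response: dict[str, Any]) -> dict[str, str]:
    """Map host name -> private IP for every running instance found in a
    describe_instances()-shaped response (assumed input format)."""
    hosts: dict[str, str] = {}
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # Skip pending/stopping/terminated VMs.
            if instance.get("State", {}).get("Name") != "running":
                continue
            ip = instance.get("PrivateIpAddress")
            if not ip:
                continue
            # Prefer the "Name" tag; fall back to the instance ID.
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            name = tags.get("Name", instance["InstanceId"])
            hosts[name] = ip
    return hosts
```

In real use you would feed it the result of `boto3.client("ec2").describe_instances()`, or iterate over `get_paginator("describe_instances")` for larger accounts and merge the results.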
A hybrid is also possible, e.g. a daemon that listens for events such as VM creation and writes the config out to a read-only WATO folder.
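A minimal sketch of the "write out config" half of such a daemon, assuming the legacy (pre-1.6) Check_MK `main.mk` syntax, where hosts are listed in `all_hosts` as `"name|tag|tag"` strings and addresses go into `ipaddresses`. The tag names, the `render_hosts_mk`/`write_atomically` helpers and the target path are hypothetical; newer Check_MK versions store WATO hosts in a different format, so check what your version expects. The file goes to a temp file first and is then renamed, so Check_MK never reads a half-written config.

```python
import os
import tempfile


def render_hosts_mk(hosts: dict[str, str], tags: str = "lnx|cloud") -> str:
    """Render hosts in the legacy all_hosts/ipaddresses syntax (assumed format;
    the default tags are hypothetical)."""
    lines = ["# Generated automatically, do not edit by hand", "all_hosts += ["]
    for name in sorted(hosts):
        entry = f"{name}|{tags}"
        lines.append(f"    {entry!r},")  # repr() gives a valid Python string literal
    lines.append("]")
    lines.append("ipaddresses.update({")
    for name in sorted(hosts):
        lines.append(f"    {name!r}: {hosts[name]!r},")
    lines.append("})")
    return "\n".join(lines) + "\n"


def write_atomically(path: str, content: str) -> None:
    """Write to a temp file in the same directory, then rename it over the
    target so readers see either the old or the new file, never a partial one."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(content)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # clean up the temp file if anything went wrong
        raise
```

After writing, the change still has to be activated, e.g. with `cmk -O` (reload) or `cmk -R` (restart).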
Note: It would be really stupid not to sanity-check any such data source. If some Infrastructure-as-Code nutcase introduces an (infrastructure) bug that deletes 100% of your VMs from Chef, those hosts should not be immediately removed from monitoring.
Make sure it stays a little out of band.
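One way to keep it out of band is a guard that refuses mass removals. This is a minimal sketch under my own assumptions: the 20% threshold and the `guarded_host_set` name are hypothetical, and when the guard trips it simply keeps every currently monitored host and only applies additions.

```python
import logging


def guarded_host_set(current: set[str], proposed: set[str],
                     max_removed_fraction: float = 0.2) -> set[str]:
    """Return the host set to push into monitoring.

    New hosts are always accepted. Removals are only accepted if they affect
    at most max_removed_fraction of the currently monitored hosts; otherwise
    they are treated as a probable data-source bug and ignored.
    """
    removed = current - proposed
    if current and len(removed) / len(current) > max_removed_fraction:
        logging.warning("refusing to remove %d of %d hosts; keeping them",
                        len(removed), len(current))
        # Keep everything we already monitor, but still add new hosts.
        return current | proposed
    return proposed
```

A variant would be to only remove a host after it has been missing from the data source for several consecutive runs.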
A 2010-ish document about dynamic Check_MK interfacing can be found here: https://geni-orca.renci.org/trac/wiki/OMDeventhandlers
It's really old but lays out the basic ideas well.
I've made a first proof of concept for a config-management-to-Check_MK interface. It's not as nice as I would like it to be, but that's just limited by my speed/skill at writing Python. :)
I'm using it with approx. 70 non-cloud servers now: https://bitbucket.org/darkfader/nagios/src/461992c2c5452807a37838ca99fd92977fcf96e1/check_mk/ino2cmk/ino2cmk.py?at=default