Monitoring the Zabbix server (itself) externally

monit, monitoring, system-monitoring, zabbix

I have some infrastructure (servers, switches, etc.) monitored by a Zabbix server, which is set up to alert in case of issues; so far so good. But what if the Zabbix server itself (or any of the underlying infrastructure) experiences a problem?

One idea would be to publish some sort of heartbeat that can be monitored by an external system. I'm thinking of using the Zabbix API (probably via py-zabbix) to expose this over HTTP and have it monitored by something like monitor.us.

Before I take the plunge, I can't help wondering whether something simple already exists to cover this. Is this even a good approach? Would monit be a better choice than a custom Python script? (I'm not sure a custom script passes the "simplicity" test.)

Best Answer

So here's what I ended up doing:

  1. I wrote a fairly simple Python script which uses pyzabbix to query Zabbix for the set of triggers currently failing (see snippet below). It runs periodically on a background thread, so access to the shared result has to be thread-safe.
  2. I used web.py to expose the result over HTTP to the external monitoring system.
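
For readers who want to see the overall shape, here is a minimal sketch of those two steps. It is not the code I actually ran: it swaps web.py for the standard library's http.server so that it is self-contained, and every name in it (StatusCache, poll_forever, start_monitor, the query_fn callable, port 8080, the 60 second interval) is hypothetical. Answering 503 when triggers are failing or no poll has succeeded yet is also just one possible convention.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class StatusCache:
    """Thread-safe holder for the most recent poll result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._failing = []      # last list of failing triggers
        self._updated = None    # time.time() of the last successful poll

    def update(self, failing):
        with self._lock:
            self._failing = list(failing)
            self._updated = time.time()

    def snapshot(self):
        with self._lock:
            return {"failing": list(self._failing), "updated": self._updated}


def poll_forever(cache, query_fn, interval, stop_event):
    """Call query_fn every `interval` seconds until stop_event is set."""
    while not stop_event.is_set():
        try:
            cache.update(query_fn())
        except Exception:
            pass  # keep the old timestamp so a stale result stays visible
        stop_event.wait(interval)


def make_handler(cache):
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            snap = cache.snapshot()
            # Assumption: "healthy" means at least one poll succeeded
            # and it returned no failing triggers.
            healthy = snap["updated"] is not None and not snap["failing"]
            body = json.dumps(snap).encode("utf-8")
            self.send_response(200 if healthy else 503)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # silence per-request logging

    return Handler


def start_monitor(query_fn, host="127.0.0.1", port=8080, interval=60):
    """Start the poller and the HTTP server on daemon threads."""
    cache = StatusCache()
    stop_event = threading.Event()
    threading.Thread(target=poll_forever,
                     args=(cache, query_fn, interval, stop_event),
                     daemon=True).start()
    server = HTTPServer((host, port), make_handler(cache))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, cache, stop_event
```

Here query_fn would be whatever returns the list of failing triggers, for example the pyzabbix query shown further down. The external service then only needs to fetch the URL and alert on a non-200 response or a timeout.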

There was an unexpected hiccup: the Zabbix API keeps responding even when the Zabbix server process is down, and the API offers no way to query the status of the server, which was the main thing I wanted to monitor. Thankfully a patch exists to allow such server status queries.
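
If patching is not an option, a rough approximation (not what I did) is to check whether anything accepts TCP connections on the server's trapper port. The sketch below assumes the default port 10051 and a hypothetical helper name; a successful connect only shows that something is listening, not that the server is actually processing data.

```python
import socket


def zabbix_server_port_open(host, port=10051, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: the Zabbix server listens on its default trapper port, 10051.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host and timeouts.
        return False
```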

Here is the code that queries the set of failing Zabbix triggers (adapted from an example that ships with pyzabbix). If you need the code for the full monitor, please ask in a comment and I'll post it on GitHub.

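def __query_unacked_triggers(self):
    """Query for currently tripped triggers which haven't been acked."""
    # self._zapi is assumed to be a logged-in pyzabbix.ZabbixAPI instance.
    return self._zapi.trigger.get(
        only_true=1,                     # only triggers that were recently in a problem state
        filter={'value': 1},             # value 1 means the trigger is in a problem state now
        skipDependent=1,                 # skip triggers that depend on another problem trigger
        monitored=1,                     # only enabled triggers on monitored hosts
        active=1,
        output='extend',                 # return all trigger fields
        expandDescription=1,             # expand macros in the trigger name
        expandData='host',
        withLastEventUnacknowledged=1,   # last event not yet acknowledged
    )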
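
The list returned by that query still has to be turned into something the external monitor can read. A minimal sketch, assuming each trigger is a dict with 'host' and 'description' keys (as in the pyzabbix example this was adapted from); summarize_triggers is a hypothetical helper, not part of pyzabbix:

```python
def summarize_triggers(triggers):
    """Build a plain-text report: "OK", or one line per failing trigger."""
    if not triggers:
        return "OK"
    lines = ["PROBLEM: %d unacknowledged trigger(s)" % len(triggers)]
    for t in triggers:
        # .get() with a fallback, in case a key is missing from the API response
        lines.append("%s - %s" % (t.get("host", "?"), t.get("description", "?")))
    return "\n".join(lines)
```

With made-up sample data, summarize_triggers([{"host": "web01", "description": "Disk full"}]) gives a header line followed by "web01 - Disk full".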