Zabbix Application Monitoring

monitoring, zabbix

We operate a SaaS business and have hundreds of processes that can roam from server to server. They are .NET processes that can be started on any one of a bank of machines, run for a period of time (typically weeks), and then be migrated to another machine.

These processes have many different time-series outputs (which are broadcast using RabbitMQ) and we have our own bespoke system for monitoring the application processes.

We have a variety of monitoring tools (for example LogicMonitor), but we're starting to use Zabbix for server monitoring.

It makes sense to me to put all time-series data from all sources (switches, servers, hosts, VMs, applications) into one place, because then we can compare the application data with the server-wide data (for example CPU load, memory load).

I'm considering using Zabbix for this.

I can see that Zabbix supports sending time-series data using the Zabbix sender (https://www.zabbix.com/documentation/3.0/manual/concepts/sender), so I know I can get data into it.

I'm struggling to understand how to set up Zabbix for this, given that Zabbix is server-centric, with keys for each time series. I expect this is a common scenario, but I'm new to Zabbix.

I imagine a hierarchy along the following lines:

DataCenter (1 of n)
-> Rack (1 of n)
    keys (e.g. power used)
   -> Physical Machine (1 of n) "The hosts"
       keys (e.g. CPU, Memory, Network Bandwidth)
      -> VM (1 of n)
          keys (e.g. CPU, Memory, Network Bandwidth)
         -> Application
             keys (e.g. CPU, Memory, Network Bandwidth, Jobs per second etc.)

Is this something Zabbix supports? I thought about perhaps using a naming convention for the host or keys but it feels like I'm doing something wrong.

Best Answer

As you mentioned, Zabbix is designed around hosts/servers and keys, so as a first step to model your hierarchy you could create a host for every VM and then use host groups as needed for datacenters or racks.
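To picture that mapping, here is a minimal Python sketch that flattens a nested inventory into flat host entries, each carrying the host groups it would belong to. It does not call any Zabbix API. The inventory layout, the `HostEntry` type and the group naming convention ("DC dc1", "Rack dc1-r1", and so on) are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostEntry:
    """One Zabbix host plus the host groups it would be placed in."""
    name: str
    groups: tuple


def flatten_inventory(inventory):
    """Turn {datacenter: {rack: {machine: [vm, ...]}}} into flat host entries.

    Hosts are flat in this model, so the hierarchy survives only as
    host group membership. The group names are a hypothetical convention.
    """
    entries = []
    for dc, racks in inventory.items():
        for rack, machines in racks.items():
            location = (f"DC {dc}", f"Rack {dc}-{rack}")
            for machine, vms in machines.items():
                entries.append(HostEntry(machine, location + ("Physical machines",)))
                for vm in vms:
                    entries.append(HostEntry(vm, location + ("VMs",)))
    return entries


# Hypothetical inventory: one datacenter, one rack, one physical machine, two VMs.
inventory = {"dc1": {"r1": {"hv1": ["app1", "app2"]}}}
hosts = flatten_inventory(inventory)
for entry in hosts:
    print(entry.name, "->", ", ".join(entry.groups))
```

Because a host can sit in several groups at once, the datacenter and rack levels become ways to filter and aggregate rather than parents in a tree.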

Zabbix has no built-in support for clusters or roaming applications. To monitor those I usually create "meta-hosts", basically empty host entries without any agent. Then I use a monitoring script to send values to Zabbix trapper items on that host.

For example: take three VMs app1, app2 and app3 with normal system monitoring (CPU, memory), plus one "meta-host" service1 that has my application template. The roaming application then sends its monitoring data with zabbix_sender -z zabbixserver -s service1 -k service.some.stat -o 42 (or the equivalent library call for its programming language).
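If you would rather not shell out to zabbix_sender, the same value can be sent over a plain TCP socket. The following is a rough Python sketch of what a sender library does, based on my understanding of the Zabbix sender protocol: a "ZBXD" header, a flags byte, an 8-byte little-endian length, then a JSON body. The function names are my own, and 10051 is the default trapper port, which your server may override. Check the framing against the protocol documentation for your Zabbix version before relying on it.

```python
import json
import socket
import struct


def build_sender_packet(host, key, value):
    """Frame one trapper value for the Zabbix sender protocol.

    Assumed layout: b"ZBXD" + flags byte 0x01 + 8-byte little-endian
    body length + JSON body.
    """
    body = json.dumps({
        "request": "sender data",
        "data": [{"host": host, "key": key, "value": str(value)}],
    }).encode("utf-8")
    return b"ZBXD\x01" + struct.pack("<Q", len(body)) + body


def send_value(server, host, key, value, port=10051, timeout=5.0):
    """Send one value to the server's trapper port and return the raw reply."""
    packet = build_sender_packet(host, key, value)
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(packet)
        chunks = []
        while True:
            # Read until the server closes the connection (or the timeout fires).
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


# Same host and key as the zabbix_sender example; nothing is sent here.
packet = build_sender_packet("service1", "service.some.stat", 42)
print(packet[:5], len(packet))
```

Calling `send_value("zabbixserver", "service1", "service.some.stat", 42)` would be the counterpart of the command line above. The host name in the packet has to match the meta-host, and the key has to exist there as a trapper item.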

As a result I have system stats for all VMs and continuous application stats, instead of intermittent application stats spread across three VMs.