How to configure Zabbix to discover containers dynamically and monitor them across servers that run agents

zabbix, zabbix-agent

I am struggling with discovering and monitoring dynamic (i.e. moving) containers across multiple physical servers, and with associating the results with the container service rather than with the server whose agent happens to be running the container.

I have 2 servers, A and B, and one container, C. C can run on either A or B; my orchestration engine (Swarm, Kubernetes, Rancher, etc.) is responsible for ensuring it runs in at least one place.

I have a Zabbix agent on A and B, so I can monitor CPU, filesystem, memory, all the usual stuff.

I want to monitor 2 things:

  1. Availability of C. I don't care where, but I want to know it is running.
  2. Checks of processes within C. I have a script that checks status.

How do I configure Zabbix and the agents so it reports on both the state of C and its processes, independent of where it is running?

My assumptions are:

  1. I treat C as a host, rather than A or B.
  2. I assign discovery of processes within C and their state as Items of C.

My questions are:

  1. How do I set C to be auto-discovered as a Host from A or B? I can use my own script or https://github.com/monitoringartist/Zabbix-Docker-Monitoring with filters to pick up only the containers I care about.
  2. How do I set the process check to run on both A and B agents?

In short, how do I set up container auto-discovery as hosts, plus a process check for each discovered container, so that both run on every server with an agent in a given group/pattern, but the output is associated with the C container rather than with the A or B server on which the agent is running?

EDIT: Thanks to the first responder, I get the "meta-host" idea. But that creates new issues:

  1. How do I create a C "meta-host" if I have to add an IP when I create it? It could be the IP of A or B!
  2. How do I automatically create C by discovery, based on rules?
  3. Since Zabbix reaches out to each of A and B to say, "run these tests", how does it know whether to ask A to run them or B to run them?

This would be much easier if I could just say, "run discovery on all hosts in the 'Docker' group", which would discover all C (and D, etc.) containers and add them as hosts. And then also say, "run processes check script on all containers that were auto-discovered", perhaps by knowing which agent currently has access to the container (i.e. where it is running now).

I am increasingly getting the feeling that Zabbix is great for monitoring applications that are tied to a particular server, and less suited to monitoring apps that move around. Or am I misunderstanding it?

Best Answer

Disclaimer: I'm the author of https://github.com/monitoringartist/Zabbix-Docker-Monitoring

Set up standard Zabbix-Docker-Monitoring on A and B.

Edit the Docker template: filter the discovered containers as you need, and remove any trigger prototypes.

Create new calculated items that aggregate each C-related item from A and B into a corresponding item for C (you can create a C "metahost" in Zabbix to hold them), then set up new triggers on top of these new C metrics.

Update: Use calculated items for the aggregation. For example, sum docker.up[cid] from A and B; the trigger condition for "Container cid is not running" is then sum(docker.up[cid])<1. Please read the Zabbix documentation for the correct syntax.
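
To make the aggregation idea concrete, here is a minimal Python sketch that builds a calculated-item formula string for a list of agent hosts and mirrors the "sum < 1" trigger condition. The helper names are hypothetical, and the host names A and B and the key docker.up[cid] are the placeholders used above. The two formula styles are my understanding of the calculated-item syntax in Zabbix 5.4+ and in older versions respectively; verify them against the documentation for your version.

```python
def build_calculated_formula(hosts, key, new_syntax=True):
    """Return a formula that sums the last value of `key` across `hosts`.

    Hypothetical helper for illustration only. Assumes `key` contains no
    characters that would need quoting or escaping in a Zabbix expression.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    if new_syntax:
        # Assumed Zabbix 5.4+ style: last(/host/key)
        terms = [f"last(/{host}/{key})" for host in hosts]
    else:
        # Assumed pre-5.4 style: last("host:key")
        terms = [f'last("{host}:{key}")' for host in hosts]
    return "+".join(terms)


def container_is_down(up_values):
    """Mirror the trigger condition: the container is down when the sum
    of docker.up values reported by all agents is below 1."""
    return sum(up_values) < 1


if __name__ == "__main__":
    print(build_calculated_formula(["A", "B"], "docker.up[cid]"))
    print(build_calculated_formula(["A", "B"], "docker.up[cid]", new_syntax=False))
    print(container_is_down([0, 1]))  # container currently runs on B
```

The resulting string would go into the formula field of a calculated item on the C metahost. Because the formula names A and B explicitly, the metahost never has to know which server currently runs the container.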

LLD discovers where your container is currently running and updates the items and triggers accordingly. To eliminate false alerts, remember to tune the LLD and trigger timings.
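
One way to think about the timing advice is as a debounce: alert only when the aggregated value has stayed below 1 for several consecutive samples, so a brief gap while the container moves from A to B does not fire. The sketch below is an illustration under that assumption, not the module's behaviour; the function name and the window size of 3 are hypothetical. A trigger using a function such as max over the last few values would play a comparable role in Zabbix.

```python
def should_alert(summed_samples, window=3):
    """Return True only if the last `window` aggregated samples are all < 1.

    `summed_samples` is the history of the aggregated docker.up value
    (oldest first). With fewer than `window` samples there is not enough
    evidence yet, so no alert is raised.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(summed_samples) < window:
        return False
    return all(value < 1 for value in summed_samples[-window:])


if __name__ == "__main__":
    # Short gap during a move from A to B: no alert.
    print(should_alert([1, 1, 0, 0, 1]))
    # Container has been gone for three consecutive samples: alert.
    print(should_alert([1, 0, 0, 0]))
```

The window should be long enough to cover the LLD interval plus the time the orchestrator needs to reschedule the container; otherwise the gap between the old items disappearing and the new ones appearing can look like an outage.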