Zabbix Trigger Hysteresis Not Returning to Normal – Troubleshooting Guide

zabbix

I want to prevent agent connectivity alerts from flapping (the connection must be solid for 20 minutes before the trigger is marked as OK).

After correcting the syntax as suggested in "Zabbix Trigger Hysteresis – Incorrect trigger expression", I have the following:

({TRIGGER.VALUE}=0 and {Template App Zabbix Agent:agent.ping.nodata(5m)}=1) or ({TRIGGER.VALUE}=1 and {Template App Zabbix Agent:agent.ping.min(20m)}=1)

Whilst it fires initially when there is no data, it never recovers. I am using v3.0.9 at the moment, so I am trying to work with the above for now.

I have checked the target's latest data and can see that agent.ping has indeed been 1 for more than 20 minutes.

Any ideas what I have done wrong, please?

Best Answer

I've done some experiments:

The agent.ping item writes a "1" value if the agent is reachable and writes nothing if it is unreachable, so even if your agent is unreachable for 2 hours, the last value is still 1. This means that .min(), .avg(), etc. always operate on a list of "1" values.
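To illustrate why .min() cannot detect an outage, here is a minimal Python sketch. It is a toy model and not Zabbix code: the history list and the window_min helper are hypothetical names made up for this example, and a 60-second polling interval is assumed.

```python
# Toy model (assumption): history is a list of (timestamp_seconds, value)
# samples, and an unreachable agent simply produces no sample at all.
def window_min(history, now, window):
    """Minimum of the values stored in (now - window, now], or None if empty."""
    values = [v for t, v in history if now - window < t <= now]
    return min(values) if values else None

# Reachable for minutes 0..9, unreachable for 10..14, reachable again 15..29.
history = [(m * 60, 1) for m in range(0, 10)] + [(m * 60, 1) for m in range(15, 30)]

now = 29 * 60
result = window_min(history, now, 20 * 60)
print(result)  # 1, because only "1" values are ever stored
```

Under this model the outage only shows up as missing samples, never as a lower value, so a condition such as min(20m)=1 stays true once any data is present in the window.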

The .nodata() function does not help with flapping either: it returns "1" only if no data has been received for the entire time interval, and "0" otherwise.

For instance, .nodata(20m) on a 60sec item will return:

  • 1: if no data received for the entire 20m time range (20 empty values)
  • 0: if everything's ok (20 full values)
  • 0: for everything in between (e.g. 5 OK values, 5 minutes of unreachability, 10 OK values)
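The three cases above can be reproduced with a simplified model of .nodata(). This is only a sketch of the behaviour I observed, not the real Zabbix implementation; the nodata helper and the sample lists are hypothetical.

```python
# Simplified model (assumption): nodata is 1 only when no sample has a
# timestamp inside (now - period, now]; otherwise it is 0.
def nodata(history, now, period):
    return 0 if any(now - period < t <= now for t, _ in history) else 1

MIN = 60
now = 30 * MIN

# Last sample 20 minutes ago, nothing since.
silent = [(m * MIN, 1) for m in range(1, 11)]
# One sample per minute for the whole 20-minute window.
healthy = [(m * MIN, 1) for m in range(11, 31)]
# 5 OK samples, 5 minutes without data, then 10 OK samples.
mixed = [(m * MIN, 1) for m in range(11, 31) if not 16 <= m <= 20]

results = [nodata(h, now, 20 * MIN) for h in (silent, healthy, mixed)]
print(results)  # [1, 0, 0]
```

The mixed case returns 0 just like the healthy one, which is why .nodata(20m) alone cannot be used as the recovery condition.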

I have found a workaround, assuming that you check agent reachability every 60 seconds:

({TRIGGER.VALUE}=0 and {Template App Zabbix Agent:agent.ping.nodata(5m)}=1) or ({TRIGGER.VALUE}=1 and {Template App Zabbix Agent:agent.ping.count(20m,1)}<20)

The expression will trigger after 5 minutes of unreachability and recover only once there are 20 "1" values in the last 20 minutes.
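As a rough check of this logic, the expression can be simulated in Python. This is a sketch under several assumptions: one sample exactly every 60 seconds, the trigger re-evaluated once per minute (real Zabbix evaluates on new data and on a timer), and hypothetical helper names nodata, count_eq and evaluate.

```python
def nodata(history, now, period):
    """1 if no sample falls in (now - period, now], else 0 (simplified model)."""
    return 0 if any(now - period < t <= now for t, _ in history) else 1

def count_eq(history, now, period, value):
    """Number of samples equal to value in (now - period, now]."""
    return sum(1 for t, v in history if now - period < t <= now and v == value)

def evaluate(prev_state, history, now):
    """Mirror of the trigger expression; prev_state plays the role of {TRIGGER.VALUE}."""
    fire = prev_state == 0 and nodata(history, now, 5 * 60) == 1
    hold = prev_state == 1 and count_eq(history, now, 20 * 60, 1) < 20
    return 1 if (fire or hold) else 0

# Agent reachable every minute except minutes 11..20 (a 10-minute outage).
reachable = [m for m in range(1, 61) if not 11 <= m <= 20]
samples = [(m * 60, 1) for m in reachable]

state = 0
transitions = []
for minute in range(1, 61):
    now = minute * 60
    seen = [s for s in samples if s[0] <= now]  # only data received so far
    new_state = evaluate(state, seen, now)
    if new_state != state:
        transitions.append((minute, new_state))
    state = new_state

print(transitions)  # [(15, 1), (40, 0)]
```

In this model the trigger fires at minute 15 (5 minutes after the last sample at minute 10) and recovers at minute 40, the first moment the 20-minute window holds 20 consecutive "1" values. If polling jitter leaves only 19 values in the window, the recovery condition would not be met, so the threshold of 20 may need adjusting for your setup.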

Not too elegant, but it works.