Zabbix: adding items makes agents unavailable

zabbix

Zabbix version: 3.0.3 (zabbix-server-mysql)

OS: Ubuntu 14.04 Trusty

Number of hosts (enabled/disabled/templates): 28 / 0 / 57

Number of items (enabled/disabled/not supported): 1349 / 161 / 47

Number of triggers (enabled/disabled): 902 / 39

Required server performance, new values per second: 22.86

Zabbix server config:


StartPollers=5
StartPollersUnreachable=2
StartTrappers=5
StartDiscoverers=3
StartHTTPPollers=5

I have a template with 3 items like this: net.tcp.port[<IP>,3128]. The template is applied to 10 servers.

Here is the problem: when I enable these items, events like "zabbix-agent on <hostname> is not available for 2 minutes" start to appear randomly on the 10 hosts where the template is applied. On the "Zabbix Server Performance" graph, the values of zabbix[wcache,values] drop from 19-19.5 to 16-17. The values of zabbix[queue] stay at 0 as before.

When I disable the items, the problem disappears.

The Zabbix server is not overloaded by I/O or CPU, and there is plenty of free memory, so it doesn't look like a hardware performance issue. The Zabbix agents on the hosts are reachable; I checked with nc -vz <hostname> 10050.

Nothing abnormal appears in the server log or in the agent logs on these 10 hosts.

I tried increasing ulimit -n for the Zabbix server process. The increase took effect: cat /proc/<zabbix_worker_pid>/limits now shows Max open files 10240 10240 files. It didn't help.

I tried increasing StartPollers to 10 and then to 15; that didn't help either.

What is happening to the server?

UPD:

Item type: Zabbix agent

All systems are running Linux (Ubuntu 14.04 Trusty).

Agents on the hosts run 3 listeners, 1 collector and 1 active checks process.

On 7 of these 10 hosts, zabbix_get -s <host> -k net.tcp.port[<IP>,3128] returns instantly for all 3 items. On the other 3 hosts it takes about 3 seconds and returns 0 (the monitored IPs are not reachable from those 3 hosts).
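To see from a given host how long a connect attempt to the monitored address actually takes, a small timing helper can be useful. This is only a rough sketch in Python: the function name timed_tcp_check and the 3-second default are my own choices (the default mirrors the 3-second timeout mentioned in this thread), and it does not reproduce how the agent implements net.tcp.port internally.

```python
import socket
import time


def timed_tcp_check(host, port, timeout=3.0):
    """Try a TCP connect and return (result, elapsed_seconds).

    result is 1 if the connection was established and 0 otherwise,
    which is the same 1/0 convention that net.tcp.port reports.
    """
    start = time.monotonic()
    try:
        # create_connection raises an OSError subclass on refusal,
        # unreachable network, or timeout
        with socket.create_connection((host, port), timeout=timeout):
            result = 1
    except OSError:
        result = 0
    return result, time.monotonic() - start


# Example call (the address is a placeholder, not taken from the post):
# result, elapsed = timed_tcp_check("192.0.2.10", 3128)
# print(result, round(elapsed, 2))
```

If the elapsed time for a failing target is close to the full timeout, the check is blocking for as long as the agent is allowed to wait, which matches the roughly 3 seconds seen with zabbix_get on the 3 hosts.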

Best Answer

Finally:

If:

  • Timeout on both agent and server is the same (default: Timeout=3)
  • there is an item net.tcp.port[<IP>,<port>] and a trigger using it
  • the pair [<IP>,<port>] is unreachable, so the connection attempt ends in a TCP timeout

Then:

The trigger "Zabbix-agent on {HOST.NAME} is unavailable" (trigger expression: {agent.ping.nodata(2m)} = 1) starts firing on hosts with this item. It is not the trigger for the specific item that fires, but the trigger for agent availability. This is a bug, but the Zabbix developers do not seem to agree:

https://support.zabbix.com/browse/ZBX-10868

Zabbix version 3.0.3 for both server and agent.

Possible workarounds:

  • set Timeout in zabbix_server.conf higher than Timeout in zabbix_agentd.conf
  • use a UserParameter like this: UserParameter=tcp_connect_check[*], /bin/nc -z "$1" "$2" -w "$3"; echo $? and create items whose connect timeout is lower than Timeout in zabbix_agentd.conf. To avoid security problems, do not enable UnsafeUserParameters in zabbix_agentd.conf
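If nc is not available on the hosts, the second workaround could also be done with a small script. The following is only a sketch under my own assumptions: the script name, the AGENT_TIMEOUT constant (which you would set to match Timeout in your zabbix_agentd.conf) and the refusal to run with too large a timeout are all hypothetical additions, not something Zabbix provides. It prints 0 on success and 1 on failure, the same convention as the nc ...; echo $? variant above.

```python
#!/usr/bin/env python3
import socket
import sys

# Assumption: keep this equal to Timeout in zabbix_agentd.conf
AGENT_TIMEOUT = 3.0


def tcp_connect_check(host, port, connect_timeout):
    """Return 0 if the TCP connect succeeds, 1 otherwise.

    Raises ValueError when connect_timeout is not strictly below
    AGENT_TIMEOUT, because the check could then run as long as the
    agent itself is allowed to wait.
    """
    connect_timeout = float(connect_timeout)
    if connect_timeout >= AGENT_TIMEOUT:
        raise ValueError("connect timeout must be lower than the agent Timeout")
    try:
        with socket.create_connection((host, int(port)), timeout=connect_timeout):
            return 0
    except OSError:
        # refused, unreachable, or timed out
        return 1


# Run only when called as: script <host> <port> <timeout>
if __name__ == "__main__" and len(sys.argv) == 4:
    print(tcp_connect_check(sys.argv[1], sys.argv[2], sys.argv[3]))
```

Saved under a path of your choice (for example a hypothetical /usr/local/bin/tcp_connect_check.py), it would be referenced from the UserParameter line in place of the nc command, with the item key passing host, port and a timeout such as 2.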