Zabbix-agent unreachable

monitoring, zabbix, zabbix-agent

One of the servers monitored by Zabbix is reported as unreachable. I have no idea why, since monitoring works normally for the other servers.

  • The zabbix-agent service on the monitored server is running.
  • We have several servers, all monitored by zabbix. In /etc/zabbix/zabbix_agentd.conf I see no difference between this problematic server and another one that works normally.
  • Both the zabbix server and the monitored server (agent-server) are hosted by Amazon.
  • All zabbix monitored servers are linked to a security group with two inbound rules, for ports 10050 and 10051, that allow the zabbix-server IP. So incoming requests from the zabbix-server to the zabbix-agents on these servers should be allowed. This works on several servers, but not on this one.
  • The zabbix-server has a different security group, and no rules set for ports 10050 and 10051, so they should be blocked. Iptables returns no rules.
  • I can open a telnet session from the zabbix-server to the agent. It disconnects automatically, but it connects. So I guess the firewall is not the problem.
  • Server: Amazon Linux (CentOS-like)
  • Installed file: http://repo.zabbix.com/zabbix/2.2/rhel/6/x86_64/zabbix-release-2.2-1.el6.noarch.rpm
  • SELinux is disabled on all these agents and on the server.
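
The telnet test in the list above only shows that a TCP connection can be opened. A minimal sketch of the same check in Python, which can be run from the zabbix-server against port 10050 on each agent (the function name `can_connect` and the 3-second timeout are my own choices, not anything Zabbix provides):

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    This only proves that the handshake completes (security group and
    firewall allow the traffic, something is listening). It does not
    prove that the agent will answer checks from this address.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling `can_connect("agent-ip", 10050)` from the server and `can_connect("zabbix-server-ip", 10051)` from the agent covers the passive and active directions respectively; both addresses are placeholders.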

Agent log after restart of zabbix-agent service

 10939:20151127:093938.268 Starting Zabbix Agent [agent-server.test]. Zabbix 2.2.11 (revision 56693).
 10939:20151127:093938.268 using configuration file: /etc/zabbix/zabbix_agentd.conf
 10942:20151127:093938.269 agent #1 started [listener #1]
 10945:20151127:093938.269 agent #4 started [active checks #1]
 10941:20151127:093938.270 agent #0 started [collector]
 10944:20151127:093938.270 agent #3 started [listener #3]
 10943:20151127:093938.271 agent #2 started [listener #2]
 10945:20151127:141742.930 active check configuration update from [zabbix-server-ip:10051] started to fail 
 (cannot connect to [[zabbix-server-ip]:10051]: [4] Interrupted system call)

When I telnet to the agent-server, then enter agent.version, it returns: ZBXD2.2.11
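
The "ZBXD" prefix in that reply is the Zabbix protocol header: the literal bytes `ZBXD`, one flag byte, and an 8-byte little-endian length, followed by the value. Telnet shows the printable parts only. As a rough sketch of the same passive check done programmatically (function names are hypothetical, and the plain-text request with a trailing newline is assumed to be accepted because the telnet test above behaves that way):

```python
import socket
import struct

HEADER = b"ZBXD\x01"  # protocol signature plus flag byte


def parse_response(raw: bytes) -> str:
    """Extract the value from a raw agent reply."""
    if not raw.startswith(HEADER):
        # No header: treat the reply as plain text. An empty reply
        # usually means the agent closed the connection without answering.
        return raw.decode("utf-8", errors="replace").rstrip("\n")
    if len(raw) < 13:
        raise ValueError("truncated Zabbix header")
    # Bytes 5..12 hold the payload length as little-endian uint64.
    (length,) = struct.unpack("<Q", raw[5:13])
    return raw[13:13 + length].decode("utf-8", errors="replace")


def query_agent(host: str, key: str, port: int = 10050,
                timeout: float = 3.0) -> str:
    """Send one item key to a passive agent and return its value."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(key.encode("utf-8") + b"\n")
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # agent closes the connection after replying
                break
            chunks.append(chunk)
    return parse_response(b"".join(chunks))
```

Running `query_agent("agent-ip", "agent.version")` from the zabbix-server (with the placeholder replaced by the real address) exercises the same path the server uses for passive checks, so it also depends on the agent's `Server=` setting allowing the caller.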

Contents of /etc/zabbix/zabbix_server.conf (server):

ListenPort=10051
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
PidFile=/var/run/zabbix/zabbix_server.pid
DBName=zabbix
DBUser=zabbix
DBPassword=******
DBSocket=/var/lib/mysql/mysql.sock
SNMPTrapperFile=/var/log/snmptt/snmptt.log
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts

Contents of /etc/zabbix/zabbix_agentd.conf (agent)

PidFile=/var/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
EnableRemoteCommands=1
Server=zabbix-server-ip
ListenPort=10050
StartAgents=3
# ServerActive=zabbix-server-ip # commented out
Hostname=server.test
Timeout=3
AllowRoot=1
Include=/etc/zabbix/zabbix_agentd.d/

Netstat on zabbix server

$ sudo netstat -lpn | grep zabbix
tcp        0      0 0.0.0.0:10051               0.0.0.0:*                   LISTEN      7624/zabbix_server  
tcp        0      0 :::10051                    :::*                        LISTEN      7624/zabbix_server

Netstat on problematic agent

$ sudo netstat -lpn | grep zabbix
tcp        0      0 0.0.0.0:10050               0.0.0.0:*                   LISTEN      3248/zabbix_agentd  
tcp        0      0 :::10050                    :::*                        LISTEN      3248/zabbix_agentd 

Netstat on working agent

$ sudo netstat -lpn | grep zabbix
tcp        0      0 0.0.0.0:10050               0.0.0.0:*                   LISTEN      24242/zabbix_agentd 
tcp        0      0 :::10050                    :::*                        LISTEN      24242/zabbix_agentd

Active vs passive agent

  • I've opened port 10051 on the server for the problematic agent IP.
  • Telnet from the agent to the server confirms that this works.
  • I've enabled the ServerActive option with the zabbix-server-ip as its value. The error message is gone from the log after restarting the agent.
  • The problem is still there…

Next try:

  • I did the same for a working agent; I can telnet from the agent to the server.
  • ServerActive is set to the zabbix-server-ip, and the agent is restarted.
  • StartAgents is set to 0, to force use of the active agent.
  • Zabbix reports that this server is unreachable…
  • Then I reset it to passive.

All in all, although active mode may have been set in the agent config on several servers, it has never worked. All data comes from passive agents.

Agent Interfaces

  • In Monitoring > Latest data, with host=all selected, I click the server name and choose Host Inventory.
  • The working agent displays its own IP address.
  • The problematic agent displays the zabbix-server-ip.

I don't know why this happens, but it seems strange.

What can cause this connection problem? How can I reconnect the server with the agent?

Solution

It turns out that the IP address set in the host configuration (via the web interface) was that of the zabbix-server itself. This should of course be the address of the agent-server.
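
To catch the same mistake on other hosts, the interface records can be scanned for any host that points at the server's own address. The sketch below is only an illustration: it assumes the records are dictionaries with `hostid` and `ip` string fields, as I would expect from an export or from the Zabbix API's `hostinterface.get` method, and the function name and the `allowed_hostids` parameter are hypothetical.

```python
def find_misassigned_interfaces(interfaces, server_ip, allowed_hostids=()):
    """Return interface records whose IP equals the zabbix-server's IP.

    interfaces      -- iterable of dicts with at least "hostid" and "ip"
    server_ip       -- address of the zabbix-server itself
    allowed_hostids -- hosts that may legitimately use server_ip,
                       e.g. the host entry for the server itself
    """
    allowed = set(allowed_hostids)
    flagged = []
    for iface in interfaces:
        if iface.get("hostid") in allowed:
            continue
        if iface.get("ip") == server_ip:
            # This host would be polled on the server's own address,
            # which is the misconfiguration described above.
            flagged.append(iface)
    return flagged
```

Any record it returns is worth checking under the host's agent interface settings in the web interface.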

Best Answer

What are the current SELinux and iptables settings on the agent box? Can you telnet from the agent to the server on port 10051?

You can check the connectivity between the boxes by running tcpdump on the agent: tcpdump -i your_interface tcp port 10050. This shows the incoming and outgoing packets on that port.