Check_ntp_time has been failing on all my hosts sporadically. I usually receive
CRITICAL - Socket timeout after 10 seconds
And a couple of minutes later check_ntp_time succeeds and shows the correct offset
NTP OK: Offset 0.0001899003983 secs
I've tried raising the check_ntp_time command timeout to 20 seconds but it fails at the same rate. I've tried removing nopeer and noquery from the ntp.conf to no avail (which makes sense because it would fail 100% of the time if that was the issue). The fact that it is failing at random times and succeeding right after is really throwing me off. It's worth noting also that it doesn't fail for all hosts at the same time, it usually fails between 1 to 3 hosts at a time. Any idea what could be causing this?
My check_ntp_time
command looks like this:
define command{
command_name check_ntp_time
command_line $USER1$/check_ntp_time -H pool.ntp.org -t 20 -w 1 -c 3
}
EDIT:
Metric Min. Max. Average
Check Execution Time: 0.00 sec 20.00 sec 1.153 sec
Check Latency: 0.00 sec 0.00 sec 0.000 sec
Percent State Change: 0.00% 31.84% 0.86%
2 checks per second (0.5 per CPU)
Best Answer
It's because the check tries to connect over IPv6 a half of timeout specified by '-t' and then it falls back to IPv4. So you can decrease the timeout to 10 seconds and you should get a response in 5 seconds:
Or you can use IPv4 only by '-4', then you will get a response in < 1 sec: