Nagios’ check_ntp_time sporadic socket timeout


check_ntp_time has been failing sporadically on all my hosts. Usually I receive:

CRITICAL - Socket timeout after 10 seconds

and a couple of minutes later the same check succeeds and shows the correct offset:

NTP OK: Offset 0.0001899003983 secs

I've tried raising the check_ntp_time timeout (-t) to 20 seconds, but it fails at the same rate. I've also tried removing nopeer and noquery from ntp.conf, to no avail (which makes sense: if that were the issue, it would fail 100% of the time). What really throws me off is that it fails at random times and then succeeds right afterwards. It's also worth noting that it doesn't fail on all hosts at the same time; usually 1 to 3 hosts fail at once. Any idea what could be causing this?

My check_ntp_time command looks like this:

define command{
    command_name    check_ntp_time
    command_line    $USER1$/check_ntp_time -H pool.ntp.org -t 20 -w 1 -c 3
    }

EDIT:
Metric                  Min.       Max.        Average
Check Execution Time:   0.00 sec   20.00 sec   1.153 sec
Check Latency:          0.00 sec   0.00 sec    0.000 sec
Percent State Change:   0.00%      31.84%      0.86%

2 checks per second (0.5 per CPU)

Best Answer

It's because the check first tries to connect over IPv6 for half of the timeout specified by '-t', and only then falls back to IPv4. So if you decrease the timeout to 10 seconds, you should get a response in about 5 seconds:

[root@server ~]# time /usr/lib64/nagios/plugins/check_ntp_time -q -H time1.google.com -w 1 -c 2 -t 10
NTP OK: Offset 0.0004314184189 secs|offset=0.000431s;1.000000;2.000000;

real    0m5.767s
user    0m0.843s
sys     0m4.908s
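The timing in the transcript above can be reproduced with a little arithmetic. The Python below is a rough model of the fallback as this answer describes it (IPv6 gets the first half of '-t', IPv4 the remainder). It is an assumption for illustration, not code taken from check_ntp_time, and modelled_response_time is a hypothetical name. The 0.4 s round trip is borrowed from the IPv4-only run shown below.

```python
from typing import Optional


def modelled_response_time(timeout: float, ipv6_answers: bool, rtt: float) -> Optional[float]:
    """Rough model of the IPv6-then-IPv4 fallback described in the answer.

    Assumption: this mirrors the answer's description, not the plugin source.
    timeout      -- value passed with -t, in seconds
    ipv6_answers -- whether the server replies over IPv6 from this host
    rtt          -- round-trip time of one successful NTP exchange, in seconds
    Returns the approximate time until a reply, or None for "Socket timeout".
    """
    ipv6_budget = timeout / 2              # first half of -t is spent on IPv6
    ipv4_budget = timeout - ipv6_budget    # the remainder is left for IPv4
    if ipv6_answers and rtt <= ipv6_budget:
        return rtt                         # IPv6 works, no fallback needed
    if rtt <= ipv4_budget:
        return ipv6_budget + rtt           # IPv4 only starts once the IPv6 half expires
    return None                            # neither half got a reply in time


for t in (10, 20):
    print(f"-t {t}: reply after ~{modelled_response_time(t, False, 0.4):.1f} s")
```

With -t 10 the model predicts about 5.4 s, close to the 5.767 s real time measured above; with -t 20 the predicted reply moves to about 10.4 s, because in this model the IPv4 attempt only starts after the IPv6 half has expired.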

Alternatively, restrict the check to IPv4 with '-4', and you will get a response in under a second:

[root@server ~]# time /usr/lib64/nagios/plugins/check_ntp_time -q -H time1.google.com -4 -w 1 -c 2 -t 10
NTP OK: Offset 0.0006598234177 secs|offset=0.000660s;1.000000;2.000000;

real    0m0.401s
user    0m0.003s
sys     0m0.007s
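At the socket level, '-4' amounts to resolving the server name to IPv4 addresses only, so the whole timeout goes to a single IPv4 exchange. The Python below is a minimal sketch of that idea, assuming a plain SNTP client request (mode 3) and the standard NTP offset formula. It is not check_ntp_time's implementation; ipv4_ntp_offset and the other function names are hypothetical, and it skips the sanity checks (stratum, leap indicator, multiple samples) a real monitoring check would do.

```python
import socket
import struct
import time

NTP_TO_UNIX = 2208988800  # seconds from 1900-01-01 (NTP era 0) to 1970-01-01


def ntp_timestamp_to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (32.32 fixed point) to Unix time."""
    return seconds - NTP_TO_UNIX + fraction / 2**32


def clock_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """Standard NTP offset estimate: ((t2 - t1) + (t3 - t4)) / 2."""
    return ((t2 - t1) + (t3 - t4)) / 2


def ipv4_ntp_offset(host: str, port: int = 123, timeout: float = 10.0) -> float:
    """Hypothetical helper: query one server over IPv4 only and return the offset in seconds."""
    # Restricting getaddrinfo to AF_INET means no IPv6 address is ever tried.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, socket.AF_INET, socket.SOCK_DGRAM)[0]
    request = bytearray(48)
    request[0] = 0x1B  # LI=0, version 3, mode 3 (client)
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)  # the whole budget goes to this one IPv4 exchange
        t1 = time.time()
        sock.sendto(bytes(request), addr)
        reply, _ = sock.recvfrom(1024)
        t4 = time.time()
    if len(reply) < 48:
        raise ValueError("short NTP reply")
    # Bytes 32-39: server receive timestamp; bytes 40-47: server transmit timestamp.
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!IIII", reply[32:48])
    t2 = ntp_timestamp_to_unix(rx_s, rx_f)
    t3 = ntp_timestamp_to_unix(tx_s, tx_f)
    return clock_offset(t1, t2, t3, t4)
```

Called as ipv4_ntp_offset("time1.google.com") on a host with outbound UDP port 123 open, it returns the offset in seconds, or raises socket.timeout (an alias of TimeoutError on Python 3.10+) if no reply arrives within the timeout.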