How should I manage and troubleshoot NTP issues?

Tags: ntp, ntpd, ntpdate

For some time now I've been fighting with NTP issues in my company's network, and I'm having a hard time understanding how the commands interact with the service.
For example:
In the server's /etc/ntp.conf there's a line:

server IP_of_internal_ntp_server

But when I type ntpq -p it shows me a different server's IP.
In addition, over time I've learned that the way to re-sync a server's time with the NTP server is this:

service ntpd stop && ntpdate ntp_server && service ntpd start
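The one-liner above chains the steps with &&, so if ntpdate fails (for example because the server does not answer), the final service ntpd start never runs and the host is left with no daemon at all. A minimal Python sketch of the same sequence that always restarts ntpd, assuming the SysV-style service command used in this thread; resync_once and its pluggable run argument are hypothetical names introduced for illustration:

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# A runner executes one command; the default shells out for real.
Runner = Callable[[Sequence[str]], None]


def _default_run(cmd: Sequence[str]) -> None:
    # check=True raises CalledProcessError on a non-zero exit status
    subprocess.run(list(cmd), check=True)


def resync_once(server: str, run: Optional[Runner] = None) -> List[List[str]]:
    """Stop ntpd, step the clock with ntpdate, then always restart ntpd.

    Returns the commands attempted, in order. If ntpdate fails, ntpd is
    still restarted and the original error is re-raised afterwards.
    """
    run = run or _default_run
    attempted: List[List[str]] = []

    def step(cmd: List[str]) -> None:
        attempted.append(cmd)
        run(cmd)

    step(["service", "ntpd", "stop"])
    try:
        step(["ntpdate", server])
    finally:
        # Unlike `a && b && c`, this restart runs even if ntpdate failed
        step(["service", "ntpd", "start"])
    return attempted
```

Passing a fake runner makes it possible to dry-run the sequence without touching the clock.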

My questions are:

Do the ntpd daemon and the ntpdate command work together? If so, why do I have to stop the ntpd daemon in order to sync the time with ntpdate?

Is the output of ntpq -p affected by the /etc/ntp.conf file?

On some servers a Nagios NTP check returns "NTP OK: Offset unknown", while on all the other servers I get a proper response, even though all servers are configured exactly the same. Why is that?

Thanks in advance, Itai

Edit #1:
/etc/ntp.conf:

driftfile /var/lib/ntp/drift
fudge   127.127.1.0 stratum 10  
keys /etc/ntp/keys
restrict 0.centos.pool.ntp.org mask 255.255.255.255 nomodify notrap noquery
restrict 127.0.0.1 
restrict 1.centos.pool.ntp.org mask 255.255.255.255 nomodify notrap noquery
restrict 2.centos.pool.ntp.org mask 255.255.255.255 nomodify notrap noquery
restrict -6 ::1
restrict default kod nomodify notrap nopeer noquery
server 127.127.1.0
server 130.117.52.203
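ntpq -p does not read /etc/ntp.conf itself; it queries the running ntpd, which normally reads its configuration when it starts, so edits to the file only show up after the daemon is restarted. When comparing the two, it helps to list what the file actually asks for. A small Python sketch (parse_servers is a hypothetical helper, and it only looks at server lines) that also flags 127.127.1.x, the pseudo-address of the undisciplined local clock driver:

```python
from typing import List, Tuple


def parse_servers(conf_text: str) -> List[Tuple[str, bool]]:
    """Return (address, is_local_clock) for every `server` line.

    Addresses in 127.127.0.0/16 are reference-clock pseudo-addresses;
    127.127.1.x selects the undisciplined local clock driver.
    """
    servers: List[Tuple[str, bool]] = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "server":
            addr = fields[1]
            servers.append((addr, addr.startswith("127.127.1.")))
    return servers
```

Run against the configuration above, this reports two servers, one of which is the local clock.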

Output of ntpq -p:

[root@nyproxy15 ~]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 38.74.128.71    .INIT.          16 u    -   64    0    0.000    0.000   0.000
*LOCAL(0)        .LOCL.          10 l   45   64  377    0.000    0.000   0.001
[root@nyproxy15 ~]#

Please ignore the stratum 16, I know it needs to be fixed.
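Reading the table above: the first character of each row is the tally code, and * marks the peer ntpd has selected to synchronise to. Here that is LOCAL(0), so the machine is following its own clock, while 38.74.128.71 shows .INIT., stratum 16 and reach 0, meaning it has not answered a single poll. The reach column is an 8-bit shift register printed in octal, so 377 means the last eight polls all succeeded. A Python sketch of a parser for this layout, assuming the default column order shown here (parse_ntpq_peers, system_peer and successful_polls are hypothetical helpers):

```python
from typing import Dict, List, Optional


def parse_ntpq_peers(output: str) -> List[Dict[str, object]]:
    """Parse the peer table printed by `ntpq -p` (default columns)."""
    peers: List[Dict[str, object]] = []
    in_table = False
    for line in output.splitlines():
        if line.startswith("===="):
            in_table = True  # rows start after the ===== separator
            continue
        if not in_table or not line.strip():
            continue
        tally = line[0]  # ' ', '*', '+', '-', 'x', '#', 'o' or '.'
        fields = line[1:].split()
        if len(fields) < 10:
            continue  # not a peer row, e.g. a trailing shell prompt
        remote, refid, st, _t, _when, _poll, reach, _delay, offset, _jit = fields[:10]
        peers.append({
            "tally": tally,
            "remote": remote,
            "refid": refid,
            "stratum": int(st),
            "reach": int(reach, 8),  # octal shift register of recent polls
            "offset_ms": float(offset),
        })
    return peers


def system_peer(peers: List[Dict[str, object]]) -> Optional[Dict[str, object]]:
    """Return the row ntpd is synchronised to (tally '*'), if any."""
    for p in peers:
        if p["tally"] == "*":
            return p
    return None


def successful_polls(reach: int) -> int:
    """Count how many of the last eight polls got an answer."""
    return bin(reach & 0xFF).count("1")
```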

Edit #2:
I've edited /etc/ntp.conf and commented out the lines you mentioned.

[root@nyproxy15 ~]# service ntpd stop ; ntpdate 130.117.52.203 ; service ntpd start
Shutting down ntpd:                                        [  OK  ]
30 Sep 08:16:30 ntpdate[31192]: adjust time server 130.117.52.203 offset -0.078324 sec
ntpd: Synchronizing with time server:                      [  OK  ]
Starting ntpd:                                             [  OK  ]
[root@nyproxy15 ~]# ntpq -p
localhost.localdomain: timed out, nothing received
***Request timed out
[root@nyproxy15 ~]# ps -ef |grep ntp
root     31210     1  0 08:16 ?        00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid

Edit #3:

It seems that now, after a few minutes, ntpq -p returns the correct response:

[root@nyproxy15 ~]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*130.117.52.203  46.4.54.78       3 u    9   64  377   80.633   -9.950   1.420
[root@nyproxy15 ~]#
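The thread does not say why Nagios reports "Offset unknown" on some hosts. One plausible cause, offered here only as an assumption, is that the host has no usable system peer (no * row), or is following LOCAL(0) as in Edit #1. The sketch below is a hypothetical Nagios-style check written against ntpq -p output, not the logic of the actual Nagios plugin; the thresholds are made-up example values in milliseconds, which is the unit ntpq uses for offset:

```python
from typing import Optional, Tuple


def sys_peer_offset(ntpq_output: str) -> Optional[Tuple[str, str, float]]:
    """Return (remote, refid, offset_ms) of the '*' row, or None."""
    for line in ntpq_output.splitlines():
        if not line.startswith("*"):
            continue
        fields = line[1:].split()
        # remote refid st t when poll reach delay offset jitter
        if len(fields) >= 10:
            return fields[0], fields[1], float(fields[8])
    return None


def check_offset(ntpq_output: str,
                 warn_ms: float = 100.0,
                 crit_ms: float = 500.0) -> Tuple[str, str]:
    """Hypothetical Nagios-style verdict as (status, message)."""
    peer = sys_peer_offset(ntpq_output)
    if peer is None:
        return "UNKNOWN", "no system peer, offset unknown"
    remote, refid, offset = peer
    if refid == ".LOCL.":
        # Following the undisciplined local clock: an offset of 0 means nothing
        return "WARNING", f"synchronised to local clock {remote}"
    if abs(offset) >= crit_ms:
        return "CRITICAL", f"offset {offset} ms from {remote}"
    if abs(offset) >= warn_ms:
        return "WARNING", f"offset {offset} ms from {remote}"
    return "OK", f"offset {offset} ms from {remote}"
```

With the Edit #3 output this returns OK for the -9.950 ms offset, while the Edit #1 output is flagged because the system peer is LOCAL(0).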

Best Answer

If you want an NTP server to do anything reliably, you must not lie to it about the reliability of its own clock; the lines

server 127.127.1.0

and

fudge 127.127.1.0 stratum 10

do exactly that, and it looks like getting rid of them has fixed things.
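The fix amounts to commenting out the two local clock lines and restarting ntpd. A Python sketch of that edit, assuming the lines look like the ones in Edit #1 (LOCAL_CLOCK and comment_out_local_clock are hypothetical names):

```python
import re

# Matches `server 127.127.1.x` and `fudge 127.127.1.x ...` (local clock driver)
LOCAL_CLOCK = re.compile(r"^\s*(server|fudge)\s+127\.127\.1\.\d+\b")


def comment_out_local_clock(conf_text: str) -> str:
    """Return the config with local clock lines commented out."""
    out = []
    for line in conf_text.splitlines():
        out.append("# " + line if LOCAL_CLOCK.match(line) else line)
    return "\n".join(out) + "\n"
```

Writing the result back to /etc/ntp.conf and restarting ntpd is left to the operator, so the file is never rewritten behind the daemon's back.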

As for stopping ntpd before brute-forcing the time with ntpdate, my understanding is that there's a single structure inside the kernel for playing with the clock, and ntpd sits on it (in order to slew the time if needed). As long as it's there, ntpdate can't get a look in, so it's necessary to take it out of the picture long enough for ntpdate to work.

But my understanding's strictly from running pool servers; I'm no kernel programmer, and could be wrong about that.
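Separately from the kernel clock interface described above, there is a more mundane obstacle: when run without -u, ntpdate sends from UDP port 123, which a running ntpd already has bound, and it then typically exits with "the NTP socket is in use, exiting". A Python sketch that tests whether a UDP port is already taken by trying to bind it (udp_port_in_use is a hypothetical helper; checking port 123 itself normally needs root, otherwise the bind fails with a permission error rather than "in use"):

```python
import errno
import socket


def udp_port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding a UDP socket to (host, port) hits EADDRINUSE."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # e.g. EACCES when probing port 123 without root
    finally:
        s.close()  # always release the probe socket
    return False
```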
