ntpd Sync Issue – Why ntpd Does Not Sync Clock if Internet Connection is Delayed


On our production box, I am facing a problem with ntpd. I am enabling the NTP feature on the box and have observed one issue.

We start the ntpd daemon during the box's initialization, when there is no internet connection yet. Below is my small ntp.conf file:

driftfile  /etc/ntp.drift
logconfig =syncstatus
server pool.ntp.org iburst

Our box gets internet connectivity a little later, once the interface comes up. At that point ntpd does not sync the clock: when I run ntpq -c as, I get "no association id's found". I waited for almost 30 minutes but still got "no association id's found".

I have to restart ntpd. After the restart, ntpd syncs the clock and everything works normally.
But if I reboot the box, the same issue happens again, and once more I have to restart ntpd after the box comes up and the internet is reachable.

Has anyone faced a similar problem?

Should I delay the start of ntpd until the interface comes up?

Update

I did some more experiments and replaced server pool.ntp.org iburst with pool pool.ntp.org iburst. With this change ntpd syncs the clock automatically, and I don't have to restart it. This raises some more questions for me.

What happened when I replaced server with pool?

Should I always use pool keyword instead of server?

When should I use pool and when should I use server?

I did some research and found this:
"pool is the same as server, except it resolves one name into several addresses and uses them all"
If they do the same thing, why did server pool.ntp.org iburst not work for me while pool pool.ntp.org iburst did?
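
To see what "resolves one name into several addresses" means in practice, here is a minimal Python sketch using only the standard library's socket.getaddrinfo. The helper name resolve_all is hypothetical and not part of ntpd; it only shows the DNS side (which addresses a name returns at that moment), not ntpd's internal association handling.

```python
import socket

def resolve_all(hostname, port=123):
    """Return the distinct IP addresses `hostname` resolves to for UDP (NTP uses port 123)."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_DGRAM)
    addresses = []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP string.
    for _family, _type, _proto, _canon, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

Calling resolve_all("pool.ntp.org") on a connected machine typically returns several addresses. Per the quoted description, a pool line can use all of them, while a server line ends up with a single address; with no network, the call raises socket.gaierror instead.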

Update

As suggested, I now use pool instead of server, but my clock still does not synchronize at bootup. Previously ntpq reported "no association id's found"; after switching to pool, it displays the list:

GW:/admin# ntpq -c lpeer
     remote           refid      st t when poll reach   delay    offset  jitter
==============================================================================
 time.google.com .POOL.          16 p    -   64    0    0.000   +0.000   0.002

GW:/admin# ntpq -np
      remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 time.google.com .POOL.          16 p    -   64    0    0.000   +0.000   0.002

GW:/admin# ntpq -c as
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 34173  8811   yes  none  none    reject    mobilize  1

GW:/admin# ntpq -c "rv 34173"
associd=34173 status=8811 conf, bcast, sel_reject, 1 event, mobilize,
srcadr=0.0.0.0, srcport=0, srchost="time.google.com", dstadr=0.0.0.0,
dstport=0, leap=11, stratum=16, precision=-19, rootdelay=0.000,
rootdisp=0.000, refid=POOL, reftime=(no time), rec=(no time), reach=000,
unreach=0, hmode=3, pmode=0, hpoll=6, ppoll=10, headway=0,
flash=1400 peer_dist, peer_unreach, keyid=0, offset=+0.000, delay=0.000,
dispersion=16000.000, jitter=0.002,
filtdelay=     0.00    0.00    0.00    0.00    0.00    0.00      0.00    0.00,
filtoffset=   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00,
filtdisp=   16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0

I see the flash status as 1400. What does 1400 mean? I was not able to find flash status 1400 in the ntp documentation.
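
The flash value is a hexadecimal bit mask of the packet and peer sanity tests that failed, and ntpq already decoded it on the same line ("flash=1400 peer_dist, peer_unreach"). A minimal Python sketch of that decoding, assuming the test-bit order used by ntpq in ntp 4.2.x (FLASH_BITS and decode_flash are hypothetical names):

```python
# Test names in bit order: bit 0 (0x0001) is the first test, bit 12 (0x1000) the last.
# Assumption: this mirrors ntpq's table in ntp 4.2.x; it agrees with ntpq's output above.
FLASH_BITS = [
    "pkt_dup", "pkt_bogus", "pkt_unsync", "pkt_denied", "pkt_auth",
    "pkt_stratum", "pkt_header", "pkt_autokey", "pkt_crypto",
    "peer_stratum", "peer_dist", "peer_loop", "peer_unreach",
]

def decode_flash(flash_hex):
    """Return the names of the failed tests encoded in an ntpq 'flash' value (hex)."""
    value = int(flash_hex, 16)
    return [name for bit, name in enumerate(FLASH_BITS) if value & (1 << bit)]

print(decode_flash("1400"))  # ['peer_dist', 'peer_unreach']
```

So 0x1400 = 0x1000 + 0x0400: peer_unreach (no reply received yet, consistent with reach=000) and peer_dist (synchronization distance too large, as expected while dispersion is still at 16000 ms).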

Update

It started working. I replaced iburst with minpoll 3 maxpoll 4, and now ntpd syncs after a reboot. My pool line is now:

pool pool.ntp.org minpoll 3 maxpoll 4

I am not sure what difference this change made.

I also read that we should avoid setting minpoll and maxpoll: polling too frequently for a sustained period may get you blocked by public NTP services, and ntpd is already good at dynamically selecting the poll interval.
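
The minpoll and maxpoll values are exponents of two (in seconds), so the warning above can be checked with a little arithmetic. A small Python sketch, assuming ntpd's usual defaults of minpoll 6 and maxpoll 10; the function names are hypothetical:

```python
SECONDS_PER_DAY = 24 * 60 * 60

def poll_interval(exponent):
    """ntpd poll options are log2 seconds: exponent n means a 2**n second interval."""
    return 2 ** exponent

def queries_per_day(exponent):
    """Approximate packets sent per day to one server at a steady poll exponent."""
    return SECONDS_PER_DAY // poll_interval(exponent)

for label, exp in [("minpoll 3", 3), ("maxpoll 4", 4),
                   ("default minpoll 6", 6), ("default maxpoll 10", 10)]:
    print(f"{label:>18}: every {poll_interval(exp):5d} s, ~{queries_per_day(exp)} queries/day")
```

With maxpoll 4, each pool server is queried about 5400 times a day, compared with about 84 at the default maxpoll of 10, which is why sustained use of such low values is discouraged against public servers.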

Anyway, thank you all for your help.

Best Answer

When you give ntpd a server, it resolves the hostname to an IP address at startup and uses that IP address to sync the time. If the hostname does not resolve, ntpd deletes the entry. Even if it does resolve, ntpd doesn't remember the hostname, only the IP address.
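
Because a server name that fails to resolve at startup is dropped, one workaround for the "should I delay ntpd" question is to hold off starting ntpd until the name resolves. A minimal Python sketch, assuming your init process can run a script before ntpd; wait_until_resolvable and its timeout and interval defaults are hypothetical choices, not ntpd options:

```python
import socket
import time

def wait_until_resolvable(hostname, timeout=300.0, interval=5.0, resolver=socket.getaddrinfo):
    """Retry DNS resolution of `hostname` until it succeeds or `timeout` seconds pass.

    Returns True once resolution succeeds, False on timeout. `resolver` is
    injectable so the retry logic can be tested without a network.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            resolver(hostname, 123, type=socket.SOCK_DGRAM)
            return True
        except socket.gaierror:
            pass  # name not resolvable yet (network or DNS not up)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A boot script would call wait_until_resolvable("pool.ntp.org") and only then launch ntpd with its usual init command. This only postpones the startup resolution; it does not change how ntpd behaves if the network drops later.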

If the server in your server line were a local host with a fixed IP address (rather than a dynamic pool), you could replace the hostname with its IP address, and ntpd shouldn't delete it even if the network isn't up at startup.

If you supply a pool instead, ntpd retains the hostname (and tags it with .POOL.). Periodically (including at startup), it resolves that hostname in DNS, adds any IPs it gets as separate entries, and prunes some of the least favorable ones.

You can see some of this with the command ntpq -np, or equivalently ntpq -n -c peers.

Note that there are also timing and version issues with all of this. This exact problem was filed as a bug against ntpd, and there have been several variations of the fix. Some versions of ntpd defer hostname resolution if it fails, but they may eventually give up anyway; so if you test by briefly disconnecting and reconnecting the network, the problem may not occur. Also, ntpd uses a polling algorithm that exponentially increases the poll interval for both reachable and unreachable hosts (depending on your clock stability and how useful the host is as a time source), with a default upper limit of 1024 seconds (about 17 minutes), so when network reachability changes it may take that long for ntpd to notice. (The time since the last poll and the poll interval are listed in the when and poll columns of ntpq -np.)
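
The reach column of ntpq -np (and reach=000 in the rv output in the question) is the easiest way to watch reachability change. It is an 8-bit shift register printed in octal: each poll shifts it left, and a reply sets the lowest bit. A short Python sketch to decode it (decode_reach is a hypothetical helper):

```python
def decode_reach(reach_octal):
    """Decode ntpq's 'reach' value, an 8-bit shift register shown in octal.

    Returns (replies received in the last 8 polls, bit string from oldest to newest poll).
    """
    value = int(reach_octal, 8)
    bits = format(value, "08b")  # most significant bit = oldest of the last 8 polls
    return bits.count("1"), bits

print(decode_reach("0"))    # (0, '00000000')  no replies yet, as in the question
print(decode_reach("377"))  # (8, '11111111')  all of the last 8 polls answered
```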

Additionally, some boot scripts use ntpdate or a similar tool to set the system clock from a server listed in ntp.conf, so that the clock is roughly synchronized before ntpd starts. This is a one-shot attempt, and if it fails, ntpd may start with the clock wildly wrong. If the clock is only slightly wrong, ntpd will fix it, but if it is far off, ntpd may refuse to sync the clock and, depending on the version, may crash or exit. Some versions of ntpd have their own one-shot option for a large initial clock step.
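
A rough Python model of the "slightly wrong versus far off" distinction, assuming the reference ntpd's documented defaults (tinker step 0.128 s, tinker panic 1000 s) and its -g option, which permits one large initial step. It deliberately ignores details such as the stepout interval, and ntpd_action is a hypothetical name:

```python
STEP_THRESHOLD = 0.128    # seconds; ntpd default for 'tinker step'
PANIC_THRESHOLD = 1000.0  # seconds; ntpd default for 'tinker panic'

def ntpd_action(offset_s, allow_first_big_step=False):
    """Roughly classify how ntpd treats an initial clock offset (default thresholds).

    allow_first_big_step models starting ntpd with -g.
    """
    magnitude = abs(offset_s)
    if magnitude > PANIC_THRESHOLD and not allow_first_big_step:
        return "exit"  # panic: ntpd logs a message and exits
    if magnitude > STEP_THRESHOLD:
        return "step"  # clock is set in one jump
    return "slew"      # clock is adjusted gradually
```

This is why a failed one-shot ntpdate at boot matters: a clock that is off by more than the panic threshold makes ntpd give up unless it was started with -g.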