NTP – What is Dispersion and How to Control It

Tags: busybox, linux, ntp, ntpd, ubuntu

We roll out Ubuntu 14.04 servers on isolated networks, running ntpd 4.2.6p5, configured to use multiple NTP servers as provided by customers (no access to pool.ntp.org). Our dumb terminal client devices run an old version of BusyBox (1.00-rc2) and ntpclient 2010 from Larry Doolittle.

This setup has worked great for years, but recently we've hit a roadblock with a new customer. They provided us with 5 in-house NTP server addresses which seem to work fine on their own, as far as ntpdate-debian on the Linux server is concerned. On the BusyBox side, however, ntpclient complains with "Dispersion too high". From the debug output, ntpclient gets a dispersion of "1217163.1" from the NTP server, but the largest absolute value it accepts is 65536.

$ /usr/sbin/ntpclient -s -i 15 -h 10.17.162.250 -d
Configuration:
  -c probe_count 1
  -d (debug)     1
  -g goodness    0
  -h hostname    10.17.162.250
  -i interval    15
  -l live        0
  -p local_port  0
  -q min_delay   800.000000
  -s set_clock   1
  -x cross_check 1
Listening...
Sending ...
recvfrom
packet of length 48 received
Source: INET Port 123 host 10.17.162.250
LI=0  VN=3  Mode=4  Stratum=4  Poll=4  Precision=-20
Delay=60745.2  Dispersion=1346801.8  Refid=10.31.10.21
Reference 3668859928.942079
(sent)    3668859928.708371
Originate 3668859928.708371
Receive   3668859928.963271
Transmit  3668859928.963369
Our recv  3668859928.708371
Total elapsed:      0.00
Server stall:      93.09
Slop:             -93.09
Skew:          255443.94
Frequency:             0
 day   second     elapsed    stall     skew  dispersion  freq
42463 56728.708  rejected packet: abs(DISP)>65536

These are all devices on the same LAN so frankly I am flabbergasted. Aghast even.

Here's the ntpq -pn output from the Ubuntu 14.04 server:

user@host:~$ ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     .LOCL.          10 l 1025   64    0    0.000    0.000   0.000
 10.17.162.249   10.17.6.10       5 u   23 1024   37    0.865  1381.07 697.260
 10.31.10.22     .LOCL.           1 u 1044 1024   17   29.586  -838.06 397.342
 10.17.6.10      10.31.10.21      4 u 1065 1024   17    0.366  105.245 402.999
*10.31.10.21     132.246.11.238   3 u    5 1024   37   29.418  794.292 616.796
 10.17.6.11      10.31.10.21      4 u 1038 1024   17    0.408  120.030 381.058

My questions are:

What is dispersion and what can alter its value?
What commands could I run to get more details from the NTP servers?
Could the fault lie on the Ubuntu server side, with an improper ntp.conf? There is nothing special there really.
Would switching to chrony change anything in this case?

Best Answer

I see some confusion in the answers here. For starters, ntpclient, at least in -s mode, isn't acting as a full NTP client. It only sends and receives one packet, so there is no "last 8 packets received", and it isn't estimating its own dispersion at all.

Instead, the value it's printing is the value called "root dispersion" (rootdisp) in the packet returned by the server, which is an estimate of the total amount of error/variance between that server and the correct time. The way this is calculated is pretty simple: every NTP server either gets its time from an external clock (for example a radio or GPS receiver), or from another NTP server. If a server gets its time from an external clock, its root dispersion is the estimated maximum error of that clock. If it gets its time from another NTP server, its root dispersion is that server's root dispersion plus the dispersion added by the network link between them.
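As a rough illustration of that accumulation, here is a minimal Python sketch that adds the dispersion contributed by each hop to the error bound of the external clock. The function name and every number in it are hypothetical, and real NTP implementations fold further terms into root dispersion that are left out here.

```python
def accumulated_root_dispersion(reference_error_s, per_hop_dispersion_s):
    """Return the root dispersion seen at the end of a chain of servers.

    reference_error_s: estimated maximum error of the external clock.
    per_hop_dispersion_s: dispersion added by each server-to-server link,
    listed from the top of the chain downwards.
    """
    total = reference_error_s
    for hop in per_hop_dispersion_s:
        total += hop  # each downstream server inherits everything above it
    return total


# Hypothetical chain: a GPS receiver good to 1 ms, then two network hops.
chain = accumulated_root_dispersion(0.001, [0.005, 0.020])
print(f"root dispersion at the last server: {chain:.3f} s")
```

The point of the sketch is that the value can only grow on the way down: one noisy link near the top of the chain inflates the root dispersion of every server below it.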

One point of confusion here is units: ntpq displays root dispersion in milliseconds and chrony in seconds, which is what people are used to looking at, whereas ntpclient displays it in microseconds. Regardless, a value of 1217163 microseconds is still quite high. A good NTP server knows the time within a few milliseconds; a bad one within a few tens or hundreds of milliseconds. Yours is telling you that its time can only be trusted to within +/- 1.2 seconds.
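To make the units concrete, the sketch below converts between microseconds and the NTP "short" format used for the root dispersion field in the packet, a fixed-point number with 16 integer bits and 16 fractional bits. The helper names are made up for this example. That ntpclient compares the raw field value against 65536 (which would correspond to exactly one second) is an assumption about its internals, not something confirmed in this thread.

```python
FRACTION_BITS = 16  # NTP short format: 16 integer bits, 16 fractional bits


def short_to_seconds(raw):
    """Convert a raw NTP short-format value to seconds."""
    return raw / (1 << FRACTION_BITS)


def microseconds_to_short(microseconds):
    """Convert microseconds to the nearest raw NTP short-format value."""
    return round(microseconds * (1 << FRACTION_BITS) / 1_000_000)


reported_us = 1217163.1  # dispersion quoted in the question
raw = microseconds_to_short(reported_us)

# Assumption: the "abs(DISP)>65536" check is applied to the raw field.
limit_raw = 65536

print("raw field value:", raw)
print("in seconds:", round(short_to_seconds(raw), 3))
print("limit in seconds:", short_to_seconds(limit_raw))
print("rejected:", abs(raw) > limit_raw)
```

Under that assumption the rejection is simply "root dispersion above one second", which matches the reading that the server only vouches for its time to within about 1.2 seconds.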

You can actually get ntpclient to synchronize to this server anyway by passing the -x 0 or -t option (depending on the version of ntpclient), which disables the NTP sanity checks. If you only need roughly accurate time (to within a few seconds), that may be good enough. However, ntpclient is being pretty reasonable in refusing to synchronize to such a bad server. Your ntpq output on the Ubuntu machine shows a jitter of hundreds of milliseconds for all of its servers, even though they have low delay. That indicates either a very unreliable network, a conspiracy of all of the servers to provide erratic time, or a basic timekeeping problem on the server itself.
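The jitter observation can be checked mechanically. The following Python sketch parses the ntpq -pn lines from the question and lists peers whose jitter exceeds a threshold. The 100 ms threshold and the function names are arbitrary choices for this illustration, not ntpd settings.

```python
# Peer lines copied from the ntpq -pn output in the question.
NTPQ_LINES = [
    " 127.127.1.0     .LOCL.          10 l 1025   64    0    0.000    0.000   0.000",
    " 10.17.162.249   10.17.6.10       5 u   23 1024   37    0.865  1381.07 697.260",
    " 10.31.10.22     .LOCL.           1 u 1044 1024   17   29.586  -838.06 397.342",
    " 10.17.6.10      10.31.10.21      4 u 1065 1024   17    0.366  105.245 402.999",
    "*10.31.10.21     132.246.11.238   3 u    5 1024   37   29.418  794.292 616.796",
    " 10.17.6.11      10.31.10.21      4 u 1038 1024   17    0.408  120.030 381.058",
]

JITTER_LIMIT_MS = 100.0  # hypothetical threshold for "suspiciously noisy"


def parse_peer(line):
    """Split one ntpq peer line into the fields used below."""
    tally, fields = line[0], line[1:].split()  # first column is the tally code
    return {
        "tally": tally,
        "remote": fields[0],
        "refid": fields[1],
        "stratum": int(fields[2]),
        "delay_ms": float(fields[7]),
        "offset_ms": float(fields[8]),
        "jitter_ms": float(fields[9]),
    }


peers = [parse_peer(line) for line in NTPQ_LINES]
noisy = [p for p in peers if p["jitter_ms"] > JITTER_LIMIT_MS]

for p in noisy:
    print(f'{p["remote"]:<15} delay={p["delay_ms"]:.3f} ms  jitter={p["jitter_ms"]:.3f} ms')
```

Every network peer is flagged, including those with a round-trip delay under 1 ms, which is why the network alone is an unlikely explanation.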

It also concerns me that the server 10.31.10.22 is advertising a refid of LOCL (undisciplined local clock) but has a stratum of 1. Usually the local clock is fudged to a stratum of 10 so that it's only used as a last-resort synchronization source to keep a herd from drifting apart. Either 10.31.10.22 is misconfigured and providing bad time to the rest of the network, or it's being disciplined to good time by some program outside of NTP's control, in which case the misconfiguration is simply that it's advertising the LOCL refid; it should be overridden to e.g. GPS or whatever is providing its time.
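A similar check, again only a sketch, picks out peers that advertise the LOCL refid while claiming a better stratum than the usual fudged value. The (remote, refid, stratum) tuples are taken from the ntpq output in the question; the function name is hypothetical, and the cut-off of 10 is the convention described above.

```python
# (remote, refid, stratum) taken from the ntpq -pn output in the question.
PEERS = [
    ("127.127.1.0", ".LOCL.", 10),
    ("10.17.162.249", "10.17.6.10", 5),
    ("10.31.10.22", ".LOCL.", 1),
    ("10.17.6.10", "10.31.10.21", 4),
    ("10.31.10.21", "132.246.11.238", 3),
    ("10.17.6.11", "10.31.10.21", 4),
]

FUDGED_STRATUM = 10  # conventional last-resort stratum for a local clock


def suspicious_local_clocks(peers, fudged_stratum=FUDGED_STRATUM):
    """Return remotes that claim LOCL as their source at a low stratum."""
    return [
        remote
        for remote, refid, stratum in peers
        if refid == ".LOCL." and stratum < fudged_stratum
    ]


flagged = suspicious_local_clocks(PEERS)
print(flagged)
```

Only 10.31.10.22 is flagged. The server's own 127.127.1.0 entry is left alone because it sits at the fudged stratum of 10.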

Related Solutions

Linux Bash – How to Sort du -h Output by Size

As of GNU coreutils 7.5, released in August 2009, sort accepts a -h option that compares numbers with the human-readable suffixes produced by du -h:

du -hs * | sort -h

If you are using a sort that does not support -h, you can install GNU Coreutils. E.g. on an older Mac OS X:

brew install coreutils
du -hs * | gsort -h

From sort manual:

-h, --human-numeric-sort compare human readable numbers (e.g., 2K 1G)

NTP fudge network source stratum

After some more research, it seems "fudging" the stratum level of a network source is not possible. So I moved on and tried dtoubeli's answer. To my surprise, simply making my local time server stratum 2 (equal to the 3rd party device) did not always cause it to be the preferred time source. My local ntpd would still mark them both as falsetickers. I'm not sure why, but my guess is that they were the only two time sources and their times were too far apart.

The biggest problem here is that my 3rd party device doesn't keep very consistent time; in fact, it fluctuates a lot. The solution to my problem was adding several other accurate time sources (pool.ntp.org) to my /etc/ntp.conf. Now my local server is always chosen as the preferred time source, often despite having a higher stratum level than some of the servers in the pool.
