System Time Off by Hundreds of Milliseconds Despite NTP Sync

ntp time time-synchronization

I'm running a couple of servers which need a pretty tight time sync (<50ms) as they are running a Paxos algorithm.
The servers are running NTP and do sync successfully at some point.
According to hwclock the 11-minute mechanism is enabled, so the system time should be copied to the hardware clock.

However, I see that after a reboot the system time can be off by as much as 300ms compared to the time just before a reboot. Is it unreasonable to think that after a reboot the time should be within 50ms of the time just before reboot?

Best Answer

My initial reaction was that 300ms seems like an awful lot, but I have numbers to show, and they indicate that @Law29 is right:

  1. One of my machines over a normal week:
    • Frequency: [graph: frequency]
    • System peer offset: [graph: sysoffset]
  2. Same system, shorter period with a reboot involved:
    • Frequency: [graph: frequency-reboot]
    • System peer offset: [graph: sysoffset-reboot]
    • Scatter plot of the peers: [graph: peerstatsplot-reboot]

(Hope you can read all the numbers on the graphs OK - drop me a comment if not.)

As you can see, there's a rather large discrepancy. I was surprised by how large it was, and also by how long the frequency correction took to get back on track, considering that there's a stratum 1 GPS source on my local network. And given that the peer samples are fairly tightly clustered on the plot, it's clearly a problem with the local clock, not inconsistent network delay during startup. (For the record, the hardware is a Shuttle DS437 fanless mini-PC with a dual-core Celeron 1037U @ 1.8 GHz.)
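To get a feel for why the frequency correction matters so much: a clock running with a constant frequency error of 1 ppm gains or loses 1 microsecond per second. Here is a rough sketch of that arithmetic in Python. The ppm values are made up for illustration and are not read off my graphs, and the function name is just something I picked.

```python
def accumulated_offset_ms(freq_error_ppm: float, elapsed_s: float) -> float:
    """Offset in milliseconds built up by a constant, uncorrected frequency error."""
    # 1 ppm means 1e-6 seconds of error per second of elapsed time.
    error_seconds = freq_error_ppm * 1e-6 * elapsed_s
    return error_seconds * 1000.0


# Hypothetical frequency errors, each left uncorrected for one hour.
for ppm in (5, 20, 50):
    offset = accumulated_offset_ms(ppm, 3600)
    print(f"{ppm:>2} ppm for 1 hour -> {offset:.0f} ms")
```

So even a modest uncorrected error eats through a 50ms budget within an hour, which is why it hurts when the frequency estimate is lost or wrong after a reboot.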

So the takeaways are probably:

  1. make sure ntpd is successfully writing the NTP drift file,
  2. make sure the kernel's 11-minute timer to update the hardware clock is on (see "Automatic Hardware Clock Synchronization by the Kernel" in man hwclock for details), or that your shutdown process updates the hardware clock,
  3. make sure ntpd has 4-10 reachable sources (in iburst mode), and
  4. configure your startup dependencies so that ntpd has a chance to fix the clock before Paxos starts.
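Points 1 and 4 above can be checked from a small script. What follows is only a sketch under a few assumptions: that `ntpq -c rv` prints the system variables with an `offset=` field in milliseconds and reports `leap_alarm` while unsynchronised (which is what I see with classic ntpd), and that the drift file holds a single number in ppm. The drift file path depends on your distribution and `driftfile` setting, so it is passed in rather than hard-coded. The function names and the default thresholds are my own.

```python
import re
import subprocess
import time
from typing import Optional


def parse_offset_ms(rv_output: str) -> Optional[float]:
    """Return the system offset (ms) from `ntpq -c rv` output, or None."""
    # Assumption: ntpd reports leap_alarm until the clock is synchronised.
    if "leap_alarm" in rv_output:
        return None
    match = re.search(r"\boffset=([-+]?\d+(?:\.\d+)?)", rv_output)
    return float(match.group(1)) if match else None


def read_drift_ppm(path: str) -> Optional[float]:
    """Return the frequency stored in the drift file (ppm), or None if unusable."""
    try:
        with open(path) as handle:
            return float(handle.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def wait_for_sync(max_offset_ms: float = 50.0,
                  timeout_s: float = 300.0,
                  poll_s: float = 5.0) -> bool:
    """Poll ntpq until |offset| <= max_offset_ms; False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            result = subprocess.run(["ntpq", "-c", "rv"],
                                    capture_output=True, text=True, timeout=10)
            offset = parse_offset_ms(result.stdout)
        except (OSError, subprocess.TimeoutExpired):
            offset = None  # ntpq missing or hung; keep trying until the deadline
        if offset is not None and abs(offset) <= max_offset_ms:
            return True
        time.sleep(poll_s)
    return False
```

You could call `wait_for_sync()` as a pre-start step for the Paxos service and refuse to start if it returns False, and log a warning if `read_drift_ppm()` returns None for your drift file. Note that a small offset right after startup does not mean the frequency has settled, so treat this as a minimum check rather than a guarantee.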