I have two Linux machines (A and B) on an isolated network. They must be time-synchronized. Machine A is powered intermittently and must serve the time, as it is connected to an authoritative time source (GPS). Machine B is only powered if machine A is powered, but it is an embedded Linux device and its power state will change frequently. Neither machine has access to other systems. It's a closed network.
I understand that this is quite a tall order for NTP, as NTP usually expects to have contact with several servers. I'm having trouble getting this to work properly on Machine B. Machine A syncs to the GPS just fine, and machine B can reach machine A and even do time queries, but Machine A is not trusted (perhaps by itself?). After a solid hour of machine A being up, this suddenly changed and machine B worked. However, when machine A went down (and thus machine B) and came back up, machine B was once again unable to find a good time sync.
Here's some ntpdate info. Please note that even when machine A's stratum is 1, the operation fails with the same output at the end.
    10.10.10.1: Server dropped: strata too high
    server 10.10.10.1, port 123
    stratum 16, precision -19, leap 11, trust 000
    refid [10.10.10.1], delay 0.02614, dispersion 0.00000
    transmitted 4, in filter 4
    reference time:      00000000.00000000  Thu, Feb 7 2036 6:28:16.000
    originate timestamp: d3a9bdc4.27ebb350  Thu, Jul 12 2012 21:19:00.155
    transmit timestamp:  bc17c803.b42dfffe  Sat, Jan 1 2000 0:25:39.703
    filter delay:  0.02625  0.02614  0.02618  0.02625
                   0.00000  0.00000  0.00000  0.00000
    filter offset: 39544160 39544160 39544160 39544160
                   0.000000 0.000000 0.000000 0.000000
    delay 0.02614, dispersion 0.00000
    offset 395441600.451568

    1 Jan 00:25:39 ntpdate[677]: no server suitable for synchronization found
My guess is that machine A just doesn't trust itself for serving time. After 51 minutes (may have happened earlier, I don't know) of uptime and having its clock synch'd to GPS, machine A started to serve time correctly, and machine B picked it up. I need this to happen earlier. Like, within seconds if possible.
With the following configs (and a lot of waiting), it eventually succeeds.
Machine A ntp.conf:
    server 127.127.28.0 prefer true minpoll 4 maxpoll 4
    fudge 127.127.28.0 stratum 1 time1 0.420 refid GPS
Machine B ntp.conf:
    server 10.10.10.1 prefer true minpoll 4 maxpoll 4
ntpq -c peers on Machine B without good time fix:
         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
     10.10.10.1      .STEP.          16 u    9   16    0    0.000    0.000   0.000
ntpq -c peers on Machine B with good time fix:
         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
    *10.10.10.1      SHM(0)           2 u    7   16   17    0.669    2.597   1.808
So, now the question becomes: how do I make Machine A trust itself quickly?
Some debug output from Machine A before and after machine B decides that Machine A is good enough to use.
Before:
    ~ # ntpq -c rv
    associd=0 status=c418 leap_alarm, sync_uhf_radio, 1 event, no_sys_peer,
    version="ntpd 4.2.6p4@1.2324 Fri Feb 24 15:01:45 UTC 2012 (1)",
    processor="armv7l", system="Linux/2.6.35.14", leap=11, stratum=2,
    precision=-19, rootdelay=0.000, rootdisp=44.537, refid=SHM(0),
    reftime=d3ab0053.43b44780 Fri, Jul 13 2012 20:15:15.264,
    clock=d3ab0062.e7e03154 Fri, Jul 13 2012 20:15:30.905, peer=34819, tc=4,
    mintc=3, offset=0.000, frequency=0.000, sys_jitter=3.853,
    clk_jitter=36.492, clk_wander=0.000
After:
    ~ # ntpq -c rv
    associd=0 status=0415 leap_none, sync_uhf_radio, 1 event, clock_sync,
    version="ntpd 4.2.6p4@1.2324 Fri Feb 24 15:01:45 UTC 2012 (1)",
    processor="armv7l", system="Linux/2.6.35.14", leap=00, stratum=2,
    precision=-19, rootdelay=0.000, rootdisp=41.278, refid=SHM(0),
    reftime=d3ab0063.43b37856 Fri, Jul 13 2012 20:15:31.264,
    clock=d3ab006d.9ee53ec2 Fri, Jul 13 2012 20:15:41.620, peer=34819, tc=4,
    mintc=3, offset=0.000, frequency=43.896, sys_jitter=0.762,
    clk_jitter=36.953, clk_wander=0.000
Best Answer
NTP should work fine. Look at some of the options for fast synchronization on start-up. Look at the burst and iburst options for system B. Look at the true option for the GPS clock source.

Consider using the hardware clock as a backup time source on both systems. Set a higher stratum on system B. Something like the following should work:
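A minimal sketch of such a configuration, reusing the SHM driver line and the 10.10.10.1 address from the question. The local clock driver (127.127.1.0) stands in for the hardware-clock backup, and the stratum values 5 and 9 are illustrative assumptions chosen only to keep a gap of a few levels between the sources:

```
# --- Machine A ntp.conf (sketch) ---
# GPS via the shared memory driver; "true" makes it always survive the
# selection and clustering algorithms.
server 127.127.28.0 prefer true minpoll 4 maxpoll 4
fudge  127.127.28.0 stratum 1 time1 0.420 refid GPS
# Local clock as a fallback, several strata below the GPS (assumed value).
server 127.127.1.0
fudge  127.127.1.0 stratum 5

# --- Machine B ntp.conf (sketch) ---
# iburst sends a burst of packets while the server is unreachable,
# which shortens the initial synchronization.
server 10.10.10.1 iburst prefer minpoll 4 maxpoll 4
# Local clock as a fallback, ranked below machine A's fallback (assumed value).
server 127.127.1.0
fudge  127.127.1.0 stratum 9
```

With this layout machine B should prefer machine A whenever it answers, and fall back to its own clock only when nothing better is reachable.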
Watch the output of ntpq -c peers to see when you get a trusted clock source. Normally ntp wants a number of responses from a trusted time source before it trusts it. This is indicated by the first character on each line.

While NTP likes more sources, any odd number of time sources within one stratum level should work well. As you only have two servers and a GPS clock, the stratum of the sources should increase in the order GPS, clock on server A, clock on server B. Increasing the stratum between each by three or four levels will ensure priorities are respected.
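The first-character check described above can be scripted. In the ntpq peers table, a leading `*` marks the source currently selected as the system peer, and a leading space marks a source that is not yet usable. A small sketch, where the function name `has_sys_peer` is made up for illustration:

```shell
# has_sys_peer: read an "ntpq -c peers" table on stdin and succeed only if
# some line starts with '*', the tally code for the selected system peer.
has_sys_peer() {
    grep -q '^\*'
}

# Example (needs ntpq and a running ntpd, so it is left commented out):
# ntpq -c peers localhost | has_sys_peer && echo "synchronized"
```

This is handy in a boot script on machine B that has to wait until machine A is actually being used.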
EDIT: If you have the busybox NTP server on server A, it may be worthwhile installing the full ntp server package. Understanding what is happening with server A should go a long way to solving your problem. You will need at least one trusted time source there before server B should trust it. If ntpq -c peers doesn't work, then you can try ntpdc -c peers. Both these commands allow you to query other hosts. A peerstats log could also be useful.

On server B, use ntpclient as documented in the busybox ntp howto to log what is happening on it.
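Enabling the peerstats log mentioned above takes a few lines in ntp.conf when the full ntpd is installed. A sketch, assuming the directory /var/log/ntpstats/ exists and is writable by ntpd (the path is an assumption, not something from the question):

```
# Write per-peer statistics to one file per day under statsdir.
statsdir /var/log/ntpstats/
statistics peerstats
filegen peerstats file peerstats type day enable
```

Each update from a source then adds a line with its offset, delay and dispersion, which shows how long server A takes to settle after power-up.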
The clocks should be reasonably close to the correct time if the servers haven't been down for long. If you need to sync the two systems, that should be sufficient. The GPS will bring the time into sync with the real world eventually.
ntpd -q synchronizes quickly, but exits (ntpdate behaviour). It needs to be followed by an ntpd command without the quit option to have continuous synchronization.

EDIT2: I checked my servers and found one of them was off by a second. While fixing this I played with the settings. The iburst option gets a server trusted very quickly. The true option ensured the clock driver was trusted if there weren't multiple other trusted sources. The clock took a little more than a minute before it was locally trusted and could be trusted remotely.

When testing you should be able to restart the ntpd process once it is synchronized and test how fast settings work. In the above case Server B may need to be restarted to test how fast it synchronizes. When monitoring ntpd changes I use a shell loop that runs ntpq against a host and then sleeps; the hostname and sleep time are adjusted as required. In some cases I chain two or more ntpq command lines in the loop. When doing so I use an echo and/or date command to provide an indication of where sets of data change.
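The monitoring loop described above could look roughly like this. It is wrapped in a function here so the number of iterations can be bounded; the function name `watch_peers` and its arguments are made up for illustration, and only `ntpq -c peers`, `date` and `sleep` are taken as given:

```shell
# watch_peers HOST INTERVAL [COUNT]
# Print a timestamp marker followed by the peers table of HOST, then sleep
# INTERVAL seconds. COUNT=0 (the default) repeats until interrupted.
watch_peers() {
    host=${1:-localhost}
    interval=${2:-15}
    count=${3:-0}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        # The marker line shows where one set of data ends and the next begins.
        echo "=== $(date) ==="
        ntpq -c peers "$host"
        i=$((i + 1))
        sleep "$interval"
    done
}

# Example (runs until Ctrl-C, so it is left commented out):
# watch_peers 10.10.10.1 15
```

A second query such as `ntpq -c rv "$host"` can be added after the peers line when the system variables are of interest as well.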