Linux – NTP service stopped on a CentOS 7.1 Linux machine


On three of our CentOS 7.1 Linux boxes, we saw the messages below in the logs, after which the ntpd service stopped because the offset was more than 1000 s.

systemd: Time has been changed

ntpd[2626]: 0.0.0.0 0617 07 panic_stop -26789 s; set clock manually within 1000 s.

All three boxes run on the same ESX host. Note that many other Linux boxes also run on that ESX host.

External NTP servers are configured on these boxes, and there are no issues with those NTP servers.

Given this scenario, what could change the system clock other than manual intervention? That change is what caused the NTP service to stop automatically.

Best Answer

Speculative answer: the on-board clock might drift if power is cut, for example if the on-board battery is nearly dead. If the machine is powered down for a while with the power cut off, the time set at the next boot might be outside ntpd's maximum allowed adjustment.

If the machines are VMs, then only the time service on the VM host should need attention.

I currently have a CentOS 7.1 machine (not a VM). Recently it was powered down for 47 min + 57 min + 1 day 7 h 7 min + 2 min, because electrical work was being done in the machine room. Look at 'last -x shutdown reboot':

[root@boxymcboxface ~]# last -x shutdown reboot 
reboot   system boot  3.10.0-229.el7.x Sun Jan 15 16:41 - 16:43 (8+00:02)   
shutdown system down  3.10.0-229.el7.x Sun Jan 15 16:38 - 16:41  (00:02)    
reboot   system boot  3.10.0-229.el7.x Sun Jan 15 16:16 - 16:38  (00:22)    
shutdown system down  3.10.0-229.el7.x Sat Jan 14 09:09 - 16:16 (1+07:07)   
reboot   system boot  3.10.0-229.el7.x Fri Jan 13 12:18 - 09:09  (20:50)    ** first ntpd panic_stop seen @ Jan 13 12:38:39 **
shutdown system down  3.10.0-229.el7.x Fri Jan 13 11:21 - 12:18  (00:57)    ** down for 57 mins **
reboot   system boot  3.10.0-229.el7.x Tue Nov 22 11:49 - 11:21 (51+23:31)  
shutdown system down  3.10.0-229.el7.x Tue Nov 22 11:02 - 11:49  (00:47)

The first panic_stop message:

ntpd[733]: 0.0.0.0 c617 07 panic_stop -1027 s; set clock manually within 1000 s.
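To check log lines like this against the 1000 s limit, something along these lines could work. It is only a sketch: the function names panic_offset and exceeds_panic are made up for illustration, and the pattern assumes the message format shown above.

```sh
# Print the signed offset (seconds) from an ntpd panic_stop log line.
# Returns non-zero and prints nothing if the line is not a panic_stop message.
panic_offset() {
    printf '%s\n' "$1" |
        sed -n 's/.*panic_stop \([+-]\{0,1\}[0-9][0-9]*\) s;.*/\1/p' |
        grep .
}

# Succeed if the absolute offset exceeds the threshold (default 1000 s).
# Returns 2 if the line is not a panic_stop message.
exceeds_panic() {
    off=$(panic_offset "$1") || return 2
    off=${off#[+-]}                 # drop the sign to get the absolute value
    [ "$off" -gt "${2:-1000}" ]
}
```

For example, exceeds_panic "$line" && echo "offset too large for ntpd" would flag both messages quoted on this page.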

It would be interesting to see what the clock was set to after each reboot, but only the latest message is available. From 'dmesg | grep clock':

[    0.810823] rtc_cmos 00:08: setting system clock to 2017-01-15 16:40:57 UTC (1484498457)

So it looks like, over the 57 minutes the machine was down (with the power probably out for part of that time), the clock drifted fast by about 17 minutes (1027 s).
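The 17 minutes comes straight from the -1027 s offset in the first panic_stop message. A small helper for that conversion might look like this (fmt_offset is just an illustrative name, not part of any ntp tooling):

```sh
# Format an offset given in whole seconds (optionally signed) as minutes and seconds.
fmt_offset() {
    s=${1#[+-]}                     # ignore the sign, only the size matters here
    printf '%dm%02ds\n' $((s / 60)) $((s % 60))
}
```

With the offsets seen above, fmt_offset -1027 prints 17m07s and fmt_offset -26789 prints 446m29s.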

Related Solutions

Linux – Single NTP server on isolate network

NTP should work fine. Look at the options for fast synchronization at start-up: the burst and iburst options for system B, and the true option for the GPS clock source.

Consider using the hardware clock as a backup time source on both systems. Set a higher stratum on system B. Something like the following should work:

server  127.127.1.0
fudge   127.127.1.0 stratum 8

Watch the output of ntpq -c peers to see when you get a trusted clock source. Normally ntp wants a number of responses from a time source before it trusts it. Whether a source is trusted is indicated by the first character on each line.

While NTP likes more sources, any odd number of time sources within one stratum level should work well. As you only have two servers and a GPS clock, the stratum of the sources should increase in this order: GPS, clock on server A, clock on server B. Increasing the stratum between each by three or four levels will ensure priorities are respected.
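As a rough sketch of reading that first character in a script: ntpq marks the selected system peer with "*", a candidate with "+", an outlier with "-", a falseticker with "x", and a rejected source with a space. The helper names below are made up for illustration.

```sh
# Succeed if ntpq peer output on stdin contains a selected system peer,
# i.e. a line whose first character is "*".
has_sys_peer() {
    grep -q '^\*'
}

# Translate a single tally character into a short description.
tally_meaning() {
    case "$1" in
        '*') echo 'system peer' ;;
        '+') echo 'candidate' ;;
        '-') echo 'outlier' ;;
        'x') echo 'falseticker' ;;
        ' ') echo 'rejected' ;;
        *)   echo 'other' ;;
    esac
}
```

Typical use would be ntpq -c peers | has_sys_peer && echo "have a system peer".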
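Put together, the two ntp.conf files might look roughly like the following. This is a sketch under assumptions: 127.127.20.0 assumes the GPS is attached through the NMEA reference clock driver, and "serverA" is a placeholder hostname. Neither comes from the original setup, so substitute your own driver address and names.

```
# --- server A: GPS first, local clock as fallback ---
server  127.127.20.0 true        # assumed NMEA GPS refclock; adjust to your driver
server  127.127.1.0              # local clock
fudge   127.127.1.0 stratum 8

# --- server B: server A first, own local clock as last resort ---
server  serverA iburst           # "serverA" is a placeholder hostname
server  127.127.1.0
fudge   127.127.1.0 stratum 12   # four levels above server A's local clock
```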

EDIT: If you have the busybox NTP server on server A, it may be worthwhile installing the full ntp server package. Understanding what is happening with server A should go a long way to solving your problem. You will need at least one trusted time source there before server B should trust it. If ntpq -c peers doesn't work, then you can try ntpdc -c peers. Both these commands allow you to query other hosts. A peerstats log could also be useful.

On server B, use ntpclient as documented in the busybox ntp howto to log what is happening on it.

The clocks should be reasonably close to the correct time if the servers haven't been down for long. If you need to sync the two systems, that should be sufficient. The GPS will bring the time into sync with the real world eventually.

'ntpd -q' synchronizes quickly, but exits (ntpdate behaviour). It needs to be followed by an ntpd command without the quit option to have continuous synchronization.
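On a systemd box such as CentOS 7, that sequence could be wrapped roughly as follows. This is a sketch, not a tested procedure: resync_ntpd is an illustrative name, the unit name ntpd is assumed, and -g is added so that the one-off step is allowed even when the offset is beyond the 1000 s panic threshold.

```sh
# One-shot sync followed by a normal daemon start. Needs root.
# The daemon is stopped first so the one-shot ntpd can bind the NTP port.
resync_ntpd() {
    systemctl stop ntpd &&
    ntpd -gq &&                 # -q: set the clock once and exit; -g: allow a large first step
    systemctl start ntpd        # continuous synchronization from here on
}
```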

EDIT2: I checked my servers and found one of them was off by a second. While fixing this I played with the settings. iburst gets a server trusted very quickly. true ensures the clock driver is trusted if there aren't multiple other trusted sources. The clock took a little more than a minute before it was locally trusted and could be trusted remotely.

When testing you should be able to restart the ntpd process once it is synchronized and test how fast settings work. In the above case Server B may need to be restarted to test how fast it synchronizes. When monitoring ntpd changes I use a line like:

while ntpq -c peers localhost; do sleep 10; done

The hostname and sleep time are adjusted as required. In some cases I chain two or more ntpq command lines in the loop. When doing so I use an echo and/or date command to provide an indication of where sets of data change.
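A chained version of that loop might look like this. It is a sketch: watch_ntp is an illustrative name, and the second ntpq command (rv, which reads the system variables) is just one example of what could be chained.

```sh
# Poll ntpq against a host until one of the ntpq calls fails.
# usage: watch_ntp [HOST] [INTERVAL_SECONDS]
watch_ntp() {
    host=${1:-localhost}
    interval=${2:-10}
    # date stamps each set of output; the echo marks where one set ends.
    while date && ntpq -c peers "$host" && ntpq -c rv "$host"; do
        echo '----'
        sleep "$interval"
    done
}
```

For example, watch_ntp localhost 10 repeats the original loop with a timestamp and a separator between rounds.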

Leap-second flag not forwarded to NTP clients

FYI: this seems to be version-specific behavior. With a "forward"-port package from Lenny (4.2.4p4+dfsg-8lenny3) on Squeeze, it works as expected: the leap-second fields are forwarded to the clients.
