Likely causes of NTPD dying unexpectedly and solutions

Tags: amazon-s3, ntp, ntpd, service, virtual-machines

We run a web application that uses S3 for document storage, and ntpd keeps dying on the server, roughly once or twice a day. Very little information is logged when this happens; when I check the status, the PID file exists but the service is dead.

Can anyone suggest likely causes of ntpd dying? I assume clock drift may be causing it, but I am not sure what would cause the drift either. The machine has more than enough memory and free disk space.

The last time the service died, this was the output:

Sep  6 06:15:25 vm02 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="988" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Sep  6 06:17:06 vm02 ntpd[10803]: 0.0.0.0 0618 08 no_sys_peer
Sep  6 08:01:10 vm02 ntpd[10803]: 0.0.0.0 0617 07 panic_stop -28101 s; set clock manually within 1000 s.
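The panic_stop line carries the offset that ntpd measured just before it exited. As a rough sketch (the regular expression and the function name parse_panic_offset are made up for illustration and are not part of any NTP tooling), the value can be pulled out of the log line and converted to hours:

```python
import re

# Matches the offset in an ntpd "panic_stop" syslog line, e.g.
# "... panic_stop -28101 s; set clock manually within 1000 s."
PANIC_RE = re.compile(r"panic_stop\s+([+-]?\d+(?:\.\d+)?)\s*s;")


def parse_panic_offset(line):
    """Return the offset in seconds from a panic_stop line, or None."""
    match = PANIC_RE.search(line)
    return float(match.group(1)) if match else None


# The log line quoted above, split only to keep the source line short.
line = ("Sep  6 08:01:10 vm02 ntpd[10803]: 0.0.0.0 0617 07 panic_stop "
        "-28101 s; set clock manually within 1000 s.")
offset = parse_panic_offset(line)
print(f"offset: {offset:.0f} s = {offset / 3600:.2f} hours")
```

For the line above this works out to an offset of about 7.8 hours, well beyond the 1000 s limit mentioned in the message.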

Best Answer

I don't think there is a quick way to find the exact reason.

We had similar issues before in our ESXi environment. To cut a long story short, we found that the ESXi host's clock drifted a lot and the guest VMs were syncing time from both the ESXi host and an upstream NTP server. This confused ntpd on the VMs, so it died quite often.

We also found that, in some rare cases, random packet loss caused ntpd to quit, because the round-trip time between your server and the upstream NTP server is used to calculate the time offset.

In both of these cases, if ntpd sees a massive time offset (by default, more than 1000 s), it quits. The -g option will help a bit:

   -g      Normally,  ntpd  exits  with  a  message to the system log if the offset exceeds the panic threshold,
           which is 1000 s by default. This option allows the time to be set to any value  without  restriction;
           however,  this  can  happen only once. If the threshold is exceeded after that, ntpd will exit with a
           message to the system log. This option can be used with the -q and -x options. See the tinker command
           for other options.
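The thresholds can be illustrated with a small model. This is only a sketch of the behaviour described in the man page excerpt, not ntpd's actual logic: the function classify_offset and its parameter names are made up for illustration, the 128 ms step threshold is ntpd's documented default, and real ntpd applies additional timing rules before stepping, which are ignored here.

```python
STEP_THRESHOLD_S = 0.128    # ntpd's documented default step threshold
PANIC_THRESHOLD_S = 1000.0  # default panic threshold from the excerpt above


def classify_offset(offset_s, g_option=False, first_adjustment=False):
    """Simplified model: return 'slew', 'step' or 'exit' for an offset."""
    magnitude = abs(offset_s)
    if magnitude > PANIC_THRESHOLD_S:
        # -g waives the panic check, but only for the first adjustment.
        if g_option and first_adjustment:
            return "step"
        return "exit"
    if magnitude > STEP_THRESHOLD_S:
        return "step"
    return "slew"


print(classify_offset(-28101))                                   # no -g
print(classify_offset(-28101, g_option=True, first_adjustment=True))
print(classify_offset(-28101, g_option=True, first_adjustment=False))
```

In this model -g only rescues the first large correction after startup; an offset like the one in the question's log, appearing hours into a run, still ends in an exit.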

Have a look at the system log, which should contain messages that give you a hint. You could also monitor the output of "ntpq -p" to get a rough idea of how the offset develops.
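One way to watch the offset over time is to parse the peer table that "ntpq -p" prints. The sketch below assumes the usual ten-column layout (remote, refid, st, t, when, poll, reach, delay, offset, jitter) with a tally character in the first column; the function name parse_ntpq_peers and the sample peer rows are made up for illustration.

```python
def parse_ntpq_peers(text):
    """Parse 'ntpq -p' text into a list of (tally, remote, offset_ms)."""
    peers = []
    in_body = False
    for line in text.splitlines():
        if line.startswith("="):
            in_body = True  # the row of '=' separates header from peers
            continue
        if not in_body or not line.strip():
            continue
        # The first character is the tally code ('*', '+', '-', ' ', ...).
        tally, fields = line[0], line[1:].split()
        if len(fields) != 10:
            continue  # unexpected layout; skip rather than guess
        peers.append((tally, fields[0], float(fields[8])))
    return peers


# Hypothetical sample output; the hosts and numbers are invented.
SAMPLE = "\n".join([
    "     remote           refid      st t when poll reach   delay   offset  jitter",
    "==============================================================================",
    "*time.example.or 192.0.2.1        2 u   35   64  377    1.234   -0.567   0.089",
    "+ntp2.example.or 192.0.2.2        2 u   40   64  377    2.345    3.210   0.123",
])

for tally, remote, offset_ms in parse_ntpq_peers(SAMPLE):
    print(f"{tally} {remote:<16} offset {offset_ms:+.3f} ms")
```

To use it against a live daemon, the text could come from subprocess.run(["ntpq", "-p"], capture_output=True, text=True).stdout, logged every few minutes so that a sudden jump in offset can be matched against other events on the host.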

Related Solutions

Compare NTPD and ntpdate

The NTP algorithm includes information to allow you to calculate and fix the drift in your server's clock. NTPD includes the ability to use this to keep your clock in sync and will run more accurately than a clock on a computer not running NTPD. NTPD will also use several servers to improve accuracy.

ntpdate does not keep any state to perform this service for you, so it will not provide the same kind of accuracy. You can give it a list of servers, which it will use to try to produce a better result, but this is no substitute for the sophisticated algorithms in NTPD that track your drift from each of the servers over time.

NTPDATE corrects the system time instantaneously, which can cause problems with some software (e.g. destroying a session which now appears old). NTPD intentionally corrects the system time slowly, avoiding that problem. You can add the -g switch when starting NTPD to allow NTPD to make the first time update a big one which is more or less equivalent to running ntpdate once before starting NTPD, which at one time was recommended practice.
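A rough calculation shows why a large first correction is stepped rather than slewed. ntpd slews the clock at no more than 500 ppm (500 microseconds per second), so, assuming that maximum rate is sustained, the time needed to remove an offset is the offset divided by 0.0005. The helper name slew_duration_s below is made up for this sketch.

```python
MAX_SLEW_PPM = 500  # ntpd's maximum slew rate: 500 microseconds per second


def slew_duration_s(offset_s, slew_ppm=MAX_SLEW_PPM):
    """Seconds needed to remove offset_s by slewing at slew_ppm."""
    return abs(offset_s) / (slew_ppm * 1e-6)


for offset in (0.1, 1.0, 1000.0):
    seconds = slew_duration_s(offset)
    print(f"{offset:>7.1f} s offset -> {seconds / 3600:.2f} hours of slewing")
```

At that rate a 1 s offset takes about 2000 s (roughly 33 minutes) to remove, and a 1000 s offset would take about 23 days, which is why stepping the clock once at startup is the practical choice.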

As for security concerns, ntp servers do not connect back on uninitiated connections, which means your firewall should be able to tell that you initiated the ntp request and allow return traffic. There should be no need to leave ports open for arbitrary connections in order to get NTPD to work.

From the ntpdate(8) man page:

ntpdate can be run manually as necessary to set the host clock, or it can be run from the host startup script to set the clock at boot time. This is useful in some cases to set the clock initially before starting the NTP daemon ntpd. It is also possible to run ntpdate from a cron script. However, it is important to note that ntpdate with contrived cron scripts is no substitute for the NTP daemon, which uses sophisticated algorithms to maximize accuracy and reliability while minimizing resource use. Finally, since ntpdate does not discipline the host clock frequency as does ntpd, the accuracy using ntpdate is limited.

Linux – System clock drifting out of sync with hwclock and ntpd

Add these parameters to the kernel boot line:

notsc divider=10 clocksource=acpi_pm

and restart your system. This is related to the question "RHEL 5 Hyper-V Guest - Cannot sync with NTP after kernel upgrade".
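Before and after changing the boot parameters, it can be useful to check which clocksource the kernel is actually using. A minimal sketch, assuming a Linux kernel that exposes the standard sysfs clocksource files (the function name read_clocksources is made up for illustration):

```python
from pathlib import Path

# Standard sysfs location for clocksource information on Linux.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(sysfs_dir=SYSFS_DIR):
    """Return (current, available) clocksources, or (None, []) if unreadable."""
    base = Path(sysfs_dir)
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        # Not Linux, or the files are missing or unreadable.
        return None, []
    return current, available


current, available = read_clocksources()
print("current clocksource:", current)
print("available:", ", ".join(available) or "(unknown)")
```

If the current value is not acpi_pm after rebooting with clocksource=acpi_pm, the parameter was probably not picked up or that source is not available on the machine.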
