Virtualization NTP – Limits of Running NTP Servers in Virtual Machines

ntp virtualization

I want to set up several Stratum 2 time servers on my local network. Virtual machines would certainly be a cheaper way to do this than buying three 1U servers. What limitations would doing so impose? That is, to what degree will accuracy be adversely impacted?

Additionally, my instinct is that these local time servers should reside on different physical machines in order to mitigate any hardware irregularities. Is this intuition correct?

Edit
I should say that by "virtual machines" I didn't specifically mean VMware. Rather, I meant the general concept of virtualized instances.

Best Answer

The simple fact is that clock accuracy inside a VM is still really bad. There are a few causes, but the killer is that the time drift is not constant; the drift factor changes from moment to moment. NTP has clock compensation built in, but it was designed on the assumption of a roughly static drift factor. For example, if a physical machine loses 12 seconds every 30 days, NTP can compensate for that and does so very well. But if that machine can lose anywhere from 4 to 70 seconds every 30 days, NTP isn't nearly as good at tracking that level of change.
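To put those figures in the unit NTP tools usually report, here is a small sketch that converts "seconds lost per N days" into parts per million (ppm). The function name `drift_ppm` is made up for illustration; the inputs are the numbers from the paragraph above.

```python
SECONDS_PER_DAY = 86_400


def drift_ppm(seconds_lost: float, days: float) -> float:
    """Average clock frequency error, in parts per million."""
    return seconds_lost / (days * SECONDS_PER_DAY) * 1_000_000


# Physical machine with a steady drift: 12 seconds every 30 days.
steady = drift_ppm(12, 30)

# VM whose drift wanders between 4 and 70 seconds every 30 days.
low, high = drift_ppm(4, 30), drift_ppm(70, 30)

print(f"steady drift:  {steady:.2f} ppm")
print(f"varying drift: {low:.2f} to {high:.2f} ppm")
```

The steady case works out to a single correction of about 4.6 ppm, which NTP can learn once and keep applying. The VM case is a moving target spanning roughly 1.5 to 27 ppm, and that is before the minute-to-minute swings are taken into account.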

What makes it really hard for NTP to keep up in a VM environment is that the local clock it sees can change its drift factor over the course of a minute. Depending on how often NTP polls its upstream time sources, it can see major drift-factor changes between polls and go out-of-sync far more often. Out-of-sync time on a time server cascades throughout your organization.

NTP on a local network is a relatively low-impact protocol with a very small memory footprint, so it does not need dedicated hardware; it can happily piggy-back on your other network infrastructure servers, such as your DNS and DHCP servers. Some routers can also provide NTP functionality, so you may want to look into that.

Ideally you want two separate servers in separate locations that each sync against a different set of upstream (lower stratum number) servers. It would also be a very good idea if both time servers were configured to use the other server as a 'peer', which will minimize the impact on time service should one of the upstream time sources go awry; there will be a stratum change, but at least the server won't report out-of-sync. And finally, be nice to your upstream time providers and configure your servers to go a very long time between polls once time is well established. This is the 'maxpoll' parameter on the 'server' line. Its value is an exponent: the longest interval between sync attempts is 2 to the power of maxpoll, in seconds, so a maxpoll of 10 means 1024 seconds, or about 17 minutes.
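As a rough sketch of that layout, the snippet below prints the relevant ntp.conf lines for each of the two servers. The host names are placeholders, the helper `ntp_conf` is made up for illustration, and `iburst` is an optional ntpd keyword I have added for faster initial sync; adjust all of these to your own setup.

```python
def ntp_conf(upstreams: list[str], peer: str, maxpoll: int = 10) -> str:
    """Render ntp.conf lines: one 'server' per upstream, plus one 'peer'."""
    lines = [f"server {host} iburst maxpoll {maxpoll}" for host in upstreams]
    lines.append(f"peer {peer}")
    return "\n".join(lines)


# Placeholder names: each local server gets its own set of upstreams
# and lists the other local server as its peer.
servers = {
    "ntp-a.example.internal": (
        ["up1.example.net", "up2.example.net"],
        "ntp-b.example.internal",
    ),
    "ntp-b.example.internal": (
        ["up3.example.net", "up4.example.net"],
        "ntp-a.example.internal",
    ),
}

for name, (upstreams, peer) in servers.items():
    print(f"# ntp.conf fragment for {name}")
    print(ntp_conf(upstreams, peer))
    print()
```

Giving each server a disjoint upstream list means a single bad upstream can only mislead one of them, and the peer line gives that one a sane fallback.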

If you absolutely had to use VMs for this, I'd set up no fewer than three such NTP servers. Each of those needs to be on a different host, and if possible in a different data center. As suggested above, they need different time sources and should peer with each other. Then configure all of your NTP clients to use all three as upstream sources. Make sure your maxpoll values are low enough that a server never goes more than an hour and a half between sync packets to off-network sources, or more than 30 minutes to on-network ones. Chances are good that at least one of the three will be in sync at any given time. Clients that can only talk to one time host will just have to put up with the occasional out-of-sync event. Overall, time quality in this scenario would not be as exact as it would be with physical servers.
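Since maxpoll is an exponent rather than a number of seconds, those two limits have to be rounded down to a power of two. A minimal sketch of that arithmetic follows; the function name is made up, and the bounds of 3 and 17 are my assumption about the range ntpd accepts, so check your version's documentation.

```python
# Assumed range of exponents accepted by ntpd; verify for your version.
MIN_EXP, MAX_EXP = 3, 17


def maxpoll_for(limit_seconds: int) -> int:
    """Largest maxpoll exponent whose interval (2**exp s) stays within the limit."""
    if limit_seconds < 1:
        raise ValueError("limit must be at least one second")
    exp = int(limit_seconds).bit_length() - 1  # floor(log2(limit)), exact for ints
    return max(MIN_EXP, min(MAX_EXP, exp))


for label, limit in [("off-network", 90 * 60), ("on-network", 30 * 60)]:
    exp = maxpoll_for(limit)
    print(f"{label}: maxpoll {exp} -> at most {2 ** exp} s between polls")
```

For the limits above this gives maxpoll 12 (4096 seconds, about 68 minutes) for off-network sources and maxpoll 10 (1024 seconds, about 17 minutes) for on-network ones.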

If I had to ballpark it, I'd say your consensus time in the pure-VM environment would probably be within, oh, 30 to 100 ms of true time. In a purely physical environment, your consensus time would probably be within 10 ms once the time servers had been up long enough for time to settle.