NTP synchronization status persists with zero reach

ntp, ntpd

Usually, when a machine loses its connection completely, ntpd misses a couple of polls and marks all sources as failing the sanity checks, which seems quite logical. But I've encountered a situation where a server stayed marked as the current time source while its reach dropped to 0.

The server is deployed in the same subnet as the target machine, which gives very low delay, offset and jitter. The situation was produced by breaking the connection physically: just unplugging the cable from the client machine. I tried to recreate it, but since then the same machine has always lost its synchronization status cleanly after 5-6 unsuccessful polls.

The real question is: what exactly determines the synchronization status when the connection is lost?

Best Answer

There is a definite explanation of the reach register in RFC-1305:

The reachability register is shifted one position to the left, with zero replacing the vacated bit. If all bits of this register are zero, the clear procedure is called to purge the clock filter and reselect the synchronization source, if necessary. If the association was not configured by the initialization procedure, the association is demobilized.
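To make the RFC-1305 rule concrete, here is a minimal Python sketch of the per-poll shift it describes. The `Peer` class, its field names and the `shift_reach`/`clear` functions are hypothetical illustrations, not ntpd's actual data structures, and reselection of the synchronization source is deliberately left out.

```python
from dataclasses import dataclass, field


# Hypothetical peer model; field names are illustrative, not ntpd internals.
@dataclass
class Peer:
    configured: bool  # created by the initialization procedure (e.g. a "server" line)
    reach: int = 0  # 8-bit reachability register
    filter_samples: list = field(default_factory=list)  # clock filter contents
    mobilized: bool = True


def clear(peer: Peer) -> None:
    """Stand-in for the RFC-1305 clear procedure: purge the clock filter.

    Reselecting the synchronization source is not modelled here.
    """
    peer.filter_samples.clear()


def shift_reach(peer: Peer) -> None:
    """Called once per poll: shift left, zero fills the vacated bit."""
    peer.reach = (peer.reach << 1) & 0xFF
    if peer.reach == 0:
        clear(peer)
        if not peer.configured:
            # Associations not set up at initialization are demobilized.
            peer.mobilized = False
```

Starting from a register with only the lowest bit set, it takes eight shifts before it reads zero and the clear procedure runs; a configured peer keeps its association, while an ephemeral one is demobilized.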

However, RFC-1305 is obsoleted by RFC-5905, which is not as explicit:

Next, the 8-bit p.reach shift register in the poll process described in Section 13 is used to determine whether the server is reachable and the data are fresh. The register is shifted left by one bit when a packet is sent and the rightmost bit is set to zero. As valid packets arrive, the rightmost bit is set to one. If the register contains any nonzero bits, the server is considered reachable; otherwise, it is unreachable.
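A minimal sketch of the register exactly as the RFC-5905 passage words it, assuming one `packet_sent` call per transmitted poll and one `valid_packet_received` call per valid reply; `ReachRegister` and its method names are made up for illustration.

```python
class ReachRegister:
    """8-bit reach shift register following the RFC-5905 description."""

    def __init__(self) -> None:
        self.value = 0

    def packet_sent(self) -> None:
        # Shift left by one bit; the rightmost bit becomes zero.
        self.value = (self.value << 1) & 0xFF

    def valid_packet_received(self) -> None:
        # A valid reply sets the rightmost bit to one.
        self.value |= 1

    @property
    def reachable(self) -> bool:
        # Any nonzero bit means the server is considered reachable.
        return self.value != 0


reg = ReachRegister()
for _ in range(7):  # seven answered polls
    reg.packet_sent()
    reg.valid_packet_received()
print(oct(reg.value))
```

Seven answered polls leave the register at `0o177`, which `ntpq` would print as 177. After that, the server is still `reachable` following seven unanswered polls and only becomes unreachable on the eighth.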

No clear procedure is mentioned in Section 13. Still, it seems that an unreachable peer should lose its synchronization status at some point.

I managed to recreate the situation of a synchronized status with reach 0, and confirmed that it is rare and not at all permanent. It required disabling "burst" in the server's configuration and breaking the connection right after synchronization was established.

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 91.198.10.4     194.190.168.1    2 u   20   64  177   51.137   -2.192  11.049
 192.168.1.1     193.67.79.202    2 u   65   64   77    0.459   -1.818   0.922
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*91.198.10.4     194.190.168.1    2 u   21   64  177   51.137   -2.192  11.049
+192.168.1.1     193.67.79.202    2 u    -   64  177    0.449   -3.192   1.828

The reach was 177, an octal value that is 01111111 in binary. So it took 7 polls to establish synchronization.
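The `reach` column in `ntpq` output is printed in octal, so 177 has to be read as base 8 before converting it to binary. A small sketch of that conversion; the helper names are hypothetical.

```python
def reach_bits(reach_field: str) -> str:
    """Convert ntpq's octal reach column (e.g. "177") to an 8-bit binary string."""
    return format(int(reach_field, 8), "08b")


def successful_polls(reach_field: str) -> int:
    """Count the polls, among the last eight, that got a valid reply."""
    return reach_bits(reach_field).count("1")
```

`reach_bits("177")` returns `"01111111"` (7 answered polls), and the 77 shown for 192.168.1.1 in the first snapshot decodes to `"00111111"`, i.e. 6.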

Synchronization was then lost at this point:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+91.198.10.4     194.190.168.1    2 u  574   64    0   63.846   -9.652   0.756
*192.168.1.1     193.67.79.202    2 u  553   64    0    0.449   -3.192   0.505
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 91.198.10.4     194.190.168.1    2 u  575   64    0   69.871  -10.409   0.002
 192.168.1.1     193.67.79.202    2 u  554   64    0    0.449   -3.192   0.505

The numbers are a little strange, as 64*9 = 576, not 575, but I guess the displayed value may be off by one second. Taking this into account, it took 9 unsuccessful polls to break the synchronization status.
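The arithmetic above can be checked with a short sketch. It assumes that `when` is the number of seconds since the last packet was received and that the poll interval stayed at 64 s throughout; both function names are hypothetical.

```python
POLL = 64  # poll interval in seconds, taken from the "poll" column


def polls_elapsed(when: int, poll: int = POLL) -> float:
    """Seconds since the last received packet, expressed in poll intervals."""
    return when / poll


def misses_until_unreachable(reach: int) -> int:
    """Consecutive unanswered polls before an 8-bit reach register reads 0."""
    misses = 0
    while reach:
        reach = (reach << 1) & 0xFF  # each unanswered poll shifts in a zero
        misses += 1
    return misses
```

`polls_elapsed(575)` gives 8.984375, just short of 9, and `misses_until_unreachable(0o177)` gives 8. Under the shift-register model, a source that last showed 177 reads reach 0 after the eighth unanswered poll and can keep that zero for about one more poll interval. This is at least consistent with a source still being marked `*` with reach 0 shortly before synchronization is lost.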

So, considering both theory and practice, a state in which a source with zero reach is still considered the current time source appears to be possible, but rare and temporary.