Electronic – In clock recovery, how is the recovered clock used to recover data

clock, clock-recovery, flipflop, pll

I've been refreshing my memory on clock recovery, and I've hit some issues trying to understand how the recovered clock can be practically used to latch data bits from the input data stream.

For simplicity, let's assume an NRZ-encoded data stream, such as an 8b/10b-coded serial link. Because of NRZ encoding, the data stream transitions whenever a logical 0 follows a logical 1 or vice versa. Any transition on a wire carrying NRZ data is therefore due to the transmitter clock latching a new bit to be sent.
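For concreteness, here is a tiny Python sketch (the bit pattern is made up, purely for illustration) showing that the edges of an NRZ stream fall exactly where consecutive bits differ, i.e. at most one edge per transmitter clock period:

```python
# Illustration only: NRZ holds the bit value for one unit interval (UI),
# so a transition occurs exactly where consecutive bits differ.
bits = [1, 0, 1, 1, 0, 0, 1, 0]          # made-up payload bits

# UI boundaries at which the line actually changes level
transitions = [i for i in range(1, len(bits)) if bits[i] != bits[i - 1]]

print("bits:             ", bits)
print("transitions at UI:", transitions)   # [1, 2, 4, 6, 7]
```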

Assume an analog PLL whose VCO generates a square wave, an edge detector on the data stream input which creates a positive pulse on each transition of the data stream (see page 34 of [2]), and a positive-edge-triggered phase-frequency detector (see page 35 of [2]) which generates the phase-difference signal.

In a traditional clock recovery setup using a PLL/edge detector, the recovered clock's positive edge will eventually align to the transitions in the received bit stream, and thus be aligned to the transmitter's clock.

The problem I see with aligning to the transmitter's clock is that, when the recovered clock is used to latch the input data into a flip-flop, the data stream at the flip-flop's input will transition immediately after latching (or possibly even before, due to jitter; a PLL can't lock onto the exact frequency). However small, that is a hold-time (or, with jitter, setup-time) violation for the flip-flop. Additionally, I recall that sampling as far away from transitions as possible is ideal to accommodate jitter.

However, none of the sources I've been reading discuss any solution to my perceived issue of "using the recovered clock as-is to shift in input data". The closest I've seen is a diagram implying that the recovered clock should clock a flip-flop fed with the input data stream.

The naive solution I would use would be to "invert the recovered clock before feeding it to the flip-flop which latches the input data". Assuming the problem I perceive exists, what solutions are used to work around the issue?

Best Answer

I think this may be glossed over in some of the literature because obviously you want to sample the data in the middle of the data bit (sampling accurately in the middle of the bit is a large part of ensuring high jitter tolerance), so of course you're going to phase shift or delay the clock or the data somewhere along the line by 90 or 180 degrees; it's just not always called out explicitly. There are a bunch of ways of doing that. Inverting the clock is one. A fixed phase shift with an analog technique such as a filter, hybrid coupler, or delay line also works. Quadrature or differential outputs on the VCO are another option. If you're not using a PLL but instead phase interpolators or tapped delay lines, the usual solution is to use two phase interpolators or delay-line taps that are held 90 degrees apart by the control logic: one looks at the edges and the other looks at the data.
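To put a number on the jitter argument, here is a minimal Python sketch (the oversampling ratio, jitter magnitude, and ideal-waveform model are all assumptions for illustration, not taken from any part): it samples an NRZ stream once with the sampling instant aligned to the transitions and once half a unit interval later, which is what simply inverting the recovered clock would give you.

```python
import random

# Illustration only: compare sampling an ideal NRZ waveform with a jittery clock
# whose edge is aligned to the data transitions vs. shifted half a UI later
# (which is what inverting the recovered clock achieves). All numbers are assumed.
random.seed(0)
OSR = 16                                       # samples per unit interval (UI)
JITTER = 3                                     # peak sampling jitter, in samples
bits = [random.randint(0, 1) for _ in range(200)]
wave = [b for b in bits for _ in range(OSR)]   # ideal NRZ waveform, no ISI

def recover(offset):
    """Sample each UI at 'offset' samples past the nominal transition instant."""
    out = []
    for i in range(len(bits)):
        t = i * OSR + offset + random.randint(-JITTER, JITTER)
        t = min(max(t, 0), len(wave) - 1)
        out.append(wave[t])
    return out

print("errors sampling at the transitions:",
      sum(a != b for a, b in zip(recover(0), bits)))
print("errors sampling mid-bit:           ",
      sum(a != b for a, b in zip(recover(OSR // 2), bits)))
```

With the sampling point pushed to mid-bit, the same amount of jitter causes no errors, which is the whole point of shifting the clock or the data before the capture flip-flop.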

Let's take a look at a couple of commercial parts and see how they do it. First, a 30 Gbps GTY transceiver out of a Xilinx Ultrascale FPGA (from UG578, page 192):

Virtex Ultrascale GTY RX CDR block diagram

There you have it: two phase interpolators, one looking at edges and one looking at data. The control logic detects transitions and checks whether it is sampling the edges too early or too late, then adjusts the phase interpolator taps accordingly, keeping a 90 degree offset between the two so the data is always sampled exactly halfway between the transitions it has locked on to. It can track a frequency difference between the internally generated reference frequency (half line rate) and the actual receive line rate of up to +/- 200 ppm (above 8 Gbps).
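As a rough sketch of the kind of decision that control logic makes (this is the generic bang-bang / Alexander early-late scheme, not Xilinx's actual implementation, and the interpolator step sizes are made up), each data transition tells you whether the edge sampler fired before or after the transition:

```python
def early_late(prev_data, edge, curr_data):
    """Classify one UI: 'early', 'late', or None when there was no transition."""
    if prev_data == curr_data:
        return None                 # no data transition, no phase information
    if edge == prev_data:
        return 'early'              # edge sample still shows the old bit -> sampling too early
    return 'late'                   # edge sample already shows the new bit -> sampling too late

# Assumed resolution: 64 interpolator steps per UI, so half a UI is 32 steps.
HALF_UI = 32
edge_phase = 0                      # edge-sampler phase, in interpolator steps

# Made-up sampler outputs: (previous data, edge sample, current data) per UI.
for prev_d, e, curr_d in [(0, 0, 1), (1, 1, 0), (0, 0, 0), (1, 0, 0)]:
    verdict = early_late(prev_d, e, curr_d)
    if verdict == 'early':
        edge_phase += 1             # push both sampling instants later
    elif verdict == 'late':
        edge_phase -= 1             # pull both sampling instants earlier

data_phase = edge_phase + HALF_UI   # data sampler held a fixed half UI after the edge sampler
print(edge_phase, data_phase)
```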

Here's what those sample points look like (UG578, page 193):

CDR Sampler Positions

How about a part that actually uses a PLL with a VCO to recover the data? Well, this technique seems to have fallen out of favor, at least for modern high speed serial stuff. Not entirely sure why, but I presume it's because building VCOs is a pain, and if you use phase interpolators you can share a VCO across several transmitters and receivers instead of requiring one per receiver. Anyway, here is the block diagram for a Lucent LG1600FXH, an older (1999!) part for retiming SONET up to 5.5 Gbps (LG1600FXH datasheet, page 2):

LG1600FXH block diagram

Hey, look at that, their VCO has quadrature outputs! Actually, that's a bit of a red herring. In this case they are using the in-phase output to clock the capture flip-flop, but they also aren't locking the VCO onto the data directly; they're locking onto the output of an edge detector (LG1600FXH datasheet, page 3):

LG1600FXH Frequency and Phase Detector

The edge detector uses a tuned delay line and an XOR gate to produce pulses that the PLL locks on to. These pulses start on transitions, but the pulses are tuned by the delay line to be exactly half of the data bit width (LG1600FXH datasheet, page 3):

LG1600FXH Timing diagram

It looks like, the way the phase detection logic works out, the PLL actually locks inverted with respect to the edge pulses. Because of the tuned delay from the edge detector, the PLL locks with the in-phase output's rising edge smack in the middle of the data bit.
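Here's a small discrete-time Python sketch of that delay-and-XOR trick (the oversampling ratio and bit pattern are assumptions, and the real part does this with an analog delay line rather than samples): every transition produces a pulse exactly half a unit interval wide.

```python
# Illustration only: delay-and-XOR edge detection on an oversampled NRZ waveform.
OSR = 8                                          # samples per unit interval (assumed)
bits = [1, 0, 0, 1, 1, 1, 0, 1]                  # made-up data
data = [b for b in bits for _ in range(OSR)]     # NRZ waveform

DELAY = OSR // 2                                 # tuned delay: half a bit period
delayed = [data[0]] * DELAY + data[:-DELAY]      # delay-line output

pulses = [d ^ q for d, q in zip(data, delayed)]  # XOR gate output

print(''.join(map(str, data)))
print(''.join(map(str, pulses)))                 # a half-UI-wide pulse at every transition
```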

I will also note that the LG1600FXH is actually a hybrid integrated circuit with several discrete components on a ceramic substrate. That's probably the only real way to get away with building a stub delay line based edge detector like that. The LG1600FXH datasheet also has a rather extensive theory of operation section; I recommend taking a look at it.

A major advantage of the phase interpolator based CDR circuits is that they are usually capable of operating over a very wide range of line rates, and they are relatively easy to reconfigure for a different line rate. For example, the GTY transceivers in the Xilinx Ultrascale series FPGAs are capable of covering essentially the entire range of 500 Mbps to 30 Gbps, switching between two different PLLs and several divider settings as necessary. PCI Express links always come up initially in gen 1 mode (2.5 Gbps per lane) and then negotiate up to higher speeds (gen 2 at 5 Gbps or gen 3 at 8 Gbps per lane). The links can also be renegotiated on the fly for power/performance trade-offs (for example, a laptop's discrete GPU dropping down to gen 1 when not actively used, then switching back to gen 2 or gen 3 when watching a video or playing a game).

For the LG1600FXH and other CDRs based on analog delay lines, the problem is that the edge detector generates pulses of a fixed duration. As a result, the operating range is much more limited, only a few percent around the design line rate. As the line rate diverges from the design line rate, the jitter performance degrades because the data sampling point moves away from the center of the bit. Further out still, the edge detector and phase detector stop working properly and the PLL won't lock reliably. And the delay line can't be re-tuned, since it's physically cut to length during manufacture.
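As a back-of-the-envelope illustration of that drift (assuming the sampling instant stays a fixed half of a *design* bit period after each transition, which is what a fixed-length delay line gives you; the design rate below is just a placeholder number):

```python
# Rough estimate of how far off bit centre the sampling point lands when the
# actual line rate deviates from the rate the delay line was cut for.
design_rate = 2.5e9                               # placeholder design line rate, bit/s
fixed_delay = 0.5 / design_rate                   # sampling instant: half a design bit after the edge

for mismatch in (0.00, 0.01, 0.02, 0.05):         # fractional line-rate deviation
    actual_ui = 1 / (design_rate * (1 + mismatch))
    offset = fixed_delay - 0.5 * actual_ui        # distance from the true bit centre
    print(f"{mismatch:>4.0%} fast: sampling point {offset / actual_ui:+.1%} UI off centre")
```

Roughly speaking, each percent of line-rate mismatch costs about half a percent of a UI of sampling-point centring, well before the edge and phase detectors stop working altogether.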