Electronic – Synchronize Outputs of Separate FPGAs Within 1ns

fpgatimingtiming-analysis

Edit: I have been able to achieve 560ps uncertainty in simulation by using external PLL feedback through the entire chip. Once I verify in real hardware I will post a complete solution.

I'm trying to synchronize the outputs of two physically separated Spartan-6 FPGAs. Ideally, the skew between the two device's outputs would be less than 1ns.

The outputs are differential pins controlled by an OSERDES2.

One possibility might be to distribute a reference clock to the two FPGAs. The signal would be routed so that it arrives at each FPGA at "exactly the same" time.

The reference clock would be passed through a PLL, with feedback through a BUFIO2_FB device. As shown in the clocking resources example:

Will this feedback mechanism cancel out clock insertion delay, so that the IOCLK is in phase with clock input pin?

The ISERDES2 in the same IOCLK domain would register a reference data signal which simply denotes a particular clock edge to align the output with.

Are there other device variations which will prevent the outputs from being deterministically in phase?

For example I'm concerned the propagation delay from OSERDES2 to output pin could vary greatly between devices. Based on timing datasheet:

Does this specification of 0.94 ns mean that the propagation can be anywhere from 0 to 0.94ns across different devices?

Is there any more sound approach to this problem?

Best Answer

Is there any more sound approach to this problem?

Yes!!

Put both of the FPGA functions into a single device.

Trying to maintain determinism to with 1ns for separate devices is close to a pipe dream.

Related Solutions

Electronic – Understanding max and min propagation delay in flip-flops

The first equation tells you the maximum propagation delay allowable through a block of combinational logic between two clocked registers. For ideal flip-flops the delay would just be the clock period, \$T_C\$, but for real flip-flops you have to subtract the time from the clock edge until the inputs of the combinational block (the outputs of the first set of flip-flops) become stable, \$t_{PCQ}\$, and the required setup time (when the outputs of the combinational block must be stable before the clock edge on the second set of flip-flops), \$t_{setup}\$.

The other side of the timing analysis is that the flip-flop inputs must remain stable for a short period of time after the clock edge to make sure that the inputs are properly latched. This characteristic of a flip-flop is called the hold time, \$t_{hold}\$. In order to satisfy the second flip-flop's hold time requirement, it is necessary that the clock-to-q contamination delay, \$t_{ccq}\$, plus the contamination delay of the combinational logic, \$t_{cd}\$, must be greater than the flip-flops hold time requirement. Rearranging the equation, you can specify the relationship between a characteristic of the combinational logic, \$t_{cd}\$, to the two characteristics of the flip-flops, \$t_{hold}\$ and \$t_{ccd}\$.

In practice, you will have many combinational paths in a block of logic between two sets of registers. For the first equation you should use the largest propagation delay of any path, while for the second you should use the smallest contamination delay on any path.

Electronic – generate a 90° Clock signal with Xilinx’s ODELAY for RGMII

Maybe it's not a direct answer to your question, but i want to draw your attention on the following possible workarounds:

skew rate is controllable both at RGMII PHY and FPGA IC

Typically RGMII PHY implements a de-skewing mechanism (e.g. KSZ9021 can absorb skews up to 1.8 ns, very near to that what you need), therefore (if your phy has it, of cause) you activate it. Shift (delay) the clock at your PHY keeping the data the same. Shaded areas on the picture below explains this graphically.

Also, additionally to the PHY shifts if not enough, you can correspondingly configure slew rate at FPGA slowing the data while fasting the clock.
pcb tracing could be yet flexible, not dogmatic

(if, of cause) You can (then) route the clock trace "proportionally" longer than the data (or vise versa, depending on direct/inverted clocking).
fpga drives tx lines

You can use your output drivers normal (instead of DDR) and control them thru multiplexing like assign output[0:3] = (txclk) ? txdata[0:3] : txdata[4:7].

Your way (not only calculations) about ODELAY looks correct, but i (and i think anybody) cannot confirm/refute it because that correctness can be approved finally only in the design (board) where various side effects, like clock jitter, which are difficult to predict and simulate, can be observed and estimated.

Also, it seems slightly strange that you use non-integrally-divisible clocks of 125 (=25*5?) and 200 (=25*8?) MHz instead of 125 and 250 being divisible integrally (i.e. 250/125=2). In the case of a single-sourced, phase-aligned, divisible clocks pair, you could use the highest one to drive the lines changed on the lowest clock, with non-DDR outputs too.

EDIT 1

if TX_Clock is the transmit logic reference clock (i.e. the block is built around always @(posedge TX_Clock)) then the ODDRs (in SAME_EDGE mode) should use its 90-deg shifted version, i.e. TX_Clock90, not vice-versa. But you wrote:

The normal clock is used for ODDR registers and the phase shifted clock is send to the PHY device.

Is it correct? Could you give the link to "The reference implementation" you mentioned here?

Also, the transmit clock to an RGMII PHY should be generated as

ODDR  otxclkbuf ( .D1(1), .D2(0), .C(TX_Clock90), .CE(1), .SR(0), Q(RGMII_TXCLK) )

to be synchronous phase-by-phase with the data signals, RGMII_TXDs and RGMII_TXCTRL, as the RGMII protocol requires it.

This is noted in the 7 Series SelectIO Guide too:

Clock Forwarding

Output DDR can forward a copy of the clock to the output. This is useful for propagating a clock and DDR data with identical delays, and for multiple clock generation, where every clock load has a unique clock driver. This is accomplished by tying the D1 input of the ODDR primitive High, and the D2 input Low. Xilinx recommends using this scheme to forward clocks from the FPGA logic to the output pins.

Again, if you avoid using DCM, how do you plan to work when your PHY will in slave mode 1000BASE-T or in DPLL-based receive mode 1000BASE-X/SGMII, for both where GMII_RXCLK is a low-quality CDR-based one that could not be used directly to clock the receive logic and also the transmit logic in 1000BASE-T?

Edit 2

First, you need to distinguish what do yo want: "pure" RGMII (referred to as Original GMII in the document you mentioned) or "clock-shifted" RGMII (RGMII-ID in the document). Your rgmii.vhdl code is about the "shifted" one. Here, i recommend you to re-choose yourself to "pure" RGMII because (from the RGMII document dated 2002 and from PHY/SERDES ICs i used) any modern GbE PHY supports clock/data shifting and your has no need to sophisticate your code.

Second, for any value you'll selected for ODELAY, you'll need to approve and a hundred to one to tune it on the live board by an oscilloscope in your hands. 26 is normal, let it be your initial tap for step-by-step iterating.

Also, i recommend you to ask a new question like

how to program ODELAY right (in Xilinx FPGA)?
how to shift a clock by 90 degrees without DCM/PLL (in Xilinx FPGA)?
how to use ODDR having only one clock and no its shifted replica (in Xilinx FPGA)?

without the tags "ethernet" and "gigabit" because, as i see, your interest is about xilinx-fpga-oddr-odelay in total, with nothing about ethernet-gigabit.

Good luck.

P.S. From the code your shown, the MAC is expected to update the data at posedge !tx_clk90 while, as i can assume, your initial GMII client code has no such expectation.

Best Answer

Related Solutions

Electronic – Understanding max and min propagation delay in flip-flops

Electronic – generate a 90° Clock signal with Xilinx’s ODELAY for RGMII

Related Topic