Some time ago I implemented a GMII interface for my Gigabit Ethernet core. Now I'm trying to do the same with the RGMII protocol. The reference implementation from Xilinx uses IDELAY[|E1|E2] primitives to adjust the input delay. I would like to do the same with ODELAY…
The reference implementation uses two transmit clocks: TX_Clock and TX_Clock90 (90° phase shifted). The normal clock is used for the ODDR registers and the phase-shifted clock is sent to the PHY device.
I know I could generate that clock with a DCM/MMCM or PLL, but I don't want to redesign my entire Gigabit Ethernet stack to provide a new TX clock for my physical abstraction layer.
I thought I could use an ODELAY primitive to shift TX_Clock as desired. But how do I calculate the tap/delay count?
DS182 (Kintex-7 FPGAs Data Sheet: DC and AC Switching Characteristics), page 28, defines the ODELAY chain delay resolution as follows:
\$T_{IDELAYRESOLUTION} = \dfrac{1}{32 \cdot 2 \cdot F_{REF}} = \dfrac{1}{32 \cdot 2 \cdot 200\,MHz} = 0.078125\,ns = 78.125\,ps\$
My IDELAYCTRL primitive is sourced by a 200 MHz reference clock, and TX_Clock runs at 125 MHz (8 ns period). A 90° phase shift therefore means a delay of 2 ns, which equals 25.6 delay taps of 78.125 ps each.
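Written out, that is:

\$N_{Taps} = \dfrac{0.25 \cdot 8\,ns}{78.125\,ps} = \dfrac{2\,ns}{0.078125\,ns} = 25.6\$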
So is it correct to set the ODELAY's value to 26?
Abbreviations:
GMII – Gigabit Media Independent Interface
RGMII – Reduced GMII (using DDR technique)
ODDR – DDR output register
Best Answer
Maybe it's not a direct answer to your question, but I want to draw your attention to the following possible workarounds:
skew is controllable both at the RGMII PHY and at the FPGA
Typically an RGMII PHY implements a de-skewing mechanism (e.g. the KSZ9021 can absorb skews of up to 1.8 ns, very close to what you need), so, if your PHY has it, activate it: shift (delay) the clock at the PHY while keeping the data as it is. The shaded areas in the picture below illustrate this graphically.
Additionally, if the PHY's shift is not enough, you can configure the output slew rate at the FPGA accordingly, slowing the data edges while speeding up the clock edges.
pcb routing can be flexible, not dogmatic
If the layout allows, you can route the clock trace proportionally longer than the data traces (or vice versa, depending on direct/inverted clocking).
fpga drives tx lines
You can use normal output drivers (instead of DDR) and control them through multiplexing, like
assign txd[0:3] = (txclk) ? txdata[0:3] : txdata[4:7];
Your approach (not only the calculations) with ODELAY looks correct, but I (and, I think, anybody else) cannot confirm or refute it, because its correctness can ultimately be verified only on the real board, where side effects such as clock jitter, which are difficult to predict and simulate, can be observed and estimated.
Also, it seems slightly strange that you use non-integrally related clocks of 125 MHz (25×5) and 200 MHz (25×8) instead of 125 MHz and 250 MHz, which are integrally related (250/125 = 2). With a single-sourced, phase-aligned, integrally related clock pair you could use the faster clock to drive lines that change at the slower clock rate, again with non-DDR outputs.
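As a rough sketch of that idea (signal names are mine, and it assumes the 125 MHz phase can simply be derived as a divide-by-2 toggle inside the 250 MHz domain; swap the nibble order if your PHY expects the opposite):

    // Sketch only: one 250 MHz clock; the 125 MHz phase is a divide-by-2
    // toggle in the same domain, so the nibble-select phase is known by
    // construction and plain single-edge registers can drive the pins.
    reg       nib_sel = 1'b0;   // toggles at 125 MHz rate
    reg [3:0] rgmii_txd;
    reg       rgmii_txc;

    always @(posedge clk250) begin
        nib_sel   <= ~nib_sel;
        rgmii_txd <= nib_sel ? txdata[3:0] : txdata[7:4];
        rgmii_txc <= nib_sel;   // forwarded clock, edge-aligned with the data
    end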
EDIT 1
if TX_Clock is the transmit-logic reference clock (i.e. the block is built around
always @(posedge TX_Clock)
), then the ODDRs (in SAME_EDGE mode) should use its 90°-shifted version, i.e. TX_Clock90, not vice-versa. But you wrote that the normal clock is used for the ODDR registers and the shifted one is sent to the PHY. Is that correct? Could you give a link to "the reference implementation" you mentioned?
Also, the transmit clock to an RGMII PHY should be generated so that it is phase-synchronous with the data signals, RGMII_TXDs and RGMII_TXCTRL, as the RGMII protocol requires.
This is noted in the 7 Series SelectIO Guide, too.
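For illustration, the usual clock-forwarding structure from that guide is an ODDR with constant D1/D2 inputs driving the TXC pad, so the clock takes the same output path as the data. A sketch only; the 90°-shifted clock here follows your own description, and the signal names are mine:

    // Forward the TX clock through an ODDR instead of routing a clock net
    // straight to the pad; this keeps clock and data on identical output
    // structures and thus phase-related.
    ODDR #(
        .DDR_CLK_EDGE ("SAME_EDGE"),
        .INIT         (1'b0),
        .SRTYPE       ("SYNC")
    ) oddr_txc_inst (
        .C  (tx_clock90),   // 90-degree-shifted transmit clock
        .CE (1'b1),
        .D1 (1'b1),         // constant 1/0 pattern reproduces the clock at the pin
        .D2 (1'b0),
        .R  (1'b0),
        .S  (1'b0),
        .Q  (rgmii_txc)     // to the RGMII_TXC output buffer
    );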
Again, if you avoid using a DCM, how do you plan to work when your PHY is in 1000BASE-T slave mode, or in the DPLL-based receive mode of 1000BASE-X/SGMII? In both cases GMII_RXCLK is a low-quality, CDR-derived clock that cannot be used directly to clock the receive logic, nor the transmit logic in 1000BASE-T.
EDIT 2
First, you need to decide what you want: "pure" RGMII (referred to as original RGMII in the document you mentioned) or "clock-shifted" RGMII (RGMII-ID in the document). Your rgmii.vhdl code implements the "shifted" one. Here I recommend that you reconsider and choose "pure" RGMII, because (judging from the RGMII specification dated 2002 and from the PHY/SERDES ICs I have used) any modern GbE PHY supports clock/data shifting internally, so there is no need to complicate your code.
Second, whatever value you select for the ODELAY, you will need to verify it, and a hundred to one you will have to tune it on the live board with an oscilloscope in your hands. 26 is fine; let it be your initial tap value for step-by-step iteration.
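If you do stay with the ODELAY approach, a minimal sketch of a fixed-tap ODELAYE2 (plus the IDELAYCTRL it requires) could look as below. Note that ODELAYE2 exists only in the HP I/O banks of 7 series devices, and all instance and signal names here are only illustrative, not taken from your design:

    // Required whenever IDELAY/ODELAY primitives are used in the bank.
    IDELAYCTRL idelayctrl_inst (
        .REFCLK (refclk_200m),   // 200 MHz reference -> 78.125 ps per tap
        .RST    (reset),
        .RDY    ()               // optionally monitor calibration-done
    );

    // Fixed 26-tap delay on the forwarded TX clock:
    // 26 taps * 78.125 ps ~= 2.03 ns, i.e. roughly 90 deg at 125 MHz.
    ODELAYE2 #(
        .ODELAY_TYPE           ("FIXED"),   // static delay, no runtime adjustment
        .ODELAY_VALUE          (26),
        .DELAY_SRC             ("ODATAIN"),
        .REFCLK_FREQUENCY      (200.0),
        .HIGH_PERFORMANCE_MODE ("TRUE"),
        .SIGNAL_PATTERN        ("CLOCK"),
        .PIPE_SEL              ("FALSE"),
        .CINVCTRL_SEL          ("FALSE")
    ) odelay_txc_inst (
        .ODATAIN     (txc_from_oddr),   // output of the clock-forwarding ODDR
        .DATAOUT     (txc_delayed),     // to the RGMII_TXC output buffer
        .C           (1'b0),            // unused in FIXED mode
        .CE          (1'b0),
        .INC         (1'b0),
        .LD          (1'b0),
        .LDPIPEEN    (1'b0),
        .CNTVALUEIN  (5'b0),
        .CNTVALUEOUT (),
        .CINVCTRL    (1'b0),
        .REGRST      (1'b0),
        .CLKIN       (1'b0)
    );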
Also, I recommend asking a new question without the tags "ethernet" and "gigabit", because, as I see it, your interest here is really about Xilinx FPGA ODDR/ODELAY in general, with nothing specific to Gigabit Ethernet.
Good luck.
P.S. From the code you have shown, the MAC is expected to update the data at
posedge !tx_clk90
while, as I assume, your original GMII client code has no such expectation.