Electronic – Does fly-by order of bytes matter in DDR3+ design

ddr3, pcb-design, sdram

DDR3 introduces fly-by topology, which complicates the memory controller: it now has to account for different round-trip times among the byte lanes, in exchange for greater flexibility in PCB design.

I've looked through one of TI's reference designs involving DDR3, which IIRC has a note in the schematics saying that the control signals should pass the bytes in succession, starting from the lowest. The PCB/Gerbers implemented it exactly this way.

Recently I looked through another TI reference design with a similar CPU and DDR3. What caught my attention was that this design had the lowest byte farthest along the fly-by chain!

I then tried to find a recommendation or an excerpt from the standard on this, but to no avail. Every app note mentions that the control signals pass each byte in succession and then terminate to half the DDR supply voltage, but none says anything about the order in which the bytes should be visited.

My understanding is that during start-up each byte lane is examined, and the measured delays are saved into the memory controller's runtime configuration. After that, each byte is treated independently of the others, taking these delays into account. So during a read, the memory controller issues the command and waits for each byte for the delay previously measured for it, and conversely, during a write it sends out each byte according to its delay. Is this understanding correct? Or does the memory controller need to receive the bytes in a specific order?

And ultimately, does the order in which the fly-by chain passes through the bytes matter? Why or why not?

Best Answer

There is no required order of DQ byte lanes relative to their position along the fly-by chain. The per-lane delays are discovered by the calibration process and, as you noted, used to set up the memory controller.

This paper gives a good overview: https://www.nxp.com/docs/en/application-note/AN4466.pdf

Summary of the calibrations done:

  • ZQ (drive strength and termination)
  • Write leveling (set outbound delay of DQ and DQS based on per-lane skew)
  • DQS gating (set read DQS 'validity window', that is, round-trip latency)
  • Read DQS delays (set inbound DQS DLL to align with read data midpoint)
  • Write DQS delays (set outbound DQS delay to align with write data midpoint)
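To illustrate why the byte order does not matter, here is a toy model of the write-leveling step only. It is a simplified sketch, not how any particular controller implements it: the tap count, the flight-delay numbers and all function names are made up for illustration. The idea it shows is that the controller sweeps the DQS delay for each lane separately until the DRAM reports a 0 to 1 transition of CK, so each lane ends up with its own delay regardless of where it sits in the chain.

```python
# Toy model of DDR3 write leveling. All numbers and names are hypothetical.
CLOCK_PERIOD_TAPS = 64  # assumed number of delay-line taps per clock period


def dram_sample_ck(dqs_delay, ck_flight_delay):
    """Model a DRAM in write-leveling mode: return the CK level (0 or 1)
    seen at the DQS rising edge. CK is high for the first half period
    after its own rising edge arrives at the DRAM."""
    phase = (dqs_delay - ck_flight_delay) % CLOCK_PERIOD_TAPS
    return 1 if phase < CLOCK_PERIOD_TAPS // 2 else 0


def write_level_lane(ck_flight_delay):
    """Sweep the DQS delay for one lane until sampled CK goes 0 -> 1."""
    prev = dram_sample_ck(0, ck_flight_delay)
    for tap in range(1, CLOCK_PERIOD_TAPS + 1):
        cur = dram_sample_ck(tap, ck_flight_delay)
        if prev == 0 and cur == 1:
            return tap  # DQS edge now lines up with the CK rising edge
        prev = cur
    raise RuntimeError("no CK edge found within one clock period")


def calibrate(flight_delays_by_lane):
    """Level every lane independently; lane numbering plays no role."""
    return {lane: write_level_lane(delay)
            for lane, delay in flight_delays_by_lane.items()}


# Board A: byte 0 is first in the fly-by chain (shortest CK flight time).
board_a = {0: 5, 1: 17, 2: 29, 3: 41}
# Board B: same chain, but byte 0 is last (longest CK flight time).
board_b = {0: 41, 1: 29, 2: 17, 3: 5}

print(calibrate(board_a))
print(calibrate(board_b))
```

In this model both boards calibrate successfully; the only difference is which lane number ends up with which delay value, and that is exactly what the controller stores per lane.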

The standard for doing this is laid out in JESD79-3E, available here: https://www.jedec.org/document_search?search_api_views_fulltext=JESD79-3