Electronic – Why are SDRAM CAS latencies so high


I don't understand why the CAS latency of modern DDR4 memory is so high. I have no trouble understanding why the RAS latency is as high as it is — given the small amount of charge stored in each bitcell, it is not hard to imagine that it takes a while for the signals to stabilize enough for the sense amplifiers to get a reliable reading when opening a row in a DRAM bank, and also that this is a rather analog process that isn't directly comparable to other memory technologies.

However, the way I understand the architecture of DRAM, once a row has been opened, the output from the sense amplifiers has effectively been latched into some sort of buffer made in ordinary CMOS logic, so in other words some kind of SRAM bank, the size of which is the size of the row, so some 2-8 kB in modern memories. I can think of no other example of an 8 kB SRAM memory (at least among ones made on similarly high-end process nodes) that takes so long to access.

For example, a common CAS latency value for DDR4 memory is 18 cycles, which at a rough 1.5 GHz memory clock rate (for DDR4-3000) corresponds to a real-time value of about 12 ns. Compare that to, for instance, the L1 cache of a modern, fast CPU, which, despite being significantly larger (often 32 kB), being a CAM (which I'd assume is more complex to read out than a directly addressed memory), and even having multiple read and write ports, requires much less time to fetch a value, often at or even below 1 ns (and that time even includes such processes as virtual-to-physical address translation). I understand, of course, that the DDR4 memory is communicating over a bus that is significantly more cumbersome to drive than the internal data lines on a chip, but that doesn't seem to be relevant here, seeing as how the cycles of the CAS latency are consumed inside of the memory chips themselves.
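For reference, the cycle-to-time conversion above works out as follows. This is a quick sketch using the numbers quoted in the question (DDR4-3000, CL 18), not figures from any datasheet:

```python
# Back-of-the-envelope conversion of CAS latency cycles to nanoseconds.
transfer_rate_mts = 3000           # DDR4-3000: 3000 mega-transfers/s
clock_mhz = transfer_rate_mts / 2  # double data rate -> 1500 MHz clock
cas_cycles = 18                    # a common CL for DDR4-3000

clock_period_ns = 1000 / clock_mhz             # ~0.667 ns per clock
cas_latency_ns = cas_cycles * clock_period_ns  # ~12 ns
print(f"CAS latency ~= {cas_latency_ns:.1f} ns")
```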

I certainly have zero experience designing high-speed ICs, so I'm sure there's much I could be missing or misunderstanding, but I just cannot think of any mechanism by which it would take so long to read and write to the open row buffer. It would be very satisfying if someone could explain why this is.

Why is the CAS latency of modern DDR4 memory so high?

Best Answer

The CAS latency delay comes from the column selector logic. Most DRAMs contain column decoders and muxes, so even though the data is ready after the sense amplifier and latch, it must still travel through a selector mux to reach the data bus, and this takes time. Below is a simplified diagram of a small DRAM.

[Simplified block diagram of a small DRAM]
Source: https://en.wikipedia.org/wiki/Dynamic_random-access_memory (with edits)

DDR3 looks something like the picture below. The column selector logic adds some delay, as do the read latch and mux. The CAS latency tells the memory controller on which clock cycle the data will be available, after the correct column and mux path have been selected, and that selection takes time.

[DDR3 internal block diagram]
Source: https://en.bmstu.wiki/File:DDR3_Sheme.png (with edits)

[DDR SDRAM timing parameter diagram]
Source: https://www.eenewseurope.com/news/understanding-ddr-sdram-timing-parameters/page/0/2

I've never designed a column selector, but from what I know about digital design and logic, I'd imagine that the CAS timing is related to the size of the memory array, because a larger array increases the size, and therefore the delay, of the column decoders. One could also design faster column decoders, at increased area and cost.
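As a rough illustration of how selector delay could scale with array width, here is a hypothetical back-of-the-envelope sketch. The tree model and the 4:1 mux width are assumptions for illustration, not taken from any real part:

```python
# Hypothetical model: a column selector built as a tree of 4:1 muxes
# needs more levels (and hence more gate delay) as the column count
# grows. The mux width and the structure are illustrative assumptions.
def mux_tree_levels(columns: int, mux_width: int = 4) -> int:
    """Mux levels needed to select 1 of `columns` bit lines."""
    levels = 0
    remaining = columns
    while remaining > 1:
        remaining = -(-remaining // mux_width)  # ceiling division
        levels += 1
    return levels

# Wider rows mean deeper mux trees, hence more propagation delay.
for columns in (256, 1024, 8192):
    print(f"{columns:5d} columns -> {mux_tree_levels(columns)} mux levels")
```

Each extra level adds another gate delay on the path from the sense-amplifier latches to the data pins, which is consistent with the answer's point that bigger arrays mean slower column selection.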

EDIT: The reason SRAM latency is so low is that an SRAM doesn't need a data selector mux; the memory connects essentially straight to the data bus. With no selector logic adding propagation delay, the data can appear on the bus as soon as it is addressed (in the diagram below, you can see that in a typical SRAM the bus is tied right to the RAM for minimal delay). In addition, SRAM cells are active transistor circuits, not passive capacitors, so they can drive current into the sense amplifiers, which gives faster rise times and much lower latency. SRAMs are much more expensive in area, cost, and power, so they are used only where latency needs to be lowest.

[Typical SRAM block diagram]
Source: https://computationstructures.org/lectures/caches/caches.html