Electronic – How does SDRAM refresh interact with ECC

I'm trying to understand how SDRAM hardware works if it also has ECC capability.

If a memory system has ECC capability, it will be able to correct a single-bit error in a block of memory and detect, but not correct, a double-bit error (with the common SECDED scheme). The way I understand it, when the block of memory is read, the error-correcting code for that block is also read, and if a single bit has flipped since the data was written, the memory controller corrects it automatically.
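To make the "correct one, detect two" behaviour concrete, here is a toy SECDED (single-error-correct, double-error-detect) code: an extended Hamming (8,4) code that protects 4 data bits with 4 check bits. This is an illustrative assumption, not the code used by the 855GM or any particular controller; ECC DIMMs typically apply the same idea at a larger width (64 data bits plus 8 check bits). The names `encode`, `decode` and `DATA_POS` are hypothetical.

```python
from typing import List, Optional, Tuple

# Toy extended Hamming (8,4) SECDED code. Index 0 holds an overall parity bit;
# indices 1, 2, 4 are Hamming parity bits; indices 3, 5, 6, 7 hold the data.
DATA_POS = (3, 5, 6, 7)


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into an 8-bit codeword (list of 0/1)."""
    c = [0] * 8
    for i, pos in enumerate(DATA_POS):
        c[pos] = (nibble >> i) & 1
    for p in (1, 2, 4):
        # Parity bit p covers every other position whose index has bit p set.
        c[p] = sum(c[j] for j in range(1, 8) if j & p and j != p) % 2
    c[0] = sum(c[1:]) % 2  # overall parity: makes the total number of 1s even
    return c


def decode(c: List[int]) -> Tuple[Optional[int], str, List[int]]:
    """Return (data, status, repaired_codeword); status is ok/corrected/uncorrectable."""
    syndrome = 0
    for j in range(1, 8):
        if c[j]:
            syndrome ^= j  # XOR of the positions of all 1 bits; 0 for a valid codeword
    c = list(c)
    if sum(c) % 2 == 1:
        # Odd number of flipped bits: assume exactly one. The syndrome is its
        # position (0 means the overall parity bit itself flipped).
        c[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        # Even number of flips but a nonzero syndrome: two bits flipped.
        return None, "uncorrectable", c
    else:
        status = "ok"
    data = sum(c[pos] << i for i, pos in enumerate(DATA_POS))
    return data, status, c


word = encode(0b1011)
one_flip = list(word); one_flip[6] ^= 1
two_flips = list(word); two_flips[2] ^= 1; two_flips[5] ^= 1
print(decode(word)[:2])       # (11, 'ok')
print(decode(one_flip)[:2])   # (11, 'corrected')
print(decode(two_flips)[:2])  # (None, 'uncorrectable')
```

The overall parity bit is what separates "one flip, fix it" from "two flips, report it": a single flip makes the total parity odd, while two flips leave it even but still produce a nonzero syndrome.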

Now, SDRAM by its nature must refresh the data it holds or the data will be corrupted over time, so it has to read its memory cells and write the data back on a regular basis. From what I've read, this refresh read differs from a regular memory read: it doesn't send any data over the bus to the CPU. Because no data has to cross the bus, the refresh can operate on a whole row at a time, reading the data and writing it back into the same cells without touching the bus.
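For a sense of the timing involved: a common JEDEC arrangement (assumed here; it is typical of DDR-family parts at normal operating temperature) is that every row must be refreshed within 64 ms, spread over 8192 refresh commands. The controller therefore has to issue a refresh command roughly every 7.8 µs. The constant names below are hypothetical; the real values come from the datasheet of the specific part.

```python
# Assumed, typical JEDEC-style figures (check the datasheet of the actual part).
REFRESH_WINDOW_S = 64e-3   # every row refreshed at least once per 64 ms
REFRESH_COMMANDS = 8192    # refresh commands issued per window

t_refi = REFRESH_WINDOW_S / REFRESH_COMMANDS   # average interval between refresh commands
print(f"tREFI = {t_refi * 1e6:.4f} us")        # tREFI = 7.8125 us
```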

My question is: does the ECC process come into play during the regular SDRAM refresh reads, or is it bypassed so that refreshes are as fast as possible and don't tie up the memory system and block regular memory accesses? If I have ECC SDRAM, are single-bit errors automatically corrected on every refresh cycle, or does the memory controller wait until an actual memory access to detect and correct them?

The answer might depend on the particular memory controller. I'm reading through the datasheet for the Intel 855GM/855GME Graphics and Memory Controller Hub to see whether that controller does what I'm describing, but I haven't found an answer yet.

Best Answer

For starters, SDRAM refresh does not move the data outside of the chip. Conceptually, the chip reads the data and writes it back, but that data never appears on the SDRAM data pins; it is all done internally to the SDRAM chip itself. The SDRAM controller tells the SDRAM to do the refresh, but that command is all that is seen externally.
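A toy model makes the consequence explicit for conventional SDRAM (no on-die ECC): refresh copies a row into the sense amplifiers and restores it, so whatever is stored, including a bit that has already flipped, is faithfully rewritten. `ToyDramChip` and `refresh_row` are hypothetical names, and the model tracks bits rather than capacitor charge.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyDramChip:
    """Hypothetical chip model. Check bits are stored like any other bits;
    the chip itself knows nothing about ECC."""
    rows: List[List[int]]

    def refresh_row(self, r: int) -> None:
        # Activating the row copies it into the sense amplifiers (the row buffer)...
        row_buffer = list(self.rows[r])
        # ...and the sense amplifiers write it straight back to the same cells.
        # Nothing appears on the data pins and nothing is checked or corrected.
        self.rows[r] = row_buffer


chip = ToyDramChip(rows=[[1, 0, 1, 1, 0, 0, 1, 0]])
chip.rows[0][3] ^= 1          # a stored bit flips (e.g. charge leak or particle strike)
for _ in range(1000):         # many refresh cycles later...
    chip.refresh_row(0)
print(chip.rows[0])           # [1, 0, 1, 0, 0, 0, 1, 0]: the flip survived every refresh
```

In this model the bit stays wrong until the controller reads that word through its ECC logic.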

ECC is done outside the SDRAM chip, in the SDRAM controller (usually located inside the CPU or chipset). Because refresh happens entirely inside the chip, it never passes through the controller's ECC logic, so refresh by itself does not correct errors. Many different SDRAM controllers support ECC, so it is hard to make general statements that are always correct, but I'll give it a shot.

When a memory location is read, and the data is corrupted but correctable, the corrected data is usually written back to RAM.
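The read path can be sketched as follows, assuming a controller that writes corrected data back (as described above, many but not necessarily all do). `ToyEccController`, its methods, and the toy (8,4) SECDED helpers are hypothetical; the helpers are included so the snippet runs on its own.

```python
from typing import List, Optional, Tuple

DATA_POS = (3, 5, 6, 7)  # toy (8,4) extended Hamming layout; index 0 = overall parity


def encode(nibble: int) -> List[int]:
    c = [0] * 8
    for i, pos in enumerate(DATA_POS):
        c[pos] = (nibble >> i) & 1
    for p in (1, 2, 4):
        c[p] = sum(c[j] for j in range(1, 8) if j & p and j != p) % 2
    c[0] = sum(c[1:]) % 2
    return c


def decode(c: List[int]) -> Tuple[Optional[int], str, List[int]]:
    syndrome = 0
    for j in range(1, 8):
        if c[j]:
            syndrome ^= j
    c = list(c)
    if sum(c) % 2 == 1:        # single-bit error: the syndrome points at it
        c[syndrome] ^= 1
        status = "corrected"
    elif syndrome:             # double-bit error: detected, not correctable
        return None, "uncorrectable", c
    else:
        status = "ok"
    return sum(c[pos] << i for i, pos in enumerate(DATA_POS)), status, c


class ToyEccController:
    """Hypothetical controller: one codeword per address, ECC checked on every read."""

    def __init__(self, size: int) -> None:
        self.mem = [encode(0) for _ in range(size)]
        self.writebacks = 0

    def write(self, addr: int, nibble: int) -> None:
        self.mem[addr] = encode(nibble)

    def read(self, addr: int) -> int:
        data, status, repaired = decode(self.mem[addr])
        if status == "uncorrectable":
            raise RuntimeError(f"uncorrectable ECC error at address {addr}")
        if status == "corrected":
            self.mem[addr] = repaired  # write the corrected codeword back to DRAM
            self.writebacks += 1
        return data


ctl = ToyEccController(4)
ctl.write(2, 0b0110)
ctl.mem[2][5] ^= 1                      # a single bit flips in DRAM
assert ctl.read(2) == 0b0110            # the CPU gets corrected data...
assert decode(ctl.mem[2])[1] == "ok"    # ...and the stored copy is clean again
```

Without the write-back, the stored word would still carry the flipped bit, and a second flip in the same word would make it uncorrectable.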

Some ECC controllers use "inactive" (idle) time to read every memory location and, if there is a correctable error, write the corrected data back. The idea is to prevent a correctable single-bit error from turning into an uncorrectable multi-bit error through further "bit rot". This feature is usually called memory scrubbing (or patrol scrubbing).
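A small simulation shows why scrubbing helps: two upsets hit the same word at different times, and the only difference is whether a scrub pass runs in between. The `scrub` and `simulate` functions and the toy (8,4) SECDED code are hypothetical illustrations, not any controller's actual algorithm; the ECC helpers are included so the snippet is self-contained.

```python
from typing import List, Optional, Tuple

DATA_POS = (3, 5, 6, 7)  # toy (8,4) extended Hamming layout; index 0 = overall parity


def encode(nibble: int) -> List[int]:
    c = [0] * 8
    for i, pos in enumerate(DATA_POS):
        c[pos] = (nibble >> i) & 1
    for p in (1, 2, 4):
        c[p] = sum(c[j] for j in range(1, 8) if j & p and j != p) % 2
    c[0] = sum(c[1:]) % 2
    return c


def decode(c: List[int]) -> Tuple[Optional[int], str, List[int]]:
    syndrome = 0
    for j in range(1, 8):
        if c[j]:
            syndrome ^= j
    c = list(c)
    if sum(c) % 2 == 1:        # single-bit error: the syndrome points at it
        c[syndrome] ^= 1
        status = "corrected"
    elif syndrome:             # double-bit error: detected, not correctable
        return None, "uncorrectable", c
    else:
        status = "ok"
    return sum(c[pos] << i for i, pos in enumerate(DATA_POS)), status, c


def scrub(mem: List[List[int]]) -> int:
    """One scrub pass: rewrite every word that needed a single-bit correction."""
    repaired = 0
    for addr, word in enumerate(mem):
        _, status, fixed = decode(word)
        if status == "corrected":
            mem[addr] = fixed
            repaired += 1
    return repaired


def simulate(with_scrub: bool) -> str:
    """Two upsets hit the same word at different times; report the final status."""
    mem = [encode(n) for n in range(16)]
    mem[7][2] ^= 1            # first upset
    if with_scrub:
        scrub(mem)            # a scrub pass runs before the second upset
    mem[7][6] ^= 1            # second upset, later, in the same word
    return decode(mem[7])[1]


print(simulate(with_scrub=False))  # uncorrectable
print(simulate(with_scrub=True))   # corrected
```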

Reading every memory location is a nice idea, but on modern computers it cannot be relied upon to refresh the SDRAM. Modern machines have a lot of memory, and reading all of it takes far longer than the refresh interval (typically every row within 64 ms). Doing so would also take valuable memory bandwidth away from the CPU. The SDRAM chips' built-in refresh, on the other hand, works quite well.
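A back-of-the-envelope calculation shows the scale of the problem. The figures (16 GiB of DRAM, 20 GB/s of sustained read bandwidth, a 64 ms refresh window) and the constant names are illustrative assumptions, not measurements of any particular machine.

```python
# Illustrative assumptions, not measurements of a real system.
MEMORY_BYTES = 16 * 2**30          # 16 GiB of DRAM
BANDWIDTH_BYTES_PER_S = 20e9       # 20 GB/s sustained read bandwidth
REFRESH_WINDOW_S = 64e-3           # every row must be refreshed within 64 ms

sweep_s = MEMORY_BYTES / BANDWIDTH_BYTES_PER_S   # time to read everything once
needed_bw = MEMORY_BYTES / REFRESH_WINDOW_S      # bandwidth to read everything every 64 ms

print(f"one full read sweep takes {sweep_s:.3f} s")
print(f"refresh-by-reading needs {needed_bw / BANDWIDTH_BYTES_PER_S:.1f}x the available bandwidth")
```

With these numbers one sweep takes about 0.86 s, and refreshing by reading would need roughly 13 times the machine's entire memory bandwidth, which is why the in-chip refresh does this job instead.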

It is much better to just use the normal refresh, and then scan memory for errors in a low-priority task.
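By contrast, a low-priority scrub has no hard deadline. Assuming the same hypothetical 16 GiB / 20 GB/s machine and an arbitrary target of one full scrub per day (the period is an assumption; real controllers and operating systems choose their own rates), the background traffic is tiny:

```python
# Illustrative assumptions: same hypothetical machine, one full scrub per day.
MEMORY_BYTES = 16 * 2**30
BANDWIDTH_BYTES_PER_S = 20e9
SCRUB_PERIOD_S = 24 * 60 * 60

scrub_rate = MEMORY_BYTES / SCRUB_PERIOD_S       # bytes/s the scrubber must read
share = scrub_rate / BANDWIDTH_BYTES_PER_S       # fraction of total bandwidth

print(f"scrub rate: {scrub_rate / 1e3:.1f} kB/s")   # scrub rate: 198.8 kB/s
print(f"share of bandwidth: {share:.6%}")
```

Spreading the scan thinly over time keeps it out of the CPU's way while still catching single-bit errors before they accumulate, and the time-critical job stays with the chips' own refresh.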