Electronic – MicroSD with ECC

error correctionflashmicrosd

Some MicroSD cards (mostly Transcend) claim to have an ECC controller on the chip. I was wondering how this works? Do they verify all data written to it, and automaticly relocate the sectors to different positions in case they detect a mismatch?

If so, how come they can have such a high speed rating (Class 10), when they have to read back all the data that is written, and sometimes even have to write the data twice?

Best Answer

does [ECC] mean you cannot use 10% of the available space on the card?

No. The ECC does not use any of the available user data space on the card. Instead, the ECC is stored in separate "spare areas" next to the user data.

spare area

Many flash memory chips are designed such that, for each 512 byte sector intended for user data, there is an additional corresponding 16 byte "spare area". (Jim Cook. "Flash memory 101: An Introduction to NAND flash".)

Those spare areas, also called "out-of-band areas" (OOB areas), are designed to be used for ECC codes. (Eureka Technology. "NAND Flash FAQ". "What is spare column in NAND flash?". ) ( Is flash chip capacity really limited to the powers of 2? )

In theory, a person could program the flash memory controller directly attached to these flash memory chips to use those "spare area" bits for whatever he wanted.

A few flash file systems directly connected to these flash memory chips use that "spare area" in sophisticated ways to store file system data and more complex error correction codes, instead of Hamming ECC data -- object IDs, wear-leveling data, etc. (Wikipedia: YAFFS). (Compared to the simple "Hamming" system described below, such file systems have some other approach to detecting and dealing with bit errors -- Reed-Solomon codes, RAID-like multi-chip systems, or etc.).

SD/MMC cards have a tiny flash memory controller inside them, between the SD/MMC interface pins and the actual flash memory chips. ( see a b c for photos of SD card internals, showing the small controller chip and the much larger flash memory chip.).

ECC write

how come they can have such a high speed rating

When your computer or microcontroller sends the "write a 512 byte sector" command to the SD/MMC card, the sector is erased and then written. There's not room here to discuss the things manufacturers do to make flash faster, so I'll stick to explaining why ECC cards are not noticeably slower than a (hypothetical) card without ECC.

My understanding is that all SD/MMC controllers should at least write Hamming ECC codes to the spare area for each block, unless they use use some more sophisticated error-correction system. ("ST AN1823: Error Correction Code in NAND Flash Memories") (Is there some way to tell whether the card was really doing that or not?)

The flash memory chip is designed so the entire page (user data + spare area) is erased at once. This erase would take the same amount of time on a flash chip without ECC, so the ECC doesn't slow erases down.

The controller calculates the ECC data, and blindly(*) writes the user data and the ECC data to the flash. Hamming codes can be calculated "at wire speed" while the user data is coming through the SD/MMC interface or while the user data is being sent to the flash card, so the ECC calculations don't slow the writes down.

The controller does have to write an "extra" 16 bytes of data for each 512 byte sector, but that is not noticeably slower than a hypothetical flash chip without ECC: 1/32 ~= 3% slower.

(*) SD card controllers don't bother reading back the data during a write to see if it worked. They only detect problems much later, while handling a "read" command. Alas, I hear rumors of unscrupulous people taking perfectly good 1 GB cards and reprogramming the controller into thinking there's actually 4 GB of flash connected to it. The resulting fake/counterfeit/forged card seems to work fine for reads and writes until the first 1 GB is filled up; it even seems to work fine during (blind) writes to the rest of the "4 GB", but then things go horribly wrong trying to read data back from the card.

ECC read

a significant amount of time to recover errors ... because it needs to process the original data as a whole

Yes, Hamming codes needs to process the "whole block", but for SD cards the "whole block" is only 512 + 16 bytes.

When your computer or microcontroller sends the "read a 512 byte block" to the SD/MMC card, the SD/MMC card controller reads the full 512 user bytes + 16 ECC bytes, corrects the errors (if there were any), and sends the corrected 512 bytes of user data to your computer or microcontroller (or sends an error code if there were so many errors in the user data or in the ECC data or both that they were uncorrectable).

Because the on-card controller can read 512 + 16 bytes from the flash and do Hamming correction much faster than it trickles response codes and data through the card interface, this is not noticeably slower than a (hypothetical) card without ECC.

As far as I know, SD/MMC cards don't use Reed-Solomon error correction -- but I suppose they could do that internally, and there's no way to tell from the outside.