Electronic – Protection of BeagleBone Black from eMMC corruption on power loss

I'm considering an application where a BeagleBone Black, preferably running one of the recommended Debian images loaded in eMMC, is powered by a 5V source that can be removed at any moment. I want to prevent catastrophic data loss in this event, including when a file write is underway. At the very least, I want the BBB to still be able to reboot from eMMC on return of power, repair the (probably ext4) file system, and run my application. Ideally, I would also have the assurance that any file that was properly flushed is consistent.

The problem I fear is that, when power is lost while a write is underway in the eMMC, data could be lost in an unpredictable sector: a recent sector write handled by the eMMC may have triggered a Flash page erase, so the other sectors in that page are lost. Because of wear leveling and sector reallocation internal to the eMMC, those sectors may belong to files or volumes that have not been written recently, perhaps even read-only file systems or files. That problem is mentioned here:

a power cycle at the wrong time can destroy data anywhere on the (SD or eMMC) card, no matter where you THINK you're writing

and that could justify the warning in the BBB manual and here that

[powering down using the power button] will also help prevent contamination of the SD card or the eMMC.

as well as the terse warning made by what seems to be the primary eMMC source for the BBB

Avoid power-down during WRITE and ERASE operations.

To what degree can such data loss occur? What is the duration of the vulnerability window, if any?

Are there standard solutions? Recommendations?

Is there any built-in software mechanism against this, and how does it work? In particular, is the TPS65217C PMIC programmed to generate an NMI on loss of VDD_5V, aka AC, as the hardware allows, and is it handled in a way that stops ongoing eMMC write activity? If not, how can I at least add that?

Is stopping any eMMC-related activity good enough for the eMMC to enter a safe state by itself after some delay? Which delay?

What about a capacitor or Supercap (like Murata DMT3N4R2U224M3DTA0: 0.22F, 4.2V, 0.3Ω) on the battery connector to provide some power reserve?
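As a rough sanity check on the capacitor idea, the hold-up time can be estimated with a constant-current discharge model. This is only a sketch: the capacitance, rated voltage and ESR are the part values quoted above, while the load current and the minimum usable voltage are hypothetical figures that would need to be measured on the actual board.

```python
def holdup_seconds(capacitance_f, v_start, v_min, load_a, esr_ohm):
    """Estimate how long a capacitor can feed a constant-current load
    before its terminal voltage drops below v_min."""
    # The ESR causes an immediate drop of load_a * esr_ohm under load.
    usable_v = v_start - load_a * esr_ohm - v_min
    if usable_v <= 0:
        return 0.0
    # Constant-current discharge: dV/dt = -I / C, so t = C * dV / I.
    return capacitance_f * usable_v / load_a


# Part values quoted in the question (0.22 F, 4.2 V, 0.3 ohm).
CAPACITANCE_F = 0.22
V_START = 4.2   # assumes the capacitor is charged to its rated voltage
ESR_OHM = 0.3

# Hypothetical values, NOT taken from any datasheet: measure them.
V_MIN = 3.3     # assumed lowest voltage at which the board keeps running
LOAD_A = 0.5    # assumed average current drawn during shutdown

t = holdup_seconds(CAPACITANCE_F, V_START, V_MIN, LOAD_A, ESR_OHM)
print(f"estimated hold-up time: {t:.2f} s")
```

With these assumed numbers the estimate comes out at about 0.33 s. Whether that is enough depends on how quickly loss of the 5V input can be detected and how long the eMMC needs to finish its internal operations, both of which are open questions here.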

Are there modes/settings in the eMMC to configure a write performance/safety compromise? What are the defaults, are they adjustable?

Important update: I have located authoritative information in the March 2010 eMMC specification about reliability when power is lost during write:

In general, an interruption to a write process should not cause corruption in existing data at any other address. However, the risk of power being removed during a write operation is different in different applications. Also, for some technologies used to implement eMMC, there is a tradeoff between protecting existing data (e.g., data written by the previous completed write operations), during a power failure, and write performance.

Lots of valuable information follows in that source, including a description of how the eMMC declares whether or not it promises to protect previously written data from haphazard erasure in case of power loss, and of what seem to be truly write-protected areas.

Addition: a comment mentions that mobile phones use read-only file systems in eMMC and are not known to suffer corruption of them. However, mobile phones have a large battery, which practically ensures that they do not experience abrupt power loss in normal operation, so they are not exposed to the eMMC corruption mechanism that I fear.

Best Answer

All layers of the stack need to cooperate here.

The eMMC storage device has an integrated controller (the part that performs wear leveling and block remapping) that needs to handle power loss situations gracefully. I'd expect it to do that by pretending that incompletely written blocks were not written at all.

The file system needs to be able to handle power loss between block writes. The ext3 and ext4 file systems fulfill this, as they are journaled and enforce the ordering of writes so the journal accesses are kept separate from actual file system modifications.
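On the application side, the usual way to cooperate with a journaled file system is to write new data to a temporary file, flush it, and only then rename it over the old file. The sketch below shows that pattern; the function name `durable_write` is made up for illustration, and it assumes a POSIX system and that the storage device honours flush requests, which the code itself cannot guarantee.

```python
import os
import tempfile


def durable_write(path, data: bytes):
    """Replace `path` with `data` so that, after a power loss, a reader
    should find either the complete old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be in the same directory (same file system)
    # so that the final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to push it to the device
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Flush the directory entry so the rename itself is persisted.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A call such as `durable_write("/var/lib/myapp/state.bin", payload)` (the path is a placeholder) never modifies the old file in place, so an interrupted update should leave the previous version readable.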

The weak point in your setup is the package manager, which doesn't offer a good solution if the power fails during a package upgrade. Usually you should still be able to boot, unless the failure happened precisely while files were being renamed after an installation and those files were important system files.

For almost all purposes, that window is sufficiently small that you can ignore the problem. You can shrink it further by running dpkg --configure -a and dpkg -iGROEB /var/cache/apt/archives during each boot. These should get you back into a sane state after an interrupted update, at the expense of a bit of boot time, but they cannot handle cases where dpkg itself or the init system were damaged as a result.
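One way to run those two commands at boot is a small script started by the init system. The following is a minimal sketch, not a tested recovery tool: the function name `run_recovery` is made up for illustration, the script must run as root, and how it is hooked into the boot sequence is left open.

```python
import subprocess

ARCHIVE_DIR = "/var/cache/apt/archives"

# The two dpkg invocations mentioned above, in the order they should run.
RECOVERY_COMMANDS = [
    ["dpkg", "--configure", "-a"],
    ["dpkg", "-iGROEB", ARCHIVE_DIR],
]


def run_recovery(run=subprocess.run):
    """Run each recovery command in turn and collect its exit status.

    `run` is injectable so the sequence can be exercised without
    actually invoking dpkg.
    """
    results = []
    for cmd in RECOVERY_COMMANDS:
        # check=False: a failure of the first command should not stop
        # the second one from being attempted.
        completed = run(cmd, check=False)
        results.append((cmd, completed.returncode))
    return results
```

Calling `run_recovery()` returns a list of `(command, exit_status)` pairs that can be logged; a non-zero status signals that the system may need the heavier rescue path.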

Last but not least, you can set up a rescue system in an initrd that will check and try to repair and/or reinstall the system if needed, and anchor this in the bootloader.
