Electronic – Low power strategy for dealing with spontaneous bit flips during sleep on AVR 8-bit

avr, interrupts, sleep, watchdog

I am designing an ATtiny-based circuit that is intended to run unattended for very long periods of time on a very small battery. The device spends almost all of its time in deep sleep, and only wakes briefly in response to a rare pin change interrupt.

I am programming defensively to ensure that the program will always return to a known state even in the face of spontaneous bit flips in almost any register (although I can't get a read on how likely these actually are).

There is one case that I cannot figure out how to mitigate: a bit flip in one of the interrupt control registers that happens while sleeping.

The relevant bits seem to be…

PCMSKn – Pin Change Enable Mask. Must have a 1 for the corresponding pin to generate an interrupt.

PCIEn – Pin Change Interrupt Enable. Must have a 1 for any of the enabled pins to generate an interrupt.

GIE – Global Interrupt Enable (the I bit in SREG on AVR). Must have a 1 for any interrupt to occur.

If any of these bits get flipped to a 0 while I am asleep, then the next pin change seemingly will not wake the processor and I am dead in the water with no way to recover.

One way to deal with this would be to set up a safety level 2 watchdog to periodically reset the processor while I am sleeping, and write 1s to all the interrupt control bits on each reset. This would work in theory and would seem to be bulletproof, but in practice enabling the watchdog raises the sleep power consumption by more than two orders of magnitude (from ~0.01 µA to ~5 µA at 3 V and 25 °C) and thus would cut the projected lifetime of my device from decades to months.
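To put rough numbers on that trade-off, here is a minimal sketch of the lifetime arithmetic. The battery capacity is not stated in the question, so the 20 mAh figure mentioned below is purely a hypothetical value for illustration, and the helper name `battery_life_years` is made up. The model is idealized: it ignores battery self-discharge and the energy spent while awake.

```c
#include <assert.h>

/* Hours in a year, ignoring leap years. */
#define HOURS_PER_YEAR (24.0 * 365.0)

/*
 * Idealized battery life in years for a constant current draw.
 *   capacity_mah: usable battery capacity in mAh (an assumed input,
 *                 not a figure taken from the question)
 *   current_ua:   average current draw in microamps, must be > 0
 * Self-discharge and wake-time consumption are deliberately ignored.
 */
double battery_life_years(double capacity_mah, double current_ua)
{
    assert(current_ua > 0.0);
    double current_ma = current_ua / 1000.0;   /* uA -> mA */
    double hours = capacity_mah / current_ma;  /* mAh / mA = h */
    return hours / HOURS_PER_YEAR;
}
```

With a hypothetical 20 mAh cell, `battery_life_years(20.0, 5.0)` comes out at a little under half a year, while `battery_life_years(20.0, 0.01)` gives over two hundred years on paper. In reality self-discharge would cap the second figure well below that, which is consistent with the "decades to months" estimate above.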

What are some power-efficient strategies for robustly dealing with this problem?

Best Answer

Can you use two pins for the interrupt? That way you could have two PCMSK bits set. You can also set up both pin change interrupts, which covers PCIEn. That leaves only GIE. I don't think there's anything you can do about that.
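A minimal sketch of that redundancy, assuming a part with two pin-change banks such as the ATtiny24/44/84 (the question does not name the exact device) and assuming the wake signal is wired to both PA0 (PCINT0) and PB0 (PCINT8). The function name `arm_wake_sources` is hypothetical, and the host-side stand-ins exist only so the logic can be compiled and checked off-target.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#include <avr/interrupt.h>
/* Each enabled vector needs a handler; empty ones are enough to wake. */
EMPTY_INTERRUPT(PCINT0_vect);
EMPTY_INTERRUPT(PCINT1_vect);
#else
/* Host-side stand-ins (assumption: bit positions as on ATtiny24/44/84). */
volatile uint8_t GIMSK, PCMSK0, PCMSK1;
#define PCIE0  4
#define PCIE1  5
#define PCINT0 0
#define PCINT8 0
#define sei() ((void)0)
#endif

/*
 * Arm two independent pin-change wake paths for the same signal.
 * Call this after reset and again on every wake, so that any bit
 * that flipped to 0 is restored the next time the device runs.
 */
void arm_wake_sources(void)
{
    PCMSK0 |= (uint8_t)(1u << PCINT0);                  /* pin A, bank 0 */
    PCMSK1 |= (uint8_t)(1u << PCINT8);                  /* pin B, bank 1 */
    GIMSK  |= (uint8_t)((1u << PCIE0) | (1u << PCIE1)); /* enable both banks */
    sei();                                              /* global enable */
}
```

With this arrangement a single flipped bit in either PCMSK register or either PCIE bit still leaves the other path able to wake the device. The global interrupt flag remains a single point of failure, as noted above.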

My (admittedly vague) understanding is that cosmic ray-induced bit flips are more of a problem for SRAM and DRAM, which have weaker feedback in their data storage. I haven't heard as much concern about registers. Personally, I'd be more worried about your stack and global variables since you don't have ECC RAM.

Regardless, to get a true high-reliability system, you need an MCU that's designed for it. There are products available with redundant hardware and ECC memory. Unfortunately, they are neither cheap nor low-power. These sorts of MCUs are normally used for dangerous applications like airbags and medical devices.

My suggestion is to program as defensively as you can, but accept that you can't eliminate every risk unless you're willing to pay for it. The odds of your GIE bit randomly flipping are very small. Unless a failure is going to hurt or kill someone, that level of risk is not worth worrying about.