Electronic – Programs resistant to hardware issues

Tags: embedded, error correction, programming, reference-materials

I recall reading, at one point, about embedded development in which the programmer accounted for things like memory corruption and possibly other hardware issues. For example:

  1. If an instruction in memory is somehow corrupted, the program still runs correctly.
  2. If the value of some variable in memory is changed, the program still produces the correct result.

Dealing with #2 seems like a reasonable application of error-correcting codes, but #1 seems to me like it would be very difficult. Does anyone know of references or examples of someone doing that in software?

Best Answer

There are various techniques to reduce problems like the ones you mention, but there is no 100% solution.

  • Memory corruption can be corrected by error-correcting code (ECC) memory, at the cost of extra memory and the correcting hardware itself (which adds delay). In some cases you must read all memory regularly (memory scrubbing) to prevent single-bit errors from accumulating into uncorrectable multi-bit errors.

  • Sensors are often a source of problems. Taking multiple readings and averaging them and/or discarding the outliers helps.

  • Processors can fail, and software can contain bugs. The Space Shuttle is a famous example: it used multiple redundant flight computers, and its backup flight software was written independently of the primary flight software. Arbitrating between processors or programs that report different results can be tricky.

  • In most cases an occasional problem can be tolerated if it is detected and handled in a safe (or otherwise satisfactory) way. The response can range from halting the system to continuing with degraded performance.

In practice you will have to assess which problems are likely to occur, and then find ways to handle those problems. There is no catch-all solution.