Electronic – Firmware reliability for over temperature detection

embedded reliability

This concerns a piece of industrial equipment that generates gas. It is undergoing certification testing (CE marking) against IEC 61010 at a third-party lab.

The device has an over-temperature detection mechanism intended to stop it. Here is how it works:

  • Temperature sensor (LM335)
  • A uC reads the temperature every second
  • After each reading it compares the value with a hardcoded threshold; if the temperature is over the threshold, it raises the alarms and launches the stop procedure.
  • Within a couple of seconds the device goes into stand-by mode.
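
A minimal sketch of the comparison step described in the list above. The ADC resolution, reference voltage, threshold value, and function names are assumptions for illustration only; none of them are given in the post. The only datasheet fact used is that the LM335 nominally outputs 10 mV per kelvin.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, not taken from the original post. */
#define ADC_VREF_MV        5000u  /* ADC reference voltage in millivolts */
#define ADC_FULL_SCALE     1023u  /* full-scale count of a 10-bit ADC */
#define OVERTEMP_LIMIT_DC  850    /* threshold in tenths of a degree C (85.0 C) */

/* LM335: nominally 10 mV per kelvin, so 1 mV corresponds to 0.1 K. */
int32_t lm335_adc_to_decicelsius(uint16_t adc_counts)
{
    uint32_t millivolts = ((uint32_t)adc_counts * ADC_VREF_MV) / ADC_FULL_SCALE;
    /* millivolts equals tenths of a kelvin; 273.15 K is rounded to 2732. */
    return (int32_t)millivolts - 2732;
}

/* Hypothetical helper: true when the reading is above the hardcoded limit. */
bool overtemp_detected(uint16_t adc_counts)
{
    return lm335_adc_to_decicelsius(adc_counts) > OVERTEMP_LIMIT_DC;
}
```

In the firmware described here, something like `overtemp_detected()` would be called once per second, and a true result would raise the alarms and start the stop procedure.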

The lab performing the certification tests tells us that since the over-temperature detection is done by means of software, it is not reliable. And because it is not reliable, it is not compliant with CE marking.

They also add that "It is considered by the testing labs that a software is all but a reliable component. Non exhaustive list: infinite loops, data corruption, EMC perturbation …"

This sounds greatly exaggerated to me.

Is the testing lab right? Is software/firmware considered unreliable?
If not, how can I prove to the lab that we designed our firmware component to be reliable?

Best Answer

Yes, they're 100% correct.

Safety-critical software/firmware and system design require special considerations. You may find it worthwhile to sidestep the issue by using an approved mechanical over-temperature limit as the safety device, so that the software limit becomes a kind of toy: if it works, it prevents the thermal fuse or whatever from blowing; if it doesn't, nobody dies. That would probably be my recommendation if I were designing such a system. An analog sensor such as the LM335 could be fed in parallel to a comparator as well as to your digital stuff, which would be way easier to certify, but as I said, it might be easier to just insert an approved thermal cutoff in the power circuit and be done with it.


It makes sense to certify software in high-volume products such as garage door controllers, gas burner purge and ignition control systems and so on, where you cannot easily sidestep the safety issues, or in low-volume, high-price systems such as industrial or transportation controls. In a low-volume, relatively inexpensive system, though, it probably makes sense to minimize the NRE costs.

UL 1998 is one standard that I have some familiarity with. You may find this UL page useful: it has pointers to some of the IEC standards, one or another of which is probably applicable in your particular situation.

Lots of things can go wrong with the execution of firmware, and there are ways to reduce the likelihood of disaster: programming unused memory with jumps to cold-start, checking all kinds of stuff before blindly kicking a Watchdog Timer (WDT) (most people know to use a WDT, but it's not usually used to best effect), and refreshing internal locations and external ports regularly instead of assuming they will always remain unperturbed. Those are just a few things; there are more.
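
As a rough illustration of "checking stuff before kicking the WDT", here is a minimal sketch in C. It assumes three hypothetical tasks that must each check in once per cycle, and a threshold stored alongside its bitwise complement so that corruption can be spotted. All names here are made up for the example, and the actual watchdog refresh call is device-specific, so it is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task flags; each task sets its bit once per cycle. */
#define TASK_SENSOR_READ    (1u << 0)
#define TASK_LIMIT_CHECK    (1u << 1)
#define TASK_OUTPUT_UPDATE  (1u << 2)
#define ALL_TASKS (TASK_SENSOR_READ | TASK_LIMIT_CHECK | TASK_OUTPUT_UPDATE)

/* A value kept together with its bitwise complement (an assumed scheme). */
typedef struct {
    uint16_t value;
    uint16_t inverse;
} guarded_u16;

static uint32_t tasks_alive;

/* Called by each task when it has completed its work for this cycle. */
void task_checkin(uint32_t task_bit)
{
    tasks_alive |= task_bit;
}

/* True when the stored value still matches its complement. */
bool guarded_ok(const guarded_u16 *g)
{
    return g->value == (uint16_t)~g->inverse;
}

/* True only if every task checked in and the threshold is intact.
 * The check-in flags are cleared on every call, so each cycle must
 * earn its own watchdog kick. */
bool watchdog_may_kick(const guarded_u16 *threshold)
{
    bool ok = (tasks_alive == ALL_TASKS) && guarded_ok(threshold);
    tasks_alive = 0u;
    return ok;
}
```

The main loop would refresh the hardware watchdog only when `watchdog_may_kick()` returns true; otherwise it lets the watchdog expire and reset the part, instead of kicking it blindly from a timer interrupt.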