Electronic – Impact of PCB revisions on its embedded software

embedded, pcb, pcb-assembly, pcb-design

Over the past couple of years we have made some minor revisions to our PCB, including:

  • Different component selection (resistors / caps / …) based on availability / price (while still taking into account their original characteristics)
  • Components layout on the PCB
  • Trace widths / lengths (as a result of the layout)

What we've often found is that although these changes shouldn't really impact the embedded software, we still needed to tweak our software after each of those revisions.

Some hardware revisions introduced instability that only showed up after days / weeks / months (e.g. the inability to power toggle a certain component on the board).

We needed to tweak high/low sequences and serial line timeouts.

The big problem is that these issues don't show up immediately, but sometimes take weeks or months to appear, making it difficult to take corrective action (especially if one of the issues is the failure to power toggle the modem needed to perform a firmware update).

Are there any guidelines / best practices to eliminate that risk (some kind of stress testing procedure, or things we need to take into account while doing such revisions)?

Best Answer

The "best practice" here is called regression testing. Simply put, you want to have a batch of tests which

  • cover the core functionality of all components of your system
  • can be executed with minimum human intervention

The second requirement is important if you want to catch intermittent issues which surface only after some time. If the test is manual, you will only be able to run it a couple of times. If the test is highly automated, you can run it repeatedly, 24/7, for a couple of days and spot irregular events which don't happen every time (like power toggle failures).
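As a rough illustration of such an unattended run, here is a minimal host-side sketch in Python. Everything in it is an assumption made for the example: `soak`, `SoakResult` and `FakeModem` are hypothetical names, and `FakeModem` merely simulates a board that fails on every 1000th power toggle. On a real bench, `power_toggle` would drive the actual hardware through whatever interface your test rig provides.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SoakResult:
    runs: int = 0
    # One entry per failure: (iteration index, wall-clock time, error text)
    failures: List[Tuple[int, float, str]] = field(default_factory=list)

    @property
    def failure_rate(self) -> float:
        return len(self.failures) / self.runs if self.runs else 0.0


def soak(test: Callable[[], None], iterations: int, pause_s: float = 0.0) -> SoakResult:
    """Run `test` repeatedly, recording every failure instead of stopping at the first."""
    result = SoakResult()
    for i in range(iterations):
        result.runs += 1
        try:
            test()
        except Exception as exc:  # log the failure and keep going
            result.failures.append((i, time.time(), repr(exc)))
        time.sleep(pause_s)
    return result


class FakeModem:
    """Hypothetical stand-in for real hardware: fails on every 1000th toggle."""

    def __init__(self) -> None:
        self.cycles = 0

    def power_toggle(self) -> None:
        self.cycles += 1
        if self.cycles % 1000 == 0:
            raise RuntimeError("modem did not respond after power toggle")
```

Calling `soak(FakeModem().power_toggle, 5000)` records five failures in 5000 runs, a rate that a handful of manual attempts would almost certainly miss. Keeping the iteration index and timestamp for each failure makes it easier to correlate failures with temperature, supply voltage or uptime afterwards.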

It's also usually a good idea to include a stress test which pushes CPU time, memory usage, communication bandwidth and electrical power to their maximum.

Finally, if testing is a problem for your team, consider designing a system with higher tolerances. Include a power supply which is able to provide 50-100% more power than the board needs. If you have an MCU, make sure stack usage never exceeds 50% of its capacity and the CPU is idle at least 50% of the time. This won't eliminate the risk of course, but will reduce it significantly.
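These margins can be turned into a simple pass/fail check to run against measurements from each board revision. The sketch below is only an illustration: `Budget` and `violations` are hypothetical names, and the figures in the usage note are invented. The one derived number is the supply limit: a supply offering 50% extra power means the peak load may use at most 1 / 1.5 ≈ 67% of its capacity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Budget:
    name: str
    peak: float             # worst-case measured value
    capacity: float         # what the hardware provides, in the same unit
    max_utilisation: float  # allowed fraction of capacity (0..1)

    @property
    def utilisation(self) -> float:
        return self.peak / self.capacity

    def ok(self) -> bool:
        return self.utilisation <= self.max_utilisation


def violations(budgets: List[Budget]) -> List[str]:
    """Return the names of all budgets that exceed their allowed utilisation."""
    return [b.name for b in budgets if not b.ok()]


# Limits taken from the rules of thumb above.
STACK_LIMIT = 0.5         # stack never more than 50% used
CPU_LIMIT = 0.5           # CPU idle at least 50% of the time
SUPPLY_LIMIT = 1.0 / 1.5  # supply provides at least 50% extra power
```

For example, with invented measurements of 1800 bytes peak stack out of 4096, 62% peak CPU load, and a 400 mA peak draw on a 500 mA supply, `violations([Budget("stack", 1800, 4096, STACK_LIMIT), Budget("cpu", 62, 100, CPU_LIMIT), Budget("supply", 400, 500, SUPPLY_LIMIT)])` reports the CPU and the supply as out of margin, while the stack passes.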