Electronic – STM32F7 device freeze: cannot access registers

c debugging embedded microcontroller stm32f7

I am encountering random freezes with the STM32F7. This issue is difficult to debug because any debug session launched in Eclipse crashes when it tries to halt the core, which makes it impossible to see where in the code the freeze occurred.

Similarly, I am not able to access the device registers to inspect the program counter, the stack pointer and the IPSR, either via the Eclipse debug session or using the ST-Link Utility software (the device resets when connecting and the state is lost).

I would appreciate any information on how to extract the device state when frozen, or whether any states can be recovered after a reset. Also if this situation sounds familiar to anyone, please let me know if you found a root cause. I am available to answer any questions and provide more detail.

Additional details

(a) The hardware – I'm afraid I can't give you too much info here, it's a custom proprietary design.

(b) The history – This freeze issue has only become a problem recently. When I retested our older code however, the issue was seen, albeit less frequently (I think). I have been reverting bits of code to see if the issue still occurs. The issue is that I have no logs or debugging capabilities when the device freezes, so I cannot pinpoint where the code is failing.

The issue occurs infrequently, so it is difficult to say whether or not a change has had any effect.

(c) I haven't tried reducing the (system?) clock speed, and I will retest the device more thoroughly with different connections removed.

(d) Failure rate is roughly 1 failure per device per day, but it is difficult to quantify any improvement/regression (device is running all day).

(e) You ask about capturing data after a reset, but you report the problem is the MCU "freezing", not that it is resetting. I guess (but it isn't clear) that the problem is always that the MCU freezes. It's you who can perform a manual reset, and you are asking whether any useful info can be captured after manual reset. Is that correct?

Correct. Retrieving info during the freeze would be even better though.

Debug steps

(i) This problem is not reproducible with a test despite my best efforts.

(ii) What are you doing after that, e.g. are you performing a manual reset?

Trying to read the registers and pinpoint where the code hung.

Best Answer

If the debugger is locked out, assuming power/clocking is OK, then the most likely cause is an internal bus deadlock. As an example, if you have external XIP EEPROM and you have mis-configured the device, the peripheral could hang when you hit specific address/data patterns.

You may still be able to read RAM information after reset (or add some tracing to dump something to RAM). The core architectural registers are not reset (unless you have a high-reliability, safety-specific part), so if you put a B self (branch-to-self) at the start of the reset handler (assuming the debugger's 'halt after reset' doesn't work) then your old state should be preserved.
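As a rough illustration of the "dump something to RAM" idea, here is a minimal sketch of a breadcrumb log that is meant to survive a manual reset. Everything in it (the names crumb_init, crumb_drop, crumb_last, the NOINIT macro, the magic value and the buffer size) is made up for this example. It assumes the log can be placed in a RAM region that the startup code does not clear; how that is done depends on your toolchain and linker script.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumption: on the target, NOINIT is defined so that the log lands in a
 * linker section the startup code neither zeroes nor initialises (toolchain
 * specific). It is left empty here so the sketch also builds on a host. */
#ifndef NOINIT
#define NOINIT
#endif

#define CRUMB_MAGIC 0xC0FFEE01u  /* hypothetical marker meaning "log is valid" */
#define CRUMB_COUNT 16u          /* power of two, so the index wraps by masking */

typedef struct {
    uint32_t magic;               /* CRUMB_MAGIC once the log has been set up */
    uint32_t next;                /* number of crumbs written since set-up */
    uint32_t entry[CRUMB_COUNT];  /* ring holding the most recent crumb IDs */
} crumb_log_t;

NOINIT static volatile crumb_log_t crumb_log;

/* Call once, early after reset. Returns 1 if a log written before the reset
 * is still present (it is left untouched so it can be inspected), or 0 if
 * no valid log was found and a fresh one was set up. */
int crumb_init(void)
{
    if (crumb_log.magic == CRUMB_MAGIC) {
        return 1;
    }
    crumb_log.next = 0;
    for (size_t i = 0; i < CRUMB_COUNT; i++) {
        crumb_log.entry[i] = 0;
    }
    crumb_log.magic = CRUMB_MAGIC;
    return 0;
}

/* Record a checkpoint ID (for example, one per task or ISR entry). */
void crumb_drop(uint32_t id)
{
    uint32_t n = crumb_log.next;
    crumb_log.entry[n & (CRUMB_COUNT - 1u)] = id;
    crumb_log.next = n + 1u;
}

/* Most recently recorded ID, or 0 if nothing has been recorded yet. */
uint32_t crumb_last(void)
{
    uint32_t n = crumb_log.next;
    return (n == 0u) ? 0u : crumb_log.entry[(n - 1u) & (CRUMB_COUNT - 1u)];
}
```

After a freeze and a manual reset, crumb_init() returning 1 would indicate that the previous run's log is still there, and crumb_last() (or the whole ring, read with the debugger) shows the last checkpoints reached. One thing worth checking on a Cortex-M7: if the data cache is enabled, the log probably needs to live in non-cacheable memory or DTCM, otherwise the latest writes may never reach RAM.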

You can use ETM trace to capture realtime trace, and if you capture trace into the ETB you can just read this out after reset (again, it should be persistent) without needing a probe (but maybe the buffer pointer will be lost, so you might need to manually process the trace). ETM trace will show branches (addresses for any indirect branch) and exceptions (including lockup), so you should get a reasonably accurate indication of what the last thing was. Specifically, the trace is just 'watching' the core, so it sees instructions retire without being affected by the bus activity. The DWT might give something useful too, depending on what actually fails.

As referenced in one of the comments, at least using MDK, you are able to attach to a 'hot' running target (without reset/halt, without code download), since the debug port is a completely asynchronous bus master embedded in the core. This gives you the ability to probe memory state on a long uptime device (but not core registers) with minimal intrusion, and this might be valuable to confirm a diagnosis. You could even script up a regular polling of memory locations if this is relevant.
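To make the polling idea a bit more concrete, here is a minimal sketch of the decision logic such a script could apply. It assumes the firmware increments a heartbeat word in RAM and that the script samples it through the probe at a fixed interval. The names stall_monitor_t and stall_update are hypothetical, and how a sample is actually read depends on the debugger's scripting interface, which is not shown here.

```c
#include <stdint.h>

/* State kept between polls of one heartbeat word. */
typedef struct {
    uint32_t last_value;  /* value seen on the previous poll */
    uint32_t unchanged;   /* consecutive polls with no change */
    int      primed;      /* 0 until the first sample has been taken */
} stall_monitor_t;

/* Feed one sample of the heartbeat word. Returns 1 once the value has
 * stayed the same for `limit` consecutive polls (the firmware is probably
 * stuck), otherwise 0. The first sample only primes the monitor. */
int stall_update(stall_monitor_t *m, uint32_t sample, uint32_t limit)
{
    if (!m->primed || sample != m->last_value) {
        m->primed = 1;
        m->last_value = sample;
        m->unchanged = 0;
        return 0;
    }
    m->unchanged++;
    return m->unchanged >= limit;
}
```

When stall_update() reports a stall, the script could immediately dump the memory regions of interest while the device is still in the failed state. If the memory read itself starts failing or timing out, that is a useful data point too, since it would be consistent with a bus deadlock.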

Finally, since the debug port 'acts like the core', you are able to mimic quite a lot of bus deadlock failures by just using the debugger memory window.