I have been hitting hard faults in firmware I built with FreeRTOS on a SAMD21 (ARM Cortex-M0+) MCU.
To find the cause, I dug further and eventually came across this article on Code_Red, which points to the snippet below.
However, at this stage it's not clear to me how to use the numbers I get once this handler is hit.
I have a bunch of memory locations, but how can I work out from them which line of code caused the fault?
BTW, the call stack has not been useful: it has only a single entry, which points to the current breakpoint in HardFault_HandlerC().
Thanks in advance for your help,
/**
* HardFault_HandlerAsm:
* Alternative Hard Fault handler to help debug the reason for a fault.
* To use, edit the vector table to reference this function in the HardFault vector
* This code is suitable for Cortex-M3 and Cortex-M0 cores
*/
// Use the 'naked' attribute so that C stacking is not used.
__attribute__((naked))
void HardFault_HandlerAsm(void){
/*
* Get the appropriate stack pointer, depending on our mode,
* and use it as the parameter to the C handler. This function
* will never return
*/
__asm( ".syntax unified\n"
"MOVS R0, #4 \n"
"MOV R1, LR \n"
"TST R0, R1 \n"
"BEQ _MSP \n"
"MRS R0, PSP \n"
"B HardFault_HandlerC \n"
"_MSP: \n"
"MRS R0, MSP \n"
"B HardFault_HandlerC \n"
".syntax divided\n") ;
}
/**
* HardFault_HandlerC:
* This is called from HardFault_HandlerAsm with a pointer to the fault stack
* as the parameter. We can then read the values from the stack and place them
* into local variables for ease of reading.
* We then read the various Fault Status and Address Registers to help decode
* cause of the fault.
* The function ends with a BKPT instruction to force control back into the debugger
*/
void HardFault_HandlerC(unsigned long *hardfault_args){
volatile unsigned long stacked_r0 ;
volatile unsigned long stacked_r1 ;
volatile unsigned long stacked_r2 ;
volatile unsigned long stacked_r3 ;
volatile unsigned long stacked_r12 ;
volatile unsigned long stacked_lr ;
volatile unsigned long stacked_pc ;
volatile unsigned long stacked_psr ;
volatile unsigned long _CFSR ;
volatile unsigned long _HFSR ;
volatile unsigned long _DFSR ;
volatile unsigned long _AFSR ;
volatile unsigned long _BFAR ;
volatile unsigned long _MMAR ;
stacked_r0 = ((unsigned long)hardfault_args[0]) ;
stacked_r1 = ((unsigned long)hardfault_args[1]) ;
stacked_r2 = ((unsigned long)hardfault_args[2]) ;
stacked_r3 = ((unsigned long)hardfault_args[3]) ;
stacked_r12 = ((unsigned long)hardfault_args[4]) ;
stacked_lr = ((unsigned long)hardfault_args[5]) ;
stacked_pc = ((unsigned long)hardfault_args[6]) ;
stacked_psr = ((unsigned long)hardfault_args[7]) ;
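// Configurable Fault Status Register
// Consists of MMSR, BFSR and UFSR
// Note: CFSR, HFSR, MMAR and BFAR are implemented on Cortex-M3/M4 only.
// ARMv6-M cores (Cortex-M0/M0+) do not have them, so on those cores only the
// stacked registers above carry fault information.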
_CFSR = (*((volatile unsigned long *)(0xE000ED28))) ;
// Hard Fault Status Register
_HFSR = (*((volatile unsigned long *)(0xE000ED2C))) ;
// Debug Fault Status Register
_DFSR = (*((volatile unsigned long *)(0xE000ED30))) ;
// Auxiliary Fault Status Register
_AFSR = (*((volatile unsigned long *)(0xE000ED3C))) ;
// Read the Fault Address Registers. These may not contain valid values.
// Check BFARVALID/MMARVALID to see if they are valid values
// MemManage Fault Address Register
_MMAR = (*((volatile unsigned long *)(0xE000ED34))) ;
// Bus Fault Address Register
_BFAR = (*((volatile unsigned long *)(0xE000ED38))) ;
__asm("BKPT #0\n") ; // Break into the debugger
}
Best Answer
So, here's the fun part: it may be impossible to pinpoint exactly which line is throwing the fault. The reason is that a bug in your code may cause a fault to appear somewhere else entirely, or the bug might destroy all the state information in the system, which is super cool. What would really help, though, is to see your entire code base, including the linker scripts and startup code.
In general though, if you are ending up in hard-fault territory, here are the first things I would check:
Faults caused by trying to dynamically allocate memory when your linker script defines no heap (or too small a heap). What happens here is that some function calls malloc (or one of its cousins) and the library fails because there is not enough space on the heap to satisfy the request, so the program crashes. This is a real possibility for you, since you are using an RTOS and most vanilla linker scripts don't allocate any heap space. See this: https://stackoverflow.com/questions/10467244/using-newlibs-malloc-in-an-arm-cortex-m3
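To make the heap point concrete, here is a minimal, host-runnable sketch of an sbrk-style allocator that refuses requests once the heap region is exhausted instead of handing out memory beyond it. Everything in it (`sketch_sbrk`, `SKETCH_HEAP_SIZE`, the static array standing in for linker-defined heap bounds) is hypothetical and only illustrates the idea; it is not the newlib or FreeRTOS implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap region. On a real target the bounds would come from
 * linker-script symbols; a static array stands in for them here. */
#define SKETCH_HEAP_SIZE 1024u
static uint8_t sketch_heap[SKETCH_HEAP_SIZE];
static size_t sketch_heap_used = 0;

/* Count of refused requests, handy to inspect from the debugger. */
static volatile unsigned sketch_heap_failures = 0;

/* sbrk-style bump allocator: returns the previous break on success, or
 * (void *)-1 when the request does not fit in the remaining heap. */
void *sketch_sbrk(size_t incr)
{
    if (incr > SKETCH_HEAP_SIZE - sketch_heap_used) {
        sketch_heap_failures++;
        return (void *)-1;
    }
    void *prev = &sketch_heap[sketch_heap_used];
    sketch_heap_used += incr;
    return prev;
}
```

If the failure counter is non-zero when you land in the fault handler, an exhausted heap is a strong suspect. If your allocations go through FreeRTOS's pvPortMalloc instead, setting configUSE_MALLOC_FAILED_HOOK and defining vApplicationMallocFailedHook() should give you a similar trap.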
Faults caused by doing something silly like writing data past the end of an array. This is really easy to do if you are using math to generate array indices or using pointers to elements directly. What can happen here is that, if your boundary checks are buggy, a write to your array may in fact overwrite whatever sits next to it. If this doesn't cause an error directly (e.g. by writing to a read-only or protected location), it may just break your stack. Then you jump to a garbage location, probably execute an invalid instruction, and then fault.
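As a rough illustration of the boundary-check point, here is a small sketch of a write helper that rejects out-of-range indices rather than scribbling over neighbouring memory. The name `checked_write` is made up for this example; the idea is simply to funnel computed indices through one place where the check cannot be forgotten.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Store value at buf[index] only if index lies inside the array.
 * Returns false and leaves memory untouched otherwise. */
bool checked_write(uint8_t *buf, size_t len, size_t index, uint8_t value)
{
    if (buf == NULL || index >= len) {
        return false;
    }
    buf[index] = value;
    return true;
}
```

While hunting the bug, you can put a breakpoint on the `return false` path to catch the offending caller before the stack gets damaged.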
I'd also take a look at this document, which is related to your Code Red post. Even though the instructions are for ARM Cortex-M3 and ARM Cortex-M4, the method of interpreting the stacked registers is the same.
Debugging Hard Fault & Other Exceptions
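To tie this back to the numbers you captured: stacked_pc is the address of the instruction that was executing (or about to execute) when the fault hit, and stacked_lr is where the faulting function would have returned to. The sketch below decodes the eight stacked words and does a quick sanity check on the PC. The helper names and the memory map (256 KB flash at 0x00000000, 32 KB SRAM at 0x20000000) are assumptions for a typical SAMD21 part, so adjust them to yours.

```c
#include <stdint.h>

/* The eight words the core pushes on exception entry, in stack order. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
} fault_frame_t;

typedef enum { PC_IN_FLASH, PC_IN_RAM, PC_ELSEWHERE } pc_region_t;

/* Assumed memory map; check the datasheet of the actual device. */
#define SKETCH_FLASH_BASE 0x00000000u
#define SKETCH_FLASH_SIZE 0x00040000u
#define SKETCH_SRAM_BASE  0x20000000u
#define SKETCH_SRAM_SIZE  0x00008000u

/* Copy the stacked words into named fields. */
fault_frame_t decode_frame(const uint32_t *sp)
{
    fault_frame_t f = { sp[0], sp[1], sp[2], sp[3],
                        sp[4], sp[5], sp[6], sp[7] };
    return f;
}

/* The stacked LR normally has bit 0 set (Thumb state); clear it to get
 * the address to look up in the map file or disassembly. */
uint32_t caller_address(uint32_t stacked_lr)
{
    return stacked_lr & ~(uint32_t)1u;
}

/* A PC in flash points at real code. A PC in RAM or nowhere sensible
 * suggests a corrupted stack or a bad function pointer. */
pc_region_t classify_pc(uint32_t pc)
{
    if (pc - SKETCH_FLASH_BASE < SKETCH_FLASH_SIZE) return PC_IN_FLASH;
    if (pc - SKETCH_SRAM_BASE < SKETCH_SRAM_SIZE) return PC_IN_RAM;
    return PC_ELSEWHERE;
}
```

If the PC lands in flash, look the address up in the linker map file or the disassembly, or feed it to something like arm-none-eabi-addr2line -e firmware.elf <address> (firmware.elf being a placeholder for your own ELF file) to get a file and line. If it does not, the stacked LR is usually the better lead.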
Those are just my top two reasons off the top of my head. If you post your entire code base, we can probably give you more direct help. Good luck!