This question was rewritten to remove several updates and improve clarity.
I have a Cortex-M3-based (and rather obscure) MCU. I have a rather big project, written in C++, in Keil MDK5 with the armcc compiler.
I have this function:
    bool CanHandle::sendMsg(CanMsg & msg)
    {
        bool result = false; // <--------- problematic line
        CAN_TxMsgTypeDef txMsg;
        if (format == FrameFormat::EXTENDED)
        {
            txMsg.IDE = CAN_ID_EXT;
        }
        else
        {
            txMsg.IDE = CAN_ID_STD;
        }
        ENTER_CRITICAL_SECTION();
        uint32_t bufN = CAN_GetEmptyTransferBuffer(set.mdrCan);
        if (bufN != CAN_BUFFER_NUMBER)
        {
            CAN_Transmit(set.mdrCan, bufN, &txMsg);
            isTxQueueFull = false;
        }
        else
        {
            isTxQueueFull = true;
            result = false;
        }
        LEAVE_CRITICAL_SECTION();
        return result;
    }
When I compile with -O1, the compiler produces this assembly listing for it:
0x0800068C E92D41FF PUSH {r0-r8,lr}
0x08000690 4604 MOV r4,r0
0x08000692 2700 MOVS r7,#0x00
0x08000694 7A20 LDRB r0,[r4,#0x08]
0x08000696 2600 MOVS r6,#0x00
0x08000698 F04F0801 MOV r8,#0x01
0x0800069C 2801 CMP r0,#0x01
0x0800069E D013 BEQ 0x080006C8
0x080006A0 F88D6005 STRB r6,[sp,#0x05]
0x080006A4 4860 LDR r0,[pc,#384] ; @0x08000828
0x080006A6 6800 LDR r0,[r0,#0x00]
0x080006A8 F3C00508 UBFX r5,r0,#0,#9
0x080006AC F8D401FC LDR r0,[r4,#0x1FC]
0x080006B0 F000FBCA BL.W CAN_GetEmptyTransferBuffer (0x08000E48)
0x080006B4 4601 MOV r1,r0
0x080006B6 2920 CMP r1,#0x20
0x080006B8 D009 BEQ 0x080006CE
0x080006BA 466A MOV r2,sp
0x080006BC F8D401FC LDR r0,[r4,#0x1FC]
0x080006C0 F000FBC4 BL.W CAN_Transmit (0x08000E4C)
0x080006C4 7266 STRB r6,[r4,#0x09]
0x080006C6 E004 B 0x080006D2
0x080006C8 F88D8005 STRB r8,[sp,#0x05]
0x080006CC E7EA B 0x080006A4
0x080006CE F8848009 STRB r8,[r4,#0x09]
0x080006D2 B004 ADD sp,sp,#0x10
0x080006D4 4638 MOV r0,r7
0x080006D6 E8BD81F0 POP {r4-r8,pc}
And here's the funny thing: when I try to step over the line bool result = false;, I get a hard fault with the UNDEFINSTR flag set. The PC recovered from the stack is some nonexistent address.
But, and here's the really mysterious thing, if I step over the assembly, step into the C code, or set a breakpoint on line 2 and press run, everything is fine! No hard fault, and the program runs from there.
If I compile with -O0 or make result volatile, the compiler produces different assembly and the hard fault doesn't occur.
I tried different versions of the compiler and IDE; the problem persists.
Running this program without debugger produces no fault.
Debugging in the simulator produces no fault, but stepping over that particular line doesn't actually step over it; the program just runs indefinitely.
This function is called from main, so after its end there is just while(1). I believe there is no problem in the outside code.
The code of the function is now at its minimum: if I remove any line, literally any line, the problem goes away. I previously posted several wrong guesses; I have removed them for now. I can't pinpoint any particular instruction or address that produces the hard fault.
All function calls there are dummies that return immediately. The CRITICAL_SECTION macros do not produce critical sections at all, but something has to be on those lines or no fault occurs.
I have literally no idea how that can happen. I don't know how debugger works exactly, what's the difference between stepping over and setting a breakpoint and hitting "run".
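For readers who want to check the UNDEFINSTR flag mentioned above themselves: it lives in the UsageFault part of the Configurable Fault Status Register (CFSR, at 0xE000ED28 on ARMv7-M cores). The following is only a minimal sketch of how a raw CFSR value could be decoded; describeCfsr and the CFSR_* constant names are hypothetical and not part of this project or of any vendor library, and only a handful of the defined bits are covered.

```cpp
#include <cstdint>
#include <string>

// CFSR bit masks as defined by the ARMv7-M architecture (register at 0xE000ED28).
constexpr std::uint32_t CFSR_PRECISERR   = 1u << 9;   // BFSR: precise data bus error
constexpr std::uint32_t CFSR_IMPRECISERR = 1u << 10;  // BFSR: imprecise data bus error
constexpr std::uint32_t CFSR_BFARVALID   = 1u << 15;  // BFSR: BFAR holds a valid address
constexpr std::uint32_t CFSR_UNDEFINSTR  = 1u << 16;  // UFSR: undefined instruction
constexpr std::uint32_t CFSR_INVSTATE    = 1u << 17;  // UFSR: invalid execution state
constexpr std::uint32_t CFSR_INVPC       = 1u << 18;  // UFSR: invalid PC load

// Hypothetical helper: turns a raw CFSR value into a space-separated
// list of the flag names above, or "none" if none of them is set.
std::string describeCfsr(std::uint32_t cfsr)
{
    struct Flag { std::uint32_t mask; const char* name; };
    static const Flag flags[] = {
        { CFSR_PRECISERR,   "PRECISERR"   },
        { CFSR_IMPRECISERR, "IMPRECISERR" },
        { CFSR_BFARVALID,   "BFARVALID"   },
        { CFSR_UNDEFINSTR,  "UNDEFINSTR"  },
        { CFSR_INVSTATE,    "INVSTATE"    },
        { CFSR_INVPC,       "INVPC"       },
    };

    std::string out;
    for (const Flag& f : flags)
    {
        if (cfsr & f.mask)
        {
            if (!out.empty())
            {
                out += ' ';
            }
            out += f.name;
        }
    }
    if (out.empty())
    {
        out = "none";
    }
    return out;
}
```

On the target, the value passed in would be read from the CFSR inside the fault handler (with CMSIS, SCB->CFSR); here the function takes a plain integer so it can also be tried on a PC.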
Best Answer
Some ideas which might help when debugging a hard fault on a Cortex-M4; maybe some of them are useful:
- The address of the instruction that caused the hard fault (the stacked PC) is put on the stack at offset +0x18 of the exception frame, provided the fault is synchronous (precise; for a bus fault, the BFARVALID bit is then set). If it is not, a precise fault can be forced by setting the DISDEFWBUF bit in the ACTLR system register.
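To illustrate the +0x18 offset, here is a minimal sketch of pulling the stacked PC out of the exception frame. It assumes the basic 8-word frame (no floating-point context); ExceptionFrame, stackedPc and usesPsp are hypothetical names made up for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Layout of the basic exception frame pushed by the core on exception entry
// (assumption: no floating-point context is stacked).
struct ExceptionFrame
{
    std::uint32_t r0;
    std::uint32_t r1;
    std::uint32_t r2;
    std::uint32_t r3;
    std::uint32_t r12;
    std::uint32_t lr;
    std::uint32_t pc;    // address the core was executing when the fault hit
    std::uint32_t xpsr;
};

static_assert(offsetof(ExceptionFrame, pc) == 0x18,
              "stacked PC must sit at offset +0x18");

// Hypothetical helper: returns the stacked PC given a pointer to the frame.
inline std::uint32_t stackedPc(const std::uint32_t* frame)
{
    return frame[0x18 / sizeof(std::uint32_t)];
}

// Hypothetical helper: bit 2 of the EXC_RETURN value (found in LR inside the
// handler) tells whether the frame was pushed on the process stack (PSP)
// or on the main stack (MSP).
inline bool usesPsp(std::uint32_t excReturn)
{
    return (excReturn & 0x4u) != 0u;
}
```

In a real handler, the frame pointer would come from MSP or PSP depending on usesPsp(LR); a small assembly stub normally passes it to the C++ code. The functions take plain values so the logic can be checked off-target as well.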
Another thing: when executing code from flash, things like the wait-state configuration, caches, and prefetch buffers might sometimes cause errors like this.
I currently have a similar issue, which seems to be influenced by a linker option (ignore_debug_references), not sure how yet...