Electronic – What can be the cause of an exceptionally large latency for the UART receive interrupt

atmega interrupts uart

I receive data on the UART of an 8-bit ATmega, usually around 5 bytes back to back, then a long pause. The total time for one byte (with start and stop bits; I don't use parity) is 160 us. However, the receive interrupt is triggered 60 to 100 us after the stop bit, and nearly half the time it does not trigger at all! (Checked with a scope.)

There were some quite long interrupts, so I blamed them, but after disabling every interrupt besides the UART, the situation remains the same. The UART interrupt finishes in under 10 us (typically 7 us) every time. The signal level is OK: it's 5 V, the same as the supply voltage.

First, after realizing that a lot of bytes were lost, I thought that the signal frequency was the cause, but I double-checked it: the signal quality looks good and the baud rate error is close to zero. If that were the problem, I would get some lost interrupts (because some bits were lost), but the rest should happen at the proper time, shouldn't they? In my case the interrupt comes very late, if at all, and even when it comes, the received byte is sometimes rubbish. The signal on the pin is OK; I can read and decode it correctly on the scope.

I searched for the typical latency of the UART interrupt, but could not find anything. I strongly suspect that a variation as wild as 60 to 100 us is not normal.
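For reference, the figures in the question can be put on a common scale with a few lines of arithmetic. This is only a sketch: it assumes a 10-bit frame (1 start bit, 8 data bits, 1 stop bit), where the 8 data bits are an assumption not stated in the question, and the function names are made up for illustration.

```c
/* Timing arithmetic for one UART frame.
 * Assumption: 10 bits per frame (1 start + 8 data + 1 stop, no parity). */

/* Duration of a single bit, given the duration of a whole frame. */
double bit_time_us(double frame_time_us, int bits_per_frame)
{
    return frame_time_us / bits_per_frame;
}

/* Baud rate in bits per second for a given bit time in microseconds. */
double baud_rate(double bit_us)
{
    return 1e6 / bit_us;
}

/* Express an interrupt latency as a number of bit times. */
double latency_in_bits(double latency_us, double bit_us)
{
    return latency_us / bit_us;
}
```

With 160 us per frame this gives a 16 us bit time (62500 baud), so the observed 60 to 100 us latency corresponds to 3.75 to 6.25 bit times, which is less than one full frame.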

Best Answer

I'm assuming that you're using the same debugging process I would use in this case -- one of the first instructions in the interrupt routine turns on an LED, and one of the last instructions in the interrupt routine turns that LED off.

Then you used a dual-trace oscilloscope with one probe clipped to the appropriate pin to watch the bytes going into the UART, and the other probe clipped to the pin driving the LED.

I'm assuming your UART-handler interrupt routine ends with the return-from-interrupt instruction (rather than the return-from-subroutine instruction used by normal subroutines).
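The setup described above might look roughly like the following sketch. It assumes avr-gcc with avr-libc, whose ISR() macro generates a handler that ends with the return-from-interrupt instruction. The names USART_RX_vect and UDR0 are the ATmega328P-style ones and differ on other parts, and the choice of PB0 as the scope marker pin, the buffer size, and the helper names are all arbitrary. The #else branch only provides stand-ins so the sketch also compiles on a PC.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#include <avr/interrupt.h>
#else
/* Host stand-ins (assumption) so the sketch also compiles off-target. */
static volatile uint8_t PORTB, DDRB, UDR0;
#define PB0 0
#define _BV(bit) (1u << (bit))
#define ISR(vector) void vector(void)
#endif

#define RX_BUF_SIZE 32u             /* power of two, so wrapping is a mask */

static volatile uint8_t rx_buf[RX_BUF_SIZE];
static volatile uint8_t rx_head;    /* written only by the ISR */
static volatile uint8_t rx_tail;    /* written only by the main loop */

/* Configure the marker pin (PB0, an arbitrary choice) as an output. */
void debug_pin_init(void)
{
    DDRB |= _BV(PB0);
}

/* Receive handler: marker high on entry, marker low just before return. */
ISR(USART_RX_vect)
{
    PORTB |= _BV(PB0);              /* marker high: handler entered */

    uint8_t byte = UDR0;            /* reading the data register clears the flag */
    uint8_t next = (uint8_t)((rx_head + 1u) & (RX_BUF_SIZE - 1u));
    if (next != rx_tail) {          /* drop the byte if the buffer is full */
        rx_buf[rx_head] = byte;
        rx_head = next;
    }

    PORTB &= (uint8_t)~_BV(PB0);    /* marker low: handler about to return */
}

/* Main-loop side: returns the next received byte, or -1 if none is waiting. */
int rx_pop(void)
{
    if (rx_tail == rx_head) {
        return -1;
    }
    uint8_t byte = rx_buf[rx_tail];
    rx_tail = (uint8_t)((rx_tail + 1u) & (RX_BUF_SIZE - 1u));
    return byte;
}
```

On the scope, the gap between the end of the stop bit and the rising edge of the marker pin is the latency in question, and the width of the marker pulse is the handler's execution time.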

There are four things that can cause a long latency between the end of the last byte of a message and the start of the UART handler:

  • Some previous byte in the message triggers the UART handler, and the handler somehow takes a long time before interrupts are re-enabled. Some people structure their interrupt routines so that after the UART handler finishes storing a byte in the appropriate buffer, it checks a bunch of other stuff before executing the return-from-interrupt instruction -- it increases jitter and latency, but sometimes those people do it anyway because it improves throughput.

  • Some other interrupt taking a long time to execute before it re-enables the interrupts by executing the return-from-interrupt instruction. (If you can make each and every interrupt turn on and off some other LED, it's pretty easy to see on the o'scope if this is the problem or to rule this out).

  • Some non-interrupt code "temporarily" turning off interrupts. (This increases jitter and latency, but people do it anyway, because it's often the easiest way to prevent data corruption when some interrupt and some main-loop background task both work with the same piece of data). (If you can make every bit of code that does this turn on and off some other LED, it's pretty easy to see on the o'scope if this is the problem or to rule this out).

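  • Instructions that take a long time to execute. (An AVR finishes the current instruction before servicing an interrupt, but even its slowest instructions take only a few clock cycles, so this alone rarely accounts for tens of microseconds).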
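To check the third cause, each place where main-loop code turns interrupts off can drive its own marker pin. The sketch below shows one way this might look, again assuming avr-gcc with avr-libc. The variable shared_ticks, the function read_shared_ticks, and the use of PB1 as the marker pin are hypothetical, and the #else branch only provides stand-ins so the sketch compiles on a PC.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#include <avr/interrupt.h>
#else
/* Host stand-ins (assumption) so the sketch also compiles off-target. */
static volatile uint8_t PORTB, SREG;
#define PB1 1
#define _BV(bit) (1u << (bit))
#define cli() ((void)(SREG &= (uint8_t)~0x80u))
#endif

/* Hypothetical value shared between some ISR and the main loop. */
static volatile uint16_t shared_ticks;

/* Copy the shared value with interrupts held off, and raise a marker pin
 * (PB1, an arbitrary choice) for as long as they are off. */
uint16_t read_shared_ticks(void)
{
    uint8_t saved = SREG;           /* remember whether interrupts were enabled */
    cli();                          /* from here on the UART interrupt is held off */
    PORTB |= _BV(PB1);              /* marker high: interrupts are off */

    uint16_t copy = shared_ticks;   /* a 16-bit read takes two 8-bit accesses on AVR */

    PORTB &= (uint8_t)~_BV(PB1);    /* marker low: about to restore */
    SREG = saved;                   /* restore the previous interrupt state */
    return copy;
}
```

If a late receive interrupt on the scope lines up with a long high pulse on this marker pin, the interrupts-off window is the culprit; if the marker is low while the interrupt is late, this cause can be ruled out.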

The traditional way to figure out exactly what is causing the problem is to save the current version of your code (you're using TortoiseHg or some other version control system, right?), and then deliberately hack and slash at a temporary copy of your code, stubbing out and completely removing code a few subroutines at a time, re-testing after each round of deletions, until you have a tiny -- yet technically "complete" and runnable -- program that exhibits the same problem.

Far too often people show us bits and pieces of a complete program -- the parts those people think are relevant -- and we can't help them because one of the pieces they omitted is causing the problem.

The process of reducing a program to a small test case is a very useful skill, because often while going through that process, you quickly discover what the real problem is.

Once you have such a tiny -- yet runnable -- program, please post it here. If you figure out what the problem is during that process, please tell us that as well, so the rest of us can avoid that problem.