I'm assuming that you're using the same debugging process I would do in this case --
one of the first instructions in the interrupt routine turns on a LED,
and one of the last instructions in the interrupt routine turns that LED off.
Then you used a dual-trace oscilloscope with one probe clipped to the appropriate pin to watch the bytes going into the UART, and the other probe clipped to the pin driving the LED.
I'm assuming your UART-handler interrupt routine ends with the return-from-interrupt instruction (rather than the return-from-subroutine instruction used by normal subroutines).
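To make that concrete, here's a minimal sketch of the kind of LED-instrumented receive ISR I have in mind, written for an ATmega328-class AVR with avr-libc. The choice of PB0 for the debug LED, the buffer size, and the assumption that DDRB already configures PB0 as an output are all mine, not anything from your code:

    #include <avr/io.h>
    #include <avr/interrupt.h>
    #include <stdint.h>

    #define DEBUG_LED  (1 << PB0)   /* assumption: debug LED wired to PB0, active high */

    volatile uint8_t rx_buf[64];
    volatile uint8_t rx_head;

    /* UART receive ISR: the first statement turns the LED on and the last
     * statement before returning turns it off, so the second scope channel
     * shows exactly when the handler runs and how long it takes. */
    ISR(USART_RX_vect)
    {
        PORTB |= DEBUG_LED;                 /* LED on: handler entered */

        rx_buf[rx_head++ & 63] = UDR0;      /* store the received byte */

        PORTB &= (uint8_t)~DEBUG_LED;       /* LED off: handler about to return */
    }   /* the ISR() macro makes the compiler emit reti here, re-enabling interrupts */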
There are 4 things that can cause a long latency between the end of the last byte of a message and the start of the UART handler:
A previous byte in the message triggering the UART handler, and the handler taking a long time before interrupts are re-enabled. Some people structure their interrupt routines so that after the UART handler finishes storing a byte in the appropriate buffer, it checks a bunch of other things before executing the return-from-interrupt instruction -- this increases jitter and latency, but some people do it anyway because it improves throughput.
Some other interrupt taking a long time to execute before it re-enables the interrupts by executing the return-from-interrupt instruction. (If you can make each and every interrupt turn on and off some other LED, it's pretty easy to see on the o'scope if this is the problem or to rule this out).
Some non-interrupt code "temporarily" turning off interrupts. (This increases jitter and latency, but people do it anyway, because it's often the easiest way to prevent data corruption when an interrupt and a main-loop background task both work with the same piece of data -- see the sketch after this list.) (If you can make every bit of code that does this turn on and off some other LED, it's pretty easy to see on the o'scope whether this is the problem or to rule it out.)
Instructions that take a long time to execute.
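Here's a rough sketch of the interrupts-off critical section I mean in the third item, again for an ATmega328-class AVR. The second debug LED on PB1 and the shared byte_count variable are made-up names purely for illustration:

    #include <avr/io.h>
    #include <avr/interrupt.h>
    #include <stdint.h>

    #define CRITICAL_LED (1 << PB1)   /* assumption: second debug LED on PB1 */

    volatile uint16_t byte_count;     /* shared between the ISR and the main loop */

    /* Main-loop code that briefly disables interrupts while it reads a
     * multi-byte variable the ISR also writes.  The LED brackets the
     * interrupts-off window so it shows up on the other scope channel. */
    uint16_t read_byte_count(void)
    {
        uint16_t snapshot;

        PORTB |= CRITICAL_LED;            /* LED on: interrupts about to go off */
        cli();                            /* interrupts off */
        snapshot = byte_count;            /* safe copy of the shared variable */
        sei();                            /* interrupts back on */
        PORTB &= (uint8_t)~CRITICAL_LED;  /* LED off: critical section done */

        return snapshot;
    }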
The traditional way to figure out exactly what is causing the problem
is to save the current version of your code (you're using TortoiseHg or some other version control system, right?),
and then deliberately hack and slash at a temporary copy of your code,
stubbing out and completely removing code a few subroutines at a time,
re-testing after each round of deletions,
until you have a tiny -- yet technically "complete" and runnable --
program that exhibits the same problem.
Far too often people show us bits and pieces of a complete program --
the parts those people think are relevant --
and we can't help them because one of the pieces they omitted is causing the problem.
The process of reducing a program to a small test case is a very useful skill,
because often while going through that process,
you quickly discover what the real problem is.
Once you have such a tiny -- yet runnable -- program,
please post it here.
If you figure out what the problem is during that process,
please tell us that as well, so the rest of us can avoid that problem.
It looks like 4800 bps is the correct speed. The 9600 data is obviously (!) the same data sampled twice as often. Here is how you do that analysis:
Here's the 9600 baud data as it would appear as a bit sequence. The data is written LSB first, and I've represented the start and stop bits as lower-case o (zero) and i (one), respectively.
    |   06   ||   3F   ||   60   ||   0C   ||   FE   ||   80   ||   60   ||   CC   |
    o01100000io11111100io00000110io00110000io01111111io00000001io00000110io00110011i
Here's the 4800 baud data, stretched out to the same time scale:
    |        71        ||        24        ||        0F        ||        A4        |
    o 1 0 0 0 1 1 1 0 i o 0 0 1 0 0 1 0 0 i o 1 1 1 1 0 0 0 0 i o 0 0 1 0 0 1 0 1 i
Note that each bit in the lower stream corresponds to two bits of the same value in the upper stream. Keep in mind that when running at 9600, your wiretap is resynchronizing on a high-to-low transition, so there's a little bit of "slop" around the byte boundaries at that speed.
It's also clear that an even slower speed would NOT be correct — there are single isolated ones and zeros in the data at 4800, which means that this is the minimum sampling rate for this data.
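If you want to repeat that analysis on other captures, here's a small throwaway C program (the names are entirely my own, not anything from your code) that prints a byte sequence in the same notation used above: start bit o, eight data bits LSB first, stop bit i:

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Print each byte as a 10-bit frame: start bit 'o' (zero),
     * eight data bits LSB first, stop bit 'i' (one). */
    static void print_frames(const uint8_t *bytes, size_t n)
    {
        for (size_t k = 0; k < n; k++) {
            putchar('o');                                       /* start bit */
            for (int bit = 0; bit < 8; bit++)
                putchar(((bytes[k] >> bit) & 1) ? '1' : '0');   /* LSB first */
            putchar('i');                                       /* stop bit */
        }
        putchar('\n');
    }

    int main(void)
    {
        const uint8_t fast[] = { 0x06, 0x3F, 0x60, 0x0C, 0xFE, 0x80, 0x60, 0xCC };
        const uint8_t slow[] = { 0x71, 0x24, 0x0F, 0xA4 };

        print_frames(fast, sizeof fast);   /* the 9600 baud capture */
        print_frames(slow, sizeof slow);   /* the 4800 baud capture */
        return 0;
    }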
Time required would be 32 bytes * 12 bits/byte divided by the baud rate (1 start bit + 8 data bits + 1 parity bit + 2 stop bits = 12 bits per byte). The baud rate is the 'bit clock' that determines how long each bit lasts. A common choice is 115200 bits per second, which may or may not be the rate you're using; other standard baud rates are 2400, 4800, 9600, 19200, and 38400. If the baud rate is 115200, then the time required to transmit 32 bytes is 32 * 12 / 115200 = 3.33 ms.
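If it helps, here's a quick sanity check of that arithmetic in C, run over the standard rates listed above. The 12-bit frame is the same assumption as before (1 start + 8 data + 1 parity + 2 stop); adjust it if your frame format differs:

    #include <stdio.h>

    int main(void)
    {
        const double bits_per_byte = 12.0;   /* 1 start + 8 data + 1 parity + 2 stop */
        const double bytes         = 32.0;
        const double baud_rates[]  = { 2400, 4800, 9600, 19200, 38400, 115200 };

        for (int i = 0; i < 6; i++) {
            double seconds = bytes * bits_per_byte / baud_rates[i];
            printf("%6.0f baud: %.2f ms\n", baud_rates[i], seconds * 1000.0);
        }
        /* At 115200 baud this prints 3.33 ms, matching the figure above. */
        return 0;
    }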