Electronic – Digital audio clock recovery from UDP

audio digital-logic

My question is: how can I recover the clock timing of digital audio transmitted over Ethernet using UDP?

I am streaming live audio over Ethernet using UDP. My setup uses two STM32F407 Discovery boards, one transmitting and the other receiving. I can hear the audio data, which I am outputting from the Discovery board codec. When I use the logic analyser, I can see that the word select is out of sync with the data.
The delay between transmit and receive is 1 ms. The audio is 32-bit with a 44 kHz sample clock.
The audio sounds OK when people are speaking at moderate levels, but loud noises create distortion. When music is played, the higher frequencies are distorted.

Does anyone have or know of an algorithm for clock recovery from UDP data?
Would careful timing of the transmitted frames make things easier?

Thanks in advance
David

Best Answer

If your word select is out of sync with the data, you don't have the hardware interface set up correctly. You need to fix that problem first. We can't help you with that since you didn't share the code you're using. But once you get that straightened out, here are some general guidelines regarding setting the sample rate on the receiver.

On the transmit side, you put the incoming audio into a FIFO buffer. When that buffer fills to a certain level, or once a certain amount of time has passed, you take a set of audio samples out of that buffer and transmit them in a UDP packet.

UDP packets can get lost or arrive out of order, so you include a sequence number in the packet so that the receiving side can detect either of these occurrences. The packets also experience random delays over some range that is generally bounded.
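As a rough illustration of the sequence-number idea, here is a minimal Python sketch. The packet layout (a 16-bit sequence number followed by 32-bit signed samples in network byte order) and the names `build_packet`, `parse_packet` and `classify` are assumptions made for this example, not something taken from the question's code.

```python
import struct

# Hypothetical header: one unsigned 16-bit sequence number, network byte order.
HEADER = struct.Struct("!H")

def build_packet(seq, samples):
    """Pack a sequence number followed by 32-bit signed audio samples."""
    return HEADER.pack(seq & 0xFFFF) + struct.pack(f"!{len(samples)}i", *samples)

def parse_packet(data):
    """Return (sequence number, list of samples) from a received payload."""
    (seq,) = HEADER.unpack_from(data)
    count = (len(data) - HEADER.size) // 4
    samples = list(struct.unpack_from(f"!{count}i", data, HEADER.size))
    return seq, samples

def classify(expected_seq, seq):
    """Compare a received sequence number with the expected one, modulo 2**16."""
    delta = (seq - expected_seq) & 0xFFFF
    if delta == 0:
        return "in_order"
    if delta < 0x8000:
        return "gap"   # one or more packets were lost or are still in flight
    return "late"      # older than expected: reordered or duplicated
```

The modulo comparison lets the counter wrap from 65535 back to 0 without being mistaken for a reordered packet. What to do on a "gap" (insert silence, repeat the last block, wait briefly) is a separate design choice.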

On the receive side, you take the audio sample data out of the packet, verify the sequence number, and put the data into another FIFO. When this FIFO fills to a level that represents the range of typical packet delays, you start reading the audio samples out and sending them to your audio DAC at the nominal sample rate. If the FIFO ever "runs dry", set the (re-)starting threshold higher.
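The receive-side FIFO with a start threshold might look something like the sketch below. The class name `JitterBuffer`, the doubling of the threshold after an underrun, and the `max_threshold` cap are illustrative assumptions; the text above only says the restart threshold should be set higher.

```python
from collections import deque

class JitterBuffer:
    """Receive FIFO that starts playback only after a fill threshold is reached."""

    def __init__(self, start_threshold, max_threshold):
        self.fifo = deque()
        self.start_threshold = start_threshold  # samples required before playback starts
        self.max_threshold = max_threshold      # assumed upper bound on the threshold
        self.playing = False

    def push(self, samples):
        """Called when a packet's samples have been accepted."""
        self.fifo.extend(samples)
        if not self.playing and len(self.fifo) >= self.start_threshold:
            self.playing = True

    def pop(self):
        """Called once per output sample period; returns silence (0) while buffering."""
        if self.playing and self.fifo:
            return self.fifo.popleft()
        if self.playing:
            # The FIFO ran dry: stop playback and require a deeper fill
            # before restarting (doubling is an arbitrary choice here).
            self.playing = False
            self.start_threshold = min(self.start_threshold * 2, self.max_threshold)
        return 0
```

On a microcontroller the same logic would typically sit between the network receive handler (`push`) and the audio output interrupt or DMA callback (`pop`), with a fixed-size ring buffer in place of `deque`.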

However, the transmit and receive sample clocks will not be perfectly synchronized. This means that the average amount of data in the receive-side FIFO will start to trend upward or downward over time. If the FIFO depth is increasing, it is necessary to increase the output audio sample rate slightly to match. Similarly, if it is decreasing, it is necessary to decrease the sample rate. These adjustments will cause the long-term average sample rate of the receiver to match that of the transmitter exactly.
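One common way to turn the FIFO depth into a rate adjustment is a proportional-integral (PI) loop. The sketch below is only an illustration of that idea, not the answer's specific method: the gains, the one-update-per-second step, the 100 ppm transmitter offset and the names `pi_step` and `simulate` are all assumptions. How the correction is actually applied (for example by trimming the audio clock or by resampling) is hardware-specific and not shown.

```python
def pi_step(fill, target, integral, kp=10.0, ki=1.0):
    """One PI update. Returns (rate correction in ppm, new integral state).

    A FIFO fuller than the target gives a positive correction,
    meaning the output sample rate should be raised.
    """
    error = fill - target
    integral += error
    return kp * error + ki * integral, integral

def simulate(tx_ppm=100.0, nominal_hz=44000.0, target=512.0, seconds=300):
    """Toy model: the transmitter runs tx_ppm fast, the receiver tracks it."""
    fill, integral, ppm = target, 0.0, 0.0
    for _ in range(seconds):
        ppm, integral = pi_step(fill, target, integral)
        produced = nominal_hz * (1 + tx_ppm * 1e-6)  # samples arriving per second
        consumed = nominal_hz * (1 + ppm * 1e-6)     # samples played per second
        fill += produced - consumed
    return ppm, fill

if __name__ == "__main__":
    ppm, fill = simulate()
    print(f"correction: {ppm:.3f} ppm, FIFO fill: {fill:.3f} samples")
```

In this model the integral term is what lets the correction settle at the transmitter's offset while the FIFO returns to its target depth; a purely proportional loop would leave a constant depth error. In practice the measured depth should be averaged over many packets first, so that packet jitter is not mistaken for clock drift.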

(Note that there is a patent on this technique, but that doesn't mean you can't use it in a personal project.)