Sending audio over Ethernet

Tags: adc, arduino, audio, enc28j60, ethernet

I am trying to send audio from a microphone such as this over Ethernet to a computer. My first idea was to connect the mic to an Arduino's ADC and send the data using an Ethernet module such as the ENC28J60, but some people say that the microcontroller can only send about 5kb/s.

Has anyone tried a similar setup where raw data is sent from an analog pin and measured the throughput?

(any ideas on a better way to send the data are also welcome)

Best Answer

Before I get into details, let me say that I have probably designed more professional audio-over-Ethernet hardware than anyone else, both in the number of different PCB designs and in the number of PCBs manufactured and shipped to end customers. Odds are very high that you have heard products in which I designed the audio-over-Ethernet circuitry. (This is pro-audio only, and does not include VoIP or other non-pro products.)

Let's start with the issues:

Software: The hardware is honestly the easy part. The software is difficult. The closer you want to pro-audio performance the harder it is. Your application doesn't sound like pro-audio, but the software task is still not trivial.

Audio Clocking: Transmitting the audio data from point A to point B is relatively easy. Doing it in a way that keeps the audio clocks of the two devices synchronized is difficult. Non-pro applications solve this with sample rate conversion, or by simply dropping or duplicating samples as the audio clocks drift. Both approaches have difficulties and side effects, which increase the software difficulty immensely. Just saving the data to a file on the PC side of things is easy; using it in real time is hard.
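The drop/duplicate approach mentioned above can be sketched as a small receive buffer on the PC side. This is only an illustration under assumed names and thresholds (`DriftBuffer`, `target`, `slack` are hypothetical, not from any library): when the buffer runs too full the sender's clock is ahead, so one sample is discarded; when it runs too empty the previous sample is repeated.

```python
from collections import deque


class DriftBuffer:
    """Hypothetical receive buffer that drops or repeats samples to absorb clock drift."""

    def __init__(self, target=480, slack=240):
        self.fifo = deque()
        self.target = target    # desired fill level, in samples (assumed value)
        self.slack = slack      # allowed deviation before correcting (assumed value)
        self.last = 0           # last sample handed out, reused for repeats
        self.dropped = 0
        self.repeated = 0

    def push(self, samples):
        """Called when a packet of samples arrives from the network."""
        self.fifo.extend(samples)

    def pull(self):
        """Called once per output sample period by the playback side."""
        fill = len(self.fifo)
        if fill == 0 or fill < self.target - self.slack:
            # Sender is running slow (or nothing has arrived): repeat a sample.
            self.repeated += 1
            return self.last
        if fill > self.target + self.slack:
            # Sender is running fast: discard one sample to catch up.
            self.fifo.popleft()
            self.dropped += 1
        self.last = self.fifo.popleft()
        return self.last


buf = DriftBuffer(target=4, slack=2)
first = buf.pull()            # buffer is empty, so the last value is repeated
buf.push(range(1, 11))        # ten samples arrive at once
second = buf.pull()           # buffer is over-full, so one sample is skipped
```

Each drop or repeat is a small discontinuity in the waveform, which is one of the audible side effects of this approach; sample rate conversion avoids that at the cost of more processing.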

Low-latency: Latency is how long it takes the audio to go from the mic, over the network to the PC, and then be used by the PC. The shorter the latency, the harder things are. Just saving audio data to a file is a good example of super-long latency, which is one reason why it is also the easiest thing to do. A latency of <2.5 ms is damn hard to do correctly and robustly. The shorter the latency, the fewer issues there are with things like audio echo.
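To get a feel for where latency comes from, here is a rough sketch of the buffering part alone. It assumes latency is simply the number of samples waiting in packets divided by the sample rate, and it ignores network transit, ADC/DAC delay, and processing time. The function name and the example figures are illustrative assumptions.

```python
def buffering_latency_ms(samples_per_packet, packets_buffered, sample_rate_hz):
    """Delay in milliseconds caused only by collecting and queuing samples."""
    return 1000.0 * samples_per_packet * packets_buffered / sample_rate_hz


# Telephone-style stream: 8 kHz, 160 samples per packet, 3 packets queued.
relaxed = buffering_latency_ms(160, 3, 8_000)

# Low-latency stream: 48 kHz, 48 samples per packet, 1 packet queued.
tight = buffering_latency_ms(48, 1, 48_000)

print(f"relaxed: {relaxed:.1f} ms, tight: {tight:.1f} ms")
```

With these assumed figures the relaxed stream spends 60 ms in buffers alone, while the tight one spends 1 ms, which leaves very little room for anything else inside a 2.5 ms budget.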

Bandwidth: Sending telephone quality audio with high latency is the easiest. Pro-audio quality with low latency is super hard. Using the mic, MCU, and Ethernet interface that you proposed is going to put you into the telephone quality side of things. There are many cases where raw bits-per-second of the Ethernet interface is not the only problem. Other issues like IRQ rate, packet transmit/receive time (not just overall bandwidth), and sometimes packet timing are super important.
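The gap between raw bit rate and what the interface actually has to handle can be estimated with some arithmetic. The sketch below assumes the audio is carried in UDP over IPv4 with no IP options, one packet per Ethernet frame, and no VLAN tag; the function name and the example stream parameters are hypothetical choices, not something from the question.

```python
ETH_HEADER = 14          # destination MAC, source MAC, EtherType
ETH_FCS = 4              # frame check sequence
ETH_PREAMBLE_IFG = 8 + 12  # preamble/SFD plus minimum inter-frame gap
IPV4_HEADER = 20         # IPv4 header without options
UDP_HEADER = 8
ETH_MIN_PAYLOAD = 46     # shorter payloads are padded up to this size


def stream_rates(sample_rate_hz, bytes_per_sample, channels, samples_per_packet):
    """Return (audio bit/s, on-wire bit/s, packets/s) for one audio stream."""
    audio_bytes = samples_per_packet * bytes_per_sample * channels
    eth_payload = max(audio_bytes + IPV4_HEADER + UDP_HEADER, ETH_MIN_PAYLOAD)
    wire_bytes = eth_payload + ETH_HEADER + ETH_FCS + ETH_PREAMBLE_IFG
    packets_per_s = sample_rate_hz / samples_per_packet
    audio_bps = sample_rate_hz * bytes_per_sample * channels * 8
    wire_bps = wire_bytes * 8 * packets_per_s
    return audio_bps, wire_bps, packets_per_s


# Telephone quality: 8 kHz, 8-bit, mono, 20 ms of audio per packet.
phone = stream_rates(8_000, 1, 1, 160)

# Higher quality: 48 kHz, 24-bit, stereo, 1 ms of audio per packet.
stereo = stream_rates(48_000, 3, 2, 48)
```

Under these assumptions the telephone-quality stream carries 64 kbit/s of audio in about 90 kbit/s on the wire at 50 packets per second, while the 48 kHz stereo stream carries about 2.3 Mbit/s of audio in about 2.8 Mbit/s at 1000 packets per second. The packet rate is what drives the interrupt load on a small microcontroller.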

Network Topology: As the audio quality goes up (and latency goes down) your network topology becomes really important. I am talking about the number of Ethernet switches, the type of switches, how they are connected, and the number and type of non-audio Ethernet devices also on the network. For you this probably wouldn't be an issue, but you never know.

I think that your proposed solution would work for telephone audio quality with a high latency. You'll probably have to do sample dropping/repeating to deal with non-synchronized audio clocks. And it won't be all that great. You might be quite underwhelmed by the audio. I also think that you'll have a lot of software to write on the PC side of things. That being said, I would not do the project with that hardware.

If I were doing the project, I would look at one of the new-ish ARM Cortex-M3 or M4 devices by TI or Freescale that include a 100 Mbps or gigabit Ethernet controller. Many of these cost less than US$10 each and can run at up to 100 MHz. The amount of RAM and Flash they offer makes the task of writing software much easier.

For amusement, my current Audio Over Ethernet project uses an 800 MHz ARM Cortex-A8 with dual gigabit Ethernet ports and runs a customized version of Linux. The system as a whole (not just this Cortex-A8 device) can handle over 2048 audio channels at 48 kHz, 32 bit, with an overall system latency of just 2.5 ms (including ADC, DAC, two times over the network, and lots of processing). Audio devices on the network have their sample clocks sync'd to less than 1 µs, even if there are 8+ switch hops in the middle.
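As a quick sanity check on those figures, the raw payload and the sample period can be worked out directly. This is plain arithmetic on the numbers quoted above and does not account for any packet overhead.

```python
channels = 2048
sample_rate_hz = 48_000
bits_per_sample = 32

# Raw audio payload for the whole system, before any packet overhead.
raw_bps = channels * sample_rate_hz * bits_per_sample

# Duration of one sample at 48 kHz, for comparison with the 1 µs sync figure.
sample_period_us = 1e6 / sample_rate_hz

print(f"raw audio payload: {raw_bps / 1e9:.2f} Gbit/s")   # 3.15 Gbit/s
print(f"sample period: {sample_period_us:.1f} us")        # 20.8 us
```

Roughly 3.15 Gbit/s of raw audio is more than a single gigabit port can carry, which fits the note that the channel count describes the system as a whole. A sync error under 1 µs is less than 5% of one 20.8 µs sample period.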