Electronic – Subtracting Audio Signal from Ambient Noise Signal

diff-amp, signal, signal processing

I'm pretty well-versed in the software world, and my buddy and I decided to tackle a hack idea as a way into hardware hacking. We're trying to figure out whether our implementation will actually be feasible the way we're thinking about it.

The basic gist is this:

Some audio is playing through a car's speaker system. We have direct access to that digital audio stream. We will also be capturing the ambient sound, which is a mix of the song, any white noise, and conversation between the passengers in the cabin. We would like to take the latter (ambient) signal and subtract out the song's signal as much as possible to detect (roughly) whether a conversation is happening above a certain volume threshold (the output of this measure can be binary, 0 or 1, or analog).

Our proposed solution is to feed both signals into a differential amplifier and then capture the output and make meaning of it. We're not entirely sure whether this is the correct rabbit hole to jump down. We do know that this is possible via software with Least Mean Squares Filtering (or Finite Impulse Response filtering – although it doesn't bode too well in real time), but, again, we would like to try to get this done through hardware for the fun of it.

One issue we were thinking about is that the audio signals may be out of phase, but, since we're both noobs, we're not entirely sure how much of an issue this is in practice.

Please excuse my ignorance if this is a dumb question. I tried looking for a similar question in the group but couldn't find anything that I perceived as close.

Best Answer

What you're doing is known as "echo cancelling", and it's a common problem in the implementation of speakerphones. The gist of the problem is that the version of the music that you're getting from the microphone has been heavily modified (relative to the "clean" version you're getting directly from the radio) by the reverberation of the space it's playing in. A direct subtraction of the two signals will not achieve anything useful.
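
To make that concrete, here is a rough simulation of what a plain subtraction does. Everything in it is an assumption for illustration: the "music" is just random noise, and `room` is a made-up impulse response (a 3-sample delay plus two weaker reflections), not a measurement of any real cabin.

```python
import random

random.seed(1)
music = [random.uniform(-1.0, 1.0) for _ in range(2000)]  # stand-in for the clean stream

# Hypothetical cabin response: the direct sound arrives 3 samples late,
# followed by two weaker reflections.
room = [0.0, 0.0, 0.0, 0.7, 0.0, -0.35, 0.2]

def convolve(signal, impulse_response):
    """Filter `signal` through `impulse_response` (output has the same length as the input)."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out

def power(signal):
    """Mean squared amplitude."""
    return sum(s * s for s in signal) / len(signal)

mic = convolve(music, room)                        # the music as the microphone hears it
difference = [m - x for m, x in zip(mic, music)]   # what a differential amplifier would output

print("mic power:       ", power(mic))
print("difference power:", power(difference))
```

In this toy setup the difference signal ends up with more power than the microphone signal alone, because the delayed, reshaped copy no longer lines up with the clean one. Subtracting adds the music in a second time instead of removing it.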

Instead, an adaptive FIR filter is used to build up an electronic copy or model of the acoustic reverberation, and the output of this filter can then be subtracted from the microphone signal to get a useful amount of cancellation.
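
As a minimal sketch of that idea, the snippet below uses normalized LMS (NLMS), one common way to adapt an FIR filter. It is only a sketch under stated assumptions: `nlms_cancel`, the tap count, the step size `mu`, the made-up `room` response and the random-noise "music" and "speech" are all illustrative, and a real cabin would need a far longer filter.

```python
import random

def nlms_cancel(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively filtered copy of `reference` from `mic`.

    Returns (residual, weights); `weights` is the learned FIR model of the room.
    """
    w = [0.0] * taps      # FIR coefficients, adapted sample by sample
    buf = [0.0] * taps    # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # estimated echo at the mic
        e = d - y                                     # what is left after cancellation
        norm = sum(b * b for b in buf) + eps          # normalize by reference energy
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w

def power(signal):
    """Mean squared amplitude."""
    return sum(s * s for s in signal) / len(signal)

random.seed(0)
n_samples = 4000
music = [random.uniform(-1.0, 1.0) for _ in range(n_samples)]
room = [0.0, 0.6, 0.0, -0.3, 0.15]   # hypothetical room impulse response
echo = [sum(h * music[n - k] for k, h in enumerate(room) if n - k >= 0)
        for n in range(n_samples)]
# "Conversation" (again just noise) starts at sample 3000.
speech = [0.0] * 3000 + [random.uniform(-0.3, 0.3) for _ in range(1000)]
mic = [e + s for e, s in zip(echo, speech)]

residual, weights = nlms_cancel(music, mic)
quiet = power(residual[2000:3000])    # music only, after the filter has settled
talking = power(residual[3000:])      # music plus "speech"
print("residual power, music only:  ", quiet)
print("residual power, with speech: ", talking)
```

Once the filter has settled, the residual is essentially zero while only music is playing and jumps when the "speech" starts, which is the kind of signal a loudness threshold could then act on. Note that this sketch keeps adapting while the speech is present, which disturbs the learned weights.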

The real trick is the algorithm that's used to "train" the filter, which needs to use heuristics to decide what's reverberation and what's actual background noise or conversation. These algorithms are far from trivial.
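
To give a flavour of such a heuristic, here is a sketch of the Geigel double-talk detector, one of the simplest classic approaches: it flags a sample as "someone is talking" when the microphone is louder than the recent peak of the reference could plausibly explain, and a trainer would typically freeze adaptation while the flag is set. The function name, the window length, the 0.5 threshold and the test signals are assumptions for illustration. The threshold in particular presumes the echo reaches the microphone noticeably quieter than the reference, which may not hold in a real car.

```python
import random

def geigel_double_talk(reference, mic, window=64, threshold=0.5):
    """Return one flag per sample: True when |mic| exceeds
    threshold * (peak |reference| over the last `window` samples)."""
    flags = []
    for n, d in enumerate(mic):
        recent = reference[max(0, n - window + 1): n + 1]
        peak = max(abs(x) for x in recent)
        flags.append(abs(d) > threshold * peak)
    return flags

random.seed(2)
music = [random.uniform(-1.0, 1.0) for _ in range(2000)]
# Hypothetical echo path: a 2-sample delay with a gain of 0.3.
echo = [0.0, 0.0] + [0.3 * x for x in music[:-2]]
# "Conversation" (random noise as a stand-in) during the second half only.
speech = [0.0] * 1000 + [random.uniform(-1.0, 1.0) for _ in range(1000)]
mic = [e + s for e, s in zip(echo, speech)]

flags = geigel_double_talk(music, mic)
print("flags raised, music only: ", sum(flags[:1000]))
print("flags raised, with speech:", sum(flags[1000:]))
```

Production echo cancellers generally rely on more robust detectors than this, but the structure is the same: decide when the microphone contains something the reference cannot account for, and stop training the filter on it.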