Electronic – Preventing audio signal saturation

audio, dsp

I am trying to implement an algorithm on audio signals. The algorithm operates on the signal frame by frame. One of the intermediate stages requires adding the signal to previous frames, and the output frame is saturated at the end of this stage.

Clarification: The output frame of the addition operation is a double, but the final output frame HAS to be a short. Hence the saturation problem.

I have tried the following to prevent saturation (both attempts are sketched in C below):
1. Hard-limiting: limit any sample to a pre-defined max_val.
2. Normalization: compute the maximum of the frame and then scale every sample in the frame appropriately.
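
Roughly, what I tried looks like this in C; the frame length parameter n and the max_val constant are placeholder names, not my actual code:

    #include <limits.h>
    #include <math.h>
    #include <stddef.h>

    /* Attempt 1: hard-limit every sample to +/- max_val. */
    static void hard_limit(double *frame, size_t n, double max_val)
    {
        for (size_t i = 0; i < n; i++) {
            if (frame[i] > max_val)  frame[i] = max_val;
            if (frame[i] < -max_val) frame[i] = -max_val;
        }
    }

    /* Attempt 2: normalize the frame so its peak maps to SHRT_MAX. */
    static void normalize_frame(double *frame, size_t n)
    {
        double peak = 0.0;
        for (size_t i = 0; i < n; i++)
            peak = fmax(peak, fabs(frame[i]));
        if (peak > 0.0) {
            double scale = (double)SHRT_MAX / peak;
            for (size_t i = 0; i < n; i++)
                frame[i] *= scale;
        }
    }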

Neither of these approaches has worked, since I have to work on a frame-by-frame basis.

Any help on preventing saturation of signals is most welcome.

Best Answer

Your Attempts

By definition, when you "hard-limit" values in your code you are causing saturation. It may not be saturation in the sense of overflowing your short, but you are still distorting the wave whenever it goes over a certain point. Here's an example:

[Figure: waveform clipped by hard limiting (saturation)]

I realize you probably aren't hard-limiting on the bottom, but I had already drawn it before I realized that.

So, in other words, the hard limiting method won't work.

Now for your second approach: this does something that some audio people actually do intentionally. You are making every frame as loud as it possibly can be. It can work OK if you get the scaling right and are fine with your music sounding loud all of the time, but it isn't great for most people.

One Solution

If you know the maximum effective gain that your system can create, you can divide your input by that much. To figure out what it is, step through your code: take the maximum possible input, give it a gain of x, work out the maximum output in terms of x, and then determine what x must be so the output can never saturate. You would apply this gain to your incoming audio signal before you do anything else to it.
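
A minimal sketch of this, assuming you have already measured the worst-case gain of your chain (WORST_CASE_GAIN below is a made-up placeholder, not a value from your system):

    #include <stddef.h>

    /* Pre-scale the input so that even the worst-case gain through the
     * processing chain cannot push the output past the short range.
     * WORST_CASE_GAIN is a placeholder you determine by stepping through
     * your own code. */
    #define WORST_CASE_GAIN 4.0   /* hypothetical: output peak <= 4x input peak */

    static void apply_headroom(double *frame, size_t n)
    {
        const double g = 1.0 / WORST_CASE_GAIN;
        for (size_t i = 0; i < n; i++)
            frame[i] *= g;
    }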

This solution is OK, but it isn't great for everyone: your dynamic range takes a small hit, since you usually won't be running at max input all of the time.

The other solution is to do some auto-gaining. This method is similar to the previous one, but the gain changes over time. To do this, check the maximum value of each input frame, store it, run a simple low-pass filter over those per-frame maxima, and use the filtered value to decide what gain to apply.
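
Here's a rough sketch of that idea in C; ALPHA and TARGET_PEAK are hypothetical tuning values you would pick for your own system:

    #include <math.h>
    #include <stddef.h>

    #define ALPHA       0.1      /* smoothing factor, 0..1; smaller = slower */
    #define TARGET_PEAK 30000.0  /* desired output peak, a bit below SHRT_MAX */

    static double smoothed_peak = 0.0;   /* persists across frames */

    static double auto_gain(const double *frame, size_t n)
    {
        double peak = 0.0;
        for (size_t i = 0; i < n; i++)
            peak = fmax(peak, fabs(frame[i]));

        /* One-pole low-pass on the per-frame peaks:
         * y[k] = (1 - ALPHA) * y[k-1] + ALPHA * x[k] */
        smoothed_peak = (1.0 - ALPHA) * smoothed_peak + ALPHA * peak;

        /* Only attenuate when the smoothed peak threatens the target. */
        if (smoothed_peak > TARGET_PEAK)
            return TARGET_PEAK / smoothed_peak;
        return 1.0;
    }

You would then multiply every sample of the frame by the returned gain before the final conversion to short.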

Here is an example of what your gain versus input volume would be:

[Figure: gain versus input volume for the auto-gain approach]

This type of system lets most of your audio keep a high dynamic range, but as you start getting close to the maximum volume you can handle, the gain is slowly reduced.

Data Analysis

If you want to find out what kind of values your system is actually getting in real time, you will need some type of debugging output. This will vary depending on what platform you're running on, but here's the general gist. If you are in an embedded environment you will need some serial output. At certain stages in your code, write the values out to a file, the screen, or something else you can grab the data from. Take this data, put it in Excel or MATLAB, and graph all of it versus time. You will probably be able to tell very easily where things are going wrong.
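
Something like this would do on a platform that can write files; the file name and the stage label below are placeholders:

    #include <stdio.h>
    #include <stddef.h>

    /* Dump one frame of intermediate values as CSV so it can be loaded
     * into Excel or MATLAB and plotted against time. */
    static void dump_frame(const char *stage, const double *frame, size_t n)
    {
        FILE *f = fopen("debug_frames.csv", "a");
        if (!f)
            return;
        for (size_t i = 0; i < n; i++)
            fprintf(f, "%s,%zu,%f\n", stage, i, frame[i]);
        fclose(f);
    }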

Very Simple Method

Are you saturating your double? It doesn't sound like it; it sounds like you are saturating when you convert to a short. A very simple and "dirty" way of dealing with this is to take the maximum value your double can reach (this value depends on your platform and your processing) and scale it so that it maps to the maximum value of your short. This guarantees that, assuming you don't overflow your double, you won't overflow your short either. Most likely this will make your output much softer than your input. You will just need to play around and use some of the data analysis described above to make the system work for you.
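
A sketch of that conversion, assuming you know (or have measured) a bound MAX_DOUBLE_LEVEL on how big your double samples ever get; the number below is made up:

    #include <limits.h>
    #include <math.h>

    #define MAX_DOUBLE_LEVEL 131072.0   /* hypothetical worst-case double sample */

    static short to_short(double x)
    {
        double y = x * ((double)SHRT_MAX / MAX_DOUBLE_LEVEL);
        /* Belt-and-braces clamp in case the bound is ever exceeded. */
        if (y > SHRT_MAX) y = SHRT_MAX;
        if (y < SHRT_MIN) y = SHRT_MIN;
        return (short)lrint(y);
    }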

More Advanced Methods that probably don't apply to you

In the digital world there is a trade-off between resolution and dynamic range. You have a fixed number of bits to give to your audio: if you decrease the range the audio can cover, you increase the resolution (bits per unit of range) you get. Think about this in terms of volts: with a 0-5 V input and a 10-bit ADC, you have 10 bits to spread over a 5 V range, and usually this is done linearly. So 0b0000000000 = 0 V, 0b1111111111 = 5 V, and the voltages in between are assigned to codes linearly. In reality, with audio, this isn't always a good use of your bits.
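
For concreteness, the linear mapping in that example works out to about 4.9 mV per code step:

    /* Linear 10-bit ADC over 0-5 V: each code step is 5 V / 1023 ~ 4.888 mV. */
    static double adc_code_to_volts(unsigned code)   /* code in 0..1023 */
    {
        return (double)code * (5.0 / 1023.0);
    }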

In the case of voice, a plot of voltage versus the probability of that voltage looks something like this:

[Figure: probability density function (pdf) of voice signal amplitude]

This means that most of your voice signal lives at low amplitudes, with only a small amount at high amplitudes. So instead of assigning your bits linearly, you can remap them to have more steps in the lower amplitude range and fewer in the upper amplitude range. This gives you the best of both worlds: resolution where most of your audio sits, while limiting saturation by increasing your dynamic range.
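
I'm not prescribing a specific remapping here, but mu-law companding is one standard instance of this idea; a sketch, with input and output normalized to the -1..1 range:

    #include <math.h>

    #define MU 255.0   /* compression parameter used in telephony mu-law */

    /* More resolution at low amplitudes: small |x| gets stretched. */
    static double mu_law_compress(double x)
    {
        double sign = (x < 0.0) ? -1.0 : 1.0;
        return sign * log(1.0 + MU * fabs(x)) / log(1.0 + MU);
    }

    /* Inverse mapping to recover the linear signal. */
    static double mu_law_expand(double y)
    {
        double sign = (y < 0.0) ? -1.0 : 1.0;
        return sign * (pow(1.0 + MU, fabs(y)) - 1.0) / MU;
    }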

Now, this remapping will change how your filters behave, and you will probably need to rework them, which is why this is in the "advanced" section. Also, since you are doing your work in a double and then converting to a short, your short will probably need to stay linear anyway. Your double already gives you much more precision than your short will, so there is probably no need for this method.
