Your Attempts
By definition when you "hard-limit" values in your code you are causing saturation. It may not be saturation in the sense of overflowing your short, but you are still distorting the wave when it goes over a certain point. Here's an example:
I realize you probably aren't hard-limiting on the bottom, but I had already drawn it before I realized that.
So, in other words, the hard limiting method won't work.
Now for your second approach, this method will cause you to do what some audio people actually do intentionally. You are causing every frame to be as loud as it possibly can. This method can work OK if you get the scaling right and are fine with your music sounding loud all of the time, but it isn't great for most people.
One Solution
If you know the max possible effective gain that your system can create, you can divide your input by this much. To figure out what this would be you will need to step through your code and determine what the max input is, give it a gain of x, figure out what the max output is in terms of x, and then determine what x should be in order to not ever saturate. You would apply this gain to your incoming audio signal before you do anything else to it.
This solution is OK, but isn't great for everyone as your dynamic range can be hurt a little since you usually wont be running at max input all of the time.
The other solution is to do some auto-gaining. This method is similar to the previous method, but your gain will change over time. To do this you can check your max value of each frame of your input. You will use will store this number and place a simple low pass filter on your max values and decide what gain to apply with this value.
Here is an example of what your gain versus input volume would be:
This type of system will cause most of your audio to have a high dynamic range, but as you start getting close to the max volume you can have you slowly reduce your gain.
Data Analysis
If you are wanting to find out what type of values your system is actually getting in real time then you will need to have some type of debugging output. This output will change depending on what platform your running on, but here's a general gist of what you would do. If you are on an embedded environment you will need to have some serial output. What you will do is at certain stages in your code output to a file or screen or something you can grab the data from. Take this data and put it in excel of matlab and graph all of them versus time. You will probably very easily be able to tell where stuff is going wrong.
Very Simple Method
Are you saturating your double? It doesn't sound like it, instead it sounds like you are saturating when you switch to a short. A very simple and "dirty" way of doing this is to convert the max of your double (this value is different depending on your platform) and scale that to be the max value of your short. This will guaranty that assuming you don't overflow your double that you wont overflow your short either. Most likely this will result in your output being much softer then your input. You will just need to play around and use some of the data analysis that I described above to make the system work perfectly for you.
More Advanced Methods that probably don't apply to you
In the digital world there is a trade off between resolution and dynamic range. What this means is that you have a fixed number of bits given to your audio. If you decrease the range that your audio can be in then you increase the bits per range that you have. If you think about this in the sense of volts and you have 0-5v input and 10bit adc then you have 10bits to give to a 5v range, usually this is done linearly. So 0b0000000000 = 0v, 0b1111111111 = 5v and you linearly assign the voltages to the bits. In reality, with audio, this isn't always a good use of your bits.
In the case of voice, your voltages versus probability of those voltages look something like this:
This means that you have a lot more of your voice in the lower amplitude and just a little amount in the high amount. So instead of assigning your bits lineally, you can remap your bits to have more steps in the lower amplitude range and thus less in the upper amplitude range. This gives you the best of both worlds, resolution where most of your audio is at, but limit your saturation by increasing your dynamic range.
Now, this remapping will change how your filters act and will probably need to rework your filters, but that is why this is in the "advanced" section. Also, since you are doing your work with a double and then converting it to a short, your short will probably need to be linear anyways. Your double already gives you much more precision then what your short will give you so there is probably no need for this method.
Let's do some math:
16 tracks * 44 ksample/s *2 channels *16 bit = 22.528 Mbps
This is the minimum speed you need for the SPI interface, if you want to transmit all the data through a single serial port. Can be done, with an adequate clock, but you need a fast SD card (see here for the speed).
Then there is the microcontroller: you have to add 16 tracks and output them through a DAC, so you have 44*2 ksamples for each track, or
$$ 44 \cdot 10^{3} \cdot 2 \cdot 16 = 1408 \cdot 10^{3} $$
16-bit sums for every sample (probably with some scaling to avoid overflow) result in about 1.4 M operations/sec, that can be handled by a good 32-bit microcontroller. Probably a Cortex-M3, or better M4 (but M3 is probably better documented) can work for you.
I've just seen this which can be clocked up to 204 MHz, has 4 SPI interfaces, up to 40 MB/s, and has also a floating point unit that can help in the accumulation process (but may be too slow). You may also use the dual core structure to handle separately the processing and the output.
But for the DAC I think that you should go for an external converter, specifically designed for audio (which means 16 bit probably).
Update
It's not so clear how are you going to manage the 16 different tracks on the SD:
- what about pre-loading tracks on the internal memory of the MCU?
Check the I2S interface, which is a 4-wire serial protocol especially designed for audio applications.
Important question:
You said that you want also to record tracks and save them to the SD card: do you want to do that at the same time? You need the controller to encode the audio in WAV and store it, and the writing bandwith of the SD card is lower.
The looping feature WILL need some buffering memory (may also use the internal memory) because looping requires real time operation, and the SD card will introduce too much latency. You may need an external RAM, and you may also think about storing some data there to reduce delays.
Best Answer
Answering to myself, I found the perfect part for this, Analog Devices' ADAU1701. It does exactly what I need. It's cheap, really easy to program, and the software is free. All that is required is a 12.288MHz crystal and an EEPROM. It can even handle rotary encoders or potentiometers for volume control, or even pushbuttons. It can be controlled via I2C or SPI. It includes all the necessary DACs (four!) and ADCs (2). It's the perfect device for turning my stereo signal into 2.1, bass-boost it, LPF, control its volume, and much more.