Noise usually shows up at specific frequencies in audio, and which frequencies those are depends on the environment.
Option 1
The easiest way to get rid of noise is to put a band-pass filter around the frequency range of your voice. There may still be noise at the same frequencies as your voice, and that noise is much harder to deal with.
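As a rough illustration of Option 1, here is a minimal sketch of a single band-pass biquad, using the coefficient formulas from the widely used RBJ audio EQ cookbook. The 300 to 3400 Hz band (the classic telephone voice band), the 16 kHz sample rate, and all function names are assumptions chosen for the example, not something the original answer specifies. A single biquad has gentle slopes, so a real design would likely cascade several stages.

```python
import math


def bandpass_coefficients(low_hz, high_hz, sample_rate):
    """Normalized biquad band-pass coefficients (0 dB gain at the center)."""
    center = math.sqrt(low_hz * high_hz)      # geometric center of the band
    q = center / (high_hz - low_hz)           # wider band means lower Q
    w0 = 2 * math.pi * center / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def bandpass(samples, low_hz, high_hz, sample_rate):
    """Filter a list of float samples with the biquad (direct form I)."""
    (b0, b1, b2), (a1, a2) = bandpass_coefficients(low_hz, high_hz, sample_rate)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq_hz, sample_rate, seconds=0.5):
    """Unit-amplitude sine wave, used here only as a test signal."""
    count = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(count)]


if __name__ == "__main__":
    fs = 16_000
    in_band = bandpass(tone(1000, fs), 300, 3400, fs)
    hum = bandpass(tone(50, fs), 300, 3400, fs)
    print("1 kHz tone RMS after filter:", round(rms(in_band[fs // 4:]), 3))
    print("50 Hz hum RMS after filter: ", round(rms(hum[fs // 4:]), 3))
```

A tone inside the band passes nearly unchanged, while low-frequency hum is attenuated. Noise that falls inside the voice band is untouched, which is the limitation mentioned above.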
Option 2
I am not sure what Audacity does, but I have seen many programs that require a sample of "silence" and use that to determine the noise. In other words, you record your voice but leave a gap of dead air at the end or beginning. Then you can analyze which frequency components are present in the dead air. From this you know how much of each frequency to remove from your voice signal.
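One common way to turn a "silence" sample into a noise profile is spectral subtraction. The sketch below is an assumption about how such tools might work, not a description of Audacity or any specific program. It uses a naive DFT and non-overlapping frames to stay self-contained; a practical version would use an FFT library and overlapping windowed frames. All function names are made up for the example.

```python
import cmath
import math


def dft(frame):
    """Naive O(n^2) discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def frames(signal, size):
    """Split a signal into non-overlapping frames, dropping any remainder."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def noise_profile(silence, size):
    """Average magnitude of each frequency bin over the 'dead air' frames."""
    spectra = [dft(frame) for frame in frames(silence, size)]
    return [sum(abs(s[k]) for s in spectra) / len(spectra) for k in range(size)]


def subtract_noise(signal, profile):
    """Reduce each bin's magnitude by the noise estimate, keeping its phase."""
    size = len(profile)
    out = []
    for frame in frames(signal, size):
        cleaned = []
        for value, noise_mag in zip(dft(frame), profile):
            mag = abs(value)
            new_mag = max(mag - noise_mag, 0.0)   # never go below zero
            cleaned.append(value * (new_mag / mag) if mag > 0 else 0j)
        out.extend(idft(cleaned))
    return out


if __name__ == "__main__":
    size = 64
    hum = [0.5 * math.sin(2 * math.pi * 10 * t / size) for t in range(4 * size)]
    voice = [math.sin(2 * math.pi * 4 * t / size) for t in range(4 * size)]
    recording = [v + h for v, h in zip(voice, hum)]
    cleaned = subtract_noise(recording, noise_profile(hum, size))
    print("largest residual error:", max(abs(c - v) for c, v in zip(cleaned, voice)))
```

In this toy case the hum sits in its own frequency bin, so it is removed almost perfectly. Real noise overlaps the voice and varies over time, so the result is an approximation.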
Let's do some math:
16 tracks * 44 ksample/s *2 channels *16 bit = 22.528 Mbps
This is the minimum speed you need for the SPI interface if you want to transfer all the data through a single serial port. It can be done with an adequate clock, but you need a fast SD card (see here for the speed).
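The same arithmetic as a small script, in case you want to try other track counts or sample rates. The function name is just for illustration. Note that it counts raw audio bits only; SPI and SD card protocol overhead is not included, so treat the result as a lower bound.

```python
def required_bitrate(tracks, sample_rate_hz, channels, bits_per_sample):
    """Raw audio payload in bits per second (no protocol overhead)."""
    return tracks * sample_rate_hz * channels * bits_per_sample


if __name__ == "__main__":
    bits_per_second = required_bitrate(16, 44_000, 2, 16)
    print(bits_per_second / 1e6, "Mbit/s")
    print(bits_per_second / 8 / 1e6, "MB/s")
```

With the exact CD rate of 44.1 kHz instead of the rounded 44 kHz, the figure becomes 22.5792 Mbit/s, or roughly 2.8 MB/s either way.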
Then there is the microcontroller: you have to sum 16 tracks and output them through a DAC, so you have 44*2 ksamples per second for each track, or
$$ 44 \cdot 10^{3} \cdot 2 \cdot 16 = 1408 \cdot 10^{3} $$
16-bit sums for every sample (probably with some scaling to avoid overflow) result in about 1.4 M operations/sec, which can be handled by a good 32-bit microcontroller. A Cortex-M3, or better an M4 (though the M3 is probably better documented), can probably work for you.
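To make the "scaling to avoid overflow" point concrete, here is a minimal sketch of two ways to mix 16-bit tracks: averaging (always fits, but quieter) and summing with saturation (louder, but clips). Both function names are hypothetical. On a microcontroller this would be written in C with a 32-bit accumulator; Python is used here only to show the logic.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_scaled(tracks):
    """Average the tracks sample by sample; the result always fits in 16 bits."""
    count = len(tracks)
    return [int(sum(column) / count) for column in zip(*tracks)]


def mix_saturated(tracks):
    """Sum the tracks in a wide accumulator, then clamp to the 16-bit range."""
    return [max(INT16_MIN, min(INT16_MAX, sum(column))) for column in zip(*tracks)]


if __name__ == "__main__":
    # 16 identical tracks: full-scale positive, full-scale negative, and quiet.
    tracks = [[32767, -32768, 100] for _ in range(16)]
    print(mix_scaled(tracks))
    print(mix_saturated(tracks))
```

With 16 tracks, dividing by 16 is a 4-bit right shift, which is cheap on any 32-bit core.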
I've just seen this, which can be clocked up to 204 MHz, has 4 SPI interfaces, up to 40 MB/s, and also has a floating-point unit that can help in the accumulation process (but may be too slow). You may also use the dual-core structure to handle the processing and the output separately.
But for the DAC I think that you should go for an external converter specifically designed for audio (which probably means 16 bit).
Update
It's not clear how you are going to manage the 16 different tracks on the SD card:
- what about pre-loading tracks on the internal memory of the MCU?
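Check the I2S interface, a serial protocol designed specifically for audio applications. It uses three signal lines (bit clock, word select, and data), and many audio converters add a master clock as a fourth.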
Important question:
You said that you also want to record tracks and save them to the SD card: do you want to do that at the same time as playback? You need the controller to encode the audio as WAV and store it, and the write bandwidth of the SD card is lower than its read bandwidth.
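On the encoding side, uncompressed WAV is just a short header followed by raw PCM samples, so the encoding itself costs very little; the SD card write speed is the real constraint. The sketch below uses Python's standard `wave` module to show the layout. The function name and the 44.1 kHz stereo defaults are assumptions for the example, and on the microcontroller you would write the equivalent header by hand in C.

```python
import io
import struct
import wave


def encode_wav(samples, sample_rate=44_100, channels=2):
    """Pack interleaved signed 16-bit samples into an in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)                 # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buffer.getvalue()


if __name__ == "__main__":
    data = encode_wav([0, 0, 1000, -1000])  # two stereo frames
    print(len(data), "bytes, starts with", data[:4])
```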
The looping feature WILL need some buffering memory (you may also use the internal memory) because looping requires real-time operation, and the SD card will introduce too much latency. You may need an external RAM, and you may also think about storing some data there to reduce delays.
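The buffering idea can be sketched as a loop held entirely in RAM and read with a wrapping index, so playback never waits on the SD card. The class name and interface below are hypothetical and only illustrate the wrap-around logic.

```python
class LoopBuffer:
    """Holds one loop in memory and plays it back endlessly."""

    def __init__(self, samples):
        if not samples:
            raise ValueError("a loop needs at least one sample")
        self._samples = list(samples)
        self._position = 0

    def read(self, count):
        """Return the next `count` samples, wrapping at the end of the loop."""
        out = []
        for _ in range(count):
            out.append(self._samples[self._position])
            self._position = (self._position + 1) % len(self._samples)
        return out


if __name__ == "__main__":
    loop = LoopBuffer([1, 2, 3])
    print(loop.read(5))
    print(loop.read(2))
```

One second of 16-bit stereo audio at 44 ksample/s takes about 176 kB, which is why external RAM may be needed for longer loops.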
Best Answer
I agree with the comments that this question is very difficult to answer in this community, but I would like to offer a few of the simpler methods for you to look into. Ultimately, a good and robust system will require many different techniques and many hours of work. This is why so much money is put into voice recognition, and, as you may know, it still isn't great.
In general, people speak within about the same frequency range every time they say a command word. If you look at your signal in the frequency domain, you can record which range was used when the person recorded their command word and then look for that range in future recordings. You can get to the frequency domain using the FFT; Wikipedia and a few Google searches can help you with what this means and how to do it.
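A minimal sketch of the frequency-range idea: transform a recording to the frequency domain and report the lowest and highest frequencies that carry significant energy. It uses a naive DFT to stay self-contained, where a real program would call an FFT library. The function names and the 50% threshold are assumptions for the example.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitudes of the non-negative frequency bins (naive DFT)."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def occupied_band(samples, sample_rate, threshold=0.5):
    """Lowest and highest frequency (Hz) with magnitude >= threshold * peak."""
    magnitudes = magnitude_spectrum(samples)
    peak = max(magnitudes)
    strong = [k for k, m in enumerate(magnitudes) if m >= threshold * peak]
    hz_per_bin = sample_rate / len(samples)
    return strong[0] * hz_per_bin, strong[-1] * hz_per_bin


if __name__ == "__main__":
    fs, n = 1600, 160
    word = [math.sin(2 * math.pi * 200 * t / fs)
            + 0.8 * math.sin(2 * math.pi * 300 * t / fs) for t in range(n)]
    print(occupied_band(word, fs))
```

You could store the band measured for the recorded command word and compare it with the band of later recordings, allowing some tolerance.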
You can expand the frequency method to determine what the frequency is for each part of the word. For example, some people will raise the pitch of their voice as they finish a word. This could be another "signature" to look at.
Also, people generally speak at the same speed for the same command word. For this you can look at the amplitude of your signal to determine how long it took them to say each word, and even the length of the pause between two words. Then you can compare these durations and pauses to your future signal.
Again, these are just a few basic methods, but they should give you a sense of the type of things that can be done.
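The timing idea can be sketched by splitting the signal into short frames and marking each frame as loud or quiet. The 10 ms frame length, the 0.1 threshold, and the function names are assumptions for the example; real speech would need a smoothed envelope and a threshold tuned to the noise floor.

```python
def active_segments(samples, sample_rate, frame_ms=10, threshold=0.1):
    """Return (start, end) times in seconds of stretches louder than threshold."""
    frame = max(1, int(sample_rate * frame_ms / 1000))
    segments = []
    start = None
    for i in range(0, len(samples), frame):
        loud = max(abs(s) for s in samples[i:i + frame]) >= threshold
        if loud and start is None:
            start = i                                   # a word begins
        elif not loud and start is not None:
            segments.append((start / sample_rate, i / sample_rate))
            start = None                                # the word has ended
    if start is not None:                               # still loud at the end
        segments.append((start / sample_rate, len(samples) / sample_rate))
    return segments


def pauses(segments):
    """Gaps in seconds between consecutive active segments."""
    return [nxt[0] - cur[1] for cur, nxt in zip(segments, segments[1:])]


if __name__ == "__main__":
    fs = 1000
    signal = [0.0] * 200 + [0.5] * 300 + [0.0] * 100 + [0.5] * 400
    words = active_segments(signal, fs)
    print(words)
    print(pauses(words))
```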