Audio is not that high bandwidth, so it is within the range of what a microcontroller can handle.
The quality level you want makes a large difference in the amount of data you have to handle. If you just need to save and later replay voice, then 8 bit samples at 8 kHz is good enough. If the 8 bit values are not constrained to be linear (companding schemes like mu-law and A-law), then you get a better overall signal to noise ratio from the same amount of data. This is what the phone company does.
At the other end is "Hi-Fi" audio, which is from 20 Hz to 20 kHz, usually at least 16 bits per sample (over 90 dB signal to noise ratio). To digitize such audio, you sample much faster than the Nyquist limit, then apply digital filtering, then decimation. The reason you need digital filtering is that analog filters can't realistically achieve the very sharp roll-off just above 20 kHz that you need in order to sample only a little faster than 40 kHz.
Let's say you do the worst case and end up with 16 bit samples at 44 kHz rate. That's only 88 kB/s, or 5.3 MB/minute. Any SD card can handle that data rate. 1 GB gives you over 3 hours of this Hi-Fi audio.
Of course if you just want the voice-quality audio, things are much easier, the data rates lower, and the storage requirements lower. At 8 kB/s just 1 MB lasts over 2 minutes. 1 GB would hold nearly 1 1/2 days of audio.
You could use a technique similar to digital processing, but without converting the signal to and from digital codes. For example, you could use bucket-brigade devices clocked in and out at different frequencies, or a tape recorder with two heads and transport mechanisms (record in real time, play back at half speed).
However if you want to halve the frequency of an arbitrarily long and complex audio waveform then you have a problem - whether using analog or digital processing. The signal is coming in twice as fast as it is being sent out, so in order to exactly preserve the original waveform you have to continuously store the input. Eventually you must run out of storage space, then you will have to 'catch up' to real time and lose a chunk of the signal.
A sufficiently powerful digital system could simply include massive amounts of storage, or it could apply Fourier transforms to break the signal up into its component frequencies, halve each one and then recombine them. The resulting waveform might not be identical to the original, but it should sound virtually the same.
If you just want to change the frequency of a repetitive waveform (eg. single note from a musical instrument) then you only need enough space to store a single cycle, or perhaps the duration of one note. You then have to accurately detect the end of the waveform so that it can be repeated seamlessly, and decide what to do about its envelope (eg. do you let the note play out at half real time, or force a faster decay?).
Best Answer
Another option is a Microchip Microstick. It comes with a 40 MIPS dsPIC chip that is suitable for implementing a reverb algorithm. Audio could be sampled with the on-chip ADC and output using PWM; very few external parts would be needed. The dsPIC chip on its own is about $5, and a lot less in large quantities.