If you intend to use the mic input of your sound card, note that it will have a current source or pull-up resistor to supply a bias to the microphone for which it was intended.
It might not be good for the speaker to have DC flowing through it.
If your phone is connected to the telco, be very careful. The speaker may sit at quite a high common-mode (CM) voltage, and the DC offset might be -48 V (depending on which way round it is wired).
Telephone wires are very good at attracting lightning. During a storm, high common-mode voltages may appear on the earpiece.
Strictly speaking, you should use an audio transformer to couple the earpiece signal into the sound card; this provides DC isolation and some protection against lightning. Adding a TVS diode across the output of that transformer would be a good idea too.
The AC signal on the earpiece might be quite small, only a few hundred mV. For a mic input this will be fine (and can be potted down at the output of your transformer), but for line-level inputs, which usually expect about 1 Vrms, it might be very quiet.
Not all phones work the same way, so measure the voltage first using a scope and choose the divider ratio as required.
The audio transformer should also be protected against DC using a DC-blocking (aka coupling) capacitor.
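As a rough illustration of choosing that divider, here is a minimal sketch that turns a scope measurement into a resistor value. The 300 mV measurement, the 10 mV target for a mic input and the 10 kΩ top resistor are made-up numbers, and the loading effect of the sound card input is ignored.

```python
def divider_ratio(v_measured_rms, v_target_rms):
    """Return the fraction of the measured signal that should reach the input."""
    if v_measured_rms <= 0 or v_target_rms <= 0:
        raise ValueError("voltages must be positive")
    # A passive divider can only attenuate, so cap the ratio at 1.
    return min(1.0, v_target_rms / v_measured_rms)


def bottom_resistor(r_top_ohms, ratio):
    """Bottom resistor of an unloaded divider, where ratio = Rb / (Rt + Rb)."""
    if not 0 < ratio < 1:
        raise ValueError("ratio must be strictly between 0 and 1")
    return r_top_ohms * ratio / (1 - ratio)


# Hypothetical numbers: 300 mV rms seen on the scope, 10 mV rms wanted.
ratio = divider_ratio(0.300, 0.010)
print(f"ratio = {ratio:.4f}, Rb = {bottom_resistor(10_000, ratio):.0f} ohm")
```

In practice a small trimpot in place of the fixed pair lets you adjust the level by ear once everything is connected.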
This circuit protects against discharge and DC currents, but use it with caution, as a mistake will cost you a PC!
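For sizing the DC-blocking capacitor, a first-order estimate is the usual RC high-pass formula, f = 1/(2πRC). The sketch below assumes the transformer looks like a 600 Ω resistive load and that a 100 Hz corner is low enough for telephone speech; both figures are assumptions, not values from this answer, and a real transformer primary is not purely resistive.

```python
import math


def coupling_capacitance(load_ohms, cutoff_hz):
    """Capacitance (farads) giving the requested -3 dB high-pass corner."""
    return 1.0 / (2.0 * math.pi * load_ohms * cutoff_hz)


def cutoff_frequency(load_ohms, capacitance_f):
    """High-pass corner frequency (Hz) for a given load and capacitor."""
    return 1.0 / (2.0 * math.pi * load_ohms * capacitance_f)


# Assumed values: 600 ohm load, 100 Hz corner frequency.
c = coupling_capacitance(600.0, 100.0)
print(f"C = {c * 1e6:.2f} uF")
```

Whatever value you pick, the capacitor's voltage rating needs to sit comfortably above the DC offset on the line.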
Best Answer
(a) You can have all the hard work done for you by eg products like the
I have no 'interest' in the company apart from having used their voice recorder ICs in the past with good results.
The basic ICs are essentially standalone and function conceptually as multi-message electronic tape recorders. They are usually combined with a microcontroller, but in many cases they do not need to be.
They say:
Their MLS ICs store the signal as analog levels in flash memory (good trick) and they also have a range of full digital recorders.
They have versions which use external SD Flash storage
BUT
(b) You can now get standard routines which allow operation of SD Flash from even relatively low performance microcontrollers. Fetch the samples directly from memory, output them to a DAC (on board or external; it can be as simple as an R-2R resistor network and an opamp) and you have audio out. You need about 500 kB of flash for one minute at 8 kB/second.
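The storage figure is easy to check, and the ideal output of an R-2R ladder is just as simple to model. This is only a sketch: the function names are made up for illustration and the 3.3 V reference is an assumption.

```python
def storage_bytes(sample_rate_hz, bytes_per_sample, seconds):
    """Raw PCM storage needed for a recording of the given length."""
    return sample_rate_hz * bytes_per_sample * seconds


def r2r_output_volts(code, v_ref, bits=8):
    """Ideal, unloaded R-2R ladder output for an unsigned input code."""
    if not 0 <= code < (1 << bits):
        raise ValueError("code out of range for the given bit width")
    return v_ref * code / (1 << bits)


# One minute of 8-bit audio at 8000 samples per second.
print(storage_bytes(8000, 1, 60), "bytes")

# Mid-scale code on an assumed 3.3 V reference.
print(r2r_output_volts(128, 3.3), "V")
```

One minute works out to 480,000 bytes, which is where the "about 500 kB" comes from.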
A-law and µ-law are curve shaping (companding) schemes which allow you to store a wider dynamic range within 8 bits. For what you are describing you can probably accept simple 8 bit store and output. Even so, A-law is essentially just a lookup table.
You did not say if this is store once and play often or needs to be field recordable. If it is store once, then a MIDI or other tune synthesizer takes far less memory and can be implemented within many microcontrollers.
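To show what "essentially a lookup table" means in practice, here is a sketch that builds a 256-entry decode table. It uses µ-law rather than A-law, following the commonly used G.711 expansion arithmetic; on a microcontroller the table would normally be precomputed and stored in flash. The names `ulaw_decode`, `ULAW_TABLE` and `expand` are made up for illustration.

```python
BIAS = 0x84  # bias used by the G.711 mu-law encoding


def ulaw_decode(byte):
    """Expand one 8-bit mu-law code to a signed linear sample."""
    byte = ~byte & 0xFF              # codes are stored bit-inverted
    exponent = (byte >> 4) & 0x07    # 3-bit segment number
    mantissa = byte & 0x0F           # 4-bit position within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if byte & 0x80 else magnitude


# Build the table once; playback is then a single indexed read per sample.
ULAW_TABLE = [ulaw_decode(b) for b in range(256)]


def expand(data):
    """Decode a bytes object of mu-law codes into linear samples."""
    return [ULAW_TABLE[b] for b in data]


print(expand(bytes([0xFF, 0x80, 0x00])))
```

The decoded values span roughly 14 bits, so this only pays off if the DAC has more than 8 bits of resolution; with an 8 bit R-2R ladder, plain 8 bit storage is simpler.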
(c) You can experiment with ADPCM, fewer bits per sample and so on, but for your needs it's likely that off-the-shelf code and hardware will do the job. Acceptable cost is an issue: clever methods would allow a very low cost solution using a microcontroller.
(d) A web search is highly likely to turn up many DIY record and playback systems.
Low cost PIC speech recorder - 1999 - Circuit Cellar
PIC32 audio library
Application note - 1997~ - Microchip ANM643 - ADPCM using PICs