These should get you pretty far, and the rest you can do with the datasheet. Start building piece by piece: blinky first, then a fixed waveform, then a waveform that changes over time, then tones. Some example source code might help with filtering and driving audio outputs (an active low-pass filter might do both neatly).
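As a rough illustration of the "waveform to tones" step, here is a minimal sketch of a phase-accumulator tone generator in plain C. Everything in it is an assumption for illustration: the names `tone_t`, `tone_init` and `tone_next`, the 16-entry sine table, and the 8 kHz sample rate used in the usage note. It only computes samples; getting them out of the chip (PWM, R-2R ladder, DAC) is up to your hardware.

```c
#include <stdint.h>

/* One sine cycle as 16 unsigned 8-bit samples, centred on 128.
 * A real project would likely use a longer table (and PROGMEM on AVR). */
static const uint8_t SINE16[16] = {
    128, 177, 218, 245, 255, 245, 218, 177,
    128,  79,  38,  11,   1,  11,  38,  79
};

/* Hypothetical tone state: a 16-bit phase and the amount added per sample. */
typedef struct {
    uint16_t phase;
    uint16_t step;
} tone_t;

/* step = freq * 2^16 / sample_rate, so the phase wraps once per cycle.
 * freq_hz must stay below sample_rate_hz / 2 to be reproducible. */
void tone_init(tone_t *t, uint32_t freq_hz, uint32_t sample_rate_hz)
{
    t->phase = 0;
    t->step  = (uint16_t)((freq_hz << 16) / sample_rate_hz);
}

/* Return the current sample, then advance the phase.
 * The top 4 bits of the phase select one of the 16 table entries. */
uint8_t tone_next(tone_t *t)
{
    uint8_t sample = SINE16[t->phase >> 12];
    t->phase = (uint16_t)(t->phase + t->step);
    return sample;
}
```

You would call `tone_init(&t, 1000, 8000)` once and then call `tone_next(&t)` from a timer interrupt running at the sample rate, writing the result to whatever output you use. Changing `step` on the fly gives you a waveform that changes over time.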
I suggest coming back with more specific questions.
I would recommend that you look at modern SPI FLASH parts. Winbond is one manufacturer that makes a nice range of parts that would be well suited to the hardware SPI interface on your AVR.
Digikey stocks Winbond parts in the following capacities:
1 MByte for $0.50 USD
http://www.digikey.com/product-detail/en/W25Q80BVSSIG/W25Q80BVSSIG-ND/2202664
2 MByte for $0.74 USD
http://www.digikey.com/product-detail/en/W25Q16BVSSIG/W25Q16BVSSIG-ND/2208449
8 MByte for $3.01 USD
http://www.digikey.com/product-detail/en/W25Q64DWSSIG/W25Q64DWSSIG-ND/3008691
Other sizes are made but are not currently in stock.
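To get a feel for what those capacities mean for audio, here is a small back-of-the-envelope helper. It assumes uncompressed PCM and takes "1 MByte" to mean 2^20 bytes; the function name `playback_seconds` and the 8 kHz, 8-bit mono figures in the note below are my own illustrative choices, not something from your project.

```c
#include <stdint.h>

/* Whole seconds of uncompressed PCM audio that fit in capacity_bytes.
 * Assumes mono; for stereo, double bytes_per_sample. */
uint32_t playback_seconds(uint32_t capacity_bytes,
                          uint32_t sample_rate_hz,
                          uint32_t bytes_per_sample)
{
    return capacity_bytes / (sample_rate_hz * bytes_per_sample);
}
```

With 8-bit mono samples at 8 kHz, `playback_seconds(1048576UL, 8000, 1)` works out to 131 seconds for the 1 MByte part, and the 8 MByte part holds a little over 17 minutes.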
These parts are very easy to add into a circuit. The SPI port pins SPI_CLK, SPI_MOSI and SPI_MISO connect directly to three pins on the SPI FLASH. Power the part from +3.3V and GND, and make sure to add a 0.1uF bypass capacitor across the chip power pins. The remaining pins /HOLD and /WP can simply be pulled up to 3.3V via 10K resistors. For SPI FLASH parts such as these you also have to drive the part's /CS pin from a spare I/O pin. You cannot just tie it to GND, because the part is designed to reset its command state machine when you drive /CS high, which gets it ready for the next command.
SPI FLASH parts such as these are organized in blocks. The types I linked to here erase in sectors of 4K bytes and are programmed in pages of up to 256 bytes, but a read can start at any byte address, which is sent as a multi-byte address in the read command. In your application it would not be necessary to have a RAM buffer to load a whole block into. You could start a read and then clock out a few bytes at a time and feed those to your audio player hardware. You can send the command again each time you cross a 4K boundary if that suits your code, but it is not required: as long as you hold the /CS pin low and continue to supply clocks, the address increments automatically, and it is possible to read out the whole memory part sequentially if need be.
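The continuous read described above can be sketched roughly as follows. The opcode 0x03 ("Read Data") followed by a 24-bit address is what I recall from the Winbond W25Q datasheets, so check it against the datasheet for your exact part. The `flash_bus_t` hooks and all the `flash_*` and `sim_*` names are hypothetical: on an AVR the hooks would wrap your /CS pin and the SPI data register, while the `sim_*` functions are just a software stand-in for the chip so the sketch can be tried on a PC.

```c
#include <stdint.h>

#define FLASH_CMD_READ_DATA 0x03u   /* assumed from the W25Q datasheets */

/* Hypothetical hardware hooks supplied by the caller. */
typedef struct {
    void    (*cs_low)(void);
    void    (*cs_high)(void);
    uint8_t (*transfer)(uint8_t out);   /* exchange one byte over SPI */
} flash_bus_t;

/* Pull /CS low, send the read opcode and a 24-bit start address. */
void flash_read_begin(const flash_bus_t *bus, uint32_t addr)
{
    bus->cs_low();
    bus->transfer(FLASH_CMD_READ_DATA);
    bus->transfer((uint8_t)(addr >> 16));
    bus->transfer((uint8_t)(addr >> 8));
    bus->transfer((uint8_t)addr);
}

/* Clock out the next byte; the part advances its own address. */
uint8_t flash_read_next(const flash_bus_t *bus)
{
    return bus->transfer(0x00);
}

/* Raise /CS to end the read and reset the command state machine. */
void flash_read_end(const flash_bus_t *bus)
{
    bus->cs_high();
}

/* ---- PC-side stand-in for the flash chip (illustration only) ---- */
#define SIM_SIZE 8192u
static uint8_t  sim_mem[SIM_SIZE];
static uint8_t  sim_selected, sim_cmd;
static uint32_t sim_addr, sim_count;

void sim_flash_init(void)
{
    for (uint32_t i = 0; i < SIM_SIZE; i++)
        sim_mem[i] = (uint8_t)(i ^ (i >> 8));   /* arbitrary test pattern */
    sim_selected = 0;
}

void sim_cs_low(void)  { sim_selected = 1; sim_count = 0; sim_addr = 0; }
void sim_cs_high(void) { sim_selected = 0; }

uint8_t sim_transfer(uint8_t out)
{
    if (!sim_selected) return 0xFF;
    uint32_t n = sim_count++;
    if (n == 0) { sim_cmd = out; return 0xFF; }                    /* opcode */
    if (n <= 3) { sim_addr = (sim_addr << 8) | out; return 0xFF; } /* address */
    if (sim_cmd != FLASH_CMD_READ_DATA) return 0xFF;
    uint8_t b = sim_mem[sim_addr % SIM_SIZE];
    sim_addr++;
    return b;
}
```

A player loop would call `flash_read_begin()` once at the start of a sound, call `flash_read_next()` each time the audio output needs another byte, and call `flash_read_end()` when the sound finishes. Nothing special happens at the 4K boundary, so no buffer or address bookkeeping is needed on the AVR side.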
Note that these parts support reading and writing in the standard single-bit-wide SPI mode, which is presumably the way you would use them with the AVR. They also support 2-bit and 4-bit modes for much faster data transfer, but those modes require a SPI controller capable of dealing with the wider dual and quad serial data modes.
Best Answer
From what I can see, 8-bit AVRs aren't fast enough to decode MP3 in software. Instead, projects mostly rely on decoder chips. The idea is that the decoder decodes the MP3 stream in hardware and produces an output signal which can then be amplified and sent to a speaker.
From what I can see, you'd either need to get a board which can do the decoding for you, like this (this one has an SD card reader, amplifier and a small speaker too) or this (both use the VS1011E decoder), or make your own board which will house the decoder, amplifier, SD card reader and so on. Here you can find a project which explains how to do that and here is one which used the AVR Butterfly platform. Both projects have screens and are battery powered from what I can see.