Method 1: Create ROMs in your FPGA design
Because you have the same data in every board, one option is to use block RAMs in the FPGA, configured as ROM. To do this you instantiate a block RAM but leave its write port unused. Use a synthesis directive in your HDL code or UCF file to specify the initial contents of the RAM. Read the Spartan-3 Generation User's Guide (Chapter 4) to see how to instantiate the RAM and how to read data from it. If you use Xilinx ISE, there is probably also a "wizard" that generates the RAM block and sets up the initial contents for you.
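If you go the wizard route, the initial contents are usually handed over as a Xilinx .coe file. The Python sketch below turns a list of words into that format; it assumes the standard `memory_initialization_radix` / `memory_initialization_vector` keywords, and the function name `to_coe` is hypothetical, so check the core generator documentation for your ISE version before relying on it.

```python
from typing import Sequence


def to_coe(words: Sequence[int], width_bits: int = 8) -> str:
    """Return .coe text for a ROM whose words are `width_bits` wide (hex radix)."""
    if width_bits <= 0:
        raise ValueError("width_bits must be positive")
    if not words:
        raise ValueError("ROM must contain at least one word")
    digits = (width_bits + 3) // 4  # hex digits needed per word
    limit = 1 << width_bits
    lines = ["memory_initialization_radix=16;", "memory_initialization_vector="]
    for i, w in enumerate(words):
        if not 0 <= w < limit:
            raise ValueError(f"word {i} = {w} does not fit in {width_bits} bits")
        sep = ";" if i == len(words) - 1 else ","  # last entry ends the vector
        lines.append(f"{w:0{digits}X}{sep}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_coe([0x12, 0xAB, 0x00]), end="")
```

Writing the result to a file (e.g. a hypothetical `rom_contents.coe`) and pointing the wizard at it keeps the data out of the HDL source, so you can regenerate the ROM without touching the design.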
Unfortunately, the Spartan-3E you are using has only 350 kbits of block RAM, not the 8 Mbits you require. For this approach to work, you'll have to devise a scheme that compresses your data by roughly 23:1 so it fits in 350 kbits. The details depend on what kind of data you have; if your data is close to random, that much compression is probably not achievable.
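Before designing a compression scheme, it can help to check the arithmetic and get a rough feel for how compressible the data is. This Python sketch assumes 1 kbit = 1024 bits and uses zlib purely as a yardstick on a PC; zlib is not something you would implement in the FPGA, and a simple hardware-friendly scheme (run-length coding, for example) will usually do worse.

```python
import os
import zlib

DATA_BITS = 8 * 1024 * 1024   # 8 Mbits of data to store
BRAM_BITS = 350 * 1024        # ~350 kbits of block RAM (figure quoted above)


def required_ratio(data_bits: int = DATA_BITS, ram_bits: int = BRAM_BITS) -> float:
    """Compression ratio needed for the data to fit in block RAM."""
    return data_bits / ram_bits


def zlib_ratio(data: bytes) -> float:
    """Rough compressibility gauge: original size / zlib size (level 9)."""
    return len(data) / len(zlib.compress(data, 9))


if __name__ == "__main__":
    print(f"need about {required_ratio():.1f}:1")
    print(f"random data: {zlib_ratio(os.urandom(1 << 16)):.2f}:1")
    print(f"all zeros:   {zlib_ratio(bytes(1 << 16)):.0f}:1")
```

The first line prints about 23.4:1. Running `zlib_ratio` on your real data file shows quickly whether you are anywhere near that; random data comes out at roughly 1:1.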
Method 2: Store data in external memory
You say you have a 128 Mbit parallel flash and a 16 Mbit SPI flash. You will need to read the datasheets for these parts and understand how they work, then design a state machine in your FPGA that can access these devices. But this is your job as the FPGA designer; random strangers on the internet are not going to design your FPGA for you.
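To give a feel for what such a state machine does, here is a bit-level Python model of a controller reading an SPI flash. It assumes the 0x03 "Read Data" command with a 24-bit address, MSB first, which many SPI NOR parts use; your part's datasheet is the authority. `FakeSpiFlash` and `spi_read` are hypothetical names, and the model ignores clock edges, timing, and chip-select polarity.

```python
from enum import Enum, auto

READ_CMD = 0x03  # "Read Data" opcode used by many SPI NOR flashes; confirm in your datasheet


class State(Enum):
    SEND_CMD = auto()
    SEND_ADDR = auto()
    READ_DATA = auto()
    DONE = auto()


class FakeSpiFlash:
    """Hypothetical, simplified bit-level flash model that only understands READ."""

    def __init__(self, contents: bytes):
        self.mem = contents
        self.chip_select()

    def chip_select(self) -> None:
        """Model CS# going active: start a new transaction."""
        self.header = 0       # opcode + 24-bit address shifted in so far
        self.header_bits = 0
        self.addr = 0
        self.out_bits = 0     # data bits shifted out so far

    def clock(self, mosi: int) -> int:
        """One SCK cycle: sample a MOSI bit, return the MISO bit."""
        if self.header_bits < 32:
            self.header = (self.header << 1) | (mosi & 1)
            self.header_bits += 1
            if self.header_bits == 32:
                if self.header >> 24 != READ_CMD:
                    raise ValueError("model only supports the READ command")
                self.addr = self.header & 0xFFFFFF
            return 0
        byte = self.mem[(self.addr + self.out_bits // 8) % len(self.mem)]
        bit = (byte >> (7 - self.out_bits % 8)) & 1  # data leaves MSB first
        self.out_bits += 1
        return bit


def spi_read(flash: FakeSpiFlash, addr: int, nbytes: int) -> bytes:
    """Controller FSM: send opcode, then address, then clock in nbytes."""
    if nbytes <= 0:
        return b""
    flash.chip_select()
    state, bit = State.SEND_CMD, 7
    out, shift, nbits = bytearray(), 0, 0
    while state is not State.DONE:
        if state is State.SEND_CMD:
            flash.clock((READ_CMD >> bit) & 1)
            if bit == 0:
                state, bit = State.SEND_ADDR, 23
            else:
                bit -= 1
        elif state is State.SEND_ADDR:
            flash.clock((addr >> bit) & 1)
            if bit == 0:
                state = State.READ_DATA
            else:
                bit -= 1
        else:  # State.READ_DATA: shift in one bit per clock
            shift = (shift << 1) | flash.clock(0)
            nbits += 1
            if nbits == 8:
                out.append(shift)
                shift, nbits = 0, 0
                if len(out) == nbytes:
                    state = State.DONE
    return bytes(out)
```

In HDL each branch becomes a state with a bit counter; the structure (command, address, data) carries over even though the Python is only a behavioural model.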
To get the data into the flash initially you have two choices. First, if you are building these boards in volume, you can have your board assembly shop pre-program the flash devices before mounting them on the boards. Typically you give them a data file in whatever format they request, and they charge a small extra fee to program the data before assembly.
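The file format is up to the assembly house. Intel HEX is one common choice, so as an illustration only, here is a Python sketch that encodes a binary image as Intel HEX, inserting extended linear address (type 04) records because a 16 Mbit part is larger than 64 KiB. `to_intel_hex` is a hypothetical helper; use whatever format your vendor actually asks for.

```python
def _record(rtype: int, addr16: int, data: bytes) -> str:
    """One Intel HEX record: ':' LL AAAA TT DD... CC (checksum)."""
    body = bytes([len(data), (addr16 >> 8) & 0xFF, addr16 & 0xFF, rtype]) + data
    checksum = (-sum(body)) & 0xFF  # two's complement of the byte sum
    return ":" + (body + bytes([checksum])).hex().upper()


def to_intel_hex(image: bytes, base: int = 0, row: int = 16) -> str:
    """Encode a flash image as Intel HEX, using type-04 records beyond 64 KiB."""
    if not 1 <= row <= 255:
        raise ValueError("row must be 1..255 bytes")
    lines, upper, off = [], None, 0
    while off < len(image):
        addr = base + off
        if addr >> 16 != upper:  # entering a new 64 KiB segment
            upper = addr >> 16
            lines.append(_record(0x04, 0, upper.to_bytes(2, "big")))
        # never let one data record cross a 64 KiB boundary
        n = min(row, len(image) - off, 0x10000 - (addr & 0xFFFF))
        lines.append(_record(0x00, addr & 0xFFFF, image[off:off + n]))
        off += n
    lines.append(_record(0x01, 0, b""))  # end-of-file record
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_intel_hex(b"\x01\x02"), end="")
```

For a two-byte image this emits `:020000040000FA`, `:020000000102FB`, and the standard end-of-file record `:00000001FF`.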
Second option: Read the datasheet for the flash device. Write an FPGA design that receives data over some other interface available on your board (Ethernet, USB, SPI, I2C, whatever) and writes it into the flash. At manufacturing time, you load this design temporarily into your FPGA and program your flash; then you store a different "run-time" FPGA design in the on-board configuration PROM. Because the run-time design has no way to modify the flash, your users won't be able to mess up the data.
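Whichever interface you use, the loader has to break the image into the erase and program operations the flash understands. The Python sketch below plans that sequence, assuming typical SPI NOR conventions (Write Enable 0x06, Page Program 0x02, 64 KiB Sector Erase 0xD8, 256-byte pages); these values and the `program_plan` helper are assumptions to check against your datasheet. Busy/status polling between commands is left out.

```python
from typing import List, Optional, Tuple

# Typical SPI NOR opcodes and geometry; confirm all of these in your datasheet.
WREN, PAGE_PROGRAM, SECTOR_ERASE = 0x06, 0x02, 0xD8
PAGE_SIZE = 256
SECTOR_SIZE = 64 * 1024

Command = Tuple[int, Optional[int], bytes]  # (opcode, address or None, payload)


def program_plan(image: bytes, base: int = 0) -> List[Command]:
    """Erase every touched sector, then program page by page.

    Note: erasing a whole sector also wipes any data outside `image` in it.
    A real loader must poll the status register until each command finishes.
    """
    cmds: List[Command] = []
    if not image:
        return cmds
    first = base // SECTOR_SIZE
    last = (base + len(image) - 1) // SECTOR_SIZE
    for s in range(first, last + 1):
        cmds.append((WREN, None, b""))  # each erase/program needs write enable
        cmds.append((SECTOR_ERASE, s * SECTOR_SIZE, b""))
    off = 0
    while off < len(image):
        addr = base + off
        # a page program must not wrap past the end of its page
        n = min(PAGE_SIZE - addr % PAGE_SIZE, len(image) - off)
        cmds.append((WREN, None, b""))
        cmds.append((PAGE_PROGRAM, addr, image[off:off + n]))
        off += n
    return cmds
```

Keeping page boundaries in mind matters because on many parts a page program that runs past the end of a page wraps around and overwrites the start of the same page.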
I'd go for anything video-related (especially HD video):
- these boards often have reasonable FPGAs
- they tend to use a reasonable host interface
- they are usually rather hard to kill, since they are built for studio environments
One of a friend's favourite FPGA projects, http://nsa.unaligned.org/, used HD transform boards.
Another of my personal favourites, which I have abused a bit myself, is the Blackmagic Intensity HD capture card, which comes with a nice set of video peripherals, a decent microcontroller, and a Spartan-3.
With some abuse, it's probably the cheapest non-academic devkit for PCI/PCIe work on an FPGA. New ones seem to go for $120-$150 on eBay these days, and you can probably score one with damaged video interface chips for less.
I would not use an FPGA.
You mention you have no experience with FPGAs, and yet you are interested in doing digital signal processing on one. Numerical manipulation and signal processing are difficult enough to get right on a computer, DSP, or microprocessor, where the tools are conventional programming languages. It seems to me that choosing a DSP project as your first FPGA project is likely a recipe for frustration.
If you want to learn FPGAs, try doing something more suited to the development tools, like state machines or communications packet processing.
For a DSP project like the one you've described, I'd recommend a DSP, a Cypress PSoC, or an Analog Devices Microconverter (a microcontroller with a built-in ADC and DAC) instead.
(Full disclosure, which gives some context for my advice: I do not use FPGAs myself, though I have used programmable logic, i.e. PLDs, on rare occasions. My officemate does use FPGAs frequently, and I've seen enough VHDL/Verilog code looking over his shoulder to know that it is well suited to bit manipulation. He is a seasoned engineer with lots of FPGA experience. In a recent conversation, while he was doing some fairly simple integer math on numbers of different bit widths, I told him he needed to sign-extend the shorter number in order to subtract it properly, and he got this look on his face like "oh man, I don't want to have to do sign extension..." Adding and subtracting is not very hard in an FPGA; beyond that, you really need to know the tools and libraries. And floating-point processing?!)
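The sign-extension pitfall in that anecdote is easy to show with a small Python model of fixed-width registers. The example values (a 16-bit register holding +100 and an 8-bit register holding -3) and all function names are illustrative assumptions, not from the story itself.

```python
def zero_extend(pattern: int, from_bits: int) -> int:
    """Widen by filling the new upper bits with 0 (unsigned extension)."""
    return pattern & ((1 << from_bits) - 1)


def sign_extend(pattern: int, from_bits: int, to_bits: int) -> int:
    """Widen by copying the sign bit (bit from_bits-1) into the new upper bits."""
    pattern &= (1 << from_bits) - 1
    if (pattern >> (from_bits - 1)) & 1:
        pattern |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return pattern


def sub16(a: int, b: int) -> int:
    """16-bit subtraction that wraps like a hardware subtractor."""
    return (a - b) & 0xFFFF


def as_signed(pattern: int, bits: int) -> int:
    """Read a bit pattern as a two's-complement integer."""
    return pattern - (1 << bits) if (pattern >> (bits - 1)) & 1 else pattern


a = 100   # 16-bit register holding +100
b = 0xFD  # 8-bit register holding -3 in two's complement

wrong = as_signed(sub16(a, zero_extend(b, 8)), 16)      # treats b as +253
right = as_signed(sub16(a, sign_extend(b, 8, 16)), 16)  # treats b as -3
print(wrong, right)  # -153 103
```

In Verilog the same widening is commonly written with replication, e.g. `{{8{b[7]}}, b}`, which copies bit 7 into the eight new upper bits.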