For a variety of reasons, the Digilent Basys2 and the Xilinx Spartan-3E are not going to work for you.
For that amount of RAM, you will need something like DDR2 or DDR3 SDRAM running at a fairly high clock rate. This is not something that you can just solder down onto a prototype PCB. The PCB needs to be designed for interfacing to memory, ideally with the RAM already soldered down onto the PCB. If you have never designed such a circuit, I suggest you get an FPGA board that already has the RAM on it. This is going to be difficult, however, since all of the FPGA boards I saw (in 5 minutes of looking) had much less than 4 GB of RAM.
Next, you need something newer than the Spartan-3/3A/3E. While the S3 is an otherwise good FPGA, the Spartan-6 series has much more to offer you. It has much more internal RAM (but you still need external RAM), a lot more DSP48 blocks, and more logic. Most importantly, it has a much better SDRAM interface than the Spartan-3. And by "much better", I mean "much easier for a beginner to successfully use". The memory controller in the S6 is a hard IP block, which for you means that it is much easier to meet the strict timing requirements of interfacing to SDRAM.
But controlling DDR2/DDR3 SDRAM is not easy. Xilinx has a tool called the "Memory Interface Generator", which can be found in Core Generator. This will generate the memory interface logic for you, and gives you lots of cool features that will make your life easier.
An alternative to the Spartan-6 would be a Virtex-5 or any of the 7-series parts. All of these have memory interfaces as good as or better than the Spartan-6's.
This Virtex-7-based board has a socket for two SO-DIMM modules, which could get you past 4 GB of RAM, but it is super expensive at US$5,000.
I should also mention that if you are doing this because you want to learn about FPGAs and electronics, then I support your efforts. But if you think that this is somehow going to be faster than a standard PC, then I have bad news for you. For the money, it is hard to beat an Intel i7 based quad-core machine with a reasonable GPU card. Just warning you, in case all you really care about is math speed.
Looking at the Cyclone II architecture description, I believe there is no hardware divider in the architecture. This would mean division is somehow implemented in software, potentially in non-constant time, whereas in hardware it should have been constant time.
I tried to find out how division is implemented on Altera boards but I couldn't. It may depend on whether megafunctions are being used or whether it is a block such as these.
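To illustrate the constant-time point, here is a minimal Python sketch of restoring (shift-and-subtract) division, one classic way a divider can be built from plain logic. It is not a claim about how Altera's tools or megafunctions implement division; the function name and the `width` parameter are made up for this example. The point is only that the loop always runs `width` iterations, regardless of the operand values.

```python
def restoring_divide(dividend, divisor, width=32):
    """Unsigned restoring division, modelled bit by bit.

    Hypothetical illustration: always performs exactly `width`
    iterations, the way a fixed-latency hardware divider would.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if dividend < 0 or divisor < 0 or dividend >= (1 << width):
        raise ValueError("operands must be unsigned and fit in `width` bits")

    quotient = 0
    remainder = 0
    # Walk the dividend from the most significant bit to the least.
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        # Subtract only if the divisor fits; otherwise keep the remainder.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder


if __name__ == "__main__":
    print(restoring_divide(100, 7))
```

In hardware, each loop iteration would correspond to one clock cycle (or one pipeline stage), so the latency is fixed by the operand width rather than by the data.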
Best Answer
The data bandwidth you mentioned is certainly part of the calculation, but only the beginning: the FPGA and the camera module need a compatible interface that can reach the required speed.
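As a rough sketch of the bandwidth side of that calculation, the following Python snippet computes the raw pixel data rate and compares it with an interface's capacity. All numbers (resolution, frame rate, bit depth, interface speed) are hypothetical placeholders, not values from the question, and the estimate ignores blanking intervals and protocol overhead.

```python
def pixel_bandwidth_bits_per_s(width, height, fps, bits_per_pixel):
    """Raw payload rate of a video stream, ignoring blanking and overhead."""
    return width * height * fps * bits_per_pixel


def interface_is_fast_enough(required_bps, interface_bps, margin=1.25):
    """True if the interface covers the payload plus a safety margin.

    The 25% default margin is an arbitrary assumption for illustration.
    """
    return interface_bps >= required_bps * margin


if __name__ == "__main__":
    # Hypothetical camera: 640x480, 30 frames/s, 8 bits per pixel.
    required = pixel_bandwidth_bits_per_s(640, 480, 30, 8)
    print(f"required: {required / 1e6:.1f} Mbit/s")

    # Hypothetical 8-bit parallel bus clocked at 24 MHz.
    available = 8 * 24_000_000
    print("fast enough:", interface_is_fast_enough(required, available))
```

A check like this only tells you whether the link can carry the data; it says nothing about whether the camera and the FPGA actually speak the same electrical and protocol standard.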
Whether your processing pipeline can be realized depends very much on your definition of "basic image processing". Ideally your algorithm is parallelizable so you can create multiple instances, and optimized for FPGA resource usage to avoid running out of limited resources like multipliers.
Resource usage on FPGAs is not always linear, so five copies of the same logic may use ten times as many LUTs as four copies, simply because you've run out of some "special" blocks (such as hardware multipliers) and the fifth instance has to emulate them in general-purpose logic, or because an instance needs to be wrapped around a special block.
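A toy Python model makes the non-linearity concrete. Every number here is invented for illustration (the count of hard multipliers, the LUT cost per instance, and the LUT cost of emulating one multiplier); real figures depend on the device and on what the synthesis tool decides to do.

```python
def luts_needed(instances,
                hard_multipliers=4,        # hypothetical: dedicated blocks on the device
                multipliers_per_instance=1,
                luts_per_instance=100,     # hypothetical: ordinary logic per copy
                luts_per_soft_multiplier=900):  # hypothetical: cost of emulation
    """Estimate LUT usage when multipliers spill over into LUT logic."""
    wanted = instances * multipliers_per_instance
    # Any multiplier beyond the dedicated blocks must be built from LUTs.
    emulated = max(0, wanted - hard_multipliers)
    return instances * luts_per_instance + emulated * luts_per_soft_multiplier


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, "instances ->", luts_needed(n), "LUTs")
```

With these made-up figures, usage grows by 100 LUTs per instance up to four instances, then jumps by 1000 LUTs for the fifth, which is the kind of cliff the toolchain's resource report will reveal.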
Speaking in the abstract: you can always compile and simulate your design with the FPGA vendor's toolchain, even without the actual hardware. If the constraints are properly specified and synthesis succeeds without showing timing errors, then it should also run -- there is no dynamic reallocation of resources that would make behaviour unpredictable unless you explicitly add that (which may be necessary, e.g. to share multiplier units).
Interpreting errors from FPGA compilations and optimizing designs are rather complex topics though, and entire books have been written about them. The compiler reporting a lack of resources could mean that you need a larger FPGA with more lookup tables (LUTs), that you need to rewrite the algorithm, or both.