In effect, you're trying to recreate a color CRT controller with memory interface. This is perfectly possible, but it's much more involved than you realize. The physical implementation can be either an FPGA, as alex forencich suggests, or discrete chips. The discrete section will need something like the 74FCT series for horizontal timing, and can easily get by with 74HC for vertical timing.
First, as you realize, you'll have to generate display timing at 25 MHz - except that you won't. A 25 MHz VGA pixel clock implies that you're trying for 640 x 480 pixels, and at one byte per pixel that is 307,200 bytes, which cannot be stored in a 64 kB RAM - it would require a 512 kB RAM, the next standard size up. Instead, a 64 kB RAM will only support a 256 x 256 display, and this will only require a pixel clock of about 12.5 MHz. The horizontal timing is straightforward using a 9-bit binary synchronous counter, which can be realized with 3 74FCT161 counters. The vertical timing also uses a 9-bit counter, but 74HC161s can be used, since vertical timing is much slower than horizontal. The outputs of the two counters feed at least one static RAM, and there are at least 3 different approaches you can use for the interface.
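The memory and counter arithmetic above can be checked with a short sketch. It assumes one byte per pixel and the common 640 x 480 timing of 800 pixel clocks per line (so 400 clocks per line at about 12.5 MHz); the helper names are made up for illustration.

```python
def frame_bytes(width, height, bits_per_pixel=8):
    """Bytes needed to hold one full frame."""
    return width * height * bits_per_pixel // 8


def next_power_of_two(n):
    """Smallest power of two >= n, since RAM parts come in such sizes."""
    return 1 << (n - 1).bit_length()


def counter_bits(total_counts):
    """Bits a binary counter needs to count 0 .. total_counts - 1."""
    return (total_counts - 1).bit_length()


KIB = 1024

# 640 x 480 at one byte per pixel (assumed depth).
full_vga = frame_bytes(640, 480)
print("640x480 needs", full_vga, "bytes ->",
      next_power_of_two(full_vga) // KIB, "KiB RAM")

# 256 x 256 fills a 64 KiB RAM exactly.
print("256x256 needs", frame_bytes(256, 256) // KIB, "KiB")

# Assumption: 800 pixel clocks per line at ~25 MHz, so 400 at ~12.5 MHz.
print("horizontal counter bits:", counter_bits(400))
```

Under these assumptions the 640 x 480 frame comes to 307,200 bytes (hence the 512 kB part), the 256 x 256 frame fills 64 kB exactly, and a 400-count line needs 9 counter bits, which fits in three 4-bit 74FCT161s.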
1) FIFO - This is your first thought, but it's more complicated than you think. First, it only makes sense to transfer one byte (or 6 bits) of intensity data at a time, but you have to store the address as well as the data. If you're going with a 64 kB RAM, this means 16 bits of address along with 6 to 8 bits of intensity, and you'll need more than one FIFO. This in turn means that you'll need to ensure that the FIFOs remain synchronized. You'll also need to provide a mechanism to monitor the FIFO empty line and generate a write pulse to video RAM whenever the FIFO is not empty: that is, whenever there is data in the FIFO waiting to be written. Furthermore, you'll need to provide a mechanism to keep memory writes from interfering with display reads. You can do this either by running the video RAM at 25 MHz and alternating read and write cycles, or by permitting writes to RAM only during the non-display portions of the scan: the front porch, back porch, sync, and vertical blanking intervals.
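As a rough software model of the second option (writes only during blanking), here is a minimal sketch. The 256 visible and 400 total pixel clocks per line are assumptions, and a single 24-bit word stands in for the several synchronized FIFOs that real hardware would need.

```python
from collections import deque

H_VISIBLE = 256  # assumed visible pixels per line
H_TOTAL = 400    # assumed total pixel clocks per line, including blanking


def pack_entry(address, intensity):
    """Pack a 16-bit address and 8-bit intensity into one 24-bit FIFO word."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    if not 0 <= intensity <= 0xFF:
        raise ValueError("intensity must fit in 8 bits")
    return (address << 8) | intensity


def unpack_entry(word):
    """Split a 24-bit FIFO word back into (address, intensity)."""
    return word >> 8, word & 0xFF


def scan_line(fifo, vram, line_base):
    """Simulate one scan line: read while visible, write only while blanked."""
    pixels = []
    for h in range(H_TOTAL):
        if h < H_VISIBLE:
            # Display read: the counters address the RAM.
            pixels.append(vram[line_base + h])
        elif fifo:
            # Blanking and FIFO not empty: one write pulse per clock.
            address, intensity = unpack_entry(fifo.popleft())
            vram[address] = intensity
    return pixels


vram = bytearray(64 * 1024)
fifo = deque([pack_entry(5, 200), pack_entry(0x1234, 7)])
first = scan_line(fifo, vram, 0)
second = scan_line(fifo, vram, 0)
print(first[5], second[5], len(fifo))
```

The pixel queued for address 5 does not show up until the following line, because the write is held off until the visible portion has been read out.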
2) Dual port RAM - Here's another device technology to look at. In this case you use the DPRAM as the video buffer, and feed one side from the video controller and the other from the Arduino. Be forewarned, a 64k x 8 DPRAM requires a package with a lot of pins.
3) Bank Switching - In this technique, you provide 2 video RAMs, and at any time one is being written to while the other is being read from. The state of the RAM is controlled by a flip-flop which can be triggered by the Arduino. So first (let's say) you read from bank A while writing to bank B. When you've completed writing a complete frame, you toggle the bank selector and the video is now read from B while A is being written to. This is in some ways more straightforward than the other two, but it does not permit local overwriting of areas of the image in the same way the other two approaches do.
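The bank-switching idea can be modeled in a few lines. This is only a sketch: the class and method names are hypothetical, and a single integer stands in for the flip-flop.

```python
class BankSwitchedVideo:
    """Two video RAMs; one is displayed while the other is written."""

    def __init__(self, size=256 * 256):
        self.banks = [bytearray(size), bytearray(size)]
        self.display_bank = 0  # state of the bank-select flip-flop

    def write(self, address, value):
        """Writer side (the Arduino): always goes to the hidden bank."""
        self.banks[self.display_bank ^ 1][address] = value

    def read(self, address):
        """Display side: always reads the visible bank."""
        return self.banks[self.display_bank][address]

    def toggle(self):
        """Swap banks once a complete frame has been written."""
        self.display_bank ^= 1


video = BankSwitchedVideo()
video.write(10, 99)          # goes to bank B, not yet visible
before = video.read(10)
video.toggle()               # frame complete: show bank B
after = video.read(10)
video.write(11, 5)           # now lands in bank A
video.toggle()               # show bank A
print(before, after, video.read(10), video.read(11))
```

After the second toggle, address 10 reads as 0 again: bank A never received that pixel. This is the limitation mentioned above. Each bank needs a complete frame written to it, so a small local update cannot simply be patched into the visible image.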
At a minimum, you want to implement your anti-aliasing filter in hardware.
This means filtering out signals and noise above the Nyquist frequency (1/2 of your sampling rate). You have to do this in hardware because, after sampling, interference or noise in the alias bands can't be distinguished from the signal in your desired band.
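To see why the filtering cannot be postponed until after sampling, here is a small sketch with made-up numbers: a 900 Hz interferer sampled at 1000 Hz produces exactly the same samples as a 100 Hz tone.

```python
import math


def alias_frequency(f_signal, f_sample):
    """Apparent frequency (Hz) of a tone at f_signal after sampling at f_sample."""
    f = f_signal % f_sample
    return min(f, f_sample - f)


F_SAMPLE = 1000  # Hz, illustrative value
print(alias_frequency(900, F_SAMPLE))   # folds below the 500 Hz Nyquist frequency

# The sampled values of the 900 Hz tone match those of a 100 Hz tone.
for n in range(8):
    high = math.cos(2 * math.pi * 900 * n / F_SAMPLE)
    low = math.cos(2 * math.pi * 100 * n / F_SAMPLE)
    print(n, round(high, 6), round(low, 6))
```

Once the two sequences are identical, no digital filter can tell them apart, which is why the out-of-band energy has to be removed before the sampler.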
Whatever other filtering you want to do will likely be easier if you can use more oversampling. At that point you face a trade-off: which costs you more power, a higher sampling rate or more elaborate analog filtering?
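One way to put rough numbers on that trade-off is to estimate how steep the analog filter must be at a given sampling rate. The sketch below assumes a Butterworth low-pass with its -3 dB corner at the band edge and a 60 dB attenuation target for anything that would alias back into the band; the filter type, the target, and the frequencies are all illustrative choices, not requirements from the question.

```python
import math


def butterworth_order(f_pass, f_stop, attenuation_db):
    """Minimum Butterworth order giving attenuation_db at f_stop (corner at f_pass)."""
    ratio = f_stop / f_pass
    n = math.log10(10 ** (attenuation_db / 10) - 1) / (2 * math.log10(ratio))
    return math.ceil(n)


def antialias_order(f_band, f_sample, attenuation_db=60):
    """Order needed so that anything aliasing into 0..f_band is attenuated."""
    # The lowest frequency that folds back into the band is f_sample - f_band.
    f_stop = f_sample - f_band
    if f_stop <= f_band:
        raise ValueError("sampling rate must exceed twice the band edge")
    return butterworth_order(f_band, f_stop, attenuation_db)


F_BAND = 1000  # Hz, illustrative signal bandwidth
for f_sample in (2500, 4000, 16000):
    print(f_sample, "Hz sampling -> order", antialias_order(F_BAND, f_sample))
```

With these numbers the required order falls sharply as the sampling rate rises, so a simpler analog stage is paid for with a faster converter and more digital processing.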
Have you tried the home built CPUs webring of sites? This one covers a lot of what you are looking for: http://cpuville.com/