One thing that used to be common for video graphics controllers is Video RAM, or VRAM.
VRAM has two sets of data output pins, and thus two ports that can be used simultaneously. The first port, the DRAM port, is accessed by the host computer in a manner very similar to traditional DRAM. The second port, the video port, is typically read-only and is dedicated to providing a high-throughput, serialized data channel for the graphics chipset.
Internally, VRAM reads an entire DRAM row into a shift register and shifts it out sequentially to the video circuitry. This leaves the DRAM port available for use by the MPU. VRAM has largely been replaced by SDRAM, "even though it is only single-ported and more overhead is required".
A technique I have used in the past is to use interleaved access to memory. It's a bit complex to explain (the devil is in the details), but I will outline the basics:
Basically the MPU accesses video memory in between pixel accesses by the video controller. If this timing gets too tight, there are a couple of things you can do that will greatly relieve the timing (usually only one of these is necessary):
- You can use 2 RAM chips (or banks) and interleave them, using each chip for every other pixel. In your case, this would effectively slow the cycle rate to 80 ns per chip, allowing MPU and video-controller accesses windows of 40 ns each. This could be extended to more banks interleaving more pixels if necessary. This technique is called Interleaved Memory.
- You can increase the data-bus size of the video memory. The video controller would read multiple pixels in a single access and use them sequentially. On the MPU side, there are three options: the MPU could have a matching larger data bus; each access could be directed to the appropriate byte (or word) using byte-selects on the video memory; or a read-modify-write would have to be performed to write the larger data size. In your case, it would probably be simplest to increase the video memory data bus to 16 or 32 bits (2 or 4 pixels), and then use an MPU with the same bus size.
If you interleave video accesses, you may want to consider the use of an FPGA or CPLD for your video memory controller.
Another method is to have 2 separate video memories and use bank-select. The MPU writes to one bank while the other is being used by the video-controller for display. When the MPU is finished writing, the bank accesses are swapped (usually during a sync pulse).
It is easy enough to calculate all you need from just the basic provided information.
For instance, the site I use most for a reference is this one: http://tinyvga.com/vga-timing/640x480@60Hz and it has all you need for 640x480 @ 60Hz (it specifies most common resolutions, but that's the simplest to work with).
It specifies everything in pixels and lines, and it provides a pixel clock frequency, as well as refresh frequencies. All you need, though, is the pixel clock and the pixel and line counts for each portion of the signal.
For instance, it gives a pixel clock of 25.175 MHz. That is not easy for most microcontrollers to generate, since it's both high frequency and high resolution - in general you can have one of those two - high frequency or high resolution. However, 25MHz is usually easy enough to generate, and is "close enough" for most monitors to cope with.
So we have a 25 MHz pixel clock. We also have a "whole line" size of 800 pixels. That size includes the porches, sync and visible area. So a line of 800 pixels, at a 25 MHz clock, would be running at (25,000,000 / 800) 31,250 Hz, or one line every 32 µs.
The horizontal sync pulse - 96 pixels - would be (96 / 25,000,000) = 3.84 µs long.
We know that a line takes 32 µs, and there are 525 lines in a "whole frame", so 0.000032 × 525 = 0.0168 s for a frame, or 59.524 Hz. That's pretty close to the 60 Hz of the specification.
So given a pixel clock, and a set of pixel periods, you can calculate anything. Of course, you can also go backwards. Given a frame rate and a resolution you can work out:
$$
60\,\text{Hz} \times 525\,\text{lines} = 31500\,\text{Hz}
$$
$$
31500\,\text{Hz} \times 800\,\text{px} = 25.2\,\text{MHz}
$$
So that shows that even the given specifications aren't 100% exact, but there is a bit of flexibility in the VGA timings so you can bend your clock to suit you within certain bounds.
And while we're at it, generating VGA purely in software takes a lot of processing and often leaves you starved of CPU cycles to do anything else. One of the most common "tricks" for making a VGA signal on a CPU is to use SPI to generate the pixel data stream. Even better if you have DMA in your microcontroller, to output an entire line of data without the CPU having to do anything. The CPU is then just responsible for generating the sync pulses and loading the DMA system with the right addresses - the rest is done in the background. Of course, that leaves you with just a 1-bit monochrome display. If you happen to have an SQI interface, and enough RAM, you could make a 4-bit display (16 colours) easily enough.
Best Answer
OK, answering my own question because I finally got it: the FIFO feature of the DMA was not enabled, so any interrupt or memory transfer was delaying the DMA's access to the memory bus.
With the FIFO enabled it is much better now. This has been done with this:
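As an illustration only (assuming an STM32F4-class MCU and the ST HAL, which this answer does not state; `hdma` is a hypothetical DMA handle name), enabling the DMA FIFO typically looks something like this:

```c
/* Illustrative sketch, not the original code. With the FIFO enabled,
 * the DMA buffers data internally, so momentary bus contention no
 * longer stalls the pixel stream. */
hdma.Init.FIFOMode      = DMA_FIFOMODE_ENABLE;
hdma.Init.FIFOThreshold = DMA_FIFO_THRESHOLD_FULL;
hdma.Init.MemBurst      = DMA_MBURST_INC4;    /* burst reads from memory */
hdma.Init.PeriphBurst   = DMA_PBURST_SINGLE;
HAL_DMA_Init(&hdma);
```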