Electronic – How is it possible to drive VGA displays at such high pixel clock frequencies?

digital-logic frequency propagation signal vga

I'm working on a digital circuit using discrete components to drive a 640×480 VGA display in an 80×30 text mode.

For a 640×480 display, the pixel clock is 25.175MHz, which has a period of around 40ns. I don't understand how I'm supposed to be able to provide a new pixel to the display this often.

The basic architecture for my circuit is as follows:

  1. Binary counter for horizontal pixels counts up at 25.175MHz to 800 (640 visible pixels + 160 for front porch, sync, back porch).
    At 800, increment vertical line counter (and reset at 525 lines)

  2. Using horizontal and vertical position, derive the x,y coordinate of current character.

  3. Using x,y coordinate of character, index into video memory to retrieve ASCII character.

  4. Use ASCII character to index in character ROM to obtain bit pattern for character

  5. Use parallel to serial shift register to convert 8 pixel line of character to individual bits at pixel clock frequency
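Steps 2 to 5 above come down to a little integer arithmetic on the two counters. Here is a minimal Python sketch of that arithmetic, assuming 8×16-pixel character cells (which is what 640×480 split into 80×30 characters implies), a row-major video memory, and a character ROM holding 16 consecutive bytes per glyph. The function names, the memory layouts and the MSB-first pixel order are illustrative assumptions, not details given in the circuit description.

```python
# Assumed geometry: 640x480 visible pixels, 80x30 characters -> 8x16 pixel cells.
COLS, ROWS = 80, 30
CELL_W, CELL_H = 640 // COLS, 480 // ROWS   # 8 and 16

def decode_position(h, v):
    """Map visible pixel counters (h, v) to the addresses used in steps 2-5.

    Returns (video_ram_address, glyph_row, bit_index). The row-major RAM
    layout and MSB-first pixel order are assumptions for illustration.
    """
    char_x = h // CELL_W                 # in hardware: drop the low 3 bits of h
    char_y = v // CELL_H                 # in hardware: drop the low 4 bits of v
    ram_addr = char_y * COLS + char_x    # step 3: index into video memory
    glyph_row = v % CELL_H               # low 4 bits of v pick the glyph row
    bit_index = (CELL_W - 1) - (h % CELL_W)  # step 5: leftmost pixel is bit 7
    return ram_addr, glyph_row, bit_index

def rom_address(ascii_code, glyph_row):
    """Step 4: character ROM address, assuming 16 bytes per glyph."""
    return ascii_code * CELL_H + glyph_row
```

Because the cell sizes are powers of two, the divisions and remainders are just bit selections on the counters; only the multiplication by 80 needs real logic (or a separate running address counter).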

If you follow the chain, it goes:
Counter -> RAM -> ROM -> Parallel to Serial Shift Register

Using the fastest components I can find, the propagation delays and access times add up to around
15ns + 20ns + 70ns + 15ns = 120ns, much greater than the 40ns period for 25MHz.
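The arithmetic behind that comparison can be checked in a few lines. This is only a sketch: the four delays are the figures quoted above, and the helper name `period_ns` is made up for the example.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

# Delays quoted in the question, in ns, in the order they are listed there.
delays_ns = [15, 20, 70, 15]

pixel_period = period_ns(25.175)       # a little under 40 ns
total_delay = sum(delays_ns)           # end-to-end delay of the whole chain
max_rate_mhz = 1000.0 / total_delay    # best case if one result must finish
                                       # before the next one starts

print(f"pixel period: {pixel_period:.1f} ns")
print(f"chain delay:  {total_delay} ns")
print(f"max rate:     {max_rate_mhz:.2f} MHz")
```

So if every pixel has to travel the whole chain before the next one starts, the chain tops out at roughly a third of the required pixel rate.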

At even higher resolutions and refresh rates, you can have pixel clocks well above 100MHz, which means a period of less than 10ns.

How is it possible to provide new pixels to the display every 10ns when just the access times for RAM/ROM are already well above that, not even considering all the other signals in your system?

Best Answer

There are two main reasons you are finding this challenging.

First, you are using older and more discrete (lower scale integration) parts than would have been used to do this in the era of VGA.

Second, you are using them in an atypical way. Specifically, your approach is not pipelined, which means that you have to add up multiple delays when determining your interval, and thus your rate.

In contrast, synchronous digital designs which attempt to achieve speed try to do as little as possible between registers.

While the details would probably differ a little, crudely speaking it would work something like this:

  • You increment or reset the address, then that goes in a register.
  • You latch the address into the synchronous memory
  • You latch the output of the synchronous memory
  • You latch this into the address of the synchronous character generator
  • You latch the output of the character generator into the output register
  • apply the palette lookup...
  • into the synchronous DAC...

When you break a task down like this, only one combinatorial delay, plus the register's clock-to-output delay and the next register's setup time, needs to fit between clock edges.
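To put rough numbers on this, the sketch below compares the minimum clock period with and without registers between the stages. It reuses the four delays from the question as if each were one pipeline stage. The register clock-to-output and setup figures (5ns and 3ns) are placeholder assumptions, not datasheet values, and the function names are made up for the example.

```python
def min_period_unpipelined(stage_delays_ns):
    """Without registers, every delay in the chain adds up."""
    return sum(stage_delays_ns)

def min_period_pipelined(stage_delays_ns, t_clk_to_q_ns, t_setup_ns):
    """With a register after every stage, only the slowest stage matters,
    plus the register overhead on either side of it."""
    return t_clk_to_q_ns + max(stage_delays_ns) + t_setup_ns

stages = [15, 20, 70, 15]   # ns, the figures from the question
T_CLK_TO_Q = 5              # ns, assumed register clock-to-output delay
T_SETUP = 3                 # ns, assumed register setup time

print(min_period_unpipelined(stages))                     # sum of all stages
print(min_period_pipelined(stages, T_CLK_TO_Q, T_SETUP))  # slowest stage + overhead
```

With these particular figures the 70ns stage still dominates the pipelined period, which ties back to the first point above: pipelining removes the summing of delays, but each individual stage still has to be faster than the clock period.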

A design built this way will take many clocks to produce an output - the latency will actually be higher than a purely combinatorial design. But it produces a new correct output on each cycle of a much faster clock.
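The latency-versus-throughput trade can be seen in a toy software model. This is a sketch, not a hardware description: the `Pipeline` class, the three stages, and the stand-in memory contents are all invented for illustration, with `None` marking a register that does not hold valid data yet.

```python
class Pipeline:
    """Toy model: one register after each stage, all clocked together."""

    def __init__(self, stages):
        self.stages = stages
        self.regs = [None] * len(stages)   # None marks "not yet valid"

    def tick(self, new_input):
        """Advance one clock edge; return what the last register now holds."""
        # Each stage sees what its upstream register held before this edge.
        inputs = [new_input] + self.regs[:-1]
        self.regs = [None if x is None else stage(x)
                     for stage, x in zip(self.stages, inputs)]
        return self.regs[-1]

# Stand-in contents: a 5-character video memory and a fake "character ROM".
video_ram = [ord(c) for c in "HELLO"]

def char_rom(code):
    return code ^ 0xFF          # placeholder bit pattern, not a real font

pipe = Pipeline([
    lambda addr: addr,              # address register
    lambda addr: video_ram[addr],   # synchronous RAM read
    char_rom,                       # synchronous character generator read
])

# Feed addresses 0..4, then two idle clocks to drain the pipeline.
outputs = [pipe.tick(a) for a in [0, 1, 2, 3, 4, None, None]]
print(outputs)
```

The first two clocks produce nothing valid, the third edge delivers the pattern for address 0, and from then on a new result appears on every clock even though each individual result took three clocks to come out.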

And hey, it's video, it doesn't really matter if the CRT is drawing a dozen pixels behind the pixel counter - you of course take that into account in the timing of the sync signals so that they are correct compared to when the data actually comes out of the DAC.
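One simple way to account for that latency is to pass the sync signals through the same number of registers as the pixel data. The sketch below models such a delay line in Python. The `DelayLine` class, the depth of 3, and the active-low sync polarity are assumptions for the example; shifting the counter values at which sync is asserted would achieve the same thing.

```python
from collections import deque

class DelayLine:
    """Chain of `depth` registers: each value emerges `depth` clocks later."""

    def __init__(self, depth, fill=1):
        # fill=1 assumes an active-low sync that idles high.
        self.taps = deque([fill] * depth, maxlen=depth)

    def tick(self, value):
        delayed = self.taps[0]      # oldest value leaves the chain
        self.taps.append(value)     # maxlen drops the oldest automatically
        return delayed

# Assumed pixel pipeline depth of 3 clocks; hsync generated from the counter.
hsync_delay = DelayLine(depth=3)
raw_hsync = [0, 0, 1, 1, 1, 1]
aligned = [hsync_delay.tick(s) for s in raw_hsync]
print(aligned)
```

The delayed signal has the same shape as the raw one, just three clocks later, so it lines up with the pixels that are actually leaving the output stage.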

In practice, almost all complex digital systems work this way, as it's a great idea - right up until a pipelined CPU hits a dependency on an earlier computational result or a conditional branch... Then things get interesting, as they'd talk about in the next lecture of a digital systems class - but fortunately your VGA situation is a lot simpler, especially if you don't yet worry about tearing effects if the character buffer changes while the screen is being drawn.

As a practical matter, if you want to build this, do it in an FPGA. That will pretty much force synchronous memories on you if you use internal ones, or synchronous IO registers if you use external memory. You'll get a lot of nudging towards a proper design, the fabric itself will be faster than your discrete parts, and of course if you make a mistake you only need to twiddle your thumbs while it recompiles rather than spend a long day re-wiring.