GPU design question


The other day I read a very interesting article saying that there exist two general architectures in GPU design: IMR and TBDR.
One of them is used in modern PC graphics card GPUs, and the other in embedded devices, such as mobile phones, et cetera.

The core design principle behind the TBDR architecture is to reduce the system memory bandwidth
required by the GPU to a bare minimum. As transfer of data between system memory and the GPU is
one of the biggest causes of GPU power consumption, any reduction that can be made in this area
will allow the GPU to operate at a lower power. Additionally, the reduction in system memory
bandwidth use and the hardware optimizations associated with it (such as using on-chip buffers) can
boost application performance. Because of this development strategy, POWERVR graphics cores
have become dominant in the embedded electronic devices market.

I'm just curious: why do they even need to drive vertex data from GPU to CPU and vice versa?
Why not just load all the data (at the beginning) into VRAM, so the GPU would have direct access to it without any delays? Because, as far as we all know, all operations on vertex or texel data are done internally on the GPU, so there's really no reason for the CPU to even touch that data.
The CPU in this chain is only needed to send commands to the GPU, like, "Hey, you, GPU, turn the camera that I previously set at coordinates 0,0,0 by 25.61415 degrees around the X axis" or "Hey, GPU, set the framebuffer resolution to 1024*768 with 32bpp color depth".
Maybe I'm missing some part of how GPUs and CPUs operate together these days, but anyway.
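
Here's roughly what I have in mind, as a minimal OpenGL ES 2.0 sketch (context and shader setup omitted; the triangle data and function names are just my own illustration): the vertex data goes to GPU-visible memory once at startup, and after that the CPU only issues draw commands.

```c
/* Minimal OpenGL ES 2.0 sketch (illustrative; context and shader
 * setup omitted): vertex data is uploaded to GPU-visible memory once
 * at startup, and per frame the CPU only issues draw commands. */
#include <GLES2/gl2.h>

static GLuint vbo;

void init_geometry(void)
{
    /* One triangle; a real app would upload the whole scene here. */
    static const GLfloat verts[] = {
         0.0f,  0.5f, 0.0f,
        -0.5f, -0.5f, 0.0f,
         0.5f, -0.5f, 0.0f,
    };

    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    /* GL_STATIC_DRAW hints "written once, drawn many times", so the
     * driver is free to keep the buffer in VRAM. */
    glBufferData(GL_ARRAY_BUFFER, sizeof verts, verts, GL_STATIC_DRAW);
}

void draw_frame(void)
{
    /* Only commands cross the bus here; the vertex data itself
     * already lives on the GPU side. */
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (const void *)0);
    glDrawArrays(GL_TRIANGLES, 0, 3);
}
```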
Hope you'll enlighten me a little.

Best Answer

There are a number of architectures, actually. And they're not as different as claimed by the different vendors (although there are still differences.)

NVIDIA typically builds GPUs that work a lot like general-purpose CPUs: There's a large framebuffer out in GDDR DRAM, there's a number of large caches for texture fetch and buffering, and there are groups of pixel processors (typically 4x2 pixels) that operate on some particular primitive at a time. For framebuffer blending or writing, the I/O goes through the memory bus, somewhat helped by the caches. If multiple different objects cover the same pixels (overdraw), then the same pixels may go to/from the framebuffer more than once during a particular frame.
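
To put a rough number on that framebuffer traffic, here's a back-of-the-envelope estimate (every figure below is an assumption for illustration, not a measurement):

```c
/* Back-of-the-envelope framebuffer bandwidth for an immediate-mode
 * renderer. All numbers are illustrative assumptions. */
#include <stdio.h>

int main(void)
{
    const double width  = 1920, height = 1080; /* assumed resolution */
    const double bytes_per_px = 4 + 4; /* 32-bit color + 32-bit depth */
    const double overdraw = 3.0;       /* each pixel touched ~3x (assumed) */
    const double fps = 60.0;

    /* Blending/depth testing is read-modify-write, hence the x2. */
    double bytes_per_frame = width * height * bytes_per_px * overdraw * 2.0;
    double gb_per_sec = bytes_per_frame * fps / 1e9;

    printf("~%.1f GB/s of framebuffer traffic alone\n", gb_per_sec);
    return 0;
}
```

Under those assumptions that's about 6 GB/s just for framebuffer reads and writes, which is exactly the traffic a tiled design tries to keep in on-chip buffers instead.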

POWERVR and Intel typically build "tiled" processors, where they have a similar architecture (4x4 tiles IIRC) but to avoid the big input/output load on the GDDR, they sort all the primitives ahead of rasterization, so they only need to load/write a particular block of framebuffer pixels once per frame. At least, that's their design goal -- this is not always necessarily achieved. The benefit is that you need to spend less bandwidth on reading and writing back framebuffer pixels. However, approximately the same amount of texture bandwidth is used, so the savings aren't necessarily orders of magnitude (it's scene dependent.)
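
In rough pseudocode, the binning pass that does that sorting might look like this (a sketch assuming screen-space bounding boxes per primitive and 32x32-pixel tiles; real hardware is considerably more sophisticated):

```c
/* Sketch of a tiler's binning pass: sort primitives into per-tile
 * lists so each tile of framebuffer pixels is visited exactly once.
 * Tile size and data structures are assumptions for illustration. */
#include <stdlib.h>

#define TILE_SIZE 32

typedef struct { float min_x, min_y, max_x, max_y; } Prim; /* screen-space bbox */
typedef struct Node { int prim_id; struct Node *next; } Node;

void bin_primitives(const Prim *prims, int n_prims,
                    Node **bins, int tiles_x, int tiles_y)
{
    for (int i = 0; i < n_prims; i++) {
        /* Which tiles does this primitive's bounding box overlap? */
        int tx0 = (int)(prims[i].min_x / TILE_SIZE);
        int ty0 = (int)(prims[i].min_y / TILE_SIZE);
        int tx1 = (int)(prims[i].max_x / TILE_SIZE);
        int ty1 = (int)(prims[i].max_y / TILE_SIZE);
        if (tx0 < 0) tx0 = 0;
        if (ty0 < 0) ty0 = 0;
        if (tx1 >= tiles_x) tx1 = tiles_x - 1;
        if (ty1 >= tiles_y) ty1 = tiles_y - 1;

        for (int ty = ty0; ty <= ty1; ty++)
            for (int tx = tx0; tx <= tx1; tx++) {
                Node *n = malloc(sizeof *n);
                n->prim_id = i;
                n->next = bins[ty * tiles_x + tx];
                bins[ty * tiles_x + tx] = n;
            }
    }
    /* Rasterization then processes one tile at a time entirely in
     * on-chip memory and writes each tile to DRAM exactly once. */
}
```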

The Xbox 360 architecture is a bit of a hybrid -- it has EDRAM that's bigger than itty-bitty tiles, but smaller than a full HD framebuffer. The GPU runs through the primitives, and writes them to the EDRAM, which then "resolves" to the output framebuffer (this also calculates things like anti-aliasing.) The EDRAM framebuffer is made to be effectively fast enough to be non-blocking. The screen will be split into some number of tile areas -- 1 through 4 for a particular frame, typically.
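
If you assume the widely published 10 MB EDRAM figure for the Xbox 360 and a 720p render target (both assumptions on my part, not from official docs), the arithmetic behind that 1-to-4 tile split falls out directly:

```c
/* Why the screen gets split into a handful of EDRAM tiles. Assumes
 * the widely published 10 MB EDRAM figure and a 720p target; the
 * arithmetic is illustrative, not from official documentation. */
#include <stdio.h>

int main(void)
{
    const double edram_bytes = 10.0 * 1024 * 1024;
    const double w = 1280, h = 720;
    const double bytes_per_sample = 4 + 4; /* 32-bit color + depth */

    for (int msaa = 1; msaa <= 4; msaa *= 2) {
        double fb = w * h * bytes_per_sample * msaa;
        int tiles = (int)((fb + edram_bytes - 1) / edram_bytes); /* ceil */
        printf("%dx MSAA: %.1f MB -> %d tile(s)\n",
               msaa, fb / (1024 * 1024), tiles);
    }
    return 0;
}
```

With no MSAA the whole 720p framebuffer fits in one tile; 4x MSAA pushes it to roughly 28 MB, hence the split into several passes.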
