Electronic – Why not SRAM for FPGA in image processing

fpga processing sdram sram vhdl

I'm beginning with VHDL coding and I've done some basic image processing on my development board.

I've noticed that most FPGA development boards use DRAM (SDRAM, DDR SDRAM) as their external RAM.
For example, I'm using an FPGA dev board from Terasic and it uses DRAM.
Although DRAM seems to be cheaper than SRAM (static RAM), from my point of view it is not well suited to image processing. Let me explain:
DRAM data is accessed by block, whereas SRAM data can be accessed (addressed) byte by byte (or word by word, or 36 bits at a time, whatever). This feature is a better match for image processing. For example, a simple mean filter, where each pixel is replaced by the mean of its 3x3 neighbourhood, say

Pix(x,y) = (Pix(x-1,y-1) + Pix(x,y-1) + Pix(x+1,y-1) + Pix(x-1,y) + Pix(x,y) 
           + Pix(x+1,y) + Pix(x-1,y+1) + Pix(x,y+1) + Pix(x+1,y+1))/9

is difficult to do with DRAM: you need an internal buffer of at least two lines to hold lines y-1 and y while processing line y+1. Quartus also seems to take a huge amount of time to handle this simple processing (I'm using a two-port FIFO).
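
To make the two-line buffering concrete, here is a minimal software model in Python (not synthesizable VHDL). It assumes pixels arrive in raster order, uses integer division by 9, and only produces results for interior pixels; the name `mean3x3_stream` and the border handling are illustrative assumptions, not something taken from the board or from Quartus.

```python
def mean3x3_stream(pixels, width):
    """Model of a streaming 3x3 mean filter using two line buffers.

    pixels: iterable of pixel values in raster order (line by line).
    Returns a dict mapping interior (x, y) -> mean of the 3x3 neighbourhood.
    """
    line_y2 = [0] * width  # holds row y-2 at each column
    line_y1 = [0] * width  # holds row y-1 at each column
    window = [[0, 0, 0] for _ in range(3)]  # 3x3 shift-register window
    out = {}
    for i, p in enumerate(pixels):
        y, x = divmod(i, width)
        # New window column: same x, taken from rows y-2, y-1 and y.
        column = (line_y2[x], line_y1[x], p)
        for r in range(3):
            window[r] = [window[r][1], window[r][2], column[r]]
        # Advance the line buffers for this column.
        line_y2[x] = line_y1[x]
        line_y1[x] = p
        # The window is fully valid once three rows and three columns are in.
        if y >= 2 and x >= 2:
            out[(x - 1, y - 1)] = sum(sum(row) for row in window) // 9
    return out
```

In this model each incoming pixel is read from the stream exactly once; everything else comes from the two line buffers and the nine window registers, which is the usual reason a line-buffered design does not need random access to external memory.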

If the board had used SRAM, such as the IS61LPS25632A from ISSI, the processing would be straightforward, since the address of a pixel could simply be

ByteAdress <= x + (LINEWIDTH*y);

From a hardware point of view, processing one pixel takes 9 read/write memory clock cycles, and I suppose this could be improved with some parallel-processing tricks.
So, is my understanding correct? If not, where is the mistake?
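
As a rough sanity check on the access count, the sketch below (Python, purely arithmetic) lists the nine addresses touched for one pixel and compares total memory accesses per frame for the direct approach against a line-buffered one. The 640x480 frame size, the one-write-per-pixel figure and the function names are assumptions made for illustration; borders are ignored.

```python
def pixel_address(x, y, line_width):
    """Linear address of pixel (x, y), as in ByteAdress <= x + (LINEWIDTH*y)."""
    return x + line_width * y


def neighbour_addresses(x, y, line_width):
    """Addresses of the 3x3 neighbourhood of (x, y), row by row."""
    return [pixel_address(x + dx, y + dy, line_width)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def accesses_per_frame(width, height, reads_per_pixel, writes_per_pixel=1):
    """Total external memory accesses for one frame (borders ignored)."""
    return width * height * (reads_per_pixel + writes_per_pixel)


# Assumed 640x480 frame, one result write per pixel.
direct = accesses_per_frame(640, 480, reads_per_pixel=9)    # random access
buffered = accesses_per_frame(640, 480, reads_per_pixel=1)  # line buffers
```

Under these assumptions `direct` is 3,072,000 accesses and `buffered` is 614,400, so random access makes the addressing simpler but costs five times the memory traffic of the buffered scheme.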


EDIT:

Using internal FPGA RAM is not an option for me, since there is not enough memory inside the FPGA I use (Cyclone III), and I'm not sure it is sensible to pay for a more expensive development kit just to get more RAM inside the FPGA.

My question is more about code efficiency. Say, for example, that one has 8MB of SRAM. From my point of view, code written for this kind of RAM will be more efficient (in speed and buffering) for 2D processing such as image processing, because true random access is well suited to accessing data separated by (big) offsets, as in this statement:

ByteAdress <= x + (LINEWIDTH*y);

Reorganising the data is time consuming. I use a camera that outputs pixels sequentially, line by line. I can only store this data in a two-port FIFO RAM component built with DRAM. With this I can't even do a simple transposition before storing the data. Surprisingly, I didn't find any code on the web that deals with this 2D problem.
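
For illustration, this is the kind of transposition that a truly random-access memory would allow while the camera stream is being written. It is a small Python model, not VHDL, and the names `transposed_address` and `store_transposed` are hypothetical: each pixel arriving in raster order is written to `y + HEIGHT*x` instead of `x + LINEWIDTH*y`.

```python
def transposed_address(x, y, height):
    """Write address that stores pixel (x, y) at position (y, x)."""
    return y + height * x


def store_transposed(pixels, width, height):
    """Write a raster-order pixel stream into memory in transposed order."""
    memory = [None] * (width * height)
    for i, p in enumerate(pixels):
        y, x = divmod(i, width)  # position of the i-th streamed pixel
        memory[transposed_address(x, y, height)] = p
    return memory
```

Consecutive writes land `height` words apart, which a FIFO-style interface cannot express and which is the access pattern burst-oriented DRAM handles worst.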

Best Answer

No fundamental reason why not. Synchronous SRAM is truly random access, fairly inexpensive, and easy to interface to.

Its downside is that it occupies a fairly narrow niche between the on-chip BlockRam (not much smaller, free until it forces you to select a larger chip, massively parallel and more flexible) and external DRAM (massive storage capacity at a price SSRAM can't match).

So up to 0.5 or 1MB, external SSRAM is unnecessary, and above 8MB or 16MB (numbers may vary according to your budget and current prices!), SSRAM becomes expensive enough that DRAM takes over despite its limits. Then, if you need random access, you have to massively reorganise the computation to read chunks (bursts or pages) from DRAM into BlockRam, where you can process them fast before writing the results back in bursts, and so on.
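
A toy Python model of that reorganisation is sketched below. It is only an illustration under stated assumptions: `BurstDram` is a made-up stand-in for a DRAM controller that moves data only in aligned bursts, a Python list plays the role of BlockRam, the image width is a multiple of the burst length, and border pixels are passed through unfiltered.

```python
class BurstDram:
    """Toy DRAM model: data moves only in aligned bursts of burst_len words."""

    def __init__(self, words, burst_len):
        self.words = list(words)
        self.burst_len = burst_len
        self.burst_count = 0  # number of burst transfers issued

    def read_burst(self, addr):
        assert addr % self.burst_len == 0, "bursts must be aligned"
        self.burst_count += 1
        return self.words[addr:addr + self.burst_len]

    def write_burst(self, addr, values):
        assert addr % self.burst_len == 0 and len(values) == self.burst_len
        self.burst_count += 1
        self.words[addr:addr + self.burst_len] = values


def read_row(dram, y, width):
    """Fetch one image row into an on-chip buffer (BlockRam stand-in)."""
    row = []
    for addr in range(y * width, (y + 1) * width, dram.burst_len):
        row.extend(dram.read_burst(addr))
    return row


def mean3x3_bursts(src, dst, width, height):
    """3x3 mean filter that touches DRAM only through whole bursts."""
    rows = []  # rolling buffer holding the three most recent rows
    for y in range(height):
        rows.append(read_row(src, y, width))
        rows = rows[-3:]
        if len(rows) == 3:
            out = list(rows[1])  # left/right border pixels pass through
            for x in range(1, width - 1):
                out[x] = sum(r[x + dx] for r in rows for dx in (-1, 0, 1)) // 9
            base = (y - 1) * width  # the finished row is y-1
            for off in range(0, width, dst.burst_len):
                dst.write_burst(base + off, out[off:off + dst.burst_len])
```

All the random access happens inside `rows`, the on-chip buffer; the external memory only ever sees sequential bursts, so each source row is read once and each finished row is written once.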

But if you have a role for SSRAM within that window, go for it. I have added simple home-made SSRAM boards to augment commercial FPGA platforms where necessary.