Electronic – Verilog memory designs with multiple read/write ports – poor circuit performance when synthesized

digital-logic, memory, verilog

I am interested in designing (in Verilog) some memory structures that have multiple (let's say 3) read/write ports. I've been studying computer architecture, and what I've heard is that these are not trivial to implement in hardware and can result in much slower circuits.

With behavioural Verilog I would imagine it's quite simple, something along the lines of:

always @ (posedge clk) begin
    if (read_enable) begin
        out1 <= mem[read_addr1];
        out2 <= mem[read_addr2];
        out3 <= mem[read_addr3];
    end
    //something similar if I want multiple writes
end

Assuming it synthesizes, will I have a crappy and slow circuit, and why? Can it be alleviated by going with a more custom design using gates instead of behavioural coding?

Thanks

Best Answer

Firstly, are you synthesizing to an FPGA?

This paper from Cypress shows a dual-port RAM as a block diagram. It's not quite clear from the diagram, but the dual-port array in the middle is a cell array with two sets of lines: 2 row selects, 2 sets of write column lines, and 2 sets of read column lines.

Scaling beyond 2 is difficult because then you need 3, 4 etc sets of wires, and your RAM density goes down as you run out of space for wires.

If you write Verilog which implies more than 2 ports, the synthesis tool will build it out of flops with multiplexors on the front, consuming far more space than RAM cells.

Why do you actually need multiple ports? How large a RAM do you want? Putting a memory arbiter in front of a normal RAM may be the solution you want.
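The arbitration idea above could look roughly like the following. This is only a minimal sketch under stated assumptions: the module name ram_arbiter3, its port names, the fixed priority order (port 0 highest), and the one-cycle grant pulse are all made up for illustration and do not come from the answer.

```verilog
// Hypothetical sketch: three requesters share one single-port RAM.
// Fixed priority: port 0 wins over port 1, which wins over port 2.
module ram_arbiter3 #(
    parameter DATA_WIDTH = 16,
    parameter ADDR_WIDTH = 8
) (
    input  wire                  clk,
    input  wire [2:0]            req,     // one request bit per port
    input  wire [2:0]            we,      // 1 = write, 0 = read, per port
    input  wire [ADDR_WIDTH-1:0] addr0,
    input  wire [ADDR_WIDTH-1:0] addr1,
    input  wire [ADDR_WIDTH-1:0] addr2,
    input  wire [DATA_WIDTH-1:0] wdata0,
    input  wire [DATA_WIDTH-1:0] wdata1,
    input  wire [DATA_WIDTH-1:0] wdata2,
    output reg  [2:0]            grant,   // one-hot, valid together with rdata
    output reg  [DATA_WIDTH-1:0] rdata
);
    reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    reg [2:0]            sel;
    reg                  sel_we;
    reg [ADDR_WIDTH-1:0] sel_addr;
    reg [DATA_WIDTH-1:0] sel_wdata;

    // Combinational priority selection of the winning request.
    always @* begin
        sel       = 3'b000;
        sel_we    = 1'b0;
        sel_addr  = addr0;
        sel_wdata = wdata0;
        if (req[0]) begin
            sel    = 3'b001;
            sel_we = we[0];
        end else if (req[1]) begin
            sel       = 3'b010;
            sel_we    = we[1];
            sel_addr  = addr1;
            sel_wdata = wdata1;
        end else if (req[2]) begin
            sel       = 3'b100;
            sel_we    = we[2];
            sel_addr  = addr2;
            sel_wdata = wdata2;
        end
    end

    // Single RAM port: one access per clock for whoever was selected.
    always @(posedge clk) begin
        if (sel_we)
            mem[sel_addr] <= sel_wdata;
        rdata <= mem[sel_addr];
        grant <= sel;
    end
endmodule
```

A requester would hold its req bit (and address) until it sees its grant bit. Note that fixed priority can starve the lower-priority ports; a round-robin scheme is a common alternative if that matters.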

Related Solutions

Electronic – Multi-Port RAM (1 write port, many read ports)

Assuming you need a read cycle on each port on each clock cycle, each BRAM will give you two read ports. Beyond that, you have to replicate the contents of the memory.
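Replicating the contents can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: the module name ram_1w3r, the port names, and the conservative choice of one copy per read port are all hypothetical. Whether each copy actually maps to a BRAM depends on the synthesis tool and the device.

```verilog
// Hypothetical sketch: 1 write port, 3 synchronous read ports.
// Every write goes to all copies, so each copy can serve its own read port.
module ram_1w3r #(
    parameter DATA_WIDTH = 16,
    parameter ADDR_WIDTH = 8
) (
    input  wire                  clk,
    input  wire                  we,
    input  wire [ADDR_WIDTH-1:0] waddr,
    input  wire [DATA_WIDTH-1:0] wdata,
    input  wire [ADDR_WIDTH-1:0] raddr1,
    input  wire [ADDR_WIDTH-1:0] raddr2,
    input  wire [ADDR_WIDTH-1:0] raddr3,
    output reg  [DATA_WIDTH-1:0] rdata1,
    output reg  [DATA_WIDTH-1:0] rdata2,
    output reg  [DATA_WIDTH-1:0] rdata3
);
    // Three identical copies of the memory contents.
    reg [DATA_WIDTH-1:0] bank1 [0:(1<<ADDR_WIDTH)-1];
    reg [DATA_WIDTH-1:0] bank2 [0:(1<<ADDR_WIDTH)-1];
    reg [DATA_WIDTH-1:0] bank3 [0:(1<<ADDR_WIDTH)-1];

    // Each block is a simple one-write, one-read memory with a registered read.
    always @(posedge clk) begin
        if (we) bank1[waddr] <= wdata;
        rdata1 <= bank1[raddr1];
    end

    always @(posedge clk) begin
        if (we) bank2[waddr] <= wdata;
        rdata2 <= bank2[raddr2];
    end

    always @(posedge clk) begin
        if (we) bank3[waddr] <= wdata;
        rdata3 <= bank3[raddr3];
    end
endmodule
```

The cost is area: the storage is multiplied by the number of copies, which is why this approach suits one write port and several read ports rather than several write ports.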

Is the bandwidth required at each port less than the raw bandwidth of the BRAM? In that case, you might consider multiplexing the ports. Use a counter that runs at the full speed of the BRAM to drive a multiplexer that scans the address bus for each port, feed these addresses to the BRAM, and then deliver the data (typically 2 clocks later) to the corresponding data bus for each port.

The downside of this approach is that the access latency for each port is now N clocks longer than the non-multiplexed case. There are various ways to deal with this latency, including adding additional pipeline stages to the other data paths.

Note that with a 2-port BRAM inside the module, you can scan two of the external ports at a time.
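The port-scanning scheme described above might be sketched like this. The module name ram_tdm_read, the signal names, the choice of three read ports, and the use of a single memory read port plus a separate write port are assumptions made for illustration; the sketch also assumes the external addresses are held stable while the counter cycles through the ports.

```verilog
// Hypothetical sketch: three read ports time-multiplexed onto one
// synchronous memory read port running on a fast clock.
module ram_tdm_read #(
    parameter DATA_WIDTH = 16,
    parameter ADDR_WIDTH = 8
) (
    input  wire                  fast_clk,
    input  wire                  we,
    input  wire [ADDR_WIDTH-1:0] waddr,
    input  wire [DATA_WIDTH-1:0] wdata,
    input  wire [ADDR_WIDTH-1:0] raddr0,
    input  wire [ADDR_WIDTH-1:0] raddr1,
    input  wire [ADDR_WIDTH-1:0] raddr2,
    output reg  [DATA_WIDTH-1:0] rdata0,
    output reg  [DATA_WIDTH-1:0] rdata1,
    output reg  [DATA_WIDTH-1:0] rdata2
);
    reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    reg [1:0]            slot   = 2'd0;  // port whose address is presented now
    reg [1:0]            slot_d = 2'd0;  // port whose data is in mem_q
    reg [ADDR_WIDTH-1:0] addr_mux;
    reg [DATA_WIDTH-1:0] mem_q;

    // Multiplexer that scans the address bus of each port.
    always @* begin
        case (slot)
            2'd0:    addr_mux = raddr0;
            2'd1:    addr_mux = raddr1;
            default: addr_mux = raddr2;
        endcase
    end

    always @(posedge fast_clk) begin
        // Separate write port (the second port of a 2-port memory).
        if (we) mem[waddr] <= wdata;

        // Counter that cycles 0, 1, 2, 0, ...
        slot   <= (slot == 2'd2) ? 2'd0 : slot + 2'd1;
        slot_d <= slot;

        // Synchronous read for the port selected in this cycle.
        mem_q <= mem[addr_mux];

        // Two clocks after the address was presented, hand the data
        // to the data bus of the port it belongs to.
        case (slot_d)
            2'd0:    rdata0 <= mem_q;
            2'd1:    rdata1 <= mem_q;
            default: rdata2 <= mem_q;
        endcase
    end
endmodule
```

Each port is refreshed once every three fast clocks, which is where the extra access latency comes from.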

Electronic – Read and write values in Multidimensional arrays in verilog

This is really several questions in one, so I'll break down the main points:

<= is a non-blocking assignment used when implying a flip-flop output.
= is a blocking assignment used when implementing combinatorial output.

example usage:

input [10:0] in_data;

reg [11:0] flip_flop;
reg [11:0] next_data;

//Flip-flop
always @(posedge clock) begin
  flip_flop <= next_data;
end

//Combinatorial
always @* begin
  next_data = in_data + 11'd1;
end

You defined 3 different data types:

input [15:0] me;
reg [15:0] p_array [7:0];
reg abc_pqr [2:0];          //Same as reg [0:0] abc_pqr [2:0]

me is a standard 16-bit word. p_array is an 8-deep memory of 16-bit words. abc_pqr is a 3-deep memory of 1-bit words.
NB: it is typical to declare memories with ascending indices, e.g. reg [15:0] p_array [0:7];

You have:

abc_pqr[0] <= me[0]; //This is a 1-bit assignment
abc_pqr[1] <= me[1];
abc_pqr[2] <= me[2]; //<-- corrected this to 2

Looks valid.

Then:

p_array[abc_pqr[0]] <= me[0];

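p_array has 8 entries, so it needs an index that can take the values 0 to 7 (3 bits); you have only supplied 1 bit, which can only reach entries 0 and 1. A p_array element is 16 bits wide, but the right-hand side is again only 1 bit, so the assignment is probably not doing what you intended.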
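One possible reading of the intent is that the three abc_pqr bits together should select an entry of p_array, and the whole 16-bit word me should be stored there. The following is a sketch under that assumption only; the module name index_example, the index wire, and the selected_word output are hypothetical names added so the fragment is self-contained.

```verilog
// Hypothetical sketch: build a 3-bit index from the three 1-bit entries
// and store a full 16-bit word into the selected p_array entry.
module index_example (
    input  wire        clock,
    input  wire [15:0] me,
    output wire [15:0] selected_word
);
    reg [15:0] p_array [7:0];   // 8 entries of 16 bits each
    reg        abc_pqr [2:0];   // 3 entries of 1 bit each

    // Concatenate the three 1-bit entries into a 3-bit index (0 to 7).
    wire [2:0] index = {abc_pqr[2], abc_pqr[1], abc_pqr[0]};

    always @(posedge clock) begin
        abc_pqr[0] <= me[0];
        abc_pqr[1] <= me[1];
        abc_pqr[2] <= me[2];

        // Index width and data width now match the declaration of p_array.
        p_array[index] <= me;
    end

    // Read back the currently selected entry.
    assign selected_word = p_array[index];
endmodule
```

Because of the non-blocking assignments, the index used for the write is the value abc_pqr held before the clock edge, not the value being assigned in the same block.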
