Electrical – Concatenating from Block RAM in Verilog

ram, verilog

I have instantiated a block RAM module using the Block Memory Generator from the Xilinx IP core catalog. As an alternative, I have also coded my own simple single-port RAM module, much like the one on page 33 of these lecture slides (http://www-inst.eecs.berkeley.edu/~cs150/fa11/agenda/lec/lec10-sram1.pdf).

On every rising clock edge, I increment the address and write to that location of the RAM module, like this:

reg [5:0] address;
initial address = 6'b0;
always @(posedge clk)
begin
    address <= address + 1'b1;
end

block_ram uut (
    .clk(clk),
    .write_en(write_en),
    .address(address),
    .datain(datain),
    .dataout(dataout)
);
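
For reference, a hand-written single-port RAM matching the port names in the instantiation above might look like the sketch below. The 16-bit data width, the 64-word depth, and the registered (synchronous) read are assumptions made for illustration; they are not taken from the lecture slides or from the generated core.

```verilog
// Minimal single-port synchronous RAM sketch.
// Assumed: 64 words x 16 bits, registered read
// (data appears one clock after the address is presented).
module block_ram #(
    parameter DATA_WIDTH = 16,   // assumed word width
    parameter ADDR_WIDTH = 6     // 2^6 = 64 words, matching the 6-bit address
) (
    input  wire                  clk,
    input  wire                  write_en,
    input  wire [ADDR_WIDTH-1:0] address,
    input  wire [DATA_WIDTH-1:0] datain,
    output reg  [DATA_WIDTH-1:0] dataout
);
    reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    always @(posedge clk) begin
        if (write_en)
            mem[address] <= datain;
        // Read-first: during a write, dataout gets the word that was stored before it
        dataout <= mem[address];
    end
endmodule
```

Because the read is registered, dataout corresponds to the address presented on the previous clock edge, so any logic that captures it has to allow for that one-cycle delay.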

After populating the appropriate RAM addresses, what I would like to do is read back specific addresses and concatenate them together to make one larger wire, like so:

wire [624:0] concatenated_ram;
assign concatenated_ram = {ram[0], ram[1], ram[2], ram[3], ...};

The only way I can think of to do this is to route the single 'dataout' port of my RAM to a different signal depending on the address:

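always @(*)
begin
    // Combinational block, so use blocking assignments; the targets must be declared as reg.
    // Any dataoutN not assigned for a given address holds its value, which infers a latch.
    case (address)
        0: dataout1 = dataout_from_RAM;
        1: dataout2 = dataout_from_RAM;
        2: dataout3 = dataout_from_RAM;
        ...
    endcase
end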

Can anyone think of other options? Using a case statement to grab the data doesn't seem that efficient to me.

Thanks for any assistance in advance!!

Best Answer

I figured this out. I didn't really need block RAM at all for my purposes.

For those interested, the purpose of this exercise is to send data from a text file to an FPGA over a serial link. Every time the serial interface signals that a new word has arrived, the FPGA logic should accept it and write it to the next location. Basically, this amounts to a register file:

reg [15:0] RAM [0:63];  // 64 x 16-bit (128 byte) RAM
reg [5:0] addr;         // 6-bit addressing to 64 elements
always @(posedge clk)
begin
    if (write_data_flag == 1'b1)
    begin
        RAM[addr] <= data_from_USB;
        addr <= addr + 1'b1;
    end
end

When synthesized, this results in distributed RAM (implemented in LUTs), which uses resources that would otherwise be available for other logic. Since my RAM array is rather small, I think this is OK.
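
To get back to the original goal of one wide bus, an array like this can be flattened with a generate loop. The following is only a sketch under stated assumptions: the module name ram_flatten and the output name concatenated_ram are illustrative, and the bus is sized at 64 x 16 = 1024 bits to match the array above. Keep in mind that reading every word in parallel generally prevents the tools from mapping the array onto RAM primitives, so it is likely to be built from flip-flops instead.

```verilog
// Register file with every word exposed on one flat bus.
// ram_flatten and concatenated_ram are hypothetical names for illustration.
module ram_flatten (
    input  wire          clk,
    input  wire          write_data_flag,
    input  wire [15:0]   data_from_USB,
    output wire [1023:0] concatenated_ram   // 64 words x 16 bits
);
    reg [15:0] RAM [0:63];
    reg [5:0]  addr;
    initial addr = 6'b0;

    // Store each incoming word at the next address
    always @(posedge clk) begin
        if (write_data_flag) begin
            RAM[addr] <= data_from_USB;
            addr      <= addr + 1'b1;
        end
    end

    // Same ordering as {RAM[0], RAM[1], ..., RAM[63]}:
    // RAM[0] occupies the top 16 bits, RAM[63] the bottom 16 bits.
    genvar i;
    generate
        for (i = 0; i < 64; i = i + 1) begin : flatten
            assign concatenated_ram[1023 - 16*i -: 16] = RAM[i];
        end
    endgenerate
endmodule
```

The indexed part-select (-: 16) picks a 16-bit slice counting down from the given bit, which avoids writing out all 64 terms of the concatenation by hand.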