Electronic – Inferring BRAM with unused addresses efficiently

fpga ram vhdl xilinx

What is a correct way to infer a RAM with some unused higher addresses (using block RAMs)?
Using the code below (default values for the generics, Xilinx synth and map), I get a RAM sized the same as if the depth had been set to 2**ADDRWIDTH:

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all; -- provides conv_integer for std_logic_vector

entity foo is
    generic (
        DATAWIDTH : positive := 8;
        DATADEPTH : positive := 5000;
        ADDRWIDTH : positive := 13
    );
    port (
        clk_a  : in std_logic;
        we_a   : in std_logic;
        addr_a : in std_logic_vector(ADDRWIDTH-1 downto 0);
        di_a   : in std_logic_vector(DATAWIDTH-1 downto 0);
        do_a   : out std_logic_vector(DATAWIDTH-1 downto 0)
    );
end foo;

architecture bar of foo is
    type myram_type is array (DATADEPTH-1 downto 0) of std_logic_vector(DATAWIDTH-1 downto 0); --! type for ram content
    shared variable myram : myram_type; --! ram
begin
    process (clk_a)
    begin
        if rising_edge(clk_a) then
            if we_a = '1' then
                myram(conv_integer(addr_a)) := di_a;
            end if;
            do_a <= myram(conv_integer(addr_a));
        end if;
    end process;
end bar;

For example, I want a RAM with DATAWIDTH = 8 and DATADEPTH = 5000, so the address has to be ADDRWIDTH = 13 bits wide, because ADDRWIDTH = 12 would only allow addressing 4096 RAM locations. Let's assume one block RAM resource on my FPGA can hold 8192 bits.
If I hand-coded this, I would need 5000*8/8192 rounded upwards = 5 block RAM resources.
However, with the code above, Xilinx synthesis and map result in 8 block RAM resources being used, because that's what can be addressed by a 13-bit-wide address…
Nonetheless, this is not really an efficient use of resources, since 3 of the 8 block RAMs will never be used.
I tried to check whether the input address is larger than DATADEPTH and then assign don't cares to the data (see the sketch below), but that results in the whole RAM being implemented as distributed RAM / LUTRAM.
Am I missing something important or do I have to use one big ugly generate for this?
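Roughly, the range check mentioned above looks like this (a sketch replacing the process in the architecture, not necessarily the exact code tried):

    -- sketch of the range check described above: writes and reads are
    -- gated by a compare against DATADEPTH, out-of-range reads return don't cares
    process (clk_a)
    begin
        if rising_edge(clk_a) then
            if conv_integer(addr_a) < DATADEPTH then
                if we_a = '1' then
                    myram(conv_integer(addr_a)) := di_a;
                end if;
                do_a <= myram(conv_integer(addr_a));
            else
                do_a <= (others => '-');
            end if;
        end if;
    end process;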

Best Answer

Actually, using 8 BRAMs in an 8K×1 configuration, rather than 5 BRAMs in a 1K×8 configuration, is more efficient in several important ways.

With the 8 BRAMs, you can simply connect all of the address and control lines to all of the BRAMs, and one bit from the data input and data output buses to each of the BRAMs. No other logic is required at all.

On the other hand, with the 5-BRAM configuration, you'll need extra logic to decode the upper 3 address bits and enable one BRAM at a time, and you'll also need a 5:1 multiplexer on the data output bus to select the data from the correct BRAM when reading. This uses extra resources within the FPGA, and it also adversely affects the timing, reducing the maximum clock frequency you can use.

If you really need to use the BRAM capacity as efficiently as possible, and you don't care about the timing and resource issues, then you'll have to explicitly code your memory as a module that uses five 1K×8 memories internally.
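A sketch of what such a module might look like, reusing the entity from the question as the per-bank memory. The names foo_banked, NUMBANKS and BANKADDR are made up here, and whether each 1K×8 bank actually ends up in a block RAM (rather than LUT RAM) may still need a ram_style constraint, depending on the tool version:

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    entity foo_banked is
        generic (
            DATAWIDTH : positive := 8;
            ADDRWIDTH : positive := 13;
            NUMBANKS  : positive := 5;  -- assumed: ceil(5000/1024) banks
            BANKADDR  : positive := 10  -- assumed: 2**10 = 1024 words per bank
        );
        port (
            clk_a  : in  std_logic;
            we_a   : in  std_logic;
            addr_a : in  std_logic_vector(ADDRWIDTH-1 downto 0);
            di_a   : in  std_logic_vector(DATAWIDTH-1 downto 0);
            do_a   : out std_logic_vector(DATAWIDTH-1 downto 0)
        );
    end foo_banked;

    architecture banked of foo_banked is
        type do_array is array (0 to NUMBANKS-1) of std_logic_vector(DATAWIDTH-1 downto 0);
        signal bank_do  : do_array;                              --! read data of each bank
        signal bank_we  : std_logic_vector(NUMBANKS-1 downto 0); --! one write enable per bank
        signal bank_sel : integer range 0 to NUMBANKS-1 := 0;    --! bank addressed in the previous cycle
    begin
        gen_banks : for i in 0 to NUMBANKS-1 generate
            -- decode the upper address bits so only one bank is written at a time
            bank_we(i) <= we_a when conv_integer(addr_a(ADDRWIDTH-1 downto BANKADDR)) = i
                          else '0';

            -- each bank is a 1K x 8 instance of the entity from the question
            bank : entity work.foo
                generic map (
                    DATAWIDTH => DATAWIDTH,
                    DATADEPTH => 2**BANKADDR,
                    ADDRWIDTH => BANKADDR
                )
                port map (
                    clk_a  => clk_a,
                    we_a   => bank_we(i),
                    addr_a => addr_a(BANKADDR-1 downto 0),
                    di_a   => di_a,
                    do_a   => bank_do(i)
                );
        end generate;

        -- remember which bank was addressed, so the output mux lines up with
        -- the one-cycle read latency of the banks; out-of-range addresses
        -- simply keep the previous selection (don't-care region)
        process (clk_a)
        begin
            if rising_edge(clk_a) then
                if conv_integer(addr_a(ADDRWIDTH-1 downto BANKADDR)) <= NUMBANKS-1 then
                    bank_sel <= conv_integer(addr_a(ADDRWIDTH-1 downto BANKADDR));
                end if;
            end if;
        end process;

        -- the 5:1 read multiplexer mentioned above
        do_a <= bank_do(bank_sel);
    end banked;

The address decode on bank_we and the registered bank_sel plus output multiplexer are exactly the extra logic described above; that is the resource and timing cost you trade for the three saved BRAMs.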