Electronic – Xilinx XST won’t infer block ram

spartan-6, synthesis, vhdl, xilinx

I'm having trouble getting my FPGA 1980s-computer design to fit on a Papilio Duo board, which carries a Spartan-6 (XC6SLX9). The problem is that RAM is being inferred as distributed RAM instead of block RAM.

Short version: I'm using a generic entity to infer the RAM blocks (see below). For any address width up to 11, XST makes the RAM distributed; for an address width of 12 or more, XST is happy to put it into block RAM. I've tried attributes to mark it as block RAM, but that doesn't seem to work.

Current solution: widen the address width of one instance and tie the high address bit to zero… now the design fits.

Long version:

The design requires three dual-port 2048x8-bit RAM modules. One port needs read/write access (the CPU); the other is read-only (the video controller). The two ports run in different clock domains that are asynchronous to each other.

Originally I used this module, RamDualPort:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity RamDualPort is
    generic
    (
        ADDR_WIDTH : integer;
        DATA_WIDTH : integer := 8
    );
    port
    (
        -- Port A
        clock_a : in std_logic;
        clken_a : in std_logic;
        addr_a : in std_logic_vector(ADDR_WIDTH-1 downto 0);
        din_a : in std_logic_vector(DATA_WIDTH-1 downto 0);
        dout_a : out std_logic_vector(DATA_WIDTH-1 downto 0);
        wr_a : in std_logic;

        -- Port B
        clock_b : in std_logic;
        addr_b : in std_logic_vector(ADDR_WIDTH-1 downto 0);
        dout_b : out std_logic_vector(DATA_WIDTH-1 downto 0)
    );
end RamDualPort;

architecture behavior of RamDualPort is 
    constant MEM_DEPTH : integer := 2**ADDR_WIDTH;
    type mem_type is array(0 to MEM_DEPTH-1) of std_logic_vector(DATA_WIDTH-1 downto 0);
    shared variable ram : mem_type;
begin

    process (clock_a)
    begin
        if rising_edge(clock_a) then

            if clken_a='1' then
                if wr_a = '1' then
                    ram(to_integer(unsigned(addr_a))) := din_a;
                end if;

                dout_a <= ram(to_integer(unsigned(addr_a)));
            end if;

        end if;
    end process;

    process (clock_b)
    begin
        if rising_edge(clock_b) then

            dout_b <= ram(to_integer(unsigned(addr_b)));

        end if;
    end process;

end;

There are a couple of problems with this: 1) depending on the address width, some instances are inferred as distributed RAM (the main problem I'm asking about), and 2) those that were inferred as block RAM were implemented as read-first, which has issues with asynchronous clocks on the Spartan-6.

The only way I could find to fix the read-first issue was to make both ports read/write, with a new module, RamTrueDualPort, as follows:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity RamTrueDualPort is
    generic
    (
        ADDR_WIDTH : integer;
        DATA_WIDTH : integer := 8
    );
    port
    (
        -- Port A
        clock_a : in std_logic;
        clken_a : in std_logic;
        addr_a : in std_logic_vector(ADDR_WIDTH-1 downto 0);
        din_a : in std_logic_vector(DATA_WIDTH-1 downto 0);
        dout_a : out std_logic_vector(DATA_WIDTH-1 downto 0);
        wr_a : in std_logic;

        -- Port B
        clock_b : in std_logic;
        clken_b : in std_logic;
        addr_b : in std_logic_vector(ADDR_WIDTH-1 downto 0);
        din_b : in std_logic_vector(DATA_WIDTH-1 downto 0);
        dout_b : out std_logic_vector(DATA_WIDTH-1 downto 0);
        wr_b : in std_logic
    );
end RamTrueDualPort;

architecture behavior of RamTrueDualPort is 
    constant MEM_DEPTH : integer := 2**ADDR_WIDTH;
    type mem_type is array(0 to MEM_DEPTH-1) of std_logic_vector(DATA_WIDTH-1 downto 0);
    shared variable ram : mem_type;
begin

    process (clock_a)
    begin
        if rising_edge(clock_a) then

            if clken_a='1' then

                if wr_a = '1' then
                    ram(to_integer(unsigned(addr_a))) := din_a;
                end if;

                dout_a <= ram(to_integer(unsigned(addr_a)));

            end if;

        end if;
    end process;

    process (clock_b)
    begin
        if rising_edge(clock_b) then

            if clken_b='1' then

                if wr_b = '1' then
                    ram(to_integer(unsigned(addr_b))) := din_b;
                end if;

                dout_b <= ram(to_integer(unsigned(addr_b)));

            end if;

        end if;
    end process;

end;

That fixed the read-first issue: the RAMs that go to block RAM are now implemented as write-first. (NB: I don't actually care about read-first versus write-first, except for the Spartan-6 issue where read-first mode can corrupt RAM contents.)

Now the problem is getting the smaller 2K (ADDR_WIDTH = 11) instances into block RAM. As mentioned, I've tried attributes, but XST still insists on putting them in distributed RAM. I couldn't find any documentation on ram_style for variables (as opposed to signals), so I guessed this (note the ram:variable part):

constant MEM_DEPTH : integer := 2**ADDR_WIDTH;
type mem_type is array(0 to MEM_DEPTH-1) of std_logic_vector(DATA_WIDTH-1 downto 0);
shared variable ram : mem_type;
ATTRIBUTE ram_extract: string;
ATTRIBUTE ram_extract OF ram:variable is "yes";
ATTRIBUTE ram_style: string;
ATTRIBUTE ram_style OF ram:variable is "block";

XST now prints the following, which suggests the attribute syntax is understood (note the mentions of ram_extract and ram_style):

Synthesizing Unit <RamTrueDualPort_1>.
    Related source file is "C:/Documents and Settings/Brad/Projects/fpgabee/Hardware/FPGABeeCore/RamTrueDualPort.vhd".
        ADDR_WIDTH = 12
        DATA_WIDTH = 8
    Set property "ram_extract = yes" for signal <ram>.
    Set property "ram_style = block" for signal <ram>.
    Found 4096x8-bit dual-port RAM <Mram_ram> for signal <ram>.
    Found 8-bit register for signal <dout_b>.
    Found 8-bit register for signal <dout_a>.
    Summary:
    inferred   1 RAM(s).
    inferred  16 D-type flip-flop(s).
    inferred   1 Multiplexer(s).
Unit <RamTrueDualPort_1> synthesized.

Synthesizing Unit <RamTrueDualPort_2>.
    Related source file is "C:/Documents and Settings/Brad/Projects/fpgabee/Hardware/FPGABeeCore/RamTrueDualPort.vhd".
        ADDR_WIDTH = 11
        DATA_WIDTH = 8
    Set property "ram_extract = yes" for signal <ram>.
    Set property "ram_style = block" for signal <ram>.
    Found 2048x8-bit dual-port RAM <Mram_ram> for signal <ram>.
    Found 8-bit register for signal <dout_b>.
    Found 8-bit register for signal <dout_a>.
    Summary:
    inferred   1 RAM(s).
    inferred  16 D-type flip-flop(s).
    inferred   2 Multiplexer(s).
Unit <RamTrueDualPort_2> synthesized.

However, the 2K RAMs still end up distributed:

2048x8-bit dual-port distributed RAM                  : 2
4096x8-bit dual-port block RAM                        : 1

If I take out the redundant address line (i.e. put that instance back to ADDR_WIDTH = 11), all three instances end up distributed and the design no longer fits:

2048x8-bit dual-port distributed RAM                  : 3
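
For reference, the widened-instance workaround described above might look roughly like this. It is only a sketch: the wrapper name VideoRam2k and its port names are made up for illustration, and it assumes the RamTrueDualPort entity from earlier is compiled into the work library. The outside world still sees 2048x8, while the RAM itself is declared as 4096x8 with the top address bit tied low.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper: a 2048x8 interface on top of a 4096x8
-- RamTrueDualPort, with the extra (top) address bit held at '0'.
entity VideoRam2k is
    port
    (
        -- CPU side (read/write)
        cpu_clock   : in  std_logic;
        cpu_clken   : in  std_logic;
        cpu_addr    : in  std_logic_vector(10 downto 0);
        cpu_din     : in  std_logic_vector(7 downto 0);
        cpu_dout    : out std_logic_vector(7 downto 0);
        cpu_wr      : in  std_logic;

        -- Video side (read only)
        video_clock : in  std_logic;
        video_addr  : in  std_logic_vector(10 downto 0);
        video_dout  : out std_logic_vector(7 downto 0)
    );
end VideoRam2k;

architecture wrapper of VideoRam2k is
    constant ZERO_DATA : std_logic_vector(7 downto 0) := (others => '0');
    signal addr_a_wide : std_logic_vector(11 downto 0);
    signal addr_b_wide : std_logic_vector(11 downto 0);
begin

    -- Zero the redundant high address bit on both ports
    addr_a_wide <= '0' & cpu_addr;
    addr_b_wide <= '0' & video_addr;

    ram_inst : entity work.RamTrueDualPort
        generic map
        (
            ADDR_WIDTH => 12,   -- one bit wider than actually needed
            DATA_WIDTH => 8
        )
        port map
        (
            clock_a => cpu_clock,
            clken_a => cpu_clken,
            addr_a  => addr_a_wide,
            din_a   => cpu_din,
            dout_a  => cpu_dout,
            wr_a    => cpu_wr,

            -- Port B never writes, so its write inputs are tied off
            clock_b => video_clock,
            clken_b => '1',
            addr_b  => addr_b_wide,
            din_b   => ZERO_DATA,
            dout_b  => video_dout,
            wr_b    => '0'
        );

end wrapper;
```

The obvious cost is that half of the declared 4096x8 memory is never addressed.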

What to do? I really don't want to switch back to coregen for this.

PS: I'm an amateur at this – be gentle!

Best Answer

If you know precisely what you want to end up with, there's no need to have XST try to infer it from a behavioral model.

You can instantiate a block RAM object directly in HDL code. Details on the appropriate syntax, and the options involved, can be found in Xilinx UG615: Spartan-6 Libraries Guide for HDL Designs, around page 274 ("RAMB16BWER"). You can also use the BRAM_TDP_MACRO macro, which is explained on page 20 of the same document.

You'll need to be familiar with how the Spartan-6 block RAM element works. Information on this is available in Xilinx UG383: Spartan-6 FPGA Block RAM Resources.

(Note that the byte-oriented block RAM configurations are 9, 18, or 36 bits wide; for a 2048x8 memory you will probably want to use it in 9-bit mode, which is 2048 entries deep in an 18Kb block, and just ignore the extra bit. It's there for designs which need parity bits.)
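
As a rough sketch of the macro route, a drop-in replacement for the question's 2048x8 RAM could look something like the following. Treat it as a starting point rather than tested code: the generic and port names are written as I understand them from UG615 and should be checked against the guide, and it assumes that a data width of 8 in an 18Kb block gives an 11-bit address and a 1-bit write enable. The entity name BlockRam2k is made up; its ports mirror the question's RamDualPort.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library unisim;
use unisim.vcomponents.all;

library unimacro;
use unimacro.vcomponents.all;

-- Hypothetical 2048x8 dual-port RAM built on BRAM_TDP_MACRO.
-- Port A is read/write, port B is read only.
entity BlockRam2k is
    port
    (
        clock_a : in  std_logic;
        clken_a : in  std_logic;
        addr_a  : in  std_logic_vector(10 downto 0);
        din_a   : in  std_logic_vector(7 downto 0);
        dout_a  : out std_logic_vector(7 downto 0);
        wr_a    : in  std_logic;

        clock_b : in  std_logic;
        addr_b  : in  std_logic_vector(10 downto 0);
        dout_b  : out std_logic_vector(7 downto 0)
    );
end BlockRam2k;

architecture macro of BlockRam2k is
    constant ZERO_DATA : std_logic_vector(7 downto 0) := (others => '0');
    constant NO_WRITE  : std_logic_vector(0 downto 0) := "0";
    signal we_a        : std_logic_vector(0 downto 0);
begin

    -- The macro's write enable is a vector (assumed 1 bit wide at this width)
    we_a(0) <= wr_a;

    bram_inst : BRAM_TDP_MACRO
        generic map
        (
            BRAM_SIZE     => "18Kb",
            DEVICE        => "SPARTAN6",
            READ_WIDTH_A  => 8,
            WRITE_WIDTH_A => 8,
            READ_WIDTH_B  => 8,
            WRITE_WIDTH_B => 8,
            -- write-first, to stay clear of the read-first issue
            -- with asynchronous clocks mentioned in the question
            WRITE_MODE_A  => "WRITE_FIRST",
            WRITE_MODE_B  => "WRITE_FIRST"
        )
        port map
        (
            DOA    => dout_a,
            DOB    => dout_b,
            ADDRA  => addr_a,
            ADDRB  => addr_b,
            CLKA   => clock_a,
            CLKB   => clock_b,
            DIA    => din_a,
            DIB    => ZERO_DATA,   -- port B never writes
            ENA    => clken_a,
            ENB    => '1',
            REGCEA => '1',         -- output registers left at their defaults
            REGCEB => '1',
            RSTA   => '0',
            RSTB   => '0',
            WEA    => we_a,
            WEB    => NO_WRITE
        );

end macro;
```

Because the primitive is named explicitly, XST has no say in whether the memory goes to block or distributed RAM, which is the whole point of instantiating it directly.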
