How to make Lattice Synplify Pro infer RAM correctly from VHDL code


I have a design on an iCE40 FPGA. I use iCEcube2 to compile the VHDL code, and in my design I try to infer two small RAM buffers.

The type of the buffers is as follows:

type MESSAGE_T is array(0 to 31) of std_logic_vector(7 downto 0);

They are then accessed in two separate processes: one process writes to buffer A and reads from buffer B, and the other process does the opposite. The idea is that one process receives messages while the other sends replies. A separate mechanism normally prevents reading from and writing to the same buffer at the same time.

Now on to the actual issue: Synplify Pro (part of the tool suite shipped with iCEcube2) correctly infers RAM for the buffers:

 @N: CL134 :"D:\[...].vhd":189:8:189:23|Found RAM received_message, depth=32, width=8
 @N: CL134 :"D:\[...].vhd":189:8:189:23|Found RAM received_message, depth=32, width=8
 @N: CL134 :"D:\[...].vhd":189:8:189:23|Found RAM received_message, depth=32, width=8
 @N: CL134 :"D:\[...].vhd":196:8:196:20|Found RAM reply_message, depth=32, width=8

That's great! However, right after that it seems to remove some redundancy. I get a ton of messages such as:

  @W: CL169 :"C:\lscc\iCEcube2.2017.08\synpbase\lib\vhd\std.vhd":1:1:1:2|Pruning unused register received_message_31(7 downto 0). Make sure that there are no unused intermediate registers.

Finally, it gives up on using the hardware block RAMs and uses registers to emulate them instead.

 @W: FX703 :"d:\[...].vhd":196:8:196:20|Unable to map RAM instance reply_message[7:0] to RAM for technology specified. 
 @W: FX703 :"d:\[...].vhd":189:8:189:23|Unable to map RAM instance received_message_1[7:0] to RAM for technology specified. 
 @W: FX703 :"d:\[...].vhd":189:8:189:23|Unable to map RAM instance received_message[7:0] to RAM for technology specified. 
 @N: MF135 :"d:\[...].vhd":196:8:196:20|RAM reply_message[7:0] (in view: work.test_bitbus(behavioral)) is 32 words by 8 bits.
 @N: MF135 :"d:\[...].vhd":189:8:189:23|RAM received_message_1[7:0] (in view: work.test_bitbus(behavioral)) is 32 words by 8 bits.
 @N: MF135 :"d:\[...].vhd":189:8:189:23|RAM received_message[7:0] (in view: work.test_bitbus(behavioral)) is 32 words by 8 bits.
 @N: MF794 |RAM received_message[7:0] required 768 registers during mapping 

The problem is that it doesn't tell me at all WHY it's unable to map the RAM instances.
This is not catastrophic per se, as the design still works, but it is very wasteful of FPGA resources and makes routing long and difficult.

EDIT: As for how the buffers themselves are accessed, the code is quite long and complex, so it would be pointless to post all of it here.
I tried basically two methods.

The first method uses plain synchronous processes:

process(reset, clk)
begin
  if reset = '1' then
    ....blah blah blah...
  elsif rising_edge(clk) then
    case ... is
      when XXX =>
        received_message(aaa) <= bbb;  -- Some mutually exclusive reads and writes to the RAMs
        ccc <= reply_message(ddd);
      when YYY =>
        if eee = 0 then
          received_message(fff) <= ggg;
          jjj <= reply_message(kkk);
          received_message(lll) <= mmm;
          nnn <= reply_message(ooo);
        end if;
      ...   -- A dozen other cases
    end case;
  end if;
end process;

The second method I tried (with only one of the two processes) was to split the process into a combinational part and a register-update part, in order to make the SRAM address and data lines explicit (even though this significantly complicates code that is already complex).

process(AAA, BBB, CCC, DDD, ...)   -- rest of the sensitivity list here
begin
  next_BBB <= BBB;               -- By default registers retain the same value
  next_DDD <= DDD;
  received_message_adr <= 0;     -- Dummy default value

  case ... is
    when XXX =>
      received_message_adr <= AAA;   -- Explicit address and data bus for RAM access
      next_BBB <= received_message_data;
    when YYY =>
      reply_message_adr <= CCC;
      next_DDD <= received_message_data;
    ...         -- A dozen other cases
  end case;
end process;

-- Explicit asynchronous SRAM read for the above process
process(received_message, received_message_adr)
begin
  received_message_data <= received_message(received_message_adr);
end process;

-- Explicit register update for the logic described in the above process
process(reset, clk) is
begin
  if reset = '1' then
    ...blah blah blah....
  elsif rising_edge(clk) then
    BBB <= next_BBB;
    DDD <= next_DDD;
    ... a sh*tload of similar statements involving "next" signals
  end if;
end process;

Unfortunately, despite the added complexity (and reduced readability) of the code, the result was still the same: unable to map the RAM instance. I did not try this for both buffers simultaneously, only for one of them, because one of the processes is considerably more complex and therefore harder to turn into a combinational process with "next" signals.

Best Answer

In general, if things keep not working, open the documentation and follow the synthesis style guide, which shows the exact VHDL/Verilog you need to write to infer your favorite block RAM.

That said, you should be able to write a simple behavioral model for easy cases such as this, as long as you follow the datasheet for the iCE40 family, which says clearly:

In all the sysMEM RAM modes, the input data and addresses for the ports are registered at the input of the memory array.

This can also be seen in the timing diagrams of the Memory Usage Guide for iCE40 Devices. Given a read address and read enable on the rising edge of your read clock, the data will be available in the output register a fixed time later (so no extra reclocking is required internally).

Behavioral models that describe an input register for the read address and read enable, and an output register for the read data, should be correctly inferred as a RAM (which might not meet timing because of the fixed time delay).

Your current read logic does neither.
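As a minimal sketch of what such a behavioral model can look like, here is the common synchronous-read RAM template applied to the 32 x 8 buffer type from the question. The entity name `message_ram` and its port names (`we`, `waddr`, `wdata`, `re`, `raddr`, `rdata`) are hypothetical, and this exact code has not been checked against Synplify Pro's iCE40 mapping, so compare it with the RAM inference examples in the tool documentation. It assumes a single clock and allows at most one write and one read per clock cycle, which matches the single write port and single read port of an iCE40 block RAM; a process that writes or reads the same buffer at two addresses in one cycle (as in the `YYY` branch of the first method) does not fit that model.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity and port names, for illustration only.
entity message_ram is
  port (
    clk   : in  std_logic;
    -- Write port: driven by the process that fills the buffer
    we    : in  std_logic;
    waddr : in  unsigned(4 downto 0);           -- 32 entries -> 5 address bits
    wdata : in  std_logic_vector(7 downto 0);
    -- Read port: driven by the process that drains the buffer
    re    : in  std_logic;
    raddr : in  unsigned(4 downto 0);
    rdata : out std_logic_vector(7 downto 0)
  );
end entity message_ram;

architecture behavioral of message_ram is
  type MESSAGE_T is array(0 to 31) of std_logic_vector(7 downto 0);
  -- The array is intentionally not reset: block RAM contents cannot be
  -- cleared by a reset signal, so resetting it would likely force the
  -- tool to fall back to registers.
  signal mem : MESSAGE_T;
begin
  process(clk)
  begin
    if rising_edge(clk) then
      -- At most one write per clock cycle
      if we = '1' then
        mem(to_integer(waddr)) <= wdata;
      end if;
      -- Synchronous read: rdata changes only on a rising edge while re = '1',
      -- so the data for raddr appears one clock cycle later.
      if re = '1' then
        rdata <= mem(to_integer(raddr));
      end if;
    end if;
  end process;
end architecture behavioral;
```

Each buffer would then be one instance of this entity, with the receiving process driving the write port and the sending process driving the read port (or the other way round for the reply buffer). Because the read is synchronous, `rdata` reflects `raddr` only after the next rising edge, so the state machines have to account for that one-cycle latency.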

Good luck.