Electronic – Different ways of using DSP slices in Spartan 6 FPGA

Tags: dsp, fpga, hdl, xilinx

I am reading the Spartan 6 DSP slice user guide, and I need to use the DSP slice in a project of mine.

I stumbled upon this question, which basically suggests 3 ways of using the DSP slices

1. Inferring the DSP slice
2. Using Core generator
3. Using the RAW DSP instantiating template

The second option is almost self-explanatory, but I would rather not use it, since it feels somewhat superficial. I am interested in the first and third options, since they are the most customisable, but I am having a hard time understanding where to start.

These are the questions I have in mind:

1. What are the different ways in which you can infer a DSP slice in your design?
2. Where do I find the RAW DSP instantiating template? I have been googling for it, but I didn't find a definitive guide to it.
3. Building on my second question: there are a lot of Xilinx documents one can read on all sorts of topics. Since I am a noob, I always get confused about what to read, and I like to have a mind map of what is out there and what options I have. Is there a place where all the Xilinx documents are listed with their descriptions, or a list of which documents to refer to for a particular application?

Best Answer

Inferring DSP slices is actually pretty straightforward. The Spartan 6 has DSP48A1 DSP slices, so take a look at Xilinx UG389. Page 15 has a block diagram of the DSP slice. XST is quite good about inferring DSP slices. Just make sure to get all of the pipeline registers in there for maximum performance, and make sure all of your bit widths are no wider than those shown on the block diagram. Here is a simple multiplier with AXI stream interfaces that infers a DSP slice on a Spartan 6: https://github.com/alexforencich/verilog-dsp/blob/master/rtl/dsp_mult.v .

Also take a look at the XST user guide, UG627, pages 98-121. One rather annoying thing to note: the pipelined multipliers in that section will not synthesize to fully pipelined DSP48 slices. They will probably infer slices, but you will pay a performance penalty because the registers will not necessarily end up in the correct locations. For example, the coding examples and block diagram on pages 104-108 all show a multiplier with one pipeline register before and three after. When I first looked at that, I assumed that XST would be smart enough to move the registers to match the actual DSP slice (it is possible to move registers "through" the multiplier without changing the operation). It isn't.

You should add registers (with synchronous resets only!) exactly as shown in the DSP slice user guide so that XST infers a DSP slice with the pipeline registers in the right places for maximum performance. Note that these registers are implemented inside the DSP slice: adding all of the pipeline registers shown in the user guide costs only latency, not fabric flip-flops. I would recommend printing out the DSP slice block diagram and tacking it up on the wall as a reference. And don't forget to check the synthesis logs to make sure the DSP slices are absorbing the pipeline registers correctly.
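As a rough illustration of the register placement described above, here is a minimal VHDL sketch of a signed 18x18 multiplier with two input register stages, a multiplier output register, and an output register. The entity name dsp_mult_sketch and all signal names are hypothetical, and the stage layout (A0/A1 and B0/B1 input registers, M register, P register) is my reading of the DSP48A1 block diagram, so check it against UG389 before relying on it. No resets are used; if you add one, keep it synchronous. Whether XST actually packs every register into the slice is not guaranteed, so confirm it in the synthesis report.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical example entity: fully pipelined signed 18x18 multiplier.
entity dsp_mult_sketch is
  port(
    clk : in  std_logic;
    a   : in  signed(17 downto 0);  -- 18 bits: assumed DSP48A1 multiplier input width
    b   : in  signed(17 downto 0);
    p   : out signed(35 downto 0)   -- full 36-bit product
  );
end dsp_mult_sketch;

architecture rtl of dsp_mult_sketch is
  -- Two input register stages per operand (assumed to map to A0/A1 and B0/B1).
  signal a0_reg, a1_reg : signed(17 downto 0) := (others => '0');
  signal b0_reg, b1_reg : signed(17 downto 0) := (others => '0');
  -- Multiplier output register (assumed to map to the M register).
  signal m_reg          : signed(35 downto 0) := (others => '0');
  -- Final output register (assumed to map to the P register).
  signal p_reg          : signed(35 downto 0) := (others => '0');
begin
  process(clk) is
  begin
    if rising_edge(clk) then
      -- Input pipeline
      a0_reg <= a;
      a1_reg <= a0_reg;
      b0_reg <= b;
      b1_reg <= b0_reg;
      -- Multiply, then register the product twice
      m_reg  <= a1_reg * b1_reg;
      p_reg  <= m_reg;
    end if;
  end process;

  p <= p_reg;
end rtl;
```

With this layout the product appears on p four clock cycles after the operands are presented on a and b.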

As for a listing of documentation, there isn't one good place for everything (FPGAs, IP cores, software, etc.). For just the features of a single FPGA, take a look at the product page, for example http://www.xilinx.com/products/silicon-devices/fpga/spartan-6.html#documentation . Make sure to select 'user guides', not 'datasheets'. That should give you a pretty comprehensive list of the Spartan 6 documentation.

Related Solution: Inferring Dual-Port Block RAM

Problem with Design #1

I have noticed that you must describe the two ports in two separate processes for XST to infer dual-port RAM; if you don't, you won't get the two ports. Separate processes are also how Xilinx suggests inferring dual-port RAM in the XST User Guide. Hence your Design #1 will only infer single-port RAM.

You can see my general VHDL for inferring dual-port RAM with XST at the bottom of this post. (Details: http://www.fpga-dev.com/infering-dual-port-blockram-with-xst/)

Problem with Design #2

In your Design #2, you register the address twice, probably unintentionally. Signal assignments made with <= do not take effect immediately; the new value is scheduled and only becomes visible once the process suspends (here, at the end of the process). This code is equivalent to yours, only with simpler signal names:

-- sequential context (A, B, C are signals):
if rising_edge(clk) then
  B <= A;
  C <= B;
end if;

Here C <= B; does not assign to C the value that was assigned to B on the previous line, since that assignment only takes effect at the end of the process; C receives the old value of B instead. If the signals are single bits and the stimulus is a pulse on A, the code above produces:

clk _|"|_|"|_|"|_|"|_|"|_|"|
A   ______|"""|_____________
B   __________|"""|_________
C   ______________|"""|_____

Declaring B a variable instead and assigning with := will assign immediately:

-- sequential context (A, C are signals; B is variable):
if rising_edge(clk) then
  B := A;
  C <= B;
end if;

yielding

clk _|"|_|"|_|"|_|"|_|"|_|"|
A   ______|"""|_____________
B   __________|"""|_________
C   __________|"""|_________
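To try both variants side by side in a simulator, the two excerpts can be wrapped in one small entity. This is only a sketch: the entity name delay_demo and the port names C_sig and C_var are made up for illustration and do not come from the original designs.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- Hypothetical demo entity comparing signal and variable assignment.
entity delay_demo is
  port(
    clk   : in  std_logic;
    A     : in  std_logic;
    C_sig : out std_logic;  -- follows A two clock edges later
    C_var : out std_logic   -- follows A one clock edge later
  );
end delay_demo;

architecture rtl of delay_demo is
  signal B_sig : std_logic := '0';
begin
  -- B is a signal: C_sig reads the OLD value of B_sig, giving two registers.
  two_stage: process(clk) is
  begin
    if rising_edge(clk) then
      B_sig <= A;
      C_sig <= B_sig;
    end if;
  end process;

  -- B is a variable: it is updated immediately, so C_var gets the NEW value
  -- and only one register results.
  one_stage: process(clk) is
    variable B_var : std_logic := '0';
  begin
    if rising_edge(clk) then
      B_var := A;
      C_var <= B_var;
    end if;
  end process;
end rtl;
```

Driving A with a single-cycle pulse should reproduce the two waveforms above, with C_sig lagging C_var by one clock period.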

Inferring dual-port block RAM with XST

(More details on this at http://www.fpga-dev.com/infering-dual-port-blockram-with-xst/.)

Below is my parameterized module for generic dual-port RAM. XST successfully infers dual-port RAM from it, as desired.

(Remove the write-enable signals and the write logic to get ROM instead of RAM.)

Set the word width with the width generic and the depth with the highAddr generic (the highest address, i.e. one less than the desired depth).

library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity genRAM is
  generic(
    width     : integer;
    highAddr  : integer -- highest address (= size-1)
  );
  port(
    -- Two sets of ports (A and B), each set having ports Address, Data in,
    -- Data out and Write enable:
    Aaddr     : in  integer range 0 to highAddr        := 0;
    ADI       : in  std_logic_vector(width-1 downto 0) := (others => '0');
    ADO       : out std_logic_vector(width-1 downto 0) := (others => '0');
    AWE       : in  std_logic                          := '0';
    Baddr     : in  integer range 0 to highAddr        := 0;
    BDI       : in  std_logic_vector(width-1 downto 0) := (others => '0');
    BDO       : out std_logic_vector(width-1 downto 0) := (others => '0');
    BWE       : in  std_logic                          := '0';
    clk       : in  std_logic
  );
end genRAM;

architecture arch of genRAM is
  subtype TmemWord is bit_vector(width-1 downto 0);
  type    Tmem     is array(0 to highAddr) of TmemWord;
  shared variable memory: Tmem;

begin
  -- Port A (read-first: output is sampled before the write)
  process(clk) is
  begin
    if (rising_edge(clk)) then
      ADO <= To_StdLogicVector(memory(Aaddr));
      if (AWE = '1') then
        memory(Aaddr) := To_bitvector(std_logic_vector(ADI));
      end if;
    end if;
  end process;

  process(clk) is
  begin
    if (rising_edge(clk)) then    
      BDO <= To_StdLogicVector(memory(Baddr));
      if (BWE = '1') then
        memory(Baddr) := To_bitvector(std_logic_vector(BDI));
      end if;
    end if;
  end process;
end arch;

The code above implements read-first behavior. That means that if address 0x00 contains 0xcafe and you write 0xbabe to 0x00, the cycle after the write will display 0xcafe on the data-out port ("data is read to output port before being written to memory").

If you desire write-first behaviour, swap the order of the read and the write in both processes. For port A it would look like this:

-- excerpt for write-first behaviour:
if (AWE = '1') then
  memory(Aaddr) := To_bitvector(std_logic_vector(ADI));
end if;
ADO <= To_StdLogicVector(memory(Aaddr));

In the above case, data-out would display 0xbabe one cycle after the write ("data is written to memory before reading memory contents to output port").
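For completeness, here is one way the genRAM entity above might be instantiated, as a sketch only. It assumes genRAM has been compiled into the work library; the wrapper entity ram_example, its port names, and the 1024 x 16 size are all hypothetical choices. Port A is used for writing and port B for reading, with the unused inputs left at the defaults declared in genRAM. I have not checked how XST maps this particular wrapper, so look at the synthesis report.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- Hypothetical wrapper: 1024 x 16 RAM, port A write-only, port B read-only.
entity ram_example is
  port(
    clk     : in  std_logic;
    wr_en   : in  std_logic;
    wr_addr : in  integer range 0 to 1023;
    wr_data : in  std_logic_vector(15 downto 0);
    rd_addr : in  integer range 0 to 1023;
    rd_data : out std_logic_vector(15 downto 0)
  );
end ram_example;

architecture rtl of ram_example is
begin
  ram: entity work.genRAM
    generic map(
      width    => 16,
      highAddr => 1023    -- depth 1024, so highest address is 1023
    )
    port map(
      clk   => clk,
      -- Port A: write side; its data output is not used
      Aaddr => wr_addr,
      ADI   => wr_data,
      ADO   => open,
      AWE   => wr_en,
      -- Port B: read side; BDI and BWE are left unconnected and
      -- take the default values declared in genRAM ('0')
      Baddr => rd_addr,
      BDO   => rd_data
    );
end rtl;
```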
