Electronic – VHDL: Optimize signal comparisons for synthesis

Tags: asic, fpga, synthesis, vhdl, xilinx

As a preface: certain coding styles in VHDL/Verilog help the synthesis tools infer different hardware, and some of that hardware performs better than the rest. For example, an if-elsif ladder describes a priority chain of 2:1 muxes, whereas a case statement typically infers a single wide multiplexer. These coding styles make no difference in purely functional simulation, but they matter a great deal when the RTL is targeted at an ASIC or FPGA. For FPGAs, the CLB (configurable logic block) architecture determines how efficiently a given piece of RTL can be mapped. One example of this is the LUT input width.
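Below is a minimal illustration of the two styles. It uses a hypothetical entity `mux_styles` with a 2-bit select. In this example every `if` condition tests the same `sel`, so the conditions are mutually exclusive. Many synthesis tools will therefore reduce both processes to the same 4:1 mux. The priority chain mostly survives when the `if` branches test different signals that can be true at the same time.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: the same 4-way selection written in both styles.
entity mux_styles is
    port (
        sel          : in  std_logic_vector(1 downto 0);
        a, b, c, d   : in  std_logic;
        y_if, y_case : out std_logic
    );
end entity;

architecture rtl of mux_styles is
begin
    -- if/elsif: conditions are checked in order, which describes a priority chain of 2:1 muxes
    if_style: process (sel, a, b, c, d) begin
        if sel = "00" then
            y_if <= a;
        elsif sel = "01" then
            y_if <= b;
        elsif sel = "10" then
            y_if <= c;
        else
            y_if <= d;
        end if;
    end process;

    -- case: the choices are mutually exclusive, which describes one 4:1 mux
    case_style: process (sel, a, b, c, d) begin
        case sel is
            when "00"   => y_case <= a;
            when "01"   => y_case <= b;
            when "10"   => y_case <= c;
            when others => y_case <= d;
        end case;
    end process;
end architecture;
```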

Coming to the question: I have seen many cases where two n-bit signals have to be compared in VHDL, and I would like some advice on the hardware this infers. I will use the following code snippet to narrow the question down.

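```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;   -- unsigned and "+"

-- Entity wrapper so the snippet compiles; the name counter_compare is illustrative.
-- clk, count_b_en and trigger_en were plain signals in the original snippet.
entity counter_compare is
    port (
        clk        : in  std_logic;
        count_b_en : in  std_logic;
        trigger_en : out std_logic
    );
end entity;

architecture rtl of counter_compare is
    -- Initial values avoid 'U' in simulation. FPGA tools such as Xilinx's use them
    -- as power-up values; ASIC synthesis ignores them.
    signal counter_a : unsigned(31 downto 0) := (others => '0');
    signal counter_b : unsigned(31 downto 0) := (others => '0');
begin

    counter_a_gen: process (clk) begin
        if rising_edge(clk) then
            counter_a <= counter_a + 1; -- free running counter
        end if;
    end process;

    counter_b_gen: process (clk) begin
        if rising_edge(clk) then
            if count_b_en = '1' then
                counter_b <= counter_b + 1;
            else
                counter_b <= (others => '0');
            end if;
        end if;
    end process;

    -- compare the counters to generate some logic
    trigger_gen: process (clk) begin
        if rising_edge(clk) then
            if counter_a = counter_b then
                trigger_en <= '1';
            else
                trigger_en <= '0';
            end if;
        end if;
    end process;

end architecture;
```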

The snippet above has two 32-bit counters, counter_a and counter_b, which are compared in a clocked (sequential) process. Assuming 4-input LUTs in an FPGA, the comparison needs several levels of logic. Such a path could make timing hard to meet because of its large combinational delay. So my question is: how can this comparison be optimized, in this case for higher performance (a higher clock frequency)?
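Here is a rough count of the logic depth for the 4-input-LUT case in the question. It assumes a plain LUT tree with no carry-chain mapping, an n-bit equality test on k-input LUTs, and a_i, b_i as the bits of counter_a and counter_b. Each first-level LUT checks ⌊k/2⌋ bit pairs, and every later level ANDs k results:

```latex
\begin{aligned}
\text{level 1: } & \lceil 32/2 \rceil = 16 \text{ LUTs, each computing } (a_{2i} = b_{2i}) \wedge (a_{2i+1} = b_{2i+1}) \\
\text{level 2: } & \lceil 16/4 \rceil = 4 \text{ LUTs, each a 4-input AND} \\
\text{level 3: } & \lceil 4/4 \rceil = 1 \text{ LUT, the final AND} \\
\text{in general: } & L(n,k) = 1 + \left\lceil \log_k \left\lceil \frac{n}{\lfloor k/2 \rfloor} \right\rceil \right\rceil, \qquad L(32,4) = 1 + \lceil \log_4 16 \rceil = 3
\end{aligned}
```

With 6-input LUTs, the same count gives ⌈32/3⌉ = 11 LUTs, then 2, then 1, so the depth is still three levels. Each level adds a LUT delay plus routing.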

Best Answer

  • If your circuit can tolerate extra latency on the trigger_en signal, you can split the comparison (for example into four 8-bit comparators) and pipeline the result over several cycles.

  • You can run several comparators in parallel on future values, keeping counter_a+1, counter_b+1, counter_a+2, counter_a+3, ... Pipeline the result of each comparator (as above), then decide which comparison is valid from the value of count_b_en during the last 2 or 3 cycles. This costs a lot of hardware!
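Below is a sketch that combines both bullets for the question's specific counters. It assumes VHDL-93 and uses a hypothetical entity `counter_trigger` with hypothetical generics `WIDTH` and `SLICES`.

Stage 1 registers four narrow slice comparisons, which is the split from the first bullet. On its own, that would delay trigger_en by one cycle. Instead of many future copies of the counters, this sketch uses a one-cycle lookahead that only works for these counters:
- The next value of counter_a is always counter_a+1.
- The next value of counter_b is either counter_b+1 or 0.

So the next-cycle equality is either "a = b now" or "a is all ones now", and the registered count_b_en selects between them. The output then has the same timing as the question's unpipelined process.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: the question's two counters plus a two-stage trigger.
-- WIDTH must be a multiple of SLICES (default: four 8-bit slices).
entity counter_trigger is
    generic (
        WIDTH  : positive := 32;
        SLICES : positive := 4
    );
    port (
        clk        : in  std_logic;
        count_b_en : in  std_logic;
        trigger_en : out std_logic
    );
end entity;

architecture lookahead of counter_trigger is
    constant SW        : positive := WIDTH / SLICES;                    -- slice width
    constant SLICE_MAX : unsigned(SW-1 downto 0) := (others => '1');
    constant ALL_SET   : std_logic_vector(SLICES-1 downto 0) := (others => '1');

    signal counter_a : unsigned(WIDTH-1 downto 0) := (others => '0');
    signal counter_b : unsigned(WIDTH-1 downto 0) := (others => '0');
    -- Stage-1 registers, one bit per narrow comparator. The power-up values model a
    -- virtual previous cycle in which both counters were all ones and counting, so
    -- the first output already matches the unpipelined process (both start at 0).
    signal part_eq  : std_logic_vector(SLICES-1 downto 0) := (others => '1'); -- slice of a = slice of b
    signal part_max : std_logic_vector(SLICES-1 downto 0) := (others => '1'); -- slice of a is all ones
    signal en_d     : std_logic := '1';                                      -- count_b_en, one cycle late

    function to_sl(b : boolean) return std_logic is
    begin
        if b then return '1'; else return '0'; end if;
    end function;
begin
    counters: process (clk) begin
        if rising_edge(clk) then
            counter_a <= counter_a + 1;                     -- free running
            if count_b_en = '1' then
                counter_b <= counter_b + 1;
            else
                counter_b <= (others => '0');
            end if;
        end if;
    end process;

    -- Stage 1: SLICES narrow comparators on this cycle's counter values.
    stage1: process (clk) begin
        if rising_edge(clk) then
            for i in 0 to SLICES-1 loop
                part_eq(i)  <= to_sl(counter_a(i*SW+SW-1 downto i*SW) = counter_b(i*SW+SW-1 downto i*SW));
                part_max(i) <= to_sl(counter_a(i*SW+SW-1 downto i*SW) = SLICE_MAX);
            end loop;
            en_d <= count_b_en;
        end if;
    end process;

    -- Stage 2: pick the comparison that matches what counter_b actually did.
    -- en_d = '1': both counters incremented, so a+1 = b+1 exactly when a = b.
    -- en_d = '0': counter_b was cleared, so a+1 = 0 exactly when a was all ones.
    stage2: process (clk) begin
        if rising_edge(clk) then
            if en_d = '1' then
                trigger_en <= to_sl(part_eq = ALL_SET);
            else
                trigger_en <= to_sl(part_max = ALL_SET);
            end if;
        end if;
    end process;
end architecture;
```

If you drop part_max, en_d and the mux and always use part_eq, you get the plain split from the first bullet, with trigger_en one cycle later than in the original. The lookahead relies on the exact counter behavior in the question; for arbitrary signals, only the extra-latency version applies. Whether either version actually helps depends on the device and the clock target.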

As for the "huge combinatorial delay": it really depends on your target frequency. FPGAs have fast carry chains and direct routing between adjacent LUTs, so the delay of a 32-bit comparator is not very large. (A 128-bit comparator is large; 32 bits is quite narrow.)

The second suggestion, the one with lots of hardware, may be faster in theory but is sometimes slower in practice. The additional hardware adds propagation and routing delay, higher fan-out load on the shared signals, and so on.