Electronic – VHDL synthesis optimization: counters in state machines

fpga vhdl xilinx

I have a general question about the efficiency of a synthesizable state machine.

The first version uses the same counter for every state.
The second uses a separate counter for each state.
Which of the two versions is more efficient (logic area, speed, ...)?

How much FPGA area is taken up by routing the count1 signal when I use the same counter for every state?
Is it better to use one counter per state?

I hope somebody with more experience can explain which solution is best (maybe a third version) and why.

Thank you!

Kind Regards,

Oliver

-- 1. Version ===============================================================

signal count1: integer range 0 to 1000 := 1000;  
type mystates is (s1, s2, s3, s4);  
signal mymode: mystates := s1;  

BEGIN  

MyProcess: process(clk)  

BEGIN  

    IF (clk'event and clk = '1') THEN  

        case mymode is
        when s1 =>
            if (count1 = 0) then
                mymode <= s2;
                count1 <= 555;
                -- (stuff)
            else
                count1 <= count1 - 1;
            end if;
        when s2 =>
            if (count1 = 0) then
                mymode <= s3;
                count1 <= 666;
                -- (stuff)
            else
                count1 <= count1 - 1;
            end if;
        when s3 =>
            if (count1 = 0) then
                mymode <= s4;
                count1 <= 784;
                -- (stuff)
            else
                count1 <= count1 - 1;
            end if;
        when s4 =>
            if (count1 = 0) then
                mymode <= s1;
                count1 <= 1000;
                -- (stuff)
            else
                count1 <= count1 - 1;
            end if;
        when others =>
            null;
        end case;

    END IF;

end process;

-- 2. Version ===============================================================

signal count1, count2, count3, count4: integer range 0 to 1000 := 1000;  
type mystates is (s1, s2, s3, s4);  
signal mymode: mystates := s1;  

BEGIN  

MyProcess: process(clk)  

BEGIN  

    IF (clk'event and clk = '1') THEN  

        case mymode is
        when s1 =>
            if (count1 = 0) then
                mymode <= s2;
                count1 <= 555;
                -- (stuff)
            else
                count1 <= count1 - 1;
            end if;
        when s2 =>
            if (count2 = 0) then
                mymode <= s3;
                count2 <= 666;
                -- (stuff)
            else
                count2 <= count2 - 1;
            end if;
        when s3 =>
            if (count3 = 0) then
                mymode <= s4;
                count3 <= 784;
                -- (stuff)
            else
                count3 <= count3 - 1;
            end if;
        when s4 =>
            if (count4 = 0) then
                mymode <= s1;
                count4 <= 1000;
                -- (stuff)
            else
                count4 <= count4 - 1;
            end if;
        when others =>
            null;
        end case;

    END IF;

end process;

Best Answer

The first version is going to be more efficient in terms of area and speed, but neither one is very good, IMHO.

A very quick way to estimate the size/speed of something is to think about each output signal and how many inputs the output is derived from. For example: a <= b xor c. We say that 'a' is a function of 2 signals (b and c). The more "inputs" there are, the more logic is required and the slower it will run. Keep in mind that this is a super rough estimate, but is useful for, well, making super rough estimates.

In version 1, your mymode "output" is a function of many inputs. The important input is count1, which is a 10-bit signal. So, without considering the other signals, you can say that mymode is a function of at least 10 inputs. Version 2, on the other hand, has 4 count inputs (each 10 bits), so mymode is a function of at least 40 inputs. That's a lot of inputs, and it will create a lot of logic that runs slowly.

Now, here's a super easy way to make the logic for both versions smaller and faster. For this example, I'm just going to do a simple counter that "does something" when it finishes its count. You can adapt the same technique to your state machine. First, here's your version:

signal count : integer range 0 to 1000;

process (clk)
begin
  if rising_edge(clk) then
    if load = '1' then
      count <= some_constant_value;
    elsif count = 0 then  -- This line is important
      do_something_here;  -- placeholder for the real action
    else
      count <= count - 1;
    end if;
  end if;
end process;

And here is my version:

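signal count : std_logic_vector (10 downto 0);  -- Note:  1 extra bit

-- Note: arithmetic on a std_logic_vector needs a package such as
-- ieee.std_logic_unsigned (non-standard) or ieee.numeric_std_unsigned (VHDL-2008).
-- some_constant_value is assumed to be an 11-bit vector constant.
process (clk)
begin
  if rising_edge(clk) then
    if load = '1' then
      count <= some_constant_value - 1;
    elsif count(count'high) = '1' then  -- This line is important
      do_something_here;  -- placeholder for the real action
    else
      count <= count - 1;
    end if;
  end if;
end process;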

Your version counts from N down to 0, while mine counts from N-1 down to -1. To make this work, I made the count signal 1 bit wider and made it an SLV (std_logic_vector) instead of an integer. Where this really speeds things up is the comparison: your version does an N-bit comparison, whereas mine just checks a single bit, the top bit, which goes high when the count wraps from 0 to -1. In essence, I am using the carry-chain logic from the line "count <= count - 1" to also do my comparison. The carry-chain logic is already there for the counter; I'm just making it one bit longer. Since the carry-chain logic is super fast in an FPGA, and you're already using it, the resulting logic is super small and super fast.
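For anyone who prefers to stay with the standard ieee.numeric_std package, the same idea can be sketched with a signed counter. This is only a minimal illustration: the entity name, the generic N, and the done port are made up for the example and are not part of the original code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: the name, the generic and the ports are illustrative only.
entity sign_bit_counter is
  generic (
    N : positive := 1000  -- reload value; assumed to be at most 1024
  );
  port (
    clk  : in  std_logic;
    load : in  std_logic;
    done : out std_logic  -- high while the counter sits at -1
  );
end entity sign_bit_counter;

architecture rtl of sign_bit_counter is
  -- 11 bits: 10 magnitude bits plus one sign bit
  signal count : signed(10 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if load = '1' then
        -- Start at N-1 instead of N
        count <= to_signed(N - 1, count'length);
      elsif count(count'high) = '0' then
        -- Still non-negative: keep counting down
        count <= count - 1;
      end if;
      -- Once the sign bit is set (count = -1) the counter holds its value
    end if;
  end process;

  -- The "comparison" is just the sign bit of the counter
  done <= count(count'high);
end architecture rtl;
```

With a signed type, the top bit has an explicit meaning (the sign), so the single-bit test reads as "has the count gone negative?".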

For our 10 bit counter, the line "elsif count=0 then" would require three 4-input LUTs and 2 levels of logic in a Xilinx Spartan-3. My version requires 1 Flip-Flop (and associated carry-chain that would have otherwise gone unused) and essentially 0 levels of logic.

But let's say that the counter was 32 bits. Your version would require 11 LUTs and 3 levels of logic. Mine would stay the same at 1 FF and 0 levels.
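As a rough cross-check of those figures, assume the zero comparison is built as a plain tree of k-input LUTs, with no help from the carry chain or dedicated wide-function multiplexers. Each LUT can merge at most k signals into one, so for an n-bit comparison the tree needs at least:

```latex
\text{LUTs}(n) = \left\lceil \frac{n-1}{k-1} \right\rceil,
\qquad
\text{levels}(n) = \left\lceil \log_k n \right\rceil

% k = 4 (4-input LUTs):
% n = 10: \lceil 9/3 \rceil  = 3  LUTs, \lceil \log_4 10 \rceil = 2 levels
% n = 32: \lceil 31/3 \rceil = 11 LUTs, \lceil \log_4 32 \rceil = 3 levels
```

Both results agree with the numbers quoted above. A real synthesis tool may map the comparison differently, so treat this as an estimate only.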

When you apply my method to the two versions of your state machine, each version will work. Version 1 is still the better approach, but in some circumstances you can't have a single counter and so you must use Version 2. With my method, instead of mymode being a function of at least 40 inputs, it is a function of at least 4 inputs (one top bit per counter). 4 is much better than 40!
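Here is one way the single-counter state machine (Version 1) might look with the top-bit test applied. Treat it as a sketch under a few assumptions: the entity wrapper and the state debug output are hypothetical, the counter is declared as signed from ieee.numeric_std rather than as a std_logic_vector, and the test is hoisted out of the case statement because every state performs the same check. Loading "value - 1" and waiting for -1 keeps the same number of clock cycles per state as loading "value" and waiting for 0.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical wrapper entity; the question only shows the process body.
entity shared_counter_fsm is
  port (
    clk   : in  std_logic;
    state : out std_logic_vector(1 downto 0)  -- illustrative debug output
  );
end entity shared_counter_fsm;

architecture rtl of shared_counter_fsm is
  type mystates is (s1, s2, s3, s4);
  signal mymode : mystates := s1;

  -- One extra bit so the counter can reach -1; bit 10 acts as the sign bit.
  signal count1 : signed(10 downto 0) := to_signed(1000 - 1, 11);
begin
  MyProcess : process (clk)
  begin
    if rising_edge(clk) then
      if count1(count1'high) = '1' then
        -- Counter has wrapped to -1: change state and reload.
        case mymode is
          when s1 =>
            mymode <= s2;
            count1 <= to_signed(555 - 1, count1'length);
            -- (stuff)
          when s2 =>
            mymode <= s3;
            count1 <= to_signed(666 - 1, count1'length);
            -- (stuff)
          when s3 =>
            mymode <= s4;
            count1 <= to_signed(784 - 1, count1'length);
            -- (stuff)
          when s4 =>
            mymode <= s1;
            count1 <= to_signed(1000 - 1, count1'length);
            -- (stuff)
        end case;
      else
        -- Same decrement in every state, so it is written only once.
        count1 <= count1 - 1;
      end if;
    end if;
  end process MyProcess;

  -- Encode the current state as its position in the enumeration (s1 = 0 ... s3 = 2, s4 = 3).
  state <= std_logic_vector(to_unsigned(mystates'pos(mymode), state'length));
end architecture rtl;
```

Written this way, the next-state decision depends on a single counter bit plus the current state, rather than on all ten counter bits.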