I have a general question about the efficiency of a synthesizable state machine.
The first version uses the same counter for each state.
The second uses a separate counter for each state.
Which of the two versions is more efficient (logic area, speed, ...)?
How much FPGA area is taken up by routing the count1 signal when I use the same counter for every state?
Is it better to use one counter per state?
I hope somebody with more experience can explain which solution is best (maybe a third version) and why.
Thank you!
Kind Regards,
Oliver
-- 1. Version ===============================================================
signal count1: integer range 0 to 1000 := 1000;
type mystates is (s1, s2, s3, s4);
signal mymode: mystates := s1;
BEGIN
MyProcess: process(clk)
BEGIN
IF (clk'event and clk = '1') THEN
case mymode is
when s1 =>
If (count1 = 0) then
mymode <= s2;
count1 <= 555;
-- (stuff)
else
count1 <= count1 - 1;
end if;
when s2 =>
If (count1 = 0) then
mymode <= s3;
count1 <= 666;
-- (stuff)
else
count1 <= count1 - 1;
end if;
when s3 =>
If (count1 = 0) then
mymode <= s4;
count1 <= 784;
-- (stuff)
else
count1 <= count1 - 1;
end if;
when s4 =>
If (count1 = 0) then
mymode <= s1;
count1 <= 1000;
-- (stuff)
else
count1 <= count1 - 1;
end if;
when others =>
Null;
end case;
END IF;
end process;
-- 2. Version ===============================================================
signal count1, count2, count3, count4: integer range 0 to 1000 := 1000;
type mystates is (s1, s2, s3, s4);
signal mymode: mystates := s1;
BEGIN
MyProcess: process(clk)
BEGIN
IF (clk'event and clk = '1') THEN
case mymode is
when s1 =>
If (count1 = 0) then
mymode <= s2;
count1 <= 555;
-- (stuff)
else
count1 <= count1 - 1;
end if;
when s2 =>
If (count2 = 0) then
mymode <= s3;
count2 <= 666;
-- (stuff)
else
count2 <= count2 - 1;
end if;
when s3 =>
If (count3 = 0) then
mymode <= s4;
count3 <= 784;
-- (stuff)
else
count3 <= count3 - 1;
end if;
when s4 =>
If (count4 = 0) then
mymode <= s1;
count4 <= 1000;
-- (stuff)
else
count4 <= count4 - 1;
end if;
when others =>
Null;
end case;
END IF;
end process;
Best Answer
The first version is going to be more efficient in terms of both area and speed, but neither one is very good, IMHO.
A very quick way to estimate the size/speed of something is to think about each output signal and how many inputs the output is derived from. For example: a <= b xor c. We say that 'a' is a function of 2 signals (b and c). The more "inputs" there are, the more logic is required and the slower it will run. Keep in mind that this is a super rough estimate, but is useful for, well, making super rough estimates.
In version 1, you have your mymode "output", which is a function of many inputs. The important input is count1, which is a 10-bit signal (the range 0 to 1000 needs 10 bits). So, without considering the other signals, you can say that mymode is at least a function of 10 inputs. Version 2, on the other hand, has 4 count inputs (each 10 bits), so mymode is a function of at least 40 inputs. That's a lot of inputs, and it will create a lot of logic that runs slowly.
Now, here's a super easy way to make the logic for both versions smaller and faster. For this example, I'm just going to do a simple counter that "does something" when it finishes its count. You can adapt the same techniques to your state machine. First, here's your version:
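A minimal sketch of that style of counter. The entity name `count_compare`, the `reset` and `done` ports, and the count value of 100 are all assumptions made for illustration; none of them come from the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: names and the count value 100 are illustrative only.
entity count_compare is
  port (
    clk   : in  std_logic;
    reset : in  std_logic;
    done  : out std_logic   -- high for one clock when the count expires
  );
end entity count_compare;

architecture rtl of count_compare is
  signal count : integer range 0 to 100 := 100;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      done <= '0';
      if reset = '1' then
        count <= 100;
      elsif count = 0 then     -- full-width comparison against zero
        done  <= '1';          -- stands in for "does something"
        count <= 100;
      else
        count <= count - 1;
      end if;
    end if;
  end process;
end architecture rtl;
```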
And here is my version:
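A minimal sketch of the single-bit check, under stated assumptions: the entity name `count_signbit`, the `reset` and `done` ports, and the constants `N` and `WIDTH` are hypothetical. It also uses the `signed` type from `ieee.numeric_std` rather than a plain std_logic_vector, so that the subtraction is defined by the standard packages; that substitution is a choice made for this sketch.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: names, N and WIDTH are illustrative only.
entity count_signbit is
  port (
    clk   : in  std_logic;
    reset : in  std_logic;
    done  : out std_logic   -- high for one clock when the count expires
  );
end entity count_signbit;

architecture rtl of count_signbit is
  constant N     : natural := 100;
  -- 7 bits hold 0..99; the extra top bit goes high only when count = -1.
  constant WIDTH : natural := 8;
  signal count   : signed(WIDTH - 1 downto 0) := to_signed(N - 1, WIDTH);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      done <= '0';
      if reset = '1' then
        count <= to_signed(N - 1, WIDTH);
      elsif count(count'high) = '1' then  -- single-bit test: count is -1
        done  <= '1';                     -- stands in for "does something"
        count <= to_signed(N - 1, WIDTH);
      else
        count <= count - 1;               -- the decrement sets the top bit at -1
      end if;
    end if;
  end process;
end architecture rtl;
```

Loading N-1 and stopping at -1 gives the same number of clock cycles per period (N+1) as loading N and stopping at 0.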
Your version counts from N downto 0, while mine counts from N-1 downto -1. To make this work, I made the count signal 1 bit larger and also a std_logic_vector (SLV) instead of an integer. But where this really makes things faster is that your version is doing an N-bit comparison where mine is just checking a single bit. In essence, I am using the carry-chain logic from the line "count <= count - 1" to also do my comparison. The carry-chain logic is already there for the counter; I'm just making it one bit longer. Since the carry-chain logic is super fast in an FPGA, and you're already using it, the resulting logic is super small and super fast.
For our 10-bit counter, the line "elsif count=0 then" would require three 4-input LUTs and 2 levels of logic in a Xilinx Spartan-3: two LUTs reduce 8 of the bits to 2 signals, and a third combines those with the remaining 2 bits. My version requires 1 Flip-Flop (and associated carry-chain that would have otherwise gone unused) and essentially 0 levels of logic.
But let's say that the counter was 32 bits. Your version would require 11 LUTs and 3 levels of logic: 8 LUTs reduce the 32 bits to 8 signals, 2 more reduce those to 2, and 1 more combines them. Mine would stay the same at 1 FF and 0 levels.
When you apply my method to the two versions of your state machine, each version will work. Version 1 is still the better approach, but in some circumstances you can't have a single counter and so you must use Version 2. With my method, instead of having a function of at least 40 inputs, you have a function of at least 4 inputs (one sign bit per counter). 4 is much better than 40!
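As a rough sketch of how the single-bit check might be applied to Version 1: the entity name `fsm_shared_counter` and the `tick` output are hypothetical, the counter is declared as `signed` from `ieee.numeric_std` (an assumption of this sketch), and the zero test is hoisted out of the case statement, which is a restructuring not present in the question. Each load value is one less than in the question, so every state lasts the same number of clocks as before.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: the name and the tick port are illustrative only.
entity fsm_shared_counter is
  port (
    clk  : in  std_logic;
    tick : out std_logic   -- high for one clock on every state change
  );
end entity fsm_shared_counter;

architecture rtl of fsm_shared_counter is
  type mystates is (s1, s2, s3, s4);
  signal mymode : mystates := s1;
  -- 10 bits hold 0..999; the extra top bit goes high only when count1 = -1.
  signal count1 : signed(10 downto 0) := to_signed(999, 11);
begin
  MyProcess : process (clk)
  begin
    if rising_edge(clk) then
      tick <= '0';
      if count1(count1'high) = '1' then   -- single-bit test: count1 is -1
        tick <= '1';
        case mymode is
          when s1 =>
            mymode <= s2;
            count1 <= to_signed(554, 11);  -- was 555
            -- (stuff)
          when s2 =>
            mymode <= s3;
            count1 <= to_signed(665, 11);  -- was 666
            -- (stuff)
          when s3 =>
            mymode <= s4;
            count1 <= to_signed(783, 11);  -- was 784
            -- (stuff)
          when s4 =>
            mymode <= s1;
            count1 <= to_signed(999, 11);  -- was 1000
            -- (stuff)
        end case;
      else
        count1 <= count1 - 1;
      end if;
    end if;
  end process MyProcess;
end architecture rtl;
```

With this structure the state transition depends on one counter bit plus the current state, rather than on all 10 counter bits.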