Electronic – Is this matrix-vector multiplication function in VHDL parallelized?


I have the following VHDL function that multiplies a given m×n matrix a by an n×1 vector b:

function matrix_multiply_by_vector(a: integer_matrix; b: integer_vector; m: integer; n: integer)
return integer_vector is
    variable c : integer_vector(m-1 downto 0) := (others => 0);
begin
    for i in 0 to m-1 loop
        for j in 0 to n-1 loop
            c(i) := c(i) + (a(i,j) * b(j));
        end loop;
    end loop;
    return c;
end matrix_multiply_by_vector;

It works well, but what does this actually implement in hardware? Specifically, I want to know whether the synthesis tool is smart enough to realize that it can parallelize the inner for loop, essentially computing a dot product for each row of the matrix. If not, what is the simplest (i.e. nicest syntax) way to parallelize matrix-vector multiplication?

Best Answer

In 'hardware' (VHDL or Verilog) all loops are unrolled and executed in parallel: each iteration becomes its own piece of logic rather than a step in time.

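Thus not only your inner loop but also your outer loop is unrolled. Your function therefore describes m×n multipliers, plus the adders that sum each row, all as combinational logic.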
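To make the unrolling concrete, here is a minimal sketch of what the two loops amount to for m = 2 and n = 2, written out by hand. The package name, the function name matvec_2x2 and the definition of integer_matrix are assumptions for illustration (the question does not show its type declarations), and integer_vector is taken to be the type predefined in VHDL-2008.

```vhdl
-- Hypothetical package: hand-unrolled form of the loops for m = 2, n = 2.
-- Assumes VHDL-2008, where integer_vector is predefined. integer_matrix
-- is an assumed definition, since the question does not show it.
package matvec_unrolled_pkg is
    type integer_matrix is array (natural range <>, natural range <>) of integer;

    function matvec_2x2(a : integer_matrix(0 to 1, 0 to 1);
                        b : integer_vector(0 to 1))
        return integer_vector;
end package matvec_unrolled_pkg;

package body matvec_unrolled_pkg is
    function matvec_2x2(a : integer_matrix(0 to 1, 0 to 1);
                        b : integer_vector(0 to 1))
        return integer_vector is
        variable c : integer_vector(1 downto 0);
    begin
        -- Each product is its own multiplier; each row sum is its own adder.
        -- Nothing here is shared or sequenced in time.
        c(0) := a(0,0) * b(0) + a(0,1) * b(1);
        c(1) := a(1,0) * b(0) + a(1,1) * b(1);
        return c;
    end function matvec_2x2;
end package body matvec_unrolled_pkg;
```

Written this way, the cost is easy to read off: four multipliers and two adders for a 2×2 matrix, and the count grows as m×n.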

That is also the reason why the loop size must be known at compile time. When the loop length is unknown the synthesis tool will complain.
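In the question's function the loop bounds m and n are parameters, so the call has to pass values that are fixed at elaboration. One way to arrange that, sketched here under assumptions, is to take them from generics. The package name matvec_pkg, the entity name matvec, the generic names ROWS and COLS, the port layout and the integer_matrix definition are all hypothetical. VHDL-2008 is assumed for the predefined integer_vector, and whether a given tool accepts integer array ports at the top level varies.

```vhdl
-- Hypothetical package holding the assumed type and the question's function.
package matvec_pkg is
    type integer_matrix is array (natural range <>, natural range <>) of integer;

    function matrix_multiply_by_vector(a : integer_matrix; b : integer_vector;
                                       m : integer; n : integer)
        return integer_vector;
end package matvec_pkg;

package body matvec_pkg is
    function matrix_multiply_by_vector(a : integer_matrix; b : integer_vector;
                                       m : integer; n : integer)
        return integer_vector is
        variable c : integer_vector(m-1 downto 0) := (others => 0);
    begin
        for i in 0 to m-1 loop
            for j in 0 to n-1 loop
                c(i) := c(i) + (a(i,j) * b(j));
            end loop;
        end loop;
        return c;
    end function matrix_multiply_by_vector;
end package body matvec_pkg;

use work.matvec_pkg.all;

-- Hypothetical wrapper: ROWS and COLS are generics, so both loop bounds
-- are known when the design is elaborated.
entity matvec is
    generic (ROWS : positive := 2;
             COLS : positive := 3);
    port (a : in  integer_matrix(0 to ROWS-1, 0 to COLS-1);
          b : in  integer_vector(0 to COLS-1);
          c : out integer_vector(ROWS-1 downto 0));
end entity matvec;

architecture comb of matvec is
begin
    -- Purely combinational: the call unrolls into ROWS*COLS multipliers.
    c <= matrix_multiply_by_vector(a, b, ROWS, COLS);
end architecture comb;
```

Passing a signal for m or n instead would leave the tool without a fixed loop count, which is the situation described above.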

It is a well-known trap for beginners coming from a software language. They try to convert:

int a, b, c;
c = 0;
while (a--)
    c += b;

to VHDL/Verilog hardware. The problem is that it all works fine in simulation, but the synthesis tool has to generate a chain of adders: c = b+b+b+b...b;

For that, the tool needs to know how many adders to make. If a is a constant, fine! (Even if it is 4,000,000: the tool will run out of gates, but it will try!)

But if a is a variable, the tool is lost: it has no way of knowing how many adders to build.
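As a rough VHDL counterpart of the C snippet, the sketch below shows the case that does synthesize: the repeat count is a generic, so it is a constant at elaboration. The entity name repeat_add and the generic COUNT are hypothetical names chosen for illustration.

```vhdl
-- Hypothetical entity: adds b to itself COUNT times, as in the C loop,
-- but with COUNT fixed at elaboration.
entity repeat_add is
    generic (COUNT : natural := 4);
    port (b : in  integer;
          c : out integer);
end entity repeat_add;

architecture comb of repeat_add is
begin
    process (b)
        variable acc : integer;
    begin
        acc := 0;
        -- COUNT is a constant, so the tool can unroll this loop into
        -- a chain of COUNT adders: acc = b + b + ... + b.
        for k in 1 to COUNT loop
            acc := acc + b;
        end loop;
        c <= acc;
    end process;
end architecture comb;
```

If COUNT were replaced by an input port, the same loop would still simulate, but the bound would no longer be known at elaboration and the tool would have no adder count to work from.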