Electronic – Is this matrix-vector multiplication function in VHDL parallelized?

fpga, matrix, vhdl

I have the following VHDL function that multiplies a given m×n matrix a by an n×1 vector b:

function matrix_multiply_by_vector(a: integer_matrix; b: integer_vector; m: integer; n: integer)
    return integer_vector is
    -- Result vector, one element per matrix row, initialized to zero
    variable c : integer_vector(m-1 downto 0) := (others => 0);
begin
    for i in 0 to m-1 loop          -- one dot product per row
        for j in 0 to n-1 loop      -- accumulate a(i,j) * b(j)
            c(i) := c(i) + (a(i,j) * b(j));
        end loop;
    end loop;
    return c;
end matrix_multiply_by_vector;

It works well, but what does this actually implement in hardware? Specifically, I want to know whether the synthesis tool is smart enough to realize that it can parallelize the inner for loop, essentially computing a dot product for each row of the matrix. If not, what is the simplest way (i.e. with clean syntax) to parallelize matrix-vector multiplication?

Best Answer

When VHDL or Verilog is synthesized to hardware, every loop is unrolled into logic that operates in parallel.

Thus not only your inner loop but also your outer loop is unrolled.

That is also why the loop bounds must be known at compile time. If the number of iterations is unknown, the synthesis tool will complain.
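As a rough illustration of what unrolling means, the sketch below puts the loop form next to a hand-unrolled equivalent for a 2x2 case. The package name `matvec_pkg`, the constants `M` and `N`, and the types `int_matrix` and `int_vector` are assumptions made up for this example; they are not from the question. It also assumes the actual arguments are indexed from 0.

```vhdl
-- Hypothetical package: names and sizes are illustrative assumptions.
package matvec_pkg is
    constant M : positive := 2;  -- number of rows, fixed at compile time
    constant N : positive := 2;  -- number of columns, fixed at compile time

    type int_vector is array (natural range <>) of integer;
    type int_matrix is array (natural range <>, natural range <>) of integer;

    function matvec_loop(a : int_matrix; b : int_vector) return int_vector;
    function matvec_unrolled(a : int_matrix; b : int_vector) return int_vector;
end package matvec_pkg;

package body matvec_pkg is

    -- Loop form: both bounds are constants, so the loops can be unrolled.
    function matvec_loop(a : int_matrix; b : int_vector) return int_vector is
        variable c : int_vector(M-1 downto 0) := (others => 0);
    begin
        for i in 0 to M-1 loop
            for j in 0 to N-1 loop
                c(i) := c(i) + a(i, j) * b(j);
            end loop;
        end loop;
        return c;
    end function matvec_loop;

    -- Hand-unrolled form for M = N = 2: four multiplications and
    -- two additions, none of which depends on a loop counter.
    function matvec_unrolled(a : int_matrix; b : int_vector) return int_vector is
        variable c : int_vector(M-1 downto 0);
    begin
        c(0) := a(0, 0) * b(0) + a(0, 1) * b(1);
        c(1) := a(1, 0) * b(0) + a(1, 1) * b(1);
        return c;
    end function matvec_unrolled;

end package body matvec_pkg;
```

Both functions describe the same arithmetic. The second simply writes out what the first expands to once the loop bounds are fixed.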

It is a well-known trap for beginners coming from a software language. They try to convert:

int a,b,c;
   c = 0;
   while (a--)
     c +=  b;

to VHDL/Verilog hardware. The problem is that it all works fine in simulation, but the synthesis tool needs to generate adders: c = b+b+b+b...b;

For that, the tool needs to know how many adders to make. If a is a constant, fine! (Even if it is 4,000,000: it will run out of gates, but it will try!)

But if a is a variable, the tool is lost: it cannot know how many adders to build.
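One common way around a variable count, which the answer itself does not describe, is to loop up to a static maximum and make each addition conditional. The sketch below assumes a hypothetical entity `repeat_add`, a generic `MAX_COUNT`, and port widths chosen only for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: name, generic and widths are illustrative assumptions.
entity repeat_add is
    generic (
        MAX_COUNT : positive := 15  -- static upper bound on the value of a
    );
    port (
        a : in  unsigned(3 downto 0);   -- run-time repeat count, 0 to 15
        b : in  unsigned(7 downto 0);   -- value to accumulate
        c : out unsigned(11 downto 0)   -- 15 * 255 = 3825 fits in 12 bits
    );
end entity repeat_add;

architecture rtl of repeat_add is
begin
    process (a, b)
        variable acc : unsigned(11 downto 0);
    begin
        acc := (others => '0');
        -- The loop bound is a constant, so the tool can unroll it into
        -- MAX_COUNT adders; the run-time value of a only gates each one.
        for i in 0 to MAX_COUNT - 1 loop
            if i < to_integer(a) then
                acc := acc + b;
            end if;
        end loop;
        c <= acc;
    end process;
end architecture rtl;
```

This is only meant to show the static-bound idea. For this particular computation a single multiplication `a * b` would be the more natural description.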

Related Solutions

Electronic – Multiplication in VHDL

If you multiply two 5-bit numbers (A and B are both std_logic_vector(4 downto 0)), don't you need 10 bits (not 9) to store the result, so that P should be std_logic_vector(9 downto 0)? (31*31 = 961, which needs 10 bits.)

But also: don't use std_logic_arith or std_logic_unsigned. Use ieee.numeric_std and then use the unsigned data type.
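A minimal sketch of that advice follows, assuming a hypothetical entity named `mult5` with lowercase ports a, b and p standing in for A, B and P. In ieee.numeric_std, the `*` operator on two unsigned operands returns a result whose length is the sum of the operand lengths, so the 10-bit product needs no manual sizing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity name; port widths follow the excerpt above.
entity mult5 is
    port (
        a : in  std_logic_vector(4 downto 0);
        b : in  std_logic_vector(4 downto 0);
        p : out std_logic_vector(9 downto 0)  -- 5 + 5 = 10 bits
    );
end entity mult5;

architecture rtl of mult5 is
begin
    -- Convert to unsigned, multiply, convert back.
    -- The product is unsigned(9 downto 0), matching p exactly.
    p <= std_logic_vector(unsigned(a) * unsigned(b));
end architecture rtl;
```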

Electronic – Qm.n multiplication in VHDL

One reason such a function doesn't belong in numeric_std is that, in practice, you may need more control of the details...

Addition is quite straightforward and most tools and technologies implement it well.

But multiplication is difficult enough that FPGA manufacturers devote chunks of FPGA area to providing 18-bit signed multipliers with associated logic. Synthesis tools will use these, but perhaps not optimally. If you need a 32-bit multiply, you might get a badly pipelined (slow!) multiplication that you can improve on by splitting the multiply into 4 and summing the partial products yourself. (Synthesis tools are improving, so this may no longer be true.)

Or you may need to round, or dither, instead of truncating the product.

Or one input is a constant, so that a KCM (constant coefficient multiplier) unrolled in hardware yields a more efficient solution.

So multiplication is still not a one-size-fits-all operation, and it certainly wasn't when numeric_std was created. As Martin Thompson says, look at the newer fixed-point library for what is possible now.

As for performing your own fixed point scaling and truncation; I find it easier to reason starting at the MSB and working down...

Given your 8-bit Q2.5 format (signed!) numbers,

s_mm.nnnnn * s_mm.nnnnn = ss_mmmm.nn_nnnn_nnnn

just remember that multiplying the sign bits effectively gives you 2 identical sign bits EXCEPT for the case -4.0*-4.0 (more generally, both inputs -2**m). If you can guarantee this doesn't happen (e.g. you control the filter coefficients) you can simplify handling this case...
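The bit layout above can be sketched with ieee.numeric_std as follows. This does not reproduce the elided part of the answer: the entity name `q2_5_mult`, the port names, the choice to truncate back to Q2.5, and the overflow flag are all assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: names and the truncation policy are assumptions.
entity q2_5_mult is
    port (
        x        : in  signed(7 downto 0);   -- s_mm.nnnnn
        y        : in  signed(7 downto 0);   -- s_mm.nnnnn
        full     : out signed(15 downto 0);  -- ss_mmmm.nn_nnnn_nnnn
        q        : out signed(7 downto 0);   -- product truncated to s_mm.nnnnn
        overflow : out std_logic             -- '1' if q cannot hold the product
    );
end entity q2_5_mult;

architecture rtl of q2_5_mult is
    signal p : signed(15 downto 0);
begin
    -- signed * signed returns 8 + 8 = 16 bits with 10 fractional bits.
    p    <= x * y;
    full <= p;

    -- Keep 1 sign bit, 2 integer bits and the top 5 fractional bits.
    -- The lower 5 fractional bits are simply dropped (truncation).
    q <= p(12 downto 5);

    -- The slice is valid only when bits 15..12 are all copies of the sign.
    overflow <= '0' when (p(15 downto 12) = "0000" or
                          p(15 downto 12) = "1111") else '1';
end architecture rtl;
```

For example, 1.5 * 2.0 is 48 * 64 = 3072 as raw integers, and bits 12 downto 5 of 3072 give 96, which is 3.0 in Q2.5. For -4.0 * -4.0 the raw product is 16384, so bit 15 is 0 and bit 14 is 1, which is the one case where the two leading sign bits differ.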
