If you multiply two 5-bit numbers (`A` and `B` are both `std_logic_vector(4 downto 0)`), don't you need 10 bits (not 9) to store the product? So `P` should be `std_logic_vector(9 downto 0)` (31*31 = 961, which needs 10 bits).
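To double-check the width: the largest product of an N-bit and an M-bit unsigned number is (2**N - 1) * (2**M - 1), and its bit length is what `P` needs. Here is a quick Python sketch. Python is only used as a calculator here, and `product_width` is a made-up helper, not anything from a VHDL library:

```python
def product_width(a_bits: int, b_bits: int) -> int:
    """Bits needed to hold the product of two unsigned numbers of the given widths."""
    max_product = ((1 << a_bits) - 1) * ((1 << b_bits) - 1)
    return max_product.bit_length()


if __name__ == "__main__":
    print(31 * 31, (31 * 31).bit_length())  # 961 10
    print(product_width(5, 5))              # 10
```

For operands of 2 bits or more this always comes out as N + M, which is why the usual rule is "product width = sum of the input widths".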

But also: don't use `std_logic_arith`/`std_logic_unsigned`. Use `ieee.numeric_std` and then use the `unsigned` data type.

One reason such a function doesn't belong in numeric_std is that, in practice, you may need more control over the details...

Addition is quite straightforward and most tools and technologies implement it well.

But multiplication is difficult enough that FPGA manufacturers devote chunks of FPGA area to providing 18-bit signed multipliers with associated logic. Synthesis tools will use these, but perhaps not optimally. If you need a 32-bit multiply, you might get a badly pipelined (slow!) multiplier that you can improve on by splitting it into 4 smaller multiplies and summing the partial products yourself. (Synthesis tools are improving, so this may no longer be true.)
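A rough Python model of the hand-split version, assuming unsigned 32-bit operands split into 16-bit halves (signed operands need extra care on the high halves). `mul32_by_partial_products` is a hypothetical name. The point is only that four 16x16 products, shifted to their weights and summed, give the full 64-bit result. In hardware each partial product would map to one dedicated multiplier, with pipeline registers between the adds; Python only shows the arithmetic:

```python
MASK16 = 0xFFFF


def mul32_by_partial_products(a: int, b: int) -> int:
    """Unsigned 32x32 -> 64-bit multiply built from four 16x16 partial products."""
    assert 0 <= a < (1 << 32) and 0 <= b < (1 << 32)
    a_hi, a_lo = a >> 16, a & MASK16
    b_hi, b_lo = b >> 16, b & MASK16
    # Four independent partial products; each could map to one hardware multiplier.
    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi
    # Shift each partial product to its weight and sum them.
    return (hh << 32) + ((lh + hl) << 16) + ll


if __name__ == "__main__":
    a, b = 0x12345678, 0x9ABCDEF0
    print(mul32_by_partial_products(a, b) == a * b)  # True
```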

Or you may need to round, or dither, instead of truncating the product.
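To make the truncate-vs-round difference concrete, here is a small Python sketch (the helper names are hypothetical). It drops `drop_bits` low bits of a product either by plain truncation (an arithmetic shift right, which rounds toward minus infinity) or by adding half an output LSB first. Dithering is not shown. Averaged over a full cycle of the discarded bits (3 of them here), truncation is biased by -7/16 LSB, while round-half-up leaves only +1/16 LSB:

```python
from fractions import Fraction


def truncate(product: int, drop_bits: int) -> int:
    """Drop the low bits; Python's >> on ints is arithmetic, i.e. rounds toward -infinity."""
    return product >> drop_bits


def round_half_up(product: int, drop_bits: int) -> int:
    """Add half an LSB of the result before dropping bits (ties round toward +infinity)."""
    assert drop_bits >= 1
    return (product + (1 << (drop_bits - 1))) >> drop_bits


def mean_error(fn, drop_bits: int, values) -> Fraction:
    """Average error, in output LSBs, of fn over the given products."""
    scale = 1 << drop_bits
    total = sum(Fraction(fn(p, drop_bits)) - Fraction(p, scale) for p in values)
    return total / len(values)


if __name__ == "__main__":
    print(truncate(11, 2), round_half_up(11, 2))          # 2 3
    print(mean_error(truncate, 3, range(-64, 64)))        # -7/16
    print(mean_error(round_half_up, 3, range(-64, 64)))   # 1/16
```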

Or one input is a constant, so that a KCM (constant coefficient multiplier) unrolled in hardware yields a more efficient solution.
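A sketch of the classic LUT-based KCM idea in Python. It assumes an unsigned input and 4-bit table inputs; both are assumptions, and `make_kcm` is a hypothetical name. Because the coefficient is fixed, `k * d` can be precomputed for every 4-bit digit `d`, so the "multiply" is only table lookups, shifts and adds. On an FPGA each digit would get its own copy of the table in LUTs, all working in parallel; the Python loop just visits them one after another:

```python
def make_kcm(k: int, input_bits: int, digit_bits: int = 4):
    """Return a function computing k * x for unsigned x of width input_bits.

    At 'run time' only table lookups, shifts and additions are used; the table
    is precomputed from the constant k.
    """
    n_digits = -(-input_bits // digit_bits)          # ceiling division
    table = [k * d for d in range(1 << digit_bits)]  # small ROM/LUT holding k * digit
    mask = (1 << digit_bits) - 1

    def multiply(x: int) -> int:
        assert 0 <= x < (1 << input_bits)
        total = 0
        for i in range(n_digits):
            digit = (x >> (i * digit_bits)) & mask
            total += table[digit] << (i * digit_bits)  # partial product at its weight
        return total

    return multiply


if __name__ == "__main__":
    times_201 = make_kcm(201, 12)
    print(times_201(1234) == 201 * 1234)  # True
```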

So multiplication is still not a one-size-fits-all operation, and it certainly wasn't when numeric_std was created. As Martin Thompson says, look at the newer fixed-point library for what is possible now.

As for performing your own fixed-point scaling and truncation, I find it easier to reason starting at the MSB and working down...

Given your 8-bit Q2.5 format (signed!) numbers,

```
s_mm.nnnnn * s_mm.nnnnn = ss_mmmm.nn_nnnn_nnnn
```
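One possible way to get back to Q2.5 after the multiply, sketched in Python on the raw integers (value = raw / 32). It assumes truncation of the 5 extra fraction bits and saturation of the integer bits that don't fit. Those are design choices, not the only option, and the names are hypothetical:

```python
FRAC_BITS = 5   # Q2.5: 1 sign bit, 2 integer bits, 5 fraction bits
WORD_BITS = 8
MIN_RAW, MAX_RAW = -(1 << (WORD_BITS - 1)), (1 << (WORD_BITS - 1)) - 1  # -128, 127


def to_raw(x: float) -> int:
    """Encode a real value as a Q2.5 integer (must be exactly representable)."""
    raw = round(x * (1 << FRAC_BITS))
    assert MIN_RAW <= raw <= MAX_RAW and raw / (1 << FRAC_BITS) == x
    return raw


def q25_mul(a_raw: int, b_raw: int) -> int:
    """Q2.5 * Q2.5 -> Q2.5 with truncation and saturation.

    The full product has 16 bits with 10 fraction bits (ss_mmmm.nn_nnnn_nnnn).
    Shifting right by 5 keeps 5 fraction bits; saturation handles the integer
    bits that do not fit into the 8-bit result.
    """
    full = a_raw * b_raw            # 10 fraction bits
    scaled = full >> FRAC_BITS      # 5 fraction bits (truncates toward -infinity)
    return max(MIN_RAW, min(MAX_RAW, scaled))


if __name__ == "__main__":
    r = q25_mul(to_raw(1.5), to_raw(2.25))
    print(r, r / (1 << FRAC_BITS))  # 108 3.375
    r = q25_mul(to_raw(3.0), to_raw(3.0))
    print(r, r / (1 << FRAC_BITS))  # 127 3.96875
```

For example, 1.5 * 2.25 gives raw 48 * 72 = 3456 (3.375 with 10 fraction bits), and 3456 >> 5 = 108 (3.375 in Q2.5). 3.0 * 3.0 = 9.0 does not fit in Q2.5 and saturates to raw 127 (3.96875).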

just remember that multiplying the sign bits effectively gives you 2 identical sign bits EXCEPT for the case -4.0 * -4.0 (more generally, when both inputs are the most negative value, -2**m). If you can guarantee this doesn't happen (e.g. you control the filter coefficients), you can simplify handling this case...
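The -4.0 * -4.0 exception is easy to confirm by brute force. This Python sketch (hypothetical helper names) treats the 8-bit Q2.5 operands as raw integers from -128 to 127, forms every 16-bit product, and lists the pairs whose top two bits differ:

```python
def top_two_bits_equal(p: int, width: int = 16) -> bool:
    """True if the two MSBs of p, as a width-bit two's complement value, are identical."""
    u = p & ((1 << width) - 1)
    return ((u >> (width - 1)) & 1) == ((u >> (width - 2)) & 1)


def exceptions(bits: int = 8):
    """All raw operand pairs whose full-width product needs both sign bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return [(a, b)
            for a in range(lo, hi + 1)
            for b in range(lo, hi + 1)
            if not top_two_bits_equal(a * b, 2 * bits)]


if __name__ == "__main__":
    print(exceptions())  # [(-128, -128)]
```

It finds only (-128, -128), i.e. -4.0 * -4.0 = 16.0. That is the one product that no longer fits once a sign bit is dropped, because s_mmmm.nnnnnnnnnn tops out just below +16.0.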

## Best Answer

In 'hardware' (VHDL or Verilog), all loops are unrolled and executed in parallel. Thus not only your inner loop but also your outer loop is unrolled.

That is also the reason why the loop size must be known at compile time. When the loop length is unknown the synthesis tool will complain.

It is a well-known trap for beginners coming from a SW language. They try to convert a software loop that adds `b` to `c` `a` times into VHDL/Verilog hardware. The problem is that it all works fine in simulation. But the synthesis tool needs to generate adders:

`c = b+b+b+b...b;`

For that, the tool needs to know how many adders to make. If `a` is a constant, fine! (Even if it is 4,000,000: it will run out of gates, but it will try!) But if `a` is a variable, it is lost.
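A toy Python model of what the synthesis tool has to build for `c = b+b+b...b` (hypothetical names). It only counts adders and is not how any real tool works internally. The number of adders is fixed by `a` when the circuit is built, which is exactly why `a` has to be a constant:

```python
def unrolled_repeated_add(a):
    """Model of the adder chain for c = b + b + ... + b (a copies of b).

    `a` must be a positive constant at build time: it fixes how many adders exist.
    Returns (number_of_adders, circuit), where circuit(b) evaluates the chain.
    """
    if not isinstance(a, int) or a < 1:
        raise ValueError("a must be a positive compile-time constant")
    n_adders = a - 1

    def circuit(b: int) -> int:
        c = b
        for _ in range(n_adders):  # each iteration stands for one physical adder
            c = c + b
        return c

    return n_adders, circuit


if __name__ == "__main__":
    n_adders, circuit = unrolled_repeated_add(4)
    print(n_adders, circuit(7))  # 3 28
```

Asking for `a = 4_000_000` happily reports 3,999,999 adders, which is the "it will run out of gates, but it will try" case. There is no way to ask for a number of adders that depends on a run-time value of `a`.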