Why does hardware division take so much longer than multiplication on a microcontroller? E.g., on a dsPIC, a division takes 19 cycles, while multiplication takes only one clock cycle.

I went through some tutorials, including *Division algorithm* and *Multiplication algorithm* on Wikipedia. Here is my reasoning.

A division algorithm, like a *slow division method with restoring* on Wikipedia, is a recursive algorithm. This means that (intermediate) results from step `k`

are used as inputs to step `k+1`

, which means that these algorithms cannot be parallelized. Therefore, it takes at least `n`

cycles to complete the division, whereas `n`

is a number of bits in a dividend. For 16-bit dividends, this is equal to at least 16 cycles.

A multiplication algorithm doesn't need to be recursive, which means that it is possible to parallelize it. However, there are many different multiplication algorithms, and I don't have a clue which one may be used by microcontrollers. How does multiplication work on a hardware/microcontroller?

I've found a Dadda multiplier algorithm, which is supposed to take only one clock cycle to finish. However, what I don't get here is that Dadda's algorithm proceeds in three steps, whereas results from step 1 are used in step 2, etc. According to this, this would take at least three clock cycles to finish.

## Best Answer

A divider maps much less elegantly to typical hardware. Take Lattice ICE40 FPGAs as examples.

Let us compare two cases: this 8x8 bit to 16 bit multiplier:

and this divider that reduces 8 and 8 bit operands to 8 bit result:

(Yes, I know, the clock doesn't

doanything)An overview of the generated schematic when mapping the

multiplierto an ICE40 FPGA can be found here and thedividerhere.The synthesis statistics from Yosys are:

## multiply

## divide

It's worth noting that the size of the generated verilog for a full-width multiplier and a maximally-dividing divider aren't that extreme. However, if you'll look at the pictures below, you'll notice the multiplier has maybe a depth of 15, whereas the divider looks more like 50 or so; the critical path (i.e. the longest path that can occur during operation) is what defines the speed!

You won't be able to read this, anyway, to get a visual impression. I think the differences in complexity are possible to spot. These are single cycle multiplier/dividers!

## Multiply

Multiply on an ICE40 (warning: ~100 Mpixel image)

## Divide

(Divide on an ICE40) (warning: ~100 Mpixel image)