My notation: n x n → m
Two operands with length "n" bits and product with length "m" bits.
1. Multiplication result depends on operands being signed /unsigned
Ex. 4x4 → 8:
Fh x Fh = E1h, unsigned integer: 15 x 15
Fh x Fh = 01h, signed integer: -1 x -1
Note: The 4 LSbs are the same for signed and unsigned multiplication.
2. Signed product Fh x 8h (-1 x -8) depends on target size
4 x 4 → 8: 08h (or 8), OK
4 x 4 → 4: 8h (or -8), INCORRECT (overflow).
Generalization: Negating (or getting the two's complement) a byte containing -128, a word containing -32768, or a double word containing -2147483648, cause no changing in the operand, but the NEG instruction will set the overflow flag (ex. for x86 architecture).
You are correct, and the diagram is wrong. You definitely do need wider adders as you move down the tree as shown. Each of the results from the first tier of adders is 34 bits wide, those from the second tier are 36 bits wide, etc. Third tier sums are 40 bits, and fourth tier sums are 48 bits.
By the time you get to the last (fifth) tier, you're adding together a 48-bit number (the sum of the high-order partial products) and a 32-bit number (the shifted sum of the low-order partial products).
Note that you get 1 LSB of the final product before the first tier of adders, another one from their output, two more LSBs from the second tier, and so forth. By the time you get to the input of the fifth tier, you already have 16 LSBs of the result, and the final adder gives you the remaining 48 bits.
While it is tempting at first glance to pull off MSBs in a similar manner, it is always possible to construct an example in which a carry in the final adder needs to propogate all the way to the MSB of the result.
Side note: I have a book Computer Architecture: A Quantitative Approach, 2nd edition, published in 1996 by the same two authors. I'm wondering whether my book is a predecessor of yours, or a completely different effort. It does not have the figure you show in your question in it.
Best Answer
If you are talking about a general, clocked, shift-and-add multiplier then the total delay will be \$NT_p\$, where \$N\$ is the number of bits and \$T_p\$ is the clock period.The clock period has to be higher than the time it takes for one N-bit addition to take place so in summary Delay = \$NT_p \geq NT_N\$ where \$T_N\$ is the time for one N-bit addition to be completed.
For carry save multiplication, Delay = \$NT_1 + T_N\$, where \$T_1\$ is the time it takes for one 1-bit addition to be computed.So comparing the two multiplication schemes we can see that carry save addition will be faster than shift-and-add multiplication for \$N > 2\$.Shift-and-add multiplication is resource efficient whilst carry save multiplication is time efficient.