Electronic – adding two 64-bits number with m-bit carry ripple adder and multiplexer, a questions

addercomputer-architecturecomputersdelaymath

I ran into a question from computer architecture class. The professor says that for adding two 64-bit numbers A and B we use m-bit carry ripple adders and multiplexers such as following:

enter image description here

and that for a given m, say 8, we get the minimum amount of delay. Why is such a design faster?

Best Answer

This is a carry select adder. The speed benefit comes from a 2:1 multiplexer selecting M bits being faster than a ripple carry adder of M bits. (Each bit from the multiplexer is only dependent on sum bit and the select bit (i.e., the carry-in), whereas the more significant bits of a ripple carry adder are dependent on the carry-out from the previous bits. The multiplexer delay would be the delay of a one-bit multiplexer plus the 1:M fan-out delay for the carry-in from the previous block into M bits.)

The more significant chunks of M bits are added in parallel with the first M bits, so their delay is zero (the additions will be completed at that same time that the first M bit addition is completed since the inputs are available at the same time and the adders are the same).

Since the latency of a simple carry select adder is the sum of the multiplexer latencies and the latency of the M-bit ripple carry adder, making M larger can reduce the number of multiplexers but increases the ripple carry adder delay. For a 64-bit adder, this latency would be Lrca + ((64/M)-1) * Lmux

In addition, since the multiplexer adds some delay (and the fan-out delay for the multiplexer does not increase linearly with M — e.g., a 7-bit multiplexer would not be 16.7% slower than a 6-bit multiplexer), ripple carry adders for more significant bits may be able to use more bits without adding delay.

Ignoring the complexities of real implementations such as size-dependent multiplexer delay, if the a one bit full adder has N times the latency of a multiplexer, then a minimum-latency 64-bit carry select adder with all stages having the same size would balance ripple carry delay (Lrca = N * M * Lmux) and multiplexer delays ((64/M -1) * Lmux). Finding N * M * Lmux = (64/M -1) * Lmux is solving the quadratic equation N * M2 + M - 64 = 0. If N = 2 (a one-bit full adder has twice the latency of a multiplexer), M = 5.41. In this case M = 5 and M = 6 both give a total latency of a 64-bit adder of 22 multiplexer delays (13 stages of 5 bits: 2 * 5 + 12; 11 stages of 6 bits: 2 * 6 + 10) or 11 one-bit adder delays.

Under the same assumptions but allowing adders of different widths, the widths would be 5, 5, 5, 6, 6, 7, 7, 8, 8, 7. (The second stage is the same width as the first stage, the third stage could be 5.5 bits but that is not an integer number of bits.) This would give a total delay 19 multiplexer delays (10 stages with an initial 5 bit ripple carry adder delay: 2 * 5 + 9) or 9.5 one-bit adder delays.

If the latency of a one-bit full adder equaled the multiplexer latency (or more correctly the latency of a M-bit carry ripple adder equaled M times the latency of a M-bit 2:1 multiplexer including the fan-out delay), then the quadratic equation to solve would be M2 + M - 64 = 0; M = 7.52, so 8-bit stages would be minimal latency (8 + 7 = 15 multiplexer delays). As the number of bits in the entire adder approaches infinity, the ideal M for a simple carry select adder approaches the square root of the number of bits (the linear term becomes insignificant).

(The multiplexer delay might also be exploited to allow more significant bits to arrive slightly later. Bits for the third stage could arrive one multiplexer delay later without increasing the latency of the 64-bit adder. A similar technique, width pipelining, divides the adder into multiple pipeline stages but with single cycle latency for dependent operations if they can start with just the least significant bits of the operands. Intel's Pentium 4 implemented such a width-pipelined (also called staggered) ALU.)

Note that such carry-select adders are not the lowest latency adders. For example, using a carry-lookahead adder instead of a ripple carry adder would reduce the delay in generating the carry bit, providing the select signal for the multiplexers earlier.