Let's say that we want to do a good job of testing this, but without going through the entire 2^32 space of possible operands. (It is not possible for such adder to have such a bug that it only affects a single combination of operands, requiring an exhaustive search of the 2^32 space, so it is inefficient to test it that way.)
If the individual adders are working correctly, and the ripple propagation between them works correctly, then it is correct.
I would giver priority to some test cases which focus on stressing the carry rippling, since the adders have been individually tested.
My first test case would be adding 1 to 1111..1111 which causes a carry out of every bit. The result should be zero, with a carry out of the highest bit.
(Every test case should be tried over both commutations: A + B and B + A, by the way.)
The next set set of test cases would be adding 1 to various "lone zero" patterns like 011...111, 1011...11, 110111..111, ..., 1111110. The presence of a zero should "eat" the carry propagation correctly at that bit position, so that all bits in the result which are lower than that position are zero, and all higher bits are 1 (and, of course, there is no final carry out of the register).
Another set of test cases would add these "lone 1" power-of-two bit patterns to various other patterns: 000...1, 0000...10, 0000...100, ..., 1000..000. For instance, if this is added to the operand 1111.1111, then all bits from that bit position to the left should clear, and all the bits below that should be unaffected.
Next, a useful test case might be to add all of the 16 powers of two (the "lone 1" vectors), as well as zero, to each of the 65536 possible values of the opposite operand (and of course, commute and repeat).
Finally, I would repeat the above two "lone 1" tests with "lone 11": all bit patterns which have 11 embedded in 0's, in all possible positions. This way we are hitting the situations that each adder is combining two 1 bits and a carry, requiring it to produce 1 and carry out 1.
A system of ripple-carry adders is a sequence of standard full adders that makes it possible to add numbers that contain more bits than that of a single full adder. Each full adder has a carryin (Cin) and a carryout (Cout) bit, and the adders are connected by connecting Cout on step k to Cin on step k+1 (Cin on step 0 is C0 in the picture, Cout on step 3 is C4 in the picture)
The challenge with ripple-carry adders, is the propagation delay of the carry bits. Assume that, in an instant the values of A and B change, such that
A1 = 0
B1 = 1
A0 = 1
B0 = 1
Since A0 and B0 are high, the first full adder will produce a carry, i. e. C1 = 1. However, it takes some time for the logic to settle down, so C1 doesn't change until a little after A1 and B1 changed. Thus, before C1 shows up, the second full adder does not produce a carry, but as C1 shows up, the second adder will recompute and produce a carry, i. e. C2 = 1. In the worst case, C4 is not correctly computed until 4*propagation delay, and Cn is not computed until n*propagation delay.
A carry-lookahead adder system solves this problem, by computing whether a carry will be generated before it actually computes the sum. There are multiple schemes of doing this, so there is no "one" circuit that constitutes a look-ahead adder. The idea is something like this:
The calculation of C4 is no faster than in the the ripple-carry above, nor is PG and GG - the magic only happens when you put several of these blocks together to add even larger numbers.
The important to note part of the picture, is that the purple block is producing three values: C4, PG (Propagate) and GG (Generate). PG goes high if this block will propagate Cin to Cout, and GG goes high if the block will generate an overflow regardless of Cin. (Also, the block may neither propegate nor generate a carry, in which case both PG and GG are low, and Cout is 0.) PG and GG can be calculated in the purple block regardless of the value of C0 - thus, when C0 finally arrives, the purple block can simply consult its previously calculated result, and if the result is a "propagate," then C0 is propagated directly to C4; this is four times faster than propagating through all the four full adders.
The reason why the block has the outputs PG and GG is so that, in a hierarchal fashion, we can acquire even greater propagation speedups.
Also see: http://faculty.kfupm.edu.sa/COE/abouh/Lesson3_3.pdf
Best Answer
This is a carry select adder. The speed benefit comes from a 2:1 multiplexer selecting M bits being faster than a ripple carry adder of M bits. (Each bit from the multiplexer is only dependent on sum bit and the select bit (i.e., the carry-in), whereas the more significant bits of a ripple carry adder are dependent on the carry-out from the previous bits. The multiplexer delay would be the delay of a one-bit multiplexer plus the 1:M fan-out delay for the carry-in from the previous block into M bits.)
The more significant chunks of M bits are added in parallel with the first M bits, so their delay is zero (the additions will be completed at that same time that the first M bit addition is completed since the inputs are available at the same time and the adders are the same).
Since the latency of a simple carry select adder is the sum of the multiplexer latencies and the latency of the M-bit ripple carry adder, making M larger can reduce the number of multiplexers but increases the ripple carry adder delay. For a 64-bit adder, this latency would be Lrca + ((64/M)-1) * Lmux
In addition, since the multiplexer adds some delay (and the fan-out delay for the multiplexer does not increase linearly with M — e.g., a 7-bit multiplexer would not be 16.7% slower than a 6-bit multiplexer), ripple carry adders for more significant bits may be able to use more bits without adding delay.
Ignoring the complexities of real implementations such as size-dependent multiplexer delay, if the a one bit full adder has N times the latency of a multiplexer, then a minimum-latency 64-bit carry select adder with all stages having the same size would balance ripple carry delay (Lrca = N * M * Lmux) and multiplexer delays ((64/M -1) * Lmux). Finding N * M * Lmux = (64/M -1) * Lmux is solving the quadratic equation N * M2 + M - 64 = 0. If N = 2 (a one-bit full adder has twice the latency of a multiplexer), M = 5.41. In this case M = 5 and M = 6 both give a total latency of a 64-bit adder of 22 multiplexer delays (13 stages of 5 bits: 2 * 5 + 12; 11 stages of 6 bits: 2 * 6 + 10) or 11 one-bit adder delays.
Under the same assumptions but allowing adders of different widths, the widths would be 5, 5, 5, 6, 6, 7, 7, 8, 8, 7. (The second stage is the same width as the first stage, the third stage could be 5.5 bits but that is not an integer number of bits.) This would give a total delay 19 multiplexer delays (10 stages with an initial 5 bit ripple carry adder delay: 2 * 5 + 9) or 9.5 one-bit adder delays.
If the latency of a one-bit full adder equaled the multiplexer latency (or more correctly the latency of a M-bit carry ripple adder equaled M times the latency of a M-bit 2:1 multiplexer including the fan-out delay), then the quadratic equation to solve would be M2 + M - 64 = 0; M = 7.52, so 8-bit stages would be minimal latency (8 + 7 = 15 multiplexer delays). As the number of bits in the entire adder approaches infinity, the ideal M for a simple carry select adder approaches the square root of the number of bits (the linear term becomes insignificant).
(The multiplexer delay might also be exploited to allow more significant bits to arrive slightly later. Bits for the third stage could arrive one multiplexer delay later without increasing the latency of the 64-bit adder. A similar technique, width pipelining, divides the adder into multiple pipeline stages but with single cycle latency for dependent operations if they can start with just the least significant bits of the operands. Intel's Pentium 4 implemented such a width-pipelined (also called staggered) ALU.)
Note that such carry-select adders are not the lowest latency adders. For example, using a carry-lookahead adder instead of a ripple carry adder would reduce the delay in generating the carry bit, providing the select signal for the multiplexers earlier.