Let's say that we want to do a good job of testing this, but without going through the entire 2^32 space of possible operands. (It is not plausible for such an adder to have a bug that affects only a single combination of operands, which is the only kind of bug that would require an exhaustive search of the 2^32 space, so testing it that way is inefficient.)
If the individual adders are working correctly, and the ripple propagation between them works correctly, then the adder as a whole is correct.
I would give priority to test cases that focus on stressing the carry rippling, since the adders have been individually tested.
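To make the structure being tested concrete, here is a minimal Python sketch of a ripple-carry adder built from one-bit full adders. The 16-bit width is inferred from the numbers used later in this answer, and the names `full_adder` and `ripple_add` are made up for illustration; this is a software model, not the actual hardware.

```python
WIDTH = 16  # assumed operand width (two 16-bit operands give the 2^32 space)

def full_adder(a, b, cin):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_add(a, b, width=WIDTH):
    """Chain `width` full adders, least significant bit first.

    Returns (sum, carry_out). The carry into bit 0 is assumed to be 0.
    """
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry
```

A single full adder has only eight input combinations, so it can be tested exhaustively on its own; what remains is the chaining, which is what the test cases in this answer go after.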
My first test case would be adding 1 to 1111...1111, which causes a carry out of every bit. The result should be zero, with a carry out of the highest bit.
(Every test case should be tried over both commutations: A + B and B + A, by the way.)
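As a rough sketch, the all-ones case and the "try both orders" rule could be written like this in Python. `dut_add` is a hypothetical stand-in for whatever drives the real adder (here it is just integer arithmetic), and the 16-bit width is an assumption.

```python
WIDTH = 16                      # assumed operand width
MASK = (1 << WIDTH) - 1         # 1111...1111

def dut_add(a, b):
    """Hypothetical stand-in for the adder under test: returns (sum, carry_out)."""
    total = a + b
    return total & MASK, total >> WIDTH

def check_both_orders(add, a, b, expected):
    """Apply one vector as A + B and as B + A; True only if both match."""
    return add(a, b) == expected and add(b, a) == expected

# Adding 1 to all ones: every bit carries, the sum is 0, carry out is 1.
all_ones_ok = check_both_orders(dut_add, MASK, 1, (0, 1))
```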
The next set of test cases would be adding 1 to various "lone zero" patterns like 0111...111, 1011...111, 1101...111, ..., 1111...110. The zero should "eat" the carry propagation at its bit position: all result bits below that position are zero, the bit at that position becomes 1, and all higher bits remain 1 (and, of course, there is no final carry out of the register).
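The lone-zero vectors and their expected results can be generated mechanically. A sketch in Python, again assuming a 16-bit width and using a hypothetical `dut_add` stand-in for the real adder:

```python
WIDTH = 16                      # assumed operand width
MASK = (1 << WIDTH) - 1

def dut_add(a, b):
    """Hypothetical stand-in for the adder under test: returns (sum, carry_out)."""
    total = a + b
    return total & MASK, total >> WIDTH

def lone_zero_vectors(width=WIDTH):
    """Yield (operand, expected_sum) for adding 1 to each lone-zero pattern."""
    mask = (1 << width) - 1
    for k in range(width):
        operand = mask & ~(1 << k)          # all ones except bit k
        expected = mask & ~((1 << k) - 1)   # bits k and up set, bits below k clear
        yield operand, expected

# Collect every (operand, order) that gives the wrong sum or a carry out.
failures = [
    (operand, order)
    for operand, expected in lone_zero_vectors()
    for order, (a, b) in (("A+B", (operand, 1)), ("B+A", (1, operand)))
    if dut_add(a, b) != (expected, 0)
]
```

With a correct adder, `failures` stays empty; an entry would point at the bit position where the carry was not absorbed.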
Another set of test cases would add these "lone 1" power-of-two bit patterns to various other patterns: 0000...001, 0000...010, 0000...100, ..., 1000...000. For instance, if one of these is added to the operand 1111...1111, then all bits from that bit position leftward should clear (with a carry out of the highest bit), and all the bits below it should be unaffected.
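For the all-ones example, the expected result has a simple closed form: adding the pattern with bit k set leaves exactly the k low bits set and produces a carry out. A short Python sketch, with the 16-bit width and the `dut_add` stand-in both being assumptions:

```python
WIDTH = 16                      # assumed operand width
MASK = (1 << WIDTH) - 1

def dut_add(a, b):
    """Hypothetical stand-in for the adder under test: returns (sum, carry_out)."""
    total = a + b
    return total & MASK, total >> WIDTH

def lone_one_patterns(width=WIDTH):
    """All single-bit (power-of-two) operands, lowest bit first."""
    return [1 << k for k in range(width)]

# (1 << k) + all-ones: bits k and above clear, bits below k stay 1, carry out is 1.
mismatches = [
    k
    for k, pattern in enumerate(lone_one_patterns())
    if dut_add(pattern, MASK) != ((1 << k) - 1, 1)
    or dut_add(MASK, pattern) != ((1 << k) - 1, 1)
]
```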
Next, a useful test case might be to add all of the 16 powers of two (the "lone 1" vectors), as well as zero, to each of the 65536 possible values of the opposite operand (and of course, commute and repeat).
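The full sweep is about 17 x 65536 vectors in each order, which is small enough to run in software. The sketch below is one way to organize it in Python; `reference_add` supplies the expected values, and `broken_add` is a deliberately faulty, made-up adder (its carry from bit 7 into bit 8 is dropped) included only to show the sweep catching a ripple bug.

```python
WIDTH = 16                      # assumed operand width
MASK = (1 << WIDTH) - 1

def reference_add(a, b):
    """Expected behaviour: returns (sum, carry_out)."""
    total = a + b
    return total & MASK, total >> WIDTH

def broken_add(a, b):
    """Hypothetical faulty adder: the carry from bit 7 into bit 8 is lost."""
    low = (a & 0xFF) + (b & 0xFF)
    high = (a >> 8) + (b >> 8)          # carry from the low byte is never added
    return ((high << 8) | (low & 0xFF)) & MASK, high >> 8

def lone_one_sweep(add, width=WIDTH):
    """Add zero and every lone-1 pattern to every value, in both orders.

    Returns the first failing (pattern, other) pair, or None if all pass.
    """
    patterns = [0] + [1 << k for k in range(width)]
    for pattern in patterns:
        for other in range(1 << width):
            expected = reference_add(pattern, other)
            if add(pattern, other) != expected or add(other, pattern) != expected:
                return pattern, other
    return None
```

Calling `lone_one_sweep(broken_add)` returns `(1, 255)`: adding 1 to 0000000011111111 is the first vector that needs the missing carry. In pure Python the full sweep may take a second or more.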
Finally, I would repeat the above two "lone 1" tests with "lone 11": all bit patterns which have 11 embedded in 0's, in all possible positions. This way we hit the situation in which an adder combines two 1 bits and a carry in, requiring it to produce a sum of 1 and a carry out of 1.
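To see why the "lone 11" patterns add coverage, the sketch below traces the carry chain and reports which bit positions see two 1 bits plus a carry in. It assumes a 16-bit adder with no external carry-in (so bit 0 can never see this case), and the function names are made up for illustration.

```python
WIDTH = 16                      # assumed operand width
MASK = (1 << WIDTH) - 1

def lone_11_patterns(width=WIDTH):
    """All patterns with two adjacent 1 bits embedded in zeros."""
    return [0b11 << k for k in range(width - 1)]

def triple_one_positions(a, b, width=WIDTH):
    """Bit positions where a full adder sees a=1, b=1 and carry-in=1."""
    hits, carry = [], 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        if x and y and carry:
            hits.append(i)
        carry = (x & y) | (carry & (x ^ y))   # carry out of this bit
    return hits

# Adding each lone-11 pattern to itself hits the 1+1+carry case at the upper bit.
covered = sorted(
    {pos for p in lone_11_patterns() for pos in triple_one_positions(p, p)}
)
```

Here `covered` comes out as bits 1 through 15. A lone-1 operand, by contrast, has only zeros below its set bit, so no carry can arrive at that bit and this case is never exercised by the lone-1 tests alone.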
There are flip-flops and there are flip-flops.
The RTL (resistor-transistor logic) schematic you show is a simple bistable multivibrator that is either set or reset by pulses on the E1 and E2 inputs. For example, pulsing E1 high will cause A1 to go low and A2 to go high.
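The set/reset behaviour can be mimicked with a tiny logic-level model. This is only a sketch: it assumes each transistor stage acts as a two-input NOR (A1 = NOR(E1, A2), A2 = NOR(E2, A1)), which matches the behaviour described above but is not taken from the schematic itself, and it ignores all analog effects.

```python
def nor(a, b):
    """Two-input NOR on 0/1 values."""
    return 1 - (a | b)

def settle(e1, e2, a1, a2):
    """Re-evaluate the cross-coupled pair a few times until it stops changing."""
    for _ in range(3):
        a1 = nor(e1, a2)
        a2 = nor(e2, a1)
    return a1, a2

state = (1, 0)                      # start with A1 high, A2 low
state = settle(1, 0, *state)        # pulse E1 high: A1 goes low, A2 goes high
after_e1_pulse = state
state = settle(0, 0, *state)        # release E1: the state is held
held = state
state = settle(0, 1, *state)        # pulse E2 high: the circuit flips back
after_e2_pulse = state
```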
"Elements of Computing Systems" is talking about a different kind of flip-flop: the master-slave edge-triggered flip-flop. Rather than being driven by pulses, this kind of flip-flop reacts to the rising edge of a (typically) square-wave clock signal. The output of the flip-flop immediately after such a clock edge matches the input right before that same clock edge. This is where the t and t-1 notation comes from.
In its easiest-to-understand form, the D-type master-slave flip-flop consists of eight NAND gates (or eight NOR gates in RTL) and two inverters. As you might guess, this gets cumbersome to draw as a schematic using resistors and transistors. It's much easier to draw the schematic for one gate, and then use a symbol to represent that logical function in higher-order structures.
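As an illustration, here is a zero-delay Python model of one possible wiring that matches that gate count: a gated D latch as the master (transparent while the clock is low) and a gated SR latch as the slave (transparent while the clock is high), with one inverter on D and one on the clock. The exact arrangement is an assumption, since the count alone does not pin it down, and the class and method names are made up.

```python
def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 1 - (a & b)

class MasterSlaveDFF:
    """Rising-edge D flip-flop from 8 NAND gates and 2 inverters (idealized)."""

    def __init__(self):
        self.master = (0, 1)    # (q, q_bar) of the master's cross-coupled pair
        self.slave = (0, 1)     # (q, q_bar) of the slave's cross-coupled pair

    @staticmethod
    def _latch(set_n, reset_n, state):
        """Cross-coupled NAND pair with active-low set/reset inputs."""
        q, q_bar = state
        for _ in range(3):      # re-evaluate until the pair settles
            q = nand(set_n, q_bar)
            q_bar = nand(reset_n, q)
        return q, q_bar

    def apply(self, clk, d):
        """Present clk and d to the circuit and return the Q output."""
        clk_n = 1 - clk         # inverter 1: master sees the inverted clock
        d_n = 1 - d             # inverter 2: complement of D for the reset side
        self.master = self._latch(nand(d, clk_n), nand(d_n, clk_n), self.master)
        qm, qm_n = self.master
        self.slave = self._latch(nand(qm, clk), nand(qm_n, clk), self.slave)
        return self.slave[0]

ff = MasterSlaveDFF()
# (clk, d) steps: D=1 while low, rising edge, D changes while high, low, rising edge
trace = [ff.apply(clk, d) for clk, d in [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]]
```

In the trace, Q changes only at the two rising edges and takes the value D had just before each edge, which is the t and t-1 relationship; the change of D while the clock is high is ignored.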
However, in the days when computers were really built using discrete transistors, master-slave logic was relatively rare. Instead, multi-phase clocks were generated so that the simpler pulse-driven flip-flops could be used, keeping the overall circuit complexity down.
Best Answer
Part of the disconnect may be coming from the "H-Tree" clock distribution network being a physical layout concept in addition to a schematic-level concept. The key part of the H-tree is that, since every branch is physically similar and has the same number of clock loads, the actual clock skew will be minimized. Another key point is that you would include inverters even if their outputs are not used, just to keep the distribution network balanced.
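The "every branch is physically similar" property is easy to see from the geometry alone. The Python sketch below generates the leaf positions of an idealized H-tree and the wire length from the root to each leaf. It models only the wiring (no buffers, loads, or real routing constraints), and the function name, arm lengths, and level count are illustrative assumptions.

```python
def h_tree_leaves(x, y, half, levels, horizontal=True, path=0.0):
    """Yield (x, y, wire_length_from_root) for each leaf of an idealized H-tree.

    Each level splits the wire in two, alternating between horizontal and
    vertical arms; the arm length is halved after every vertical split.
    """
    if levels == 0:
        yield x, y, path
        return
    dx, dy = (half, 0.0) if horizontal else (0.0, half)
    next_half = half if horizontal else half / 2
    for sign in (-1, 1):
        yield from h_tree_leaves(
            x + sign * dx, y + sign * dy,
            next_half, levels - 1, not horizontal, path + half,
        )

leaves = list(h_tree_leaves(0.0, 0.0, 4.0, 4))
path_lengths = {length for _, _, length in leaves}
```

With four levels this gives 16 leaves on a 4 x 4 grid, and `path_lengths` contains a single value, meaning every leaf sits at the same wire distance from the clock source.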
You need clock buffering on anything other than a small design, as each flip-flop clock pin adds load that "slows down" the clock transition, to the point that your flip-flops run slower and have a more ambiguous switching time. Significant clock skew in your design can cause you to pull your hair out during static timing analysis and verification.
If I were implementing this design in an ASIC, I might instruct the place and route tool to generate the clock network with an "H-tree" layout. If you have to turn in a (hand designed) physical layout, then you have a guideline to work from. If you just have to turn in a schematic... I guess show the clock buffer tree.