The worst-case scenario for a Ripple-Carry Adder (RCA) is when the LSB generates a carry out, and the carry ripples through the entire adder from bit 0 to bit (N - 1). An example pattern would be 00000001 + 11111111. In adder terminology, bits 7-1 are "Propagators", and bit 0 is a "Generator". The critical path runs from the carry-out of the LSB to the carry-out of the MSB, and every full adder lies on it.
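To make the generate/propagate terminology concrete, here is a small Python sketch that labels each bit position and measures the longest carry chain. The function names and the "K" (kill) label are my own choices for illustration, not taken from any particular tool.

```python
def classify_bits(a: int, b: int, width: int = 8) -> list[str]:
    """Label each bit position, LSB first, as generate (G), propagate (P), or kill (K)."""
    labels = []
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        if ai & bi:
            labels.append("G")  # both inputs 1: carry out is 1 regardless of carry in
        elif ai ^ bi:
            labels.append("P")  # exactly one input 1: carry out equals carry in
        else:
            labels.append("K")  # both inputs 0: carry out is always 0
    return labels


def longest_carry_chain(a: int, b: int, width: int = 8) -> int:
    """Length of the longest run made of one generator followed by propagators."""
    longest = 0
    current = 0  # length of the chain that currently carries a live carry
    for label in classify_bits(a, b, width):
        if label == "G":
            current = 1          # a new carry starts here
        elif label == "P" and current > 0:
            current += 1         # the live carry ripples one more stage
        else:
            current = 0          # killed, or nothing to propagate
        longest = max(longest, current)
    return longest


print(classify_bits(0b00000001, 0b11111111))        # ['G', 'P', 'P', 'P', 'P', 'P', 'P', 'P']
print(longest_carry_chain(0b00000001, 0b11111111))  # 8
```

For 00000001 + 11111111 the chain covers all 8 stages, which is the worst case described above.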
The idea behind a Carry-Skip Adder (CSA) is to reduce the length of this critical path by giving the carry path a shortcut if all bits in a block would propagate a carry. A block-wide propagate signal is fairly easy to compute, and each block can calculate its own propagate signal simultaneously. So the worst case is still the same scenario, but what happens looks a bit different.
Let's say we still have the same problem of 0000......001 + 0111.....111. The first block will generate a carry in the first bit and propagate it through bits 1, 2, and 3. At this point, the first block's carry-out signal is valid. The block propagate select signals are already valid, since they take 2-3 gate delays and the carry signal takes 4 gate delays. The carry-in multiplexer for bits 8-11 gets the carry signal from the carry-out of bit 3 since bits 4-7 would propagate a carry. Note that this takes 1 gate delay, while a normal RCA would take 4 gate delays. Each skipped block adds 1 gate delay to the carry signal.
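The block-wide propagate signals can be sketched in a few lines of Python. This assumes a 16-bit adder split into 4-bit blocks, and the function name is hypothetical; a block propagates only if every bit in it has exactly one input set.

```python
def block_propagates(a: int, b: int, width: int = 16, block_size: int = 4) -> list[bool]:
    """Return one propagate flag per block, lowest block first."""
    p = a ^ b                      # per-bit propagate: exactly one input is 1
    mask = (1 << block_size) - 1   # e.g. 0b1111 for a 4-bit block
    return [((p >> lo) & mask) == mask for lo in range(0, width, block_size)]


# 16-bit instance of the pattern above: 0000...001 + 0111...111
print(block_propagates(0x0001, 0x7FFF))  # [False, True, True, False]
```

The first block does not propagate because bit 0 generates, the two middle blocks can be skipped, and the last block does not propagate because both MSB inputs are 0.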
If the MSB killed carry propagation, then that would cause the last CSA block to ripple the incoming carry through its bits, which would take another 4 gate delays. This setup of an LSB generate and an MSB kill is the new worst case. The source of the critical path is the same for the RCA and the CSA, but the path itself is different.
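As a rough sanity check on those numbers, here is a unit-delay model in Python. It assumes 1 gate delay per rippled bit and 1 gate delay per skip multiplexer, as in the simplified counts above; real gate delays depend on the implementation, and the function names are made up for this sketch.

```python
def rca_worst_case(n_bits: int, t_carry: int = 1) -> int:
    """Worst case for a ripple-carry adder: the carry crosses every bit."""
    return n_bits * t_carry


def carry_skip_worst_case(n_bits: int, block_size: int = 4,
                          t_carry: int = 1, t_mux: int = 1) -> int:
    """Worst case: LSB generates, middle blocks are skipped, last block ripples."""
    if n_bits % block_size != 0:
        raise ValueError("n_bits must be a multiple of block_size")
    n_blocks = n_bits // block_size
    if n_blocks == 1:
        return block_size * t_carry          # a single block is just an RCA
    first = block_size * t_carry             # ripple through the first block
    skipped = (n_blocks - 2) * t_mux         # one mux per skipped middle block
    last = block_size * t_carry              # ripple through the last block
    return first + skipped + last


print(rca_worst_case(16))         # 16
print(carry_skip_worst_case(16))  # 10
```

With only two blocks there is nothing to skip, so the model gives the same delay as the RCA; the benefit shows up as the number of middle blocks grows.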
If an arbitrary block generated a carry by itself, the carry will always propagate to the next block. However, if the second block generates a carry itself, or kills the carry, then that is the end of the critical path. If the second block propagates the carry, then we see the advantage of the CSA architecture.
Also, when the term "critical path" is used, it generally implies that you are considering the set of inputs that causes the worst-case delay. The scenarios you describe are "ugly" cases that may have a large delay, but not the largest delay.
The carry input for each adder subunit (marked "PFA") is located on the bottom of the subunit schematic. It gets injected via the ripple carry subunit, which is duplicated for each adder unit.
Now I see what the issue is.
The reason the CLC has G and P outputs is for cascading into another CLC so that higher-order carries can be looked ahead (lookaheaded?). However:
If there are only four bits in the adder, then the logic circuit used for C1 can be used to generate C4 from these two outputs; we will later refer to the C1 logic block as OC (Output Carry) for generating the output carry from an adder, in this case, C4.
So, you need to duplicate the AND and OR gates at the LSb of the CLC in order to get C4 from G0-3 and P0-3.
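In other words, C4 = G0-3 + P0-3·C0, which is the same AND/OR structure as C1 = G0 + P0·C0. A quick Python sketch, using the standard group generate/propagate equations for a 4-bit slice (the function names are mine), checks this against plain addition:

```python
def group_generate_propagate(a: int, b: int) -> tuple[int, int]:
    """Return (G0-3, P0-3) for two 4-bit operands."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # per-bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # per-bit propagate
    group_g = (g[3]
               | (p[3] & g[2])
               | (p[3] & p[2] & g[1])
               | (p[3] & p[2] & p[1] & g[0]))
    group_p = p[3] & p[2] & p[1] & p[0]
    return group_g, group_p


def output_carry(group_g: int, group_p: int, c0: int) -> int:
    """Same AND/OR structure as C1 = G0 + P0*C0, applied to the group signals."""
    return group_g | (group_p & c0)


# Exhaustive check against ordinary addition for every 4-bit input pair.
for a in range(16):
    for b in range(16):
        for c0 in (0, 1):
            gg, gp = group_generate_propagate(a, b)
            assert output_carry(gg, gp, c0) == ((a + b + c0) >> 4) & 1
```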
Best Answer
The number of full-adders required for a carry look-ahead adder is the same as for a ripple carry adder, and in both cases it is just the number of bits to be added. The difference between these adders is how the carry signal is generated.
The total propagation delay for the ripple carry adder is essentially equal to the number of bits times the delay from carry-in to carry-out for a single full adder. It's harder to calculate the total propagation delay for a CLA because you have to know the delay through the CLA logic, which is not directly related to the propagation delay through a full adder cell. The CLA logic is usually implemented as 4-bit slices, so the important parameter would be the delay from carry-in to carry-out for the slice, multiplied by the number of slices needed.
Of course, these are rough estimates that are generally applicable when the number of bits is large (say 32 or more) and the overall delay is dominated by the carry propagation.
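Those two estimates can be written down as a short Python sketch. The delay values in the example are placeholders I picked for illustration, not figures for any real part, and the model only covers cascaded slices, ignoring everything except carry propagation.

```python
import math


def rca_delay(n_bits: int, t_fa_carry: float) -> float:
    """Ripple carry: number of bits times carry-in to carry-out delay of one full adder."""
    return n_bits * t_fa_carry


def cla_delay(n_bits: int, t_slice_carry: float, slice_bits: int = 4) -> float:
    """Cascaded CLA slices: number of slices times carry-in to carry-out delay of one slice."""
    n_slices = math.ceil(n_bits / slice_bits)
    return n_slices * t_slice_carry


# Placeholder delays in arbitrary units (assumed values, for illustration only).
print(rca_delay(32, t_fa_carry=2.0))     # 64.0
print(cla_delay(32, t_slice_carry=3.0))  # 24.0
```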