While internally computing all the answers, and then using a mux to select among them will work, it certainly is not a minimal design.
Consider that you can bit-slice the problem; instead of a single block of logic with two 8 bit inputs, you could partition this as two 4-bit sections, as long as you can link them to get a correct overall result. Fortunately, linking the slices is no worse than a single bit, which in the case of addition represents the carry bit. So each 4-bit slice has a carry-in bit and a carry-out bit. (Note that logicals like AND and NOR won't even need this, though if later on you implement left/right shifts, this bit is easily re-purposed).
Carried to an extreme you could use 8 slices of 1-bit each. It's useful to think about the 1-bit slices, because it makes it easier to think about an approach that scales back up to larger slices. So with a 1-bit slice, you have just 7 inputs: the 4 bit function code, a bit from input A, a bit from input B, and a carry-in bit. You also have just two outputs: function out, and carry out. So now you can write the two output functions in terms of just 7 inputs, which is within the realm of human ability to reasonably reduce. You'll end up with a handful of gates that won't necessarily always compute all the functions, but it doesn't matter what happens within the slice, only that it produces the correct result when viewed from outside.
Now you can go a couple of ways. One way is to simply use 8 of these 1-bit slices and you're done. Another way is to make larger slices and then use those. Going from 1-bit to 2-bits, the equations go from 7 inputs to 9, and 4-bits will require functions of 13 inputs. It's not necessarily easy, but will give more compact results than the compute-everything-then-mux approach. Besides, if you look at the internals of a 74181 4-bit ALU slice, you won't see a mux in there.
It is possible that there is a "clock" line which must get an impulse for each of the 4 carry full adders until the right result is available at the output.
If you have a clock pin somewhere feed it manually with at least 5 impulses or connect it to a continuous clock signal. Let the input lines be static and check if the carry is correct...
Best Answer
There are two common ways for flags to influence instructions:
influence on the control flow, in the form of a condition skip (eg. skip the next instruction if the carry flag is set), or conditional jmp/goto (conditional call and condtional return are more rare but do exist)
influence on the data, which is most common in shifts (shift in from the carry bit, and/or shift out to the carry bit), an add-with-carry instruction (and likewise subtract-with-carry) which are both very usefull in implementing shits, add, and substract on data that is larger than the natural size of the CPU
The ARM achitecture took the first idea to the extreme: every instruction had a condition field that specified under which flag settings the instruction was executed. If the flag settings were different, the instruction had no effect (executed as a NOP). This avoided a lot of branches (both conditional and non-conditional) in the control flow, which is important for a pipelined CPU.