The master-slave arrangement doesn't strictly solve the metastability issue, AFAICT. It is commonly used to cross between different clock domains of synchronous logic, but I don't quite see what improvement it offers on a purely asynchronous input (the slave gets a clear state, but that state may be derived from a metastable transition anyway). It could simply be an incomplete description, as you could add a hysteresis function by combining the outputs of the two registers.
As for the differences between SR, JK, D or even T flip-flops, it tends to boil down to which inputs are asynchronous. The simplest SR latches do not toggle with S=R=1, but simply keep whichever state was held last (or in the worst case, oscillate with a gate delay); that's the race. The JK, on the other hand, will transition on the clock edge - synchronous behaviour. It is thus in their nature that a T register can only be synchronous, and an asynchronous D latch is transparent while latching. The SR register you describe doesn't have the T function, which can be useful depending on the application. For instance, a ripple counter can be described purely with T registers. Simply put, the JK gives you a complete set of operations (set, clear, toggle, and no-op) without costing an extra control line.
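To make the four JK operations and the ripple counter concrete, here is a minimal behavioural sketch in Python. The function names `jk_next` and `ripple_count` are made up for illustration, and the model ignores all timing: it only describes what the outputs are after each clock edge or input pulse, assuming each counter stage toggles on the falling edge of the stage before it.

```python
def jk_next(q: int, j: int, k: int) -> int:
    """Next state of a JK flip-flop at the active clock edge."""
    if j and k:
        return 1 - q      # J=K=1: toggle
    if j:
        return 1          # J=1, K=0: set
    if k:
        return 0          # J=0, K=1: clear
    return q              # J=K=0: hold (no-op)


def ripple_count(bits: int, pulses: int) -> list[int]:
    """Count input pulses with a chain of T flip-flops (T tied to 1).

    Stage 0 toggles on every input pulse; every later stage toggles
    when the previous stage's output falls from 1 to 0.
    Returns the stage outputs, least significant bit first.
    """
    q = [0] * bits
    for _ in range(pulses):
        for i in range(bits):
            q[i] ^= 1
            if q[i] == 1:   # output rose 0 -> 1: no falling edge, ripple stops
                break
    return q


if __name__ == "__main__":
    print(ripple_count(4, 5))   # stage outputs after 5 pulses, LSB first
```

Note that the two control inputs J and K encode all four operations, which is the "no extra control line" point above.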
In synchronous logic, we frequently use wide sets of registers to implement a larger function. It doesn't strictly matter there if we use D, T, JK or whatever registers, as we can just redesign the logic function that drives them to include feedback (unless we need to build that logic - i.e. in 74 family logic). That's why FPGAs and such tend to have only D registers in their schematic representations. What does matter is that the register itself introduces the synchronous operation - steady state until the next clock. This allows combining plenty of side-by-side registers or ones with feedback functions.
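As a rough illustration of "redesign the logic function that drives them", the sketch below puts a small feedback function in front of a plain D register so that it behaves like a T or a JK register. The helper names are hypothetical, and the register is modelled as nothing more than "Q takes the value of D at the clock edge".

```python
def d_input_for_t(q: int, t: int) -> int:
    """Feedback logic that makes a D register act as a T register: D = Q xor T."""
    return q ^ t


def d_input_for_jk(q: int, j: int, k: int) -> int:
    """Feedback logic that makes a D register act as a JK register:
    D = (J and not Q) or (not K and Q)."""
    return (j & (1 - q)) | ((1 - k) & q)


def clock_edge(d: int) -> int:
    """A D register simply copies D to Q at the clock edge."""
    return d


if __name__ == "__main__":
    q = 0
    # (J, K) pairs: set, hold, toggle, toggle, clear
    for j, k in [(1, 0), (0, 0), (1, 1), (1, 1), (0, 1)]:
        q = clock_edge(d_input_for_jk(q, j, k))
        print(f"J={j} K={k} -> Q={q}")
```

The register type only changes how much logic sits in front of it, which is why a fabric with D registers and programmable logic loses nothing.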
As for the choice between delayed-pulse and clock-synchronous logic, it's not an automatic one. Some early computers (e.g. PDP-1) and even some highly energy efficient ones (e.g. GreenArrays) use the delayed-pulse design, and it is in fact comparable to a pipelined design in synchronous logic. The Carry-Save adder demonstrates the crucial difference - it's a pipelined design where you actually don't have a known value, not even an intermediate one, until the pulse from the last new value to enter has come out the other end. If you know at the logic design stage that accumulation is repeated but only the final sum is used, it may be the best choice. Meanwhile, FPGAs are typically designed with only a few clock nets and therefore do not adapt well to delayed-pulse logic (though it can be approximated with clock gating).
I hope this is more helpful than further confusing... interesting questions!
Best Answer
A race condition is a timing-related phenomenon. A standard S-R FF (two cross-coupled NAND or NOR gates) is stable for any stable input.
The 'fun' is in the S=1 R=1 input, the memory situation. The state of the FF depends on which state came before the 11: if it was 01 the FF is in the Q=1 state, if it was 10 the FF is in the Q=0 state. This is the classical memory effect of a FF.
But if it was 00 and both inputs changed to 1 sufficiently close to each other in time, the FF can enter a metastable state, which can last significantly longer than the delay time of the gates. In this state the outputs can either slowly drift towards their final state, or show a damped oscillation before settling on the final state. The time required to settle is unbounded, but has a distribution that quickly falls off for t >> gate-delay.
In normal operation, from 00 input, one input becomes 1, and the feedback loop in the flipflop propagates this (or rather, the remaining 0 input) through both gates, until the FF is in a stable state. When the other input also turns 1 while the propagation from the first is still taking place, that also starts to propagate, and it is anyone's guess which one will win. In some cases neither wins immediately, and the FF enters the metastable state.
The race condition is that, from a 00 input state, one input changes to 1, and the second one also changes to 1 before the effect of the first change has settled. Now the effects of the two changes are 'racing' for priority.
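The race can be seen in a toy unit-delay simulation of the cross-coupled NAND pair. This is only a sketch under strong assumptions: both gates have exactly the same delay and update at the same instant, signals are purely digital, and the function names are made up. Real gates are analog and never perfectly matched, so instead of the endless oscillation this idealised model produces, a real latch shows the drift or damped oscillation described above and then settles.

```python
def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def simulate(inputs, q=1, qn=1):
    """Step a cross-coupled NAND latch with one unit of delay per gate.

    `inputs` is a sequence of (s, r) pairs, one per time step, using the
    same convention as the text (11 = hold, 00 forces both outputs to 1).
    Both gates compute their new output from the previous outputs.
    Returns the list of (q, qn) pairs after each step.
    """
    trace = []
    for s, r in inputs:
        q, qn = nand(s, qn), nand(r, q)   # right-hand side uses the old q, qn
        trace.append((q, qn))
    return trace


if __name__ == "__main__":
    # One input rises a full gate delay before the other: settles cleanly.
    print(simulate([(0, 0), (1, 0), (1, 1), (1, 1), (1, 1)]))
    # Both inputs rise in the same step: the outputs chase each other.
    print(simulate([(0, 0), (1, 1), (1, 1), (1, 1), (1, 1)]))
```

In the staggered case the latch lands in the Q=0 state and stays there; in the simultaneous case neither gate "wins" and the outputs alternate between 11 and 00 on every step.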
The explanation stated is for a simple Set-Reset FF (or latch, or whatever you want to call it). A level-triggered circuit (I would call that a Latch) can be thought of as an RS-FF with both inputs gated by the enable input (CLK in this diagram):
In this circuit, a simultaneous 00 -> 11 transition of the hidden 'inputs' of the cross-coupled NANDs still causes a race condition. Such a transition can occur (due to the delay caused by the inverter) when the D input changes simultaneously with the CLK input changing from 1 to 0.
A real clocked (edge-triggered) memory circuit can be thought of as consisting of two latches, enabled by the opposite clock levels (master-slave arrangement). Obviously the first latch is still susceptible to the same race condition.
PS: I found the pictures by googling; they come from "How 1-bit was stored in Flip flop?" :)