The master-slave arrangement doesn't strictly solve the metastability issue, AFAICT. It is commonly used to cross between different clock domains of synchronous logic, but I don't quite see what improvement it offers for a purely asynchronous input (the slave gets a well-defined state, but that state may still be derived from a metastable transition). It could simply be an incomplete description, as you could add a hysteresis function by combining the outputs of the two registers.
As for the differences between SR, JK, D or even T flip-flops, it tends to boil down to which inputs are asynchronous. The simplest SR latches do not toggle with S=R=1, but simply keep whichever state was kept last (or in the worst case, oscillate with a gate delay); that's the race. The JK, on the other hand, will transition on the clock edge, which is synchronous behaviour. It is thus in their nature that a T register can only be synchronous, and an asynchronous D latch is transparent while latching. The SR register you describe doesn't have the T function, which can be useful depending on the application. For instance, a ripple counter can be described purely with T registers. Simply put, the JK gives you a complete set of operations (set, clear, toggle, and no-op) without costing an extra control line.
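To make the "complete set of operations" concrete, here is a minimal behavioural sketch in Python. It assumes the usual textbook JK characteristic equation and a ripple counter whose stages are clocked by the falling edge of the previous stage; the function names are made up for illustration, and no gate-level timing is modelled.

```python
def jk_next(q: int, j: int, k: int) -> int:
    """State of a JK flip-flop after an active clock edge.

    Characteristic equation: Q+ = (J and not Q) or (not K and Q).
    """
    return (j & (1 - q)) | ((1 - k) & q)


# The four input combinations and the operation each one selects.
JK_OPS = {(0, 0): "no-op", (1, 0): "set", (0, 1): "clear", (1, 1): "toggle"}


def ripple_count(n_bits: int, pulses: int) -> int:
    """Ripple counter made of T registers with T tied to 1.

    Stage 0 is clocked by the input pulses; every other stage is assumed
    to be clocked by the falling edge (1 -> 0) of the stage before it.
    """
    bits = [0] * n_bits
    for _ in range(pulses):
        for i in range(n_bits):
            bits[i] ^= 1           # this stage toggles
            if bits[i] == 1:       # it went 0 -> 1: no falling edge,
                break              # so the ripple stops here
    return sum(b << i for i, b in enumerate(bits))


if __name__ == "__main__":
    for (j, k), name in JK_OPS.items():
        print(f"J={j} K={k} {name:6s} Q=0 -> {jk_next(0, j, k)}  Q=1 -> {jk_next(1, j, k)}")
    print(ripple_count(4, 11))
```

Setting J=K=T turns the JK into a T register, which is why the counter needs nothing else.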
In synchronous logic, we frequently use wide sets of registers to implement a larger function. It doesn't strictly matter there if we use D, T, JK or whatever registers, as we can just redesign the logic function that drives them to include feedback (unless we need to build that logic ourselves, e.g. in 74-family logic). That's why FPGAs and such tend to have only D registers in their schematic representations. What does matter is that the register itself introduces the synchronous operation: steady state until the next clock. This allows combining plenty of side-by-side registers or ones with feedback functions.
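As a rough illustration of "redesign the logic function to include feedback", the sketch below builds a synchronous binary counter out of nothing but D registers. The T behaviour lives entirely in the driving logic (D_i = Q_i XOR T_i). The names `counter_logic` and `clock_edge` are invented for this example, and the model is purely behavioural.

```python
from typing import Callable, Tuple

State = Tuple[int, ...]  # register outputs, least significant bit first


def counter_logic(state: State) -> State:
    """Combinational logic that drives the D inputs.

    T_i is the AND of all lower bits (T_0 = 1), and D_i = Q_i XOR T_i,
    so each D register behaves like a T register through feedback.
    """
    d = []
    t = 1
    for q in state:
        d.append(q ^ t)
        t &= q
    return tuple(d)


def clock_edge(state: State, logic: Callable[[State], State]) -> State:
    """All D registers load their inputs on the same clock edge."""
    return logic(state)


def to_int(state: State) -> int:
    return sum(b << i for i, b in enumerate(state))


if __name__ == "__main__":
    state = (0, 0, 0)
    for _ in range(5):
        state = clock_edge(state, counter_logic)
    print(to_int(state))
```

Swapping `counter_logic` for any other next-state function changes what the register bank does without touching the registers themselves.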
As for the choice between delayed-pulse and clock-synchronous logic, it's not an automatic one. Some early computers (e.g. PDP-1) and even some highly energy-efficient ones (e.g. GreenArrays) use the delayed-pulse design, and it is in fact comparable to a pipelined design in synchronous logic. The carry-save adder demonstrates the crucial difference: it's a pipelined design where you actually don't have a known value, not even an intermediate one, until the pulse from the last new value to enter has come out the other end. If you know at the logic design stage that you will accumulate repeatedly but only the final sum is used, it may be the best choice. Meanwhile, FPGAs are typically designed with only a few clock nets and therefore do not adapt well to delayed-pulse logic (though it can be approximated with clock gating).
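The carry-save point is easier to see with numbers. The sketch below only models the arithmetic (it says nothing about pulses or timing) and assumes non-negative integers; `carry_save_step` and `accumulate` are illustrative names. The running state is a pair (sum bits, carry bits), and neither half is the true total until one final carry-propagating add.

```python
from typing import Iterable, Tuple


def carry_save_step(s: int, c: int, x: int) -> Tuple[int, int]:
    """Fold one new operand into the (sum, carry) pair.

    Each bit position is an independent full adder, so no carry ripples
    across the word. Invariant: new_s + new_c == s + c + x.
    """
    new_s = s ^ c ^ x
    new_c = ((s & c) | (s & x) | (c & x)) << 1
    return new_s, new_c


def accumulate(values: Iterable[int]) -> int:
    """Carry-save accumulate, then resolve once at the very end."""
    s, c = 0, 0
    for x in values:
        s, c = carry_save_step(s, c, x)
    return s + c  # the only carry-propagating addition


if __name__ == "__main__":
    s, c = 0, 0
    for x in (3, 5):
        s, c = carry_save_step(s, c, x)
    print(s, c, s + c)   # s alone is not the running total
    print(accumulate([3, 5, 7, 11]))
```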
I hope this is more helpful than further confusing... interesting questions!
Internally, a flip-flop (the term includes everything from simple D latches to more complex edge-triggered J-K master-slave flip-flops) is an asynchronous state machine. It is created by combining ordinary logic gates with feedback.
For example, here's one way to construct a master-slave D flip-flop:
[Schematic: master-slave D flip-flop, created using CircuitLab]
Each of the internal sections is a simple set-reset latch with an enable input. Because the two enables are driven with opposite levels of the "CLK" input, the output can only change state on the rising edge of CLK.
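A behavioural sketch of that arrangement in Python may help. It assumes each section is a gated SR latch fed with S=D and R=not D, with the master enabled while CLK is low and the slave while CLK is high. Class and method names are invented for the example, and gate delays are ignored.

```python
class GatedSRLatch:
    """Set-reset latch with an enable input (behavioural model)."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, s: int, r: int, enable: int) -> int:
        if enable and s and not r:
            self.q = 1
        elif enable and r and not s:
            self.q = 0
        # enable low (or S=R=0): hold the previous state
        return self.q


class MasterSlaveDFF:
    """Two gated latches whose enables are opposite levels of CLK."""

    def __init__(self) -> None:
        self.master = GatedSRLatch()
        self.slave = GatedSRLatch()

    def update(self, d: int, clk: int) -> int:
        # Master follows D only while CLK is low.
        m = self.master.update(d, 1 - d, enable=1 - clk)
        # Slave copies the master only while CLK is high.
        return self.slave.update(m, 1 - m, enable=clk)


if __name__ == "__main__":
    ff = MasterSlaveDFF()
    # (clk, d) samples: D changes while CLK is high and while it is low.
    wave = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
    print([ff.update(d, clk) for clk, d in wave])
```

In this trace Q only moves at the samples where CLK goes from 0 to 1, and it takes the value D had just before that edge.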
Note that while this design is conceptually simple to understand, it is NOT typical of how commercial chips (e.g., 7400-series) are constructed internally. If you study SSI/MSI databooks (the older TI books were especially good), you'll see several other ways to construct flip-flops from gates.
Once you have an edge-triggered flip-flop of any sort, you can use it (or multiple copies of it) to create synchronous state machines that only make transitions on clock edges.
Best Answer
The two flip-flops are used to avoid issues with metastability. The idea is that the first flip-flop has some small probability of going metastable, but if it does, it's much less likely that the metastable state will propagate to the second flip-flop.
A "hardened" flip-flop has higher internal gain than an "ordinary" flip-flop, which means that a metastable state should decay more quickly, reducing the chances of propagation. However, I've only ever seen "hardening" discussed in the context of custom IC design, and I'm not aware of hardened devices that are available as discrete devices.
Putting a delay between the flip-flops is actually counterproductive, as this would reduce the window of time for the metastable state to decay.
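To put rough numbers on this, the commonly quoted synchronizer estimate is MTBF = exp(t_r / tau) / (T_w * f_clk * f_data), where t_r is the time available for the metastable state to resolve. The sketch below just evaluates that expression. All device parameters in it (tau, T_w, setup time, frequencies) are made-up illustrative values, not data for any real part.

```python
import math


def synchronizer_mtbf(t_resolve: float, tau: float, t_window: float,
                      f_clk: float, f_data: float) -> float:
    """Mean time between synchronizer failures, in seconds.

    t_resolve: time the first flip-flop has to settle before it is sampled
    tau:       regeneration time constant of the flip-flop
    t_window:  width of the metastability capture window
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)


# Hypothetical numbers, chosen only to show the trend.
F_CLK = 100e6      # 100 MHz sampling clock
F_DATA = 1e6       # 1 MHz asynchronous input toggle rate
TAU = 50e-12       # assumed regeneration time constant
T_WINDOW = 100e-12 # assumed capture window
T_SETUP = 0.5e-9   # assumed setup time of the second flip-flop

period = 1.0 / F_CLK
no_delay = synchronizer_mtbf(period - T_SETUP, TAU, T_WINDOW, F_CLK, F_DATA)
with_delay = synchronizer_mtbf(period - T_SETUP - 2e-9, TAU, T_WINDOW, F_CLK, F_DATA)

if __name__ == "__main__":
    print(f"ratio no_delay / with_delay: {no_delay / with_delay:.3e}")
```

Because the resolution time sits in the exponent, any delay inserted between the two flip-flops costs MTBF exponentially, which is the quantitative version of "counterproductive".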
So, just use two flip-flops of whatever technology best fits in with the rest of your circuit.