The master-slave arrangement doesn't strictly solve the metastability issue, AFAICT. It is commonly used to cross over between different clock domains of synchronous logic, but I don't quite see what improvement it does on purely asynchronous input (the slave gets a clear state, but it may be derived of a metastable transition anyway). It could simply be an incomplete description, as you could add a hysteresis function by combining the outputs of the two registers.
As for the differences between SR, JK, D or even T flip-flops, it tends to boil down to which inputs are asynchronous. The simplest SR latches do not toggle with S=R=1, but simply keep whichever state was kept last (or in the worst case, oscillate with a gate delay), that's the race. The JK, on the other hand, will transition on the clock edge - synchronous behaviour. It is thus their nature that a T register can only be synchronous, and an asynchronous D latch is transparent while latching. The SR register you describe doesn't have the T function, which can be useful depending on the function. For instance, a ripple counter can be described purely with T registers. Simply put, the JK gives you a complete set of operations (set, clear, toggle, and no-op) without costing an extra control line.
In synchronous logic, we frequently use wide sets of registers to implement a larger function. It doesn't strictly matter there if we use D, T, JK or whatever registers, as we can just redesign the logic function that drives them to include feedback (unless we need to build that logic - i.e. in 74 family logic). That's why FPGAs and such tend to have only D registers in their schematic representations. What does matter is that the register itself introduces the synchronous operation - steady state until the next clock. This allows combining plenty of side-by-side registers or ones with feedback functions.
As for the choice between delayed-pulse and clock-synchronous logic, it's not an automatic one. Some early computers (f.e. PDP-1) and even some highly energy efficient ones (f.e. GreenArrays) use the delayed-pulse design, and it is in fact comparable to a pipelined design in synchronous logic. The Carry-Save adder demonstrates the crucial difference - it's a pipelined design where you actually don't have a known value, not even intermediate, until the pulse from the last new value to enter has come out the other end. If you know at the logic design stage repeated accumulation but only the final sum is used, it may be the best choice. Meanwhile, FPGAs are typically designed with only a few clock nets and therefore do not adapt well to delayed-pulse logic (though it can be approximated with clock gating).
I hope this is more helpful than further confusing... interesting questions!
They are asynchronous PRESET and CLEAR (active low). In this setup, there is the constraint that PRE and CLR cannot both be active (low) at the same time (or else both Q and Q' are 1). You could give one priority if you wanted by modifying the topology a bit.
Additionally, if either PRE or CLR are active (low) at the rising clock edge, the output states will not necessarily be inverted (as you pointed out). But, because the edge detector pulse is narrow, the PRE or CLR will quickly propagate through Q and Q' after the edge detector pulse ends assuming they are held through the length of the pulse.
In essence, whichever 'holds' longer will win: either the data will pass through if the edge detector pulse stays active longer than the PRE/CLR signals stay active (low), or the PRE/CLR signals will stay active longer than the edge detector pulse and over-write whatever D put in there.
In practice, these constraints would be represented the library characterization files. There would be a setup and hold arc defining the timing between the clock, d, PRE, and CLR to prevent any unwanted states.
Or, if the circuit was used in a more custom way, its designers would need to make sure they understood the operation of the pulse latch (not really a flip flop, imo) and how to properly enable or reset it.
You can tell the PRE and CLR are asynchronous easily by looking at the signal flow. PRESET and CLR pass to the output without any gating by the CLK signal (which would make them synchronous).
To prove this to yourself, assume the clock is not toggling, and both PRESET and CLEAR are '1' (inactive). Also, this circuit cannot have both PRE and CLR low at the same time.
Also assume the initial states for Q and Q':
PRE = 1, CLEAR = 1
Q = 1, Q' = 0
As long as you don't touch anything, everything will stay as it is (latched).
Now, pull CLR down to '0' without toggling the clock or data.

As shown in the image above, this clears Q from '1' to '0'. And, clock has not toggled. This means the circuit is asynchronous. From here the CLR signal can be inactivated (returned high) and the circuit will still hold its state.
An easy well to tell is the gating of the clock relative to the clear/enable/reset signals.
Best Answer
My guess is that the pulses are too short for the logic analyzer to catch them. Logic analyzers sample the signals at a constant rate, and if the time between samples is longer than your pulse width then the logic analyzer simply won't see it.
Try decreasing the sampling time significantly, or switch to an oscilloscope.