Well, I'll hazard an answer: to me the problem looks like the output capacitance of the op-amp and the board traces. The way you describe the voltage settling sounds like a capacitor charging and discharging: very quickly at first, then slower and slower as the voltage approaches its final value. I'd describe that as an exponential function.
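The "fast at first, then slower and slower" shape can be sketched numerically. This is only an illustration of an ideal first-order RC step response, not a model of the actual board; the function name `rc_step_fraction` and the chosen time points are made up for the example.

```python
import math

def rc_step_fraction(t, tau):
    """Fraction of the final voltage reached t seconds after a step,
    for an ideal first-order RC circuit with time constant tau."""
    return 1.0 - math.exp(-t / tau)

# Progress after 1..5 time constants (only the ratio t/tau matters, so tau = 1).
for n in range(1, 6):
    print(f"t = {n} tau: {rc_step_fraction(n, 1.0) * 100:.1f} % of final value")
```

Each additional time constant covers about 63 % of whatever distance is still left, which is why the last few percent seem to take forever.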
The evaluation board documentation says quite clearly that the minimum load is 1000 ohms, and that this is reached on the board using a 953 ohm resistor, which should be placed in series with the 50 ohm resistance of the measurement device.
From the schematic we see that resistor R6 "does not apply" to the THS4221EVM board, so I take that to mean it isn't populated (the BOM says so too). As a result, nothing pulls the output signal to ground, and what remains is basically a capacitor. (Take a look at the PCB layers. Which circuit component consists of two parallel plates? And what do the ground planes and the trace near R6 look like?) It behaves exactly as a capacitor should. You have a resistance formed by the op-amp's output resistance, the resistance of the traces, and R5, and through it you charge the capacitor formed by the output trace and the ground plane. I'd say that its capacitance is around 250 pF, but I'm not experienced enough to claim that the number is correct.
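If you want to sanity-check a capacitance guess like that, the ideal parallel-plate formula C = ε0·εr·A/d gives a rough first estimate. The sketch below is a hedged illustration only: the area, spacing and relative permittivity are hypothetical placeholders, not dimensions taken from the THS4221EVM layout, and fringing fields are ignored.

```python
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m

def plate_capacitance(area_m2, gap_m, eps_r):
    """Ideal parallel-plate capacitance in farads, ignoring fringing fields."""
    return EPSILON_0 * eps_r * area_m2 / gap_m

# Hypothetical numbers, NOT measured from the real board:
# 100 mm^2 of copper, 0.2 mm above the ground plane, eps_r = 4.5 (assumed, roughly FR-4).
c = plate_capacitance(area_m2=100e-6, gap_m=0.2e-3, eps_r=4.5)
print(f"C = {c * 1e12:.1f} pF")
```

Plugging in the real copper area and layer spacing from the board files would show how close the estimate above is.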
Now, about the bad performance on the board's side: first, what are you going to do with an op-amp whose output is floating? It's basically useless! I'd say it's reasonable to exclude such a usage scenario when designing a device.
Cases where the input impedance of the next stage is so high that the op-amp output appears to be floating are special. High-impedance circuits take experience to get right, and countless factors (such as parasitic capacitances and inductances everywhere) need to be taken into account. That is why the R6 pads are there. If we do place R6 and connect the op-amp to a high-impedance circuit, we'll have a high impedance in parallel with a low impedance. The impedance the op-amp sees will be lower than the lower of the two, and the op-amp will be able to swing the voltage quickly, because by decreasing the impedance we have decreased the time constant of the parasitic capacitor circuit. If R6 were fitted all the time, even when the next stage has a low input impedance, we'd be loading the op-amp unnecessarily heavily and wasting power too.
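The parallel-impedance argument can be put into numbers. This is a minimal sketch under stated assumptions: a first-order model in which the parasitic capacitor is charged through a series resistance and shunted by the load, so its time constant is (R_source || R_load)·C. The 250 pF figure is the rough guess from this answer; `R_SOURCE`, `R_NEXT_STAGE` and the value of `R6` are hypothetical placeholders, not values from the board documentation.

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)

def time_constant(r_source, r_load, c):
    """Time constant seen by a capacitor c that is charged through r_source
    and shunted by r_load (Thevenin resistance = r_source || r_load)."""
    return parallel(r_source, r_load) * c

C_PARASITIC = 250e-12   # rough guess from the answer, in farads
R_SOURCE = 100.0        # hypothetical: output resistance + traces + R5, in ohms
R_NEXT_STAGE = 1e6      # hypothetical high-impedance next stage, in ohms
R6 = 1e3                # hypothetical value for the optional R6, in ohms

load_with_r6 = parallel(R6, R_NEXT_STAGE)  # lower than the lower of the two

tau_without = time_constant(R_SOURCE, R_NEXT_STAGE, C_PARASITIC)
tau_with = time_constant(R_SOURCE, load_with_r6, C_PARASITIC)

print(f"load with R6:   {load_with_r6:.1f} ohm")
print(f"tau without R6: {tau_without * 1e9:.2f} ns")
print(f"tau with R6:    {tau_with * 1e9:.2f} ns")
```

The direction of the effect matches the argument above; how large it is depends entirely on the real resistances, which this sketch does not know.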
Also note that all characteristics in the datasheet are specified with a load resistor of 499 ohms unless otherwise noted (and the settling times do use that resistance)!
There are several points in the AD843 datasheet that appear to be relevant.
I first spotted the "Overdrive Recovery" time (under the general category of "Frequency Response"). Note that the recovery time for positive overdrive (found in your negative peak detector) is significantly longer than the recovery time for negative overdrive (found in your positive peak detector). Half a microsecond when your pulse width is on the order of 2.5 µs (half-cycle @ 200 kHz) could be significant.
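To put the timing into proportion, here is the arithmetic as a short sketch. The 200 kHz frequency and the half-microsecond recovery figure are the ones quoted above; the function and variable names are just for illustration.

```python
def half_period(freq_hz):
    """Duration of one half-cycle of a periodic signal, in seconds."""
    return 0.5 / freq_hz

SIGNAL_FREQ = 200e3      # 200 kHz input signal
RECOVERY_TIME = 0.5e-6   # "half a microsecond" overdrive recovery, as quoted above

hp = half_period(SIGNAL_FREQ)
print(f"half-cycle: {hp * 1e6:.2f} us")
print(f"recovery uses {RECOVERY_TIME / hp * 100:.0f} % of the half-cycle")
```

So roughly a fifth of each half-cycle could be spent recovering from overdrive.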
Second, the datasheet specifically mentions that the AD843 has trouble driving capacitive loads. Your 10 nF capacitor is more than an order of magnitude larger than any example load they mention in the datasheet.
Third, there is a peak detector circuit given in the datasheet. The topology is slightly different from yours, but more significantly, they use the AD843 only in the output stage and an AD847 in the input stage, "since the AD847 can drive an arbitrarily large value of capacitance".