An ICE (In-Circuit Emulator) replaces the target chip. It acts like the real chip to the rest of the circuit, but has all kinds of hooks inside so you can see what's going on, set breakpoints, load new code, grab traces, etc. An ICD (In-Circuit Debugger) uses special debug hardware added to the target chip for that purpose and tries to give you ICE-like capability. Unfortunately, marketing people have gotten involved and tried to redefine these long-standing terms in their attempt to deceive you into thinking their product is better than the next one. Microchip's RealIce is a particularly egregious example of this. It is real, but the one thing it's not is an ICE.
A real ICE (not the RealIce) is the best in-circuit debugging environment. Unfortunately these have pretty much gone away because of the high cost of making a special bondout version of the target chip for use in the ICE, and because speeds have gotten so high that taking anything off chip is problematic. Another problem is that an ICE requires the target chip to be in a socket, or requires a special adapter mounted in place of the target chip so the ICE can connect to its lines.
So today we're stuck with ICDs. Fortunately they do most of the things you would want to do with an ICE. They even have one advantage in that the code is running on the real target chip, not something trying to be like the target chip. The downside is that they require on-chip resources, so they aren't completely transparent to your code and hardware like an ICE is. The ICD needs access to the debugging lines, and those pins often have multiple roles. You can't use them in their other roles while debugging. The amount of debug circuitry built into each part must be kept to a small fraction of the total or the cost would be too high, so features have to be compromised. One nice feature that would be too expensive to add on every chip is true trace capability, since that requires a large RAM buffer.
Every problem can eventually be solved with a variety of tools. It's not whether you can solve it, but how long and how much effort it takes. When I was regularly using ICEs (Microchip ICE-2000 and ICE-4000), I didn't use the trace feature often, but when I did, other means would have been significantly more costly. Sometimes you have a bug where a variable suddenly has the wrong value in it. You step through the code and everything is fine, and the routine that manipulates the variable seems to do everything right, but when you run it, eventually things crap out and you find that variable trashed. The cause is some other code with a bad pointer, buffer overflow, stack mismatch, or the like. With an ICE you can set a breakpoint on the variable being changed, then look backwards in the trace buffer to see how the code got there and how things got messed up.
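To make the "break on the write, then look backwards" idea concrete, here is a toy model in Python. It is only a sketch of the concept, not how any ICE is actually implemented: the event format `(pc, addr, value)`, the function name `run_until_write`, and all addresses are made up for illustration. The trace buffer is modeled as a fixed-depth ring that keeps only the most recent events.

```python
from collections import deque


def run_until_write(program, watch_addr, depth=4):
    """Replay `program`, a list of (pc, addr, value) memory writes.

    Stops at the first write to `watch_addr` (the data breakpoint) and
    returns the last `depth` events, oldest first. If the address is
    never written, the last `depth` events of the whole run are returned.
    """
    trace = deque(maxlen=depth)  # ring buffer: old entries fall off the front
    for pc, addr, value in program:
        trace.append((pc, addr, value))
        if addr == watch_addr:
            break  # data breakpoint hit, freeze the trace
    return list(trace)


# Hypothetical run: a loop at pc 0x200 walks past the end of a two-byte
# buffer at 0x30..0x31 and lands on the watched variable at 0x32.
program = [
    (0x100, 0x20, 1),
    (0x104, 0x21, 2),
    (0x200, 0x30, 7),
    (0x200, 0x31, 7),
    (0x200, 0x32, 7),  # the stray write
    (0x108, 0x22, 3),  # not reached, execution stops at the breakpoint
]

if __name__ == "__main__":
    for pc, addr, value in run_until_write(program, watch_addr=0x32):
        print(f"pc={pc:#05x} wrote {value} to {addr:#04x}")
```

The last entry in the returned trace is the offending write, and the entries before it show the same pc marching through adjacent addresses, which is the kind of pattern that points at a buffer overflow. Stepping through the routine that legitimately owns the variable would never have shown this.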
Most of the time, an ICD will do well enough. Especially with large chips, the couple of pins dedicated to debugging aren't that much of a problem. Nowadays I mostly use the RealIce for debugging. It's a lot more stable and less flaky than the ICD2. You learn to live with it.
It depends on the kind of JTAG interface that you have. In my experience (on MSP430 and Atmel ARM7TDMI), when you have watches on variables or breakpoints, or even any kind of control via the debugger, the core is halted periodically to run the boundary scan and all that. This will mess quite extensively with timing. If you have a free timer available, I'd suggest using its interrupt to toggle a pin every few microseconds and see whether this is happening and to what degree. Minimizing the number of breakpoints and watches may help, but I can't be sure of that. In fact, I have a feeling it'll be target and IDE dependent as well.
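If you capture the toggling pin with a scope or logic analyzer and export the edge times, a few lines of Python are enough to quantify the stalls. This is a rough sketch under assumptions: the function name `find_stalls`, the microsecond units, and the 50% tolerance are my own choices, and the sample data is invented, not a real capture.

```python
def find_stalls(edge_times_us, period_us, tolerance=0.5):
    """Return (time_of_last_good_edge, gap) for every gap between
    consecutive edges that exceeds period_us * (1 + tolerance)."""
    limit = period_us * (1 + tolerance)
    stalls = []
    for prev, cur in zip(edge_times_us, edge_times_us[1:]):
        gap = cur - prev
        if gap > limit:
            stalls.append((prev, gap))
    return stalls


if __name__ == "__main__":
    # Invented capture: the pin should toggle every 10 us, but one
    # interval is 45 us long, as if the core had been halted in between.
    edges = [0, 10, 20, 30, 75, 85, 95]
    for start, gap in find_stalls(edges, period_us=10):
        print(f"stall after t={start} us: {gap} us between edges")
```

Running the same capture with and without watches and breakpoints set gives you a direct comparison of how much the debugger is disturbing the timing on your particular target and IDE.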
For timing issues such as this (RAM access), I'd suggest you investigate with an oscilloscope instead. JTAG is better used with slower events, algorithms, and places where the code can be safely halted.
Best Answer
JTAG cables can be built around all sorts of stuff. Xilinx JTAG cables, for example, have a Cypress chip and an FPGA. Atmel cables generally contain an AVR microcontroller with USB support. They will also usually contain some interface/level translation/protection/isolation components. It really depends on the manufacturer; they're all proprietary and mutually incompatible. Generally you need to have the cable that works with whatever software you need to use. If all you need is OpenOCD, then an FTDI-based cable is fine. But if you want to use, say, Xilinx ChipScope, then you need to pay up for either the real thing from Xilinx or a Chinese knockoff.
The links you have are not for simple JTAG cables; they are far more specialized. I would personally consider these to be a full-on piece of test equipment. They are basically specialized protocol analyzers, designed to interface with dedicated trace hardware that is incorporated into the device under test.
Trace hardware is distinct from JTAG. Its purpose is to record the complete execution trace of the running software (i.e. all branches taken) across all execution cores and pass it to the external trace collection system (the box in question) over a high speed bus. The trace is then analyzed offline. This is NOT the same as the debugging that can be done over JTAG by setting breakpoints and stepping through the code. Trace collection is supposed to be completely transparent to the running program (no breakpoints or added code).
Since the processor under test can be executing several hundred million instructions per second, storing the trace as it is produced requires a lot of bandwidth and fast memory. The linked devices support the Aurora protocol (probably among others), which is an 8b/10b encoded high speed serial protocol, somewhat similar to USB 3, serial ATA, serial gigabit/10G Ethernet, and PCIe. It's capable of transferring data at 6.25 Gbps, significantly more than what the USB link back to the PC can handle, so the captured data must be stored in onboard RAM for offline analysis. These devices will contain rather high end FPGAs with internal high-speed deserializers to capture the data, along with quite a bit (several GB) of fast DRAM, probably DDR2 or perhaps even DDR3.
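Some back-of-the-envelope numbers show why the onboard RAM is needed. The sketch below assumes a single 6.25 Gbps lane, counts only the 8b/10b coding overhead (framing overhead is ignored), and compares against the 480 Mbit/s signalling rate of a USB 2.0 high-speed link. The USB version and the 4 GB buffer size are my assumptions for illustration; I don't know what the linked boxes actually use.

```python
def payload_bytes_per_s(line_rate_bps, data_bits=8, code_bits=10):
    """Payload rate in bytes/s for a line code that sends `code_bits`
    on the wire for every `data_bits` of data (8b/10b by default)."""
    return line_rate_bps * data_bits / code_bits / 8


def seconds_to_move(n_bytes, rate_bytes_per_s):
    """Time to transfer `n_bytes` at a constant rate."""
    return n_bytes / rate_bytes_per_s


AURORA_LINE_RATE_BPS = 6.25e9   # from the text above
USB2_RATE_BYTES_PER_S = 480e6 / 8  # assumption: USB 2.0 high speed, raw signalling rate
BUFFER_BYTES = 4e9              # assumption: 4 GB standing in for "several GB"

if __name__ == "__main__":
    trace_rate = payload_bytes_per_s(AURORA_LINE_RATE_BPS)
    print(f"trace payload rate: {trace_rate / 1e6:.0f} MB/s")
    print(f"USB 2.0 raw rate:   {USB2_RATE_BYTES_PER_S / 1e6:.0f} MB/s")
    print(f"buffer fills in:    {seconds_to_move(BUFFER_BYTES, trace_rate):.1f} s")
    print(f"buffer drains in:   {seconds_to_move(BUFFER_BYTES, USB2_RATE_BYTES_PER_S):.1f} s")
```

Under these assumptions the trace arrives roughly ten times faster than it could be pushed over the USB link, so the box has to capture into RAM at full speed and upload afterwards. A faster host link would narrow the gap, but real USB throughput is always below the raw signalling rate.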