Electronic – How does JTAG program an MCU

flash, jtag

How would JTAG program an MCU with flash memory? I realise that this probably varies from chip to chip, but I'm assuming there's some process they all have in common. Specifically, I'm asking in regards to the LPC1768, which doesn't specify how you would do this in the datasheet.

Best Answer

How would JTAG program an MCU with flash memory?

In most MCUs, JTAG is not directly connected to flash. There is actually a stack of access methods, each with its own protocol. A debugger / in-system programmer has to "talk" to all of them to actually reach the Flash.

Specifically, I'm asking in regards to the LPC1768

LPCs are ARM Cortex-M based. They use the debug infrastructure from ARM. The path from JTAG to Flash is:

JTAG -> JTAG-DP -> AHB-AP -> Main AHB bus -> Flash

But we will not actually take this direct path. LPCs expose their flash programming functions through ROM routines called IAP (In-Application Programming).

Let's detail the steps:

JTAG

JTAG is the usual name for a wire protocol that exposes a chain of TAPs (Test Access Ports) through 4 wires (TCK, TMS, TDI, TDO). A JTAG chain is one big chain of shift registers, with a standardized method for selecting a register in each TAP and for reading or writing its value. TAPs can expose an arbitrary set of registers of arbitrary size.
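To make the shift-register picture concrete, here is a toy model in Python. It is only an illustration: the class and function names are made up for this answer, and it ignores TMS and the TAP state machine entirely, modelling only what happens to the currently selected registers while bits are shifted from TDI to TDO.

```python
class Tap:
    """Toy TAP holding one selected register of a given width (hypothetical model)."""

    def __init__(self, width, value=0):
        self.width = width
        # Bits are stored LSB first; index 0 is the bit closest to TDO.
        self.bits = [(value >> i) & 1 for i in range(width)]

    def shift(self, tdi):
        """Clock one bit in from TDI and return the bit that falls out on TDO."""
        tdo = self.bits[0]
        self.bits = self.bits[1:] + [tdi]
        return tdo

    def value(self):
        return sum(bit << i for i, bit in enumerate(self.bits))


def shift_chain(taps, bits_in):
    """Shift bits through the whole chain.

    taps[0] is the TAP nearest TDI, taps[-1] the one nearest TDO.
    """
    bits_out = []
    for bit in bits_in:
        for tap in taps:
            bit = tap.shift(bit)  # TDO of one TAP feeds TDI of the next
        bits_out.append(bit)
    return bits_out
```

Shifting as many bits as the chain is long reads out the old contents of every register and replaces them with new ones in a single pass, which is why the host has to know the width of every register in the chain.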

JTAG-DP (JTAG Debug Port) is a TAP specified by ARM. It mainly uses two 32-bit registers called DPACC and APACC (35 bits each in practice, because 3 operation bits are concatenated with the 32 data bits), which give access to the DP and AP registers respectively. This is the entry point to the ARM debug model.
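As a sketch of what those 35 bits look like, the helpers below pack a request and unpack a response. The bit layout and ACK codes are written from my reading of the ARM Debug Interface v5 specification (RnW in bit 0, register address bits A[3:2] in bits 2:1, data in bits 34:3); treat them as assumptions to check against the spec, and note that the function names are invented for this example.

```python
# ACK values returned in the low 3 bits of a JTAG-DP scan (assumed from ADIv5).
ACK_OK_FAULT = 0b010
ACK_WAIT = 0b001


def pack_request(addr, data=0, read=False):
    """Build the 35-bit value shifted into DPACC or APACC.

    addr is the register offset within the current bank: 0x0, 0x4, 0x8 or 0xC.
    """
    if addr not in (0x0, 0x4, 0x8, 0xC):
        raise ValueError("register address must be 0x0, 0x4, 0x8 or 0xC")
    rnw = 1 if read else 0
    return ((data & 0xFFFFFFFF) << 3) | ((addr >> 2) << 1) | rnw


def unpack_response(scan_out):
    """Split the 35 bits shifted out into (ack, data)."""
    ack = scan_out & 0x7
    data = (scan_out >> 3) & 0xFFFFFFFF
    return ack, data
```

A WAIT acknowledge means the previous transaction has not completed yet, so the debugger simply repeats the scan.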

ARM Debug port and Access port

ARM's Debug Port is a gateway to Access Ports. Access Ports expose an interface to something else. A DP can multiplex accesses to up to 256 APs. Most MCUs in the LPC range contain only one AP, which gives access to the internal AHB (Advanced High-performance Bus, part of ARM's AMBA family), i.e. the internal switching matrix that interconnects the CPU and all other IPs. (Well, actually, the AHB-AP is not directly connected to the main bus, but goes to a debug bus tightly coupled to the CPU; see the Cortex design documents for the gory details.)

ARM's design for Cortex-M debug is memory-based: debugger interface gives access to memory (address+data, read/write, etc.), and CPU debug management (halting, inspecting registers, etc.) is done through memory-mapped registers accessible through the memory interface (See Chap 10 in Cortex-M3 TRM).
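Here is a minimal sketch of that memory-based model. The register offsets (TAR at 0x04, DRW at 0x0C), the DHCSR address and its key are quoted from memory of the ADIv5 and ARMv7-M documentation, so verify them before relying on them. `FakeMemAp` is a hypothetical stand-in for a real probe so that the example runs on its own; it skips CSW setup, posted reads and error handling.

```python
# MEM-AP register offsets (assumed from ADIv5).
AP_TAR = 0x04  # Transfer Address Register
AP_DRW = 0x0C  # Data Read/Write register

# Cortex-M Debug Halting Control and Status Register (assumed from ARMv7-M).
DHCSR = 0xE000EDF0
DBGKEY = 0xA05F0000  # must be written to the top half for the write to be accepted
C_DEBUGEN = 1 << 0
C_HALT = 1 << 1


class FakeMemAp:
    """Hypothetical probe stand-in: models only TAR and DRW on top of a dict."""

    def __init__(self):
        self.memory = {}
        self.tar = 0

    def write_ap(self, reg, value):
        if reg == AP_TAR:
            self.tar = value
        elif reg == AP_DRW:
            self.memory[self.tar] = value

    def read_ap(self, reg):
        if reg == AP_TAR:
            return self.tar
        if reg == AP_DRW:
            return self.memory.get(self.tar, 0)
        return 0


def mem_write32(ap, addr, value):
    """Write one word on the target bus: set the address, then write the data."""
    ap.write_ap(AP_TAR, addr)
    ap.write_ap(AP_DRW, value)


def mem_read32(ap, addr):
    """Read one word from the target bus."""
    ap.write_ap(AP_TAR, addr)
    return ap.read_ap(AP_DRW)


def halt_core(ap):
    """Halting the CPU is just another memory write, to a debug register."""
    mem_write32(ap, DHCSR, DBGKEY | C_DEBUGEN | C_HALT)
```

The point is that `halt_core` needs no dedicated JTAG command: once word reads and writes work, CPU control comes for free.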

Main bus

When we get there, we have access to main memory bus and we can control CPU. Through memory bus, we have access to all internal IPs as if we were running code from CPU.

Chip-specific init

Today, most MCUs involve proper power management. This generally involves two main aspects: power gating (taking off power from parts of the chip) and clock management (oscillator enabling and clock routing).

Most chips do not magically enable power gates and clocks when a debugger is plugged in, so debugger also has to do platform management and perform proper initialization of various MCU IPs before actually reaching the internal Flash.

Flash

So, are we able to talk to the Flash IP then?

Yes, but not efficiently.

If we do all the memory accesses from the external debugger by the book, this will work, but will be extremely slow. The problem with JTAG access is that it generally goes through a USB-based probe with a big (~milliseconds) round-trip time. Flash IP accesses usually involve an algorithm like:

  1. Enable write access
  2. Write destination address
  3. Write one data word
  4. Wait for IP to be ready by polling
  5. Trigger write operation
  6. Wait for IP to be ready by polling
  7. Go back to 2, ad libitum

There is too much polling. If we lose a few milliseconds for every iteration of this loop, our few tens of KiB of code will take ages to program. We'll try to eliminate this.
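A back-of-the-envelope estimate shows the scale of the problem. The figures used below are assumptions picked for illustration, not measurements: a 1 ms round trip, and five host-driven accesses per word (address, data, one poll, trigger, one more poll).

```python
def naive_programming_time(image_bytes, round_trips_per_word, round_trip_s, word_bytes=4):
    """Estimate the time spent on probe round trips when the host drives every step."""
    words = -(-image_bytes // word_bytes)  # ceiling division
    return words * round_trips_per_word * round_trip_s


# Assumed figures: 32 KiB image, 5 round trips per word, 1 ms per round trip.
seconds = naive_programming_time(32 * 1024, 5, 0.001)
print(f"{seconds:.1f} s")
```

With these numbers, 8192 words times 5 round trips comes to roughly 41 seconds for a 32 KiB image, and every extra polling iteration makes it worse.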

Efficient Flash access

The general approach is to upload a little program into the MCU's RAM that copies chunks of data from RAM to Flash. The idea is to avoid USB round trips by making big unconditional uploads of data from the debugger to RAM (they can be batched into one USB transaction), and to let the CPU do the copy to flash (which is generally done word by word).
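The host-side half of this scheme mostly consists of cutting the image into buffer-sized chunks. A minimal sketch follows, with `plan_chunks` and its parameters being names invented for this example; padding with 0xFF is an assumption based on erased flash usually reading as all ones.

```python
def plan_chunks(image, flash_base, buffer_size, pad=0xFF):
    """Yield (flash_address, chunk) pairs, one per RAM buffer upload.

    The last chunk is padded to buffer_size so that the on-target loader
    can always copy a full buffer.
    """
    for offset in range(0, len(image), buffer_size):
        chunk = image[offset:offset + buffer_size]
        chunk += bytes([pad]) * (buffer_size - len(chunk))
        yield flash_base + offset, chunk
```

For each pair, the debugger would upload the chunk to the RAM buffer in one batched transfer, tell the loader the destination address, let the CPU run, and wait once for it to finish. That is a handful of round trips per chunk instead of several per word.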

Some manufacturers (either because they want to hide implementation details of their flash IP, or because they want to ease their customers' lives) implement a set of ROM-based routines you can call directly from the debug port to do different kinds of tasks, including chip identification, programming and erasing. NXP implements this kind of ROM in the LPC lineup; they call it IAP.
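For the LPC1768, a debugger-driven IAP call could be prepared along these lines. The entry address, command codes and sector layout are quoted from my recollection of the LPC17xx user manual, so check them against the manual; the helper names are made up. The usual trick, as far as I know, is to write the command words to RAM, point R0 at them and R1 at a result buffer, set PC to the IAP entry and LR to a breakpoint, then resume the core.

```python
IAP_ENTRY = 0x1FFF1FF1  # assumed ROM entry point (bit 0 set for Thumb)
CMD_PREPARE_SECTORS = 50
CMD_COPY_RAM_TO_FLASH = 51
CMD_ERASE_SECTORS = 52
CMD_SUCCESS = 0  # assumed status code placed in the first result word on success


def sector_of(addr):
    """Map a flash address to its sector (assumed LPC1768 layout).

    Sectors 0-15 are 4 KiB each, sectors 16-29 are 32 KiB each.
    """
    if not 0 <= addr < 0x80000:
        raise ValueError("address outside the 512 KiB flash")
    if addr < 0x10000:
        return addr // 0x1000
    return 16 + (addr - 0x10000) // 0x8000


def iap_program_commands(flash_addr, ram_addr, length, cclk_khz):
    """Return the two IAP command tables needed to program one block."""
    if length not in (256, 512, 1024, 4096):
        raise ValueError("IAP copies 256, 512, 1024 or 4096 bytes at a time")
    if flash_addr % 256:
        raise ValueError("destination must be 256-byte aligned")
    first = sector_of(flash_addr)
    last = sector_of(flash_addr + length - 1)
    return [
        [CMD_PREPARE_SECTORS, first, last],
        [CMD_COPY_RAM_TO_FLASH, flash_addr, ram_addr, length, cclk_khz],
    ]
```

Each inner list is what would be written to RAM as consecutive 32-bit words before jumping to `IAP_ENTRY`. Sectors must already have been erased with the erase command before they are programmed.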

Variations around this pattern

JTAG alternatives exist. ARM has such an option called SWD. SWD exposes the same DP (debug port) and AP (Access port) register model. There exist SWD/JTAG variants (called SWJ-DP) that can dynamically switch from JTAG to SWD and vice-versa.

ARM DP/AP model alternatives exist. Older ARM cores have a different model, and every other CPU vendor has its own way of bridging JTAG (or another debug wire protocol) to the internals.

Bridging the debug access port to memory is one option, but other vendors make the debug port access CPU registers directly. The debugger may then access memory either by injecting actual instructions (like loads and stores) into the CPU, or through special pseudo-registers that trigger memory accesses. TI CC2xxx and MIPS are examples of such architectures.

Some vendors also choose to have a direct path from the debug port to the Flash IP, but this is quite uncommon for today's MCUs, where we have a debug capability anyway (because it gives indirect access to flash). This used to be common for parts where the internal CPU had no write access to internal flash.