Funny, I use both at work :)
The Cortex-M3 (we use STM32s) is a general-purpose MCU that is fast enough, and has enough flash storage, for most complex embedded applications.
However, the R4 is a different beast entirely - at least the Texas Instruments version I use: the RM42, similar to the TMS570. The RM42 is a Cortex-R4 with two cores running in "lock-step" for redundancy, which means that one core runs 2 clock cycles ahead of the other and their outputs are compared to detect errors.
Also, one of the cores is (physically) mirrored/flipped and turned 90 degrees to improve radiation/noise resilience :)
The RM42 runs at a higher clock speed than the STM32 (100 MHz vs 72 MHz), has a slightly different instruction set, and performs some instructions faster than the M3 (e.g. division instructions execute in one cycle on the R4; I'm not sure they do on the M3).
HW timers are VERY precise compared to Cortex-M3. Usually we need a static offset to correct for drift on the M3s - not so with the R4 :)
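One way to read the "static offset" above is a fixed parts-per-million correction applied to raw tick counts. This is only a minimal sketch under that assumption: `TIMER_DRIFT_PPM`, its value, and `correct_ticks` are hypothetical names for illustration, not anything from the STM32 or TI libraries.

```c
#include <stdint.h>

/* Hypothetical drift, measured once against a reference clock,
   in parts per million. Positive means the timer runs fast. */
#define TIMER_DRIFT_PPM 35

/* Convert a raw tick count to a drift-corrected tick count
   by subtracting a fixed ppm share of the raw value. */
static inline uint64_t correct_ticks(uint64_t raw_ticks)
{
    int64_t adjustment = ((int64_t)raw_ticks * TIMER_DRIFT_PPM) / 1000000;
    return (uint64_t)((int64_t)raw_ticks - adjustment);
}
```

With the assumed 35 ppm, one million raw ticks are corrected down by 35 ticks. A real project would measure the drift per board (and possibly per temperature range) rather than hard-code it.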
Where I'd call a Cortex-M3 a general purpose MCU, I'd call the Cortex-R4 a complex real-time/safety MCU. If I am not mistaken, the RM42 is SIL3-compliant...
IMO the R4 is a big step up in complexity even if you're not planning to actually use the real-time/safety features.
A really nice example of the complexity difference: The SPI peripheral has 9 control and status registers on the STM32 whereas the RM42 has 42. It's like this with all the peripherals :)
EDIT:
For what it's worth, in my use cases the Cortex-R4 @ 100MHz is usually 50-100% faster than the Cortex-M3 @ 72MHz when performing the exact same tasks. Maybe because the R4 has data and instruction caches?
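To put those numbers in perspective, the clock difference alone (100 MHz vs 72 MHz) accounts for roughly a 39% speedup, so the rest has to come from more work done per clock. A small sketch of that arithmetic, where `per_clock_speedup` is just an illustrative helper name:

```c
/* Split an overall speedup into the part explained by clock frequency
   and the remaining per-clock gain:
   per-clock speedup = overall speedup / (fast clock / slow clock). */
static double per_clock_speedup(double overall_speedup,
                                double fast_mhz, double slow_mhz)
{
    double clock_ratio = fast_mhz / slow_mhz;
    return overall_speedup / clock_ratio;
}
```

For "50% faster" (overall factor 1.5) this gives about 1.08, and for "100% faster" (factor 2.0) about 1.44, i.e. the R4 would be doing roughly 8% to 44% more work per clock cycle in those cases.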
Another comparison: with the subset of the safety features I currently use, a few thousand lines of C and ASM code are executed on reset before reaching the call to main() :D And that is not peripheral initialization or anything, just startup and self-test (CPU, RAM, Flash ECC etc.).
This page has more details
I think the correct thing to say is that for a given architecture, such as the ARMv7-M architecture of the Cortex-M3 core, the instruction set is the same for all processors. However, the behavior of some instructions may vary because of implementation-defined (i.e. optional) functionality in the processor. Instructions that try to access optional capabilities that are not implemented in a particular processor may cause exceptions.
To find the features that may be implementation defined, search the appropriate ARM Architecture Reference Manual for IMPLEMENTATION, in all capitals.
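In C, one common way to avoid touching an unimplemented optional capability is to guard the code at compile time with the ACLE feature macros the compiler defines for the selected target. The sketch below uses `__ARM_FEATURE_IDIV` (hardware integer divide) as the example; `div_u32` and the software fallback are illustrative assumptions, and in practice the compiler's runtime library usually provides such a fallback for you.

```c
#include <stdint.h>

/* Unsigned 32-bit division that uses the hardware divide instruction
   only when the compiler reports it for the target.
   Division by zero returns 0 here (an assumption of this sketch). */
static inline uint32_t div_u32(uint32_t n, uint32_t d)
{
    if (d == 0) {
        return 0;
    }
#if defined(__ARM_FEATURE_IDIV)
    /* Target has hardware divide: the compiler can emit UDIV. */
    return n / d;
#else
    /* Fallback: bitwise shift-and-subtract long division. */
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; --i) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down the next bit */
        if (r >= d) {
            r -= d;
            q |= (1u << i);
        }
    }
    return q;
#endif
}
```

This only covers features the compiler knows about at build time. Capabilities that differ between chips built around the same core still need a check against the device's reference manual.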