A good book that introduces Cortex-M3 based microcontrollers is "The Definitive Guide to the ARM Cortex-M3, Second Edition" by Joseph Yiu. It covers both the hardware and the software side.
There's a software interface standard for Cortex-M processors called CMSIS (Cortex Microcontroller Software Interface Standard). You can download the standard peripheral libraries from the microcontroller manufacturer's website; they contain the CMSIS header and source files.
You will also need startup code to initialize your microcontroller before main() runs. It is usually provided by the IDE, but I don't know whether there are Eclipse packages that can do the job for you.
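To give an idea of what that startup code typically does, here is a minimal sketch in C of the two classic steps: copying initialized data from flash to RAM and zeroing the .bss section. The function names are hypothetical, and the linker symbols mentioned in the comment (`_sidata`, `_sdata`, `_edata`, `_sbss`, `_ebss`) are an assumption; the actual names depend on your toolchain's linker script. Real startup files also set up the stack pointer, the vector table and usually the clocks, which is left out here.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy initialized variables from their load address (flash)
 * to their run address (RAM), one 32-bit word at a time. */
void startup_copy_data(uint32_t *dst, const uint32_t *src, size_t words)
{
    for (size_t i = 0; i < words; ++i) {
        dst[i] = src[i];
    }
}

/* Clear zero-initialized variables (.bss). */
void startup_zero_bss(uint32_t *start, size_t words)
{
    for (size_t i = 0; i < words; ++i) {
        start[i] = 0u;
    }
}

/*
 * On a real target, a reset handler would call these with symbols
 * exported by the linker script (names are an assumption), e.g.:
 *
 *   startup_copy_data(&_sdata, &_sidata, (size_t)(&_edata - &_sdata));
 *   startup_zero_bss(&_sbss, (size_t)(&_ebss - &_sbss));
 *   main();
 */
```

The helpers take plain pointers so the logic can be tried on a PC with ordinary arrays before it goes anywhere near a target.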
Funny, I use both at work :)
The Cortex-M3 (we use STM32s) is a general purpose MCU that is fast and big (flash storage) enough for most complex embedded applications.
However, the R4 is a different beast entirely - at least the Texas Instruments version I use: the RM42, similar to the TMS570. The RM42 is a Cortex-R4 with two cores running in "lock-step" for redundancy, which means that one core is 2 instructions ahead of the other and is used for some error checking and correction.
Also, one of the cores is (physically) mirrored/flipped and rotated 90 degrees to improve radiation/noise resilience :)
The RM42 runs at a higher clock speed than the STM32 (100MHz vs 72MHz), has a slightly different instruction set, and performs some instructions faster than the M3 (e.g. division instructions execute in one cycle on the R4; I'm not sure they do on the M3).
HW timers are VERY precise compared to the Cortex-M3. On the M3s we usually need a static offset to correct for drift - not so with the R4 :)
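One way to read "static offset" is a fixed number of ticks added to (or subtracted from) the nominal timer period. The sketch below is only that interpretation, not our production code: `timer_period_ticks` and `offset_ticks` are hypothetical names, and the offset itself would have to be measured against a reference clock on your own board.

```c
#include <stdint.h>

/* Number of timer ticks for one period, with a fixed correction applied.
 * offset_ticks is a measured, board-specific value (assumption):
 * negative shortens the period, positive lengthens it. */
uint32_t timer_period_ticks(uint32_t timer_hz, uint32_t period_us,
                            int32_t offset_ticks)
{
    /* Nominal tick count, computed in 64 bits to avoid overflow. */
    int64_t nominal = (int64_t)(((uint64_t)timer_hz * period_us) / 1000000u);
    int64_t corrected = nominal + offset_ticks;

    /* Keep the result inside the range of a 32-bit period value. */
    if (corrected < 1) {
        corrected = 1;
    }
    if (corrected > (int64_t)UINT32_MAX) {
        corrected = (int64_t)UINT32_MAX;
    }
    return (uint32_t)corrected;
}
```

For example, a 1 ms period on a 72MHz timer is nominally 72000 ticks; with an offset of -3 the function returns 71997. Note that many timers expect the period minus one in the reload register, so check the reference manual before writing the value.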
Where I'd call a Cortex-M3 a general purpose MCU, I'd call the Cortex-R4 a complex real-time/safety MCU. If I am not mistaken, the RM42 is SIL3-compliant...
IMO the R4 is a big step up in complexity even if you're not planning to actually use the real-time/safety features.
A really nice example of the complexity difference: The SPI peripheral has 9 control and status registers on the STM32 whereas the RM42 has 42. It's like this with all the peripherals :)
EDIT:
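For what it's worth, in my use cases the Cortex-R4 @ 100MHz is usually 50-100% faster than the Cortex-M3 @ 72MHz when performing the exact same tasks. The clock difference alone only accounts for about 39% (100/72 ≈ 1.39), so the rest has to come from somewhere else. Maybe because the R4 has data and instruction caches?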
Another comparison: with the subset of the safety features I currently use, a few thousand lines of C and ASM code are executed on reset before reaching the call to main() :D And that's not peripheral initialization or anything, just startup and self-test (CPU, RAM, Flash ECC etc.).
This page has more details
Best Answer
The CPU should not give you any problems; as you say, the instruction set of an M4 is a superset of the M0/M0+ instruction set. Note that the timing might be different, so busy-wait based timing might not work the same.
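To illustrate the busy-wait point, here is a rough sketch in C. It only does the arithmetic: how many loop iterations a delay needs, and how long a fixed iteration count really takes when the cost per iteration changes. The function names are hypothetical and the cycles-per-iteration values you feed in are assumptions; the real numbers depend on the core, the compiler, the optimization level and flash wait states, so measure them.

```c
#include <assert.h>
#include <stdint.h>

/* Iterations a busy-wait loop needs to burn delay_us microseconds.
 * cycles_per_iter must be measured for the core and compiler in use. */
uint32_t busy_wait_iterations(uint32_t cpu_hz, uint32_t cycles_per_iter,
                              uint32_t delay_us)
{
    assert(cycles_per_iter > 0u);
    uint64_t cycles = ((uint64_t)cpu_hz * delay_us) / 1000000u;
    return (uint32_t)(cycles / cycles_per_iter);
}

/* Delay in microseconds actually produced by a fixed iteration count. */
uint32_t busy_wait_actual_us(uint32_t cpu_hz, uint32_t cycles_per_iter,
                             uint32_t iterations)
{
    assert(cpu_hz > 0u);
    uint64_t cycles = (uint64_t)iterations * cycles_per_iter;
    return (uint32_t)((cycles * 1000000u) / cpu_hz);
}
```

With made-up numbers: at 48MHz and 4 cycles per iteration, a 1000 µs delay needs 12000 iterations. Reuse those 12000 iterations on a core where the same loop costs 3 cycles per iteration at the same clock, and the delay shrinks to 750 µs. A hardware timer or SysTick avoids the problem altogether.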
Peripherals can be a PITA; I would not assume they are the same unless the datasheets read the same.