Your question is a little confused, but perhaps this will help clear it up.
There are two areas to consider:
- whether the execution core can directly operate on items in memory
- the speed of operations on memory
Small embedded microcontrollers
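These small microcontrollers have no external RAM. All RAM is internal, but some of it is set aside for special purposes such as registers.
For example, the Microchip PICs you mention have a "W" (working) register. On some families (PIC18 and the enhanced mid-range PIC16F1xxx parts) W is mapped into data memory as WREG, alongside the other special function registers; on older PIC16 parts it is a separate register. Either way, instructions with two operands usually require one of them to be in W.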
This greatly simplifies the design of the microcontroller at the electronics level and keeps costs/power low. It also has other benefits like predictable timing (in cycles) for instructions.
This is why you will see instructions that load W with a value, operate on it and then copy it back from W to elsewhere in memory. The compiler uses the register because it has to.
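To make the load/operate/store pattern concrete, here is a simplified C model of a PIC-style machine. It is a sketch, not an emulator: the `pic_model` struct, its 32-byte `ram` and the helper functions are hypothetical, and status flags, bank selection and instruction timing are ignored. The helpers mirror the PIC instructions `MOVF f,W`, `ADDWF f,W`, `ADDWF f,F` and `MOVWF f`.

```c
#include <stdint.h>

/* Simplified model: a tiny data memory plus a working register W.
   Status flags, banking and timing are deliberately left out. */
typedef struct {
    uint8_t ram[32]; /* hypothetical data memory ("file registers") */
    uint8_t w;       /* working register */
} pic_model;

/* MOVF f,W  : W = f */
static void movf_w(pic_model *m, uint8_t f)  { m->w = m->ram[f]; }

/* ADDWF f,W : W = W + f (8-bit, wraps around) */
static void addwf_w(pic_model *m, uint8_t f) { m->w = (uint8_t)(m->w + m->ram[f]); }

/* ADDWF f,F : f = f + W (result written back to the file register) */
static void addwf_f(pic_model *m, uint8_t f) { m->ram[f] = (uint8_t)(m->ram[f] + m->w); }

/* MOVWF f   : f = W */
static void movwf(pic_model *m, uint8_t f)   { m->ram[f] = m->w; }

/* z = x + y : both operands have to pass through W */
void add_vars(pic_model *m, uint8_t x, uint8_t y, uint8_t z)
{
    movf_w(m, x);  /* load x into W      */
    addwf_w(m, y); /* W = W + y          */
    movwf(m, z);   /* copy W back to z   */
}

/* x += y : load y into W, then add W into x in place */
void add_in_place(pic_model *m, uint8_t x, uint8_t y)
{
    movf_w(m, y);
    addwf_f(m, x);
}
```

A C compiler targeting a PIC emits sequences of this shape for `z = x + y;` and `x += y;`, because there is no instruction that adds two arbitrary memory locations directly.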
Larger processors
Other processors (CPUs), such as x86 and x86-64, have external RAM, which is a big difference. Note that "register" now means something quite different, because there are different types of memory.
Outside the CPU are large quantities of RAM; inside the CPU are a number of smaller blocks of memory. Some of these are storage registers, each holding an amount of data usually equal to the data width of the architecture. So on a 32-bit Intel processor the general-purpose registers (EAX, EBX, etc.) are 32 bits wide.
These processors have more complex instructions that can often operate on either registers or external RAM, so the data for an instruction does not always need to be in a register. Why bother with registers, then? The answer is speed: where there is a choice, the compiler uses registers to reduce execution time.
These processors have different access times for different types of memory, and registers on the CPU die are very quick to access. So if a variable is in constant use throughout some code, it makes sense to load it into a register, operate on it repeatedly, and copy it back to external RAM when finished.
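A small C sketch of the same idea: the two hypothetical functions below compute the same sum (provided `total` does not point into `data`), but the first updates memory on every iteration, while the second keeps the running total in a local variable that an optimizing compiler can hold in a register for the whole loop. Whether a particular compiler really emits a memory-operand instruction for the first version (on x86-64, for example, an `add` with a memory destination) depends on the compiler and optimization level.

```c
#include <stddef.h>

/* Accumulates directly in memory. Because `total` might alias `data`,
   the compiler may have to store (and reload) *total on every iteration. */
void sum_into_memory(long *total, const long *data, size_t n)
{
    *total = 0;
    for (size_t i = 0; i < n; i++)
        *total += data[i];
}

/* Keeps the running sum in a local variable, which the compiler can keep
   in a register, and writes it back to memory once at the end. */
void sum_in_register(long *total, const long *data, size_t n)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    *total = acc;
}
```

Writing the second form by hand is mostly unnecessary when the compiler can prove there is no aliasing, but it shows the load/operate/store-once pattern described above.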
Some registers only accept a specific access width (i.e., -w32 may not be correct), or may not read back the value written, which could cause a problem with verification.
There may also be sequencing or state restrictions on accessing some registers.
An option that should work around most conceivable issues is to write a tiny program that does the job, linked to run from RAM. You could patch the data into its binary once you have worked out the offset, upload the modified version, and run it. Alternatively, the program could read its values from a region of RAM outside the file's extents, which you would fill in before running it. With finer-grained control of the ST-LINK you could also pass values in CPU registers, though you might need the alternative open-source command-line program rather than ST's to do that. (Incidentally, this small-routine-in-RAM method is how that program writes to flash.)
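A minimal sketch of the core of such a RAM routine, assuming a Cortex-M target and the "read values from a spare region of RAM" variant. Everything named here is hypothetical: the `write_params` layout, `do_register_write`, `ram_entry` and the `PARAM_BLOCK_ADDR` address. The linker script, startup code and the debugger commands that fill in the parameters and start execution are not shown. The point is that the write uses exactly the requested width and skips read-back verification.

```c
#include <stdint.h>

/* Hypothetical parameter block, written by the debugger into RAM
   outside the program image before the routine is started. */
typedef struct {
    uintptr_t addr;   /* address of the register to write */
    uint32_t  value;  /* value to write */
    uint32_t  width;  /* access width in bytes: 1, 2 or 4 */
    uint32_t  status; /* set by the routine: 0 = done, 1 = unsupported width */
} write_params;

/* Performs one write with exactly the requested access width, so a register
   that only accepts e.g. 16-bit accesses never sees a 32-bit store.
   No read-back is attempted, since some registers do not return the value
   written and reads can have side effects. */
void do_register_write(volatile write_params *p)
{
    uint32_t v = p->value;

    switch (p->width) {
    case 1:
        *(volatile uint8_t *)p->addr = (uint8_t)v;
        break;
    case 2:
        *(volatile uint16_t *)p->addr = (uint16_t)v;
        break;
    case 4:
        *(volatile uint32_t *)p->addr = v;
        break;
    default:
        p->status = 1; /* leave the target untouched */
        return;
    }
    p->status = 0;
}

#ifdef TARGET_BUILD
/* Hypothetical fixed RAM address of the parameter block; it must agree with
   the linker script and with the address the debugger writes to. */
#define PARAM_BLOCK_ADDR 0x20001000u

/* Entry point when the routine is linked to run from RAM. */
void ram_entry(void)
{
    do_register_write((volatile write_params *)PARAM_BLOCK_ADDR);
    __asm volatile ("bkpt #0"); /* halt so the debugger regains control */
    for (;;) { }
}
#endif
```

After the breakpoint is hit, the debugger can read `status` from the parameter block to confirm the routine ran. Any required access sequence (unlock keys, enabling a clock first, and so on) would be added inside the routine in the same way.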
According to their reference book
I would guess that, when designing the CRC peripheral in silicon, they planned to use all 4 bytes and in the end only needed the upper 3.