Since this is an electrical engineering site, I will attempt to give a lower-level perspective.
Any device, such as a hard disk controller, must have some way of transferring data. This could be a parallel data bus or a serial one; it could be through radio waves, or even through human-read LEDs and human-set DIP switches (thus not necessarily needing a CPU at all). So the question comes up: what is the best way to connect a device to the CPU? You could make the device watch for a specific address on the CPU address bus, and it would then send or receive data only when the CPU reads or writes that address. This would skip RAM entirely, allowing the CPU to copy data directly from the device into a processor register. Of course, in your architecture you would have to make sure there are no conflicts, and that the RAM doesn't also respond to the request and generate a bus collision. To avoid that, you could add a special I/O line to your CPU architecture that is asserted when reading and writing devices and disables the RAM. You could even add a separate I/O bus to the CPU architecture if you wanted.
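As a rough sketch of the memory-mapped scheme just described, here it is in C. The register layout, offsets, and names are invented for illustration (real hardware defines its own memory map), and the register block is simulated with a static array so the example is self-contained:

```c
#include <stdint.h>

/* Hypothetical register block for a memory-mapped device.
   On real hardware, DEVICE_BASE would be a fixed bus address from
   the system's memory map; here it is a static array so the sketch
   runs anywhere. */
static volatile uint8_t device_regs[4];
#define DEVICE_BASE   (device_regs)
#define DEV_STATUS    (*(volatile uint8_t *)(DEVICE_BASE + 0))
#define DEV_DATA      (*(volatile uint8_t *)(DEVICE_BASE + 1))
#define STATUS_READY  0x01u

/* Read one byte from the device: poll the status register, then copy
   the data register straight into a CPU variable, bypassing any
   intermediate RAM buffer. */
uint8_t device_read_byte(void)
{
    while ((DEV_STATUS & STATUS_READY) == 0)
        ;                  /* spin until the device signals ready */
    return DEV_DATA;       /* one bus read, device -> CPU register */
}
```

The `volatile` qualifier matters here: it tells the compiler every access really must hit the device, since a device register can change value between reads in a way ordinary RAM cannot.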
This was actually how many devices were connected to the CPU in the early days (i.e. the 1970s and '80s). Consider, however, a hard drive and its typical use. A processor normally manipulates data stored in registers, but registers are not very big, so for large amounts of data RAM is used as a high-speed store; a hard drive would be far too slow for that role. If the data needs to be stored permanently, however, it must be transferred from RAM to the hard drive. Using the method described above, the CPU would be constantly tied up transferring, byte by byte, the data it wants to store on the hard drive. This turned out to be quite painful in the early days, so a mechanism called Direct Memory Access (DMA) was created to let devices read and write RAM directly. The CPU could simply send a command to the hard drive controller to read some section of the hard drive and put it in RAM, or vice versa. The CPU was then free to do other things while the drive did its business, and the transfer was also a lot faster than the old CPU-based method.
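The division of labour under DMA can be sketched like this. The controller's register file and function names are invented, and the DMA "engine" is simulated in software so the example runs anywhere; on real hardware, the engine is circuitry in the controller that moves the bytes with no CPU involvement:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical DMA controller registers. On real hardware these would
   be memory-mapped registers of the disk/DMA controller; here they are
   plain variables. */
static struct {
    const uint8_t *src;   /* where to read from          */
    uint8_t       *dst;   /* where to write in RAM       */
    uint32_t       len;   /* number of bytes to transfer */
} dma;

/* The CPU's entire contribution: program the transfer, then go do
   other work. */
static void dma_start(const uint8_t *src, uint8_t *dst, uint32_t len)
{
    dma.src = src;
    dma.dst = dst;
    dma.len = len;
}

/* Stand-in for the DMA engine itself, which on real hardware performs
   the copy autonomously while the CPU runs other code. */
static void dma_engine_run(void)
{
    memcpy(dma.dst, dma.src, dma.len);
}
```

The point of the sketch is the shape of `dma_start`: a few writes to set up the transfer, rather than one CPU instruction per byte moved.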
So to answer your question: are there other ways to connect devices? Absolutely! But you'll find that the fastest way to transfer large amounts of data from a device is not through the processor but directly from the device to RAM, which is why most modern devices use RAM as an intermediary store.
Your question is a little confused, but perhaps this will help clear it up.
There are two areas to consider:
- whether the execution core can directly operate on items in memory
- the speed of operations on memory
Small embedded microcontrollers
These small microcontrollers have no external RAM. All RAM is internal, but some of it is used for specific things like registers.
For example, the Microchip PICs you mention have a "W" register. This is just in normal RAM like everything else, but instructions with two operands usually require one of them to be in the W register.
This greatly simplifies the design of the microcontroller at the electronics level and keeps costs/power low. It also has other benefits like predictable timing (in cycles) for instructions.
This is why you will see instructions that load W with a value, operate on it and then copy it back from W to elsewhere in memory. The compiler uses the register because it has to.
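The load/operate/store pattern above can be sketched with a toy accumulator model. W and the data memory below are plain C variables, not real PIC hardware, and the function names are only loosely modelled on PIC mnemonics:

```c
#include <stdint.h>

/* Toy model of a single-accumulator machine like a small PIC:
   one working register W, plus a small RAM array standing in for
   the data memory ("file registers"). */
static uint8_t W;
static uint8_t ram[16];

static void load_w(uint8_t addr)   { W = ram[addr]; }              /* load    */
static void add_w(uint8_t addr)    { W = (uint8_t)(W + ram[addr]); } /* operate */
static void store_w(uint8_t addr)  { ram[addr] = W; }              /* store   */

/* ram[2] = ram[0] + ram[1], spelled out the way an accumulator
   architecture forces the compiler to spell it out: everything
   passes through W. */
static void add_via_w(void)
{
    load_w(0);
    add_w(1);
    store_w(2);
}
```

On a machine with two-operand register instructions, the same addition would be a single step; the three-step dance here is the cost of the simpler hardware.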
Larger processors
Other processors (CPUs), such as x86/x64, have external RAM, which is a big difference. Notice that "register" now means something very different, because we have different types of memory.
External to the CPU are large quantities of RAM; internal to the CPU are a number of smaller blocks of memory. Some of these are storage registers, each holding an amount of data usually equal to the data width of the architecture. So on a 32-bit Intel processor the registers (such as EAX, EBX, etc.) are 32 bits wide.
These processors have more complicated instructions that can often operate on either registers or external RAM, so data for an instruction does not always need to be in a register. Why bother with registers at all, then? The answer is speed: where there is a choice, the compiler will use registers to reduce execution time.
These complicated processors have different access times for different types of memory. Registers on the CPU die are very quick to access, so if you have a variable in constant use throughout some code, it makes sense to load it into a register, operate on it repeatedly, and copy it back to external RAM only when finished.
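A small C illustration of that pattern: the running total below lives in a local variable, which an optimizing compiler will normally keep in a CPU register for the whole loop and write back only once at the end (whether it actually does so depends on the compiler and optimization level):

```c
#include <stddef.h>
#include <stdint.h>

/* Sum an array while keeping the running total in a local variable.
   With optimization on, `total` typically stays in a register for the
   entire loop, so each iteration costs one memory read (data[i]) and
   one register-to-register add; the result is written out just once. */
uint32_t sum_array(const uint32_t *data, size_t n)
{
    uint32_t total = 0;        /* held in a register inside the loop */
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;              /* written back to the caller once */
}
```

Had `total` been forced to live in RAM (say, as a global updated every iteration), each pass through the loop would pay a memory write as well as a read.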
There are two separate things to consider: throughput and latency.
On very simple, slow cores, the cache runs at the same speed as the CPU and can provide data in one cycle, so data is available immediately without stalling. On a cache miss, data is fetched from main memory, and the initial latency can be over 10 cycles. The good news is that once the first data is available, the following data arrives quickly, hence the idea of burst transfers and cache-line fills: the CPU may only need a byte, or a 32-bit word, but 32 or 64 bytes are transferred at once from memory into the cache.
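Two small helpers make the cache-line granularity concrete, assuming a 64-byte line (a common size, but it varies by CPU): a miss on any byte causes the whole line containing that byte to be fetched in one burst.

```c
#include <stdint.h>

/* Assumed cache-line size; real CPUs vary (64 bytes is typical on
   current x86 parts). */
#define LINE_SIZE 64u

/* Base address of the line that would be burst-filled on a miss. */
static uintptr_t line_base(uintptr_t addr)
{
    return addr & ~(uintptr_t)(LINE_SIZE - 1);
}

/* Position of the requested byte within that line. */
static unsigned line_offset(uintptr_t addr)
{
    return (unsigned)(addr & (LINE_SIZE - 1));
}
```

So a one-byte read at address 0x1234 actually triggers a fill of the 64 bytes starting at 0x1200, which is why a subsequent access to a neighbouring address is nearly free.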
On more advanced CPUs, the ones with L1, L2, DRAM, and gigahertz clocks, even the L1 cache contents cannot be obtained immediately. For instructions, there are mechanisms that predict the instruction flow and fetch instructions in advance: continuously fetch consecutive addresses unless the instruction is a branch, a call, and so on. For data, it is more complex. Using pipelining, some CPUs are able to have several outstanding data transfers before stalling. The main current technique for mitigating long latencies is out-of-order execution: the CPU does as much work as possible, even executing instructions out of program order, in order to hide the long latency of instructions like data reads and writes.