They're not quite the same. The registers are where the values the CPU is actually working on are located. In many CPU designs (load/store architectures, which include most RISC CPUs), the CPU can only modify or otherwise act on a value while it is in a register. So registers are where the actual work happens, whereas memory (including cache) only holds the values that the CPU reads from it and writes back to it.
Imagine a carpenter at work. He has a few items in his hands (registers). Very close by, on his workbench (cache), are things he uses frequently but isn't using at this exact moment. Out in the workshop (main memory) are things that pertain to the project at hand but aren't immediately important enough to be on the workbench.
EDIT: Here's a simple explanation for how register logic works.
Let's imagine we have four registers named R1..R4. If you compile a statement that looks like this:
x = y + z * 3;
the compiler would output machine code that (when disassembled) looks something like this:
LOAD R1, ADDRESS_Z //move the value of Z into register 1
MUL R1, 3 //multiply the value of register 1 by 3
LOAD R2, ADDRESS_Y //move the value of Y into register 2
ADD R1, R2 //adds the value in R2 to the value in R1
STORE R1, ADDRESS_X //move the value of register 1 into X
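Here is a minimal Python sketch of that toy four-register machine running the listing above. The tuple encoding of instructions, the `memory` dictionary keyed by variable name, and the sample values `y = 5` and `z = 4` are assumptions made for illustration; a real CPU works on binary-encoded instructions and numeric addresses.

```python
# Toy load/store machine: arithmetic only touches registers,
# memory is only accessed through LOAD and STORE.
# Assumed encoding (not a real ISA): each instruction is a tuple (op, register, operand).

def run(program, memory):
    regs = {"R1": 0, "R2": 0, "R3": 0, "R4": 0}
    for op, reg, operand in program:
        if op == "LOAD":        # memory -> register
            regs[reg] = memory[operand]
        elif op == "STORE":     # register -> memory
            memory[operand] = regs[reg]
        elif op == "MUL":       # register *= immediate constant
            regs[reg] *= operand
        elif op == "ADD":       # register += another register
            regs[reg] += regs[operand]
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs

# x = y + z * 3, in the same order as the disassembly above
program = [
    ("LOAD", "R1", "z"),
    ("MUL", "R1", 3),
    ("LOAD", "R2", "y"),
    ("ADD", "R1", "R2"),
    ("STORE", "R1", "x"),
]

memory = {"x": 0, "y": 5, "z": 4}
regs = run(program, memory)
print(memory["x"])  # 17
```

Only two of the four registers are needed here; R3 and R4 stay free, which is the "reuse a few registers" point in practice.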
Since most modern CPUs have registers that are either 32 or 64 bits wide, they can do math on any value up to the size a register can hold. They don't need separate registers for smaller values; they just use instructions that tell the CPU to operate on only part of a register. And, much like the carpenter with only two hands, registers can only hold a small amount of data at once, but they can be reused by moving active data in and out of them, so a large number of registers isn't strictly needed. (Having more registers available does let compilers generate faster code, of course, but it's not a requirement.)
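As a rough illustration of "using only part of a register", the sketch below models an 8-bit add performed on the low byte of a 64-bit register. It assumes x86-like behaviour for 8-bit operations (for example `add al, bl`): the result wraps around within 8 bits and the upper bits of the destination register are left unchanged. Other instruction sets handle sub-word arithmetic differently, for instance by zero- or sign-extending the result.

```python
MASK64 = (1 << 64) - 1
MASK8 = 0xFF

def add8(dest, src):
    """Add the low bytes of two 64-bit register values.

    Assumption: x86-style 8-bit semantics. The 8-bit sum wraps modulo 256,
    and bits 8..63 of `dest` are preserved.
    """
    low = ((dest & MASK8) + (src & MASK8)) & MASK8  # 8-bit sum, carry discarded
    return ((dest & MASK64) & ~MASK8) | low          # splice it into the low byte

rax = 0x1122334455667788
rbx = 0x00000000000000FF
print(hex(add8(rax, rbx)))  # 0x1122334455667787
```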
It depends.
Some (CISC) CPUs have byte-wise loads that can address individual bytes, so the byte of interest arrives as the low-order 8 bits on the bus and the rest of the bits are masked off.
Many RISC CPUs do a word load followed by a barrel shift, others do a word load followed by a bit shift, and in between are those that do a word load followed by a byte shift.
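The "word load, then shift" approach can be sketched in a few lines of Python. This assumes a little-endian machine with 32-bit words, where the byte at address `a` lives in the aligned word at `a & ~3` at bit offset `8 * (a & 3)`; a big-endian CPU would count the offset from the other end. The `bytes` object simply stands in for RAM.

```python
def load_word(memory, addr):
    # Aligned 32-bit load; this sketch assumes little-endian byte order.
    if addr % 4:
        raise ValueError("word loads must be aligned")
    return int.from_bytes(memory[addr:addr + 4], "little")

def load_byte(memory, addr):
    # Emulate a byte load using only an aligned word load plus shift and mask.
    word = load_word(memory, addr & ~3)  # aligned word that contains the byte
    shift = 8 * (addr & 3)               # bit offset of the byte inside that word
    return (word >> shift) & 0xFF        # move it to the low 8 bits, drop the rest

ram = bytes([0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80])
print([hex(load_byte(ram, a)) for a in range(len(ram))])
```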
Some CPUs do two consecutive word loads when a two-byte value spans a 32-bit boundary, then shift and mask the two words together.
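A sketch of that two-load case, under the same assumptions as before (little-endian, aligned 32-bit word loads only, a `bytes` object standing in for RAM). A 16-bit value starting at offset 3 within a word needs its second byte from the next word, so both words are fetched and spliced together before the usual shift and mask.

```python
def load_word(memory, addr):
    # Aligned 32-bit little-endian load (assumed byte order).
    if addr % 4:
        raise ValueError("word loads must be aligned")
    return int.from_bytes(memory[addr:addr + 4], "little")

def load_half(memory, addr):
    # Emulate an unaligned 16-bit little-endian load using only aligned word loads.
    base = addr & ~3
    offset = addr & 3
    lo_word = load_word(memory, base)
    if offset <= 2:
        # Both bytes sit inside one aligned word.
        return (lo_word >> (8 * offset)) & 0xFFFF
    # The value spans a 32-bit boundary: fetch the next word as well,
    # combine the two words, then shift and mask as usual.
    hi_word = load_word(memory, base + 4)
    combined = (hi_word << 32) | lo_word
    return (combined >> (8 * offset)) & 0xFFFF

ram = bytes([0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80])
print(hex(load_half(ram, 3)))  # 0x5040
```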
Even within one CPU family, the implementation may differ from one processor model to the next. That explains why the implementation isn't documented: it's a decision only the vendor cares about.
As for performance, you will just have to test it on the particular CPU and memory configurations you care about.
Note: since you did not mention which instruction set architecture you are asking about, I have to make some assumptions and guesses in my answer. Also, the textbook or learning material seems to refer to an architecture that is not the same as today's "desktop CPUs", so please cite the name of the textbook or learning material so that we know which architecture it is referring to.
Without further information, my internet search seems to indicate it may be referring to this book: Microprocessors and Microcontrollers: Architecture, Programming and System Design 8085, 8086, 8051, 8096
The I in MVI stands for immediate. In assembly programming, an "immediate value" is a value that is encoded directly into the instruction itself.

For simplicity, I will first focus on architectures with a fixed instruction size, say, where every instruction is 32 bits long.
Part of those 32 bits is used to store the opcode, which specifies the operation: addition, subtraction, load from memory, store to memory, branching, etc.
MVI is the mnemonic for an opcode that sets a particular register to a particular value.

The remaining bits of the 32-bit instruction are used for different purposes depending on the opcode; different opcodes use those bits in different ways.
For MVI, some of the remaining bits specify which CPU register will be updated with the immediate value, and the rest of the bits encode the immediate value itself.

I should emphasize that, typically, the CPU does not need to make an additional memory request to fetch this immediate value. The reason is that the CPU has already loaded the entire 32-bit instruction from memory before it can perform the instruction decode step on it. Thus the opcode, the register identifier, and the immediate value have all been loaded into the instruction decoder.
The instruction decoder can pass this immediate value into the arithmetic logic unit (ALU) through one of the ALU input ports. The ALU is set to perform no operation, passing its input value through unchanged to its output. The register file is configured to accept this value from the ALU output port and store it in the destination register, as determined by the instruction decoder's parsing of the MVI instruction.

The immediate value is technically part of the instruction.
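To show how the opcode, register identifier and immediate value share one instruction word, here is a Python sketch of a hypothetical fixed 32-bit format: a 6-bit opcode in the top bits, a 5-bit destination register, and the remaining 21 bits as the immediate. The opcode number, the field order and the choice to sign-extend the immediate are all assumptions for illustration; every real instruction set defines its own layout. Note that executing the decoded instruction needs no further memory access, because the immediate comes straight out of the bits already fetched.

```python
OPCODE_BITS, REG_BITS = 6, 5
IMM_BITS = 32 - OPCODE_BITS - REG_BITS  # 21 bits left for the immediate
MVI = 0b000001  # hypothetical opcode number for MVI

def encode_mvi(reg, imm):
    # Pack opcode | register | immediate into one 32-bit word.
    if not 0 <= reg < (1 << REG_BITS):
        raise ValueError("register number out of range")
    if not -(1 << (IMM_BITS - 1)) <= imm < (1 << (IMM_BITS - 1)):
        raise ValueError("immediate does not fit in 21 signed bits")
    imm_field = imm & ((1 << IMM_BITS) - 1)  # two's-complement truncation
    return (MVI << (REG_BITS + IMM_BITS)) | (reg << IMM_BITS) | imm_field

def decode(word):
    # Split the fetched instruction word back into its three fields.
    opcode = word >> (REG_BITS + IMM_BITS)
    reg = (word >> IMM_BITS) & ((1 << REG_BITS) - 1)
    imm = word & ((1 << IMM_BITS) - 1)
    if imm & (1 << (IMM_BITS - 1)):  # assumed signed: sign-extend
        imm -= 1 << IMM_BITS
    return opcode, reg, imm

def execute(word, regs):
    opcode, reg, imm = decode(word)
    if opcode == MVI:
        regs[reg] = imm  # the ALU just passes the immediate through
    return regs

regs = [0] * 32
execute(encode_mvi(7, -42), regs)
print(regs[7])  # -42
```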
Similar instructions exist across a lot of architectures, though details might differ.
In some cases, the architecture's word size is 32 bits, meaning that the ALU and memory operations are 32 bits wide, and the instruction size is also fixed at 32 bits. Since the opcode and the register identifier take up some of the instruction bits, the immediate value cannot be 32 bits wide; it is limited to fewer bits. For example, if the opcode is 6 bits and the register identifier is 5 bits (supporting up to 32 registers), the number of bits remaining for the immediate value is 32 - 6 - 5 = 21. Depending on the architecture, this 21-bit immediate value might be interpreted as signed or unsigned.

In some other architectures, the immediate value is not packed into the instruction itself but is stored immediately after it:
| Address | Instruction data |
|---|---|
| 0 | MVI |
| 2 | 0x1234 |
| 4 | whatever instruction follows |
For these architectures, loading the immediate might require an additional memory access. The MVI instruction at address 0 causes the instruction decoder to treat the data at address 2 not as an instruction but as the operand of the MVI instruction.

Although this design allows MVI to load full-width (16-bit) data into the register, notice that it creates a danger if a branch instruction specifies address 2 as its jump target. Since the data can be any arbitrary 16-bit value, a branch that lands at this address will misinterpret that data as an instruction opcode (whichever opcode happens to have the same bit pattern as that 16-bit value), and will therefore execute an arbitrary instruction that the assembly language programmer never intended.

In yet other architectures, there may be no equivalent of MVI at all. Instead, the value must be loaded from memory with a load instruction, typically named LD (load):

| Address | Data / Instruction |
|---|---|
| 0 | LD R1, (address) 128 |
| 2 | whatever instruction follows |
| ... | ... |
| 128 | 0x1234 |
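The two layouts above, and the branch-target hazard, can be illustrated with a toy decoder in Python. Everything about this machine is invented for the sketch and does not match the 8085 or any real CPU: 16-bit words at even byte addresses, a 4-bit opcode in the top nibble, `0x1` for an MVI into R1 whose operand is the following word, `0x2` for an LD into R1 whose address is packed into the low 12 bits, and `0xF` for HALT. The operand `0x2080` was picked so that the misinterpretation is easy to see.

```python
HALT, MVI, LD = 0xF, 0x1, 0x2  # hypothetical 4-bit opcodes (top nibble of a word)

def run(memory, pc=0, max_steps=10):
    """Run a toy 16-bit-word machine; return (R1, executed mnemonics, memory reads)."""
    r1, trace, reads = 0, [], 0

    def read(addr):
        nonlocal reads
        reads += 1
        return memory.get(addr, 0)

    for _ in range(max_steps):
        word = read(pc)                 # instruction fetch
        opcode = word >> 12
        if opcode == MVI:
            r1 = read(pc + 2)           # the operand is the next word in memory
            trace.append(f"MVI {r1:#06x}")
            pc += 4                     # skip over the operand
        elif opcode == LD:
            addr = word & 0x0FFF        # address packed into the low 12 bits
            r1 = read(addr)             # separate data access
            trace.append(f"LD ({addr})")
            pc += 2
        elif opcode == HALT:
            trace.append("HALT")
            break
        else:
            raise ValueError(f"illegal opcode {word:#06x} at address {pc}")
    return r1, trace, reads

# MVI with its operand stored right after it.
with_immediate = {0: 0x1000, 2: 0x2080, 4: 0xF000, 128: 0xBEEF}
print(run(with_immediate, pc=0))  # (8320, ['MVI 0x2080', 'HALT'], 3)
# A branch that lands on address 2 decodes the operand 0x2080 as "LD (128)".
print(run(with_immediate, pc=2))  # (48879, ['LD (128)', 'HALT'], 3)

# No MVI: the constant lives at address 128 and is fetched with LD.
from_memory = {0: 0x2080, 2: 0xF000, 128: 0x1234}
print(run(from_memory))           # (4660, ['LD (128)', 'HALT'], 3)
```

Starting at address 2 executes an LD that the programmer never wrote, which is exactly the hazard of storing the immediate next to the instruction.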