I have the following instructions on the left and their byte representation on the right:
mov eax, 0x1 ; 0: b8 01 00 00 00
mov ebx, 0x2 ; 5: bb 02 00 00 00
add eax, ecx ; a: 01 c8
I am not sure that I correctly understand how the CPU loads this code from cache into registers.
Here is my picture of the process. Let's assume that this code is already in the L1 cache.
As far as I understand:
- The CPU reads a single byte, b8, and understands that this opcode means it needs to load the very next 32 bits into the EAX register [quick question: does the CPU load this opcode byte into the register or not?]
- Therefore, the CPU loads the next 4 bytes, 01 00 00 00, from cache into EAX
- The CPU reads a single byte, bb, and understands that this opcode means it needs to load the next 32 bits into the EBX register
- Therefore, the CPU loads the very next 4 bytes, 02 00 00 00, from cache into EBX
- The CPU reads a single byte, 01, and understands that it needs to read one more byte to figure out which registers should be summed
- So it reads the very next byte, c8, and understands that it needs to sum the values in the EAX and ECX registers
If my idea is correct, it means that the CPU reads from cache not only one 32-bit word per read, but also 1 byte per read. Am I correct?
If not, please explain how the CPU executes this machine code.
Best Answer
Most real-world CPUs today fetch multiple bytes from the instruction cache at the same time (in many cases up to a full cache line, which is commonly 64 bytes long). Multiple parallel decode engines then figure out the boundaries of one or more instructions in that single cache read and issue several instructions to the downstream scheduling and execution stages.
In other words, cache reads, decode, issue, scheduling and execution happen for multiple instructions at any given time on most superscalar processors (which most mainstream processors are).
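To make the "find the instruction boundaries in one fetched block" idea concrete, here is a minimal Python sketch that takes the 12 bytes from the question as a single block and splits them into instructions. It is only an illustration under loud assumptions: the names `REGS` and `decode_block` are hypothetical, it handles only the two encodings that appear in the question (`b8+r` for `mov r32, imm32` and `01 /r` for `add r/m32, r32` with register operands), and it walks the block sequentially, whereas real decoders work on several instructions in parallel.

```python
# Toy decoder: handles only the two instruction encodings used in the question.
# REGS and decode_block are illustrative names, not part of any real library.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_block(block: bytes):
    """Return (offset, length, text) for each instruction in a fetched block."""
    decoded = []
    pos = 0
    while pos < len(block):
        opcode = block[pos]
        if 0xB8 <= opcode <= 0xBF:
            # mov r32, imm32: the destination register is encoded in the low
            # 3 bits of the opcode byte itself; a 4-byte little-endian
            # immediate follows.
            length = 5
            if pos + length > len(block):
                raise ValueError("instruction runs past the end of the block")
            imm = int.from_bytes(block[pos + 1:pos + 5], "little")
            text = f"mov {REGS[opcode & 7]}, {imm:#x}"
        elif opcode == 0x01:
            # add r/m32, r32: a ModRM byte follows and names both operands.
            length = 2
            if pos + length > len(block):
                raise ValueError("instruction runs past the end of the block")
            modrm = block[pos + 1]
            mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
            if mod != 3:
                raise NotImplementedError("memory operands are not handled here")
            text = f"add {REGS[rm]}, {REGS[reg]}"
        else:
            raise NotImplementedError(f"opcode {opcode:#04x} is not handled here")
        decoded.append((pos, length, text))
        pos += length
    return decoded


# The whole 12-byte sequence arrives as one block, not one byte at a time.
fetched = bytes.fromhex("b801000000bb0200000001c8")
for offset, length, text in decode_block(fetched):
    print(f"{offset:#x}: {length} bytes  {text}")
```

The sketch also touches the side question about the opcode byte: `b8` is not data that ends up in `EAX`. Its low three bits select the destination register, and only the four immediate bytes after it become the register's value.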