I've been reading about CPUs and how they are implemented, and some big complex architectures (looking at you x86) have instructions that load from memory during one clock cycle. Since one address points to a single byte, how is it possible that I can write:
mov eax, DWORD PTR ds:[esi]
where I'm loading a double word (4 bytes!) from memory and chucking it into eax. How does this work with only one clock cycle? Wouldn't it have to access 4 addresses? The DWORD starts at ds:[esi] and ends at ds:[esi + 3], meaning it has to compute 4 effective addresses, but it does it in one cycle.
How?
Thanks
Best Answer
Because the width of the data bus and the size of the smallest addressable unit are two separate things.
Just because you can specify addresses at the byte level does not mean you have to have an 8-bit data bus. Most (possibly all) modern x86 processors use a 64-bit data bus, and every time they read from memory they read 64 bits. If you only requested 8 bits, the excess is simply discarded.
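To make the "read the whole bus width, throw away the rest" idea concrete, here is a rough Python sketch. It is only a toy model, not how any real CPU is wired: `BUS_BYTES`, `load` and the 16-byte `memory` are names made up for illustration, and it assumes a 64-bit bus with no caches.

```python
BUS_BYTES = 8  # assumption: a 64-bit data bus, as described above


def load(memory: bytes, address: int, size: int) -> int:
    """Toy model of a load that fits inside one bus-wide word."""
    base = address - (address % BUS_BYTES)   # aligned address the bus really fetches
    offset = address - base                  # where the requested bytes sit in that word
    if offset + size > BUS_BYTES:
        raise ValueError("access crosses a bus-word boundary; not modeled here")
    word = memory[base:base + BUS_BYTES]     # one transfer brings in all 8 bytes
    wanted = word[offset:offset + size]      # everything else is simply discarded
    return int.from_bytes(wanted, "little")  # x86 is little-endian


# Hypothetical memory where the byte at address n holds the value n.
memory = bytes(range(16))

dword = load(memory, 4, 4)  # bytes 4..7 arrive in the same transfer as bytes 0..3
byte = load(memory, 9, 1)   # a single byte still costs one full-width transfer
```

Note that the 4-byte load needs only one address calculation: the hardware computes the starting address once and the neighbouring bytes come along with it.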
If you request more than 64 bits (for example, if loading 128 bit SSE registers), then there will be multiple memory accesses.
Many processors also have a concept of alignment, which basically means that every memory access is on an address divisible by the data bus width. Most can still fetch unaligned memory, but if the access crosses an alignment boundary (for example, requesting 32 bits at address 0xFE on a 64-bit aligned system, which covers bytes 0xFE through 0x101 and so straddles the boundary at 0x100), you'll get multiple memory accesses, even if the data would otherwise fit in the data bus.
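As a back-of-the-envelope check, the number of bus-wide transfers an access needs can be counted by seeing how many aligned blocks it touches. This is a simplified sketch under the same assumption of a 64-bit bus; `bus_accesses` is a hypothetical helper, and real processors add caches and other machinery that this ignores.

```python
def bus_accesses(address: int, size: int, bus_bytes: int = 8) -> int:
    """Count the aligned bus-wide blocks touched by an access of `size` bytes."""
    first_block = address // bus_bytes              # block holding the first byte
    last_block = (address + size - 1) // bus_bytes  # block holding the last byte
    return last_block - first_block + 1


aligned_dword = bus_accesses(0xF8, 4)     # bytes 0xF8..0xFB sit inside one block
straddling_dword = bus_accesses(0xFE, 4)  # bytes 0xFE..0x101 cross the 0x100 boundary
sse_load = bus_accesses(0x100, 16)        # 128 bits is wider than the bus itself
```

In this model the aligned DWORD takes one transfer, while the straddling DWORD and the 128-bit load each take two.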
Here are a few other notes regarding some aspects of your question: