As others have pointed out, LEA (load effective address) is often used as a "trick" to do certain computations, but that's not its primary purpose. The x86 instruction set was designed to support high-level languages like Pascal and C, where arrays—especially arrays of ints or small structs—are common. Consider, for example, a struct representing (x, y) coordinates:
struct Point
{
int xcoord;
int ycoord;
};
Now imagine a statement like:
int y = points[i].ycoord;
where points[]
is an array of Point
. Assuming the base of the array is already in EBX
, and variable i
is in EAX
, and xcoord
and ycoord
are each 32 bits (so ycoord
is at offset 4 bytes in the struct), this statement can be compiled to:
MOV EDX, [EBX + 8*EAX + 4] ; right side is "effective address"
which will land y
in EDX
. The scale factor of 8 is because each Point
is 8 bytes in size. Now consider the same expression used with the "address of" operator &:
int *p = &points[i].ycoord;
In this case, you don't want the value of ycoord
, but its address. That's where LEA
(load effective address) comes in. Instead of a MOV
, the compiler can generate
LEA ESI, [EBX + 8*EAX + 4]
which will load the address in ESI
.
LEA
means Load Effective Address
MOV
means Load Value
In short, LEA
loads a pointer to the item you're addressing whereas MOV loads the actual value at that address.
The purpose of LEA
is to allow one to perform a non-trivial address calculation and store the result [for later usage]
LEA ax, [BP+SI+5] ; Compute address of value
MOV ax, [BP+SI+5] ; Load value at that address
Where there are just constants involved, MOV
(through the assembler's constant calculations) can sometimes appear to overlap with the simplest cases of usage of LEA
. Its useful if you have a multi-part calculation with multiple base addresses etc.
Best Answer
One significant difference between
LEA
andADD
on x86 CPUs is the execution unit which actually performs the instruction. Modern x86 CPUs are superscalar and have multiple execution units that operate in parallel, with the pipeline feeding them somewhat like round-robin (bar stalls). Thing is,LEA
is processed by (one of) the unit(s) dealing with addressing (which happens at an early stage in the pipeline), whileADD
goes to the ALU(s) (arithmetic / logical unit), and late in the pipeline. That means a superscalar x86 CPU can concurrently execute aLEA
and an arithmetic/logical instruction.The fact that
LEA
goes through the address generation logic instead of the arithmetic units is also the reason why it used to be called "zero-clocks"; it takes no time to execute because address generation has already happened by the time it would be / is executed.It's not free, since address generation is a step in the execution pipeline, but it's got no execution overhead. And it doesn't occupy a slot in the ALU pipeline(s).
Edit: To clarify,
LEA
is not free. Even on CPUs that do not implement it via the arithmetic unit it takes time to execute due to instruction decode / dispatch / retire and/or other pipeline stages that all instructions go through. The time taken to doLEA
just occurs in a different stage of the pipeline for CPUs that implement it via address generation.