What’s the purpose of the CIL nop opcode

assembly bytecode cil

I'm going through some MSIL and noticing that it contains a lot of nop instructions.

The MSDN article says they take no action and are used to fill space if opcodes are patched. They appear a lot more in debug builds than in release builds.

I know that these kinds of statements are used in assembly languages to align later instructions, but why are nops needed in MSIL?

(Editor's note: the accepted answer is about machine-code NOPs, not MSIL/CIL NOPs which the question originally asked about.)

Best Answer

NOPs serve several purposes:

They allow the debugger to place a breakpoint on a source line even if it is combined with others in the generated code.
They allow the loader to patch a jump with a different-sized target offset.
They allow a block of code to be aligned at a particular boundary, which can be good for caching.
They allow incremental linking to overwrite chunks of code with a call to a new section without having to worry about the overall function changing size.

Related Solutions

What’s the purpose of the LEA instruction

As others have pointed out, LEA (load effective address) is often used as a "trick" to do certain computations, but that's not its primary purpose. The x86 instruction set was designed to support high-level languages like Pascal and C, where arrays—especially arrays of ints or small structs—are common. Consider, for example, a struct representing (x, y) coordinates:

struct Point
{
     int xcoord;
     int ycoord;
};

Now imagine a statement like:

int y = points[i].ycoord;

where points[] is an array of Point. Assuming the base of the array is already in EBX, and variable i is in EAX, and xcoord and ycoord are each 32 bits (so ycoord is at offset 4 bytes in the struct), this statement can be compiled to:

MOV EDX, [EBX + 8*EAX + 4]    ; right side is "effective address"

which will land y in EDX. The scale factor of 8 is because each Point is 8 bytes in size. Now consider the same expression used with the "address of" operator &:

int *p = &points[i].ycoord;

In this case, you don't want the value of ycoord, but its address. That's where LEA (load effective address) comes in. Instead of a MOV, the compiler can generate

LEA ESI, [EBX + 8*EAX + 4]

which will load the address in ESI.
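As a rough sketch of what that address calculation amounts to, the C below computes the address of points[i].ycoord by hand, using the same base + scale*index + displacement sum. The helper name ycoord_address is hypothetical and not part of the original answer; the scale and displacement come from sizeof and offsetof, so they are 8 and 4 only when int is 32 bits, as assumed above.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */

struct Point
{
    int xcoord;
    int ycoord;
};

/* Hypothetical helper: computes &points[i].ycoord by hand, using the same
   base + scale*index + displacement sum that the LEA operand describes. */
int *ycoord_address(struct Point *points, size_t i)
{
    assert(points != NULL);

    char *base = (char *)points;                           /* EBX in the example */
    size_t scale = sizeof(struct Point);                   /* 8 with 32-bit int  */
    size_t displacement = offsetof(struct Point, ycoord);  /* 4 with 32-bit int  */

    /* No memory is read here; only the address is computed, as with LEA. */
    return (int *)(base + scale * i + displacement);
}
```

For any valid index i, ycoord_address(points, i) should compare equal to &points[i].ycoord, which is the expression the compiler would translate into the LEA above.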

Assembly language and compiled languages

Well, it relates a bit to your question, indeed. The point is that compilers produce inefficient machine code at times for various reasons, such as not being able to completely analyze your code, inserting automatic range checks, automatic checks for objects being null, etc.

On the other hand, if you write assembler code by hand and know what you're doing, then you can probably write some things much more efficiently than the compiler does, although the compiler's behavior can be tweaked and you can usually tell it not to do range checking, for example.

Most people, however, will not write better assembler code than a compiler, simply because compilers are written by people who know a good deal of really weird but really cool optimizations. Also things like loop unrolling are usually a pain to write yourself and make the resulting code faster in many cases.
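To make the loop unrolling remark concrete, here is a minimal sketch in C of a simple summation loop next to a version unrolled by hand four times. The function names sum_simple and sum_unrolled are made up for illustration, and the unroll factor of 4 is an arbitrary assumption; an optimizing compiler may well perform this transformation on the simple loop by itself, so the manual version is not guaranteed to be faster.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward loop: one addition and one loop test per element. */
long sum_simple(const int *a, size_t n)
{
    assert(a != NULL || n == 0);

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Hand-unrolled by 4 (an arbitrary factor chosen for this sketch):
   one loop test per four elements, plus a cleanup loop for the rest. */
long sum_unrolled(const int *a, size_t n)
{
    assert(a != NULL || n == 0);

    long total = 0;
    size_t i = 0;

    /* Main body: handles four elements per iteration. */
    for (; i + 4 <= n; i += 4)
        total += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];

    /* Cleanup: handles the 0 to 3 elements left over. */
    for (; i < n; i++)
        total += a[i];

    return total;
}
```

The cleanup loop is the part that tends to make manual unrolling tedious and error-prone: forgetting it gives wrong results whenever the length is not a multiple of the unroll factor.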

While it's generally true that everything a computer executes is machine code, the code that runs differs greatly depending on how many abstraction levels you put between the machine and the programmer. For assembler that's one level; for Java there are a few more.

Also many people mistakenly believe that certain optimizations at a higher abstraction layer pay off at a lower one. This is not necessarily the case and the compiler may just have trouble understanding what you are trying to do and fail to properly optimize it.
