How do we go from assembly to machine code (code generation)?

Tags: assembly, code-generation, compiler

Is there an easy way to visualize the step from assembly code to machine code?

For example, if you open a binary file in Notepad, you see a textual representation of the machine code. I assume that each symbol you see is the ASCII character corresponding to that byte's binary value?
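That intuition can be sketched in a few lines of Python. The byte values below are arbitrary examples (not a real program); the point is that a hex dump and a text editor are just two interpretations of the same bytes:

```python
# Some arbitrary example bytes standing in for machine code.
code = bytes([0x48, 0x65, 0x90, 0xC3])

# What a hex dump shows: the numeric value of each byte.
hex_view = " ".join(f"{b:02X}" for b in code)

# What a text editor like Notepad shows: each byte interpreted as a
# character code (non-printable bytes shown here as ".").
text_view = "".join(chr(b) if 32 <= b < 127 else "." for b in code)

print(hex_view)   # 48 65 90 C3
print(text_view)  # He..
```

So yes: the editor is simply treating every byte of the machine code as a character code and drawing the corresponding glyph.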

But how do we go from assembly to binary? What's going on behind the scenes?

Best Answer

Look at the instruction set documentation, and you will find an entry like this one (from a PIC microcontroller) for each instruction:

example addlw instruction

The "encoding" line tells what that instruction looks like in binary. In this case, it always starts with 5 ones, then a don't care bit (which can be either one or zero), then the "k"s stand for the literal you are adding.

The first few bits are called the "opcode" and are unique to each instruction. The CPU looks at the opcode to see which instruction it is, and then knows to decode the "k"s as the number to be added.
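The assembler does exactly this bit-packing mechanically. Here is a rough sketch in Python of encoding and decoding the ADDLW pattern described above ("11 111x kkkk kkkk", a 14-bit word); the function names are made up for illustration:

```python
# Top six bits of ADDLW: five ones plus the don't-care bit (chosen as 0 here).
OPCODE_ADDLW = 0b111110

def encode_addlw(k):
    """Pack ADDLW k into a 14-bit instruction word: opcode bits, then the literal."""
    assert 0 <= k <= 0xFF, "literal must fit in 8 bits"
    return (OPCODE_ADDLW << 8) | k

def decode(word):
    """Recognize ADDLW by its top five bits (the sixth bit is a don't-care)."""
    if (word >> 9) == 0b11111:
        return ("ADDLW", word & 0xFF)   # low 8 bits are the literal k
    return ("UNKNOWN", word)

print(f"{encode_addlw(0x2A):014b}")  # 11111000101010
print(decode(encode_addlw(0x2A)))    # ('ADDLW', 42)
```

A real assembler is a big table of such patterns, one per mnemonic, plus logic for resolving labels into addresses.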

It's tedious, but not that difficult, to encode and decode by hand. I had an undergrad class where we had to do it on paper in exams.

To actually make a full executable file, you also have to do things like allocate memory, calculate branch offsets, and put it into a format like ELF, depending on your operating system.