Low-Level – Register-Machine Bytecode for Specific Code

low-levelvirtual machine

I hope this kind of question isn't off-topic on this site.

I'm finally getting the hang of what a stack based machine is, and how to compile code for it. For example, the following code: 2 * 5 + 1 would compile to this bytecode:

push 2
push 5
mult
push 1
add

However what I want to understand is, the fundamental differences between compiling for a stack based machine and compiling for a register based machine.

Basically I want to see what a register-machine's bytecode would look like, as opposed to a stack-machine's bytecode.

To understand this, I'd like to ask you to show me what the bytecode for the above expression would look like, for a register machine. I.e. what this: 2 * 5 + 1 would compile to.

Best Answer

Register machine bytecode often comes in three-address form, that is, it talks about data flow relationships between registers, and operations take explicit destination registers. So a basic instruction set may look like this:

set register, constant
mul out, in1, in2
add out, in1, in2

You could assume you have an infinite number of registers r0, r1, etc. and rewrite this in SSA form:

set r0, 2       // int r0 = 2;
set r1, 5       // int r1 = 5;
mul r2, r0, r1  // int r2 = r0 * r1;
set r3, 1       // int r3 = 1;
add r4, r2, r3  // int r4 = r2 + r3;

Then various analyses and optimisations become easy. You can reuse registers that are no longer active:

set r0, 2       // int r0 = 2;
set r1, 5       // int r1 = 5;
mul r0, r0, r1  // r0 *= r1;
set r1, 1       // r1 = 1;
add r0, r0, r1  // r0 += r1;

You can also allow instructions to take immediate constants, rather than loading everything into registers:

set r0, 2      // int r0 = 2;
mul r0, r0, 5  // r0 *= 5;
add r0, r0, 1  // r0 += 1;

When you do that, you notice a resemblance with existing register architectures such as x86, and indeed you can perform register allocation to map your virtual register machine onto real hardware:

mov eax, 2
mul eax, 5
add eax, 1

In either case, the bytecode is a flat representation of a data and control flow graph:

+---+
| 2 |
+---+
  r0
  |
  | +---+
  | | 5 |
  | +---+
  |   r1
  |   |
+-------+
|   ×   |
+-------+
  r2
  |
  | +---+
  | | 1 |
  | +---+
  |   r3
  |   |
+-------+
|   +   |
+-------+
  r4
  |

In the stack notation, each stack operation corresponds to one row of this diagram, and the width of the diagram at any point corresponds to the maximum stack depth at that point. In the SSA form, each register corresponds to a point in the graph where a value is produced. They’re simply different structures for talking about the same thing.

Related Topic