Virtual Machines – Why Use Stack or Register Machines?

bytecodevirtual machine

(This is an extremely newbie-ish question).

I've been studying a little about Virtual Machines.

Turns out a lot of them are designed very similarly to physical or theoretical computers.

I read that the JVM for example, is a 'stack machine'. What that means (and correct me if I'm wrong) is that it stores all of it's 'temporary memory' on a stack, and makes operations on this stack for all of it's opcodes.

For example, the source code 2 + 3 will be translated to bytecode similar to:

push 2
push 3
add

My question is this:

JVMs are probably written using C/C++ and such. If so, why doesn't the JVM execute the following C code: 2 + 3..? I mean, why does it need a stack, or in other VMs 'registers' – like in a physical computer?

The underlying physical CPU takes care of all of this. Why don't VM writers simply execute the interpreted bytecode with 'usual' instructions in the language the VM is programmed with?

Why do VMs need to emulate hardware, when the actual hardware already does this for us?

Again, very newbie-ish questions. Thanks for your help

Best Answer

A machine, virtual or not, needs a model of computation which describes how computation is carried out on it. By definition, as soon as it computes, it implements some model of computation. The question then is: What model should we choose for our VM? Physical machines are constrained by what can be effectively and efficiently done in hardware. But, as you note, virtual machines have no such constraints, they are defined in software using arbitrarily high level languages.

There are, in fact, virtual machines that are high-level as you describe. They are called programming languages. The C standard for example dedicates the bulk of its pages to defining a model for the so-called "C abstract machine" which describes how C programs behave, and by extension (as-if rule) how a conforming C compiler (or interpreter) should behave.

Of course, we usually don't call that a virtual machine. A VM is usually takes to mean something lower-level, closer to hardware, not intended to be directly programmed, designed to be executed efficiently. This selection bias means that something that accepts high-level composable code (like what you describe) wouldn't be considered a VM because is executes high-level code.

But to get the to the point, here are some reasons to make a VM (as in, something targeted by a bytecode compiler) register-based or the like. Stack and register machines are extremely simple. There's a sequence of instructions, some state, and semantics for each instruction (a function State -> State). No complex tree reductions, no operator precedence. Parsing, analysing and executing it is very simple, because it's a minimal language (syntactic sugar is compiled away) and designed to be machine-read rather than human-read.

In contrast, parsing even the simplest C-like languages is quite hard, and executing it requires non-local analyses like checking and propagating types, resolving overloads, maintaining a symbol table, resolving string identifiers, turning linear text into a precedence-driven AST, and so on. It builds on concepts that come natural to humans but have to be painstakingly reverse engineered by machines.

JVM bytecode, for example, is emitted by javac. It virtually never needs to be read or written by humans, so it's natural to gear it towards consumption by machines. If you optimized it for humans, the JVM would just on every startup read the code, parse it, analyze is, and then convert it into an intermediate representation resembling such a simplified machine model anyway. Might as well cut out the middle man.

Related Topic