Python – Difference between PyPy and JVM

Tags: compiler, interpreters, jvm, python, virtual machine

From my understanding, the default Python interpreter (CPython) compiles source code into bytecode and then interprets that bytecode. PyPy, on the other hand, uses a JIT compiler to turn frequently interpreted bytecode into compiled machine code. How is this different from the JVM? The JVM is an interpreter plus a compiler: Java source code is compiled to bytecode, and the JVM then optimizes frequently interpreted bytecode into compiled machine code. Is there any other difference?

Best Answer

There are two major differences between the two.

First: the JVM is abstract, PyPy is concrete. The JVM is a specification, a piece of paper. PyPy is an implementation, a piece of code.

There are many different implementations of the JVM which work very differently. Some only interpret the JVM byte code, some only compile it statically ahead-of-time, some only compile it dynamically just-in-time, some have both an interpreter and a JIT compiler, some have an interpreter and multiple JIT compilers, some have no interpreter and only one JIT compiler, some have no interpreter and multiple JIT compilers. Some have tracing JITs, some have method-at-a-time JITs, some have both. Some have native threads, some have green threads. Some have tracing garbage collectors, some have reference counting garbage collectors. And so on and so forth.

Second: PyPy is more general. It is not just an implementation of one specific language; it is a framework for easily creating efficient language implementations. Many different language implementations have been built using the PyPy framework: Topaz (an implementation of Ruby), HippyVM (an implementation of PHP), Pyrolog (Prolog), RSqueak (a Squeak VM), PyGirl (a GameBoy emulator), langjs (JavaScript), and implementations of Io and Scheme. And, of course, an implementation of Python.

Since you asked specifically about the compilers, there is a very important distinction between PyPy's JIT and the JIT compilers of other mixed-mode engines. In a typical mixed-mode engine (e.g. Oracle HotSpot JVM, IBM J9 JVM, Rubinius, Apple SquirrelFish Extreme, …), the interpreter and the compiler run side by side and process the same program. The interpreter starts off interpreting the program, and once the engine has determined that it would be beneficial to compile (parts of) the program, those parts are handed off to the compiler and compiled.
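To make the mixed-mode idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the toy stack bytecode, the names interpret, compile_to_python and MixedModeFunction, and the threshold of 3 calls. A Python lambda stands in for native code. No real engine works exactly like this; the point is only that the interpreter and the compiler both process the same program.

```python
# Toy mixed-mode engine. All names and the bytecode format are hypothetical.
HOT_THRESHOLD = 3  # assumed value; real engines choose this heuristically


def interpret(code, x):
    """Execute the toy stack bytecode one instruction at a time."""
    stack = [x]
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


def compile_to_python(code):
    """Stand-in for a JIT compiler: translate the bytecode of the *program*
    into a single Python expression and compile that once."""
    stack = ["x"]
    for op, arg in code:
        if op == "PUSH":
            stack.append(repr(arg))
        elif op in ("ADD", "MUL"):
            b, a = stack.pop(), stack.pop()
            symbol = "+" if op == "ADD" else "*"
            stack.append(f"({a} {symbol} {b})")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return eval(f"lambda x: {stack.pop()}")


class MixedModeFunction:
    """Starts out interpreted; switches to compiled code once it is hot."""

    def __init__(self, code):
        self.code = code
        self.calls = 0
        self.compiled = None

    def __call__(self, x):
        if self.compiled is not None:
            return self.compiled(x)          # fast path: compiled code
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            self.compiled = compile_to_python(self.code)
        return interpret(self.code, x)       # slow path: interpreter


# The program computes x * 2 + 1.
f = MixedModeFunction([("PUSH", 2), ("MUL", None), ("PUSH", 1), ("ADD", None)])
results = [f(n) for n in range(5)]
print(results)  # [1, 3, 5, 7, 9]
```

Note that compile_to_python has to understand the bytecode language itself. That is the part PyPy's approach avoids.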

In PyPy, however, the compiler doesn't compile the program that is interpreted by the interpreter. It compiles the interpreter itself as it is interpreting the program!

Now, why would you do something like this? Think about what this means: if you JIT compile the interpreter while it is interpreting the program, what you end up with is a specialized version of the interpreter that can only interpret that one program, all compiled together to native code. But an interpreter that can only interpret one single program is indistinguishable from that program. So, in other words, you have just compiled that program without the compiler knowing anything about the language the program is written in!
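The following is a rough sketch of that idea, not PyPy's actual machinery. It assumes a made-up toy stack language, and the names interpret, Traced and specialize are hypothetical. Instead of compiling the program, it runs the unmodified interpreter on a symbolic input and records the operations the interpreter performs. The dispatch loop and opcode tests disappear, and what remains is the program. Real tracing JITs also have to handle loops and branches (using guards), which this sketch ignores entirely.

```python
from functools import partial


def interpret(program, x):
    """A plain interpreter for a toy stack language (hypothetical)."""
    stack = [x]
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


class Traced:
    """Symbolic value that records what the interpreter does to it."""

    def __init__(self, expr):
        self.expr = expr

    def _binary(self, symbol, other, swap=False):
        rhs = other.expr if isinstance(other, Traced) else repr(other)
        left, right = (rhs, self.expr) if swap else (self.expr, rhs)
        return Traced(f"({left} {symbol} {right})")

    def __add__(self, other):
        return self._binary("+", other)

    def __radd__(self, other):
        return self._binary("+", other, swap=True)

    def __mul__(self, other):
        return self._binary("*", other)

    def __rmul__(self, other):
        return self._binary("*", other, swap=True)


def specialize(interpreter, program):
    """Run the interpreter once on a symbolic input and compile the recorded
    operations. This function never inspects the program's opcodes; it only
    observes the interpreter at work."""
    result = interpreter(program, Traced("x"))
    expr = result.expr if isinstance(result, Traced) else repr(result)
    return eval(f"lambda x: {expr}"), expr


program = [("PUSH", 2), ("MUL", None), ("PUSH", 1), ("ADD", None)]

# An interpreter that can only run this one program behaves like the program.
only_this_program = partial(interpret, program)

compiled, source = specialize(interpret, program)
print(source)                                # ((x * 2) + 1)
print(compiled(5), only_this_program(5))     # 11 11
```

Because specialize only sees the interpreter, the same function would work for an interpreter of a completely different toy language, as long as that interpreter performs its arithmetic on the input through ordinary + and *.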

This has to do with PyPy being intended as a framework: this way, you only need one JIT compiler and it works for all languages! The only thing you have to write if you want to implement a new language in the PyPy framework is the interpreter. You get the JIT compiler "for free". And the interpreter can be very simple; it doesn't have to perform any aggressive optimizations itself, because the JIT compiler is quite good. (For example, HippyVM, the PHP implementation using PyPy, is almost 8 times faster than the Zend Engine (the standard PHP implementation) and twice as fast as Facebook's aggressively optimized high-performance PHP implementation HHVM.)