From my understanding, the default Python interpreter (CPython) compiles source code into bytecode and then interprets that bytecode. PyPy, on the other hand, uses a JIT to compile frequently interpreted bytecode into machine code. How is this different from the JVM? The JVM is an interpreter plus a compiler: Java source is compiled to bytecode, and frequently interpreted bytecode is then optimized into compiled machine code. Is there any other difference?
Python – Difference between PyPy and JVM
Related Solutions
q1. PyPy is the interpreter, an RPython program that can interpret Python code. There is no output language, so we can't consider it a compiler, right?
PyPy is similar to CPython: both have a compiler plus an interpreter. CPython has a compiler written in C that compiles Python to Python VM bytecode, then executes that bytecode in an interpreter written in C. PyPy has a compiler written in RPython that compiles Python to Python VM bytecode, then executes it in the PyPy interpreter, also written in RPython.
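As a small illustration of the compile-then-interpret split in CPython, the sketch below uses the built-in `compile()` and the standard `dis` module. The snippet and variable names are only for illustration, and the exact opcode names printed differ between CPython versions.

```python
import dis

source = "x = 2\ny = x * 21\n"

# Step 1: the compiler turns source text into a code object that holds
# bytecode for the Python VM (this is not machine code).
code = compile(source, "<example>", "exec")

# List the bytecode instructions the compiler produced.
opnames = [instr.opname for instr in dis.get_instructions(code)]
dis.dis(code)

# Step 2: the interpreter's evaluation loop executes that bytecode.
namespace = {}
exec(code, namespace)
print(namespace["y"])  # 42
```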
q2. Can a compiler py2rpy exist, transforming all Python programs into RPython? The language it is written in is irrelevant. If yes, we get another compiler, py2c (since RPython can in turn be translated to C). What is the difference in nature between pypy and py2rpy? Is py2rpy much harder to write than pypy?
Can a compiler py2rpy exist? Theoretically, yes. Turing completeness guarantees it.
One way to construct py2rpy is simply to include the source code of a Python interpreter written in RPython in the generated source code. An example of a py2rpy compiler, written in Bash:
```bash
# Suppose /pypy/source/ contains the source code for PyPy
# (i.e. an interpreter, Python -> nothing, written in RPython),
# $inputfile contains arbitrary Python source code, and $outputdir is the target.
rm -rf /tmp/py2rpy && mkdir -p /tmp/py2rpy "$outputdir"
cp -r /pypy/source /tmp/py2rpy/pypy
cp "$inputfile" /tmp/py2rpy/prog.py
# Generate main.rpy
echo "import pypy; pypy.execfile('prog.py')" > /tmp/py2rpy/main.rpy
cp -r /tmp/py2rpy/. "$outputdir"
```
Now, whenever you need to translate Python code to RPython code, you call this script, which produces in $outputdir an RPython main.rpy, the RPython source code of a Python interpreter, and prog.py, which is carried along untranslated as an opaque data blob. You can then execute the generated RPython script by calling rpython main.rpy.
(Note: since I'm not familiar with the RPython project, the syntax for calling the rpython interpreter, the ability to import pypy and call pypy.execfile, and the .rpy extension are purely made up, but I think you get the point.)
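The same idea can be sketched in plain Python. The hypothetical `bundle_compile` below is a Python-to-Python "compiler" whose output program just carries the input source as data and hands it to an interpreter at run time. As a simplification, the built-in `exec()` stands in for bundling an interpreter's full source code.

```python
def bundle_compile(source: str) -> str:
    """Hypothetical 'compiler': emit a program that embeds `source` as data
    and interprets it at run time. The built-in exec() stands in for the
    bundled interpreter source (a simplification)."""
    return (
        "# generated by bundle_compile (illustrative only)\n"
        f"PROGRAM = {source!r}\n"
        "exec(compile(PROGRAM, 'prog.py', 'exec'), globals())\n"
    )


user_program = "total = sum(range(5))\nprint('total is', total)\n"
generated = bundle_compile(user_program)
print(generated)

# Running the generated program behaves like running the input program.
exec(generated, {})  # prints: total is 10
```

Nothing about the input program is analyzed, which is why such a "compiler" is trivial to write and gives no speed-up by itself.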
q3. Are there any general rules or theory available about this?
Yes: any Turing-complete language can, in theory, be translated into any other Turing-complete language. Some languages may be much more difficult to translate than others, but if the question is "is it possible?", the answer is "yes".
q4. ...
There is no question here.
No. The reason there are speed differences between languages like Python and C++ is that statically typed languages give the compiler a great deal of information about the structure of the program and its data, which allows it to optimize both computations and memory access. Because a C++ compiler knows that a variable is of type int, it can determine the optimal way to manipulate that variable before the program is ever run. In Python, on the other hand, the runtime doesn't know what kind of value a variable holds until the interpreter actually reaches that line. This is especially important for structures: in C++, the compiler can tell the size of a structure and the memory location of each of its fields at compile time. This gives it great power in predicting how the data will be used and lets it optimize according to those predictions. Nothing comparable is possible ahead of time for languages like Python.
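A small CPython sketch of the point above: the same function, compiled once to the same bytecode, works on ints, strings, and lists, so the interpreter can only decide what `+` means when the instruction actually executes. The `add` function is just an example, and the opcode name depends on the CPython version.

```python
import dis


def add(a, b):
    # Compiled once; the same bytecode runs for every argument type below.
    return a + b


print(add(2, 3))          # 5
print(add("py", "py"))    # pypy
print(add([1], [2]))      # [1, 2]

# The bytecode holds one generic add instruction (BINARY_ADD before
# CPython 3.11, BINARY_OP afterwards) with no operand types attached.
# CPython 3.11+ may specialize it while running, but that is a run-time
# guess based on observed types, not compile-time knowledge.
dis.dis(add)
```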
To effectively compile languages like Python you would need to:
- Ensure that the structure of data is static during the execution of the program. This is problematic because Python has eval and metaclasses, both of which make it possible to change the structure of the program based on its input. This is one of the things that give Python such expressive power.
- Infer the types of all variables, structures and classes from the source code itself. While it is possible to some degree, the static type system and algorithm would be so complex it would be almost impossible to implement in a usable way. You could do it for a subset of the language, but definitely not for the whole set of language features.
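To see why the structure of data need not be static, here is a sketch in which a class's attributes are decided at run time. `make_record_class` and the header string are hypothetical; the class is created with `type()`, which is Python's default metaclass, and `eval()` evaluates code supplied as a string.

```python
def make_record_class(field_names):
    """Build a class whose fields are only known at run time
    (e.g. from a CSV header or user input)."""
    def __init__(self, *values):
        for name, value in zip(field_names, values):
            setattr(self, name, value)

    # type(name, bases, namespace) creates a class dynamically.
    return type("Record", (object,), {"__init__": __init__,
                                      "fields": tuple(field_names)})


# Hypothetical input; it could just as well come from input() or a file.
header = "name,age,city"
Record = make_record_class(header.split(","))
r = Record("Ada", 36, "London")
print(r.fields, r.age)      # ('name', 'age', 'city') 36

# An existing class can also be changed after the fact...
Record.greet = lambda self: "hi, " + self.name
print(r.greet())            # hi, Ada

# ...and eval() runs code that only exists as a string at run time.
print(eval("r.age + 1"))    # 37
```

A static compiler looking at this source cannot know the field layout of `Record`, because it depends on data that only exists once the program runs.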
Best Answer
There are two major differences between the two.
First: the JVM is abstract, PyPy is concrete. The JVM is a specification, a piece of paper. PyPy is an implementation, a piece of code.
There are many different implementations of the JVM which work very differently. Some only interpret the JVM byte code, some only compile it statically ahead-of-time, some only compile it dynamically just-in-time, some have both an interpreter and a JIT compiler, some have an interpreter and multiple JIT compilers, some have no interpreter and only one JIT compiler, some have no interpreter and multiple JIT compilers. Some have tracing JITs, some have method-at-a-time JITs, some have both. Some have native threads, some have green threads. Some have tracing garbage collectors, some have reference counting garbage collectors. And so on and so forth.
Second: PyPy is more general. It is not an implementation of a specific language; it is a framework for easily creating efficient language implementations. Many different language implementations have been built with the PyPy framework: Topaz (an implementation of Ruby), HippyVM (an implementation of PHP), Pyrolog (Prolog), RSqueak (a Squeak VM), PyGirl (a Game Boy emulator), langjs (JavaScript), and implementations of Io and Scheme. And, of course, an implementation of Python.
Since you asked specifically about the compilers, there is a very important distinction between PyPy's JIT and the JIT compilers of other mixed-mode engines. In a typical mixed-mode engine (e.g. Oracle HotSpot JVM, IBM J9 JVM, Rubinius, Apple SquirrelFish Extreme, …), the interpreter and the compiler run side by side and process the same program. The interpreter starts off interpreting the program, and once it has been determined that it would be beneficial to compile (parts of) the program, those parts are handed off to the compiler and compiled.
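The hand-off described above can be sketched as a toy. This is not how HotSpot or any other engine is implemented: the bytecode format, `HOT_THRESHOLD`, and `MixedModeEngine` are all hypothetical, and "compiling" here means generating a Python lambda rather than machine code. What it does show is the mixed-mode structure: the interpreter and the compiler both consume the same program, and a call counter decides when to switch.

```python
HOT_THRESHOLD = 3  # hypothetical tuning knob: interpreted calls before compiling


def interpret(code, x):
    """Slow path: a stack-machine interpreter for a tiny bytecode."""
    stack = []
    for op, arg in code:
        if op == "LOAD_X":
            stack.append(x)
        elif op == "CONST":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown op {op}")
    return stack.pop()


def compile_to_python(code):
    """'JIT' path: translate the same bytecode into one Python expression,
    then let CPython compile it (standing in for emitting machine code)."""
    stack = []
    for op, arg in code:
        if op == "LOAD_X":
            stack.append("x")
        elif op == "CONST":
            stack.append(repr(arg))
        elif op in ("ADD", "MUL"):
            b, a = stack.pop(), stack.pop()
            sym = "+" if op == "ADD" else "*"
            stack.append(f"({a} {sym} {b})")
        else:
            raise ValueError(f"unknown op {op}")
    return eval(f"lambda x: {stack.pop()}")


class MixedModeEngine:
    def __init__(self, code):
        self.code = code
        self.calls = 0
        self.compiled = None

    def call(self, x):
        if self.compiled is not None:
            return self.compiled(x)          # fast path: compiled code
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:      # hot: hand off to the compiler
            self.compiled = compile_to_python(self.code)
        return interpret(self.code, x)       # this call is still interpreted


# f(x) = x * x + 1 in the toy bytecode
f = MixedModeEngine([("LOAD_X", None), ("LOAD_X", None), ("MUL", None),
                     ("CONST", 1), ("ADD", None)])
print([f.call(i) for i in range(5)])   # [1, 2, 5, 10, 17]
print(f.compiled is not None)          # True
```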
In PyPy, however, the compiler doesn't compile the program that is interpreted by the interpreter. It compiles the interpreter itself as it is interpreting the program!
Now, why would you do something like this? Think about what this means: if you JIT compile the interpreter while it is interpreting the program, what you end up with is a specialized version of the interpreter that can only interpret that one program, all compiled together to native code. But an interpreter that can only interpret one single program is indistinguishable from that program. In other words, you have just compiled that program without even knowing anything about it!
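Specializing an interpreter to one fixed program can be sketched with classical partial evaluation (the idea known as the first Futamura projection). This is a swapped-in, simplified technique: PyPy's JIT works by tracing the interpreter while it runs the user program and compiling the recorded trace, not by a hand-written specializer. Below, `interpret` and `specialize` are hypothetical; `specialize` mirrors the interpreter but runs its loop and opcode dispatch at specialization time, so only the arithmetic is left in the generated code.

```python
def interpret(program, x):
    """Hypothetical interpreter for a tiny accumulator language."""
    acc = x
    for op, n in program:            # dispatch loop = interpretive overhead
        if op == "add":
            acc = acc + n
        elif op == "mul":
            acc = acc * n
        else:
            raise ValueError("unknown op: " + op)
    return acc


def specialize(program):
    """Partially evaluate `interpret` with respect to a fixed `program`:
    the loop and the opcode dispatch run now; only the work on `acc`
    is emitted as residual code for run time."""
    lines = ["def residual(x):", "    acc = x"]
    for op, n in program:
        if op == "add":
            lines.append(f"    acc = acc + {n!r}")
        elif op == "mul":
            lines.append(f"    acc = acc * {n!r}")
        else:
            raise ValueError("unknown op: " + op)
    lines.append("    return acc")
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)          # CPython compiles the residual code
    return namespace["residual"], source


program = [("add", 1), ("mul", 3), ("add", -2)]
residual, source = specialize(program)
print(source)
print(interpret(program, 5), residual(5))   # 16 16
```

The residual function is the "interpreter that can only run this one program": it computes the same results, but no loop and no dispatch are left in it.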
This has to do with PyPy being intended as a framework: this way, you only need one JIT compiler, and it works for all languages! The only thing you have to write to implement a new language in the PyPy framework is the interpreter; you get the JIT compiler "for free". And the interpreter can be very simple: it doesn't have to perform any aggressive optimizations, because the JIT compiler is quite good. (For example, HippyVM, the PHP implementation built with PyPy, is almost 8 times faster than the Zend Engine (the standard PHP implementation) and twice as fast as HHVM, Facebook's aggressively optimized high-performance PHP implementation.)
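In practice the interpreter author does add a few hints for the JIT. RPython's `rpython.rlib.jit.JitDriver` declares which variables identify a position in the user's program ("greens") and which are ordinary run-time state ("reds"), and `jit_merge_point` marks the interpreter's dispatch loop. The sketch below assumes RPython may not be installed and then falls back to a no-op stub of our own; the tiny accumulator language is hypothetical, and the code has not been checked against an actual RPython translation (which would also need an entry-point function).

```python
# Assumption: RPython (from the PyPy source tree) may not be installed.
# If it is, use its real JitDriver; otherwise fall back to a no-op stub so
# the interpreter still runs untranslated under plain CPython.
try:
    from rpython.rlib.jit import JitDriver
except ImportError:
    class JitDriver(object):
        """Minimal stand-in: the hints do nothing when not translated."""
        def __init__(self, greens, reds):
            self.greens, self.reds = greens, reds

        def jit_merge_point(self, **livevars):
            pass

# Greens: position in the user's program (constant within a compiled trace).
# Reds: the ordinary run-time state of the interpreter.
jitdriver = JitDriver(greens=["pc", "program"], reds=["acc"])


def interpret(program, acc):
    """Hypothetical accumulator language:
    ("add", n) adds n to acc; ("jnz", t) jumps to index t if acc != 0."""
    pc = 0
    while pc < len(program):
        # Marks the interpreter's dispatch loop for the JIT.
        jitdriver.jit_merge_point(pc=pc, program=program, acc=acc)
        op, arg = program[pc]
        if op == "add":
            acc += arg
            pc += 1
        elif op == "jnz":
            pc = arg if acc != 0 else pc + 1
        else:
            raise ValueError("unknown op: " + op)
    return acc


# Count down from 5 to 0; this user-level loop is what a tracing JIT
# would record and compile.
print(interpret([("add", -1), ("jnz", 0)], 5))   # 0
```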