q1. PyPy is the interpreter, an RPython program which can interpret Python code. There is no output language, so we can't consider it a compiler, right?
PyPy is similar to CPython: both have a compiler plus an interpreter. CPython has a compiler written in C that compiles Python to Python VM bytecode, then executes the bytecode in an interpreter written in C. PyPy has a compiler written in RPython that compiles Python to Python VM bytecode, then executes it in the PyPy interpreter, which is written in RPython.
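The compile-then-interpret split can be observed from inside Python itself. This is a minimal sketch using only the standard `compile`, `dis` and `exec` facilities; the variable names `source`, `code` and `namespace` are chosen here for illustration, and the exact opcodes printed differ between Python versions and implementations.

```python
import dis

source = "x = 1 + 2\nprint(x)"

# Compile step: source text -> code object containing Python VM bytecode.
code = compile(source, "<example>", "exec")

# Show the bytecode instructions (exact opcodes vary between Python versions).
dis.dis(code)

# Interpret step: the VM's evaluation loop executes the bytecode.
namespace = {}
exec(code, namespace)
```

After the last line, `namespace["x"]` holds 3, which is also what the embedded `print(x)` writes out.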
q2. Can a compiler py2rpy exist, transforming all Python programs to RPython? Which language it is written in is irrelevant. If yes, we get another compiler, py2c. What's the difference between pypy and py2rpy in nature? Is py2rpy much harder to write than pypy?
Can a compiler py2rpy exist? Theoretically, yes: Turing completeness guarantees it.
One method to construct py2rpy is to simply include the source code of a Python interpreter written in RPython in the generated source code. An example of a py2rpy compiler, written in Bash:
# suppose that /pypy/source/ contains the source code for pypy (i.e. a Python interpreter written in RPython, with no output language)
mkdir -p /tmp/py2rpy
cp -r /pypy/source/ /tmp/py2rpy/pypy/
# suppose $inputfile contains arbitrary Python source code
cp "$inputfile" /tmp/py2rpy/prog.py
# generate main.rpy
echo "import pypy; pypy.execfile('prog.py')" > /tmp/py2rpy/main.rpy
# copy the generated files into the output directory
mkdir -p "$outputdir"
cp -r /tmp/py2rpy/. "$outputdir"/
Now, whenever you need to translate Python code to RPython code, you call this script, which produces, in $outputdir, an RPython main.rpy, the source code of the Python interpreter written in RPython, and a binary blob prog.py. You can then execute the generated RPython program by calling rpython main.rpy.
(note: since I'm not familiar with the RPython project, the syntax for calling the rpython interpreter, the ability to import pypy and call pypy.execfile, and the .rpy extension are purely made up, but I think you get the point)
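Since the RPython details above are made up, here is a runnable analogue of the same trick that targets Python itself instead of RPython. It is only a sketch: `trivial_compile` is a hypothetical name, and the "target program" it emits does nothing more than embed the input source as a string and hand it to the interpreter that is already available.

```python
def trivial_compile(source: str) -> str:
    """Return a target program that embeds `source` and hands it to an interpreter."""
    # repr() turns the input program into a string literal inside the output program.
    return f"exec(compile({source!r}, '<embedded>', 'exec'))\n"

# "Compile" a small input program into a new program text.
generated = trivial_compile("result = sum(range(5))")

# Running the generated program runs the embedded one.
namespace = {}
exec(generated, namespace)
```

The output is a valid program in the target language for any valid input program, which is all the definition of a compiler asks for, even though no real translation work has been done.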
q3. Is there some general rules or theory available about this?
Yes, a program in any Turing-complete language can theoretically be translated to any other Turing-complete language. Some languages may be much more difficult to translate than others, but if the question is "is it possible?", the answer is "yes".
q4. ...
There is no question here.
This definition would clearly put the usual execution of Java bytecode in the domain of interpretation, no matter how much JIT compilation is done. I have encountered opinions in discussions on this site that clearly and vehemently state the opposite, i.e. that Java Bytecode execution thingies are compilers.
Well, the two definitions aren't mutually exclusive. An interpreter can contain a compiler (in fact, most modern interpreters contain at least a bytecode compiler).
But I think the intuitive definition most people here use is something like:
- a compiler creates native code that is then run directly by the CPU
- an interpreter has some kind of "interpreter main loop" that reads instructions (either source code statements or something like precompiled P-Code or bytecode) and performs the corresponding actions.
By this distinction, an "interpreter" is usually an order of magnitude slower than a "compiled" program. So when we're talking about performance, this definition is more useful than the classic CS definition.
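The "interpreter main loop" from the second bullet can be sketched as a tiny stack machine. The instruction set (`PUSH`, `ADD`, `MUL`) and the function name `run` are invented for this illustration and do not correspond to any real VM.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a tiny stack machine."""
    stack = []
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):  # the "interpreter main loop"
        op, arg = program[pc]
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

# (1 + 2) * 3 expressed in the made-up instruction set
program = [("PUSH", 1), ("PUSH", 2), ("ADD", None), ("PUSH", 3), ("MUL", None)]
result = run(program)
```

Every instruction pays for the fetch, the opcode comparison and the dispatch before any useful work happens, which is the overhead that native code run directly by the CPU avoids.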
Best Answer
Compilers get tested thoroughly via usage by thousands or even millions of developers over time.
Also, the problem to be solved is well defined (by a very detailed technical specification). And the nature of the task lends itself easily to unit / system tests. I.e. it is basically translating textual input in a very specific format to output in another kind of well defined format (some sort of bytecode or machine code). So it is easy to create and verify test cases.
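To illustrate why such test cases are easy to write, here is a sketch of a table of (input, expected output) pairs for a toy translator. The translator `to_rpn`, which turns arithmetic expressions into postfix notation, and the helper `failing_cases` are hypothetical stand-ins for a real compiler and its test harness.

```python
import ast

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_rpn(expr: str) -> str:
    """Toy 'compiler': translate an arithmetic expression to postfix notation."""
    def emit(node):
        if isinstance(node, ast.Constant):
            return [str(node.value)]
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return emit(node.left) + emit(node.right) + [OPS[type(node.op)]]
        raise ValueError("unsupported syntax")
    return " ".join(emit(ast.parse(expr, mode="eval").body))

# Each test case is just a piece of input text and the expected output text.
CASES = [
    ("1 + 2", "1 2 +"),
    ("1 + 2 * 3", "1 2 3 * +"),
    ("(1 + 2) * 3", "1 2 + 3 *"),
]

def failing_cases():
    """Return (input, actual, expected) for every case whose output differs."""
    return [(src, to_rpn(src), want) for src, want in CASES if to_rpn(src) != want]
```

A bug report against such a tool can be reduced to one more row in the table: the input that was mistranslated and the output that was expected.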
Moreover, usually the bugs are easy to reproduce too: apart from the exact platform and compiler version info, usually all you need is a piece of input code. Not to mention that the compiler users (being developers themselves) tend to give far more precise and detailed bug reports than any average computer user :-)