q1. pypy is the interpreter, a RPython program which can interpret Python code, there is no output language, so we can't consider it as a compiler, right?
PyPy is similar to CPython: both have a compiler and an interpreter. CPython has a compiler written in C that compiles Python to Python VM bytecode, then executes the bytecode in an interpreter written in C. PyPy has a compiler written in RPython that compiles Python to Python VM bytecode, then executes it in the PyPy interpreter written in RPython.
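The two stages can be observed from within Python itself. The following is a minimal sketch using the built-in compile() and exec() functions and the standard dis module; the source string and variable names are made up for illustration, and the exact instructions listed vary between interpreter versions.

```python
import dis

source = "x = 6 * 7"

# Stage 1: the built-in compiler turns source text into a code object
# that holds bytecode for the Python VM.
code = compile(source, "<example>", "exec")

# Stage 2: the interpreter executes that bytecode.
namespace = {}
exec(code, namespace)

# The bytecode can be inspected as a list of instruction names.
opnames = [instruction.opname for instruction in dis.get_instructions(code)]
```

After this runs, namespace["x"] holds the computed value and opnames lists the instructions the compiler emitted.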
q2. Can compiler py2rpy exist, transforming all Python programs to RPython? In which language it's written is irrelevant. If yes, we get another compiler py2c. What's the difference between pypy and py2rpy in nature? Is py2rpy much harder to write than pypy?
Can a compiler py2rpy exist? Theoretically yes. Turing completeness guarantees it.
One method to construct py2rpy is to simply include the source code of a Python interpreter written in RPython in the generated source code. An example of a py2rpy compiler, written in Bash:
# suppose that /pypy/source/ contains the source code for pypy (i.e. a Python interpreter written in RPython)
mkdir -p /tmp/py2rpy
cp -r /pypy/source/ /tmp/py2rpy/pypy/
# suppose $inputfile contains arbitrary Python source code
cp "$inputfile" /tmp/py2rpy/prog.py
# generate main.rpy
echo "import pypy; pypy.execfile('prog.py')" > /tmp/py2rpy/main.rpy
cp -r /tmp/py2rpy/ "$outputdir"
Now whenever you need to translate Python code to RPython code, you call this script, which produces, in $outputdir, an RPython main.rpy, the source code of the Python interpreter written in RPython, and a binary blob prog.py. You can then execute the generated RPython script by calling rpython main.rpy.
(Note: since I'm not familiar with the rpython project, the syntax for calling the rpython interpreter, the ability to import pypy and call pypy.execfile, and the .rpy extension are purely made up, but I think you get the point.)
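The same bundling trick can be sketched in runnable form if we let Python stand in for both the source and the target language. This is only an illustration of the idea, not a real py2rpy: trivial_translate and the generated variable names are hypothetical, and the built-in exec() plays the role of the bundled interpreter.

```python
def trivial_translate(source: str) -> str:
    """Return target-program text that embeds `source` and hands it to an interpreter."""
    return (
        "# generated program: embedded source plus a call to the interpreter\n"
        f"_EMBEDDED_SOURCE = {source!r}\n"
        "_namespace = {}\n"
        "exec(compile(_EMBEDDED_SOURCE, '<embedded>', 'exec'), _namespace)\n"
    )


# "Translate" a small program, then run the generated program.
generated = trivial_translate("result = sum(range(5))")
scope = {}
exec(generated, scope)
embedded_result = scope["_namespace"]["result"]
```

The translator never looks at the meaning of the input program; it only wraps it. That is why this kind of translation is always possible but rarely useful.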
q3. Is there some general rules or theory available about this?
Yes, any Turing-complete language can theoretically be translated to any other Turing-complete language. Some languages may be much more difficult to translate than others, but if the question is "is it possible?", the answer is "yes".
q4. ...
There is no question here.
Is it ok to have multiple classes in the same file in Python?
Yes. Both from a philosophical perspective as well as a practical one.
In Python, a module is a namespace that exists once in memory.
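A quick way to see this is to import the same module twice and compare the results. The sketch below uses collections.abc only as a convenient standard-library module; the variable names are made up.

```python
import importlib
import sys

import collections.abc as first_import

# A second import does not create a second module object;
# it returns the one cached in sys.modules.
second_import = importlib.import_module("collections.abc")

same_object = first_import is second_import
cached = sys.modules["collections.abc"] is first_import
```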
Say we had the following hypothetical directory structure, with one class defined per file:
abc/                    Defines
|-- callable.py         Callable
|-- container.py        Container
|-- hashable.py         Hashable
|-- iterable.py         Iterable
|-- iterator.py         Iterator
|-- sized.py            Sized
... 19 more
All of these classes are available in the collections module and (there are, in fact, 25 in total) are defined in the standard library module _collections_abc.py.
There are a couple of issues here that I believe make _collections_abc.py superior to the alternative hypothetical directory structure.
- These files are sorted alphabetically. You could sort them in other ways, but I am not aware of a feature that sorts files by semantic dependencies. The _collections_abc module source is organized by dependency.
- In non-pathological cases, both modules and class definitions are singletons, occurring once each in memory. There would be a bijective mapping of modules onto classes - making the modules redundant.
- The increasing number of files makes it less convenient to casually read through the classes (unless you have an IDE that makes it simple) - making it less accessible to people without tools.
Are you prevented from breaking groups of classes into different modules when you find it desirable from a namespacing and organizational perspective?
No.
From the Zen of Python, which reflects the philosophy and principles under which it grew and evolved:
Namespaces are one honking great idea -- let's do more of those!
But let us keep in mind that it also says:
Flat is better than nested.
Python is incredibly clean and easy to read. It encourages you to read it. Putting every separate class in a separate file discourages reading. This goes against the core philosophy of Python. Look at the structure of the Standard Library: the vast majority of modules are single-file modules, not packages. I would submit to you that idiomatic Python code is written in the same style as the CPython standard lib.
Here's the actual code from the abstract base class module. I like to use it as a reference for the denotation of various abstract types in the language.
Would you say that each of these classes should require a separate file?
class Hashable:
    __metaclass__ = ABCMeta

    @abstractmethod
    def __hash__(self):
        return 0

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Hashable:
            try:
                for B in C.__mro__:
                    if "__hash__" in B.__dict__:
                        if B.__dict__["__hash__"]:
                            return True
                        break
            except AttributeError:
                # Old-style class
                if getattr(C, "__hash__", None):
                    return True
        return NotImplemented


class Iterable:
    __metaclass__ = ABCMeta

    @abstractmethod
    def __iter__(self):
        while False:
            yield None

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Iterable:
            if _hasattr(C, "__iter__"):
                return True
        return NotImplemented

Iterable.register(str)


class Iterator(Iterable):

    @abstractmethod
    def next(self):
        'Return the next item from the iterator. When exhausted, raise StopIteration'
        raise StopIteration

    def __iter__(self):
        return self

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Iterator:
            if _hasattr(C, "next") and _hasattr(C, "__iter__"):
                return True
        return NotImplemented


class Sized:
    __metaclass__ = ABCMeta

    @abstractmethod
    def __len__(self):
        return 0

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Sized:
            if _hasattr(C, "__len__"):
                return True
        return NotImplemented


class Container:
    __metaclass__ = ABCMeta

    @abstractmethod
    def __contains__(self, x):
        return False

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Container:
            if _hasattr(C, "__contains__"):
                return True
        return NotImplemented


class Callable:
    __metaclass__ = ABCMeta

    @abstractmethod
    def __call__(self, *args, **kwds):
        return False

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Callable:
            if _hasattr(C, "__call__"):
                return True
        return NotImplemented
So should they each have their own file?
I hope not.
These files are not just code - they are documentation on the semantics of Python.
They are maybe 10 to 20 lines on average. Why should I have to go to a completely separate file to see another 10 lines of code? That would be highly impractical. Further, there would be nearly identical boilerplate imports in each file, adding more redundant lines of code.
I find it quite useful to know that there is a single module where I can find all of these Abstract Base Classes, instead of having to look over a list of modules. Viewing them in context with each other allows me to better understand them. When I see that an Iterator is an Iterable, I can quickly review what an Iterable consists of by glancing up.
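To make the relationship between these classes concrete, here is a small sketch against the Python 3 versions in collections.abc. Countdown is a hypothetical class invented for this example, and note that Python 3 spells the iterator method __next__ rather than next as in the listing above.

```python
from collections.abc import Iterable, Iterator, Sized


class Countdown:
    """Hypothetical class; it does not inherit from any ABC."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):  # Python 3 spelling of the `next` method
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


# Iterator is declared as a subclass of Iterable in the ABC module.
iterator_is_iterable = issubclass(Iterator, Iterable)

# __subclasshook__ recognises Countdown structurally, without registration.
countdown_is_iterator = isinstance(Countdown(3), Iterator)
countdown_is_sized = isinstance(Countdown(3), Sized)  # Countdown has no __len__

values = list(Countdown(3))
```

Because the checks are structural, Countdown counts as an Iterator simply by defining __iter__ and __next__, and it is not Sized because it lacks __len__.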
I sometimes wind up having a couple of very short classes. They stay in the file, even if they need to grow larger over time. Sometimes mature modules have over 1000 lines of code. But Ctrl-F is easy, and some IDEs make it easy to view outlines of the file, so no matter how large the file, you can quickly go to whatever object or method you're looking for.
Conclusion
My direction, in the context of Python, is to prefer to keep related and semantically similar class definitions in the same file. If the file grows so large as to become unwieldy, then consider a reorganization.
Best Answer
Code written in different languages can interact in a number of ways.
At the source level, cross-compilation from one language into the other can be done for some combinations of languages (for example, Google's GWT includes a Java-to-JavaScript compiler; the Glasgow Haskell Compiler can compile to C; early versions of C++ compiled to C). Most of the time, however, this is not really feasible.
Languages that share a virtual platform, such as the JVM or the .NET runtime, can usually interact through mechanisms exposed by the platform - for example all JVM languages can access Java libraries and use them to communicate among each other, and they can call methods and use classes created in any other JVM language.
Many programming languages, including Python, offer a mechanism to interface with native libraries, typically written in C. Using such a mechanism, it is possible to call native functions from another, more high-level, language. Popular libraries often have bindings readily available. This technique is usually referred to as a "Foreign Function Interface". For Python, the options include the standard library's ctypes module and the third-party CFFI package.
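As a rough illustration of calling a native C function from Python, the sketch below uses the standard library's ctypes module. It assumes the code runs on CPython, because it calls Py_GetVersion from the interpreter's own C library through ctypes.pythonapi; calling into any other shared library follows the same pattern of looking up a function and declaring its argument and return types.

```python
import ctypes
import sys

# ctypes.pythonapi exposes the C functions of the running CPython runtime
# (assumption: this is CPython, not another Python implementation).
get_version = ctypes.pythonapi.Py_GetVersion
get_version.argtypes = []                # the C function takes no arguments
get_version.restype = ctypes.c_char_p    # and returns a const char *

# The call crosses from Python into native code and back.
version_from_c = get_version().decode("utf-8")
```

The string obtained from C should begin with the same version number that sys.version reports.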
Another option is to build two completely separate programs and have them interact at runtime. There are various mechanisms to achieve this; the easiest is through a pipe (look into the subprocess module for Python): basically, one program calls the other, sending input to its stdin and reading the result back from its stdout. This makes one program a subprocess of the other; if you need both to be long-lived and started independently, data can be passed back and forth through named pipes, (local) network sockets, shared files, and (depending on the platform) other means. Which one is best depends on your situation.
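A minimal sketch of the pipe approach with the subprocess module follows. To keep it self-contained, the "other program" here is just a second Python interpreter running a one-line script (child_code is made up for the example); in practice it could be an executable written in any language that reads stdin and writes stdout.

```python
import subprocess
import sys

# Stand-in for a program written in another language:
# it reads all of stdin and writes it back upper-cased.
child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"

completed = subprocess.run(
    [sys.executable, "-c", child_code],
    input="hello from the parent\n",  # sent to the child's stdin
    capture_output=True,              # collect the child's stdout and stderr
    text=True,                        # exchange str instead of bytes
    check=True,                       # raise if the child exits with an error
)
reply = completed.stdout.strip()
```

The parent only depends on the child's command line and its text protocol, so either side can be rewritten in a different language without touching the other.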