Compiler Design – Writing a Compiler for Languages with Runtime Code Rewriting

compilerlispmacros

There are some programming languages, like the many dialects of Lisp, that allow for macro-metaprogramming: rewriting and altering sections of code before the code is run.

It is relatively trivial to write a simple interpreter for Lisp (mostly because there is only very little special syntax). However, I cannot understand how it would be possible to write a compiler for a language that allows you to rewrite code at-runtime (and then execute that code).

How is this done? Is the compiler itself basically included in the generated compiled program, such that it can compile new sections of code? Or is there another way?

Best Answer

Macros have the advantage to be expanded at compile time

The idea of Lisp macros is to be able to fully expand them at compile time. Then no compiler is needed at runtime. Most Lisp systems allow you to fully compile code. The compilation step includes the macro expansion phase. There is no expansion needed at runtime.

Often Lisp systems include a compiler, but this is needed when code is generated at runtime and this code would need to be compiled. But this is independent of macro expansion.

You will even find Lisp systems which don't include a compiler and even no full interpreter at runtime. All code will be compiled before runtime.

FEXPRs were code modifying functions, but were mostly replaced by Macros

In earlier times in the 60s/70s many Lisp systems included so-called FEXPR functions, which could translate code at runtime. But they could not be compiled before runtime. Macros replaced them mostly, since they enable full compilation.

An example of a macro interpreted and compiled

Let's look at LispWorks, which has both an interpreter and a compiler. It allows to mix interpreted and compiled code freely. The Read-Eval-Print-Loop uses the Interpreter to execute code.

Let's define a trivial macro. But the macro prints the code it gets called with, every time the macro runs.

CL-USER 45 > (defmacro my-if (test yes no)
               (format t "~%Expanding (my-if ~a ~a ~a)" test yes no)
               `(if ,test ,yes ,no))
MY-IF

Let's define a function which uses the macro from above. Remember: here in LispWorks the function will be interpreted.

CL-USER 46 > (defun test (x y)
               (my-if (> x y) 'larger 'not-larger))
TEST

If you look above, the Lisp system only printed the function name. The macro did not run - otherwise the macro would have printed something. So the code is not expanded.

Let's run the TEST function using the Interpreter:

CL-USER 47 > (loop for i below 5 collect (test i 3))

Expanding (my-if (> X Y) (QUOTE LARGER) (QUOTE NOT-LARGER))
Expanding (my-if (> X Y) (QUOTE LARGER) (QUOTE NOT-LARGER))
Expanding (my-if (> X Y) (QUOTE LARGER) (QUOTE NOT-LARGER))
Expanding (my-if (> X Y) (QUOTE LARGER) (QUOTE NOT-LARGER))
Expanding (my-if (> X Y) (QUOTE LARGER) (QUOTE NOT-LARGER))
Expanding (my-if (> X Y) (QUOTE LARGER) (QUOTE NOT-LARGER))
Expanding (my-if (> X Y) (QUOTE LARGER) (QUOTE NOT-LARGER))
Expanding (my-if (> X Y) (QUOTE LARGER) (QUOTE NOT-LARGER))
Expanding (my-if (> X Y) (QUOTE LARGER) (QUOTE NOT-LARGER))
Expanding (my-if (> X Y) (QUOTE LARGER) (QUOTE NOT-LARGER))
(NOT-LARGER NOT-LARGER NOT-LARGER NOT-LARGER LARGER)

So you see that for some reason the macro expansion is run twice for each of the five calls to test. The macro is expanded by the interpreter every time the function TEST is called.

Now let's compile the function TEST:

CL-USER 48 > (compile 'test)

Expanding (my-if (> X Y) (QUOTE LARGER) (QUOTE NOT-LARGER))
TEST
NIL
NIL

You can see above that the compiler runs the macro once.

If we now run the function TEST, no macro expansion will happen. The macro form (MY-IF ...) has already been expanded by the compiler:

CL-USER 49 > (loop for i below 5 collect (test i 3))
(NOT-LARGER NOT-LARGER NOT-LARGER NOT-LARGER LARGER)

If you used some other Lisps like SBCL or CCL, they will compile everything by default. SBCL has in new versions also an interpreter. Let's do the example from above in a recent SBCL:

Let's use the new SBCL interpreter:

CL-USER> (setf sb-ext:*evaluator-mode* :interpret)
:INTERPRET

CL-USER> (defmacro my-if (test yes no)
           (format t "~%Expanding (my-if ~a ~a ~a)" test yes no)
           `(if ,test ,yes ,no))
MY-IF
CL-USER> (defun test (x y)
           (my-if (> x y) 'larger 'not-larger))
TEST
CL-USER> (loop for i below 5 collect (test i 3))

Expanding (my-if (> X Y) 'LARGER 'NOT-LARGER)
Expanding (my-if (> X Y) 'LARGER 'NOT-LARGER)
Expanding (my-if (> X Y) 'LARGER 'NOT-LARGER)
Expanding (my-if (> X Y) 'LARGER 'NOT-LARGER)
Expanding (my-if (> X Y) 'LARGER 'NOT-LARGER)
(NOT-LARGER NOT-LARGER NOT-LARGER NOT-LARGER LARGER)
CL-USER> (compile 'test)

Expanding (my-if (> X Y) 'LARGER 'NOT-LARGER)
TEST
NIL
NIL
CL-USER> (loop for i below 5 collect (test i 3))
(NOT-LARGER NOT-LARGER NOT-LARGER NOT-LARGER LARGER)
CL-USER>

Related Solutions

General Rules for Writing a Compiler in Python

q1. pypy is the interpreter, a RPython program which can interpret Python code, there is no output language, so we can't consider it as a compiler, right?

PyPy is similar to CPython, both has a compiler+interpreter. CPython has a compiler written in C that compiles Python to Python VM bytecode then executes the bytecode in an interpreter written in C. PyPy has a compiler written in RPython that compiles Python to Python VM bytecode, then executes it in PyPy Interpreter written in RPython.

q2. Can compiler py2rpy exist, transforming all Python programs to RPython? In which language it's written is irrelevant. If yes, we get another compiler py2c. What's the difference between pypy and py2rpy in nature? Is py2rpy much harder to write than pypy?

Can a compiler py2rpy exists? Theoretically yes. Turing completeness guarantees so.

One method to construct py2rpy is to simply include the source code of a Python interpreter written in RPython in the generated source code. An example of py2rpy compiler, written in Bash:

// suppose that /pypy/source/ contains the source code for pypy (i.e. Python -> Nothing RPython)
cp /pypy/source/ /tmp/py2rpy/pypy/

// suppose $inputfile contains an arbitrary Python source code
cp $inputfile /tmp/py2rpy/prog.py

// generate the main.rpy
echo "import pypy; pypy.execfile('prog.py')" > /tmp/py2rpy/main.rpy

cp /tmp/py2rpy/ $outputdir

now whenever you need to translate a Python code to RPython code, you call this script, which produces -- in the $outputdir -- an RPython main.rpy, the RPython's Python Interpreter source code, and a binary blob prog.py. And then you can execute the generated RPython script by calling rpython main.rpy.

(note: since I'm not familiar with rpython project, the syntax for calling the rpython interpreter, the ability to import pypy and do pypy.execfile, and the .rpy extension is purely made up, but I think you get the point)

q3. Is there some general rules or theory available about this?

Yes, any Turing Complete language can theoretically be translated to any Turing Complete language. Some languages may be much more difficult to translate than other languages, but if the question is "is it possible?", the answer is "yes"

q4. ...

There is no question here.

Garbage Collection – How It Works in Natively Compiled Languages

Garbage collection in a compiled language works the same way as in an interpreted language. Languages like Go use tracing garbage collectors even though their code is usually compiled to machine code ahead-of-time.

(Tracing) garbage collection usually starts by walking the call stacks of all threads that are currently running. Objects on those stacks are always live. After that, the garbage collector traverses all objects that are pointed to by live objects, until the entire live object graph is discovered.

It is clear that doing this requires extra information that languages like C do not provide. In particular, it requires a map of the stack frame of each function that contains the offsets of all pointers (and probably their datatypes) as well as maps of all object layouts that contain the same information.

It is however easy to see that languages that have strong type guarantees (e.g. if pointer casts to different datatypes are disallowed) can indeed compute those maps at compile time. They simply store a association between instruction addresses and stack frame maps and an association between datatypes and object layout maps inside the binary. This information then allows them to do the object graph traversal.

The garbage collector itself is nothing more than a library that is linked to the program, similar to the C standard library. For example, this library could provide a function similar to malloc() that runs the collection algorithm if memory pressure is high.

Best Answer

Related Solutions

General Rules for Writing a Compiler in Python

Garbage Collection – How It Works in Natively Compiled Languages

Related Topic