Type Inference in Interpreters – How Should It Change the Parsed AST?

functional-programming, interpreters, type-systems

When writing an interpreter, how should the type inference algorithm change the parsed AST? Should it at all? Or must parsing and inference necessarily be done simultaneously?

I have implemented an interpreter for a strongly typed functional language, in which each function can be declared only once (i.e., there are no signature variants).
As soon as the untyped parser was finished and working, I wrote the type inference algorithm, fed by the parser's resulting AST.

Hindley-Damas-Milner is working like a charm, but the program uses it only as a check: if type checking passes, it forgets about the matter and passes the AST to the interpreter unchanged. So far, the "target language" (i.e., the one in which the interpreter is written) has satisfied the interpreter's type needs, but I am wondering whether this approach is usual: discarding all the fruits of type inference after an "ok" result.

Also, would writing a compiler instead of an interpreter make any difference in this sense?

Best Answer

No, it is definitely not usual to throw away good analysis results. For a striking example, check out the Scala compiler. It is organized into execution phases, and you can add plugins that run after certain phases. It is particularly interesting to place your plugin after the typing phase, because that gives you all of the inferred types in a very convenient form.

The AST, however, is not so much changed as annotated. The basic structure of the statements, expressions, function definitions, etc. held in your AST remains unchanged, of course. What you really do during type inference is annotate the tree nodes with their respective types.
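A minimal sketch of what this annotation looks like, using hypothetical node classes (the names `Expr`, `IntLit`, `Add`, and the `ty` field are illustrative assumptions, not from the question): inference walks the tree and fills in a type slot on each node without altering the tree's shape.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical minimal AST: node shapes are untouched by inference;
# each node merely gains an optional `ty` annotation that starts empty.
@dataclass
class Expr:
    ty: Optional[str] = None  # filled in by the inference pass

@dataclass
class IntLit(Expr):
    value: int = 0

@dataclass
class Add(Expr):
    left: Expr = None
    right: Expr = None

def annotate(node: Expr) -> str:
    """Walk the tree, store each node's inferred type on the node, return it."""
    if isinstance(node, IntLit):
        node.ty = "int"
    elif isinstance(node, Add):
        lt, rt = annotate(node.left), annotate(node.right)
        if lt != "int" or rt != "int":
            raise TypeError("operands of + must be int")
        node.ty = "int"
    return node.ty

tree = Add(left=IntLit(value=1), right=IntLit(value=2))
annotate(tree)
print(tree.ty, tree.left.ty)  # structure intact; types now attached
```

The point of the sketch: after the pass, every later stage (interpreter, compiler backend, tooling) can read `node.ty` directly instead of re-deriving it.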

For example, before type inference runs you have an AST with a node for the function f, located somewhere within the tree, with child nodes representing its arguments and statements/expressions. During type inference you determine the types of these child nodes, i.e. of the statements/expressions, and from them you determine the type of the function f itself. You could be satisfied merely detecting that f returns an int and is only ever used in places where an int is expected, but there is no reason to lose this information.
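The step above can be sketched like so, assuming a hypothetical `FuncDef` node (all names here are illustrative): once the children's types are known, the function's own type is assembled from them and stored on the node rather than discarded.

```python
# Hypothetical function-definition node in the AST.
class FuncDef:
    def __init__(self, name, params, body):
        self.name, self.params, self.body = name, params, body
        self.ty = None  # filled in by inference

def infer_func(node, param_types, body_type):
    # param_types and body_type come from inferring the child nodes;
    # the function's type is derived from them and kept on the node.
    node.ty = "(" + ", ".join(param_types) + ") -> " + body_type
    return node.ty

f = FuncDef("f", ["x"], body=None)
infer_func(f, ["int"], "int")
print(f.ty)  # (int) -> int
```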

As for interpreter vs. compiler: there are huge benefits to having types annotated on your AST that go far beyond type checking. Consider IDEs, for example: plugin authors are very thankful for a type-annotated AST. It is much more efficient to simply look up that f returns an int when implementing code completion than to run type inference yet again.
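To make the efficiency argument concrete, here is a hedged sketch (the `annotations` table and function names are assumptions for illustration): once inference has recorded its results, a completion feature reduces to a lookup rather than a second inference run.

```python
# Hypothetical cache filled once by the type inference pass:
# name -> inferred type.
annotations = {}
annotations["f"] = "int"  # assumption: inference recorded that f returns int

def completion_candidates(expected_type):
    """Suggest every known name whose cached type matches the expectation."""
    return [name for name, ty in annotations.items() if ty == expected_type]

print(completion_candidates("int"))  # ['f']
```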

You may also want to extend your language at some point. If the extension hooks in anywhere after type inference, you will typically need to rely on its results. Consider, for example, adding type-safe macros or some kind of semantic checks.

In summary: type information in a strongly typed language is so important that it borders on blasphemy to perform type inference only to throw away its results. You rob yourself and others of information that is both extremely powerful when available and rather costly to generate.
