Programming Languages – What is a Type System?

Background

I am designing a language as a side project. I have a working assembler, static analyser, and virtual machine for it. Since I can already compile and run non-trivial programs using the infrastructure I've built, I thought about giving a presentation at my university.

During my talk I mentioned that the VM provides a type system, and was asked "What is your type system for?". After answering, I got laughed at by the person asking the question.

Thus, even though I am almost certainly going to lose reputation for asking this question, I turn to Programmers.

My understanding

As I understand them, type systems are used to provide an additional layer of information about entities in a program, so that the runtime, or the compiler, or any other piece of machinery, knows what to do with the strings of bits it operates on.
They also help maintain contracts – the compiler (or code analyser, or runtime, or any other program) can verify that at any given point the program operates on values programmers expect it to operate on.

Types can also be used to provide information to those human programmers.
For example, I find this declaration:

function sqrt(double n) -> double;

more useful than this one

sqrt(n)

The former gives plenty of information: that the sqrt identifier is a function, takes a single double as input, and produces another double as output.
The latter tells you that it is probably a function taking a single parameter.

My answer

So, after being asked "What is your type system for?" I answered as follows:

The type system is dynamic (types are assigned to values, not to variables holding them), but strong, without surprising coercion rules (you can't add a string to an integer, as they represent incompatible types, but you can add an integer to a floating point number).

The type system is used by the VM to ensure that operands for instructions are valid; and can be used by programmers to ensure that parameters passed to their functions are valid (i.e. of correct type).
The type system supports subtyping and multiple inheritance (both features are available to programmers), and types are considered when dynamic dispatch of methods on objects is used: the VM uses types to determine which function implements a given message for a given type.
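
To make the operand check concrete, here is a minimal sketch in Python of how a VM's add instruction might validate its operands. The names `vm_add` and `is_number` are hypothetical and not taken from the actual VM; the only rule assumed is the one stated above (integer plus float is allowed, string plus integer is not).

```python
def is_number(value):
    """True for ints and floats; bool is excluded even though it subclasses int."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def vm_add(lhs, rhs):
    """Hypothetical 'add' instruction: check operand types before operating."""
    if is_number(lhs) and is_number(rhs):
        # int + float is permitted; Python widens the result to float.
        return lhs + rhs
    # Incompatible operands are rejected instead of being silently coerced.
    raise TypeError(
        f"cannot add {type(lhs).__name__} to {type(rhs).__name__}"
    )
```

With this sketch, `vm_add(1, 2.5)` succeeds, while `vm_add("1", 2)` is rejected with a `TypeError` rather than producing `"12"` or `3`.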

The follow-up question was "And how is type assigned to a value?". So I explained that all values are boxed, and each has a pointer to a type definition structure which provides the name of the type, the messages it responds to, and the types it inherits from.
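
A rough sketch of that layout, in Python, might look like the following. Everything here is illustrative: `TypeInfo`, `Box`, and `send` are made-up names, and the depth-first, left-to-right search through parent types is an assumption, since the resolution order is not specified above.

```python
from dataclasses import dataclass, field


@dataclass
class TypeInfo:
    """Type definition structure: name, messages responded to, parent types."""
    name: str
    methods: dict = field(default_factory=dict)   # message name -> function
    parents: list = field(default_factory=list)   # several parents allowed

    def lookup(self, message):
        """Find the function implementing a message (assumed search order:
        own methods first, then parents depth-first, left to right)."""
        if message in self.methods:
            return self.methods[message]
        for parent in self.parents:
            found = parent.lookup(message)
            if found is not None:
                return found
        return None


@dataclass
class Box:
    """A boxed value: a pointer to its type definition plus the raw payload."""
    type_info: TypeInfo
    payload: object


def send(receiver, message, *args):
    """Dynamic dispatch: resolve the message through the receiver's type."""
    impl = receiver.type_info.lookup(message)
    if impl is None:
        raise TypeError(
            f"{receiver.type_info.name} does not respond to {message!r}"
        )
    return impl(receiver, *args)


printable = TypeInfo(
    "Printable",
    {"describe": lambda self: f"<{self.type_info.name} {self.payload}>"},
)
comparable = TypeInfo(
    "Comparable",
    {"less_than": lambda self, other: self.payload < other.payload},
)
# Integer inherits from two unrelated parents.
integer = TypeInfo("Integer", parents=[printable, comparable])

three = Box(integer, 3)
five = Box(integer, 5)
```

Here `send(three, "describe")` is resolved through `Printable` and `send(three, "less_than", five)` through `Comparable`, while sending an unknown message raises a `TypeError`.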

After that, I got laughed at, and my answer was dismissed with the comment "That is not a real typesystem.".

So – if what I described does not qualify as a "real typesystem", what would? Was that person right that what I provide cannot be considered a typesystem?

Best Answer

That all seems like a fine description of what type systems provide. And your implementation sounds like a reasonable enough one for what it's doing.

Some languages don't need the runtime information, since they don't do runtime dispatch (or they do single dispatch via vtables or another mechanism, and so don't need the type information). For some languages, just having a symbol/placeholder is sufficient, since they only care about type equality, not a type's name or inheritance.

Depending on your environment, the person may have wanted more formalism in your type system. They may want to know what you can prove with it, not what programmers can do with it. This is pretty common in academia, unfortunately, though academics have a reason for it: it's pretty easy to have flaws in a type system that let incorrect programs slip through. It's possible they spotted one of these.

If you have further questions, Types and Programming Languages is the canonical book on the subject and can help you learn some of the rigor academics expect, as well as some of the terminology to help describe things.

Type systems prevent errors

Type systems eliminate illegal programs. Consider the following Python code.

 a = 'foo'
 b = True
 c = a / b

In Python, this program fails at runtime: evaluating a / b raises a TypeError. In a statically typed language like Java, C#, or Haskell, the equivalent isn't even a legal program. You avoid these errors entirely because they simply aren't possible in the set of accepted input programs.
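
To see that the Python failure really is deferred until the offending line runs, here is a small sketch that wraps the same three lines in a function (the name `broken` is just for illustration). Defining the function is accepted without complaint; only calling it triggers the error.

```python
def broken():
    a = 'foo'
    b = True
    return a / b  # str / bool is not defined


# Defining broken() succeeded above; the type error only appears on the call.
try:
    broken()
    message = None
except TypeError as exc:
    message = str(exc)
```

After this runs, `message` holds the interpreter's complaint about unsupported operand types for `/`. A static checker would have rejected the body of `broken` before any call was made.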

Similarly, a better type system rules out more errors. If we jump up to super advanced type systems we can say things like this:

 Definition divide x (y : {n : integer | n /= 0}) = x / y

Now the type system guarantees that there aren't any divide-by-0 errors.
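
Python cannot check a refinement like this statically, but the idea can be approximated at runtime with a wrapper type whose constructor enforces the predicate. The sketch below is only that approximation, and the names `NonZeroInt` and `divide` are hypothetical: the check happens once, when the value is constructed, rather than at compile time.

```python
class NonZeroInt:
    """An integer known to be non-zero once construction has succeeded."""

    def __init__(self, value):
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("expected an integer")
        if value == 0:
            raise ValueError("zero is not a member of NonZeroInt")
        self.value = value


def divide(x, y):
    """Divide x by y. The divisor must be a NonZeroInt, so no zero check
    is needed at the point of division."""
    if not isinstance(y, NonZeroInt):
        raise TypeError("divisor must be a NonZeroInt")
    return x / y.value
```

With this, `divide(10, NonZeroInt(4))` works, and the only place a zero divisor can be rejected is `NonZeroInt(0)`. A dependently typed language moves that same rejection to compile time.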

What sort of errors

Here's a brief list of what errors type systems can prevent

1. Out-of-range errors
2. SQL injection
3. Generalizing 2, many safety issues (what taint checking is for in Perl)
4. Out-of-sequence errors (forgetting to call init)
5. Forcing a subset of values to be used (for example, only integers greater than 0)
6. ~~Nefarious kittens~~ (Yes, it was a joke)
7. Loss-of-precision errors
8. Software transactional memory (STM) errors (this needs purity, which also requires types)
9. Generalizing 8, controlling side effects
10. Invariants over data structures (is a binary tree balanced?)
11. Forgetting an exception or throwing the wrong one

And remember, this all happens at compile time. There's no need to write tests with 100% code coverage simply to check for type errors; the compiler does it for you :)

Case study: Typed lambda calculus

Alright, let's examine the simplest of all type systems, simply typed lambda calculus.

Basically there are two kinds of types, a base type and function types:

Type = Unit | Type -> Type

All terms are either variables, lambdas, or applications. Based on this, we can prove that any well-typed program terminates: there is never a situation where the program gets stuck or loops forever. This isn't provable in the untyped lambda calculus because, well, it isn't true there.
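
For a feel of how small such a type system is, here is a sketch of a type checker for it in Python. The class names and `type_of` are my own, and I've added a `UnitVal` term (not in the grammar above) so that there is something of type Unit to apply functions to. Lambdas carry an explicit parameter type.

```python
from dataclasses import dataclass


# Types: Unit | Type -> Type
@dataclass(frozen=True)
class Unit:
    pass


@dataclass(frozen=True)
class Arrow:
    param: object
    result: object


# Terms: variables, lambdas, applications (plus an assumed unit value)
@dataclass(frozen=True)
class UnitVal:
    pass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    param_type: object
    body: object


@dataclass(frozen=True)
class App:
    func: object
    arg: object


def type_of(term, env=None):
    """Return the type of a term, or raise TypeError if it is ill-typed."""
    env = env or {}
    if isinstance(term, UnitVal):
        return Unit()
    if isinstance(term, Var):
        if term.name not in env:
            raise TypeError(f"unbound variable {term.name}")
        return env[term.name]
    if isinstance(term, Lam):
        # Check the body with the parameter bound to its declared type.
        body_type = type_of(term.body, {**env, term.param: term.param_type})
        return Arrow(term.param_type, body_type)
    if isinstance(term, App):
        func_type = type_of(term.func, env)
        arg_type = type_of(term.arg, env)
        if not isinstance(func_type, Arrow) or func_type.param != arg_type:
            raise TypeError("ill-typed application")
        return func_type.result
    raise TypeError(f"unknown term {term!r}")


identity = Lam("x", Unit(), Var("x"))
# Self-application, the building block of the usual looping terms.
self_apply = Lam("x", Arrow(Unit(), Unit()), App(Var("x"), Var("x")))
```

`type_of(identity)` gives `Arrow(Unit(), Unit())`, whereas `type_of(self_apply)` raises a `TypeError`: `x` would need a type that is equal to its own function type, and no such type exists in this grammar. That is the intuition for why the usual looping terms cannot be written.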

Think about this: we can use type systems to guarantee that our program doesn't loop forever. Rather cool, right?

Detour into dynamic types

Dynamic type systems can offer many of the same guarantees as static type systems, but at runtime rather than compile time. In fact, since the checks happen at runtime, they have more information available. You lose some guarantees, however, particularly about static properties like termination.

So dynamic types don't rule out certain programs, but rather route malformed programs to well-defined actions, like throwing exceptions.

TLDR

So the long and the short of it is that type systems rule out certain programs. Many of those programs are broken in some way, so with type systems we avoid them.

Type Systems – Characteristics of a Good Generic Type System

While generics have been mainstream in the functional programming community for decades, adding generics to object-oriented programming languages poses some unique challenges, specifically in the interaction of subtyping and generics.

However, even if we focus on object-oriented programming languages, and Java in particular, a far better generics system could have been designed:

Generic types should be admissible wherever other types are. In particular, if T is a type parameter, the following expressions should compile without warnings:
```
object instanceof T; 
T t = (T) object;
T[] array = new T[1];
```
Yes, this requires generics to be reified, just like every other type in the language.
Covariance and contravariance of a generic type should be specified in (or inferred from) its declaration, rather than every time the generic type is used, so we can write
```
Future<Provider<Integer>> s;
Future<Provider<Number>> o = s; 
```
rather than
```
Future<? extends Provider<Integer>> s;
Future<? extends Provider<? extends Number>> o = s;
```

As generic types can get rather long, we should not need to specify them redundantly. That is, we should be able to write

Map<String, Map<String, List<LanguageDesigner>>> map;
for (var e : map.values()) {
    for (var list : e.values()) {
        for (var person : list) {
            greet(person);
        }
    }
}

rather than

Map<String, Map<String, List<LanguageDesigner>>> map;
for (Map<String, List<LanguageDesigner>> e : map.values()) {
    for (List<LanguageDesigner> list : e.values()) {
        for (LanguageDesigner person : list) {
            greet(person);
        }
    }
}

Any type should be admissible as a type parameter, not just reference types. (If we can have an int[], why can we not have a List<int>?)

All of this is possible in C#.