Type Inference with Duck Typing

duck-typing, programming-languages, type-systems

Suppose we have a functional language where objects don't have explicitly defined types, but where named properties can nonetheless be accessed on objects. Is it then possible for the compiler to trace, throughout the program, which properties could be accessed on which variables, and to do full type inference, so that if the program compiles, every accessed property is guaranteed to exist? For example, each property name could correspond to a Haskell typeclass, and the compiler could check soundness using Hindley-Milner.
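
Concretely, I'm imagining something like this minimal sketch of the class-per-property idea (the property type is fixed to String just to keep the example small):

class HasName a where
    name :: a -> String

-- Any type with a HasName instance can be passed to greet; the
-- constraint records exactly which property is accessed.
greet :: HasName a => a -> String
greet x = "hello, " ++ name x

data Person = Person { personName :: String }
instance HasName Person where
    name = personName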

Note that, like most compilers, this compiler would be checking types on variables, not objects. It would therefore reject some programs in which every property access is valid. Accepting a program if and only if all of its accesses are valid is obviously uncomputable, and that is not what I am suggesting the compiler do.
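
For instance, in the same class-per-property style, a hypothetical checker would have to demand both capabilities on a variable even though each run only exercises one of them:

class CanBark a where
    bark :: a -> String

class CanDrive a where
    drive :: a -> String

-- Each call accesses only one property, but the check is on the
-- variable x, so its type must support both:
act :: (CanBark a, CanDrive a) => Bool -> a -> String
act isDog x = if isDog then bark x else drive x

A value that can bark but not drive would be rejected at the call site even if isDog is always True.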

Does this method of type checking work? Would it be practical? If so, why is it not used as a compromise between dynamically and statically typed languages?

Best Answer

Suppose we have a functional language where objects don't have explicitly defined types, but where named properties can nonetheless be accessed on objects. Is it then possible for the compiler to trace, throughout the program, which properties could be accessed on which variables, and to do full type inference, so that if the program compiles, every accessed property is guaranteed to exist? For example, each property name could correspond to a Haskell typeclass, and the compiler could check soundness using Hindley-Milner.

Yeah, we can do something a lot like that! But I doubt that it would be practical.

Let's see what this might look like. Consider the following ordinary Python function (taken from https://github.com/fchollet/keras/blob/master/tests/keras/test_sequential_model.py):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# num_hidden, num_class, input_dim and batch_size are numeric constants
# defined at module level in the original test file.
def test_sequential_pop():
    model = Sequential()
    model.add(Dense(num_hidden, input_dim=input_dim))
    model.add(Dense(num_class))
    model.compile(loss='mse', optimizer='sgd')
    x = np.random.random((batch_size, input_dim))
    y = np.random.random((batch_size, num_class))
    model.fit(x, y, epochs=1)
    model.pop()
    assert len(model.layers) == 1
    assert model.output_shape == (None, num_hidden)
    model.compile(loss='mse', optimizer='sgd')
    y = np.random.random((batch_size, num_hidden))
    model.fit(x, y, epochs=1)

Suppose we want to do duck typing, and we want type inference as well. Python doesn't have type inference, of course, but Haskell does, and we can convince Haskell to do something a lot like duck typing.

Duck typing means that we don't really care about the actual types of the things we're using; all we care about is that they can be used together, in the way we're using them. To make Haskell happy with that, we'll use implicit parameters to create objects and access their properties. Just as we want, the compiler will trace which properties are accessed on which variables, and so forth.

Our Haskell code might look like this:

{-# LANGUAGE ImplicitParams #-}

-- numHidden, inputDim, numClass and batchSize are assumed to be
-- numeric constants defined elsewhere.
testSequentialPop = do
    model <- ?newSequential
    ?newDense numHidden (Just inputDim) >>= ?addLayer model
    ?newDense numClass Nothing >>= ?addLayer model
    ?compileModel model "mse" "sgd"
    x <- ?getRandom [batchSize, inputDim]
    y <- ?getRandom [batchSize, numClass]
    ?fitModel model x y 1
    ?popModel model
    ?assert (?length (?layers model) == 1)
    ?assert (?outputShape model == [Nothing, Just numHidden])
    ?compileModel model "mse" "sgd"
    y2 <- ?getRandom [batchSize, numHidden]
    ?fitModel model x y2 1

So far, so good. This code will compile just fine. If we write the rest of the program, it will run just fine, too.

So what's the problem? Let's ask GHC what the type of testSequentialPop is. GHC says:

testSequentialPop
  :: (Eq a5, Eq a7, Monad m, Num a2, Num a3, Num a5, Num a7, Num t2,
      Num a9, ?addLayer::t -> a1 -> m a, ?assert::Bool -> m a6,
      ?compileModel::t -> [Char] -> [Char] -> m a8,
      ?fitModel::t -> t1 -> t1 -> a9 -> m b, ?getRandom::[t2] -> m t1,
      ?layers::t -> t3, ?length::t3 -> a5,
      ?newDense::a2 -> Maybe a3 -> m a1, ?newSequential::m t,
      ?outputShape::t -> [Maybe a7], ?popModel::t -> m a4) =>
     m b

Ooh, that's pretty complicated.

The problem here is that the function has to mention every operation it ever performs on the input objects, including the ones performed indirectly by the functions it calls. If you had a large program that does complicated things with input objects, you could end up with a type that contains hundreds, maybe thousands, of constraints.
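
To see how the constraints snowball, here is a tiny, hypothetical example (?name and ?volume are made up for illustration): every function inherits the implicit-parameter constraints of everything it calls, so the constraint sets only grow as code is composed.

{-# LANGUAGE ImplicitParams #-}

greet :: (?name :: String) => String
greet = "hello, " ++ ?name

-- shout only accesses ?volume directly, but because it calls greet its
-- type must also mention ?name; constraints accumulate transitively.
shout :: (?name :: String, ?volume :: Int) => String
shout = greet ++ replicate ?volume '!'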

Type inference helps a little, but not that much. It sometimes saves programmers from having to work out types themselves, or from having to write them out in full, but they will still have to understand the types in order to figure out how functions can be used and to diagnose what's causing type errors.
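
And the types don't disappear at the use site, either. In this hypothetical sketch, the caller has to discharge every inferred constraint by binding each implicit parameter, and a forgotten binding comes back as a type error phrased in terms of that constraint:

{-# LANGUAGE ImplicitParams #-}

describe :: (?label :: String, ?shape :: [Int]) => String
describe = ?label ++ ": " ++ show ?shape

caller :: String
caller =
    let ?label = "output"
        ?shape = [32, 10 :: Int]
    in describe

-- Forgetting one binding, for example
--     let ?label = "output" in describe
-- is rejected, and the resulting error is phrased in terms of the
-- unsatisfied ?shape constraint, which the programmer still has to
-- read and understand.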


That said, there are tools which do something similar to this. For example, Checkmarx makes a static code analysis tool which can detect certain security vulnerabilities, such as SQL injection attacks, by tracing how objects are created and used. A SQL injection attack is essentially a duck typing error: you create an object (a string containing user input) and then perform an operation on it (using it as a SQL query) even though the object does not support that operation. Checkmarx traces the object's entire path, from creation to use, and, if it finds a problem, shows you that path.
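
To make the analogy concrete, here is a hypothetical Haskell rendering of the same idea (it says nothing about how Checkmarx itself is implemented): if raw user input and executable SQL are given distinct types, then "use this user input as a query" becomes an ordinary type error, and the offending value can be traced from its creation to its use.

newtype UserInput = UserInput String
newtype SqlQuery  = SqlQuery String

-- Stand-in for a real database call.
runQuery :: SqlQuery -> IO ()
runQuery (SqlQuery q) = putStrLn ("running: " ++ q)

-- Toy "sanitizer": the only way to turn user input into a query.
queryByName :: UserInput -> SqlQuery
queryByName (UserInput s) =
    SqlQuery ("SELECT * FROM users WHERE name = '" ++ concatMap escape s ++ "'")
  where
    escape '\'' = "''"
    escape c    = [c]

main :: IO ()
main = do
    name <- UserInput <$> getLine
    runQuery (queryByName name)   -- fine
    -- runQuery name              -- rejected: UserInput does not support
    --                               being used as a SQL query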

I don't know if it would be feasible to extend this idea so that it works on all operations, rather than just operations that are a security hazard.