Type Inference with Duck Typing

duck-typing, programming-languages, type-systems

Suppose we have a functional language where objects don't have explicitly defined types, but where named properties can nonetheless be accessed on objects. Is it then possible for the compiler to trace, throughout the program, which properties could be accessed on which variables, and to do full type inference, so that if the program compiles, every accessed property is guaranteed to exist? For example, each property name could correspond to a Haskell typeclass, and the compiler could check soundness using Hindley-Milner.
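
Concretely, I'm imagining something like this minimal sketch of the class-per-property idea (the property type is fixed to String just to keep the example small):

class HasName a where
    name :: a -> String

-- Any type with a HasName instance can be passed to greet; the
-- constraint records exactly which property is accessed.
greet :: HasName a => a -> String
greet x = "hello, " ++ name x

data Person = Person { personName :: String }
instance HasName Person where
    name = personName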

Note that, like most compilers, this compiler would be checking types on variables, not objects. It would therefore reject some programs in which every property access is valid. Accepting a program if and only if all of its accesses are valid is obviously uncomputable, and that is not what I am suggesting the compiler do.
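
For instance, in the same class-per-property style, a hypothetical checker would have to demand both capabilities on a variable even though each run only exercises one of them:

class CanBark a where
    bark :: a -> String

class CanDrive a where
    drive :: a -> String

-- Each call accesses only one property, but the check is on the
-- variable x, so its type must support both:
act :: (CanBark a, CanDrive a) => Bool -> a -> String
act isDog x = if isDog then bark x else drive x

A value that can bark but not drive would be rejected at the call site even if isDog is always True.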

Does this method of type checking work? Would it be practical? If so, why is it not used as a compromise between dynamically and statically typed languages?

Best Answer

Suppose we have a functional language where objects don't have explicitly defined types, but where named properties can nonetheless be accessed on objects. Is it then possible for the compiler to trace, throughout the program, which properties could be accessed on which variables, and to do full type inference, so that if the program compiles, every accessed property is guaranteed to exist? For example, each property name could correspond to a Haskell typeclass, and the compiler could check soundness using Hindley-Milner.

Yeah, we can do something a lot like that! But I doubt that it would be practical.

Let's see what this might look like. Consider the following ordinary Python function (taken from https://github.com/fchollet/keras/blob/master/tests/keras/test_sequential_model.py):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# num_hidden, num_class, input_dim and batch_size are numeric constants
# defined at module level in the original test file.
def test_sequential_pop():
    model = Sequential()
    model.add(Dense(num_hidden, input_dim=input_dim))
    model.add(Dense(num_class))
    model.compile(loss='mse', optimizer='sgd')
    x = np.random.random((batch_size, input_dim))
    y = np.random.random((batch_size, num_class))
    model.fit(x, y, epochs=1)
    model.pop()
    assert len(model.layers) == 1
    assert model.output_shape == (None, num_hidden)
    model.compile(loss='mse', optimizer='sgd')
    y = np.random.random((batch_size, num_hidden))
    model.fit(x, y, epochs=1)

Suppose we want to do duck typing, and we want type inference as well. Python doesn't have type inference, of course, but Haskell does, and we can convince Haskell to do something a lot like duck typing.

Duck typing means that we don't really care about the actual types of the things we're using; all we care about is that they can be used together, in the way we're using them. To make Haskell happy with that, we'll use implicit parameters to create objects and access their properties. Just as we want, the compiler will trace which properties are accessed on which variables, and so forth.

Our Haskell code might look like this:

{-# LANGUAGE ImplicitParams #-}

-- numHidden, inputDim, numClass and batchSize are assumed to be
-- numeric constants defined elsewhere.
testSequentialPop = do
    model <- ?newSequential
    ?newDense numHidden (Just inputDim) >>= ?addLayer model
    ?newDense numClass Nothing >>= ?addLayer model
    ?compileModel model "mse" "sgd"
    x <- ?getRandom [batchSize, inputDim]
    y <- ?getRandom [batchSize, numClass]
    ?fitModel model x y 1
    ?popModel model
    ?assert (?length (?layers model) == 1)
    ?assert (?outputShape model == [Nothing, Just numHidden])
    ?compileModel model "mse" "sgd"
    y2 <- ?getRandom [batchSize, numHidden]
    ?fitModel model x y2 1

So far, so good. This code will compile just fine. If we write the rest of the program, it will run just fine, too.

So what's the problem? Let's ask GHC what the type of testSequentialPop is. GHC says:

testSequentialPop
  :: (Eq a5, Eq a7, Monad m, Num a2, Num a3, Num a5, Num a7, Num t2,
      Num a9, ?addLayer::t -> a1 -> m a, ?assert::Bool -> m a6,
      ?compileModel::t -> [Char] -> [Char] -> m a8,
      ?fitModel::t -> t1 -> t1 -> a9 -> m b, ?getRandom::[t2] -> m t1,
      ?layers::t -> t3, ?length::t3 -> a5,
      ?newDense::a2 -> Maybe a3 -> m a1, ?newSequential::m t,
      ?outputShape::t -> [Maybe a7], ?popModel::t -> m a4) =>
     m b

Ooh, that's pretty complicated.

The problem here is that the function has to mention every operation it ever performs on the input objects, including the ones performed indirectly by the functions it calls. If you had a large program that does complicated things with input objects, you could end up with a type that contains hundreds, maybe thousands, of constraints.
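
To see how the constraints snowball, here is a tiny, hypothetical example (?name and ?volume are made up for illustration): every function inherits the implicit-parameter constraints of everything it calls, so the constraint sets only grow as code is composed.

{-# LANGUAGE ImplicitParams #-}

greet :: (?name :: String) => String
greet = "hello, " ++ ?name

-- shout only accesses ?volume directly, but because it calls greet its
-- type must also mention ?name; constraints accumulate transitively.
shout :: (?name :: String, ?volume :: Int) => String
shout = greet ++ replicate ?volume '!'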

Type inference helps a little, but not that much. It sometimes saves programmers from having to work out types themselves, or from having to write them out in full, but they will still have to understand the types in order to figure out how functions can be used and to diagnose what's causing type errors.
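
And the types don't disappear at the use site, either. In this hypothetical sketch, the caller has to discharge every inferred constraint by binding each implicit parameter, and a forgotten binding comes back as a type error phrased in terms of that constraint:

{-# LANGUAGE ImplicitParams #-}

describe :: (?label :: String, ?shape :: [Int]) => String
describe = ?label ++ ": " ++ show ?shape

caller :: String
caller =
    let ?label = "output"
        ?shape = [32, 10 :: Int]
    in describe

-- Forgetting one binding, for example
--     let ?label = "output" in describe
-- is rejected, and the resulting error is phrased in terms of the
-- unsatisfied ?shape constraint, which the programmer still has to
-- read and understand.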


That said, there are tools which do something similar to this. For example, Checkmarx makes a static code analysis tool which can detect certain security vulnerabilities, such as SQL injection attacks, by tracing how objects are created and used. A SQL injection attack is essentially a duck typing error: you create an object (a string containing user input) and then perform an operation on it (using it as a SQL query) even though the object does not support that operation. Checkmarx traces the object's entire path, from creation to use, and, if it finds a problem, shows you that path.
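
To make the analogy concrete, here is a hypothetical Haskell rendering of the same idea (it says nothing about how Checkmarx itself is implemented): if raw user input and executable SQL are given distinct types, then "use this user input as a query" becomes an ordinary type error, and the offending value can be traced from its creation to its use.

newtype UserInput = UserInput String
newtype SqlQuery  = SqlQuery String

-- Stand-in for a real database call.
runQuery :: SqlQuery -> IO ()
runQuery (SqlQuery q) = putStrLn ("running: " ++ q)

-- Toy "sanitizer": the only way to turn user input into a query.
queryByName :: UserInput -> SqlQuery
queryByName (UserInput s) =
    SqlQuery ("SELECT * FROM users WHERE name = '" ++ concatMap escape s ++ "'")
  where
    escape '\'' = "''"
    escape c    = [c]

main :: IO ()
main = do
    name <- UserInput <$> getLine
    runQuery (queryByName name)   -- fine
    -- runQuery name              -- rejected: UserInput does not support
    --                               being used as a SQL query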

I don't know if it would be feasible to extend this idea so that it works on all operations, rather than just operations that are a security hazard.