Programming languages classification / taxonomy

programming-languages

Is there a rigorous way to classify programming languages?

If so, can the various "dimensions" be quantified (for example, as a degree of purity)?

For instance, I just went on the Shade language website (I am not affiliated with it in any way) and saw:

"semi-functional" -> But how semi-functional is that language? -> this needs quantification
"full type checking" -> So type checking can be partial -> can that be quantified too?
Object model / no object model…

Best Answer

You can find a lot of information here:
http://en.wikipedia.org/wiki/Programming_language
http://en.wikipedia.org/wiki/Comparison_of_programming_languages

I am not going to rewrite Wikipedia here. These are just a few simple explanations to help you understand what the Wikipedia articles are trying to say.

One thing is how the programs are constructed. Sometimes it is about the programmer's style, but the programming language or the library can make styles easier, harder, or impossible. Multiple styles can be used in the same program.

imperative programming -- "first do this, then do this, then do that". You always write what is supposed to happen next. Even if there are more choices, e.g. based on the user's input, the pattern is the same: you ask for the input when you want to ask, then you choose what happens next based on the answer you received. This programming style is nice for short non-interactive programs, but with large programs it becomes very difficult to read and maintain. This is also how the computer works at the hardware level (programs written in any other style have to be compiled or interpreted down to this), which is why many programming languages use it.
event-based programming -- you are not writing the whole program, but rather some part of it, which must fit into a given framework. The framework tells you "now this happened", and you are supposed to react to the event. You don't decide what happens later. The framework calls you when your reaction is needed, and you only react to that. (For example, if you make a computer game, you react to "game started", "mouse clicked", "key pressed", "screen needs to be displayed" and "game is quitting" events. You can remember e.g. the coordinates of your spaceship; the reaction to a key press would be changing the coordinates, and the reaction to the screen needing to be displayed would be drawing the spaceship at the current coordinates.)
object-oriented programming -- you split the program into many smaller parts, and you describe each part separately; from inside: what data it contains, and from outside: what actions it can do. (For example, your object Spaceship has data: x coordinate, y coordinate, number of lives. It can: move up, move down, move left, move right, return the number of lives, remove one life.) The important part is polymorphism, which means that multiple objects can have the same interface, i.e. can do the same kind of action. (E.g. Spaceship, Missile, and Asteroid all have coordinates, and can paint themselves on a screen.) So a part of the program can treat all of them in a uniform way. (E.g. a gravity field could influence the coordinates of all MovableObjects without having to know their details.)
aspect-oriented programming -- another way to split the program into smaller pieces, but this time the pieces represent the functionality of the program. (Sorry, I don't know much about this.)
functional programming -- most natural for mathematical calculations. You define what "f(x)" means and what "g(x, y)" means, and then you ask the program to calculate "g(f(3), f(5))". Each function depends only on its input data, nothing else; e.g. the function "f(x)" can only use the value of "x", so if you call "f(7)" twice, you get the same result. Also the function "f(x)" cannot modify the value of "x". If you want to make a program where data changes over time, you have to somehow include the time or other events as parameters of the functions. (For example, the position of the spaceship at time T+1 could be described as a function of the position of the spaceship at time T and the state of the keyboard buttons.) These limitations give the advantage of avoiding some bugs and making concurrent programming (multiple processors or even computers running parts of the same program) much easier.
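
As a rough illustration of the object-oriented and functional styles, here is a minimal Python sketch. The class names follow the spaceship example above; `apply_gravity`, `step`, `describe` and the key names are made up for illustration.

```python
# Object-oriented style: each object owns its data and exposes the same interface.
class MovableObject:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def describe(self):
        return f"{type(self).__name__} at ({self.x}, {self.y})"


class Spaceship(MovableObject):
    def __init__(self, x, y, lives=3):
        super().__init__(x, y)
        self.lives = lives


class Asteroid(MovableObject):
    pass


def apply_gravity(objects, dy=1):
    # Polymorphism: works on any MovableObject without knowing its details.
    for obj in objects:
        obj.y += dy


# Functional style: the next position depends only on the inputs,
# and the inputs are never modified.
def step(position, keys):
    x, y = position
    dx = (1 if "right" in keys else 0) - (1 if "left" in keys else 0)
    return (x + dx, y)
```

Calling `step` twice with the same arguments always gives the same answer, while `apply_gravity` changes the objects it is given, so its effect depends on how often it has been called.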

Another thing is how you use the data. Do you use only the input and output values of functions, or can you also store values in variables and access them whenever you like? If you specify a value, can you change it later? If you have multiple types of data (e.g. texts, numbers, coordinates, spaceship description), can any kind of data go into any kind of variable, or do you have to specify which variable contains which kind of data? -- Typically, the more freedom you have here, the more opportunities you have to make bugs.
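
To make these questions concrete, here is a small Python sketch built around a hypothetical `Position` record. Python happens to allow both extremes: an ordinary variable can be rebound to any kind of data, while a frozen dataclass refuses changes after creation.

```python
from dataclasses import dataclass, FrozenInstanceError, replace

# Any kind of data can go into an ordinary variable (maximum freedom).
value = 42
value = "forty-two"  # allowed, and a possible source of bugs


# A frozen dataclass states which kind of data each field holds and refuses
# modification after creation. The field types are hints for readers and
# external type checkers; Python itself does not enforce them at run time.
@dataclass(frozen=True)
class Position:
    x: int
    y: int


start = Position(0, 0)
moved = replace(start, x=1)  # builds a new value instead of changing the old one

try:
    start.x = 10
    mutation_allowed = True
except FrozenInstanceError:
    mutation_allowed = False
```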

But I guess that to understand this fully, one has to become familiar with a few programming languages representing the various styles, see the advantages and disadvantages of each, and also how each language tries to overcome its specific problems. Some languages give you more choices. (For example in Java you can write imperative, event-based, object-oriented and even kind-of-functional programs. The language itself is object-oriented, some objects use events, and there is nothing preventing you from putting all your code in one function. If you use immutable objects and avoid side effects, you get some advantages of functional programming.)

Related Solutions

Programming Languages – Dealing with Unknown Parameter Names in Functions

I agree that the way functions are often used can be a confusing part of writing code, and especially reading code.

The answer to this problem partly depends on the language. As you mentioned, C# has named parameters. Objective-C's solution to this problem involves more descriptive method names. For example, stringByReplacingOccurrencesOfString:withString: is a method with clear parameters.

In Groovy, some functions take maps, allowing for a syntax like the following:

restClient.post(path: 'path/to/somewhere',
            body: requestBody,
            requestContentType: 'application/json')

In general, you can solve this issue by limiting the number of parameters you pass to a function. I think 2-3 is a good limit. If it appears that a function needs more parameters, I re-think the design. What to do instead is harder to answer in general. Sometimes you are trying to do too much in a function. Sometimes it makes sense to consider a class for storing your parameters. Also, in practice, I often find that in functions which take large numbers of parameters, many of those parameters are optional.
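
One way to read this advice, sketched in Python with made-up names (`BoxOptions`, `draw`): group the parameters into a small class, give the optional ones defaults, and make call sites name what they pass.

```python
from dataclasses import dataclass


@dataclass
class BoxOptions:
    # Hypothetical parameter object; optional settings get defaults.
    height: int
    width: int
    color: str = "black"
    filled: bool = False


def draw(*, options: BoxOptions) -> str:
    # The bare * makes `options` keyword-only, so call sites must name it.
    fill = "filled" if options.filled else "outlined"
    return f"{options.width}x{options.height} {options.color} {fill} box"


description = draw(options=BoxOptions(height=5, width=20))
```

Because the fields are named at the call site, a reader does not need to remember whether height or width comes first.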

Even in a language like Objective-C it makes sense to limit the number of parameters. One reason is that many parameters are optional. For an example, see rangeOfString: and its variations in NSString.

A pattern I often use in Java is to use a fluent-style class as a parameter. For example:

something.draw(new Box().withHeight(5).withWidth(20))

This passes a class as the parameter, and the fluent style makes the call easy to read.

The above Java snippet also helps where the ordering of parameters may not be so obvious. We normally assume with coordinates that X comes before Y. And I normally see height before width as a convention, but that is still not very clear (something.draw(5, 20)).

I've also seen some functions like drawWithHeightAndWidth(5, 20) but even these can't take too many parameters, or you'd start to lose readability.

Duck Typing – Type Inference with Duck Typing

Suppose we have a functional language where objects don't have explicitly defined types, but where named properties can nonetheless be accessed on objects. Is it then possible for the compiler to trace throughout the program which properties could be accessed on which variables and do full type inference on the program so that if the program compiles, it's guaranteed that all accessed properties must exist? For example, each property name could correspond to a Haskell typeclass and the compiler could check the soundness using Hindley-Milner.

Yeah, we can do something a lot like that! But I doubt that it would be practical.

Let's take a look at what this might look like. Consider the following ordinary Python function (taken from https://github.com/fchollet/keras/blob/master/tests/keras/test_sequential_model.py):

def test_sequential_pop():
    model = Sequential()
    model.add(Dense(num_hidden, input_dim=input_dim))
    model.add(Dense(num_class))
    model.compile(loss='mse', optimizer='sgd')
    x = np.random.random((batch_size, input_dim))
    y = np.random.random((batch_size, num_class))
    model.fit(x, y, epochs=1)
    model.pop()
    assert len(model.layers) == 1
    assert model.output_shape == (None, num_hidden)
    model.compile(loss='mse', optimizer='sgd')
    y = np.random.random((batch_size, num_hidden))
    model.fit(x, y, epochs=1)

Suppose we want to do duck typing, and we want type inference as well. Python doesn't have type inference, of course, but Haskell does, and we can convince Haskell to do something a lot like duck typing.

Duck typing means that we don't really care about the actual types of all the things we're using; all we care about is that the things can be used together, in the way we're using them. In order to make Haskell happy with that, we'll use implicit parameters in order to create objects and access their properties. Just like we want, the compiler will trace what properties are being accessed on which variables, and so forth.

Our Haskell code might look like this:

testSequentialPop = do
    model <- ?newSequential
    ?newDense numHidden (Just inputDim) >>= ?addLayer model
    ?newDense numClass Nothing >>= ?addLayer model
    ?compileModel model "mse" "sgd"
    x <- ?getRandom [batchSize, inputDim]
    y <- ?getRandom [batchSize, numClass]
    ?fitModel model x y 1
    ?popModel model
    ?assert (?length (?layers model) == 1)
    ?assert (?outputShape model == [Nothing, Just numHidden])
    ?compileModel model "mse" "sgd"
    y2 <- ?getRandom [batchSize, numHidden]
    ?fitModel model x y2 1

So far, so good. This code will compile just fine. If we write the rest of the program, it will run just fine, too.

So what's the problem? Let's ask GHC what the type of testSequentialPop is. GHC says:

testSequentialPop
  :: (Eq a5, Eq a7, Monad m, Num a2, Num a3, Num a5, Num a7, Num t2,
      Num a9, ?addLayer::t -> a1 -> m a, ?assert::Bool -> m a6,
      ?compileModel::t -> [Char] -> [Char] -> m a8,
      ?fitModel::t -> t1 -> t1 -> a9 -> m b, ?getRandom::[t2] -> m t1,
      ?layers::t -> t3, ?length::t3 -> a5,
      ?newDense::a2 -> Maybe a3 -> m a1, ?newSequential::m t,
      ?outputShape::t -> [Maybe a7], ?popModel::t -> m a4) =>
     m b

Ooh, that's pretty complicated.

The problem here is that the function has to mention every operation it ever performs on the input objects. If you had a large program that does complicated things with input objects, you could end up with a type which contains hundreds, maybe thousands of constraints.
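
The same trade-off shows up if you spell out duck-typed requirements by hand. The following is only a Python sketch using `typing.Protocol`; `ModelLike`, `FakeModel`, `layer_count` and `shrink` are invented for illustration and are not part of Keras. Every operation the function performs has to be listed in the protocol, much as every implicit parameter appears in the GHC type above.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ModelLike(Protocol):
    # Every operation a caller performs on the object must be listed here.
    def add(self, layer: object) -> None: ...
    def pop(self) -> None: ...
    def layer_count(self) -> int: ...


class FakeModel:
    # Never mentions ModelLike; it qualifies purely by having the methods.
    def __init__(self) -> None:
        self._layers = []

    def add(self, layer: object) -> None:
        self._layers.append(layer)

    def pop(self) -> None:
        self._layers.pop()

    def layer_count(self) -> int:
        return len(self._layers)


def shrink(model: ModelLike) -> int:
    # A static checker such as mypy can verify the argument structurally;
    # at run time, isinstance() only checks that the methods exist.
    model.pop()
    return model.layer_count()
```

With three operations the protocol is still readable; a function that touches dozens of properties would need a correspondingly long one.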

Type inference will help things a little bit, but not that much. Type inference sometimes saves programmers from having to calculate types themselves, or from having to type them out in full. Programmers will still have to understand types in order to figure out how functions can be used, and to diagnose what's causing type errors.

That said, there are tools which do something similar to this. For example, Checkmarx makes a static code analysis tool which can detect certain security vulnerabilities, such as SQL injection attacks, by tracing how objects are created and used. A SQL injection attack is essentially a duck typing error: you're creating an object (a string containing user input), and then performing an operation (using it as a SQL query) on it, even though that object does not support that operation. Checkmarx traces the entire path of this object, from creation to use, and, if it finds any problems, it shows you the entire path.

I don't know if it would be feasible to extend this idea so that it works on all operations, rather than just operations that are a security hazard.
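
The "duck typing error" view of SQL injection can be sketched in a few lines of Python. This says nothing about how Checkmarx works internally; `SqlQuery` and `run` are hypothetical names, and the check here happens at run time rather than by static tracing.

```python
import sqlite3


class SqlQuery:
    # Hypothetical wrapper: a query template plus separately passed parameters.
    def __init__(self, template: str, params: tuple = ()):
        self.template = template
        self.params = params


def run(conn: sqlite3.Connection, query: SqlQuery) -> list:
    # A plain string does not "support" being used as a query here.
    if not isinstance(query, SqlQuery):
        raise TypeError("expected SqlQuery, got " + type(query).__name__)
    return conn.execute(query.template, query.params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

user_input = "alice' OR '1'='1"
# The user input travels as a bound parameter, never as query text.
rows = run(conn, SqlQuery("SELECT name FROM users WHERE name = ?", (user_input,)))
```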