Simple explanation of clojure protocols

clojureprotocols

I'm trying to understand clojure protocols and what problem they are supposed to solve. Does anyone have a clear explanation of the whats and whys of clojure protocols?

Best Answer

The purpose of Protocols in Clojure is to solve the Expression Problem in an efficient manner.

So, what's the Expression Problem? It refers to the basic problem of extensibility: our programs manipulate data types using operations. As our programs evolve, we need to extend them with new data types and new operations. And particularly, we want to be able to add new operations which work with the existing data types, and we want to add new data types which work with the existing operations. And we want this to be true extension, i.e. we don't want to modify the existing program, we want to respect the existing abstractions, we want our extensions to be separate modules, in separate namespaces, separately compiled, separately deployed, separately type checked. We want them to be type-safe. [Note: not all of these make sense in all languages. But, for example, the goal to have them type-safe makes sense even in a language like Clojure. Just because we can't statically check type-safety doesn't mean that we want our code to randomly break, right?]

The Expression Problem is, how do you actually provide such extensibility in a language?

It turns out that for typical naive implementations of procedural and/or functional programming, it is very easy to add new operations (procedures, functions), but very hard to add new data types, since basically the operations work with the data types using some sort of case discrimination (switch, case, pattern matching) and you need to add new cases to them, i.e. modify existing code:

func print(node):
  case node of:
    AddOperator => print(node.left) + '+' + print(node.right)
    NotOperator => '!' + print(node)

func eval(node):
  case node of:
    AddOperator => eval(node.left) + eval(node.right)
    NotOperator => !eval(node)

Now, if you want to add a new operation, say, type-checking, that's easy, but if you want to add a new node type, you have to modify all the existing pattern matching expressions in all operations.

And for typical naive OO, you have the exact opposite problem: it is easy to add new data types which work with the existing operations (either by inheriting or overriding them), but it is hard to add new operations, since that basically means modifying existing classes/objects.

class AddOperator(left: Node, right: Node) < Node:
  meth print:
    left.print + '+' + right.print

  meth eval
    left.eval + right.eval

class NotOperator(expr: Node) < Node:
  meth print:
    '!' + expr.print

  meth eval
    !expr.eval

Here, adding a new node type is easy, because you either inherit, override or implement all required operations, but adding a new operation is hard, because you need to add it either to all leaf classes or to a base class, thus modifying existing code.

Several languages have several constructs for solving the Expression Problem: Haskell has typeclasses, Scala has implicit arguments, Racket has Units, Go has Interfaces, CLOS and Clojure have Multimethods. There are also "solutions" which attempt to solve it, but fail in one way or another: Interfaces and Extension Methods in C# and Java, Monkeypatching in Ruby, Python, ECMAScript.

Note that Clojure actually already has a mechanism for solving the Expression Problem: Multimethods. The problem that OO has with the EP is that they bundle operations and types together. With Multimethods they are separate. The problem that FP has is that they bundle the operation and the case discrimination together. Again, with Multimethods they are separate.

So, let's compare Protocols with Multimethods, since both do the same thing. Or, to put it another way: Why Protocols if we already have Multimethods?

The main thing Protocols offer over Multimethods is Grouping: you can group multiple functions together and say "these 3 functions together form Protocol Foo". You cannot do that with Multimethods, they always stand on their own. For example, you could declare that a Stack Protocol consists of both a push and a pop function together.

So, why not just add the capability to group Multimethods together? There's a purely pragmatic reason, and it is why I used the word "efficient" in my introductory sentence: performance.

Clojure is a hosted language. I.e. it is specifically designed to be run on top of another language's platform. And it turns out that pretty much any platform that you would like Clojure to run on (JVM, CLI, ECMAScript, Objective-C) has specialized high-performance support for dispatching solely on the type of the first argument. Clojure Multimethods OTOH dispatch on arbitrary properties of all arguments.

So, Protocols restrict you to dispatch only on the first argument and only on its type (or as a special case on nil).

This is not a limitation on the idea of Protocols per se, it is a pragmatic choice to get access to the performance optimizations of the underlying platform. In particular, it means that Protocols have a trivial mapping to JVM/CLI Interfaces, which makes them very fast. Fast enough, in fact, to be able to rewrite those parts of Clojure which are currently written in Java or C# in Clojure itself.

Clojure has actually already had Protocols since version 1.0: Seq is a Protocol, for example. But until 1.2, you couldn't write Protocols in Clojure, you had to write them in the host language.

Related Solutions

Java – Calling clojure from java

Update: Since this answer was posted, some of the tools available have changed. After the original answer, there is an update including information on how to build the example with current tools.

It isn't quite as simple as compiling to a jar and calling the internal methods. There do seem to be a few tricks to make it all work though. Here's an example of a simple Clojure file that can be compiled to a jar:

(ns com.domain.tiny
  (:gen-class
    :name com.domain.tiny
    :methods [#^{:static true} [binomial [int int] double]]))

(defn binomial
  "Calculate the binomial coefficient."
  [n k]
  (let [a (inc n)]
    (loop [b 1
           c 1]
      (if (> b k)
        c
        (recur (inc b) (* (/ (- a b) b) c))))))

(defn -binomial
  "A Java-callable wrapper around the 'binomial' function."
  [n k]
  (binomial n k))

(defn -main []
  (println (str "(binomial 5 3): " (binomial 5 3)))
  (println (str "(binomial 10042 111): " (binomial 10042 111)))
)

If you run it, you should see something like:

(binomial 5 3): 10
(binomial 10042 111): 49068389575068144946633777...

And here's a Java program that calls the -binomial function in the tiny.jar.

import com.domain.tiny;

public class Main {

    public static void main(String[] args) {
        System.out.println("(binomial 5 3): " + tiny.binomial(5, 3));
        System.out.println("(binomial 10042, 111): " + tiny.binomial(10042, 111));
    }
}

It's output is:

(binomial 5 3): 10.0
(binomial 10042, 111): 4.9068389575068143E263

The first piece of magic is using the :methods keyword in the gen-class statement. That seems to be required to let you access the Clojure function something like static methods in Java.

The second thing is to create a wrapper function that can be called by Java. Notice that the second version of -binomial has a dash in front of it.

And of course the Clojure jar itself must be on the class path. This example used the Clojure-1.1.0 jar.

Update: This answer has been re-tested using the following tools:

Clojure 1.5.1
Leiningen 2.1.3
JDK 1.7.0 Update 25

The Clojure Part

First create a project and associated directory structure using Leiningen:

C:\projects>lein new com.domain.tiny

Now, change to the project directory.

C:\projects>cd com.domain.tiny

In the project directory, open the project.clj file and edit it such that the contents are as shown below.

(defproject com.domain.tiny "0.1.0-SNAPSHOT"
  :description "An example of stand alone Clojure-Java interop"
  :url "http://clarkonium.net/2013/06/java-clojure-interop-an-update/"
  :license {:name "Eclipse Public License"
  :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.5.1"]]
  :aot :all
  :main com.domain.tiny)

Now, make sure all of the dependencies (Clojure) are available.

C:\projects\com.domain.tiny>lein deps

You may see a message about downloading the Clojure jar at this point.

Now edit the Clojure file C:\projects\com.domain.tiny\src\com\domain\tiny.clj such that it contains the Clojure program shown in the original answer. (This file was created when Leiningen created the project.)

Much of the magic here is in the namespace declaration. The :gen-class tells the system to create a class named com.domain.tiny with a single static method called binomial, a function taking two integer arguments and returning a double. There are two similarly named functions binomial, a traditional Clojure function, and -binomial and wrapper accessible from Java. Note the hyphen in the function name -binomial. The default prefix is a hyphen, but it can be changed to something else if desired. The -main function just makes a couple of calls to the binomial function to assure that we are getting the correct results. To do that, compile the class and run the program.

C:\projects\com.domain.tiny>lein run

You should see output shown in the original answer.

Now package it up in a jar and put it someplace convenient. Copy the Clojure jar there too.

C:\projects\com.domain.tiny>lein jar
Created C:\projects\com.domain.tiny\target\com.domain.tiny-0.1.0-SNAPSHOT.jar
C:\projects\com.domain.tiny>mkdir \target\lib

C:\projects\com.domain.tiny>copy target\com.domain.tiny-0.1.0-SNAPSHOT.jar target\lib\
        1 file(s) copied.

C:\projects\com.domain.tiny>copy "C:<path to clojure jar>\clojure-1.5.1.jar" target\lib\
        1 file(s) copied.

The Java Part

Leiningen has a built-in task, lein-javac, that should be able to help with the Java compilation. Unfortunately, it seems to be broken in version 2.1.3. It can't find the installed JDK and it can't find the Maven repository. The paths to both have embedded spaces on my system. I assume that is the problem. Any Java IDE could handle the compilation and packaging too. But for this post, we're going old school and doing it at the command line.

First create the file Main.java with the contents shown in the original answer.

To compile java part

javac -g -cp target\com.domain.tiny-0.1.0-SNAPSHOT.jar -d target\src\com\domain\Main.java

Now create a file with some meta-information to add to the jar we want to build. In Manifest.txt, add the following text

Class-Path: lib\com.domain.tiny-0.1.0-SNAPSHOT.jar lib\clojure-1.5.1.jar
Main-Class: Main

Now package it all up into one big jar file, including our Clojure program and the Clojure jar.

C:\projects\com.domain.tiny\target>jar cfm Interop.jar Manifest.txt Main.class lib\com.domain.tiny-0.1.0-SNAPSHOT.jar lib\clojure-1.5.1.jar

To run the program:

C:\projects\com.domain.tiny\target>java -jar Interop.jar
(binomial 5 3): 10.0
(binomial 10042, 111): 4.9068389575068143E263

The output is essentially identical to that produced by Clojure alone, but the result has been converted to a Java double.

As mentioned, a Java IDE will probably take care of the messy compilation arguments and the packaging.

Binary protocols v. text protocols

Binary protocol versus text protocol isn't really about how binary blobs are encoded. The difference is really whether the protocol is oriented around data structures or around text strings. Let me give an example: HTTP. HTTP is a text protocol, even though when it sends a jpeg image, it just sends the raw bytes, not a text encoding of them.

But what makes HTTP a text protocol is that the exchange to get the jpg looks like this:

Request:

GET /files/image.jpg HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.01 [en] (Win95; I)
Host: hal.etc.com.au
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8

Response:

HTTP/1.1 200 OK
Date: Mon, 19 Jan 1998 03:52:51 GMT
Server: Apache/1.2.4
Last-Modified: Wed, 08 Oct 1997 04:15:24 GMT
ETag: "61a85-17c3-343b08dc"
Content-Length: 60830
Accept-Ranges: bytes
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: image/jpeg

<binary data goes here>

Note that this could very easily have been packed much more tightly into a structure that would look (in C) something like