Java – Erlang Processes vs Java Threads


I am reading the book "Elixir in Action" by Saša Jurić, and in the first chapter it says:

Erlang processes are completely isolated from each other. They share
no memory, and a crash of one process doesn’t cause a crash of other
processes.

Isn't that true for Java threads as well? I mean, when a Java thread crashes, it does not crash other threads either, especially if we are looking at request-processing threads (let's exclude the main thread from this discussion).

Best Answer

Repeat after me: "These are different paradigms"

Say that aloud 20 times or so -- it is our mantra for the moment.

If we really must compare apples and oranges, let's at least consider where the common aspects of "being fruit" intersect.

Java "objects" are a Java programmer's basic unit of computation. That is, an object (basically a struct with arms and legs that has encapsulation somewhat more strictly enforced than in C++) is the primary tool with which you model the world. You think "This object knows/has Data {X,Y,Z} and performs Functions {A(),B(),C()} over it, carries the Data everywhere it goes, and can communicate with other objects by calling functions/methods defined as part of their public interface. It is a noun, and that noun does stuff." That is to say, you orient your thought process around these units of computation. The default case is that things that happen amongst the objects occur in sequence, and a crash interrupts that sequence. They are called "objects", and hence (if we disregard Alan Kay's original meaning) we get "object orientation".
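A minimal Java sketch of what "a crash interrupts that sequence" looks like in practice. The classes `Inventory` and `OrderService` and the item names are hypothetical, made up for illustration: when a method called on another object throws, the exception unwinds into the caller's stack frame, so every later step in the caller is skipped unless the caller catches it. Nothing isolates the caller from the callee's failure.

```java
import java.util.ArrayList;
import java.util.List;

public class SequentialCrash {

    // Hypothetical collaborator that fails for one particular item.
    static class Inventory {
        void reserve(String item) {
            if (item.equals("unicorn")) {
                throw new IllegalStateException("out of stock: " + item);
            }
        }
    }

    // Hypothetical caller that runs a fixed sequence of steps in one thread.
    static class OrderService {
        final List<String> log = new ArrayList<>();
        private final Inventory inventory = new Inventory();

        void place(String item) {
            log.add("validated " + item);
            inventory.reserve(item);    // a throw here unwinds straight into place()
            log.add("charged " + item); // skipped if reserve() threw
            log.add("shipped " + item);
        }
    }

    // Runs one order and returns the steps that actually happened.
    static List<String> run(String item) {
        OrderService service = new OrderService();
        try {
            service.place(item);
        } catch (IllegalStateException e) {
            service.log.add("aborted: " + e.getMessage());
        }
        return service.log;
    }

    public static void main(String[] args) {
        System.out.println(run("book"));    // [validated book, charged book, shipped book]
        System.out.println(run("unicorn")); // [validated unicorn, aborted: out of stock: unicorn]
    }
}
```

The failure of `reserve` is not contained inside `Inventory`: it becomes the caller's problem, and the caller's sequence stops at that point.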

Erlang "processes" are an Erlang programmer's basic unit of computation. A process (basically a self-contained sequential program running in its own time and space) is the primary tool with which an Erlanger models the world(1). Similar to how Java objects define a level of encapsulation, Erlang processes also define the level of encapsulation, but in the case of Erlang the units of computation are completely cut off from one another. You cannot call a method or function on another process, nor can you access any data that lives within it, nor does one process even run within the same timing context as any other process. Beyond the guarantee that messages from one sender to one receiver arrive in the order they were sent, there is no guarantee about the ordering of message reception relative to other processes which may be sending messages. They may as well be on different planets entirely (and, come to think of it, this is actually plausible). They can crash independently of one another, and the other processes are only impacted if they have deliberately elected to be impacted. Even this involves messaging: a process essentially registers to receive a suicide note from the dead process, a note which itself is not guaranteed to arrive in any particular order relative to the system as a whole, and to which it may or may not choose to react.
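Java has no built-in equivalent of a process, but a rough analogy may help show what "you cannot call a method on another process" means. The sketch below is a hypothetical toy, not a real library and not how the BEAM works: `Actor`, `spawn`, `send`, and the `Add`/`Get`/`Down` message records are names invented here. Each actor's state is touched only by its own thread, other code can only drop message records into its mailbox, and a watcher that opted in receives a `Down` message when an actor dies. Unlike Erlang, all these threads still share one heap, so the isolation is a convention rather than something the runtime enforces. It assumes Java 16+ (records and `instanceof` patterns).

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ActorSketch {

    // Hypothetical message types; a record's fields cannot be reassigned after construction.
    record Add(int amount) {}
    record Get(BlockingQueue<Integer> replyTo) {}
    record Down(String actor, String reason) {}

    // A toy "process": private state plus a mailbox, both owned by one dedicated thread.
    static final class Actor {
        private final String name;
        private final BlockingQueue<Object> watcher;
        private final BlockingQueue<Object> mailbox = new LinkedBlockingQueue<>();
        private int total = 0; // read and written only by this actor's own thread

        private Actor(String name, BlockingQueue<Object> watcher) {
            this.name = name;
            this.watcher = watcher;
        }

        // Loosely like spawn plus monitor: start the actor and name who hears about its death.
        static Actor spawn(String name, BlockingQueue<Object> watcher) {
            Actor actor = new Actor(name, watcher);
            Thread thread = new Thread(actor::loop, name);
            thread.setDaemon(true); // idle actors must not keep the JVM alive
            thread.start();
            return actor;
        }

        // The only intended way for other code to interact with an actor.
        void send(Object message) {
            mailbox.add(message);
        }

        private void loop() {
            try {
                while (true) {
                    Object message = mailbox.take();
                    if (message instanceof Add add) {
                        if (add.amount() < 0) {
                            throw new IllegalArgumentException("negative amount");
                        }
                        total += add.amount();
                    } else if (message instanceof Get get) {
                        get.replyTo().add(total);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (RuntimeException e) {
                // The actor dies; the watcher learns about it only through a message.
                watcher.add(new Down(name, e.getMessage()));
            }
        }
    }

    static String demo() {
        BlockingQueue<Object> watcher = new LinkedBlockingQueue<>();
        Actor counter = Actor.spawn("counter", watcher);
        Actor fragile = Actor.spawn("fragile", watcher);

        fragile.send(new Add(-1)); // crashes "fragile" only
        counter.send(new Add(2));
        counter.send(new Add(3));

        BlockingQueue<Integer> reply = new LinkedBlockingQueue<>();
        counter.send(new Get(reply));
        try {
            Integer total = reply.poll(1, TimeUnit.SECONDS);   // counter is unaffected
            Object notice = watcher.poll(1, TimeUnit.SECONDS); // the "suicide note"
            return total + " " + notice;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

`demo()` returns `"5 Down[actor=fragile, reason=negative amount]"`: the crash of `fragile` does not disturb `counter`, and the watcher learns of it only through its own mailbox.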

Java deals with complexity directly in compound algorithms: how objects work together to solve a problem. It is designed to do this within a single execution context, and the default case in Java is sequential execution. Multiple threads in Java indicate multiple running contexts, and this is a very complex topic because of the impact that activities in different timing contexts have on one another (and on the system as a whole: hence defensive programming, exception schemes, etc.). Saying "multi-threaded" in Java means something different than it does in Erlang; in fact, this is never even said in Erlang because it is always the base case. Note here that Java threads imply segregation as pertains to time, not memory or visible references: visibility in Java is controlled manually by choosing what is private and what is public, and universally accessible elements of a system must either be designed to be "threadsafe" and reentrant, be sequentialized via queueing mechanisms, or employ locking mechanisms. In short: coordinating access to shared state is a manually managed issue in threaded/concurrent Java programs.
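The question's observation is right as far as it goes: an uncaught exception terminates only the thread that threw it. The minimal sketch below (the `transfer` method and the ledger strings are hypothetical) shows the other half of the paragraph above: threads are separated in time but not in memory, so a thread that dies partway through an update leaves the half-finished state visible to every other thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ThreadCrashDemo {

    // Hypothetical two-step update that is meant to be all-or-nothing.
    static void transfer(List<String> ledger, boolean failHalfway) {
        ledger.add("debit alice 10");
        if (failHalfway) {
            throw new IllegalStateException("crashed mid-transfer");
        }
        ledger.add("credit bob 10");
    }

    static List<String> run() {
        // One heap, shared by every thread holding a reference to these lists.
        // Each add() is thread-safe, but that does not make the two-step transfer atomic.
        List<String> ledger = Collections.synchronizedList(new ArrayList<>());
        List<String> crashes = Collections.synchronizedList(new ArrayList<>());

        Thread worker = new Thread(() -> transfer(ledger, true), "worker");
        // Record the crash instead of printing a stack trace; only "worker" dies.
        worker.setUncaughtExceptionHandler((t, e) -> crashes.add(t.getName() + ": " + e.getMessage()));
        worker.start();
        try {
            worker.join(); // wait for the worker to terminate
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // The calling thread carries on normally...
        ledger.add("main still running");
        // ...but it sees the half-finished update the dead thread left behind.
        List<String> result = new ArrayList<>(ledger);
        result.addAll(crashes);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [debit alice 10, main still running, worker: crashed mid-transfer]
    }
}
```

The worker's crash did not stop the calling thread, but the debit without its matching credit survives in shared memory, and cleaning that up is left to the programmer.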

Erlang separates each process's running context in terms of execution timing (scheduling), memory access, and reference visibility, and in doing so simplifies each component of an algorithm by isolating it completely. This is not just the default case; it is the only case available under this model of computation. This comes at the cost of never knowing exactly the sequence of any given operation once a part of your processing sequence crosses a message barrier, because all messaging is essentially a network protocol and there are no method calls that can be guaranteed to execute within a given context. This would be analogous to creating a JVM instance per object and only permitting them to communicate across sockets. That would be ridiculously cumbersome in Java, but it is the way Erlang is designed to work (incidentally, this is also the basis of the concept of writing "Java microservices" if one ditches the web-oriented baggage the buzzword tends to entail; Erlang programs are, by default, swarms of microservices). It's all about tradeoffs.

These are different paradigms. The closest commonality we can find is to say that, from the programmer's perspective, Erlang processes are analogous to Java objects. If we must find something to compare Java threads to... well, we're simply not going to find it, because Erlang has no comparable concept. To beat a dead horse: these are different paradigms. If you write a few non-trivial programs in Erlang, this will become readily apparent.

Note that I'm saying "these are different paradigms" but have not even touched the topic of OOP vs FP. The difference between "thinking in Java" and "thinking in Erlang" is more fundamental than OOP vs FP. (In fact, one could write an OOP language for the Erlang VM that works like Java; see, for example, "An implementation of OOP objects in Erlang".)

While it is true that Erlang's "concurrency oriented" or "process oriented" foundation is closer to what Alan Kay had in mind when he coined the term "object oriented"(2), that is not really the point here. What Kay was getting at was that you can reduce the cognitive complexity of a system by cutting your computrons into discrete chunks, and isolation is necessary for that. Java accomplishes this in a way that leaves it still fundamentally procedural in nature, but structures code around a special syntax over higher-order dispatching closures called "class definitions". Erlang does this by splitting the running context up per object. This means Erlang thingies can't call methods on one another, but Java thingies can. It also means Erlang thingies can crash in isolation, but Java thingies can't. A vast number of implications flow from this basic difference: hence "different paradigms". Tradeoffs.

Footnotes:

(1) Incidentally, Erlang implements a version of "the actor model", but we don't use this terminology because Erlang predates the popularization of this model. Joe Armstrong was unaware of it when he designed Erlang and wrote his thesis.
(2) Alan Kay has said quite a bit about what he meant when he coined the term "object oriented", the most interesting being his take on messaging (one-way notification from one independent process with its own timing and memory to another) vs. calls (function or method calls within a sequential execution context with shared memory), and how the lines blur a bit between the programming interface presented by the programming language and the implementation underneath.

Related Solutions

Java – What are the differences between a HashMap and a Hashtable in Java

There are several differences between HashMap and Hashtable in Java:

Hashtable is synchronized, whereas HashMap is not. This makes HashMap better for non-threaded applications, as unsynchronized Objects typically perform better than synchronized ones.
Hashtable does not allow null keys or values. HashMap allows one null key and any number of null values.
One of HashMap's subclasses is LinkedHashMap, so in the event that you'd want predictable iteration order (which is insertion order by default), you could easily swap out the HashMap for a LinkedHashMap. This wouldn't be as easy if you were using Hashtable.
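A small sketch checking the points listed above (the class and method names are illustrative): `Hashtable` rejects null keys and values with a `NullPointerException`, `HashMap` accepts them, and code written against the `Map` interface can take a `LinkedHashMap` to get insertion-order iteration without other changes.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapDifferences {

    // True if the map accepted a null key without throwing.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    // True if the map accepted a null value without throwing.
    static boolean acceptsNullValue(Map<String, String> map) {
        try {
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    // Inserts keys in a fixed order and reports the order iteration yields.
    static String keyOrder(Map<String, Integer> map) {
        for (String k : new String[] {"zebra", "apple", "mango"}) {
            map.put(k, k.length());
        }
        return String.join(",", map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<>()));     // true
        System.out.println(acceptsNullKey(new Hashtable<>()));   // false
        System.out.println(acceptsNullValue(new HashMap<>()));   // true
        System.out.println(acceptsNullValue(new Hashtable<>())); // false
        System.out.println(keyOrder(new LinkedHashMap<>()));     // zebra,apple,mango
        // With a plain HashMap, keyOrder's result is not specified.
    }
}
```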

Since synchronization is not an issue for you, I'd recommend HashMap. If synchronization becomes an issue, you may also look at ConcurrentHashMap.
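If the map does become shared between threads, here is a minimal sketch of the `ConcurrentHashMap` route (the counting task, key, and class name are made up): `ConcurrentHashMap.merge` performs each per-key read-modify-write atomically, so no increments are lost and no external lock is needed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentCount {

    static Map<String, Integer> countConcurrently(int threads, int incrementsPerThread) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    // Atomic per-key update on ConcurrentHashMap: no lost increments.
                    counts.merge("hits", 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 10_000)); // {hits=40000}
    }
}
```

Running the same loop against a plain `HashMap` is not thread-safe and may lose updates.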

Java – Is Java “pass-by-reference” or “pass-by-value”

Java is always pass-by-value. Unfortunately, when we deal with objects we are really dealing with object-handles called references which are passed-by-value as well. This terminology and semantics easily confuse many beginners.

It goes like this:

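public static void main(String[] args) {
    Dog aDog = new Dog("Max");
    Dog oldDog = aDog;

    // we pass the object to foo
    foo(aDog);
    // aDog variable is still pointing to the "Max" dog when foo(...) returns
    System.out.println(aDog.getName().equals("Max"));  // true
    System.out.println(aDog.getName().equals("Fifi")); // false
    System.out.println(aDog == oldDog);                // true
}

public static void foo(Dog d) {
    System.out.println(d.getName().equals("Max"));  // true
    // change d inside of foo() to point to a new Dog instance "Fifi"
    d = new Dog("Fifi");
    System.out.println(d.getName().equals("Fifi")); // true
}

// Minimal Dog class the examples rely on; declare it in the same class as main and foo.
static class Dog {
    private String name;

    Dog(String name) { this.name = name; }

    String getName() { return name; }

    void setName(String name) { this.name = name; }
}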

In the example above, aDog.getName() still returns "Max". The value of aDog within main is not changed when foo assigns the new Dog "Fifi" to d, because the object reference is passed by value. If it were passed by reference, then aDog.getName() in main would return "Fifi" after the call to foo.

Likewise:

public static void main(String[] args) {
    Dog aDog = new Dog("Max");
    Dog oldDog = aDog;

    foo(aDog);
    // when foo(...) returns, the name of the dog has been changed to "Fifi"
    System.out.println(aDog.getName().equals("Fifi")); // true
    // but it is still the same dog:
    System.out.println(aDog == oldDog); // true
}

public static void foo(Dog d) {
    System.out.println(d.getName().equals("Max")); // true
    // this changes the name of d to be "Fifi"
    d.setName("Fifi");
}

In the above example, Fifi is the dog's name after the call to foo(aDog) because the object's name was set inside of foo(...). Any operations that foo performs on d are such that, for all practical purposes, they are performed on aDog, but it is not possible to change the value of the variable aDog itself.
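Another common way to test the claim is a method that tries to swap its two arguments. If Java passed references by reference, the caller's variables would trade places; because only copies of the references are passed, they do not. The `Dog` record here is a minimal hypothetical stand-in for the `Dog` class used above (records need Java 16+).

```java
public class SwapDemo {

    // Minimal stand-in for the Dog class (hypothetical).
    record Dog(String name) {}

    // Swaps only this method's local copies of the two references.
    static void swap(Dog a, Dog b) {
        Dog tmp = a;
        a = b;
        b = tmp;
    }

    static String afterSwap() {
        Dog first = new Dog("Max");
        Dog second = new Dog("Fifi");
        swap(first, second);
        return first.name() + " " + second.name();
    }

    public static void main(String[] args) {
        System.out.println(afterSwap()); // Max Fifi
    }
}
```

The caller's variables are unchanged after `swap`, which is exactly the "cannot change the value of the variable itself" point; only operations on the shared object (like `setName`) are visible to the caller.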

For more information on pass by reference and pass by value, consult the following SO answer: https://stackoverflow.com/a/430958/6005228. This explains more thoroughly the semantics and history behind the two and also explains why Java and many other modern languages appear to do both in certain cases.
