Scala – Is Scala idiomatic coding style just a cool trap for writing inefficient code

performancescala

I sense that the Scala community has a little big obsession with writing "concise", "cool", "scala idiomatic", "one-liner" -if possible- code. This is immediately followed by a comparison to Java/imperative/ugly code.

While this (sometimes) leads to easy to understand code, it also leads to inefficient code for 99% of developers. And this is where Java/C++ is not easy to beat.

Consider this simple problem: Given a list of integers, remove the greatest element. Ordering does not need to be preserved.

Here is my version of the solution (It may not be the greatest, but it's what the average non-rockstar developer would do).

def removeMaxCool(xs: List[Int]) = {
  val maxIndex = xs.indexOf(xs.max);
  xs.take(maxIndex) ::: xs.drop(maxIndex+1)
}

It's Scala idiomatic, concise, and uses a few nice list functions. It's also very inefficient. It traverses the list at least 3 or 4 times.

Here is my totally uncool, Java-like solution. It's also what a reasonable Java developer (or Scala novice) would write.

def removeMaxFast(xs: List[Int]) = {
    var res = ArrayBuffer[Int]()
    var max = xs.head
    var first = true;   
    for (x <- xs) {
        if (first) {
            first = false;
        } else {
            if (x > max) {
                res.append(max)
                max = x
            } else {
                res.append(x)
            }
        }
    }
    res.toList
}

Totally non-Scala idiomatic, non-functional, non-concise, but it's very efficient. It traverses the list only once!

So, if 99% of Java developers write more efficient code than 99% of Scala developers, this is a huge
obstacle to cross for greater Scala adoption. Is there a way out of this trap?

I am looking for practical advice to avoid such "inefficiency traps" while keeping implementation clear ans concise.

Clarification: This question comes from a real-life scenario: I had to write a complex algorithm. First I wrote it in Scala, then I "had to" rewrite it in Java. The Java implementation was twice as long, and not that clear, but at the same time it was twice as fast. Rewriting the Scala code to be efficient would probably take some time and a somewhat deeper understanding of scala internal efficiencies (for vs. map vs. fold, etc)

Best Answer

Let's discuss a fallacy in the question:

So, if 99% of Java developers write more efficient code than 99% of Scala developers, this is a huge obstacle to cross for greater Scala adoption. Is there a way out of this trap?

This is presumed, with absolutely no evidence backing it up. If false, the question is moot.

Is there evidence to the contrary? Well, let's consider the question itself -- it doesn't prove anything, but shows things are not that clear.

Totally non-Scala idiomatic, non-functional, non-concise, but it's very efficient. It traverses the list only once!

Of the four claims in the first sentence, the first three are true, and the fourth, as shown by user unknown, is false! And why it is false? Because, contrary to what the second sentence states, it traverses the list more than once.

The code calls the following methods on it:

res.append(max)
res.append(x)

and

res.toList

Let's consider first append.

append takes a vararg parameter. That means max and x are first encapsulated into a sequence of some type (a WrappedArray, in fact), and then passed as parameter. A better method would have been +=.
Ok, append calls ++=, which delegates to +=. But, first, it calls ensureSize, which is the second mistake (+= calls that too -- ++= just optimizes that for multiple elements). Because an Array is a fixed size collection, which means that, at each resize, the whole Array must be copied!

So let's consider this. When you resize, Java first clears the memory by storing 0 in each element, then Scala copies each element of the previous array over to the new array. Since size doubles each time, this happens log(n) times, with the number of elements being copied increasing each time it happens.

Take for example n = 16. It does this four times, copying 1, 2, 4 and 8 elements respectively. Since Java has to clear each of these arrays, and each element must be read and written, each element copied represents 4 traversals of an element. Adding all we have (n - 1) * 4, or, roughly, 4 traversals of the complete list. If you count read and write as a single pass, as people often erroneously do, then it's still three traversals.

One can improve on this by initializing the ArrayBuffer with an initial size equal to the list that will be read, minus one, since we'll be discarding one element. To get this size, we need to traverse the list once, though.

Now let's consider toList. To put it simply, it traverses the whole list to create a new list.

So, we have 1 traversal for the algorithm, 3 or 4 traversals for resize, and 1 additional traversal for toList. That's 4 or 5 traversals.

The original algorithm is a bit difficult to analyse, because take, drop and ::: traverse a variable number of elements. Adding all together, however, it does the equivalent of 3 traversals. If splitAt was used, it would be reduced to 2 traversals. With 2 more traversals to get the maximum, we get 5 traversals -- the same number as the non-functional, non-concise algorithm!

So, let's consider improvements.

On the imperative algorithm, if one uses ListBuffer and +=, then all methods are constant-time, which reduces it to a single traversal.

On the functional algorithm, it could be rewritten as:

val max = xs.max
val (before, _ :: after) = xs span (max !=)
before ::: after

That reduces it to a worst case of three traversals. Of course, there are other alternatives presented, based on recursion or fold, that solve it in one traversal.

And, most interesting of all, all of these algorithms are O(n), and the only one which almost incurred (accidentally) in worst complexity was the imperative one (because of array copying). On the other hand, the cache characteristics of the imperative one might well make it faster, because the data is contiguous in memory. That, however, is unrelated to either big-Oh or functional vs imperative, and it is just a matter of the data structures that were chosen.

So, if we actually go to the trouble of benchmarking, analyzing the results, considering performance of methods, and looking into ways of optimizing it, then we can find faster ways to do this in an imperative manner than in a functional manner.

But all this effort is very different from saying the average Java programmer code will be faster than the average Scala programmer code -- if the question is an example, that is simply false. And even discounting the question, we have seen no evidence that the fundamental premise of the question is true.

EDIT

First, let me restate my point, because it seems I wasn't clear. My point is that the code the average Java programmer writes may seem to be more efficient, but actually isn't. Or, put another way, traditional Java style doesn't gain you performance -- only hard work does, be it Java or Scala.

Next, I have a benchmark and results too, including almost all solutions suggested. Two interesting points about it:

Depending on list size, the creation of objects can have a bigger impact than multiple traversals of the list. The original functional code by Adrian takes advantage of the fact that lists are persistent data structures by not copying the elements right of the maximum element at all. If a Vector was used instead, both left and right sides would be mostly unchanged, which might lead to even better performance.
Even though user unknown and paradigmatic have similar recursive solutions, paradigmatic's is way faster. The reason for that is that he avoids pattern matching. Pattern matching can be really slow.

The benchmark code is here, and the results are here.

Related Solutions

Java – Efficiency of Java “Double Brace Initialization”

Here's the problem when I get too carried away with anonymous inner classes:

2009/05/27  16:35             1,602 DemoApp2$1.class
2009/05/27  16:35             1,976 DemoApp2$10.class
2009/05/27  16:35             1,919 DemoApp2$11.class
2009/05/27  16:35             2,404 DemoApp2$12.class
2009/05/27  16:35             1,197 DemoApp2$13.class

/* snip */

2009/05/27  16:35             1,953 DemoApp2$30.class
2009/05/27  16:35             1,910 DemoApp2$31.class
2009/05/27  16:35             2,007 DemoApp2$32.class
2009/05/27  16:35               926 DemoApp2$33$1$1.class
2009/05/27  16:35             4,104 DemoApp2$33$1.class
2009/05/27  16:35             2,849 DemoApp2$33.class
2009/05/27  16:35               926 DemoApp2$34$1$1.class
2009/05/27  16:35             4,234 DemoApp2$34$1.class
2009/05/27  16:35             2,849 DemoApp2$34.class

/* snip */

2009/05/27  16:35               614 DemoApp2$40.class
2009/05/27  16:35             2,344 DemoApp2$5.class
2009/05/27  16:35             1,551 DemoApp2$6.class
2009/05/27  16:35             1,604 DemoApp2$7.class
2009/05/27  16:35             1,809 DemoApp2$8.class
2009/05/27  16:35             2,022 DemoApp2$9.class

These are all classes which were generated when I was making a simple application, and used copious amounts of anonymous inner classes -- each class will be compiled into a separate class file.

The "double brace initialization", as already mentioned, is an anonymous inner class with an instance initialization block, which means that a new class is created for each "initialization", all for the purpose of usually making a single object.

Considering that the Java Virtual Machine will need to read all those classes when using them, that can lead to some time in the bytecode verfication process and such. Not to mention the increase in the needed disk space in order to store all those class files.

It seems as if there is a bit of overhead when utilizing double-brace initialization, so it's probably not such a good idea to go too overboard with it. But as Eddie has noted in the comments, it's not possible to be absolutely sure of the impact.

Just for reference, double brace initialization is the following:

List<String> list = new ArrayList<String>() {{
    add("Hello");
    add("World!");
}};

It looks like a "hidden" feature of Java, but it is just a rewrite of:

List<String> list = new ArrayList<String>() {

    // Instance initialization block
    {
        add("Hello");
        add("World!");
    }
};

So it's basically a instance initialization block that is part of an anonymous inner class.

Joshua Bloch's Collection Literals proposal for Project Coin was along the lines of:

List<Integer> intList = [1, 2, 3, 4];

Set<String> strSet = {"Apple", "Banana", "Cactus"};

Map<String, Integer> truthMap = { "answer" : 42 };

Sadly, it didn't make its way into neither Java 7 nor 8 and was shelved indefinitely.

Experiment

Here's the simple experiment I've tested -- make 1000 ArrayLists with the elements "Hello" and "World!" added to them via the add method, using the two methods:

Method 1: Double Brace Initialization

List<String> l = new ArrayList<String>() {{
  add("Hello");
  add("World!");
}};

Method 2: Instantiate an ArrayList and add

List<String> l = new ArrayList<String>();
l.add("Hello");
l.add("World!");

I created a simple program to write out a Java source file to perform 1000 initializations using the two methods:

Test 1:

class Test1 {
  public static void main(String[] s) {
    long st = System.currentTimeMillis();

    List<String> l0 = new ArrayList<String>() {{
      add("Hello");
      add("World!");
    }};

    List<String> l1 = new ArrayList<String>() {{
      add("Hello");
      add("World!");
    }};

    /* snip */

    List<String> l999 = new ArrayList<String>() {{
      add("Hello");
      add("World!");
    }};

    System.out.println(System.currentTimeMillis() - st);
  }
}

Test 2:

class Test2 {
  public static void main(String[] s) {
    long st = System.currentTimeMillis();

    List<String> l0 = new ArrayList<String>();
    l0.add("Hello");
    l0.add("World!");

    List<String> l1 = new ArrayList<String>();
    l1.add("Hello");
    l1.add("World!");

    /* snip */

    List<String> l999 = new ArrayList<String>();
    l999.add("Hello");
    l999.add("World!");

    System.out.println(System.currentTimeMillis() - st);
  }
}

Please note, that the elapsed time to initialize the 1000 ArrayLists and the 1000 anonymous inner classes extending ArrayList is checked using the System.currentTimeMillis, so the timer does not have a very high resolution. On my Windows system, the resolution is around 15-16 milliseconds.

The results for 10 runs of the two tests were the following:

Test1 Times (ms)           Test2 Times (ms)
----------------           ----------------
           187                          0
           203                          0
           203                          0
           188                          0
           188                          0
           187                          0
           203                          0
           188                          0
           188                          0
           203                          0

As can be seen, the double brace initialization has a noticeable execution time of around 190 ms.

Meanwhile, the ArrayList initialization execution time came out to be 0 ms. Of course, the timer resolution should be taken into account, but it is likely to be under 15 ms.

So, there seems to be a noticeable difference in the execution time of the two methods. It does appear that there is indeed some overhead in the two initialization methods.

And yes, there were 1000 .class files generated by compiling the Test1 double brace initialization test program.

Rogue DHCP Server Can’t be found

On one of the affected Windows clients start a packet capture (Wireshark, Microsoft Network Monitor, Microsoft Message Analyzer, etc.), then from an elevated command prompt run ipconfig /release. The DHCP client will send a DHCPRELEASE message to the DHCP server that it obtained it's ip address from. This should allow you to obtain the MAC address of the rogue DHCP server, which you can then track down in your switch MAC address table to find out which switch port it's connected to, then trace that switch port to the network jack and the device plugged into it.

Best Answer

Related Solutions

Java – Efficiency of Java “Double Brace Initialization”

Rogue DHCP Server Can’t be found

Related Topic