Java vs C++ – Why Java Can Be Faster Than C++


Sometimes Java outperforms C++ in benchmarks. Of course, sometimes C++ outperforms Java.

See the following links:

But how is this even possible? It boggles my mind that interpreted bytecode could ever be faster than a compiled language.

Can someone please explain? Thanks!

Best Answer

First, most JVMs include a compiler, so "interpreted bytecode" is actually pretty rare (at least in benchmark code -- it's not quite as rare in real life, where your code is usually more than a few trivial loops that get repeated extremely often).

Second, a fair number of the benchmarks involved appear to be quite biased (whether by intent or incompetence, I can't really say). Just for example, years ago I looked at some of the source code linked from one of the links you posted. It had code like this:

  init0 = (int*)calloc(max_x,sizeof(int));
  init1 = (int*)calloc(max_x,sizeof(int));
  init2 = (int*)calloc(max_x,sizeof(int));
  for (x=0; x<max_x; x++) {
    init2[x] = 0;
    init1[x] = 0;
    init0[x] = 0;
  }

Since calloc provides memory that's already zeroed, using the for loop to zero it again is obviously useless. This was followed (if memory serves) by filling the memory with other data anyway (and no dependence on it being zeroed), so all the zeroing was completely unnecessary anyway. Replacing the code above with a simple malloc (like any sane person would have used to start with) improved the speed of the C++ version enough to beat the Java version (by a fairly wide margin, if memory serves).

Consider (for another example) the methcall benchmark used in the blog entry in your last link. Despite the name (and how things might even look), the C++ version of this is not really measuring much about method call overhead at all. The part of the code that turns out to be critical is in the Toggle class:

class Toggle {
public:
    Toggle(bool start_state) : state(start_state) { }
    virtual ~Toggle() {  }
    bool value() {
        return(state);
    }
    virtual Toggle& activate() {
        state = !state;
        return(*this);
    }
    bool state;
};

The critical part turns out to be the state = !state;. Consider what happens when we change the code to encode the state as an int instead of a bool:

class Toggle {
    enum names { bfalse = -1, btrue = 1 };
    const static names values[2];
    int state;

public:
    Toggle(bool start_state) : state(values[start_state])
    { }
    virtual ~Toggle() {  }
    bool value() { return state == btrue; }

    virtual Toggle& activate() {
        state = -state;
        return(*this);
    }
};

// Out-of-class definition the snippet needs in order to link:
const Toggle::names Toggle::values[2] = { bfalse, btrue };

This minor change improves the overall speed by about a 5:1 margin. Even though the benchmark was intended to measure method call time, in reality most of what it was measuring was the time to convert between int and bool. I'd certainly agree that the inefficiency shown by the original is unfortunate -- but given how rarely it seems to arise in real code, and the ease with which it can be fixed when/if it does arise, I have a difficult time thinking of it as meaning much.

In case anybody decides to re-run the benchmarks involved, I should also add that there's an almost equally trivial modification to the Java version that produces (or at least at one time produced -- I haven't re-run the tests with a recent JVM to confirm it still does) a fairly substantial improvement as well. The Java version has an NthToggle::activate() that looks like this:

public Toggle activate() {
    this.counter += 1;
    if (this.counter >= this.count_max) {
        this.state = !this.state;
        this.counter = 0;
    }
    return(this);
}

Changing this to call the base function instead of manipulating this.state directly gives quite a substantial speed improvement (though not enough to keep up with the modified C++ version).

So, what we end up with is a false assumption about interpreted byte codes vs. some of the worst benchmarks I've ever seen. Neither is giving a meaningful result.

My own experience is that with equally experienced programmers paying equal attention to optimizing, C++ will beat Java more often than not -- but (at least between these two), the language will rarely make as much difference as the programmers and design. The benchmarks being cited tell us more about the (in)competence/(dis)honesty of their authors than they do about the languages they purport to benchmark.

[Edit: As implied in one place above but never stated as directly as I probably should have, the results I'm quoting are those I got when I tested this ~5 years ago, using C++ and Java implementations that were current at that time. I haven't rerun the tests with current implementations. A glance, however, indicates that the code hasn't been fixed, so all that would have changed would be the compiler's ability to cover up the problems in the code.]

If we ignore the Java examples, however, it is actually possible for interpreted code to run faster than compiled code (though difficult and somewhat unusual).

The usual way this happens is that the code being interpreted is much more compact than the machine code, or it's running on a CPU that has a larger data cache than code cache.

In such a case, a small interpreter (e.g., the inner interpreter of a Forth implementation) may be able to fit entirely in the code cache, and the program it's interpreting fits entirely in the data cache. The cache is typically faster than main memory by a factor of at least 10, and often much more (a factor of 100 isn't particularly rare any more).

So, if the cache is faster than main memory by a factor of N, and it takes fewer than N machine code instructions to implement each byte code, the byte code should win (I'm simplifying, but I think the general idea should still be apparent).