I think the main reason is that BlueJ runs your code in a VM with a debugger attached. BlueJ actually has two VMs running: the main one, and the one with your code inside (the user VM, aka the debug VM). The main VM has a debugger attached to the user VM, which allows it to do things like pause user code, inspect the state of objects, and so on. I imagine this adds a fair bit of overhead (perhaps inhibiting JIT compilation?); the speed at which code executes is not a major concern for a learning environment like BlueJ, as long as it's reasonable.
One could go and make wild speculations here, or even educated guesses, but the truth is that it probably doesn't matter, and if it does, it's going to depend on a lot of other factors. The only way to find out is to profile, and when you do, first be sure to rule out other factors (system load, fragmentation, etc.).
My bet would be that the difference is going to be negligible, and that there's much more to win elsewhere - set up sensible indexes, profile joins vs. subqueries, use caching where possible, experiment with many small queries vs. few large queries, avoid unnecessary trips to the database, buffer accumulated results, etc.
The thing is, an integer-vs.-bool optimization will never give you more than a linear speed increase, and the increase won't be large - if you get 5%, consider yourself lucky. Those 5% will get smashed the moment your query does an outer join requiring full scans of both joined tables, when one of the tables is really large and the other grows by even one row.
Best Answer
First, most JVMs include a just-in-time compiler, so purely interpreted bytecode is actually pretty rare (at least in benchmark code -- it's not quite as rare in real life, where your code is usually more than a few trivial loops that get repeated extremely often).
Second, a fair number of the benchmarks involved appear to be quite biased (whether by intent or incompetence, I can't really say). Just for example, years ago I looked at some of the source code linked from one of the links you posted. It had code like this:
Since `calloc` provides memory that's already zeroed, using the `for` loop to zero it again is obviously useless. This was followed (if memory serves) by filling the memory with other data anyway, with no dependence on its being zeroed, so all the zeroing was completely unnecessary. Replacing the code above with a simple `malloc`
(like any sane person would have used to start with) improved the speed of the C++ version enough to beat the Java version (by a fairly wide margin, if memory serves).

Consider, for another example, the `methcall` benchmark used in the blog entry in your last link. Despite the name (and how things might even look), the C++ version of this is not really measuring much about method-call overhead at all. The part of the code that turns out to be critical is in the `Toggle` class -- specifically, the line `state = !state;`. Consider what happens when we change the code to encode the state as an `int` instead of a `bool`:
This minor change improves the overall speed by about a 5:1 margin. Even though the benchmark was intended to measure method-call time, in reality most of what it was measuring was the time to convert between `int` and `bool`. I'd certainly agree that the inefficiency shown by the original is unfortunate -- but given how rarely it seems to arise in real code, and the ease with which it can be fixed when/if it does arise, I have a difficult time thinking of it as meaning much.

In case anybody decides to re-run the benchmarks involved, I should also add that there's an almost equally trivial modification to the Java version that produces (or at least at one time produced -- I haven't re-run the tests with a recent JVM to confirm they still do) a fairly substantial improvement in the Java version as well. The Java version has an `NthToggle::activate()` that looks like this:
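Again from memory rather than the verbatim source (field names paraphrased), the shape of the Java version -- and the change described, calling `super.activate()` instead of touching `this.state` directly -- is roughly:

```java
// Sketch, from memory, of the shape of the benchmark's classes;
// names are paraphrased, not the verbatim source.
class Toggle {
    protected boolean state;

    Toggle(boolean startState) { state = startState; }

    boolean value() { return state; }

    Toggle activate() {
        state = !state;
        return this;
    }
}

class NthToggle extends Toggle {
    private final int countMax;
    private int counter = 0;

    NthToggle(boolean startState, int countMax) {
        super(startState);
        this.countMax = countMax;
    }

    @Override
    Toggle activate() {
        if (++counter >= countMax) {
            // The original flipped this.state directly here:
            //     state = !state;
            // The faster variant delegates to the base class instead:
            super.activate();
            counter = 0;
        }
        return this;
    }
}

public class Main {
    public static void main(String[] args) {
        NthToggle t = new NthToggle(true, 3);
        for (int i = 0; i < 6; i++) t.activate();
        System.out.println(t.value()); // two toggles from true -> true
    }
}
```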
Changing this to call the base-class function instead of manipulating `this.state` directly gives quite a substantial speed improvement (though not enough to keep up with the modified C++ version).

So, what we end up with is a false assumption about interpreted byte code, plus some of the worst benchmarks I've ever seen. Neither is giving a meaningful result.
My own experience is that with equally experienced programmers paying equal attention to optimizing, C++ will beat Java more often than not -- but (at least between these two), the language will rarely make as much difference as the programmers and design. The benchmarks being cited tell us more about the (in)competence/(dis)honesty of their authors than they do about the languages they purport to benchmark.
[Edit: As implied in one place above but never stated as directly as I probably should have, the results I'm quoting are those I got when I tested this ~5 years ago, using C++ and Java implementations that were current at that time. I haven't rerun the tests with current implementations. A glance, however, indicates that the code hasn't been fixed, so all that would have changed would be the compiler's ability to cover up the problems in the code.]
If we ignore the Java examples, however, it is actually possible for interpreted code to run faster than compiled code (though difficult and somewhat unusual).
The usual way this happens is that the code being interpreted is much more compact than the machine code, or it's running on a CPU that has a larger data cache than code cache.
In such a case, a small interpreter (e.g., the inner interpreter of a Forth implementation) may be able to fit entirely in the code cache, and the program it's interpreting fits entirely in the data cache. The cache is typically faster than main memory by a factor of at least 10, and often much more (a factor of 100 isn't particularly rare any more).
So, if the cache is faster than main memory by a factor of N, and it takes fewer than N machine code instructions to implement each byte code, the byte code should win (I'm simplifying, but I think the general idea should still be apparent).