Be extremely careful using any of the other suggestions. It all depends on context.
I have spent a long time tracing a bugs in a system that presumed a==b
if |a-b|<epsilon
. The underlying problems were:
The implicit presumption in an algorithm that if a==b
and b==c
then a==c
.
Using the same epsilon for lines measured in inches and lines measured in mils (.001 inch). That is a==b
but 1000a!=1000b
. (This is why AlmostEqual2sComplement asks for the epsilon or max ULPS).
The use of the same epsilon for both the cosine of angles and the length of lines!
Using such a compare function to sort items in a collection. (In this case using the builtin C++ operator == for doubles produced correct results.)
Like I said: it all depends on context and the expected size of a
and b
.
BTW, std::numeric_limits<double>::epsilon()
is the "machine epsilon". It is the difference between 1.0 and the next value representable by a double. I guess that it could be used in the compare function but only if the expected values are less than 1. (This is in response to @cdv's answer...)
Also, if you basically have int
arithmetic in doubles
(here we use doubles to hold int values in certain cases) your arithmetic will be correct. For example 4.0/2.0 will be the same as 1.0+1.0. This is as long as you do not do things that result in fractions (4.0/3.0) or do not go outside of the size of an int.
It depends how exceptions are implemented. The simplest way is using setjmp and longjmp. That means all registers of the CPU are written to the stack (which already takes some time) and possibly some other data needs to be created... all this already happens in the try statement. The throw statement needs to unwind the stack and restore the values of all registers (and possible other values in the VM). So try and throw are equally slow, and that is pretty slow, however if no exception is thrown, exiting the try block takes no time whatsoever in most cases (as everything is put on the stack which cleans up automatically if the method exists).
Sun and others recognized, that this is possibly suboptimal and of course VMs get faster and faster over the time. There is another way to implement exceptions, which makes try itself lightning fast (actually nothing happens for try at all in general - everything that needs to happen is already done when the class is loaded by the VM) and it makes throw not quite as slow. I don't know which JVM uses this new, better technique...
...but are you writing in Java so your code later on only runs on one JVM on one specific system? Since if it may ever run on any other platform or any other JVM version (possibly of any other vendor), who says they also use the fast implementation? The fast one is more complicated than the slow one and not easily possible on all systems. You want to stay portable? Then don't rely on exceptions being fast.
It also makes a big difference what you do within a try block. If you open a try block and never call any method from within this try block, the try block will be ultra fast, as the JIT can then actually treat a throw like a simple goto. It neither needs to save stack-state nor does it need to unwind the stack if an exception is thrown (it only needs to jump to the catch handlers). However, this is not what you usually do. Usually you open a try block and then call a method that might throw an exception, right? And even if you just use the try block within your method, what kind of method will this be, that does not call any other method? Will it just calculate a number? Then what for do you need exceptions? There are much more elegant ways to regulate program flow. For pretty much anything else but simple math, you will have to call an external method and this already destroys the advantage of a local try block.
See the following test code:
public class Test {
int value;
public int getValue() {
return value;
}
public void reset() {
value = 0;
}
// Calculates without exception
public void method1(int i) {
value = ((value + i) / i) << 1;
// Will never be true
if ((i & 0xFFFFFFF) == 1000000000) {
System.out.println("You'll never see this!");
}
}
// Could in theory throw one, but never will
public void method2(int i) throws Exception {
value = ((value + i) / i) << 1;
// Will never be true
if ((i & 0xFFFFFFF) == 1000000000) {
throw new Exception();
}
}
// This one will regularly throw one
public void method3(int i) throws Exception {
value = ((value + i) / i) << 1;
// i & 1 is equally fast to calculate as i & 0xFFFFFFF; it is both
// an AND operation between two integers. The size of the number plays
// no role. AND on 32 BIT always ANDs all 32 bits
if ((i & 0x1) == 1) {
throw new Exception();
}
}
public static void main(String[] args) {
int i;
long l;
Test t = new Test();
l = System.currentTimeMillis();
t.reset();
for (i = 1; i < 100000000; i++) {
t.method1(i);
}
l = System.currentTimeMillis() - l;
System.out.println(
"method1 took " + l + " ms, result was " + t.getValue()
);
l = System.currentTimeMillis();
t.reset();
for (i = 1; i < 100000000; i++) {
try {
t.method2(i);
} catch (Exception e) {
System.out.println("You'll never see this!");
}
}
l = System.currentTimeMillis() - l;
System.out.println(
"method2 took " + l + " ms, result was " + t.getValue()
);
l = System.currentTimeMillis();
t.reset();
for (i = 1; i < 100000000; i++) {
try {
t.method3(i);
} catch (Exception e) {
// Do nothing here, as we will get here
}
}
l = System.currentTimeMillis() - l;
System.out.println(
"method3 took " + l + " ms, result was " + t.getValue()
);
}
}
Result:
method1 took 972 ms, result was 2
method2 took 1003 ms, result was 2
method3 took 66716 ms, result was 2
The slowdown from the try block is too small to rule out confounding factors such as background processes. But the catch block killed everything and made it 66 times slower!
As I said, the result will not be that bad if you put try/catch and throw all within the same method (method3), but this is a special JIT optimization I would not rely upon. And even when using this optimization, the throw is still pretty slow. So I don't know what you are trying to do here, but there is definitely a better way of doing it than using try/catch/throw.
Best Answer
In response to some of the other answers and comments:
It is true that as a programmer you should generally avoid premature optimization. But. This is not so true for scripting languages where the compiler does not optimize much -- or at all.
So, whenever you write something in Lua, and that is executed very often, is run in a time-critical environment or could run for a while, it is a good thing to know things to avoid (and avoid them).
This is a collection of what I found out over time. Some of it I found out over the net, but being of a suspicious nature when the interwebs are concerned I tested all of it myself. Also, I have read the Lua performance paper at Lua.org.
Some reference:
Avoid globals
This is one of the most common hints, but stating it once more can't hurt.
Globals are stored in a hashtable by their name. Accessing them means you have to access a table index. While Lua has a pretty good hashtable implementation, it's still a lot slower than accessing a local variable. If you have to use globals, assign their value to a local variable, this is faster at the 2nd variable access.
(Not that simple testing may yield different results. eg.
local x; for i=1, 1000 do x=i; end
here the for loop header takes actually more time than the loop body, thus profiling results could be distorted.)Avoid string creation
Lua hashes all strings on creation, this makes comparison and using them in tables very fast and reduces memory use since all strings are stored internally only once. But it makes string creation more expensive.
A popular option to avoid excessive string creation is using tables. For example, if you have to assemble a long string, create a table, put the individual strings in there and then use
table.concat
to join it onceIf
foo()
would return only the characterA
, this loop would create a series of strings like""
,"A"
,"AA"
,"AAA"
, etc. Each string would be hashed and reside in memory until the application finishes -- see the problem here?This method does not create strings at all during the loop, the string is created in the function
foo
and only references are copied into the table. Afterwards, concat creates a second string"AAAAAA..."
(depending on how largeC
is). Note that you could usei
instead of#ret+1
but often you don't have such a useful loop and you won't have an iterator variable you can use.Another trick I found somewhere on lua-users.org is to use gsub if you have to parse a string
This looks odd at first, the benefit is that gsub creates a string "at once" in C which is only hashed after it is passed back to lua when gsub returns. This avoids table creation, but possibly has more function overhead (not if you call
foo()
anyway, but iffoo()
is actually an expression)Avoid function overhead
Use language constructs instead of functions where possible
function
ipairs
When iterating a table, the function overhead from ipairs does not justify it's use. To iterate a table, instead use
It does exactly the same without the function call overhead (pairs actually returns another function which is then called for every element in the table while
#tbl
is only evaluated once). It's a lot faster, even if you need the value. And if you don't...Note for Lua 5.2: In 5.2 you can actually define a
__ipairs
field in the metatable, which does makeipairs
useful in some cases. However, Lua 5.2 also makes the__len
field work for tables, so you might still prefer the above code toipairs
as then the__len
metamethod is only called once, while foripairs
you would get an additional function call per iteration.functions
table.insert
,table.remove
Simple uses of
table.insert
andtable.remove
can be replaced by using the#
operator instead. Basically this is for simple push and pop operations. Here are some examples:For shifts (eg.
table.remove(foo, 1)
), and if ending up with a sparse table is not desirable, it is of course still better to use the table functions.Use tables for SQL-IN alike compares
You might - or might not - have decisions in your code like the following
Now this is a perfectly valid case, however (from my own testing) starting with 4 comparisons and excluding table generation, this is actually faster:
And since hash tables have constant look up time, the performance gain increases with every additional comparison. On the other hand if "most of the time" one or two comparisons match, you might be better off with the Boolean way or a combination.
Avoid frequent table creation
This is discussed thoroughly in Lua Performance Tips. Basically the problem is that Lua allocates your table on demand and doing it this way will actually take more time than cleaning it's content and filling it again.
However, this is a bit of a problem, since Lua itself does not provide a method for removing all elements from a table, and
pairs()
is not the performance beast itself. I have not done any performance testing on this problem myself yet.If you can, define a C function that clears a table, this should be a good solution for table reuse.
Avoid doing the same over and over
This is the biggest problem, I think. While a compiler in a non-interpreted language can easily optimize away a lot of redundancies, Lua will not.
Memoize
Using tables this can be done quite easily in Lua. For single-argument functions you can even replace them with a table and __index metamethod. Even though this destroys transparancy, performance is better on cached values due to one less function call.
Here is an implementation of memoization for a single argument using a metatable. (Important: This variant does not support a nil value argument, but is pretty damn fast for existing values.)
You could actually modify this pattern for multiple input values
Partial application
The idea is similar to memoization, which is to "cache" results. But here instead of caching the results of the function, you would cache intermediate values by putting their calculation in a constructor function that defines the calculation function in it's block. In reality I would just call it clever use of closures.
This way it is possible to easily create flexible functions that cache some of their work without too much impact on program flow.
An extreme variant of this would be Currying, but that is actually more a way to mimic functional programming than anything else.
Here is a more extensive ("real world") example with some code omissions, otherwise it would easily take up the whole page here (namely
get_color_values
actually does a lot of value checking and recognizes accepts mixed values)You can see that once the blender was created, the function only has to sanity-check a single value instead of up to eight. I even extracted the difference calculation, though it probably does not improve a lot, I hope it shows what this pattern tries to achieve.