Algorithm Analysis – How Meaningful Is Big-O Time Complexity?

algorithm-analysis, algorithms, big-o

Programmers often talk about the time complexity of an algorithm, e.g. O(log n) or O(n^2).

Time complexity is classified by how an algorithm behaves as the input size goes to infinity, yet, ironically, no computation ever actually runs on an infinite input.

Put another way, an algorithm is classified by a situation it will never actually be in: n = infinity.

Also, consider that a polynomial-time algorithm with a huge exponent is practically useless, while an exponential-time algorithm with a base barely above 1 (e.g., 1.00000001^n) is practically usable.
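To put rough numbers on that (a quick back-of-the-envelope sketch; the input sizes are arbitrary and only meant to show the trend):

```python
# Illustrative only: compare a "bad" polynomial with a "good" exponential
# at input sizes a real program might actually see.
import math

for n in (10**3, 10**6, 10**9):
    poly = 100 * math.log10(n)          # log10(n^100), to avoid overflow
    expo = n * math.log10(1.00000001)   # log10(1.00000001^n)
    print(f"n = {n:>12,}:  n^100 ~ 10^{poly:.0f},  1.00000001^n ~ 10^{expo:.2f}")
```

Even at n = 10^9 the "exponential" algorithm has done on the order of 10^4 work, while the "polynomial" one is at 10^900.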

Given this, how much can I rely on Big-O time complexity to guide the choice of an algorithm?

Best Answer

With small n, Big O is just about useless; the hidden constants, or even the actual implementation, are more likely to be the deciding factor for which algorithm is better. This is why most sorting functions in standard libraries switch to a faster insertion sort for small subarrays (the last handful of elements). The only way to figure out which one will be better is to benchmark with realistic data sets.
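As a sketch of that hybrid idea (the cutoff of 16 and the middle-element pivot are illustrative choices, not taken from any particular library's implementation):

```python
# A minimal sketch: recurse with quicksort, but hand small slices to
# insertion sort, whose low constant factor wins at tiny sizes.
CUTOFF = 16  # illustrative value

def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] in place; O(n^2) but very cheap per step."""
    for i in range(lo + 1, hi + 1):
        key, j = a[i], i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def hybrid_quicksort(a, lo=0, hi=None):
    """Quicksort that switches to insertion sort below CUTOFF."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        insertion_sort(a, lo, hi)        # small slice: constants dominate
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:                        # Hoare-style partition
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    hybrid_quicksort(a, lo, j)
    hybrid_quicksort(a, i, hi)
```

Whether the cutoff actually buys you anything on your data is, again, something only a benchmark will tell you.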

Big O is good for large data sets and for discussing how an algorithm will scale: it's better to have an O(n log n) algorithm than an O(n^2) one when you expect the data to grow. But if the O(n^2) version works fine as it is and the input sizes will likely remain constant, just make a note that you can upgrade it later and leave it as is; there are probably other things you need to worry about right now.
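If it helps to see the scaling in numbers, here is a rough benchmark sketch (the task, the sizes, and the single repetition are all made up for illustration): a quadratic duplicate check against an O(n log n) sort-and-scan version.

```python
# Rough sketch: the quadratic version is fine at small n but blows up
# as the data grows, while the n log n version keeps scaling.
import random
import timeit

def has_duplicates_quadratic(xs):
    # O(n^2): compare every pair
    return any(xs[i] == xs[j]
               for i in range(len(xs))
               for j in range(i + 1, len(xs)))

def has_duplicates_nlogn(xs):
    # O(n log n): sort, then scan adjacent pairs
    ys = sorted(xs)
    return any(a == b for a, b in zip(ys, ys[1:]))

for n in (100, 1_000, 5_000):
    data = random.sample(range(10 * n), n)   # distinct values: worst case for both
    t_quad = timeit.timeit(lambda: has_duplicates_quadratic(data), number=1)
    t_sort = timeit.timeit(lambda: has_duplicates_nlogn(data), number=1)
    print(f"n={n:>5}: quadratic {t_quad:.4f}s   n log n {t_sort:.5f}s")
```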

(Note that all the "large" and "small" in the previous paragraphs are meant relatively: small can be a few million and large can be a hundred; it all depends on the specific case.)

Often there is a trade-off between time and space: for example, quicksort requires O(log n) extra memory while heapsort can make do with O(1) extra memory, but the hidden constants in heapsort make it less attractive. (There is also the stability issue, which makes mergesort more attractive if you don't mind paying the extra memory cost.)
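For instance, here is a small sketch of the stability point, using Python's built-in sort (Timsort, which is stable); the records are made up:

```python
# Stability means equal keys keep their original relative order, so you can
# sort by a second key without destroying an earlier ordering.
records = [("alice", 3), ("bob", 1), ("carol", 3), ("dave", 1)]

by_score = sorted(records, key=lambda r: r[1])
print(by_score)
# [('bob', 1), ('dave', 1), ('alice', 3), ('carol', 3)]
# Equal scores stay in input order: bob before dave, alice before carol.

# An unstable sort (e.g. a textbook heapsort) gives no such guarantee,
# so a multi-key sort would need the full compound key up front.
```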

Another thing to consider is database indexes: these are additional structures that take O(log n) time to update when a record is added, removed, or modified, but they let lookups happen much faster (O(log n) instead of O(n)). Deciding whether to add one is a constant headache for most database admins: will I do enough lookups on the index to justify the time I spend updating it?
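As a minimal sketch of that trade-off, using the sqlite3 module from the standard library (the table, column, and index names are invented for the example):

```python
# Without an index the lookup is a full table scan; with one it is a
# B-tree search, but every insert now also has to maintain the index.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(100_000)])

# Before the index: the query plan reports a scan over the whole table.
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = ?",
    ("user99999@example.com",)).fetchall())

conn.execute("CREATE INDEX idx_users_email ON users (email)")

# After the index: the plan reports a search using idx_users_email,
# at the price of updating that index on every write.
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = ?",
    ("user99999@example.com",)).fetchall())
```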

One last thing to keep in mind: the more efficient algorithm is nearly always more complicated than the naive, straightforward one (otherwise you would have used it from the start). That means a larger surface area for bugs and code that is harder to follow, both of which are non-trivial issues to deal with.
