Performance – Complexities of a Binary Search

Tags: binary, complexity, concepts, performance, search

I recently asked a question on CodeReview SE where using a binary search was recommended, and the author of the answer proved that it was faster than my implementation. During my research on the topic, I came across a table that shows the complexities of a binary search:

These are the complexities of a binary search:

    Worst-case time complexity:     O(log n)
    Best-case time complexity:      O(1)
    Average time complexity:        O(log n)
    Worst-case space complexity:    O(1)

I think I'm misunderstanding the information this table is meant to relay. I'm perceiving it as:

The average performance of a binary search is the worst-case performance.

However, if this were true, wouldn't a binary search be avoided in many cases? I'm new to this level of optimization, so my confusion could just stem from some underlying prerequisite that I don't have knowledge of yet.

Edit: For clarity in hindsight, the question I sought answers to is below this edit. The one above it is strictly a questioning of my own reasoning during my research, included to demonstrate my lack of understanding. My apologies for any confusion on the matter.


What is meant by "the complexities of a binary search", what are those complexities, and what is this table actually telling me relative to those complexities?

Best Answer

The full term for this is "Time Complexity" and you'll want to use that if you are searching. This is a core concept in computer science. I am not going to attempt to explain it in detail but here's a high-level explanation:

The idea of time complexity is to understand how the performance of an algorithm relates to its input size(s). A time complexity of O(1) means 'constant time'. In other words, the performance of the algorithm doesn't change with the size of the input. I think in this case, the best case is when the item you are looking for happens to be the middle item (the first one examined). Best case performance tends to be the least interesting or useful of these measures in my experience - with the exception of when the best case is really bad.
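To make that concrete, here is a minimal iterative binary search sketch in Python. The question doesn't include any code, so the names (binary_search, items, target) are purely illustrative. The very first element examined is the middle one, which is where the O(1) best case comes from:

    def binary_search(items, target):
        """Return the index of target in the sorted list items, or -1 if absent."""
        lo, hi = 0, len(items) - 1
        while lo <= hi:
            mid = (lo + hi) // 2          # the first element examined is the middle one
            if items[mid] == target:
                return mid                # best case: found on the very first check
            elif items[mid] < target:
                lo = mid + 1              # discard the lower half
            else:
                hi = mid - 1              # discard the upper half
        return -1

    # Example: 5 sits in the middle of this 5-element list, so it is found on the first probe.
    print(binary_search([1, 3, 5, 7, 9], 5))  # -> 2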

O(log n) means that the time it takes grows with the logarithm of the input size. In this case the base of that logarithm is 2. So if the list has 16 elements, it will take on average about 4 checks to find an item. If the list has 30,000 elements, you should expect it to take something like 15 checks on average.
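To see where those figures come from, you can compute log₂(n) directly; this small check is just arithmetic added here for illustration (the billion-item case anticipates the addendum below):

    import math

    for n in (16, 30_000, 1_000_000_000):
        print(n, math.ceil(math.log2(n)))
    # 16            ->  4 checks at most
    # 30000         -> 15 checks at most
    # 1000000000    -> 30 checks at most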

The fact that the worst case is the same as the average means that you'll never need more than about log₂(n) checks to find something. It could be better, but it will never be worse. That's actually a good thing. Some algorithms have good average times but really bad worst-case scenarios, which makes it hard to predict performance in practice. It's like driving on a freeway; it might be the fastest route most of the time, but if there's a bad wreck, you could be stuck for hours. If you want to make sure you get somewhere on time, you might be better off with a slower (on average) route that is more predictable.

Addendum: I failed to address the 'space complexity' entry in the table. This is a very similar concept to 'time complexity' (both are forms of 'computational complexity'), except that it refers to the amount of memory or storage required to execute the algorithm. In this case it's constant, which essentially means it doesn't take any more memory (or other storage) to search a list of a billion items than it does to search one of 16 items. The lists themselves obviously do, but the memory allocated to the execution of the search algorithm remains the same. In other algorithms, however, it's not uncommon to trade space for time, e.g. an index on a database table requires extra storage but can greatly improve the performance of lookups.
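To make the space-complexity point concrete: the iterative sketch earlier keeps only two index variables (lo and hi) no matter how long the list is, which is the O(1) bound in the table. A recursive variant, shown here as an illustrative contrast and not as part of the original answer, instead builds up roughly log₂(n) stack frames:

    def binary_search_recursive(items, target, lo=0, hi=None):
        """Recursive variant: each halving adds a stack frame, so extra space grows as O(log n)."""
        if hi is None:
            hi = len(items) - 1
        if lo > hi:
            return -1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            return binary_search_recursive(items, target, mid + 1, hi)
        return binary_search_recursive(items, target, lo, mid - 1)

That extra stack usage is one reason the iterative form is usually preferred when the O(1) space bound matters.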
