Big O Notation – Expressing Complexity of Nested Loops Over Different Datasets


I'm more or less teaching myself big O notation, so please forgive me if this duplicates an existing question that I didn't have the wisdom to recognise as applying to mine. For my own amusement/personal development, I'm trying to express the complexity of a function that works on two different datasets which relate to each other: I need to iterate over one list and, for each item in it, iterate over another list checking whether the items are related.

Pseudocode:

for thing_a in list_a {
  for thing_b in list_b {
    if (thing_a relates to thing_b) go ping
  }
}

What I have on paper currently is O(n) * O(m), but I'm wondering if it should be expressed as O(n*m) instead, where n = size of set b and m = size of set a. Or is this something else entirely? Again, apologies if this is a stupid/duplicate question, but I couldn't find anyone specifically discussing a nested loop over two different datasets. This answer would suggest that it's O(n^2), but that feels wrong to me, since the sizes of the two datasets are different and independent.

Best Answer

It's O(n*m). In the worst case n = m, so n*m = n*n, which is why it's often quoted as O(n^2): Big O is about worst-case run time. Even when the two dataset sizes differ, the bound stays O(n^2), because you can't guarantee any relationship between them, for example that the second dataset grows only logarithmically in the first. If you could guarantee that, the datasets wouldn't really be independent. And even if such a relation held, it wouldn't change the answer: O(n log n) is contained within O(n^2).
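As a sanity check, here's a minimal Python sketch of your pseudocode (the relates predicate is a hypothetical stand-in for whatever relation you're testing). The body of the inner loop executes exactly len(list_a) * len(list_b) times, which is the n*m term:

    def count_related(list_a, list_b, relates):
        # `relates(a, b)` is an assumed predicate returning True when
        # the items are related. The inner body runs
        # len(list_a) * len(list_b) times in every case: O(n*m).
        hits = 0
        for a in list_a:          # n iterations
            for b in list_b:      # m iterations for each of the n
                if relates(a, b):
                    hits += 1     # "go ping"
        return hits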

An example would be an algorithm involving the edges and nodes of a graph. In the worst case, every node is connected to every other node, so the number of edges is on the order of the number of nodes squared, and you can never be certain of a tighter relation between the two counts beforehand without some kind of preprocessing.
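To make that concrete, here's a sketch assuming a simple adjacency-set representation (a dict mapping each node to the set of its neighbours; the representation is my assumption, not part of the question). Checking every pair of nodes for an edge is the same nested-loop shape, and on a complete graph every check succeeds, so the quadratic worst case is actually reached:

    def count_edges(nodes, adjacency):
        # adjacency: dict mapping each node to the set of its
        # neighbours (assumed representation). The pair loop runs
        # n*(n-1)/2 times; on a complete graph every membership test
        # hits, so O(n^2) is a tight worst-case bound, not a loose one.
        edges = 0
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                if v in adjacency[u]:
                    edges += 1
        return edges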
