Big O Notation – Expressing Complexity of Nested Loops Over Different Datasets


I'm more or less teaching myself big O notation, so please forgive me if this duplicates an existing question that I didn't have the wisdom to recognise as applying to mine. For my own amusement/personal development, I'm trying to express the complexity of a function that works on two different datasets which relate to each other: I need to iterate over one list and, for each item in it, iterate over another list checking whether the items are related.

Pseudocode:

for thing_a in list_a {
  for thing_b in list_b {
    if (thing_a relates to thing_b) go ping
  }
}

What I have on paper currently is O(n) * O(m), but I'm wondering if it should be expressed as O(n*m) instead, where n = size of set b and m = size of set a. Or is this something else entirely? Again, apologies if this is a stupid/duplicate question, but I couldn't find anyone specifically discussing a nested loop over two different datasets. This answer would suggest that it's O(n^2), but that feels wrong to me, since the sizes of the two datasets are different and independent.

Best Answer

It's O(n*m). In the worst case n = m, so n*m = n*n, which is why it's often quoted as O(n^2): Big O is about worst-case run time. Even when the two dataset sizes differ, the bound stays O(n^2), because you can't guarantee any relationship between them, for example that the second dataset grows only logarithmically in the first. If you could guarantee that, the datasets wouldn't really be independent. And even if such a relation held, it wouldn't change the answer: O(n log n) is contained within O(n^2).
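As a sanity check, here's a minimal Python sketch of your pseudocode (the relates predicate is a hypothetical stand-in for whatever relation you're testing). The body of the inner loop executes exactly len(list_a) * len(list_b) times, which is the n*m term:

    def count_related(list_a, list_b, relates):
        # `relates(a, b)` is an assumed predicate returning True when
        # the items are related. The inner body runs
        # len(list_a) * len(list_b) times in every case: O(n*m).
        hits = 0
        for a in list_a:          # n iterations
            for b in list_b:      # m iterations for each of the n
                if relates(a, b):
                    hits += 1     # "go ping"
        return hits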

An example would be an algorithm involving the edges and nodes of a graph. In the worst case, every node is connected to every other node, so the number of edges is on the order of the number of nodes squared, and you can never be certain of a tighter relation between the two counts beforehand without some kind of preprocessing.
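To make that concrete, here's a sketch assuming a simple adjacency-set representation (a dict mapping each node to the set of its neighbours; the representation is my assumption, not part of the question). Checking every pair of nodes for an edge is the same nested-loop shape, and on a complete graph every check succeeds, so the quadratic worst case is actually reached:

    def count_edges(nodes, adjacency):
        # adjacency: dict mapping each node to the set of its
        # neighbours (assumed representation). The pair loop runs
        # n*(n-1)/2 times; on a complete graph every membership test
        # hits, so O(n^2) is a tight worst-case bound, not a loose one.
        edges = 0
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                if v in adjacency[u]:
                    edges += 1
        return edges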
