How to Estimate Run Time of an Algorithm

algorithm-analysisalgorithms

For instance if i have an algorithm that is O(n²) and it will run for 10 seconds for a problem size of 1000. Now if i were to double the problem size to 2000 i want to know the approximate run time in seconds. I know the answer for it but i want to understand the logic on how to get to the answer. So here is what i am thinking.

N = 1000 , Therefore 1000^2 = 10 seconds
N = 2000,  Therefore (2*1000)^2, // stuck here

Now i am not sure i mean i know it will be 40 seconds because 2 to the power of 2 is 4 and then you multiply the 10 seconds by 4 and get 40 seconds but not entirely sure if this is right thinking. Was wondering if someone can break it down in simple terms?

Best Answer

Complexity classes such as O(n) or O(n²) are not meant to calculate actual running time. Instead, they are meant to compare how different algorithms scale when we change the input size.

For example, here we have two algorithms that apply frobnicate(a, b) to each matching item:

void algorithm1(Set<int> items) {
  for (int i in items) {
    for (int j in items) {
      if (i == j) {
        frobnicate(i, j);
      }
    }
  }
}

void algorithm2(Set<int> items) {
  for (int i in items) {
    frobnicate(i, i);
  }
}

Obviously, the 1st algorithm has O(n²) complexity while the second is O(n). Therefore, we can conclude that algorithm2 scales better for large inputs. But here is another O(n) algorithm:

void algorithm3(Set<int> items) {
  sleep(9 seconds);
  for (int i in items) {
    sleep(1 millisecond);
    frobnicate(i, i);
  }
}

This scales just as well as algorithm2 but will always be slower! In fact, the O(n²) algorithm1 will outperform this algorithm for small inputs! Because algorithm3 has a fixed 9-second overhead, we can't plug different numbers into the O(n) complexity formula to find out how long a different input size will take.

Example:

n=1000 => algorithm3 takes 10 seconds
n=2000 => algorithm3 takes 11 seconds, not 20 seconds!

In reality, the running time is not possible to reliably calculate. Even if you can figure out a cost for each atomic operation, hardware quirks such as cache boundaries will get in your way. The only reliable option is to do statistical sampling over different input sizes, and then fit an estimation function matching the expected complexity class to the collected data.

Let's assume we have a run time estimation function (!= complexity class) of f(n) = a·n² + b·n + c, and let's assume that b = c = 0 for simplicity (i.e. no linear factors, no constant overhead). Then, we can calculate the parameter a from the data point f(1000) = 10 seconds:

10 seconds = f(1000)
           = a·1000²
           = a·1000000
 <=> a = 10/1000000 seconds = 1/100000 seconds

Therefore, f(n) = n²/100000 seconds. If we plug in n = 2000, we get

f(2000) = 2000²/100000 seconds
        = 4000000/100000 seconds
        = 40 seconds

This is consistent with the shortcut that if we multiply the input size by n (here n = 2), the output is increased by n² (here n² = 4).

Best Answer

Related Solutions

C++ Algorithms – Fastest String Search Algorithm

Algorithm for assigning people to time slots based on preference

Related Topic