Python – How to Systematically Evaluate Python Script Performance

performance, python

How do I know if my code is running fast enough? Is there a measurable way to test the speed & performance of my code?

For example, I have a script that reads CSV files and writes new CSV files while using NumPy to calculate statistics. Below I'm running cProfile on my Python script, but after seeing the resulting stats, what do I do next? In this case, I can see that the methods mean, astype and reduce from NumPy, the method writerow from csv, and the method append of Python lists are taking a significant portion of the time.

How can I know whether or not my code can be improved?

  python -m cProfile -s cumulative OBSparser.py
     176657699 function calls (176651606 primitive calls) in 528.419 seconds
  Ordered by: cumulative time
  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       1    0.003    0.003  528.421  528.421 OBSparser.py:1(<module>)
       1    0.000    0.000  526.874  526.874 OBSparser.py:45(start)
       1  165.767  165.767  526.874  526.874 OBSparser.py:48(parse)
 7638018    6.895    0.000  179.890    0.000 {method 'mean' of 'numpy.ndarray' objects}
 7638018   56.780    0.000  172.995    0.000 _methods.py:53(_mean)
 7628171   57.232    0.000   57.232    0.000 {method 'writerow' of '_csv.writer' objects}
 7700878   52.580    0.000   52.580    0.000 {method 'reduce' of 'numpy.ufunc' objects}
 7615219   50.640    0.000   50.640    0.000 {method 'astype' of 'numpy.ndarray' objects}
 7668436   28.595    0.000   36.853    0.000 _methods.py:43(_count_reduce_items)
15323753   31.503    0.000   31.503    0.000 {numpy.core.multiarray.array}
45751805   13.439    0.000   13.439    0.000 {method 'append' of 'list' objects}

Can somebody explain the best practices?

Best Answer

How do I know if my code is running fast enough?

That very much depends on your use case -- your profiled run takes roughly nine minutes (528 seconds), which might or might not be fast enough. If this is a one-time process, nine minutes is not much, and spending any time on optimization is hardly worth the investment. On the other hand, if this is a process that should run e.g. once every hour, it is clearly worth finding a less time-consuming approach.

Is there a measurable way to test the speed & performance of my code?

Yes: profiling -- and you've already done that. That's a good start.
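If you want to repeat that measurement from inside Python rather than via the command line, here is a minimal sketch (assuming OBSparser.py is importable and its start() function is the entry point, as the profile above suggests):

    # Profile only the hot entry point and print the top entries by cumulative
    # time, similar to "python -m cProfile -s cumulative OBSparser.py".
    import cProfile
    import pstats

    import OBSparser  # assumption: the script can be imported as a module

    profiler = cProfile.Profile()
    profiler.enable()
    OBSparser.start()          # the entry point seen in the profile output
    profiler.disable()

    pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)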

what do I do next?

Best practices include:

  1. measure baseline performance (before any optimization)
  2. analyze the parts where the program spends most of its time
  3. reduce run-time complexity (the Big-O type)
  4. check for the potential of parallel computation
  5. compare against baseline performance

You have already done step 1, so let's move on to step 2.

Analysis

In your case the program spends most of its time in line OBSparser.py:48, of which roughly a third is spent calculating the mean 7638018 times.

As the profiler output shows, this is on an ndarray, i.e. using numpy, and it doesn't look like it's taking a lot of time on a per-call basis. A quick calculation confirms that:

179.9 s / 7,638,018 calls ≈ 23.6 microseconds per call

Since that's already implemented in C-code (numpy), there is likely not much you can do to improve the per-call performance by changing the actual mean code (or using another library).
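If you want to sanity-check that per-call figure outside the profiler, a quick timeit micro-benchmark will do; the array size of 50 below is only a guess at your group size:

    # Rough per-call cost of ndarray.mean() on a small array, to compare
    # against the ~23.6 microseconds per call seen in the profile.
    import timeit
    import numpy as np

    a = np.random.rand(50)      # assumed group size; adjust to your data
    n = 100_000
    per_call = timeit.timeit(a.mean, number=n) / n
    print(f"{per_call * 1e6:.1f} microseconds per call")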

However, ask yourself several questions:

  1. How can the number of calls to .mean() be reduced?
  2. Can the calls to .mean() be implemented more efficiently?
  3. Could the data be grouped and each group be processed independently?
  4. ...and so on -- keep asking questions in this vein

Other calls worth looking at are .astype() and .reduce(); I focused on .mean() simply for illustration.

Reducing complexity

Not knowing what your code actually does, here are my five cents on the specifics anyway:

On question 2, a quick check on my i7 shows that for ndarray.mean() to take 20-odd microseconds, the array needs around 50 values. So I'm guessing you are grouping values and then calling .mean() on every group. There might be more efficient ways - a search on numpy group aggregate performance or some variant of that might turn up helpful pointers.
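For illustration only: if your groups happen to be contiguous runs in one flat array, something like np.add.reduceat can compute every group mean in a single vectorized pass instead of millions of separate .mean() calls. The group boundaries below are assumptions, not your actual layout:

    # Sketch: all group means in one pass, assuming contiguous groups whose
    # start indices are known.
    import numpy as np

    values = np.random.rand(1_000_000)          # stand-in for your data
    starts = np.arange(0, values.size, 50)      # hypothetical group boundaries
    sums = np.add.reduceat(values, starts)      # one sum per group
    counts = np.diff(np.append(starts, values.size))
    group_means = sums / counts                 # same result as per-group .mean()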

Parallel computation

On question 3, I'm guessing multiprocessing is unlikely to be a solution here: your computations seem mostly CPU-bound, but the work per item is tiny, and the overhead of launching separate tasks and exchanging data probably outweighs the benefits.

However, there might be some use for a SIMD approach, i.e. vectorization. Again, just a hunch.
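To make that concrete: if the per-group values can be arranged into a two-dimensional array (one row per group), a single axis-wise call computes every mean at once and lets NumPy's compiled loops (and whatever SIMD they use) do the work. The shape below is purely illustrative:

    # One vectorized call instead of one .mean() call per group.
    import numpy as np

    data = np.random.rand(10_000, 50)   # hypothetical: 10 000 groups of 50 values
    means = data.mean(axis=1)           # equivalent to calling .mean() on each row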

Compare against baseline performance

To reduce the time it takes to re-profile, consider subsetting your data so that the performance behavior is still visible (i.e. still roughly 23 µs per call to .mean()) but the total running time drops to a minute or two, or even less. This will let you evaluate several approaches before applying them to your program in full; there is no use in running the full process over and over again just to test a small optimization.
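As a toy illustration of that subset-and-compare loop (synthetic data standing in for your CSV rows, and the reduceat idea from above standing in for the "new" code):

    # Compare two implementations on a small subset before touching the full run.
    import timeit
    import numpy as np

    values = np.random.rand(100_000)            # small synthetic subset
    starts = np.arange(0, values.size, 50)

    def means_loop():
        # baseline: one .mean() call per group
        return [values[i:i + 50].mean() for i in starts]

    def means_reduceat():
        # candidate: one vectorized pass over all groups
        counts = np.diff(np.append(starts, values.size))
        return np.add.reduceat(values, starts) / counts

    for name, fn in [("per-group mean()", means_loop), ("reduceat", means_reduceat)]:
        t = timeit.timeit(fn, number=10) / 10
        print(f"{name:18s} {t * 1e3:8.2f} ms per run")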
