How do I know if my code is running fast enough? Is there a measurable way to test the speed & performance of my code?
For example, I have a script that reads CSV files and writes new CSV files, using NumPy to calculate statistics. Below, I'm using cProfile on my Python script, but after looking at the resulting stats, what do I do next? In this case, I can see that the NumPy methods mean, astype and reduce, the csv method writerow, and the append method of Python lists take a significant portion of the time.
How can I know if my code can improve or not?
python -m cProfile -s cumulative OBSparser.py
176657699 function calls (176651606 primitive calls) in 528.419 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.003 0.003 528.421 528.421 OBSparser.py:1(<module>)
1 0.000 0.000 526.874 526.874 OBSparser.py:45(start)
1 165.767 165.767 526.874 526.874 OBSparser.py:48(parse)
7638018 6.895 0.000 179.890 0.000 {method 'mean' of 'numpy.ndarray' objects}
7638018 56.780 0.000 172.995 0.000 _methods.py:53(_mean)
7628171 57.232 0.000 57.232 0.000 {method 'writerow' of '_csv.writer' objects}
7700878 52.580 0.000 52.580 0.000 {method 'reduce' of 'numpy.ufunc' objects}
7615219 50.640 0.000 50.640 0.000 {method 'astype' of 'numpy.ndarray' objects}
7668436 28.595 0.000 36.853 0.000 _methods.py:43(_count_reduce_items)
15323753 31.503 0.000 31.503 0.000 {numpy.core.multiarray.array}
45751805 13.439 0.000 13.439 0.000 {method 'append' of 'list' objects}
Can somebody explain the best practices?
Best Answer
That very much depends on your use case: your program runs for almost 9 minutes (528 seconds), which might or might not be fast enough. If this is a one-time process, 9 minutes is not much, and spending any time on optimization is hardly worth the investment. On the other hand, if this is a process that has to run frequently, e.g. once every hour, it is clearly worth finding a less time-consuming approach.
As for a measurable way to test performance: yes, profiling, and you've already done that. That's a good start.
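When the console dump from -s cumulative gets long, it can help to save the profile and query it afterwards. From the command line, python -m cProfile -o OBSparser.prof OBSparser.py writes the raw stats to a file (the file name is just an example) that pstats.Stats can load. The sketch below does the same in-process; slow_sum is a hypothetical stand-in for your parse function, and profile_to_text is a made-up helper name. Sorting by tottime ranks functions by their own time, excluding callees.

```python
import cProfile
import io
import pstats


def profile_to_text(func, *args, sort_key="cumulative", limit=10):
    """Profile func(*args) and return the top `limit` entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # strip_dirs() drops leading directories, e.g. .../numpy/core/_methods.py -> _methods.py
    stats.strip_dirs().sort_stats(sort_key).print_stats(limit)
    return stream.getvalue()


def slow_sum(n):
    # Hypothetical stand-in for the real parse() function.
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    # "tottime" shows time spent in each function's own body; in the
    # question's profile, that is how parse() accounts for 165 s by itself.
    print(profile_to_text(slow_sum, 100_000, sort_key="tottime"))
```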
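Best practices include profiling, analyzing the results, reducing complexity, considering parallel computation, and comparing against a baseline.
You have already done the profiling, so let's move on to the analysis.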
Analysis
In your case, the program spends most of its time in parse (OBSparser.py:48), and about a third of the total time is spent calculating the mean, in 7,638,018 calls.
As the profiler output shows, these are calls on an ndarray, i.e. NumPy, and each individual call doesn't take much time. A quick calculation confirms that:
179.89 s / 7,638,018 calls ≈ 23.6 microseconds per call
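To sanity-check a per-call figure like this outside the profiler, timeit can time ndarray.mean() directly. This is a minimal sketch: the array size of 50 is an arbitrary assumption, and timings depend on your machine. Note that cProfile adds overhead to every profiled call, so the timeit figure will likely come out lower than the profiled one.

```python
import timeit

import numpy as np


def mean_call_cost_us(n_values=50, repeats=5, number=10_000):
    """Return the best observed time, in microseconds, of one ndarray.mean() call."""
    data = np.random.default_rng(0).random(n_values)
    # repeat() runs data.mean `number` times per repetition; the minimum
    # over repetitions is the least noisy estimate.
    timings = timeit.repeat(data.mean, repeat=repeats, number=number)
    return min(timings) / number * 1e6


# The profiler's figure: cumulative time of mean divided by its call count.
profiled_us = 179.890 / 7_638_018 * 1e6

if __name__ == "__main__":
    print(f"from the profile: {profiled_us:.1f} us per call")
    print(f"measured here:    {mean_call_cost_us():.1f} us per call")
```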
Since that is already implemented in C (in NumPy), there is likely not much you can do to improve the per-call performance by changing the actual mean code (or by using another library). However, ask yourself a few questions:
1. Can the number of calls to .mean() be reduced?
2. Can the calls to .mean() be implemented more efficiently?
Other calls worth looking at are those to .astype() and reduce; I focused on .mean() simply for illustration.
Reducing complexity
Without knowing what your code actually does, here are my two cents on the specifics anyway:
On the second question: a quick check on my Core i7 shows that for ndarray.mean() to take 20-odd microseconds, the array needs around 50 values. So I'm guessing you are grouping values and then calling .mean() on every group. There might be more efficient ways; a search for numpy group aggregate performance, or some variant of it, might turn up some helpful pointers.
Parallel computation
I'm guessing multiprocessing is unlikely to be a solution here. Although your computations seem mostly CPU-bound, each individual call is very short (about 24 microseconds), so the overhead of launching separate tasks and exchanging data would probably outweigh the benefits.
However, a SIMD approach, i.e. vectorization, might be of some use. Again, just a hunch.
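As an illustration of the vectorization hunch, combined with the grouping guess above: if your code computes one .mean() per group, all group means can instead be computed with a couple of NumPy calls. This is a sketch under assumptions: values is a 1-D float array and labels holds one integer group label per value (both names, and the two function names, are hypothetical). np.bincount with weights sums the values per group, so thousands of small calls become a few large ones.

```python
import numpy as np


def group_means_loop(values, labels):
    """One .mean() call per group: the pattern suspected in the question."""
    return np.array([values[labels == g].mean() for g in np.unique(labels)])


def group_means_vectorized(values, labels):
    """All group means at once: per-group sums divided by per-group counts."""
    # return_inverse maps arbitrary labels to 0..k-1, in sorted label order
    _, inverse = np.unique(labels, return_inverse=True)
    sums = np.bincount(inverse, weights=values)
    counts = np.bincount(inverse)
    return sums / counts


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    labels = rng.integers(0, 200, size=20_000)
    values = rng.random(20_000)
    # Both versions return the means in the same (sorted-label) order.
    print(np.allclose(group_means_loop(values, labels),
                      group_means_vectorized(values, labels)))
```

Whether this applies depends on how your groups are actually formed; the point is that the per-call overhead is paid once rather than once per group.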
Compare against baseline performance
To reduce the time it takes to re-profile, consider subsetting your data so that the performance behavior is still visible (e.g. about 23 microseconds per call to .mean()) but the total running time is under a minute or two, or even less. This will help you evaluate several approaches before applying them to your program in full. There is no point in running the full process over and over again just to test a small optimization.
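A minimal sketch of that workflow, assuming your input is a CSV file: read only the first max_rows rows with itertools.islice, then time competing implementations on the same subset with time.perf_counter. read_rows, time_best and compare are hypothetical helper names, not part of any library.

```python
import csv
import itertools
import time


def read_rows(path, max_rows=None):
    """Return at most max_rows rows from a CSV file (all rows if None)."""
    with open(path, newline="") as f:
        # islice stops early, so a large file is not read in full
        return list(itertools.islice(csv.reader(f), max_rows))


def time_best(func, arg, repeats=3):
    """Best wall-clock time in seconds of func(arg) over a few runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        best = min(best, time.perf_counter() - start)
    return best


def compare(candidates, rows, repeats=3):
    """Time each candidate function on the same rows; returns {name: seconds}."""
    return {name: time_best(fn, rows, repeats) for name, fn in candidates.items()}
```

For example, compare({"current": parse_v1, "candidate": parse_v2}, read_rows("input.csv", 100_000)), where parse_v1, parse_v2 and input.csv are placeholders for your own code and data. Taking the best of several runs reduces noise from other processes on the machine.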