C++ – How to measure performance independently of the machine used

c++ performance profiler

I had a routine that was performing well. However, I had to make a change to it. The change improved the routine's precision but hurt its performance.

The routine consists of a lot of math calculations and is probably CPU-bound (I still have to test this more rigorously, but I'm 99% sure). It is written in C++ (the compiler is Borland C++ 6).

I now want to measure the routine's performance. My first thought was to measure execution time, but I think that approach is flawed, since many other things could be running on the machine at the same time.

I then ran into this topic: Techniques to measure application performance – Stack Overflow. I liked the idea of measuring in MFlops.

My boss suggested measuring CPU clock cycles instead, so that the tests would be machine-independent. However, I think this approach amounts to much the same thing as the MFlops testing.

In my opinion, measuring both execution time and MFlops is the way to go, but I would like to hear what the Stack Overflow experts think.

What is the best way to measure the performance of a routine that is known to be CPU-bound?

Best Answer

CPU clock cycles don't mean that much either if your application is memory-bound. On a faster CPU, you'll just spend more CPU cycles waiting on the same cache miss. (Mathematical apps are probably not I/O-bound.)

Another problem is that the number of clock cycles for a given instruction sequence still varies across architectures (even between Intel Core1 and Core2). So, as an absolute measure of performance, clock cycles on one CPU are hardly an improvement.
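Since cycle counts do not carry over between machines anyway, one option is to time the old and the new version of the routine on the same machine and compare them relative to each other. The following is only a minimal sketch of that idea. It assumes a C++11 compiler (`std::chrono` is not available in Borland C++ 6), and the helper name `best_wall_seconds` is hypothetical, not something from the question. Taking the smallest of several runs is one common way to reduce noise from other processes, not the only one.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper: calls `f` `runs` times and returns the smallest
// wall-clock duration of a single call, in seconds.
template <typename F>
double best_wall_seconds(F&& f, int runs) {
    assert(runs > 0);
    using clock = std::chrono::steady_clock;  // monotonic, unaffected by system clock changes
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < runs; ++i) {
        const auto start = clock::now();
        f();
        const std::chrono::duration<double> elapsed = clock::now() - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

Calling it once with the old routine and once with the new one, then dividing the two results, gives a slowdown factor for that particular machine. The routine under test should write its result somewhere observable (for example a `volatile` variable) so the compiler cannot optimize the work away.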

I would argue they're in fact worse as a measure. Users care about time, not cycles. This matters especially with modern multi-core CPUs. An "inefficient" algorithm that uses twice the number of cycles but spreads them over 3 cores will finish in about 67% of the time. Users will probably like that.
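The 67% figure follows from dividing the cycle factor by the number of cores: 2 / 3 ≈ 0.67. As a rough sketch, assuming the work splits evenly across cores with no synchronization overhead (which real code rarely achieves), the arithmetic looks like this. The function name `relative_wall_time` is made up for illustration.

```cpp
#include <cassert>

// Simplified model: wall time of a parallel version relative to a
// single-core baseline, assuming perfect scaling across cores.
//   cycle_factor: total cycles of the parallel version / cycles of the baseline
//   cores:        number of cores the work is spread over evenly
double relative_wall_time(double cycle_factor, int cores) {
    assert(cycle_factor > 0.0 && cores > 0);
    return cycle_factor / cores;
}
```

With `cycle_factor = 2.0` and `cores = 3`, the result is about 0.67, so the version that burns twice the cycles still finishes sooner in wall-clock terms.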