I was coding some functions in C++ and wondered how different versions of those functions would affect the generated assembly code. I put the different versions into the Godbolt Compiler Explorer tool and looked at the generated assembly. It was interesting to see the differences: some versions that seemed more efficient produced many more lines of assembly than more verbose ones, and, contrary to my expectations, some low-level versions produced more than some "mid-high-level" versions.
Since one cannot judge the performance of those outputs just by looking at the line count, I wondered how one can roughly estimate the performance difference between different versions.
How can I analyse different outputs to see more easily whether one contains more potentially expensive assembly instructions than another, or do I have to learn assembly first in order to do this?
Best Answer
As others have pointed out, just counting the clock cycles of assembly instructions will not give you a decent result on most modern CPU architectures. The execution time of a fixed piece of machine code can vary between CPU platforms, even if the code is exactly the same. So the only reliable way of comparing the performance of such code snippets is to measure it: run them on the hardware you care about and time them.
Obviously, learning assembly is not mandatory for this process (but it will probably be required if you want to understand the root cause for the performance differences you will observe).
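To make the idea of measuring rather than counting instructions concrete, here is a minimal sketch of a timing harness. The two functions `sum_indexed` and `sum_accumulate` and the helper `median_ns` are hypothetical names invented for this illustration; they stand in for whatever versions you want to compare. It uses only the standard library, and a dedicated benchmarking library would handle warm-up and optimizer effects more carefully.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Two hypothetical versions of the same function, standing in for the
// variants being compared.
long long sum_indexed(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}

long long sum_accumulate(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0LL);
}

// Calls f() `runs` times and returns the median wall-clock time of a
// single call in nanoseconds. The median is less sensitive to outliers
// (context switches, cache misses on the first run) than the mean.
template <typename F>
std::int64_t median_ns(F&& f, int runs) {
    assert(runs > 0);
    std::vector<std::int64_t> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    // Storing into a volatile keeps the result from being discarded
    // entirely; it does not rule out every compiler optimization.
    volatile long long sink = 0;
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        sink = f();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
    }
    (void)sink;
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

With a `std::vector<int> data` of realistic size, you would call something like `median_ns([&] { return sum_indexed(data); }, 101)` and the same for `sum_accumulate`, then compare the two numbers. Build with the same optimization flags you use in production, and expect the results to differ from machine to machine, which is exactly the point made above.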
That is probably not the answer you wanted to hear. I guess you were expecting a reference to some kind of web service, like Mr Godbolt's site, that provides the tools and hardware so you don't have to buy and install them yourself. However, I don't think there is a free service available: sensible performance comparisons need real, expensive hardware, not some cheap virtual machine replacement, and any company that can do this for you will probably try to get a decent return on its investment.