Considering the design limitations of the hardware (clock speed, memory, etc.), is there a better way to write if statements more efficiently in C (e.g. to run on an Arduino)? More specifically, can the following piece of embedded code be improved?
if (A >= 476 && A <= 524) A = 500;
else if (A > 524 && A <= 860) A = (A * 1.4) - 760;
else if (A > 136 && A < 476) A = (A * 1.4) - 685;
else if (A > 860) A = 1000;
else if (A <= 136) A = 100;
A is passed by value from a subroutine, and this value is then transformed into a different value according to which of a set of ranges it falls in. It has to run as fast as possible!
UPDATE!
A is a variable of integer type, i.e. int.
SUMMARY:
I have summarized here the options I learned from you:
a) improve structure
- put the comparison for the most probable inputs first;
- eliminate && operators.
b) eliminate floating point calculation
c) create look up tables
d) assign more adequate variable type
e) write code in abbreviated form (*=, +=, -=)
a) Indeed, I already put the comparison 'if (A >= 476 && A <= 524)' first, because I expect most of the values passed by the subroutine to fall within that range. So most of the time the program doesn't execute the subsequent if statements.
b) I reviewed the floating-point calculation and consider it a bad choice for this piece of code. It can definitely be avoided by using integer division.
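For the record, an integer-only version of the original chain might look like this. It is a sketch using the 7/5 ratio in place of 1.4; note that integer division truncates, so a few outputs near the range edges can differ by one count from the float version.

```c
#include <stdint.h>

/* Integer-only remap of A: (A * 7) / 5 approximates A * 1.4.
   A * 7 stays below 32767 for the ranges that use the multiply
   (max is 860 * 7 = 6020), so 16-bit int math is safe even on AVR. */
int16_t remap(int16_t A)
{
    if (A >= 476 && A <= 524)      return 500;
    else if (A > 524 && A <= 860)  return (int16_t)((A * 7) / 5 - 760);
    else if (A > 136 && A < 476)   return (int16_t)((A * 7) / 5 - 685);
    else if (A > 860)              return 1000;
    else                           return 100;  /* A <= 136 */
}
```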
c) The lookup table seems a good idea, but not for this application, because the program has at least 3 more of these clustered comparisons; memory might become a problem.
d) The variable was declared as int, although it would never really become negative, so unsigned int is presumably the better choice. But I would say it doesn't make the code faster.
e) Writing the code in abbreviated form also seems like an improvement. I just don't know for sure how different compilers interpret it, or whether it really saves much time.
Thank you.
Best Answer
The floating point is the killer here. It appears to double the amount of code generated, not that more instructions are necessarily slower.
It has to make a number of calls to what I assume are float-to-int and int-to-float conversion routines. Interestingly, changing from double (1.4) to single-precision float (1.4F) didn't change the overall result, although again the code is making a lot of calls (I assume to float conversions or operations).
So even the times 7 has to be synthesized.
So one might try A = (A * 7) / 5 as an alternative; it comes out to the same number of instructions. To see this more easily, the times-7 can be synthesized using shifting and adding. Duh, right: A * 7 = (A << 3) - A. The ((A << 2) + (A << 1) + A) source gives the same number of instructions, but misses that possible optimization.
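The two shift-and-add formulations of the times-7 can be sketched in C like this (equivalent for the non-negative inputs this code deals with; a compiler for a target without a fast hardware multiply may synthesize one of these itself):

```c
#include <stdint.h>

/* A * 7 built from shifts and adds, two ways. */
int16_t times7_sub(int16_t A)   /* 8A - A */
{
    return (int16_t)((A << 3) - A);
}

int16_t times7_add(int16_t A)   /* 4A + 2A + A */
{
    return (int16_t)((A << 2) + (A << 1) + A);
}
```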
They should at least ideally execute the same. Looking at these trip points:
A is obviously not an 8-bit number; it is at least 16 bits. If it is a double or a float, I give up, game over. Assuming 16 bits...
Here are the trip-point constants in hex: 136 = 0x0088, 476 = 0x01DC, 524 = 0x020C, 860 = 0x035C.
Ehh, man, this instruction set is painful. Looking at the generated output, the compiler is smart enough to lump the A * 7s together.
With a different source arrangement, though, it does not combine the A * 7s; note that the 16-bit A * 7 expands into a long string of shifts and adds.
For ease of reading and coding, the comparisons can be put in order.
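One such in-order arrangement, sketched in C with the float replaced by the integer 7/5 ratio discussed later in the answer: testing the boundaries from smallest to largest means each branch needs only one comparison, with no && at all.

```c
#include <stdint.h>

/* Comparisons ordered smallest to largest: each earlier test
   rules out its range, so no branch needs a compound condition. */
int16_t remap_in_order(int16_t A)
{
    if (A <= 136) return 100;
    if (A <  476) return (int16_t)((A * 7) / 5 - 685);  /* 136 < A < 476  */
    if (A <= 524) return 500;                           /* 476 <= A <= 524 */
    if (A <= 860) return (int16_t)((A * 7) / 5 - 760);  /* 524 < A <= 860 */
    return 1000;                                        /* A > 860        */
}
```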
That might shave some instructions from the worst-case path. But if your data happens to fall in a bell curve of some sort, with most of the samples between, say, 500 and 600, then you want those ranges compared first and let the others fall through. It may or may not be worth arranging for the two ranges that end with A * 1.4 to be the last two, so that before those compares you do a single B = A * 1.4; you then do the compare for one of the ranges and either subtract A = B - 685 or A = B - 760. That saves some code space, but not necessarily performance: with an if-then-else tree like this, reordering is not going to save you much. Some paths get shorter, some longer; you may add or save a few instructions here and there, but it is still a pretty long path for this code on this architecture.
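A sketch of that arrangement, assuming the center range is the common case and sharing one B between the two math ranges (with the integer 7/5 ratio standing in for the float):

```c
#include <stdint.h>

/* Most probable range first; the two ranges that need the multiply
   come last and share a single B = (A * 7) / 5. */
int16_t remap_ordered(int16_t A)
{
    if (A >= 476 && A <= 524) return 500;   /* assumed most common case */
    if (A > 860)  return 1000;
    if (A <= 136) return 100;
    {
        int16_t B = (int16_t)((A * 7) / 5); /* computed once for both ranges */
        if (A > 524) return (int16_t)(B - 760); /* 524 < A <= 860 */
        return (int16_t)(B - 685);              /* 136 < A < 476  */
    }
}
```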
I didn't do a full link with a library, but looking at readelf: those floating-point library routines are going to kill you. And that answers something: even though the C standard says A * 1.4 is a double operation, the compiler is clearly doing single-precision float, ignoring the standard.
Changing the A * 1.4 to (A * 7) / 5 implements the times-7 as a long string of adds and does the divide with a gcc library routine.
That is going to save you on overall code space, and it should save on execution time compared with going through the generic floating-point libraries.
Man, I didn't realize you were using signed numbers. Both of the math ranges can produce negative results, so I assume the value coming in is positive and the value going out is signed?
Okay, as I expected, this actually gets us somewhere. We know the compiler is using one register for the upper half of A and one for the lower half, so the shift by 8, with or without the AND for completeness, simply tells the compiler to just use the upper 8-bit register.
Now I am assuming an unsigned value coming in and a signed one going out, since you don't really have anything in the negative range coming in, apart from the everything-less-than-or-equal-to-136 bucket.
If you make a table of all the answers, it comes out to 679 unique answers (if only you could limit it to that).
Ugh, nothing is nice about these numbers.
If your input is unsigned, which I suspect it is not, you could look at the upper byte using the B = A >> 8 trick, and if it is big enough then return 1000. That only really saves you one 16-bit compare becoming an 8-bit compare, given this instruction set and the nature of your numbers.
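A sketch of that upper-byte early-out, assuming an unsigned 16-bit input: since 860 = 0x035C, a high byte greater than 3 means A >= 1024, which already guarantees A > 860; values with a high byte of 3 or less still need the full 16-bit compare.

```c
#include <stdint.h>

/* Early-out on the high byte for the top range. */
int16_t top_range(uint16_t A)
{
    if ((uint8_t)(A >> 8) > 3)  /* A >= 1024, certainly > 860 */
        return 1000;
    /* high byte is 0..3: fall back to the full 16-bit compare */
    if (A > 860)
        return 1000;
    return (int16_t)A;          /* not in the top range; the rest of
                                   the tree would handle it */
}
```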
The next optimization is to get rid of the floating point and replace it with (A * 7) / 5. That is the number-one thing you can do to make this code faster.
If there is a range of values you expect more often than others, put those ranges first, so the common values have to hop through less code. If the distribution is even, then either of jonk's if-then-else trees makes it easier to read and might save an instruction here and there. The compiler may still rearrange the tree to suit its needs or an optimization it thinks it sees.
The killers are the two ranges that have the math. If you are willing to burn 960 bytes of program space, you can pre-build lookup tables for these.
The lookups look something like this
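Here is a sketch of such tables. For illustration they are filled at startup; in a real build they would be precomputed constants placed in flash (PROGMEM on AVR). Note that two int16_t tables for these ranges come to about 1,350 bytes rather than 960; the exact figure depends on how the answers are packed.

```c
#include <stdint.h>

/* One table per math range, indexed by offset from the range start. */
static int16_t hi_tab[860 - 524];   /* A in 525..860 -> (A*7)/5 - 760 */
static int16_t lo_tab[476 - 137];   /* A in 137..475 -> (A*7)/5 - 685 */

static void init_tables(void)
{
    for (int16_t A = 525; A <= 860; A++)
        hi_tab[A - 525] = (int16_t)((A * 7) / 5 - 760);
    for (int16_t A = 137; A <= 475; A++)
        lo_tab[A - 137] = (int16_t)((A * 7) / 5 - 685);
}

/* The lookup itself is just an index: no multiply, no divide. */
static int16_t lookup_hi(int16_t A) { return hi_tab[A - 525]; }
static int16_t lookup_lo(int16_t A) { return lo_tab[A - 137]; }
```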
A table lookup is not much more math than the A * 7. On a platform like this the loads depend on the speed of the flash relative to the clock: possibly a single cycle each, and if not, probably two. But the instruction fetches, plus the code to do the divide by 5 with a generic 16-bit division library, should be way, way more costly speed-wise. Resource-wise, the float libraries are probably smaller than the 900-some bytes of tables; maybe, you can try it and find out. If you are doing float elsewhere then you are burning that flash anyway, but that doesn't excuse using float on an AVR, especially in code you want to go fast. So still do the integer math if you don't use a lookup table.
Short answer: get rid of the A * 1.4 and replace it with (A * 7) / 5. If you know the data is going to be heavy in certain ranges, put those ranges first for performance; otherwise make it more readable with one of jonk's if-then-else trees, smallest range to largest or largest to smallest. You might save an instruction or two on the deeper cases.