Electronic – AVR GCC: How to improve code optimization

attiny avr c gcc optimization

I tried to compile the following C code:

period = TCNT0L;
period |= ((unsigned int)TCNT0H<<8);

The assembler code I'm getting is the following:

    period = TCNT0L;
  d2:   22 b7           in  r18, 0x32   ; 50
  d4:   30 e0           ldi r19, 0x00   ; 0
  d6:   30 93 87 00     sts 0x0087, r19
  da:   20 93 86 00     sts 0x0086, r18
    period |= ((unsigned int)TCNT0H<<8);
  de:   44 b3           in  r20, 0x14   ; 20
  e0:   94 2f           mov r25, r20
  e2:   80 e0           ldi r24, 0x00   ; 0
  e4:   82 2b           or  r24, r18
  e6:   93 2b           or  r25, r19
  e8:   90 93 87 00     sts 0x0087, r25
  ec:   80 93 86 00     sts 0x0086, r24

So instead of the 4 instructions I expected, I get 11!

I tried the O1, O2, O3 and Os optimization options. The result is the same (except that O3 optimized this code away entirely).

I could write the source code in the following way:

period = TCNT0L | ((unsigned int)TCNT0H<<8);

I get smaller, but still not optimal, code:

  de:   22 b7           in  r18, 0x32   ; 50
  e0:   34 b3           in  r19, 0x14   ; 20
  e2:   93 2f           mov r25, r19
  e4:   80 e0           ldi r24, 0x00   ; 0
  e6:   82 2b           or  r24, r18
  e8:   90 93 87 00     sts 0x0087, r25
  ec:   80 93 86 00     sts 0x0086, r24

However, I no longer have a guarantee that the lower byte is accessed first (an essential requirement for a correct 16-bit read). And the code still has several unnecessary instructions.

Can I change the compiler options and/or the source code to improve this? I'd rather avoid resorting to assembler.
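On the ordering concern: in a single expression the operands of | may be evaluated in either order, whereas volatile reads in separate statements happen in program order. A minimal sketch of that idea follows. The helper name read_timer0 is hypothetical, and the host-side stand-ins for TCNT0L and TCNT0H are an assumption so the snippet also compiles off-target. Whether the generated code is any tighter depends on the compiler version.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>                 /* real TCNT0L / TCNT0H on the target */
#else
/* Host-side stand-ins (assumption) so the sketch compiles off-target. */
static volatile uint8_t TCNT0L, TCNT0H;
#endif

/* Hypothetical helper: read the 16-bit timer, low byte first. */
static inline uint16_t read_timer0(void)
{
    uint8_t lo = TCNT0L;            /* first volatile read */
    uint8_t hi = TCNT0H;            /* second volatile read, in its own statement */
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}
```

It would be used as period = read_timer0();.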

UPDATE1:

I tried the code @caveman suggested:

((unsigned char*)(&period))[0] = TCNT0L;
((unsigned char*)(&period))[1] = TCNT0H;

But the result is also not very good:

    ((unsigned char*)(&period))[0] = TCNT0L;
  dc:   82 b7           in  r24, 0x32   ; 50
  de:   e6 e8           ldi r30, 0x86   ; 134
  e0:   f0 e0           ldi r31, 0x00   ; 0
  e2:   80 83           st  Z, r24
    ((unsigned char*)(&period))[1] = TCNT0H;
  e4:   84 b3           in  r24, 0x14   ; 20
  e6:   81 83           std Z+1, r24    ; 0x01

Best Answer

One method is to use direct loads to the halves of period. While this looks complicated in C, it usually will generate very tight assembly, i.e. 2 loads and 2 stores.

((uint8_t*)(&period))[0] = TCNT0L;
((uint8_t*)(&period))[1] = TCNT0H;

Sometimes the array indexing can cause issues, in which case you could try pointer arithmetic instead:

*((uint8_t*)(&period)) = TCNT0L;
*((uint8_t*)(&period) + 1) = TCNT0H;
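The same byte-wise store can also be spelled with a union instead of pointer casts. This is only a sketch: the type period_t and the function capture_period are hypothetical names, it assumes you are free to change the declaration of period, and the host-side stand-ins for the timer registers are there only so it compiles off-target. The byte order matches AVR, which is little-endian.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>                 /* real TCNT0L / TCNT0H on the target */
#else
/* Host-side stand-ins (assumption) so the sketch compiles off-target. */
static volatile uint8_t TCNT0L, TCNT0H;
#endif

/* Hypothetical replacement for a plain `unsigned int period`. */
typedef union {
    uint16_t word;                  /* the whole 16-bit value */
    uint8_t  byte[2];               /* byte[0] = low half, byte[1] = high half on AVR */
} period_t;

volatile period_t period;

/* Store each half directly, low byte first. */
static inline void capture_period(void)
{
    period.byte[0] = TCNT0L;
    period.byte[1] = TCNT0H;
}
```

The rest of the program would then read period.word wherever it previously read period.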

Writing the two bytes separately actually produces optimal code. Note that the listing below uses only 12 bytes (6 instructions of 2 bytes each).

    ((unsigned char*)(&period))[0] = TCNT0L;
  dc:   82 b7           in  r24, 0x32   ; 50
  de:   e6 e8           ldi r30, 0x86   ; 134
  e0:   f0 e0           ldi r31, 0x00   ; 0
  e2:   80 83           st  Z, r24
    ((unsigned char*)(&period))[1] = TCNT0H;
  e4:   84 b3           in  r24, 0x14   ; 20
  e6:   81 83           std Z+1, r24    ; 0x01

If you wrote this by hand in assembly, it would probably seem more natural to do it like this. It is also 12 bytes, so the two are equivalent.

  dc:   82 b7           in  r24, 0x32   ; 50
  de:   80 93 86 00     sts 0x0086, r24
  e2:   84 b3           in  r24, 0x14   ; 20
  e4:   80 93 87 00     sts 0x0087, r24

Of course, when I say "equivalent", I mean regarding code size. If time is more important, then you have to look at the cycles. In this case it looks like the assembly version is 6 cycles and the compiler's version is 8 cycles.
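The size and cycle totals can be checked by adding up the per-instruction figures. The sketch below does that arithmetic. The byte sizes come from the listings above; the cycle counts (1 for in and ldi, 2 for st, std and sts) are what I recall for the classic AVR core, so check them against the instruction set manual for your part. The struct and function names are made up for this illustration.

```c
#include <stddef.h>

/* One instruction: mnemonic, encoded size in bytes, execution time in cycles. */
struct insn {
    const char *mnemonic;
    unsigned    bytes;
    unsigned    cycles;             /* assumed classic-AVR timings */
};

/* Compiler output: in, ldi, ldi, st Z, in, std Z+1. */
static const struct insn compiler_seq[] = {
    { "in",  2, 1 }, { "ldi", 2, 1 }, { "ldi", 2, 1 },
    { "st",  2, 2 }, { "in",  2, 1 }, { "std", 2, 2 },
};

/* Hand-written alternative: in, sts, in, sts. */
static const struct insn handwritten_seq[] = {
    { "in",  2, 1 }, { "sts", 4, 2 },
    { "in",  2, 1 }, { "sts", 4, 2 },
};

#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Sum the encoded sizes of a sequence. */
static unsigned total_bytes(const struct insn *seq, size_t n)
{
    unsigned sum = 0;
    for (size_t k = 0; k < n; k++)
        sum += seq[k].bytes;
    return sum;
}

/* Sum the execution times of a sequence. */
static unsigned total_cycles(const struct insn *seq, size_t n)
{
    unsigned sum = 0;
    for (size_t k = 0; k < n; k++)
        sum += seq[k].cycles;
    return sum;
}
```

With these tables both sequences come to 12 bytes, while the cycle totals are 8 for the compiler's version and 6 for the hand-written one.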

Related Solutions

Electronic – AVR GCC : Global / Static Array not getting initialized properly

Found the solution. As suggested by members at avrfreaks.net, the problem was that with the default makefile the .data section was not being included in the final hex file. As a result, the RAM was initialized with the default value (0xFF), since the array values (in the .data section) could not be found. Using a custom makefile that passes the flag -j .data to avr-objcopy solved the problem.

Electronic – Why GCC compiler omitting some code

Since you state in one comment that "each CPU tick is worthy", I suggest using some inline assembly to make your delay loop behave exactly as you want. This solution is superior to the various volatile or -O0 workarounds because it makes your intent clear.

unsigned char i = 10;
__asm__ volatile ( "1:    subi    %0, 0x01\n\t"   /* numeric local label, safe if emitted more than once */
                   "      brne    1b"
                   : "+d" (i)                    /* read-write operand; "d" = r16..r31, required by subi */
                   : /* no inputs */
                   : /* no clobbered registers to declare */);

That should do the trick. The volatile qualifier is there to tell the compiler "I know this does not appear to do anything, just keep it and trust me". The three colon-separated operand lists are fairly self-explanatory:

After the first : you list the output (read and written) C variables, here i.
After the second : you list the input (read-only) C variables; there are none.
After the third : you give a comma-separated list of modified registers. It is empty here because the compiler picks the register for i itself. If you hard-code a register such as r24 instead of using an operand, list it there; I believe the compiler likes the lower registers, so you might want to pick a high one.

I am not sure whether the status register should also be listed, since the ZERO flag obviously changes; I did not include it.

Edit: answer edited as the OP requested. Some notes.

The constraint before (i) tells the compiler where it may place i. A constraint such as "+rm" lets the compiler decide between memory and a register, which is a good thing in most cases since the compiler can optimize better when it is free to choose. In your case, though, you want to force i into a register, and since subi only accepts r16 to r31, the constraint used above is "d" rather than a plain "r".
