Electronic – AVR assembly: Fastest way to increment two combined bytes

assembly avr efficiency optimization speed

What is the fastest way to increment a 16-bit value stored as two combined bytes in assembler (assuming I'm working on an 8-bit CPU)? Currently I'm doing this:

OVF1_handler: ; TIMER1 overflow ISR

    lds r21, timerhl ; load low byte into working register; 2 cycles
    add r21, counter_inc ; add 1 to working register (value of counter_inc is 1); 1 cycle

    brbs 0, OVF1_handler_carry ; branch if bit 0 of SREG (the carry flag) is set; 1 cycle if not taken, 2 cycles if taken
    sts timerhl, r21 ; otherwise write value back to variable; 2 cycles
    reti ; we're done

OVF1_handler_carry: ; reached when the carry flag is set
    sts timerhl, r21 ; write low byte back to variable; 2 cycles

    lds r21, timerhh ; load high byte into working register; 2 cycles
    inc r21 ; increment it by 1 (no carry check needed here); 1 cycle
    sts timerhh, r21 ; write high byte back to variable; 2 cycles

    reti ; we're done

So in total it takes

255 * (2+1+1+2) + (2+1+2+2+2+1+2) = 1542 cycles

to count from 0 to 256: 255 increments without overflow at (2+1+1+2) = 6 cycles each, plus one increment with overflow at (2+1+2+2+2+1+2) = 12 cycles. The reti and the interrupt entry are not included.
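
The tally above can be reproduced mechanically. The following C sketch sums the cycle counts quoted in the ISR comments for both paths and simulates the low byte wrapping. The function and constant names are made up for illustration, and like the calculation above it leaves out reti and the interrupt entry.

```c
#include <stdint.h>

/* Cycle counts copied from the comments in the ISR above. */
enum { LDS = 2, ADD = 1, INC = 1, STS = 2, BR_NOT_TAKEN = 1, BR_TAKEN = 2 };

/* Path taken when the low byte does not wrap. */
static unsigned cycles_no_carry(void)
{
    return LDS + ADD + BR_NOT_TAKEN + STS;
}

/* Path taken when the low byte wraps from 0xFF to 0x00. */
static unsigned cycles_carry(void)
{
    return LDS + ADD + BR_TAKEN + STS   /* low byte  */
         + LDS + INC + STS;             /* high byte */
}

/* Total cycles for n increments starting from 0, tracking the low byte. */
static unsigned total_cycles(unsigned n)
{
    uint8_t low = 0;
    unsigned total = 0;
    for (unsigned i = 0; i < n; i++) {
        low++;  /* wraps to 0 after 0xFF, which is when the carry path runs */
        total += (low == 0) ? cycles_carry() : cycles_no_carry();
    }
    return total;
}
```

Calling total_cycles(256) walks through 255 increments on the short path and one on the carry path, matching the figure of 1542 above.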

Is my calculation correct and is there a faster way?

Best Answer

Have a bit more trust in your compiler. Write the code in C, compile it and look at the disassembly. I'm not sure which toolchain you use, but avr-gcc generates fairly well-optimized code. For a 16-bit increment it produces something like this:

lds     r24 , lowbyte   ; 2 clocks
lds     r25 , highbyte  ; 2 clocks
adiw    r24 , 0x01      ; 2 clocks - Add Immediate to Word (= 16 bit)
sts     lowbyte  , r24  ; 2 clocks
sts     highbyte , r25  ; 2 clocks
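
For context, C source along the following lines is the kind of input from which avr-gcc would typically generate a load, adiw, store sequence like the one above. This is only a sketch under assumptions: the variable name timer_ticks and the helper timer_tick are hypothetical, and the exact output depends on the compiler version and optimization flags. ISR and TIMER1_OVF_vect come from avr-libc's avr/interrupt.h, and the vector name can differ between devices. The AVR-specific parts are guarded so the increment itself can also be compiled and checked on a PC.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/interrupt.h>   /* avr-libc: ISR() macro and vector names */
#endif

/* Hypothetical 16-bit counter; volatile because an ISR modifies it. */
volatile uint16_t timer_ticks;

/* The whole job of the handler: one 16-bit increment. */
static inline void timer_tick(void)
{
    timer_ticks++;
}

#ifdef __AVR__
/* Timer1 overflow vector; the name may differ on your device. */
ISR(TIMER1_OVF_vect)
{
    timer_tick();
}
#endif
```

With this approach the compiler also emits the register and SREG save/restore code for the handler, so there is no need to write the prologue and epilogue by hand.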

You can disassemble the .elf file with the following command (provided you use the gcc toolchain):

avr-objdump -C -d $(src).elf

BTW: You probably need to push the registers you use onto the stack beforehand and pop them afterwards (2 cycles each). Also remember that entering and leaving an interrupt (including reti) costs at least 8 clock cycles on top of the instructions being executed.

; TIMER1_OVF            ;  4 clocks
push    r24             ;  2 clocks
IN      r24 , SREG      ;  1 clock  - save CPU flags
push    r24             ;  2 clocks
push    r25             ;  2 clocks
; do the addition above - 10 clocks
pop     r25             ;  2 clocks
pop     r24             ;  2 clocks
OUT     SREG , r24      ;  1 clock
pop     r24             ;  2 clocks
reti                    ;  4 clocks
; total 32 clock ticks
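
To see where those 32 clocks go, the listing above can be written down as a small table and summed. This C sketch does only that; the struct and function names are invented for illustration, and the clock counts are taken directly from the comments in the listing.

```c
#include <stddef.h>

/* One row per line of the listing above. */
struct step {
    const char *what;
    unsigned clocks;
    int is_overhead;   /* 1 = entry, exit or register saving */
};

static const struct step isr_steps[] = {
    { "interrupt entry",   4, 1 },
    { "push r24",          2, 1 },
    { "in r24, SREG",      1, 1 },
    { "push r24",          2, 1 },
    { "push r25",          2, 1 },
    { "16-bit increment", 10, 0 },
    { "pop r25",           2, 1 },
    { "pop r24",           2, 1 },
    { "out SREG, r24",     1, 1 },
    { "pop r24",           2, 1 },
    { "reti",              4, 1 },
};

/* Sum all clocks, or only the overhead rows when overhead_only is nonzero. */
static unsigned sum_clocks(int overhead_only)
{
    unsigned total = 0;
    for (size_t i = 0; i < sizeof isr_steps / sizeof isr_steps[0]; i++) {
        if (!overhead_only || isr_steps[i].is_overhead)
            total += isr_steps[i].clocks;
    }
    return total;
}
```

sum_clocks(0) gives 32 and sum_clocks(1) gives 22, so roughly two thirds of the handler's time goes to entry, exit and register saving rather than to the increment itself.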