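One method is to store directly into the two bytes of `period`. While this looks complicated in C, it usually generates very tight assembly: two loads and two stores. AVR is little-endian, so byte 0 is the low byte. Read `TCNT0L` before `TCNT0H`: on AVR, reading the low byte of a 16-bit timer latches the high byte into a temporary register, so this order gives a consistent 16-bit value.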
```c
((uint8_t*)(&period))[0] = TCNT0L;
((uint8_t*)(&period))[1] = TCNT0H;
```
If the array-style indexing gives you trouble, you could try the equivalent pointer-arithmetic form:
```c
*((uint8_t*)(&period)) = TCNT0L;
*((uint8_t*)(&period) + 1) = TCNT0H;
```
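To try both spellings off-target, here is a minimal host-side sketch. `fake_TCNT0L` and `fake_TCNT0H` are hypothetical `volatile` stand-ins for the real timer registers (on an AVR they come from `<avr/io.h>`), and the function names are made up for illustration. It assumes a little-endian machine, which is what makes byte 0 the low byte; the sketch checks that assumption at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the AVR timer registers, so the technique
 * can be exercised on a host. */
static volatile uint8_t fake_TCNT0L;
static volatile uint8_t fake_TCNT0H;

/* Nonzero on a little-endian machine. The byte-store trick only puts the
 * low byte in the right place on little-endian targets (AVR is one). */
static int is_little_endian(void)
{
    uint16_t probe = 1;
    return *(uint8_t *)&probe == 1;
}

/* Array-index form. On real AVR hardware the low byte must be read first,
 * because that read latches the high byte. */
static uint16_t read_period_indexed(void)
{
    uint16_t period;
    assert(is_little_endian());
    ((uint8_t *)(&period))[0] = fake_TCNT0L;
    ((uint8_t *)(&period))[1] = fake_TCNT0H;
    return period;
}

/* Pointer-arithmetic form: the same two stores, spelled differently. */
static uint16_t read_period_ptr(void)
{
    uint16_t period;
    assert(is_little_endian());
    *((uint8_t *)(&period)) = fake_TCNT0L;
    *((uint8_t *)(&period) + 1) = fake_TCNT0H;
    return period;
}
```

Writing through a `uint8_t *` into a `uint16_t` is allowed by C's aliasing rules, since character types may access any object.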
In terms of size this is actually optimal. Note that only 12 bytes are used:
```
((unsigned char*)(&period))[0] = TCNT0L;
  dc: 82 b7       in  r24, 0x32 ; 50
  de: e6 e8       ldi r30, 0x86 ; 134
  e0: f0 e0       ldi r31, 0x00 ; 0
  e2: 80 83       st  Z, r24
((unsigned char*)(&period))[1] = TCNT0H;
  e4: 84 b3       in  r24, 0x14 ; 20
  e6: 81 83       std Z+1, r24  ; 0x01
```
If you wrote this in assembly, it would probably seem more natural to do it like this. It is also 12 bytes, so the two are equivalent.
```
  dc: 82 b7       in  r24, 0x32 ; 50
  de: 80 93 86 00 sts 0x0086, r24
  e2: 84 b3       in  r24, 0x14 ; 20
  e4: 80 93 87 00 sts 0x0087, r24
```
Of course, by "equivalent" I mean in code size. If execution time matters more, you have to count cycles: here the hand-written assembly version takes 6 cycles and the compiler's version takes 8.
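The two totals can be checked by summing per-instruction timings. This assumes a classic AVR core where `in` and `ldi` take 1 cycle and `st`, `std` and `sts` take 2 (as listed in the AVR instruction set manual). Other cores may have different timings:

```latex
\begin{aligned}
t_{\text{compiler}} &= \underbrace{1}_{\texttt{in}} + \underbrace{1}_{\texttt{ldi}} + \underbrace{1}_{\texttt{ldi}}
  + \underbrace{2}_{\texttt{st}} + \underbrace{1}_{\texttt{in}} + \underbrace{2}_{\texttt{std}} = 8 \text{ cycles} \\
t_{\text{asm}} &= \underbrace{1}_{\texttt{in}} + \underbrace{2}_{\texttt{sts}}
  + \underbrace{1}_{\texttt{in}} + \underbrace{2}_{\texttt{sts}} = 6 \text{ cycles}
\end{aligned}
```

The compiler's version spends two extra cycles on the `ldi` pair that loads the address of `period` into Z. Both versions are still six words (12 bytes) long.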
Best Answer
On many AVRs, this can be made faster (but not smaller) via the status register's T bit:
This requires only 4 cycles (vs. 5: either 2+0+1+2 or 1+2+2+0) and always updates `PORTD` at the fourth cycle, regardless of the bit value.

Caveats:

- If something (e.g. an interrupt handler) modifies `PORTD` between the `in` and `out` instructions, that update will be reverted by the `out`.
- `r0`'s value must be preserved (`r2` in this example, but it can be any).
- Some AVR cores (e.g. XMEGA) have single-cycle `cbi` and `sbi` instructions, so there's no speed difference on those targets.
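To make the read-modify-write behind the T-bit approach concrete, here is a host-side C model of what an `in`/`bst`/`bld`/`out` sequence computes. It is a sketch under stated assumptions: the exact instruction sequence, the function names and the bit positions are hypothetical, and it models data flow only, not timing or real I/O.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model (not AVR code) of an assumed 4-instruction sequence:
 *   in   tmp, PORTD     ; read the whole port
 *   bst  src, src_bit   ; copy one bit of src into SREG.T
 *   bld  tmp, dst_bit   ; copy T into one bit of tmp
 *   out  PORTD, tmp     ; write the whole port back
 * Register names and bit positions are hypothetical. */

/* bst: return bit `bit` of `reg` as the T flag (0 or 1). */
static uint8_t bst(uint8_t reg, unsigned bit)
{
    assert(bit < 8);
    return (uint8_t)((reg >> bit) & 1u);
}

/* bld: return `reg` with bit `bit` replaced by the T flag. */
static uint8_t bld(uint8_t reg, unsigned bit, uint8_t t)
{
    assert(bit < 8);
    uint8_t mask = (uint8_t)(1u << bit);
    return t ? (uint8_t)(reg | mask) : (uint8_t)(reg & ~mask);
}

/* Models the whole sequence and returns the value written back to the port.
 * Only dst_bit can change, but all eight bits are written back, which is
 * why a change made to the port between `in` and `out` would be lost. */
static uint8_t copy_bit_to_port(uint8_t port, uint8_t src,
                                unsigned src_bit, unsigned dst_bit)
{
    uint8_t tmp = port;              /* in  */
    uint8_t t = bst(src, src_bit);   /* bst */
    tmp = bld(tmp, dst_bit, t);      /* bld */
    return tmp;                      /* out */
}
```

For example, `copy_bit_to_port(0x00, 0x80, 7, 3)` copies bit 7 of the source into bit 3 of an all-zero port and yields `0x08`. The model also shows why the timing is fixed: there is no branch, so the same four steps run whichever value the bit has.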