32-bit unsigned binary integer to 8-digit BCD in AVR ASM for ATtiny: how to make it more efficient?


I wrote a program in AVR ASM that converts a 32-bit unsigned binary number to an 8-digit decimal (BCD) number, based on the shift-and-add-3 algorithm. (I know a 32-bit value can have up to 10 decimal digits, but I only need 8.)
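For reference, the shift-and-add-3 (double dabble) algorithm can be modelled in C roughly as follows. This is a sketch for checking results on a PC, not the asker's code. The function name `double_dabble32` is hypothetical, and the sketch keeps all 10 digits in a `uint64_t` so the full 32-bit range fits. Each hex nibble of the result is one decimal digit.

```c
#include <stdint.h>

/* Shift-and-add-3 (double dabble): 32-bit binary -> packed BCD.
 * A 32-bit value can need 10 decimal digits, so the result occupies
 * the low 40 bits of a uint64_t, one digit per nibble. */
uint64_t double_dabble32(uint32_t x)
{
    uint64_t bcd = 0;
    for (int bit = 31; bit >= 0; bit--) {
        /* Add 3 to every digit that is 5 or more, so that the following
         * left shift (a doubling) carries correctly into the next digit. */
        for (int d = 0; d < 10; d++) {
            if (((bcd >> (4 * d)) & 0xF) >= 5)
                bcd += (uint64_t)3 << (4 * d);
        }
        /* Shift the next input bit (MSB first) into the BCD register. */
        bcd = (bcd << 1) | ((x >> bit) & 1u);
    }
    return bcd;
}
```

For example, `double_dabble32(12345678)` returns `0x12345678`. Note that 32 passes over all digits are needed regardless of the input value.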

The 32-bit input is in R16-R19 (low-high).

The 8-digit output is in R20-R23 (low-high), two digits per byte: the lower digit in the low nibble and the higher digit in the high nibble.
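To make the output layout concrete, here is a hypothetical reference helper, `to_packed_bcd8`, that builds the same packed layout using ordinary division. It is only meant for checking the assembly output on a PC. It assumes `b[0]` corresponds to R20 (`b0`) and `b[3]` to R23 (`b3`), as in the `.def` list.

```c
#include <stdint.h>

/* Hypothetical reference helper: packs the low 8 decimal digits of x.
 * b[0] (R20) = two lowest digits ... b[3] (R23) = two highest digits;
 * in each byte the lower digit is in the low nibble. */
void to_packed_bcd8(uint32_t x, uint8_t b[4])
{
    for (int k = 0; k < 4; k++) {
        uint8_t lo = (uint8_t)(x % 10); x /= 10;
        uint8_t hi = (uint8_t)(x % 10); x /= 10;
        b[k] = (uint8_t)((hi << 4) | lo);
    }
    /* Digits above the 8th are discarded: the result encodes x mod 10^8. */
}
```

For example, 12345678 packs to `b = { 0x78, 0x56, 0x34, 0x12 }`.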

My problem: converting a 16-bit number takes about 1500 cycles, and converting a 32-bit number takes about 2000 cycles.

Can anybody suggest a faster, more professional method for this? Running a 2000-cycle procedure on an ATtiny clocked at 32.768 kHz (about 61 ms per conversion) is not something I am comfortable with.

Memory usage map: [figure: Memory map for BinaryToBCD]

Definitions:

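.def    a0  =   r16     ; 32-bit input, least significant byte
.def    a1  =   r17
.def    a2  =   r18
.def    a3  =   r19     ; 32-bit input, most significant byte

.def    b0  =   r20     ; packed BCD output, digits 1-2 (lowest)
.def    b1  =   r21     ; digits 3-4
.def    b2  =   r22     ; digits 5-6
.def    b3  =   r23     ; digits 7-8 (highest)

.def    i   =   r24     ; scratch / byte passed to Add3ToNibbles
.def    j   =   r25     ; scratch used inside Add3ToNibbles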

The code:

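BinaryToBCD:
    clr     b0              ; clear the 8-digit BCD result
    clr     b1
    clr     b2
    clr     b3
    ldi     i,  32          ; one pass per input bit
    sts     0x0068, i       ;(SRAM s8) loop counter lives in SRAM because i is reused

BinaryToBCD_1:
    clc                     ; shift b3:b2:b1:b0:a3:a2:a1:a0 left by one bit
    rol     a0
    rol     a1
    rol     a2
    rol     a3
    rol     b0              ; MSB of the input enters the BCD digits
    rol     b1
    rol     b2
    rol     b3              ; the bit shifted out of b3 is discarded

    lds     i, 0x0068       ;(SRAM s8)
    dec     i
    sts     0x0068, i       ;(SRAM s8)
    brne    BinaryToBCD_2   ; bits left: adjust the digits, then shift again
    ret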

BinaryToBCD_2:
    cpi     b0,     0
    breq    BinaryToBCD_3
    mov     i,      b0
    rcall   Add3ToNibbles
    mov     b0,     i

BinaryToBCD_3:
    cpi     b1,     0
    breq    BinaryToBCD_4
    mov     i,      b1
    rcall   Add3ToNibbles
    mov     b1,     i

BinaryToBCD_4:
    cpi     b2,     0
    breq    BinaryToBCD_5
    mov     i,      b2
    rcall   Add3ToNibbles
    mov     b2,     i

BinaryToBCD_5:
    cpi     b3,     0
    breq    BinaryToBCD_1
    mov     i,      b3
    rcall   Add3ToNibbles
    mov     b3,     i
    rjmp    BinaryToBCD_1


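Add3ToNibbles:                      ; add 3 to each nibble of i that is >= 5
    mov     j,      i
    andi    j,      0b00001111      ; j = low nibble
    cpi     j,      5
    in      j,      SREG
    sbrs    j,      0               ; skip if C is set (low nibble < 5)
    subi    i,      -3              ; i += 3

    mov     j,      i
    swap    j
    andi    j,      0b00001111      ; j = high nibble
    cpi     j,      5
    in      j,      SREG
    sbrs    j,      0               ; skip if C is set (high nibble < 5)
    subi    i,      -48             ; i += 0x30
    ret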
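To test the routine on a PC, the following C model mirrors the assembly step by step. The names `add3_to_nibbles` and `binary_to_bcd` are hypothetical, and the arrays stand in for the registers (`a[0..3]` = R16..R19, `b[0..3]` = R20..R23). Like the assembly, the model drops the bit rotated out of `b3`, so it returns only the low 8 digits, that is, x mod 10^8.

```c
#include <stdint.h>

/* Model of Add3ToNibbles: add 3 to each nibble of i that is 5 or more. */
uint8_t add3_to_nibbles(uint8_t i)
{
    if ((i & 0x0F) >= 5) i += 3;     /* subi i,-3  */
    if ((i >> 4) >= 5)   i += 0x30;  /* subi i,-48 */
    return i;
}

/* Model of BinaryToBCD: a_in = R16..R19, b = R20..R23 (low byte first). */
void binary_to_bcd(const uint8_t a_in[4], uint8_t b[4])
{
    uint8_t a[4] = { a_in[0], a_in[1], a_in[2], a_in[3] };
    uint8_t count = 32;              /* counter kept at SRAM 0x0068 */
    b[0] = b[1] = b[2] = b[3] = 0;
    for (;;) {
        /* clc + rol a0..a3, b0..b3: shift the 64-bit chain left by one. */
        uint8_t carry = 0;
        for (int k = 0; k < 4; k++) {
            uint8_t out = (uint8_t)(a[k] >> 7);
            a[k] = (uint8_t)((a[k] << 1) | carry);
            carry = out;
        }
        for (int k = 0; k < 4; k++) {
            uint8_t out = (uint8_t)(b[k] >> 7);
            b[k] = (uint8_t)((b[k] << 1) | carry);
            carry = out;
        }
        /* The carry out of b3 is lost here, as in the assembly. */
        if (--count == 0)
            return;
        for (int k = 0; k < 4; k++)
            if (b[k] != 0)           /* cpi bN,0 / breq: skip zero bytes */
                b[k] = add3_to_nibbles(b[k]);
    }
}
```

For example, the input 0xFFFFFFFF (4294967295) comes out as 94967295, that is, `b = { 0x95, 0x72, 0x96, 0x94 }`.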

Best Answer

This is based on venny's approach (venny called it triangulation), expressed in pseudo-C:

uint32_t x;                                       // input value to convert

uint8_t w[10] = { 2, 1, 4, 7, 4, 8, 3, 6, 4, 8 }; // 2^31 = 2147483648, one decimal digit per byte
uint8_t r[10] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; // result, initially 0

for (int i = 31; i >= 0; i--)
{
   if (x & (1UL << i))  // is bit i of x set?
      add(r, w);        // r += w: at most 1 ADD and 9 ADD-with-carry on unpacked BCD digits
   divide(w, 2);        // w /= 2 (w becomes 2^(i-1)): at most 10 SHIFT RIGHT steps
}

The add and divide routines need no further explanation, IMO.
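For completeness, here is a sketch of the method in C with one possible add and divide. The names `bcd_add`, `bcd_halve`, `to_decimal_digits` and `NDIG` are hypothetical. Digits are unpacked, one per byte, most significant first, as in the `w` table. The halving uses digit-wise long division rather than the shift-right formulation mentioned in the comment.

```c
#include <stdint.h>

#define NDIG 10  /* 2^32 - 1 = 4294967295 has 10 decimal digits */

/* r += w, digit by digit from the least significant end, with decimal carry. */
void bcd_add(uint8_t r[NDIG], const uint8_t w[NDIG])
{
    uint8_t carry = 0;
    for (int k = NDIG - 1; k >= 0; k--) {
        uint8_t s = (uint8_t)(r[k] + w[k] + carry);
        carry = (s >= 10);
        r[k] = carry ? (uint8_t)(s - 10) : s;
    }
}

/* w /= 2 by long division, starting at the most significant digit. */
void bcd_halve(uint8_t w[NDIG])
{
    uint8_t rem = 0;
    for (int k = 0; k < NDIG; k++) {
        uint8_t cur = (uint8_t)(rem * 10 + w[k]);
        w[k] = (uint8_t)(cur / 2);
        rem = (uint8_t)(cur % 2);
    }
}

/* For each set bit i of x, add 2^i (held in decimal in w) to the result. */
void to_decimal_digits(uint32_t x, uint8_t r[NDIG])
{
    uint8_t w[NDIG] = { 2, 1, 4, 7, 4, 8, 3, 6, 4, 8 };  /* 2^31 */
    for (int k = 0; k < NDIG; k++)
        r[k] = 0;
    for (int i = 31; i >= 0; i--) {
        if (x & (UINT32_C(1) << i))
            bcd_add(r, w);
        bcd_halve(w);                /* w = 2^(i-1); every halving is exact */
    }
}
```

Halving `w` in place avoids storing a table of all 32 powers of two, at the cost of one 10-digit pass per bit.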