Electrical – Divide 32-bit by 16-bit number on a dsPIC

compiler, microcontroller

I'm working with a dsPIC33EP64GS506 (datasheet: link).

The datasheet (page 294) says the device supports four divide instructions:

DIV.S    Wm,Wn    Signed 16/16-bit integer divide
DIV.SD   Wm,Wn    Signed 32/16-bit integer divide
DIV.U    Wm,Wn    Unsigned 16/16-bit integer divide
DIV.UD   Wm,Wn    Unsigned 32/16-bit integer divide

Now, when I divide a 32-bit variable by a 16-bit variable in C, the compiler for some reason does not use these instructions. It calls the ___divsi3 function instead, which is a software implementation of the divide algorithm. I use the free version of Microchip's XC16 compiler. Take this simple C code, for example:

volatile int32_t num = 65537;
volatile int16_t den = 16384;
volatile int16_t res = num/den;

compiles to the following assembly code:

MOV [W15-22], W0
MOV [W15-20], W1
MOV [W15-18], W2
ASR W2, #15, W3
RCALL ___divsi3
MOV W0, [W15-16]

How can I convince the compiler to use the DIV.SD instruction to divide these two variables?

Best Answer

I managed to find a solution. Apparently, XC16 provides a dozen or so __builtin functions that force the compiler to emit a specific assembly instruction or sequence of instructions. For example, to force the compiler to use a specific DIV instruction, one can use one of the following __builtin functions:

signed int __builtin_divf(signed int num, signed int den);
signed int __builtin_divmodsd(signed long dividend, signed int divisor, signed int *remainder);
unsigned int __builtin_divmodud(unsigned long dividend, unsigned int divisor, unsigned int *remainder);
int __builtin_divsd(const long num, const int den);
unsigned int __builtin_divud(const unsigned long num, const unsigned int den);
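
As an illustration of the quotient-and-remainder variant, here is a minimal sketch. The wrapper name divmod32by16 is hypothetical, and the __XC16__ macro check is an assumption about how XC16 identifies itself; on any other compiler the code falls back to the plain C operators so it can be tried on a PC. It assumes the divisor is nonzero and the quotient fits in 16 bits, and that int16_t is plain int on XC16.

```c
#include <stdint.h>

/* Hypothetical wrapper, not part of XC16.
 * Assumes den != 0 and that the quotient fits in 16 bits. */
static int16_t divmod32by16(int32_t num, int16_t den, int16_t *rem)
{
#ifdef __XC16__
    /* Assumed to map to a single REPEAT + DIV.SD sequence on the dsPIC. */
    return __builtin_divmodsd(num, den, rem);
#else
    /* Host fallback: ordinary C division and remainder. */
    *rem = (int16_t)(num % den);
    return (int16_t)(num / den);
#endif
}
```

With the values from the question, divmod32by16(65537, 16384, &rem) gives a quotient of 4 and a remainder of 1, since 65537 = 4 * 16384 + 1.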

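However, one must be certain that the result (i.e., the quotient) fits in a 16-bit register; otherwise the result is unexpected. This is presumably also why the compiler does not emit DIV.SD on its own: in C, num/den promotes den to 32 bits and yields a 32-bit quotient, which DIV.SD cannot always represent. All of these __builtin_div functions execute within 18 clock cycles, while the ___divsi3 function that the compiler normally uses for 32/16-bit division takes up to 500 clock cycles.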
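
One way to guard against the overflow case is to check the operands before dividing. The following is only a sketch under stated assumptions: checked_div32by16 is a hypothetical helper, the __XC16__ macro check is an assumption about the compiler's predefined macros, and the test is deliberately conservative (it also rejects a quotient of exactly -32768). The extra comparison costs a few cycles on top of the divide itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of XC16: divides num by den only when the
 * quotient is guaranteed to fit in 16 bits. Returns false otherwise. */
static bool checked_div32by16(int32_t num, int16_t den, int16_t *quot)
{
    /* Magnitudes as unsigned values, so INT32_MIN and INT16_MIN are safe. */
    uint32_t mag_num = num < 0 ? 0u - (uint32_t)num : (uint32_t)num;
    uint32_t mag_den = den < 0 ? 0u - (uint32_t)den : (uint32_t)den;

    /* |num| < |den| * 32768 guarantees |quotient| <= 32767.
     * A zero divisor gives a limit of 0 and is rejected as well. */
    if (mag_num >= (mag_den << 15)) {
        return false;
    }
#ifdef __XC16__
    *quot = (int16_t)__builtin_divsd(num, den);
#else
    /* Host fallback so the check can be tested off-target. */
    *quot = (int16_t)(num / den);
#endif
    return true;
}
```

For example, 65537 / 16384 passes the check and yields 4, while 65536 / 2 is rejected because the quotient 32768 does not fit in a signed 16-bit register.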

To go back to my example from the beginning, the C code would look like this:

volatile int32_t num = 65537;
volatile int16_t den = 16384;
volatile int16_t res = __builtin_divsd(num, den);

The compiler produces the following assembly code:

MOV [W15-16], W0
MOV [W15-20], W1
MOV [W15-18], W2
REPEAT #0x11
DIV.SD W4, W2
MOV W0, [W15-14]

Case closed.