Electronic – How are numbers with decimal point handled in an MCU

floating point, microcontroller

Please tell me how MCUs handle decimal numbers like '23.3', '3.24', etc. How are they stored in a memory register? I know that I have to use the float data type when handling these numbers while programming, but what is actually happening inside an MCU when it handles these types? Also, how do MCUs without an FPU handle the float data type?

Best Answer

Numbers inside typical microcontrollers don't have decimal points at all. They are binary integers. There is no decimal going on inside the machine. The compiler or assembler may let you specify constants that way, but they get converted to binary before the machine sees them.

However, you can decide whatever units you like for the integer values. For example, suppose you wanted to represent dollars inside a micro. It can't natively do $3.21, but it could do 321 cents. The micro is just operating on the value 321, but you know that it represents units of 1/100 dollars.

That's just one example to illustrate the concept of arbitrary units. Often numbers are represented with several binary fraction bits. That's the same as saying each count represents a value of 2^-N, where N is the number of fraction bits. This representation is called "fixed point". You decide up front how much resolution you need, and pretend there are enough bits to the right of the imagined binary point to support that resolution. For example, let's say you need to represent something to at least a resolution of 1/100. In that case you'd use at least 7 fraction bits since 2^7 = 128. That will actually give you a resolution of 1/128.
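To make the 7-fraction-bit example concrete, here is a minimal sketch in C. The type and function names (fix7_t, fix7_from_hundredths, and so on) are invented for illustration and do not come from any particular library; the sketch also assumes unsigned values that fit in 16 bits.

```c
#include <stdint.h>

#define FRAC_BITS 7                    /* fraction bits; 2^7 = 128 counts per unit */
#define FIX_ONE   (1u << FRAC_BITS)    /* the integer 128 stands for 1.0 */

typedef uint16_t fix7_t;               /* hypothetical type name; unsigned values only */

/* Convert a value given in hundredths (2330 means 23.30) to fixed point,
   rounding to the nearest count of 1/128. */
static fix7_t fix7_from_hundredths(uint32_t hundredths)
{
    return (fix7_t)((hundredths * FIX_ONE + 50u) / 100u);
}

/* Convert back to hundredths, rounding to the nearest hundredth. */
static uint32_t fix7_to_hundredths(fix7_t x)
{
    return ((uint32_t)x * 100u + FIX_ONE / 2u) / FIX_ONE;
}

/* Addition is plain integer addition; the CPU never sees a binary point. */
static fix7_t fix7_add(fix7_t a, fix7_t b)
{
    return (fix7_t)(a + b);
}
```

With these definitions, 23.3 is held as the integer 2982 and 3.24 as the integer 415. Their sum, 3397, converts back to 2654 hundredths, i.e. 26.54.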

The machine has no idea this is going on. It will add and subtract these numbers as ordinary integers, but everything still works out. It gets a little tricky when you multiply and divide fixed point values. The product of two fixed point values with N fraction bits will have 2N fraction bits. Sometimes you just keep track of the fact that the new number has 2N fraction bits, or sometimes you might shift it right by N bits to get back to the same representation as before.
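The multiply case can be sketched the same way. This is only an illustration under the same assumptions as before (unsigned values, 7 fraction bits, made-up names); the division helper is an addition of mine that applies the same idea in reverse, and neither helper checks for overflow.

```c
#include <stdint.h>

#define FRAC_BITS 7                    /* N = 7 fraction bits */

typedef uint16_t fix7_t;               /* hypothetical type name; unsigned values only */

/* Full-width product of two values with N fraction bits each.
   The result has 2N = 14 fraction bits. */
static uint32_t fix7_mul_wide(fix7_t a, fix7_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* Product shifted right by N bits to get back to N fraction bits.
   The discarded low bits are truncated, not rounded. */
static fix7_t fix7_mul(fix7_t a, fix7_t b)
{
    return (fix7_t)(fix7_mul_wide(a, b) >> FRAC_BITS);
}

/* Division: shift the dividend left by N bits first so the quotient
   keeps N fraction bits. The caller must make sure b is not zero. */
static fix7_t fix7_div(fix7_t a, fix7_t b)
{
    return (fix7_t)(((uint32_t)a << FRAC_BITS) / b);
}
```

For example, 1.5 is 192 and 2.5 is 320 in this format. The wide product is 61440, which is 3.75 with 14 fraction bits; shifting right by 7 gives 480, which is 3.75 with 7 fraction bits.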

Floating point is the same thing, but the number of fraction bits is stored along with the integer part (as an exponent) so that this adjustment can be made at runtime. Performing math operations on floating point numbers can take a bunch of cycles. Floating point hardware does all this for you so that the operations complete quickly. However, the same manipulations can be performed in software too. There is no reason you can't write a subroutine to add two floating point numbers, just that it would take a lot longer than dedicated hardware doing the same thing.
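As a rough illustration of what such a software add routine has to do, here is a sketch in C using a toy format invented for this example. It is not IEEE 754 and not any specific PIC format: values are unsigned, stored as a 16-bit mantissa and a signed exponent with value = mant * 2^exp, results are truncated rather than rounded, and exponent overflow is not checked.

```c
#include <stdint.h>

/* Toy floating point value (hypothetical format): value = mant * 2^exp. */
typedef struct {
    uint16_t mant;   /* mantissa, top bit set when normalized (0 means zero) */
    int8_t   exp;    /* power-of-two exponent */
} toyfloat_t;

/* Shift the mantissa so it fits in 16 bits with its top bit set,
   adjusting the exponent to keep the value unchanged. */
static toyfloat_t toy_normalize(uint32_t mant, int exp)
{
    toyfloat_t r = {0, 0};
    if (mant == 0) return r;
    while (mant > 0xFFFFu) { mant >>= 1; exp++; }          /* too wide: drop low bits */
    while ((mant & 0x8000u) == 0) { mant <<= 1; exp--; }   /* too narrow: shift up */
    r.mant = (uint16_t)mant;
    r.exp  = (int8_t)exp;
    return r;
}

static toyfloat_t toy_from_uint(uint16_t n)
{
    return toy_normalize(n, 0);
}

/* Add: align the operand with the smaller exponent, add the mantissas
   as integers, then renormalize the sum. */
static toyfloat_t toy_add(toyfloat_t a, toyfloat_t b)
{
    if (a.mant == 0) return b;
    if (b.mant == 0) return a;
    if (a.exp < b.exp) { toyfloat_t t = a; a = b; b = t; } /* a has the larger exponent */
    int shift = a.exp - b.exp;
    uint32_t bm = (shift < 16) ? ((uint32_t)b.mant >> shift) : 0;
    return toy_normalize((uint32_t)a.mant + bm, a.exp);
}

/* Integer part of the value; only meaningful when exp <= 16. */
static uint32_t toy_to_uint(toyfloat_t x)
{
    if (x.exp <= -16) return 0;
    if (x.exp < 0) return (uint32_t)x.mant >> -x.exp;
    return (uint32_t)x.mant << x.exp;
}
```

The loops in toy_normalize and the alignment shift in toy_add are the per-operation work that costs cycles on a CPU with no FPU, and that floating point hardware does in dedicated logic.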

I have defined a 3-byte floating point format for 8 bit PICs and written a bunch of routines to manipulate them. Microcontrollers are usually dealing with real world values with 10 or 12 bits of precision at most. My floating point format uses 16 bits of precision, which is good enough for several intermediate calculations.

I also have a 32-bit format for the 16 bit PICs. This uses one 16-bit word for the mantissa, which speeds calculations since these PICs can operate on 16 bits at a time.

These routines are included in my PIC Development Tools release. After installation, look at files with "fp24" in their name in the SOURCE > PIC directory, and "fp32f" in the SOURCE > DSPIC directory.