Why Floating Point Numbers are so Weird

#float #ieee #double #javascript

If you've written any JavaScript (which uses double-precision floating-point numbers for all of its regular numbers), or you've dealt with single- or double-precision floats in other languages, then you've probably come across some version of this:

console.log(0.1 + 0.2 == 0.3); // false !!! ... and the walls in your office float away as the laws of mathematics begin to crumble

Or maybe you've added or subtracted a couple of reasonable-looking numbers (with one or two decimal places), printed the result to the screen, and been met with something like 3.3000000000000003 (from 1.1 + 2.2) when you were expecting a far more reasonable 3.3.
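You can poke at this yourself in a browser console or Node.js. The snippet below just prints the sums JavaScript actually produces for two innocent-looking additions; the long tails of digits are the stored binary values showing through when they're converted back to decimal for display.

```javascript
// Print the sums JavaScript actually computes for two innocent-looking additions.
const sumA = 0.1 + 0.2;
const sumB = 1.1 + 2.2;

console.log(sumA);         // 0.30000000000000004
console.log(sumA === 0.3); // false
console.log(sumB);         // 3.3000000000000003
console.log(sumB === 3.3); // false
```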

If you haven't gone through the whole university shebang and had floats explained from top to bottom, then you may have had a "WTF" moment or two. Here's a bit of a rundown on what is going on ...

What the floating in "floating point" means

In short, floating-point numbers are stored in memory using a form of scientific notation, which allows for a limited number of "significant digits" and a limited "scale". Scientific notation looks like this (remember back to high-school):

1,200,000,000,000,000,000,000 = 1.2 x 10^21

There are two significant digits in that number (1, and 2), which form the "mantissa" (or the "meat" of the number). All the zeros after the "12" are created by the exponent on base-10, which just moves the decimal point some number of places to the right. The exponent can add a lot of zeros (for a very low storage-cost), but it can't hold any "meat".

A negative exponent can be used to shift the decimal point to the left and make a really tiny number.

0.000,000,000,000,000,000,001,2 = 1.2 x 10^-21
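JavaScript uses the same notation for number literals, writing `e` in place of "x 10^". A quick check using the standard `toExponential()` method:

```javascript
// "e21" means "x 10^21" and "e-21" means "x 10^-21".
const big = 1.2e21;   // 1,200,000,000,000,000,000,000
const tiny = 1.2e-21; // 0.000,000,000,000,000,000,001,2

console.log(big === 1200000000000000000000); // true: same number, two spellings
console.log(big.toExponential());            // "1.2e+21"
console.log(tiny.toExponential());           // "1.2e-21"
```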

It's all about the precision

Imagine that we have a data type that can hold 2 significant (decimal) digits and allows (decimal) exponents up to +/-21. The two example numbers above would be getting close to the largest and the smallest values I could represent with that data type (the largest and smallest positive values would actually be 9.9x10^21 and 1.0x10^-21 respectively).
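Real double-precision floats have the same kind of limits, just with a (binary) exponent that reaches much further. JavaScript exposes them as constants on `Number`: going past the largest gives `Infinity`, and going far enough below the smallest gives `0`. (`Number.MIN_VALUE` is a *subnormal* double, which gives up significant digits to reach below the normal range.)

```javascript
// Largest finite double and smallest positive (subnormal) double.
console.log(Number.MAX_VALUE); // 1.7976931348623157e+308
console.log(Number.MIN_VALUE); // 5e-324

// Stepping outside the range doesn't throw; the value saturates instead.
const overflow = Number.MAX_VALUE * 2;  // Infinity
const underflow = Number.MIN_VALUE / 4; // 0
console.log(overflow, underflow);
```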

Following on from that, if I tried to hold the number 1,210,000,000,000,000,000,000 with this mythical 2-digit-precision floating-point data type, then I would be S.O.L., as they say, and it would end up stored as 1,200,000,000,000,000,000,000, since my two-digit precision doesn't allow for 1.21 x 10^21 (that's three significant digits, or a digit too far).

This is one source of so-called "loss of precision" errors with floating point numbers.
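You can mimic that 2-significant-digit type in JavaScript with `toPrecision()`, which formats a number to a given number of significant digits. The helper name `keepDigits` is made up for this sketch. Note that `toPrecision()` rounds to the nearest value rather than chopping digits off, and the sketch ignores the exponent limit, so it's only an approximation of the pretend storage type.

```javascript
// Hypothetical helper: squeeze a number into `digits` significant digits
// (rounding to nearest), then turn the resulting string back into a number.
function keepDigits(value, digits) {
  return Number(value.toPrecision(digits));
}

const stored = keepDigits(1.21e21, 2);
console.log(stored);             // 1.2e+21
console.log(stored === 1.21e21); // false: the third digit was lost
console.log(stored === 1.2e21);  // true
```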

Recurring Fractions

The other source of lost precision (which accounts for the 0.1 + 0.2 != 0.3 hilarity) comes down to what can and can't be precisely represented in a base-2 number system.

It's the same problem that the decimal number system has with numbers such as one-third (0.33333333333333333333333... anyone?).

Computers don't store floating-point numbers in decimal: everything that goes on inside a floating-point number in a computer uses a base-2 (binary) number system.

Just replace all the x10^n references in the examples above with x2^n and you may start to see how some decimal (base-10) numbers fit well, while others just don't play nice. 0.1 might be a nice easy number for you or me to work with (being decimal creatures), but to a two-fingered binary bean-counter it's as awkward as 1/3 or 3/7 are in decimal.
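JavaScript will show you the binary digits if you ask for a base-2 string. Fractions built from halves, quarters and eighths come out short and exact, while 0.1 turns into a long repeating pattern that has to be cut off somewhere. Asking `toFixed()` for 20 decimal places then reveals that the stored value is only *close* to 0.1.

```javascript
// Fractions made of powers of two have short, exact binary expansions.
console.log((0.5).toString(2));   // "0.1"
console.log((0.25).toString(2));  // "0.01"
console.log((0.375).toString(2)); // "0.011"

// 0.1 repeats forever in binary (0.000110011001100...), so the stored
// value is a cut-off, rounded approximation.
const tenthInBinary = (0.1).toString(2);
console.log(tenthInBinary);

// Printing more decimal places exposes the approximation.
console.log((0.1).toFixed(20)); // "0.10000000000000000555"
```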

A bit of wordy fun to illustrate

The Problem: Recurring Fractions

To recreate that (binary) 0.1 + 0.2 != 0.3 problem in decimal, let's say we write a program for some mythical decimal-based computer, using a numeric data type that can store 4 significant decimal digits. Now let's try to get that program to figure out if 1/3 + 2/3 equals 1.

Here we go:

Statement: Store this number: 1/3rd — for this example we're going to say that the human operator doesn't understand the decimal system and deals only in fractions. The decimal system is for deci-puters: real men use fractions!
Action: Stores .3333 — this is the kind of thing that happens when you declare a number in your code using decimal digits, or you take decimal user input and it gets placed into memory as a binary floating point number
Statement: Store this number: 2/3rds
Action: Stores .6666 (our deci-puter simply chops off any digits it can't fit, rather than rounding them)
Statement: Add those two numbers together
Action: Calculates .9999

Now let's try to get some sense out of what we've put in:

Question: Does the total (.9999) equal 1.000?
Answer: Hell no! (false)
Programmer: Tears out a few hairs and says out loud "WTF? 1/3 plus 2/3 definitely equals 1! This deci-puter is on crack!"
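Here's a rough JavaScript model of the deci-puter. It's a sketch under a couple of assumptions: values are held as whole numbers of ten-thousandths (so JavaScript's own binary floats don't muddy the example), and any digits past the fourth decimal place are truncated rather than rounded. The `store` helper is hypothetical, not part of any real API.

```javascript
// Hypothetical 4-digit "deci-puter": values keep only 4 decimal digits and
// extra digits are chopped off. Each value is an integer count of
// ten-thousandths, so 3333 means .3333.
const SCALE = 10000;

function store(numerator, denominator) {
  // Truncate towards zero: 1/3 -> 3333, 2/3 -> 6666.
  return Math.trunc((numerator * SCALE) / denominator);
}

const oneThird = store(1, 3);         // .3333
const twoThirds = store(2, 3);        // .6666
const total = oneThird + twoThirds;   // .9999

console.log(total === SCALE); // false: .9999 is not 1.000
```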

The Solution

The way around this lack of precision is to stop trying to precisely compare something that can't (and shouldn't) be precisely compared. Instead, we must decide how close we need two things to be in order for us to consider them "equal" for our purpose.

Here's the correct workaround in deci-puter pseudo-speak:

Question: Is .9999 close_enough to 1.000?
Error: Undefined Constant: WTF? What have you been smoking? How close is close_enough?

Oops! Let's try again:

Statement: close_enough (my chosen tolerance) is plus-or-minus .1000
Question: Is .9999 close_enough to 1.000?
Answer: Yes (true). The difference between .9999 and 1.000 is .0001: that's really damned close, and comfortably within close_enough
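The same idea translates directly into JavaScript. The `closeEnough` function below is a hypothetical helper for illustration, and the tolerances passed to it are arbitrary. It uses an *absolute* tolerance, which only makes sense when you know roughly how big the numbers being compared are.

```javascript
// Hypothetical helper: treat a and b as equal when they differ by no more
// than an explicitly chosen tolerance.
function closeEnough(a, b, tolerance) {
  return Math.abs(a - b) <= tolerance;
}

console.log(0.1 + 0.2 === 0.3);                 // false
console.log(closeEnough(0.1 + 0.2, 0.3, 1e-9)); // true
console.log(closeEnough(0.9999, 1.0, 0.1));     // true (the deci-puter example)
console.log(closeEnough(0.9, 1.0, 0.01));       // false: outside the tolerance
```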

And so you can see, if thirds were really important to people (as a species), then we'd probably be using a base-3 or a base-9 number system, because dealing with them in decimal (and binary) is inconvenient!

Also, because these are recurring fractions, it doesn't matter whether we can hold 4 significant digits or 4,000: 1/3 + 2/3 will never precisely equal 1 when fed into our "deci-puter". We'll always need to allow some tolerance, and the built-in equality operator will always (accurately) report that the stored, chopped-off values .3333... and .6666... don't add up to exactly 1.
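You can check that claim with JavaScript's `BigInt`, which does exact integer arithmetic and truncates on division. The sketch below (the function name is made up) stores 1/3 and 2/3 with any number of decimal digits, as whole multiples of 10^-digits, adds them, and measures how far the sum falls short of 1. It always comes up exactly one unit short.

```javascript
// Hypothetical helper: store 1/3 and 2/3 with `digits` decimal places
// (truncating, like the deci-puter), add them, and return the shortfall from 1.
function thirdsShortfall(digits) {
  const one = 10n ** BigInt(digits); // 1 expressed in units of 10^-digits
  const oneThird = one / 3n;         // BigInt division truncates: .3333...
  const twoThirds = (2n * one) / 3n; // .6666...
  return one - (oneThird + twoThirds);
}

console.log(thirdsShortfall(4));    // 1n (1.0000 - .9999 = .0001)
console.log(thirdsShortfall(4000)); // 1n (still one unit short, just a far smaller unit)
```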

Extending our Example to other floating-point quirks

If you were super-observant, you might have noticed that, in the previous example, there were only three decimal places in the 1.000 number, yet there were four in the .9999 number. Our pretend decimal storage type only supports 4 significant digits, so once one of them is used up by the "ones" place, there's no room left for a fourth decimal place.

You can probably imagine some of the issues you might have with this pretend 4-digit floating-point type if you try to compare 4,123,134 with 4,123,000. There are only 4 significant digits available to us, so these two numbers will become 4.123 x 10^6 and 4.123 x 10^6 respectively: the same number!
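Using `toPrecision(4)` as a stand-in for the pretend 4-digit type (the `keepDigits` helper is hypothetical, and it rounds rather than truncates), both numbers collapse to the same stored value:

```javascript
// Hypothetical helper: keep only `digits` significant digits (rounding to nearest).
function keepDigits(value, digits) {
  return Number(value.toPrecision(digits));
}

const a = keepDigits(4123134, 4); // 4.123 x 10^6
const b = keepDigits(4123000, 4); // 4.123 x 10^6
console.log(a, b);    // 4123000 4123000
console.log(a === b); // true, even though the originals differ by 134
```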

If you store large integers in a double-precision float, then at some point (above 9,007,199,254,740,991, which is 2^53 - 1) you'll run into this problem. It kicks in at a much smaller number for single-precision floats (above 16,777,216, which is 2^24).
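JavaScript exposes the double-precision threshold as `Number.MAX_SAFE_INTEGER`. Above it, not every integer gets its own double, so neighbouring integers can end up sharing one. `Math.fround()` rounds a number to the nearest single-precision value, which shows the much lower single-precision limit.

```javascript
console.log(Number.MAX_SAFE_INTEGER);       // 9007199254740991 (2^53 - 1)
console.log(Number.isSafeInteger(2 ** 53)); // false

// 2^53 + 1 can't be stored, so it rounds to 2^53 and the two compare equal.
console.log(2 ** 53 + 1 === 2 ** 53);       // true

// Single precision runs out of integer digits at 2^24 = 16,777,216.
console.log(Math.fround(16777217));         // 16777216
```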

Similarly you'll hit problems if you try to work with numbers at very different scales (try subtracting .0001 from 4356 using our pretend 4-significant-digit data type!).
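With a `toPrecision()` stand-in for the 4-digit type (hypothetical helper, rounds to nearest rather than truncating), the .0001 simply vanishes. Doubles have the same problem, just further out: adding 1e-16 to 1 leaves it unchanged, because that's less than half the gap between 1 and the next double up (that gap is `Number.EPSILON`).

```javascript
// Hypothetical helper: keep only `digits` significant digits (rounding to nearest).
function keepDigits(value, digits) {
  return Number(value.toPrecision(digits));
}

// 4355.9999 needs 8 significant digits; with only 4 it rounds back to 4356.
console.log(keepDigits(4356 - 0.0001, 4)); // 4356

// Real doubles: 1e-16 is too small, relative to 1, to change it.
console.log(1 + 1e-16 === 1);              // true
console.log(Number.EPSILON);               // 2.220446049250313e-16
```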

So, now that you know why this happens, you're not stuck with a choice between do or die: there are workarounds!

Another article in this series deals with how to choose a sensible tolerance for comparing floating-point numbers in your program (and also when it's best to avoid them altogether).

Although it's written with JavaScript in mind, the same guidelines apply to all languages with a floating point type.

How to compare numbers correctly in JavaScript