Understanding IEEE 754 Floating-Point Numbers: Why 0.1 + 0.2 Is 0.30000000000000004, Step by Step
While working on an open-source calculator repository during Hacktoberfest, I noticed that certain decimal calculations were not producing the expected results: 0.6 + 0.3, for example, did not yield 0.9. At first I suspected a bug in the code, but on closer analysis I discovered that this is the actual behavior of JavaScript, and I dug into its internal workings to understand why.
In this blog post, I will share my insights and discuss a few approaches to handle it.
In everyday math, we know that 0.6 + 0.3 equals 0.9, right? But when we turn to computers, it results in 0.8999999999999999. Surprisingly, this doesn't happen only in JavaScript; it's the same in many programming languages like Python, Java, and C. And it's not just this specific calculation: many other decimal calculations show similar not-quite-right answers.
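You can see this for yourself in any JavaScript console; the digits below are exactly what current engines print:

console.log(0.6 + 0.3);         // 0.8999999999999999
console.log(0.1 + 0.2);         // 0.30000000000000004
console.log(0.1 + 0.2 === 0.3); // false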
Why does this happen?
It's all about how computers handle floating-point numbers. Decimal numbers are stored in the computer's memory using a format defined by the IEEE 754 standard. IEEE 754 floating point is the most common representation today for real numbers on computers. The standard defines several formats, mainly single precision (32 bits) and double precision (64 bits). JavaScript uses the IEEE 754 double-precision floating-point format.
IEEE 754 Double Precision Floating Point Format
Double precision is composed of 64 bits: 1 sign bit, 11 exponent bits, and 52 mantissa (fractional part) bits.
Any decimal number you write is stored in this double-precision IEEE 754 binary floating-point format. A finite 64-bit representation cannot exactly express every decimal value, especially those with infinite binary expansions, which leads to minor discrepancies in results when working with certain numbers in binary.
Let's walk through an example of how decimal numbers are stored and see why 0.6 + 0.3 equals 0.8999999999999999. First, let's represent 0.6 in the IEEE 754 double-precision floating-point format.
# Step 1: Convert the decimal (0.6)₁₀ to its binary (base 2) representation
Integer Part: 0/2 = 0 => (0)₂
Fractional Part:
Repeatedly multiply by 2, noting the integer part of each result, until the fractional part reaches zero.
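To see the pattern, here is a small sketch of my own that mimics the multiply-by-2 method in JavaScript (note that the loop itself runs on binary doubles, so it only illustrates the idea; carried out by hand, the digits are exact):

let fraction = 0.6;
let bits = "0.";
for (let i = 0; i < 12; i++) {
  fraction *= 2;                    // e.g. 0.6 * 2 = 1.2, integer part 1
  const bit = Math.floor(fraction); // the integer part becomes the next binary digit
  bits += bit;
  fraction -= bit;                  // keep only the fractional part and repeat
}
console.log(bits); // "0.100110011001" ... the group 1001 keeps repeating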
So 0.6 cannot be exactly represented as a binary fraction: the group 1001 recurs endlessly, forming an infinite sequence, and the fractional part never reaches zero.
# Step 2: Normalization
A number x, written in scientific notation, is normalized if it has a single non-zero digit to the left of the point; in binary, this means forcing the integer part of the mantissa to be exactly 1.
We then adjust the sequence to meet the IEEE 754 requirements: the mantissa is limited to 52 bits and must be rounded, which is where the round-off error comes from.
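Concretely, writing out the result of Step 1 (my own worked line, with the bits grouped for readability):

(0.6)₁₀ = (0.1001 1001 1001 ...)₂ = 1.0011 0011 0011 ... × 2^-1

Rounded to 52 fraction bits, the stored mantissa becomes 0011 repeated 13 times:
0011001100110011001100110011001100110011001100110011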
# Step 3: Adjust the Exponent
For double precision, exponents in the range -1022 to +1023 are stored with a bias: 1023 is added to the actual exponent.
exponent => -1 + 1023 => 1022
Represent the value in 11-bit binary:
(1022)₁₀ => (01111111110)₂
The sign bit is 0, as 0.6 is a positive number: (-1)⁰ => 1
Now we have all the values to represent in IEEE 754 Floating point format.
In the process of normalizing the mantissa, the leading (leftmost) bit 1 is dropped, since it is always 1, and the fraction is padded out to 52 bits only if necessary (not the case here).
Representing 0.6 in 64-bit IEEE format:
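Putting the three fields together: sign 0, exponent 01111111110, mantissa 0011001100110011001100110011001100110011001100110011. If you want to verify this yourself, here is a small helper of my own (the name toIEEE754Bits is just for illustration) that dumps the raw 64-bit pattern of any JavaScript number using a DataView:

function toIEEE754Bits(value) {
  const buffer = new ArrayBuffer(8);
  new DataView(buffer).setFloat64(0, value); // big-endian by default
  const bits = [...new Uint8Array(buffer)]
    .map(byte => byte.toString(2).padStart(8, "0"))
    .join("");
  // split into sign | exponent | mantissa
  return `${bits[0]} ${bits.slice(1, 12)} ${bits.slice(12)}`;
}

console.log(toIEEE754Bits(0.6));
// 0 01111111110 0011001100110011001100110011001100110011001100110011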
Similarly, using the same process, 0.3 is represented as:
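Using the toIEEE754Bits helper sketched above:

console.log(toIEEE754Bits(0.3));
// 0 01111111101 0011001100110011001100110011001100110011001100110011

That is, 0.3 is stored as 1.0011 0011 ... × 2^-2, with a biased exponent of 1021.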
# Adding the two values
1. Equalizing the exponent
Now that we have the representations of 0.6 and 0.3, we can add them. But before doing so, we must make sure the exponents are the same. In this case they are not, so we adjust the number with the smaller exponent to match the larger one.
Exponent of 0.6 => -1
Exponent of 0.3 => -2
Since the exponent of 0.6 is greater, 0.3 has to be matched with 0.6. The difference is 1, so the mantissa of 0.3 is shifted to the right by 1 bit and its exponent is increased by 1 to match 0.6.
Shifting the mantissa right by 1 bit causes the least significant bit to be lost (to stay within the 64-bit format), which can introduce a small precision error.
Value of 0.3 after equalizing the exponent:
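Written out (my own worked line; the bit lost to the shift is shown in parentheses):

0; 01111111110; 0.1001100110011001100110011001100110011001100110011001(1)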
2. Add the mantissa
As the exponents are equal now, we need to perform binary addition on the mantissas.
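Lining the two significands up, the addition works out as follows (my own worked arithmetic, using the values derived above):

  1.0011001100110011001100110011001100110011001100110011   (0.6)
+ 0.1001100110011001100110011001100110011001100110011001   (0.3, shifted)
= 1.1100110011001100110011001100110011001100110011001100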
Now the value will be: 0; 01111111110; 1.1100110011001100110011001100110011001100110011001100
3. Normalizing the resulting mantissa and rounding
In this case, the mantissa is already normalized (it has 1 as its leading bit), so this step is skipped.
And finally, the result of 0.6 + 0.3 is represented as:
Sum of 0.6 + 0.3 in 64-bit IEEE floating-point representation
So now we have the result of 0.6 + 0.3 represented in 64-bit IEEE format, which is machine-readable. We have to convert it back to decimal for human readability.
# Converting the IEEE 754 Floating-Point Representation to its Decimal Equivalent
Determining Sign Bit: 0 => (-1)⁰ => +1
Calculate unbiased exponent:
(01111111110)₂ => (1022)₁₀
2^(e-1023) => 2^(1022-1023) => 2^-1
Fraction Part:
Sum the value of each bit of the mantissa, starting from the leftmost bit, weighted by negative powers of 2:
1×2^-1 + 1×2^-2 + 0×2^-3 + ....... + 0×2^-52
Putting the sign, exponent, and mantissa back together and substituting the values:
=> +1 × (1 + 1×2^-1 + 1×2^-2 + 0×2^-3 + ....... + 0×2^-52) × 2^-1
=> +1 × (1 + 0.79999999999999982236431605997495353221893310546875) × 2^-1
=> 0.899999999999999911182158029987476766109466552734375
≈ 0.8999999999999999 // numbers are rounded
Solving the equation, we get 0.8999999999999999, which is the value displayed on the console.
// Because floating-point numbers have a limited number of digits,
// they cannot represent all real numbers accurately: when there
// are more digits than fit, the leftover ones are omitted, i.e. the number is rounded.
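If you want JavaScript itself to show the full stored value, toFixed() accepts up to 100 digits (a quick check of my own, not part of the derivation above):

console.log((0.6 + 0.3).toFixed(51));
// 0.899999999999999911182158029987476766109466552734375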
Let’s explore one more example to gain a deeper understanding of the well-known expression 0.1 + 0.2 = 0.30000000000000004.
Adding the 64-bit IEEE 754 binary floating-point values of 0.1 & 0.2
Converting the result back to decimal
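The steps are exactly the same; using the toIEEE754Bits helper sketched earlier, the bit patterns involved look like this, and the final line is what JavaScript reports:

console.log(toIEEE754Bits(0.1));
// 0 01111111011 1001100110011001100110011001100110011001100110011010
console.log(toIEEE754Bits(0.2));
// 0 01111111100 1001100110011001100110011001100110011001100110011010
console.log(toIEEE754Bits(0.1 + 0.2));
// 0 01111111101 0011001100110011001100110011001100110011001100110100
console.log(0.1 + 0.2); // 0.30000000000000004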
How to Solve That?
Let’s see how to achieve accurate results when working with applications that handle currency or financial calculations, where precision is crucial.
i) Built-in Functions: toFixed() and toPrecision()
- toFixed() converts a number into a string, rounding it to the specified number of decimal places.
- toPrecision() formats a number to a specified precision (number of significant digits) and adds trailing zeros to fill that precision if required; parseFloat() can then be used to remove the trailing zeros.
const num1 = 0.6;
const num2 = 0.3;
const result = num1 + num2;
const toFixed = result.toFixed(1);
const toPrecision = parseFloat(result.toPrecision(12));
console.log("Using toFixed(): " + toFixed); // Output: 0.9
console.log("Using toPrecision(): " + toPrecision); // Output: 0.9
Limitation
toFixed() always rounds the number to the given number of decimal places, which might not be what you want in every case. toPrecision() is similar, and it may not produce accurate results for very small or very large numbers, since its argument must be between 1 and 100.
// 1. Adding 0.03 and 0.253 => expected 0.283
console.log((0.03 + 0.253).toFixed(1)); // returns "0.3"
// 2. Without arguments, the values are concatenated as strings
(0.1).toPrecision() + (0.2).toPrecision(); // returns "0.10.2"
ii) Third-Party Libraries
There are various libraries like math.js, decimal.js, and big.js that solve this problem. Each library works according to its own documentation. This approach is comparatively better.
//Example using big.js
const Big = require('big.js');
Big.PE = 1e6; // Formatting only: the positive exponent at or above which toString() switches to exponential notation
console.log(new Big(0.1).plus(new Big(0.2)).toString()); //0.3
console.log(new Big(0.6).plus(new Big(0.3)).toString()); //0.9
console.log(new Big(0.03).plus(new Big(0.253)).toString()); //0.283
console.log(new Big(0.1).times(new Big(0.4)).toString()); //0.04
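A small usage note of my own: big.js also accepts decimal strings, so you can pass values like '0.1' directly and avoid going through a binary double at all:

console.log(new Big('0.1').plus('0.2').toString()); // 0.3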
Conclusion
Storing decimal numbers in the binary IEEE 754 format can lead to minor discrepancies. Various libraries can be used to achieve more precise results; choose the appropriate approach based on your application's requirements. Equivalent packages exist in other languages, such as BigDecimal for Java and decimal for Python.
References:
What is Double-precision floating-point format
Convert decimal to IEEE 64 Bit Double Precision
Convert IEEE 64 Bit Double Precision to Decimal
Full Precision Calculator
https://zhuanlan.zhihu.com/
https://0.30000000000000004.com/
Thanks for reading. Feel free to share your opinion in the comments 😊. I hope this post was helpful. You can reach me on LinkedIn. A few other posts to check out:
- First Attempt at Open Source Contribution: Hacktoberfest'23 Journey
- How to Hide the Source Code in React from Dev Tools [3 different ways]
- How To Create and Set Up a React Project with Create React App in Windows?
Top comments (17)
Great explanation of IEEE 754.
However, for all precision-sensitive calculations I recommend Raku, because it automatically uses rational fractions. So if you write 0.12824, it will be represented as 1603/12500 internally. Fraction calculations are always performed using an integer numerator and denominator, and are therefore always exact.
As a bonus it also handles big nums out of the box. Worth trying.
Interesting! Thanks for adding
Great article and solution for an issue not a lot of modern programmers know about (especially problematic when dealing with finances). Just one thing: I thought it worth mentioning that .toFixed() in JavaScript is a purely cosmetic method; it should only be used for display purposes since (like you mentioned) it returns the string representation of the number.
Thanks for your comment! You're correct that .toFixed() is essentially a cosmetic method, as it only affects the way the number is displayed, not its actual value.
I didn't know .toPrecision() before. Thanks for sharing, you wrote a great article there.
Thanks! Glad this helped you discover toPrecision().
Well done, friend.
Thanks, friend!
like
🙌🏻
That was so awesome :)
Thank you for your feedback.
You can enable syntax highlighting in Dev.to by opening a code block with three backticks followed by javascript, putting the code on the lines below, and closing it with three backticks.
Yeah, sure 🙌🏻
One of the best in-depth technical articles I've read so far. I'm sure not many JS enthusiasts will read this or even care about its intricacies, so I'm lucky to have read it!
Is it just JavaScript, or do other languages also use the same IEEE 754 standard?
Glad to hear that!
The IEEE 754 standard is not exclusive to JavaScript; many programming languages like Java, Python, etc., use it for representing floating-point numbers.
Surprising! I tried this exact same expression on Java, Rust, Python and all of them yielded the same results.
I suppose they all have their own libraries to deal with this inconsistency.
Also, do you know that JavaScript loses number precision after like 18 digits? Could this be the same reason? Or are there different causes for that?